
ISICC Walldorf Technical Brief 23/07/04

Colgate
Customer Test-Box Whitepaper

IBM SAP

SAP SCM 4.1 HotStandby liveCache on pSeries and IBM TotalStorage

International SAP IBM Competence Centre


Walldorf, Germany

Version: 1.5
July 2004


Introduction
About the Team and the Authors
Executive Summary
Document Objectives
Purpose of the Hot-Standby in SCM
Summary of Results
Overview of Application Level Tests
Application Failover Test Results
Application Team Conclusions
Implementation into SAPGUI
Concept of liveCache Hot-Standby
Proof of Concept Environment
SCOPE of the Proof of Concept
Migration to Hot Standby Configuration
Preparation
Preparing the SVC
Building the hot environment
SAN Infrastructure
Comments
Heterogeneous Storage Layout under SVC
Preparing the Primary liveCache Instance
LiveCache Migration
Attaching the APO System
Installation of the Standby
Integrating SCM 4.1 with Hot-Standby liveCache
Issues and Recovery
Failover Tests
Application Level Hot Standby Verification Tests In Detail
Technical View of the Failover Test in Detail
Summary of the PoC
Failover Solution Comparison
Traditional HA Cluster
Shadow Database
Hot Standby Database
Highly Available SCM Landscape Designs
Using a "Traditional" Failover Cluster
Using SAP Multiple Components in One Database (MCOD)
Overview of the Consolidated Colgate Asia Test-Box Project
The Functional Migration
Introducing Storage Virtualization
Overview of Hardware Infrastructure
Software Versions
SAN Environment
Hardware Environment
APPENDIX
RTEHSS_config File
LiveCache Parameters


Introduction
This document is one of a series of documents produced in a joint IBM SAP customer test-
box in Walldorf, Germany. This test case project was carried out with the support of Colgate
USA. Colgate provided the system clones which formed the landscape for this test series, and
helped to implement the job-load scenarios which allowed a life-like customer system to be
emulated as verification of the successful proof of concept stages. The project infrastructure
was based on pSeries p690 LPARs and IBM TotalStorage SAN components, including ESS
and FastT storage systems. This infrastructure allowed the initial clone, built from Colgate
tapes, to be further "multi-cloned" and for several projects to run simultaneously sharing the
environment. One of these projects dealt with the functional upgrade of the Colgate SAP
landscape, including the first SCM 4.1 implementation and the first mySAP ECC 5.0 upgrade
from R/3. Another project covers the migration of the Colgate system to an adaptive
computing basis and explores the benefits of the new SAN functionality offered with the SAN
Volume Controller and the SAN File System.
This project focuses on the new hot-standby solution for SCM 4.1 liveCache and its integration
into the IBM TotalStorage products.

Related Documents
IBM/SAP Test-Box Whitepaper: “Moving Toward Adaptive Computing SAP on pSeries and
IBM TotalStorage” Walldorf June 2004
IBM/SAP Test-Box Whitepaper: “Moving forward: early upgrade insights of SAP ERP and
SAP SCM” Walldorf June 2004
IBM/SAP "MaxDB 7.5 on IBM TotalStorage® Enterprise Storage Server: Integration of
the Hot-Standby System Solution" Mainz 2004

A word to our Sponsors


The team would like to thank the following people for their support and sponsorship which
made this project a success:

Jim Capraro, Director, Enterprise Service Center, Colgate Worldwide


Mike Crowe, Director Global Supply Chain Development, Colgate Worldwide
Volker Loehr, General Manager, Global IBM/SAP Alliance, IBM
Eva Lau, Business Development & Alliances WW Marketing, SSG, IBM
Chuck Calio, IBM Systems Group Solutions Relationship Manager for SAP, IBM
Dr. Christian H. Bartels, Manager, Mainz EBC, TIC, SSB, IBM
Keith Murray, Manager, IBM SAP International Competence Centre, IBM
Martin Kühn, Senior Vice President BSG M Quality Management, SAP AG
Manfred Dietz, Vice President BSG M Quality Management, SAP AG
Albrecht Diener, Senior Vice President BSG M Development Area Supply Chain, SAP AG

About the Team and the Authors


The authors of this document are:
Carol Davis, IBM; Jan Muench, IT Services; Jochen Kellner, IBM;
Herbert Diether, IBM; Werner Thesing, SAP; Rajeev Kumar Das, SAP

HA landscapes are courtesy of Matthias Koechl and Bernd Beyrodt of IBM.


The document was supported by the Colgate Test-Box team and other technical colleagues
who all contributed to the test scenarios, the evaluation, and the success of the PoC.

Colgate Test-Box Team


Name | Company | Position | Focus
Carol Davis | IBM Walldorf, ISICC | pSeries Technical Support, Certified IT Specialist | IBM Project Lead, pSeries Basis, SAP/Oracle/liveCache
Jan Muench | IT Services & Solutions Chemnitz | Senior Systems Engineer | SAN, Storage & Network Infrastructure, pSeries Basis
Jochen Kellner | SerCon GmbH | Senior Architect | SAN Functionality, SAP Basis, Oracle
Juergen Daedlow | IBM Walldorf, ISICC | Senior Architect | Storage Performance
Dnyaneshwar Shelke | SAP Bangalore, India, International Consulting Group | SCM Consultant | SNP and Demand Planning
Brigitte Reinelt | SAP Walldorf | Quality Project Lead | SAP Project Lead
Rajeev Kumar Das | SAP Bangalore, India, International Consulting Group | SCM Consultant | SNP and jobflow
Frank Eurich | SAP Walldorf | Quality Engineer | Demand Planning
Wolfgang Wolesak | SAP Walldorf | Quality Engineer | LiveCache
Vilas Patil | SAP Bangalore, India, International Consulting Group | Technology Consultant | SAP/Oracle/liveCache Basis
Peter Jäger | SAP Walldorf | SAP Technical Consultant, SCM CoE | SAP Basis
Patty Vollmar | Colgate USA | Team Leader, Global Supply Chain Development | Colgate Project Lead, VMI
Ambrish Mathur | Colgate USA | Application Team Leader, Global Supply Chain Development | Colgate Co-Lead, SNP GSN
Dave Agey | Colgate USA | Assoc. Director, Global Supply Chain Development | APO Project Manager, DP
Mike Crowe | Colgate USA | Director, Global Supply Chain Development | APO Project Sponsor
Siew Lan Chai | Colgate Asia Shared Service Organization | APO Team Leader Asia Pacific | All APO functional areas
Geoff Graham | Colgate USA | Global Supply Chain Development | PP/DS

Additional Team for HotStandby


Name | Company | Position | Focus
Werner Thesing | SAP Walldorf | Development Architect for SAP MaxDB and liveCache | Solution integration, performance and functionality
Oliver Goos | IBM Mainz | Storage Specialist and Solutions Developer | Hot-Standby Integration with IBM TotalStorage
Herbert Diether | SerCon Walldorf, ISICC | HA Specialist | HACMP integration for HotStandby
Bernd Vorsprach | SAP, Berlin | Development Architect for SAP MaxDB | MaxDB Tools, Migration Support
Matthias Koechl | IBM Walldorf, ISICC | Technical Marketing eServer pSeries for SAP, Certified Consulting IT Architect | HA Landscape Solutions
Bernd Beyrodt | IBM Böblingen | DB2/Informix Technical Presales EMEA | SAP DB2 HA Landscapes


Executive Summary
Document Objectives
The purpose of this document is to show how a customer might implement the new hot-
standby functionality into an existing SCM system. It demonstrates the successful integration
of this solution with the new mySAP SCM, and highlights the additional functionality,
flexibility, and high availability features offered by pSeries and TotalStorage in the IBM hot-
standby integration. This document covers migration of an existing system to a hot-standby
environment, tests the hot standby solution in a real-life scenario, and proposes several HA
SCM landscapes based on hot-standby.

Purpose of the Hot-Standby in SCM

(Figure: traditional liveCache failover. The SCM application servers connect to the liveCache server via an IP alias; when the liveCache server fails, a new liveCache instance is started on the failover server and the IP alias moves with it.)

After the failover, the liveCache memory must be rebuilt and loaded.

In a normal failover situation in which a database is taken over, the activities depicted above
take place when the original running database instance is lost:
- the server cluster software is triggered by the failure,
- the disks are moved to the standby server,
- the service address for the application is moved to the standby server,
- the application on the new server is started.

In the picture above, the application being failed over is liveCache, and the user waiting for
the liveCache is the SCM system. This solution has been available for pSeries with HACMP,
and endorsed by SAP, since liveCache version 7.2. It provides automatic failover in case of a
liveCache server or liveCache database crash.

The real issue being addressed by the hot-standby solution is the speed of recovery. As SCM
systems continue to grow and take a more and more important position in the supply chain
management landscape, this recovery time is becoming an increasingly sensitive issue. The
size of the memory structure being used to support the planning algorithms is many gigabytes
and is growing rapidly (currently 30-80 GB, with planned systems in the three-figure range). A
traditional failover scenario must first take over and activate the resources (disk and network),
restart the application and rebuild this large memory structure, reload the memory structure
with the most critical data (continuing to load the rest in the background),
perform any rollback requirements and then any redo actions. All this must be done before
liveCache is ready to resume production. This effort can require a time-span from several
minutes to several hours depending on the state of the application at crash time. An
application level data resynchronization with the SCM system may also be necessary to
achieve data consistency between the SCM database and the liveCache after a liveCache
recovery.
The hot-standby solution provides for a duplicate cache filled with the most relevant data,
removes the need to move physical resources other than the IP service address, and requires
no roll-back and only minimal roll-forward activity. It can be online in seconds.

Summary of Results
Details of the implementation, migration, and the workings of the hot-standby solution are
documented in the subsequent chapters. The real proof of concept for a hot-standby solution is
recovery at the application level. The IBM basis team therefore requested the SAP application
team to define a scenario which would prove data consistency following a failover. The
definition of this scenario and the results are presented here in summary and in detail later in
the document.

Overview of Application Level Tests


This chapter describes the test scenario defined by the SAP SCM application group
supporting the Colgate test-box. It defines the success criteria, and documents the results.

Test Criteria: Hot standby technology for liveCache is supposed to enable fail-over of
liveCache data from one liveCache server to another while the SCM system is running. No
loss of persistent data is expected; however, the session link to liveCache will be terminated
and any open OMS versions will be lost (transactional simulations).

Test Scenario: In order to test the smooth fail-over from one LC to another LC as part of
hot-standby functionality:

1. Create a job consisting of a single step of an SNP Heuristic run.

2. Run the job and download the application log for the above job step on Day 1
(with liveCache 1 connected to the system).

3. Rerun the job on Day 2. While the job is running, perform the fail-over, whereby the
connection to liveCache 1 is broken and liveCache 2 is connected. Following the
failover, the data on liveCache 2 is expected to be identical to that of liveCache 1;
however, the job is expected to fail since the connection to liveCache has been broken.
If the data is consistent, we expect the rerun of the job to complete successfully, which
can be verified by looking in the new application log. The date change is used to help
verify that the previous data is consistent and that a new re-planning job (SNP) actually ran.


Application Failover Test Results


(Figure: hot-standby failover under an SNP job. The IP alias moves from the failed liveCache server to the standby, which becomes the new master and is ready to resume production in 64 seconds.)

Primary failure forced at           14:30:55
HACMP failover in action            14:31:02
HACMP resource transfer finished    14:31:34
Primary liveCache server online     14:31:59
Standby liveCache server online     14:32:13
Job restarted (manually)            14:55:37
Successful completion               14:58:32
__________________________________________
LiveCache ready for action within 64 seconds (14:30:55 - 14:31:59)

This failover is completely independent of liveCache size – a 60 GB liveCache would be online
with the same speed.

Application Team Conclusions


The SNP job results showed that the criteria were successfully fulfilled. Following a failover,
the data is consistent at the application level. Uncommitted data is lost as a result of the
failover, but the application can immediately be restarted, just as it would be following a
temporary service interruption. Data is maintained in consistent synchronization between the
two liveCache instances under application-level load, and following a failover liveCache
looks to the SCM system as it would following a temporary communication interruption.
SCM and liveCache data remain in a consistent state.

Implementation into SAPGUI


SAP has also integrated the Hot-Standby information into the transactions DB50 and LC01,
making it available via SAPGUI. This integration was done for the following versions:
Hot Standby Integration
(4.6C SP48, 4.6D SP37, 6.10 SP40, 6.20 SP40, 6.40 SP03)
In case of a hot standby scenario, the hot standby relevant information is displayed in
LC10/DB50 --> Properties.


(Figure: hot-standby cluster concept. The application connects to the primary liveCache under HA software (HACMP); primary and standby each have their own data volumes and share the log volumes; the standby keeps its data image current with a continuous restart from the shared log.)
Concept of liveCache Hot-Standby


The base of the hot-standby design is a failover system consisting of several (at least two)
physically separated database servers, each with its own copy of the database data files.
Each instance maintains its own version of the data, but all instances must have concurrent
access to a shared online log. The online log is written by the primary, and read by the
secondary(ies). The log is used to keep all instances in synchronization. Cluster software
must be offered by the platform operating system to allow failover conditions to be
recognized and to perform the actions necessary to trigger a failover, and redirect the client
connections.
The SAP hot-standby design requires a close integration between the storage subsystem and
the liveCache as the liveCache primary instance controls the data initialization of the standby
server(s) at storage system level. The supporting storage system must provide “split mirror”
technology, concurrent access to the log volume from multiple servers, and provide a shared
library which exports the control of the split mirror initialization and resynchronization to
application level, making it accessible to the liveCache primary instance.

IBM TotalStorage and Hot-Standby

IBM TotalStorage is the first storage infrastructure to support the special SAP design for a
hot-standby solution, announced for liveCache 7.5 with mySAP Supply Chain Management
(SCM) version 4.1.
This integration of the SAP Hot Standby API, done for the TotalStorage products, ESS and
SVC, uses flash-copy functionality to fulfil the requirement for “split mirror”, used by the
primary to initiate a secondary hot-standby server. IBM's SVC implementation allows high
availability aspects of this solution to be taken a step further, as it provides flash-copy
functionality across physical server boundaries. This eliminates a storage server as single
point of failure: the primary and secondary need not be on the same storage server. SVC also
provides the option of a heterogeneous implementation in which the two servers are on
different storage architectures.
This solution is supported by both MaxDB and liveCache at version 7.5. This paper is
concerned with liveCache in an SCM environment.

Design and Infrastructure Tests


IBM has implemented this API on IBM TotalStorage for ESS native and SAN Volume
Controller (thereby including all storage systems supported by SVC), using flash-copy
technology. The ISICC, together with the European Storage Competence Centre in Mainz,
carried out a joint PoC with SAP development in December of 2003, testing the technical
basis of this solution. This technical PoC was intended to insure that the design and
implementation fulfilled the following requirements:
* speed of recovery and return to production
* coverage of server outage
* coverage of data disk failures
* automated failover and fail back
* no performance impact to the master
* ease of management capability for DB administrators

This project was done using a standalone liveCache, as the first application to support it,
SCM 4.1, was not yet available. During these tests, an automated failover-recovery-reconnect
mechanism was implemented and liveCache was failed (simulated) and recovered some 5000 times
over the year-end holiday using a 10 GB liveCache. These tests verified the robustness of the
technology and the hot-standby performance. These tests were done for both SVC and ESS native
configurations.
The next step for verification of this solution is the successful use of this technology with
the application to verify integrity at database level.
The SAP/IBM test-box project being carried out in the spring of 2004 on behalf of Colgate
targeted an upgrade test to SCM version 4.1. This test environment provided the opportunity
for the application level test of the new hot-standby technology, and this environment provides
the basis for the PoC described in the following chapters of this document.

(Figure: the application connects via an IP alias to the primary; liveCache1 and liveCache2
alternate the primary and secondary roles. Failover LC1 to LC2 and LC2 to LC1 was performed
5000 times under simulated application load.)


Proof of Concept Environment


This PoC takes the “production system” from a consolidated APO/LC server to a multi-server
hot-standby configuration. The starting point is a file-system based liveCache which must be
migrated to raw devices as required by hot-standby. A related test-box documents covers in
detail the functional migration from APO 3.0/LC 7.4.2 through to SCM 4.1 with liveCache
7.5, which is the starting point of this document. Another of the related documents follows the
storage migration from direct connected ESS to the SAN Volume Controller (SVC)
virtualization level. This document focuses on the SVC solution. A further joint IBM/SAP
document “SAP liveCache 7.5 MaxDB 7.5 on IBM TotalStorage”, resulting from the earlier
technical POC, describes the implementation and configuration of the hot-standby in detail,
and these steps were followed and verified in this PoC project. This document can be found at
the following location:
http://www.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/WP100442

As this project was embedded in a customer test-box, it was able to take advantage of the test
scenarios built for the main upgrade project to simulate a production system and verify the
integrity of this solution at application level. A load scenario which concentrates on moving
data from the SCM business warehouse info-cubes into the liveCache, provides the storage
stress test as this involves massive write activity in liveCache. A further functional test which
uses the data in the liveCache for supply and network planning heuristics provides the
verification of the data consistency at application level. An additional built-in tool
(transaction /sapapo/om17) allows us to verify the data consistency between liveCache and
the SCM database directly, providing an additional data verification.

SCOPE of the Proof of Concept


Using a clone of an SCM 4.1 system from the Colgate customer test box, this POC will
demonstrate how this system was migrated to a hot-standby configuration and test the failover
and recovery reactions of the new configuration in a “production” environment.
This POC consists of the following stages:

1) Demonstrate a “heterogeneous” implementation spanning storage servers


Document the migration path and demonstrate the most complex and interesting solution. By
placing the data copies on separate storage servers, and using a solution such as PPRC for a
disaster recovery copy of the shared online log, the hot-standby can survive the loss of
either liveCache server, the loss of a disk in either copy, and the full loss of one of the storage
servers. Only in the case of loss of the storage server containing the active online log will
manual intervention be required to bring the disaster recovery log online. Although the PPRC
is outside of the scope of this POC, the use of two separate servers for the database copies is
demonstrated in a heterogeneous implementation.

2) Demonstrate liveCache integrated Hot-Standby in production with APO 4.1


Test the integration of the total solution with SCM 4.1 including the control and monitoring of
liveCache from the application level, and integration with HACMP.

3) Demonstrate liveCache HA Hot-Standby


Perform failover tests during a "production scenario" and demonstrate failover and
recovery with logical data consistency.


Migration to Hot Standby Configuration


(Figure: migration steps. Starting point: a consolidated APO/liveCache server with the liveCache on JFS2. The liveCache data is exported, a new liveCache 7.5 on raw devices is loaded from the export and connected to APO 4.1 via the service address (IP alias), and finally a hot-standby secondary is initiated over the SAN infrastructure (SVC), with an LC data copy on each server and a shared LC log.)

Migration Steps
Preparation:
Documented in detail in “SAP liveCache 7.5 and MaxDB 7.5 on IBM TotalStorage”
1. install and configure the hot-standby servers
2. prepare the raw-devices for hot-standby
3. install the liveCache executables
4. install HACMP for ip-alias takeover
5. test hot-standby and HACMP failover
Migration:
Documented in this POC
6. backup the content of the file system based liveCache
7. import liveCache data into hot-standby liveCache on raw devices
8. connect APO to new liveCache

APO should not be allowed to continue any activity to the source liveCache during the
migration as this may cause a loss in data synchronization. Ideally APO is offline until the
liveCache is switched to and activated on the hot-standby server. The steps which define the
downtime are the migration steps only.
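
A minimal pre-migration check along these lines can confirm that APO generates no further liveCache activity before the export is taken (a sketch only; instance names are placeholders and the stopsap usage depends on the local SAP setup):

# on the APO host: stop the SAP instance so that no new liveCache activity occurs
stopsap r3
# on the source liveCache host: confirm the instance state before starting the backup
dbmcli -d <source-LC-name> -u control,control db_state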


Preparation
Hot-standby spans storage server boundaries: one data copy on ESS, one on FastT.

(Figure: preparation. HACMP clusters the two liveCache hosts is02d6 and is02d8 ("hotLC1"), with a rotating disk for the LC archive logs; the SCM 4.1 (APO) system runs on is02d4. Under the SAN Volume Controller, one LC 7.5 data copy is placed on the ESS and one on the FAStT600, together with the shared "hot LC log"; a PPRC copy of the log provides an offline emergency recovery copy.)

As the hot-standby architecture provides a complete backup copy of the liveCache data, the
SVC implementation would optionally allow each copy to be placed on separate storage
servers to increase high availability. In this case the liveCache log would also need to be
duplicated for disaster recovery. There are various options for doing this and they will be
explored and tested soon, albeit not within the scope of this POC. The method used, however,
must ensure that both primary and secondary see the same view of the log. A solution such as
PPRC is recommended.

Preparing the SVC


Building the hot environment

Until now APO (Advanced Planner and Optimizer of SCM) and the LiveCache were running
on a single host. This is a recommended configuration on AIX for resource sharing. In order
to move from this configuration to a hot-standby environment, liveCache was moved to a
separate server, and the liveCache data files (devspaces) were changed from file-system based
to raw devices. It is possible for one instance of a hot-standby liveCache to inhabit the same
server as the APO system, but it would complicate the HACMP configuration which was not
the focus of this PoC.

SAN Infrastructure

We started with a fresh copy of an APO release 4.1 (a flash-copy clone). The APO and
LiveCache file systems were mounted on the designated APO host (is02d4). For the
LiveCache cluster, three SAN disks were defined, all under the SAN Volume Controller.
The rotating resource used for archive logs was not taken under SVC control for the POC, as
this functionality is driven by HACMP in either case and the archive-log disks can be under SVC control or not.

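For reference, vdisks of the kind used here could be created on the SVC along the following lines; this is a sketch only, with names and sizes taken from the SVC listings later in this chapter, not a record of the exact commands used in the PoC:

svctask mkvdisk -mdiskgrp LC_Cp4_hot_ess  -iogrp io_grp0 -vtype striped -size 32 -unit gb -name vd_LCcp4_508
svctask mkvdisk -mdiskgrp LC_Cp4_hot_ess  -iogrp io_grp0 -vtype striped -size 32 -unit gb -name vd_LCcp4_703
svctask mkvdisk -mdiskgrp LC_Cp4_hot_fstt -iogrp io_grp0 -vtype striped -size 32 -unit gb -name vd_LCcp4_508_1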

LiveCache Configuration for Hot-Standby

(Figure: the two liveCache hosts is02d8/enhot1 and is02d6/enhot2 under the SVC, each with its own sapdatavg data disk copy; saplogvg with the liveCache log is accessed concurrently by both hosts; sapbackupvg is an HACMP rotating resource for the archive logs, located on the ESS and not controlled by the SVC.)

The only difficulty encountered in building the above configuration for hot-standby under the
SVC was when it came to making the shared log available to both liveCache hosts. The SVC
GUI did not support the ability to export a volume to multiple hosts. It was necessary to do
this via the command line interface as shown below.
svc> svctask mkvdiskhostmap -host is02d6 vd_LCcp4_703
svc> svctask mkvdiskhostmap -host is02d8 -force vd_LCcp4_703

Since the volume was already mapped to the first hot standby host, the GUI didn’t allow us to
define a further mapping (see message response below). Be aware that when you create disks
with concurrent access, the concurrency synchronization and all access issues are left to the
application. In this case liveCache handles the concurrent access to the online log.
IBM_2145:admin>svctask mkvdiskhostmap -host is02d8 vd_LCcp4_703 -force
CMMVC5701E No object ID was specified.
IBM_2145:admin>svctask mkvdiskhostmap -host is02d8 vd_LCcp4_703
CMMVC6071E This action will result in the creation of multiple mappings.
Use the -force flag if you are sure that this is what you wish to do.
IBM_2145:admin>svctask mkvdiskhostmap -host is02d8 -force vd_LCcp4_703
Virtual Disk to Host map, id [0], successfully created

Comments
Preparing the SAN environment for the hot standby test was very easy. The only thing that
was not possible with the SAN volume controller GUI was to map a vdisk to a second host.
This was accomplished via the command line.

Heterogeneous Storage Layout under SVC


We defined three volumes on our storage boxes; two volumes on the ESS, one volume on the
FastT. The objective of this was to demonstrate a heterogeneous layout with a data copy on each
of two separate storage servers. Details are shown below in the edited output of "svcinfo
lsmdisk" on the SAN volume controller.

IBM_2145:admin>svcinfo lsmdisk
id name status mode
mdisk_grp_id mdisk_grp_name capacity ctrl_LUN_#
controller_name UID
2 mdisk_210543508 online managed
1 LC_Cp4_hot_ess 32.6GB 0000000005508
ESS 49424d2020202020323130352020202020202020202020203530383231303534
...
5 mdisk_210543703 online managed
1 LC_Cp4_hot_ess 32.6GB 0000000005703
ESS 49424d2020202020323130352020202020202020202020203730333231303534
...
19 mdisk_LCcp4 online managed
8 LC_Cp4_hot_fstt 70.0GB 00000000000006
fastt600 600a0b80001253b2000001ef40984c2100000000000000000000000000000000

The mdisks are part of two mdisk groups (LC_Cp4_hot_ess and LC_Cp4_hot_fstt), as you
see in the output above. The output is unfortunately wrapped around, but with effort it is
possible to see that two have the controller_name of ESS and one is fastT. In the mdisk
groups we defined three vdisks; see the output of "svcinfo lsmdiskgrp" below.
IBM_2145:admin>svcinfo lsmdiskgrp LC_Cp4_hot_ess
id 1
name LC_Cp4_hot_ess
status online
mdisk_count 2
vdisk_count 2
capacity 65.2GB
extent_size 16
free_capacity 0
IBM_2145:admin>svcinfo lsmdiskgrp LC_Cp4_hot_fstt
id 8
name LC_Cp4_hot_fstt
status online
mdisk_count 1
vdisk_count 1
capacity 70.0GB
extent_size 16
free_capacity 37.4GB

We'll find the vdisks in the output of "svcinfo lsvdisk" (again edited):
IBM_2145:admin>svcinfo lsvdisk
id name IO_group_id IO_group_name status
mdisk_grp_id mdisk_grp_name capacity type FC_id
FC_name RC_id RC_name
...
27 vd_LCcp4_508_1 0 io_grp0 online
8 LC_Cp4_hot_fstt 32.6GB striped 0
...
67 vd_LCcp4_508 0 io_grp0 online
1 LC_Cp4_hot_ess 32.6GB striped 0
...
68 vd_LCcp4_703 0 io_grp0 online
1 LC_Cp4_hot_ess 32.6GB striped

The vdisk vd_LCcp4_508_1 (id 27, using the mdisk defined on the FAStT) is exported to the
host is02d8:

IBM_2145:admin>svcinfo lsvdiskhostmap vd_LCcp4_508_1


id name SCSI_id host_id host_name
wwpn vdisk_UID
27 vd_LCcp4_508_1 2 0 is02d8
10000000C92D2A5B 600507680180801CA0000000000000C8

The vdisk vd_LCcp4_508 (id 67, defined on the ESS) is exported to the host is02d6:
IBM_2145:admin>svcinfo lsvdiskhostmap 67
id name SCSI_id host_id host_name
wwpn vdisk_UID
67 vd_LCcp4_508 0 1 is02d6
10000000C92BDE98 600507680180801CA0000000000000C2

The vdisk vd_LCcp4_703 (id 68, defined in the ESS) is exported to both hosts:
IBM_2145:admin>svcinfo lsvdiskhostmap 68
id name SCSI_id host_id host_name
wwpn vdisk_UID
68 vd_LCcp4_703 0 0 is02d8
10000000C92D2A5B 600507680180801CA0000000000000C3
68 vd_LCcp4_703 1 1 is02d6
10000000C92BDE98 600507680180801CA0000000000000C3

Preparing the Primary liveCache Instance


One of the hot-standby servers was prepared first and tested before the hot-standby
environment was initialized and activated. The primary was prepared, the data migrated, and
the connection to the SCM system configured and tested.

Preparing the Disks on the Primary

The following two disks are the ESS volumes which will be used for is02d6 data and the
shared log. The log volume is seen by both hot-standby liveCache servers (is02d6,is02d8).
Over the SVC, there are 8 paths to the actual disks which appear to the server. Each of these
disks will be placed in a separate volume group as there must be a local volume group for the
data, and a shared volume group for the log.

Data:
vpath2 (Avail ) 600507680180801CA0000000000000C2 = hdisk2 (Avail pv ) hdisk4 (Avail
pv ) hdisk6 (Avail pv ) hdisk8 (Avail pv ) hdisk18 (Avail pv ) hdisk20 (Avail pv ) hdisk22
(Avail pv ) hdisk24 (Avail pv )
Log:
vpath3 (Avail ) 600507680180801CA0000000000000C3 = hdisk3 (Avail ) hdisk5 (Avail )
hdisk7 (Avail ) hdisk9 (Avail ) hdisk19 (Avail ) hdisk21 (Avail ) hdisk23 (Avail ) hdisk25
(Avail )


Neither of the volume groups should be automatically activated. They must be varied on using
varyonvg with the -u option to avoid the disks being given reservation locks.
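
A sketch of how such volume groups can be created and activated on AIX, assuming 64 MB physical partitions and the vpath names listed above (verify the flags against your AIX/SDD level):

mkvg -y sapdatavg -s 64 -n vpath2     # local data VG; -n: do not vary on automatically at restart
mkvg -y saplogvg  -s 64 -n vpath3     # shared log VG
varyonvg -u sapdatavg                 # -u: activate without placing a SCSI reserve on the disks
varyonvg -u saplogvg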

The source database is over 32 GB in physical size but contains only 715 MB of actual data;
liveCache is currently 98% empty. The object of this test is application-level consistency;
large liveCache tests were done in the previous functionality stress tests, so the size here is
immaterial.


(NOTE: enhot2 is an alias name for is02d6. The reason for this alias is described later in the
HACMP setup)

A single raw logical volume of 8 GB will be used as the basis for the import, with a single 2 GB
log volume. The original calculation of the number of 64 MB partitions did not later agree
with what liveCache expected; the volumes were apparently a little too small. The error message
which accompanied this problem was not very definitive. The problem could be seen in the knldiag:
2004-05-11 20:08:32 59 11000 vdevsize '/dev/rLCDATA1', 1048576 failed
2004-05-11 20:08:32 59 ERR 16 IOMan Unknown Data volume 1: Could not read from volume
2004-05-11 20:08:32 59 ERR 8 Admin ERROR 'disk_not_accessible' CAUSED EMERGENCY SHUTDOWN

The devices were slightly enlarged and the restore retried successfully.
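
Enlarging a raw logical volume is a one-line operation; a sketch, assuming 64 MB partitions (the number of additional partitions is an example only):

extendlv LCDATA1 22       # add further 64 MB partitions to LCDATA1 before retrying the restore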

Logical Volume Definitions


root@enhot2:/sapdb/data/wrk/HOTSVC>lsvg -l sapdatavg
sapdatavg:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
LCDATA1 raw 150 150 1 open/syncd N/A
root@enhot2:/sapdb/data/wrk/HOTSVC>lsvg -l saplogvg
saplogvg:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
LCLOG1 raw 42 42 1 open/syncd N/A
root@enhot2:/sapdb/data/wrk/HOTSVC>
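
For reference, raw logical volumes like those listed above can be created with mklv; a sketch matching the LP counts shown (PP size of 64 MB assumed):

mklv -y LCDATA1 -t raw sapdatavg 150    # 150 x 64 MB partitions for the liveCache data devspace
mklv -y LCLOG1  -t raw saplogvg  42     # 42 x 64 MB partitions for the liveCache log volume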

Raw Device Owner Must be liveCache


Change the owner of the raw devices to the sdb user so that the liveCache can access them.
root@enhot2:/dev>ls rLC*
rLCDATA1 rLCLOG1
root@enhot2:/dev>chown sdb.sdba /dev/rLC*

Software Installation and Import of LC Data


Install liveCache software for a new installation on primary LC server.
Install the libHSS library support for SVC on both servers. See the referenced whitepaper for
the exact steps for installation.

Prepare for Export Import


Prepare for a data export from the production liveCache. In this example, an NFS filesystem
was used as the export/import device. This was prepared, exported from the production
liveCache and mounted on the new hot-standby primary in advance.
The backup medium, used for the source LC backup file, was exported via NFS from is02d4 to is02d6
in preparation for the backup/restore transfer of the liveCache data content. An alternative
method would have been to use a named pipe for the export/import on the target system.
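
A sketch of the NFS setup, assuming the backup directory /sapdb/backup on is02d4 (the /etc/exports options shown are illustrative):

# on is02d4 (source): entry in /etc/exports, then export it
/sapdb/backup -ro,access=is02d6
exportfs -a
# on is02d6 (new hot-standby primary): mount the exported file system
mount is02d4:/sapdb/backup /mnt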

NFS mount from is02d4 to is02d6


Filesystem 1024-blocks Free %Used Iused %Iused Mounted on
/dev/hd4 262144 48912 82% 2939 3% /
/dev/hd2 2490368 89720 97% 47124 8% /usr
/dev/hd9var 131072 75640 43% 720 3% /var
/dev/hd3 262144 111944 58% 326 1% /tmp
/dev/hd1 131072 100208 24% 132 1% /home
/proc - - - - - /proc
/dev/hd10opt 131072 114124 13% 651 2% /opt
/dev/install 5242880 467108 92% 1925 1% /install
/dev/sapdb75 983040 952144 4% 16 1% /hotsapdb
is02d4:/sapdb/backup 15728640 14366436 9% 292 1% /mnt


On the new hot-standby primary, build a new instance of liveCache using raw devices to reload
the original liveCache data. Using the dbmgui, create a blank database instance.

LiveCache Migration
1) Shut down SCM and, from the original liveCache, create a full database backup to the NFS
exported file-system. This full backup of the liveCache on the file-system, taken with the dbmgui
tool, is used for the restore into the new instance. (A dbmcli sketch of the backup and restore
follows the steps below.)

Label: DAT_00002
Date: 11.05.04 17:03:56
Medium: full
Volumes: 1
Size: 715328 KB | 89416 Pages
Log Page: 1140998
Last Savepoint: 11.05.04 17:03:55
Is consistent: Yes

2) On the new hot-standby primary, install the liveCache software and create an instance for
recovery.


3) Follow the path to restore the liveCache from the backup taken from the original.

4) Define the backup device, naming the file in the NFS mounted file-system in which the
backup is located.

5) Select the newly created medium and verify, by reading the label, that it is correct.
Restore the database content from this file. The dbmgui shows the result of the restore of the
data to the new primary instance.
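
The same backup and restore can also be driven from dbmcli rather than the dbmgui. The following is a minimal sketch, not the exact procedure used in the PoC; the source instance name, the medium name FullNFS and the file name are placeholders:

# on the source liveCache host: define a file medium and take a complete data backup
dbmcli -d <source-LC> -u control,control
> util_connect control,control
> medium_put FullNFS /sapdb/backup/LC_full.data FILE DATA
> backup_start FullNFS DATA

# on the new hot-standby primary: restore the backup into the prepared instance
dbmcli -d HOTSVC -u control,control
> db_admin
> medium_put FullNFS /mnt/LC_full.data FILE DATA
> recover_start FullNFS DATA
> db_online

(For an instance that has not yet been activated, db_activate RECOVER <medium> DATA can be used instead of recover_start; check the MaxDB documentation for the installed version.)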


Attaching the APO System


Install the liveCache client on the APO system if it is not already installed. In this case we
removed the old version of liveCache once its data had been imported into the raw devices on the
hot-standby server, and the client software was removed as well, as it was part of the same
volume group.

The following client software was installed with profile 0, runtime for SAP AS.
Installation of MaxDB Software
*******************************
starting installation Th, May 13, 2004 at 20:36:39
operating system: AIX PowerPC 5.2.0.0
callers working directory: /hotsapdb/maxdb-server-aix5-64bit-ppc-7_5_0_10
installer directory: /hotsapdb/maxdb-server-aix5-64bit-ppc-7_5_0_10
archive directory: /hotsapdb/maxdb-server-aix5-64bit-ppc-7_5_0_10

existing profiles:
0: Runtime For SAP AS
1: DB Analyzer
2: JDBC
3: Server
4: Loader
5: ODBC
6: all
7: none
please enter profile id:

Actions for Users


The <sid>adm was added to group “sdba” to allow access to sdb directories belonging to the
liveCache client. Under /home/<LC-sid>adm, remove any .XUSER file.
LC10
The connection addresses and instance name must be changed in LC10. The connection host
must be the ip-alias (service name) for the hot-standby; in this case this alias is HOTLC.
This is the service address that HACMP manages and activates on the primary server. As the
HACMP environment was not yet available, the liveCache hostname was used rather than the
service address to test the new primary; this was replaced as soon as the HACMP environment
was activated and the service IP address was available. The application servers should be
restarted after a change in the access information.
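
Once the service address is active, the connection can be checked from the APO host against the alias rather than a physical hostname; a quick check using the dbmcli form shown elsewhere in this document:

dbmcli -n HOTLC -d HOTSVC -u control,control db_state    # should report the state of the current hot-standby master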


Installation of the Standby


For the next chapters, use the reference document "MaxDB 7.5 on IBM TotalStorage
Enterprise Storage Server Integration of the Hot-Standby System Solution” Mainz 2004.
This document describes how the hot standby cluster nodes are to be installed and activated.
A copy of the integration profile, the RTEHSS_config and the liveCache parameters can be
found in the appendix of this document.

Software Installation on the standby


Install the liveCache software for a new installation on secondary LC server via SDBINST.
Choose the ID of the profile “Server” or “all”.

Integrating SCM 4.1 with Hot-Standby liveCache


LCINIT and Application Monitoring
It is possible to monitor the liveCache application as well as the liveCache server and trigger a
failover should either of these components come to grief. Normally, if an application fails and
the server is still active and available, it makes sense to restart the application on the original
server. In the case of liveCache, this is not desirable as the failed application has lost its
memory cache and would need to rebuild it prior to resuming productive work, while the
standby has an intact memory cache available. Therefore whether server or application
failure triggers the HA action, the object is a failover to the standby.

When liveCache application monitoring is active, and this is an optional but recommended
feature, it will react to the termination of the liveCache database and trigger a failover. This
reaction will also be triggered if the database is shutdown via LC10 (lcinit) unless
precautions are taken. In the case of a clean shutdown of the liveCache from APO, the
application monitoring under HACMP is turned off. It is restarted when the liveCache is
restarted from APO. APO starts and stops liveCache by means of the lcinit script so this
action will take place when the lcinit is invoked manually as well. In order to activate this
action, a symbolic link to the file /usr/es/sbin/cluster/local/lccluster must be placed in
/sapdb/<LCNAME>/db/sap, or the lccluster script must be copied into that directory.
To create the symbolic link use:
ln -s /usr/es/sbin/cluster/local/lccluster /sapdb/<LCNAME>/db/sap/lccluster

This must be done on all the liveCache hosts.

SETTING PERMISSIONS and Preparing Users on Both Servers


In order for the <LC-sid>adm user to be able to activate HACMP scripts, the permissions of the
following utilities must be set to 4755:
/usr/es/sbin/cluster/events/utils/cl_echo
/usr/es/sbin/cluster/events/utils/clRMupdate

Example:
root@enhot1:/usr/es/sbin/cluster/events/utils>chmod 4755 clRMupdate cl_echo


On both hot standby servers, the users, sdb and <LC-SID>adm, must be members of the
system group or they will not have the authorization to run the HACMP utilities which
support the automatic starting, stopping, and initializing of the hot-standby servers.
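
A sketch of adding the users to the system group on AIX; note that chuser replaces the complete group list, so the existing groups must be included (the group lists below are examples only):

lsuser -a pgrp groups sdb <LC-SID>adm        # display the current group membership first
chuser groups=sdba,system sdb                # example: keep the existing groups and add system
chuser groups=sapsys,sdba,system <LC-SID>adm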

Setup of Hot-Standby under HACMP


For the next chapters, use the reference document "MaxDB 7.5 on IBM TotalStorage
Enterprise Storage Server Integration of the Hot-Standby System Solution” Mainz 2004. This
document describes how HACMP was originally configured for this system.
Below is an overview of the resources used for this test setup:

NODE1: is02d6 => enhot2

data volume VG: sapdatavg
log volume VG: saplogvg

data LV: LCDATA1 on vpath2 / SVC vdiskID: 67
log LV: LCLOG1 on vpath3 / SVC vdiskID: 68

NODE2: is02d8 => enhot1

data volume VG: sapdatavg
log volume VG: saplogvg

data LV: LCDATA1 on vpath4 / SVC vdiskID: 27
log LV: LCLOG1 on vpath2 / SVC vdiskID: 68

Extract from the RTEHSS Configuration File (Valid for Both Nodes)

/usr/opt/ibm/ibmsap/HOTSVC/RTEHSS_config.txt
Variables:
MlCLogVdiskID (set to 68)
MlCDataVdiskID (set to 67)

SlCLogVdiskID (set to 68)


SlCDataVdiskID (set to 27)

HSS_NODE_001 (set to enhot2 )


HSS_NODE_002 (set to enhot1)

(Delete the values for any variable which are not being used)

• Follow the instructions to initialize the standby to activate the first data copy.
• On the SVC delete any outstanding flashcopy tasks referring to the hot-standby disks.
• On NODE2 do the following:
- delete all disks and run cfgmgr to get the correct PVIDs following the flash-copy
initialization
- importvg -n -R -y sapdatavg hdisk3


- switch the imported VG from hdisk to vpath:
hd2vp sapdatavg

- remove auto-varyon of the volume group:
smitty vg => set autovaryonvg = off for sapdatavg

- activate the volume group:
varyonvg -u sapdatavg

- alter the owner of the raw devices /dev/rLCDATA1 and /dev/rLCLOG1 to 'sdb'

The following error was received when importing the raw devices on NODE2. This happens
occasionally when importing raw devices. The result is that the LV cannot be expanded, but
that is of no consequence here as a formatted data device cannot be extended under liveCache
anyway.

root@enhot1:/>importvg -n -R -y sapdatavg hdisk3


0516-1281 synclvodm: Warning, lv control block of LCDATA1
has been over written.
0516-622 synclvodm: Warning, cannot write lv control block data.
sapdatavg
root@enhot1:/>

Use IP-ALIAS as LiveCache Host Name

Alter the liveCache server name in LC10 to use the HACMP rotating IP alias, which is used as
the service name for a liveCache hot-standby cluster. Stop and restart the application server
to synchronize this information.


Define the primary instance as first Hot Standby master

Enable the instance on the primary server as a hot standby instance. This instance will be the
first master.
dbmcli -n <master> -u control,control -d HOTSVC
> db_offline
> param_directput ALLOW_MULTIPLE_SERVERTASKS_UKTS YES
> param_checkall
> hss_enable lib=libHSSibm2145 node=HOTLC
> db_online

The parameter ALLOW_MULTIPLE_SERVERTASKS_UKTS=YES allows better scalability in terms of CPU
usage on the standby instance. This ensures that the standby instance always stays only a few
seconds behind the master.

Add the standby instance on the secondary server

Register the standby instance on the secondary server and activate it. Connect via dbmcli to
the primary server.
dbmcli -n <master> -u control,control -d HOTSVC
> hss_addstandby <standby> login=sapdb,passwd
> db_standby <standby>

The command db_standby copies all parameters from the master to the standby, then starts and
initialises the standby instance. Initialisation means it starts the flashcopy tasks which copy
an I/O-consistent image of the master data area to the standby. The standby instance does not
wait until all blocks are copied; it can immediately work with its data image. At the end, the
command db_standby sets the instance to mode STANDBY.

Reinitializing the Standby Instance

Under certain circumstances, it may be necessary to reinitialize the standby instance by hand.
The situation which led to this problem during the tests has been corrected, but the actions
have been captured here for reference.

Example for the Initialization of the Standby:

dbmcli -n <master> -u control,control -d HOTSVC


> hss_execute <standby> db_offline
> hss_execute <standby> db_admin
> hss_execute <standby> util_execute init standby
> db_standby <standby>


Issues and Recovery


Authorization Problem: User Mismatch
The new liveCache instance on the hot-standby primary was installed using a different default
for the database administrator user (dbm) than is typically the case. We will see that this caused
a number of problems later on. These problems and the solution have been documented here
in case others meet with a similar problem.

A problem was experienced in the test-box configuration, apparently related to the fact that we
had done the functional tests on this same server, which was now being installed with a new
liveCache installation using the same name as the original. We did this purposely in order to
fit into the environment which had already been configured for HA. Unfortunately, the new
installation took the default DBM-user: dbm,dbm and the original had the DBM-user:
control,control. Some of the errors encountered as a result are documented below.

In the LC overview transaction LC10, the connection light is green indicating that LCA is working,
but an attempt to access the new liveCache via the sql connection fails:
CX_SY_NATIVE_SQL_ERROR. SAP kernel developer traces show the following error:
3004-330 Your encrypted password is invalid.

Reason for and Resolution of Authorization Problem


This problem was the result of a mismatch between the dbm users defined on the hot-standby
servers, and the one which was originally resident on the APO server. On the new liveCache
primary, is02d6, the dbm user for HOTSVC was dbm. On the APO server it was control. This
information is stored in several places in the liveCache file system and at OS level, such that
even dropping the original instance did not fully purge this data.

Actions:
The dbm user on the new liveCache primary was renamed to control. The following activities
were carried out on the new hot-standby primary (is02d6).

1. Place liveCache in “offline” mode

2. Write the new dbm user to the paramfile (file <indepdatapath>/config/<dbname>)

Format:
dbmcli -d <dbname> -u <olduser>,<oldpassword> param_directput CONTROLUSERID <newuser>

3. Write the new dbm password to the paramfile

Format:
dbmcli -d <dbname> -u <olduser>,<oldpassword> param_directput CONTROLPASSWORD <newpassword>

Example on is02d6:
dbmcli -d HOTSVC -u dbm,dbm param_directput CONTROLUSERID control
dbmcli -d HOTSVC -u dbm,dbm param_directput CONTROLPASSWORD control

4. Delete/move the user profile containers (upc files)


<indepdatapath>/config/<dbname>.upc
<rundirectory>/dbm.upc

On is02d6:
rm /sapdb/data/config/HOTSVC.upc
rm /sapdb/data/wrk/HOTSVC/dbm.upc

5. Restore the user profile containers with a fallback authorization against the paramfile
Format: dbmcli -s -d <dbname> -u <newuser>,<newpassword> db_state

Example on is02d6:
dbmcli -s -d HOTSVC -u control,control db_state

6. Bring the liveCache into mode “online”

7. Tell the DBM the SYSDBA user and the DOMAIN password via an "upgrade system tables" (load_systab)
Format:
dbmcli -d <dbname> -u <newuser>,<newpassword> load_systab -u <sysdba>,<pwd> -ud <domainpwd>

Example on is02d6:
dbmcli -d HOTSVC -u control,control load_systab -u superdba,colduser -ud domain

Note: this is not a normal administrative operation. Normally it is not possible to rename the dbm
user.


Failover Tests
Application Level Hot Standby Verification Tests In Detail
Test Scenario in Detail: In order to test the smooth fail-over from one LC to another LC as
part of the hot-standby functionality:

Test Execution :
1. Create a job consisting of a single step of SNP Heuristic run
Job GSN_CBSAP_DAILY_1_FAILOVER created in system is02d4 (SCM).

2. Run the job and download the Application log for the above job step run on Day1
( with liveCache 1 connected to the system )
Job run on 18.05.2004 ( Start time : 18.05.2004 19:47:09 )

3. Application log reviewed and downloaded for future reference


Type          Message text                                          Detail
Information   Input parameters                                      Detail exists
Information   Customizing Settings                                  Detail exists
Information   Active BAdI Implementations                           Detail exists
Information   Start of application: 19.05.2004 01:47:12             No detail exists
Information   End of application: 19.05.2004 01:51:05               No detail exists
Information   Runtime of application: 0:03:53 h                     No detail exists
Information   Total number of location products: 52                 No detail exists
Information   Total number of demands: 37                           No detail exists
Information   Total number of planned orders: 0                     No detail exists
Information   Total number of stock transfers: 37                   No detail exists
Information   Total number of product substitution orders: 0        No detail exists
Information   Stock transfers                                       Detail exists

4. Rerun the job on Day2 (while job is running, perform the fail-over wherein connection
to liveCache1 is broken and liveCache 2 is connected).
After the technical setup of the system for the hot-standby fail-over test, the job
GSN_CBSAP_DAILY_1_FAILOVER was started (by user BCKCTB) on 19.05.2004 at
14:45:02 hrs. It was allowed to run for over 100 seconds before the fail-over was
initiated. As expected, the job failed with the error ABAP/4 processor:
DBIF_DSQL2_CONNECTERR.

The job log details are:

5. After fail-over, the liveCache assignment was confirmed to have been moved to the
hot-standby. See the basis test results for the details of this.

6. The job was restarted on 19.05.2004 at 14:55:37 and completed successfully.


7. The application log was analyzed and a comparison of the planning output of both
successful cycles (i.e. 18.05.2004 and 19.05.2004) was carried out; it is enclosed in the
embedded MS-Excel file. Extracts from this data are added below.

As expected, certain orders/dates have changed because the jobs were run on two
different dates.

Conclusion: Following the failover which occurred in the middle of an SNP planning job, the
liveCache data was reset to a consistent state such that the subsequent rerun of the planning
job completed successfully and the output data was correct.

Note: the failover was caused by a simulated liveCache database crash. This was done by
terminating the liveCache process. HACMP is active using application monitoring and
registers the loss of the liveCache application. As the memory cache of the failed process is
lost, there is no purpose in restarting the liveCache on the same server. Instead, the secondary
is switched to primary and the failed server is subsequently recovered and re-established as
the new secondary.


Technical View of the Failover Test in Detail


(Figure: failover between enhot1 (is02d8) and enhot2 (is02d6) via the IP alias hotlc. Each node
has its own sapdatavg; saplogvg is shared; sapbackupvg is the HACMP rotating resource.)

This diagram depicts the actions which took place in the failover test on the hot-standby basis.
At the beginning of the test, enhot2 is the liveCache server. A failover will be triggered by
killing the liveCache kernel on enhot2. HACMP is performing application level monitoring and
will see the application fail. HACMP then triggers the actions which bring about the hot-standby
failover both at the OS level and at the liveCache level. At the OS level, the IP alias (hotlc),
which is used as the service connection to liveCache, moves from the primary to the standby
during a failover. The volume group sapbackupvg, a rotating resource which is owned by the
current primary to hold archive logs etc., moves when the primary moves. HACMP does this
directly. HACMP then triggers the liveCache to perform the failover. LiveCache performs the
steps required to bring the standby online, which is basically a change of the log from
read-only to read-write, and allows application connections. If possible, HACMP and liveCache
attempt to bring the failed liveCache instance back online as soon as possible. In the case of
this test, the server was still available and there was no obstruction to a liveCache instance
being immediately restarted. The restarted liveCache, as it does not at this point in time have
an intact data cache, cannot return as the master, but re-enters the cluster as a secondary
hot-standby.

1) Starting Situation
Below the kernel processes of liveCache are displayed on both the cluster nodes.
is02d6/enhot2 is master (is02d6 10.17.70.188):
root@enhot2:/>ps -ef | grep kernel
root 34794 27186 0 14:22:47 pts/2 0:00 grep kernel
sdb 36658 1 0 18:40:25 - 0:02 /sapdb/HOTSVC/db/pgm/kernel HOTSVC
sdb 38052 36658 0 18:40:27 - 2:14 /sapdb/HOTSVC/db/pgm/kernel HOTSVC

is02d8/enhot1 is standby (is02d8 10.17.70.190):
root@enhot1:/>ps -ef | grep kernel
sdb 24626 30002 0 18:44:32 - 0:15 /sapdb/HOTSVC/db/pgm/kernel HOTSVC
sdb 30002 1 0 18:44:29 - 0:02 /sapdb/HOTSVC/db/pgm/kernel HOTSVC
root 33220 29688 0 14:21:07 pts/4 0:00 grep kernel

Looking at the network interfaces, we see that the rotating ip-alias used as the service address
for liveCache is visible on enhot2. This ip-alias, "hotlc", is controlled by HACMP and located
on the node running the liveCache primary instance.

root@enhot2:/>netstat -i
Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll
en1 1500 link#2 0.2.55.6a.31.7 15986625 0 1263577 0 0
en1 1500 10.17.64 is02d6 15986625 0 1263577 0 0
en2 1500 link#3 0.2.55.9a.95.49 5999748 0 1093283 4 0
en2 1500 10.17 is02b6 5999748 0 1093283 4 0
en2 1500 192.168.0 enhot2 5999748 0 1093283 4 0
en2 1500 192.168.0 hotlc 5999748 0 1093283 4 0
lo0 16896 link#1 1401770 0 1399289 0 0
lo0 16896 127 loopback 1401770 0 1399289 0 0
lo0 16896 ::1 1401770 0 1399289 0 0
root@enhot2:/>

The dbmgui is connected to the hot-standby master from the user network. It attaches to the
address of is02d6. To really be able to watch the hot-standby in action from the dbmgui, it
must have access to the service network where the ip-alias “hotlc” is known. In the test
environment, this was a private service network and not accessible via the normal SAP
network. Therefore it was only possible to connect directly to the individual LC nodes.

2) Failure of the liveCache Primary


The liveCache primary is killed by sending a signal 9 to the main kernel process. This is done
at 14:30:55.
root@enhot2:/>ps -ef | grep kernel
sdb 36658 1 0 18:40:25 - 0:02 /sapdb/HOTSVC/db/pgm/kernel HOTSVC
sdb 38052 36658 0 18:40:27 - 2:15 /sapdb/HOTSVC/db/pgm/kernel HOTSVC
root 26168 32594 1 14:31:00 pts/5 0:00 grep kernel

root@enhot2:/>kill -9 38052
root@enhot2:/>date
Wed May 19 14:30:55 DFT 2004

The liveCache instance is down, no kernel processes are active now on enhot2.
root@enhot2:/>ps -ef | grep kernel
root 26168 32594 1 14:31:00 pts/5 0:00 grep kernel

3) HACMP Reacts (Primary)


root@enhot2:/usr/es/sbin/cluster/local>netstat -i
Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll
en1 1500 link#2 0.2.55.6a.31.7 16019990 0 1266227 0 0
en1 1500 10.17.64 is02d6 16019990 0 1266227 0 0
en2 1500 link#3 0.2.55.9a.95.49 6016716 0 1098320 4 0
en2 1500 10.17 is02b6 6016716 0 1098320 4 0
en2 1500 192.168.0 enhot2 6016716 0 1098320 4 0
lo0 16896 link#1 1406734 0 1404248 0 0
lo0 16896 127 loopback 1406734 0 1404248 0 0
lo0 16896 ::1 1406734 0 1404248 0 0
root@enhot2:/usr/es/sbin/cluster/local>

root@enhot1:/usr/es/sbin/cluster/local>netstat -i
Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll
en0 1500 link#2 0.6.29.6c.c1.42 11285309 0 932067 0 0
en0 1500 10.17.64 is02d8 11285309 0 932067 0 0
en2 1500 link#3 0.2.55.9a.3b.bd 7446184 0 1627382 3 0
en2 1500 10.17 is02b8 7446184 0 1627382 3 0
en2 1500 192.168.0 enhot1 7446184 0 1627382 3 0
en2 1500 192.168.0 hotlc 7446184 0 1627382 3 0
lo0 16896 link#1 1748990 0 1749857 0 0
lo0 16896 127 loopback 1748990 0 1749857 0 0
lo0 16896 ::1 1748990 0 1749857 0 0
root@enhot1:/usr/es/sbin/cluster/local>

The following logs are excerpts produced by the following script:


#!/bin/ksh
. /usr/es/sbin/cluster/local/hacmpr3.profile
while [ 1 ]
do
echo `date` >> /tmp/hslog
/usr/es/sbin/cluster/utilities/clRGinfo | tee -a /tmp/hslog
/sapdb/programs/bin/dbmcli -d $LC_name -u control,control db_state | tee -a /tmp/hslog
cat /usr/es/sbin/cluster/etc/clpol | tee -a /tmp/hslog
sleep 5
done
This script periodically checks the status of the liveCache instance on the local host and
snapshots the status of the HACMP resources; each entry begins with a timestamp. The following is
extracted from the log on the primary, enhot2.

Wed May 19 14:30:47 DFT 2004


-----------------------------------------------------------------------------
Group Name Type State Location
-----------------------------------------------------------------------------
RG_LCmaster rotating OFFLINE is02d8 note: status of the revolving archive disk. It is owned by
ONLINE is02d6 the LC primary. It is online at is02d6 alias enhot2.

RG_LChot1 cascading ONLINE is02d8 status of the cluster servers, both are online.

RG_LChot2 cascading ONLINE is02d6

OK
State
ONLINE status of liveCache.. online
Wed May 19 14:30:55 DFT 2004
-----------------------------------------------------------------------------
Group Name Type State Location
-----------------------------------------------------------------------------
RG_LCmaster rotating OFFLINE is02d8
ONLINE is02d6

RG_LChot1 cascading ONLINE is02d8

RG_LChot2 cascading ONLINE is02d6

OK
State
OFFLINE LiveCache has failed!!
Wed May 19 14:31:02 DFT 2004
-----------------------------------------------------------------------------
Group Name Type State Location
-----------------------------------------------------------------------------
RG_LCmaster rotating OFFLINE is02d8 HACMP is active moving the rotating archive
RELEASING is02d6 resource and IP-Alias to the standby.

RG_LChot1 cascading ONLINE is02d8

RG_LChot2 cascading ONLINE is02d6


OK
State
OFFLINE
:
:
:
Wed May 19 14:32:05 DFT 2004
-----------------------------------------------------------------------------
Group Name Type State Location
-----------------------------------------------------------------------------
RG_LCmaster rotating ONLINE is02d8 Archive disk and ip-alias moved to enhot1
OFFLINE is02d6

RG_LChot1 cascading ONLINE is02d8

RG_LChot2 cascading ONLINE is02d6

OK
State
ADMIN local liveCache being reactivated
:
:
Wed May 19 14:32:13 DFT 2004
-----------------------------------------------------------------------------
Group Name Type State Location
-----------------------------------------------------------------------------
RG_LCmaster rotating ONLINE is02d8
OFFLINE is02d6

RG_LChot1 cascading ONLINE is02d8

RG_LChot2 cascading ONLINE is02d6

OK
State
STANDBY liveCache on enhot2 is now standby.
Wed May 19 14:32:21 DFT 2004
-----------------------------------------------------------------------------
Group Name Type State Location

-----------------------------------------------------------------------------

3) HACMP Reacts (standby)


The following logs come from enhot1, the secondary (standby). They depict the HACMP activities
as seen by the hot-standby secondary.

Wed May 19 14:30:48 DFT 2004


-----------------------------------------------------------------------------
Group Name Type State Location
-----------------------------------------------------------------------------
RG_LCmaster rotating OFFLINE is02d8 enhot2 holds the rotating resources; enhot2 is
ONLINE is02d6 the primary

RG_LChot1 cascading ONLINE is02d8 Both cluster members are online

RG_LChot2 cascading ONLINE is02d6

OK
State
STANDBY the local instance (enhot1) is the standby
Wed May 19 14:30:56 DFT 2004
-----------------------------------------------------------------------------
Group Name Type State Location
-----------------------------------------------------------------------------
RG_LCmaster rotating OFFLINE is02d8 HACMP has recognized the failure of the
TEMPORARY ERROR is02d6 primary at application level.. the cluster
RG_LChot1 cascading ONLINE is02d8 members (the servers) remain online

RG_LChot2 cascading ONLINE is02d6

OK
State
STANDBY
Wed May 19 14:31:04 DFT 2004
-----------------------------------------------------------------------------
Group Name Type State Location
-----------------------------------------------------------------------------
RG_LCmaster rotating OFFLINE is02d8 HACMP is moving the resources to the standby
RELEASING is02d6

RG_LChot1 cascading ONLINE is02d8

RG_LChot2 cascading ONLINE is02d6

OK
State
STANDBY
Wed May 19 14:31:11 DFT 2004
Wed May 19 14:31:34 DFT 2004
-----------------------------------------------------------------------------
Group Name Type State Location
-----------------------------------------------------------------------------
RG_LCmaster rotating ACQUIRING is02d8
OFFLINE is02d6

RG_LChot1 cascading ONLINE is02d8

RG_LChot2 cascading ONLINE is02d6

OK
State
STANDBY
Wed May 19 14:31:59 DFT 2004
-----------------------------------------------------------------------------
Group Name Type State Location
-----------------------------------------------------------------------------
RG_LCmaster rotating ONLINE is02d8 Enhot1 now has the ip-alias and the
OFFLINE is02d6 archive resource

RG_LChot1 cascading ONLINE is02d8

RG_LChot2 cascading ONLINE is02d6

OK
State
ONLINE enhot1 is now online as the primary

4) Restarting the application


The following documents the reaction of the application to a failover.

NOTE: Unfortunately in the following logs, the time of the APO server was not synchronized
with the time on the liveCache servers, so there is a deviation of 16 minutes and 10 seconds.
The equivalent timestamp on the liveCache servers is denoted in brackets.

Error entry as seen in SM21. [14:30:32]

Short dump resulting from the liveCache failure, as seen in transaction ST22.
[14:30:42]

Extracts from the SCM Developer Traces


/sapmnt/<SID>/sap/<instance>/work/dev_<work process>

The following developer trace indicated when the APO work process discovered the error.
Once a connection error is found, several retries are attempted and then the work-process is

left in reconnect status. The 2nd extract shows the successful reconnection triggered by the
restart of the batch job at 14:55:37 [14:39:27].

The SNP application encounters an error on the liveCache connection. [14:30:32]


C Wed May 19 14:46:42 2004 [14:30:32]
C *** ERROR => SQL EXECUTE on connection DB_001, rc=-709 (CONNECT:
(database not running)) [dbdsada.c 1758]
:
:
Several Reconnects are tried [14:30:37]
B Wed May 19 14:46:47 2004
B Connect to LCA as SAPR3 with HOTLC-HOTSVC
C INFO : SQLOPT (set by environment) =
C INFO : SQLOPT= -I 0 -t 0 -F SAPDB.901306.pct
C Precompiler Runtime : C-PreComp 7.4.3 Build 030-123-056-274
C Precompiler runtime is SAP DB 7.4.3.030
C Try to connect as SAPR3/<pwd>@HOTLC-HOTSVC on connection 1 ...
C *** ERROR => CONNECT failed : sqlcode=-709 (CONNECT: (database not
running)) [dbadautl.c 311]
[14:30:42]
B Wed May 19 14:46:52 2004
B Connect to LCA as SAPR3 with HOTLC-HOTSVC
C INFO : SQLOPT (set by environment) =
C INFO : SQLOPT= -I 0 -t 0 -F SAPDB.901306.pct
C Precompiler Runtime : C-PreComp 7.4.3 Build 030-123-056-274
C Precompiler runtime is SAP DB 7.4.3.030
C Try to connect as SAPR3/<pwd>@HOTLC-HOTSVC on connection 1 ...
C *** ERROR => CONNECT failed : sqlcode=-709 (CONNECT: (database not
running)) [dbadautl.c 311]
B Reconnect failed for connection:
B 1: name = LCA, con_id = 000000165 state = DISCONNECTED, perm = NO ,
reco = YES, con_max = 255, con_opt = 255, occ = NO
B ***LOG BYY=> work process left reconnect status [dblink#1 @ 731]
[dblink 0731 ]
:
:
:
Reconnection resulting from restart of application is Successful! [14:39:27]
B Wed May 19 14:55:37 2004
B Connect to LCA as SAPR3 with HOTLC-HOTSVC
C INFO : SQLOPT (set by environment) =
C INFO : SQLOPT= -I 0 -t 0 -F SAPDB.901306.pct
C Precompiler Runtime : C-PreComp 7.4.3 Build 030-123-056-274
C Precompiler runtime is SAP DB 7.4.3.030
C Try to connect as SAPR3/<pwd>@HOTLC-HOTSVC on connection 1 ...
C Attach to SAP DB : Kernel 7.5.0 Build 012-123-071-164
C Database release is SAP DB 7.5.0.012
C INFO : Database 'HOTSVC' instance is running on 'HOTLC'
C INFO : LVC DEFAULTCODE = ASCII
C 01: DB_001 HOTLC-HOTSVC, conn=1, since=20040519145537
C ABAP= /SAPAPO/SAPLOM_CORE (3992)
C Now I'm connected to SAP DB
B Connection 1 opened

SM37 Runtimes for SNP Jobs


The first job terminated at 14:46:53 [14:30:43] as a result of the liveCache connection error.

The job was manually restarted at 14:55:37 [14:39:12] and the connection to liveCache was
successfully re-established as a result of the first access attempt at 14:55:37 [14:39:27]; see the
trace data above.

Failover as seen from the Standby Server


Extractions from the Log of enhot1: Original Standby
This extraction from the knldiag log of liveCache shows the secondary taking over as
master (primary) after the failure of the liveCache on enhot2.

2004-05-19 14:31:52 61 53070 SAVPOINT B20PREPARE_SVP: 222


2004-05-19 14:31:52 61 8 Pager SVP(3) Start Write Data
2004-05-19 14:31:52 61 9 Pager SVP(3) Stop Data IO, Pages: 3 IO:
3
2004-05-19 14:31:52 61 10 Pager SVP(3) Start Write Converter
2004-05-19 14:31:52 61 11 Pager SVP(3) Stop Converter IO, Pages:
62 IO: 62
2004-05-19 14:31:52 61 53071 SAVPOINT B20SVP_COMPLETED: 222
2004-05-19 14:31:52 61 53000 OBJECT Restarted Garbage coll: 10
2004-05-19 14:31:52 59 34 Admin Kernel state: 'REDO LOG'
finished
2004-05-19 14:31:52 80 13952 RTEHSS Standby role set: MASTER
2004-05-19 14:31:52 80 54003 dynpool DYNP_A42_PARSEID : 600
2004-05-19 14:31:52 80 54000 OBJECT
/sapdb/HOTSVC/db/lib/lib64/liboms
2004-05-19 14:31:52 80 201 RTE Kernel state changed from
STANDBY to ONLINE
2004-05-19 14:31:52 80 202 RTE Used physical memory 1036 MByte
2004-05-19 14:31:53 80 36 Admin Kernel state: 'ONLINE' reached
2004-05-19 14:31:53 80 13952 RTEHSS Standby role set: MASTER
2004-05-19 14:31:53 80 11560 COMMUNIC Releasing T68
2004-05-19 14:31:53 80 12929 TASKING Task T68 started
2004-05-19 14:31:53 80 11007 COMMUNIC wait for connection T68

================================================================

Failover as seen from the Original Primary Server


Extractions from the Log of enhot2: Original Primary
This extraction from the knldiag log of liveCache shows the primary failing (the process was
killed for this test) and the actions which restore it online as the new standby. Enhot1
takes over as the new primary, as can be seen in the knldiag of enhot1 above.

From the old sapdb knldiag: the signal 9 used to kill the liveCache kernel and simulate a
liveCache failure is seen arriving at 14:30:51.

++++++++++++++++++++ Kernel Exit ++++++++++++++++++++++++++++


2004-05-19 14:30:51 0 12847 DBSTATE Kernel exited without core and
exit status 0x90009
2004-05-19 14:30:51 0 12850 DBSTATE Kernel exited due to signal
9(SIGKILL)
2004-05-19 14:30:51 0 12808 DBSTATE Flushing knltrace pages
2004-05-19 14:30:52 0 11987 dump_rte rtedump written to file
'rtedump'

2004-05-19 14:30:52 0 WNG 11824 COMMUNIC Releasing T69 kernel abort


2004-05-19 14:30:52 0 WNG 11824 COMMUNIC Releasing T72 kernel abort
2004-05-19 14:30:52 0 WNG 11824 COMMUNIC Releasing T75 kernel abort
2004-05-19 14:30:52 0 WNG 11824 COMMUNIC Releasing T78 kernel abort
2004-05-19 14:30:52 0 WNG 11824 COMMUNIC Releasing T81 kernel abort
2004-05-19 14:30:52 0 WNG 11824 COMMUNIC Releasing T84 kernel abort
2004-05-19 14:30:53 0 12696 DBSTATE Change DbState to 'OFFLINE '(29)
--------------------------------------- current write position --------------
---

From the new sapdb knldiag: HACMP is able to restart the liveCache application, and the
new primary places this restarted instance into standby mode.
:
:
2004-05-19 14:32:01 133 12821 TASKING Thread 133 starting
2004-05-19 14:32:01 133 11597 IO Open '/dev/rLCLOG1' successfull,
fd: 13
2004-05-19 14:32:01 133 11565 startup DEVi started
2004-05-19 14:32:01 14 11000 vattach '/dev/rLCLOG1' devno 1 T2
succeeded
2004-05-19 14:32:01 14 11000 vdetach '/dev/rLCLOG1' devno 1 T2
2004-05-19 14:32:01 11 12822 TASKING Thread 132 joining
2004-05-19 14:32:01 132 11566 stop DEVi stopped
2004-05-19 14:32:01 11 12822 TASKING Thread 133 joining
2004-05-19 14:32:01 133 11566 stop DEVi stopped
2004-05-19 14:32:01 14 13950 RTEHSS RTEHSS_API
[RTEHSS_API(COPY):RTEHSS_SetLogReadOnlyStatus]
2004-05-19 14:32:01 14 13950 RTEHSS RTEHSS_API [Got valid handle]
2004-05-19 14:32:01 14 13950 RTEHSS RTEHSS_API [Would set log access
to read only]
2004-05-19 14:32:01 14 13953 RTEHSS Standby role set: STANDBY
(master node ENHOT1)
2004-05-19 14:32:02 8 201 RTE Kernel state changed from
STARTING to ADMIN
======================================= end of startup part 2004-05-19
14:32:02 8 11570 startup complete
2004-05-19 14:32:03 10 11561 COMMUNIC Connecting T68 local 37194
2004-05-19 14:32:03 80 11561 COMMUNIC Connected T68 local 37194
2004-05-19 14:32:03 80 11560 COMMUNIC Releasing T68
2004-05-19 14:32:03 80 12929 TASKING Task T68 started
:
:
:
2004-05-19 14:32:14 81 42 Admin Hotstandby: register succeded;


Summary of the PoC


In summary, the application level verification completed as expected on all accounts.
Administration of the liveCache hot-standby configuration from APO was successful. The
integration of the application level LCINIT, to allow starting and stopping directly from SCM
(transaction LC10), was tested for the first time in this PoC. A new HACMP script was written
in support of the application level monitoring to ensure that the liveCache could be shut down
when desired, without an automatic failover to the standby server (a minimal sketch of such a
monitor is given at the end of this summary). The failover was tested successfully under load
using standard test tools such as /sapapo/om03, with /sapapo/om17 used for consistency
verification. The real proof of concept was the test of the failover under real customer load,
and the proof of data consistency following the recovery. This proves the functional design at
the highest level. The second proof point was the speed of the recovery. In the test
environment, the liveCache data-cache was not very large, so there was no way of really
demonstrating the time saved by the hot-standby versus a failover solution or a shadow
database concept. Nevertheless, a main consideration when comparing the various techniques
is that the hot-standby failover reaction time does not increase with the size of the liveCache.
This is extremely important given the liveCache sizes being considered by many large customers.
The liveCache hot-standby is an important component of a highly available SCM landscape.
Integrated with other component solutions, an extremely robust system can be created which
is easy to maintain and administer. Based on the virtualization layers of the TotalStorage
components, it is easily expanded and administered at the infrastructure level as well,
allowing for maintenance without downtime.
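
The following is a minimal sketch of such an application monitor, given here only to illustrate
the idea; the flag-file name and the simplified handling of the dbmcli output are assumptions
and are not taken from the PoC scripts.

#!/bin/ksh
# Sketch of an HACMP custom application monitor for the local liveCache.
# Assumption: the liveCache stop procedure creates the flag file below before
# an intentional shutdown so that the monitor does not report a failure.
FLAG=/usr/es/sbin/cluster/local/lc_maintenance
LC_NAME=HOTSVC

# Intentional shutdown in progress: report "healthy" so that HACMP does not
# trigger a failover to the standby.
if [ -f $FLAG ]; then
   exit 0
fi

# Ask the local instance for its state; db_state ends with ONLINE, STANDBY,
# ADMIN or OFFLINE, as seen in the monitoring log earlier in this chapter.
STATE=`/sapdb/programs/bin/dbmcli -d $LC_NAME -u control,control db_state 2>/dev/null | tail -1`

case "$STATE" in
   ONLINE*|STANDBY*|ADMIN*) exit 0 ;;   # instance is up in some role
   *)                       exit 1 ;;   # offline or unreachable: let HACMP react
esac

Whether HACMP then restarts the instance locally or moves the rotating resources to the
standby is determined by the resource group and recovery settings shown in the failover test.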

Failover Solution Comparison


A further consideration in any liveCache failover is the time required to rebuild the memory
cache. LiveCache installations can have memory caches of many gigabytes in size. A restart
of the liveCache requires a reconstruction of this cache, resulting in a long delay before
production can resume and in reduced performance until the cache is fully rebuilt.
All the solutions below are supported on AIX and IBM storage; the hot-standby solution is
currently unique to IBM.

Traditional HA Cluster
Application: general to most database management systems.
Concept:
A cluster solution with two servers in active/standby mode. Both servers have physical access
to the same disks, with mutually exclusive use. Cluster software (HACMP) communicates
between the two servers and reacts to a hardware failure. In the event of a hardware failure,
the disks are taken over by the standby and activated, the IP address for client access is
switched to the standby, and the database application is restarted on the standby.

Reaction time:
Dependent on the number of volume groups and disks in the takeover, this is estimated at 2-3
minutes.
In a failover scenario, the active disks and active log of the failed server are brought back
online. Uncommitted transactions at the time of failure must be rolled back. If the crash occurred
during heavy load with many uncommitted transactions, this can take a considerable amount
of time; a rough estimate is less than one hour.


Benefits:
Provides automated failover and generally a fast reestablishment of service.

Considerations:
Does not protect against disk failure. Rollback of non-committed transactions is required.
Relatively complex landscape. For liveCache, the database is started anew on the standby node,
which requires the memory cache to be rebuilt before work can resume.

Shadow Database
Application: general to most database management systems
Concept:
Two servers with separate database instances (separate data); one active and the other standby.
The logs of the first database are “shipped” to the second via network. The 2nd database is in
constant recovery mode and continually applies all the changes recorded in the logs coming
from the active server. When the active server fails, the final logs must be applied to the
shadow database, the database taken offline and restarted in active mode using a copy of the
failed server’s online log in order to recover to the point in time of the failure. Some solution
must be provided to take over the IP address for client access. It is possible to automate these
steps using software like Libelle.

Reaction time:
Dependent on the amount of log data which still remains to be applied. Some shadow
databases are purposely run with a delay of one hour in order to protect the shadow from a
logical error on the active database; this delay is assumed to give enough reaction time.
Even if the scenario is automated, the failover time can include up to several hours of redo
processing, followed by a stop and restart of the database.

Benefits:
Failover can be automated. Provides a disaster site recovery scenario. Protects from disk
and/or disk server failure. Rollback of failed transactions not required.

Considerations:
Complex landscape. If the solution is not automated, the reaction time can be greater and
more error prone due to manual intervention requirements. For liveCache, the database is
restarted and therefore the cache must be rebuilt before resuming work.

Hot Standby Database


Application: MaxDB and liveCache unique
Concept:
Two servers, primary and standby, with separate database instances (separate data), in
communication and sharing a single log. The solution is integrated into the storage subsystem
via a command API allowing it to control the copy services. The primary creates the standby
by initiating a “split mirror” (flash copy) of its data for the standby. It then activates the
standby instance. The standby uses its copy of the data, and continually follows the activities
of the primary by reading the primary’s active log. In this manner the standby stays only seconds
behind the primary and keeps its memory data cache current. In case of a failure, the standby
becomes the primary, and reverses its log behaviour from read to write. Should the failed
server become available again, the new primary can bring it online as the new standby by
reversing the split mirror. HACMP provides the failover control, initiating the standby-to-primary
switch as a result of either a hardware or application failure on the part of the primary.

Reaction time:
Very quick: less than one minute, mostly due to the small delay before HACMP reacts.

Benefits:
Completely automated rotating failover. No rollbacks in liveCache; the roll forward takes only
seconds, and the liveCache data cache is already built and filled with the most recent data.
Can protect against disk and disk server failure.

Considerations:
Supported for ESS with flash copy functionality, or SAN Volume Controller with either ESS
or FastT. Available with SCM 4.1 (liveCache and MaxDB 7.5).
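
To make the role distribution visible from the command line, the current state of both nodes can
simply be queried with dbmcli, as already used by the monitoring script in the failover test. A
minimal sketch follows; the node and database names are those of the test environment and the
handling of the dbmcli output is simplified.

#!/bin/ksh
# Ask each hot-standby node for the state of its liveCache instance.
# The node reporting ONLINE is the current master; the node reporting
# STANDBY is the current hot-standby.
for node in enhot1 enhot2
do
   print -n "$node: "
   /sapdb/programs/bin/dbmcli -n $node -d HOTSVC -u control,control db_state 2>/dev/null | tail -1
done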

Highly Available SCM Landscape Designs


A highly available liveCache is only part of an HA SCM landscape. There are a number of
components which require HA consideration to make a total solution.
These include the SCM database, the liveCache, and the SAP enqueue server for data
synchronization. The following diagrams show possible high availability landscapes for SCM
in which all of the major components have HA solutions. Some of these solutions are
database specific and others are general solutions. This is not an exhaustive list of the
possibilities as there are so many variations.

In the following configurations the SAP SCM liveCache is generally protected using the hot-
standby solution. The SAP Central Instance always runs on the database server, and a
replicated enqueue server always runs on the complementary server. Consequently the
subsequent scenarios only show variations in the deployment of the underlying database
server.

Each scenario has advantages and disadvantages and thus needs to be selected depending on
the particular needs of customers.

Beyond the protection against server failure, a complete high availability environment needs to
consider and protect against the loss of data (e.g. user or application error; disk system crash)
or disaster (e.g. destruction of a complete data center). These issues exceed the scope of this
document.

Using a “Traditional” Failover Cluster


The following diagram depicts a high availability scenario for an SAP SCM system using
HACMP in an active/active failover configuration.
The two SAP CI/DB servers perform mutual takeover for each other. If one system fails, the
corresponding database and central instance are restarted on the 2nd server.


Depending on the application service requirements (e.g. response times), both servers may
need to be sized larger than for the intended SAP CI and DB server workload alone, so as to be
able to handle the additional workload in case of a failover and still meet service requirements.
Workload management can be used to force the optimal resource distribution under the extra
load of a takeover.

(Diagram: SAP R/3 and SAP APO systems with their application servers, connected via CIF and
DCOM; the two SAP CI / DB2 servers form an HACMP failover cluster, and the SAP APO
liveCache 7.5 servers are protected by the hot-standby solution.)

This type of HA solution is chosen to protect against server failure (e.g. failure of the server
running the SAP Central Instance and DB2 server). It is not a protection against disk failure
(each database is stored only once and is only accessed from the 2nd server in case of a failure),
nor is it a disaster recovery protection (e.g. usually both servers are located relatively close together).
This type of HA solution is the most commonly used implementation. It is sufficient to
address most customers’ high availability needs and it is also the most efficient and cost-
effective one. Depending on the setup of the cluster, usually only the application on the failing
server is affected by the failover but not the application on the other server. This HA
implementation supports most database management systems.

Using SAP Multiple Components in One Database (MCOD)


If the SAP R/3 and the SAP APO databases are relatively small, one might consider putting
them both into the same database using the SAP MCOD architecture.

DB2 with MCOD

DB2 supports SAP’s MCOD by allowing each SAP system to be run as separate DB2
partitions all sharing the same database. This makes it possible to run multiple DB2 partitions
together on a common server, distributed over many servers, or even a combination thereof.
The benefit for customers deploying MCOD with DB2 is better scalability (e.g. a customer can
start “small”, running all partitions on one server, and later scale out onto many servers as the
databases grow and require more resources). It also provides administration benefits as, for
example, all “logical” SAP databases (e.g. SAP R/3, SAP APO, …) can be backed up and
restored at once, making it easy to restore all coupled systems to a consistent point in time.
The scenario below is basically equivalent to the previously shown traditional failover
scenario, except for the fact that the SAP R/3 and the SAP APO system share the same
database.


(Diagram: SAP R/3 + APO MCOD system with the SAP APO liveCache system; one server runs
the DB2 R/3 partition, the other the DB2 APO partition, joined in an HACMP failover cluster,
while the liveCache servers use the hot-standby solution.)

One server runs the DB2 partition for the SAP R/3 system while the other runs the DB2
partition for the SAP APO system. They are linked into an HACMP cluster and act as mutual
takeover partners.

Oracle RAC and MCOD

Under Oracle RAC, there are two database engines with shared access to the same physical
database. One database engine is dedicated to R/3 and the other to APO, but both are
capable of serving either. In the case of a failure, HACMP will move the access service address
to the remaining engine, and the application servers of the failed database engine will
reconnect to it. This solution provides a very fast failover as no database
takeover or application restart is necessary.

(Diagram: SAP R/3 + APO MCOD system on Oracle RAC; two RAC engines share a single
Oracle instance on GPFS, the R/3 and APO application servers can connect to either engine,
and the liveCache primary/secondary pair is protected by the hot-standby solution.)

MCOD Benefits and Considerations


The benefit of the MCOD configuration over the earlier one is that the R/3 and the APO
databases are backed up at the same time and can thus easily be restored to the same point in
time. Otherwise an administrator would have to do this manually in case there was a problem
with either of the databases requiring a restore. The drawback of this scenario is that since the

data resides in the same database, the SAP R/3 and the SAP APO system share the same
database configuration, which is usually only suitable for small databases.
Even though not shown in the chart, this scenario could also be implemented together with
SAP Enqueue Server replication.

Using a DLPAR Cluster Landscape


The following sections illustrate possible deployment scenarios using pSeries’ DLPARs.

Oracle RAC and DLPAR

The following configuration is again based on Oracle RAC. In this case there are two separate
RAC databases, one for ERP and one for APO. Two dynamic LPARs on separate machines
can each contain an engine for each database, or the databases can be separated into an LPAR
each on both machines. A failure of any database engine will cause the 2nd engine to take over
the load. Dynamic LPAR functionality (on-demand) can be used to activate and assign
additional resources to the remaining engine.

(Diagram: two separate Oracle RAC databases, one for R/3 and one for APO, each with an
engine on both machines and data on GPFS, together with the SAP SCM liveCache
primary/secondary pair protected by the hot-standby solution.)


Using the Spare Failover Capacity for SAP Application Server Workload

The following scenario is basically identical to the scenario described in Using a “Traditional”
Failover Cluster except that the systems are each partitioned into two DLPARs. One partition
on each server runs the SAP CI and DB server workload, and the other (providing the spare
capacity needed for the takeover) runs some of the application server workload.

(Diagram: each server is divided into two DLPARs, one running the SAP CI / DB2 server
workload and the other running application server workload for the partner system; the two
CI / DB2 partitions form an HACMP failover cluster and the liveCache is protected by the
hot-standby solution.)

In case of a failure of the database server in one DLPAR, the corresponding application server
partition on the other server is stopped and used to run the SAP CI and DB server
workload. Dynamic LPAR functionality (on-demand) can be used to activate additional
resources or to take resources from lower-priority workloads.

DB2 High Availability Disaster Recovery (DB2 HADR)


In this example, DB2 uses spare capacity on the database servers to run an HADR standby.
DB2 HADR (High Availability Disaster Recovery) provides a means to create and maintain a
synchronized copy of a database. The synchronized copy is used here to minimize
downtime in case of a failover. This new functionality is available with DB2 V8.1 FixPak 7,
i.e. DB2 V8.2.
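
For orientation, a hedged sketch of the basic DB2 V8.2 commands involved in such an HADR
pair is shown below; the database alias (APO), host names, service port and instance name are
placeholders and are not taken from the test landscape.

#!/bin/ksh
# Sketch only: configure and start a DB2 HADR pair for an SAP APO database.
# Run the UPDATE DB CFG on both servers with local/remote values swapped;
# the standby must first be built from a restored backup of the primary.

db2 "UPDATE DB CFG FOR APO USING HADR_LOCAL_HOST ci1 HADR_REMOTE_HOST ci2 \
     HADR_LOCAL_SVC 51012 HADR_REMOTE_SVC 51012 \
     HADR_REMOTE_INST db2apo HADR_SYNCMODE NEARSYNC"

# Start HADR on the standby first, then on the primary
db2 "START HADR ON DATABASE APO AS STANDBY"      # on the standby server
db2 "START HADR ON DATABASE APO AS PRIMARY"      # on the primary server

# In a failover the standby takes over the primary role
db2 "TAKEOVER HADR ON DATABASE APO BY FORCE"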

(Diagram: each SAP CI / DB2 server additionally hosts the DB2 HADR replica of the partner’s
database (HADR APO on the R/3 server, HADR R/3 on the APO server) with log replication
between them; the servers form an HACMP cluster and the liveCache is protected by the
hot-standby solution.)


Adaptive Computing – On-Demand


The landscape below is based on IBM’s new implementation of the SAP adaptive computing
design using TotalStorage SAN FileSystem and Tivoli System Automation for dynamic
failover or capacity increase. This configuration supports both the ERP and the SCM
databases and application servers. This solution supports any database technology supported
by SAP. This represents a failover solution in which the spare server capacity can be used as a
standby for any failing landscape component. LiveCache can also be supported in this manner
but it would represent a failover and not a hot-standby.

(Diagram: adaptive computing landscape — the R/3 and APO database servers, the liveCache
primary and hot-standby, and spare server capacity on pSeries systems, all accessing ESS
storage through the SAN File System and controlled by Tivoli System Automation.)
Summary
This chapter has presented a number of possible SCM landscape configurations in which the
hot-standby liveCache can play a role in providing the best total solution. The possibilities are
very extensive, and those presented here are just a few examples for thought and discussion.


Overview of the Consolidated Colgate Asia Test-Box Project

This chapter introduces the Colgate Asia project and describes the environment in which the
sub-projects (referred to as parallel project threads) were carried out. The main thread,
which launched the joint project, was a functional upgrade of the Colgate SCM landscape to the
newest SAP versions which were in focus for the Asia system in 2004-2005. The parallel
project threads were designed to take advantage of the hardware and software landscape, to
explore new IBM and SAP technology. The parallel threads included the introduction of
storage virtualization into a production system landscape, and the migration of a production
SCM system to the new hot-standby solution for liveCache.

The Functional Migration

TestBox ERP <-> SCM Landscape
(Diagram: the ERP system at R/3 4.6C, 4.7 and mySAP ERP levels with plug-ins 2002.1,
PI 2003.1 and PI 2004.1 on Oracle 9.2 64-bit, and the SCM system at APO 3.0 (liveCache 7.4.2),
APO 4.0 (liveCache 7.4.3) and APO 4.1 (liveCache 7.5); both systems run on AIX 5.2 on
pSeries p690 LPARs with 6-32 CPUs at 1.3 GHz.)

The Asia landscape delivered by Colgate included the ERP and the SCM systems. These were
cloned onto the IBM hardware (described later) and re-established as a “production” pair in
Walldorf. With the help of Colgate, an SCM load scenario based on actual Colgate planning
jobs was designed which provided for repeatable functional and load testing. This test
scenario provided a reliable load generation which was then used for the comparison
performance tests through out the upgrade. For each of the upgrade steps, a series of these test
batteries was run to be compared against the previous baseline. This was done to ensure that
each upgrade either maintained or improved the system performance. The test scenario is
documented in detail for the main project thread. The above diagram depicts most, but not all
of the migration steps. Each of these steps was documented to provide a roadmap for
customer migration including the solutions to problems found and recommendations for
performance or functional improvements.

This project is described in detail in the document:


IBM/SAP Test-Box Whitepaper: “Moving forward: early upgrade insights of SAP
ERP and SAP SCM” Walldorf June 2004


Introducing Storage Virtualization

The current trend is moving in the direction of “On Demand” or “Adaptive Computing”. The
philosophy behind this move is that systems should have the resources they need when they
need them and these resources should be present in a very flexible form such that they can
“come and go” dynamically. IBM offers dynamic LPAR functionality which allows CPUs
and memory to be dynamically moved to and from running systems. SAP is focusing on an
architecture in which systems can be easily and quickly moved from one server to another
should more resources be required, or new application servers can be spawned into production
to help unburden an overloaded system. The SAP architecture is based on server and storage
virtualization: a decoupling of the service from the server, and server from storage. IBM is
working with SAP to implement the SAP adaptive computing architecture using the rich
functionality of the IBM on-demand components and building blocks for virtualization. The
first major requirement for adaptive computing is storage virtualization, and IBM offers two
powerful new products in this area:
The SAN Volume Controller, which decouples the server’s “disk” from any
dependency on an actual physical storage location. The disk can actually be moved
from one storage server to another during active production, and the server’s access
moves with it.

The SAN File-System, which provides a high performance shared file-system which
removes any obvious file-system to disk dependency and any need for the server to
own and activate disks. All application data is resident in the shared filespace, and the
server currently responsible for the application simply mounts the file-system.

These two products form the basis for the ongoing Adaptive Computing Initiative in the
ISICC and this test box provided the landscape in which to test the migration of a production
system to storage virtualization. The objective of this project thread was to answer the
following questions:

1 How do we migrate?
2 What do the steps cost in terms of performance and downtime?
3 What do we gain in new flexibility and functionality with our new infrastructure?


(Diagram: storage activities on the Colgate test-box clones — the original direct-attached ESS
disks, the migration to SVC image disks, image- to managed-disk migration, heterogeneous
consistency-group FlashCopy between ESS and FAStT600, and the migration to SAN File
System. Pink circles mark test points where comparison performance data was measured; red
arrows mark PoC activities.)

The diagram above depicts the steps taken to investigate the move to virtualization. Each of
the pink circles depicts a test point where comparison performance data was measured. Each
of the red arrows is a test-box activity. The following test scenarios are documented in detail
for this project:

1. Migration direct attached ESS to SVC attached ESS


ESS volumes are moved to SVC as “image disks”
Performance comparison of direct attached with SVC image disks
2. New Functionality: Cross Platform Flash-Copy
Flash-copy of ESS directly to FastT via SVC
3. Image-Disk to Manage-Disk migration
Migration of active production disks from original image to full function
managed disk. Performance comparison of imaged to managed disk.
4. New Functionality: Migrating active volumes
Active production volumes are migrated from one location to another on a storage
server and from one storage server to another.
5. Migration of JFS2 database systems to SANFS
Performance comparison
6. New Functionality: fileset multiple consistent flash-copy snapshots
Test consistent snapshots of running databases, prove restore consistency.

This project thread is covered in the document:


IBM/SAP Test-Box Whitepaper: “Moving Toward Adaptive Computing SAP on pSeries and
IBM TotalStorage” Walldorf June 2004


Overview of Hardware Infrastructure


The Test-Box Landscape
(Diagram: a pSeries p690 partitioned into the LPARs is02d1 (CG APO), is02d2 (CG ERP),
is02d4 (APO3), is02d6 (HOT1), is02d8 (HOT2) and is02d5 (APO2 / TDP server), connected via
the production network and a fibre channel SAN to the SAN Volume Controller, ESS, FAStT600,
the SAN File System storage subsystem and an LTO tape library attached via a SAN Data
Gateway; a separate gigabit backup network connects the storage consoles.)

Servers
The server capacity was provided by a p690 with 32 CPUs at 1.3 GHz and 64 GB of memory.
The individual servers were implemented as dynamic LPARs. The DLPAR functionality allowed
resources to be moved between the systems as required by the various performance tests. All
performance tests on SCM systems were done using 6 CPUs and 16 GB of memory. As only
64 GB memory was available, functional tests being done on parallel threads used a smaller
footprint until such a time as a performance test was required. During the height of the project
activity, 7 logical partitions were active simultaneously with 4 APO clones, an ERP and a hot-
standby liveCache cluster.

Cloning and Backup/Restore


The SCM system image was used for multiple tests during the test-box project. The SCM
system was cloned by means of the consistency group flash-copy functionality on the ESS
and then later on the SAN Volume Controller, which creates duplicate images of the source
disks on separate target disks (a sketch of the corresponding SVC commands is given below).
These copy images were then activated as clone systems attached to various LPARs.
Flash-copies were also used for backup and restore of the SCM system. The SCM system
required approximately 360 GB of storage (and therefore each clone the same). The ERP
system required 2 terabytes of storage, and therefore no clone was possible; the landscape did
not have enough extra storage for a 2nd copy of the ERP. For the ERP, a Tivoli Data Protection
server was implemented which provided a database backup to a SAN-attached LTO tape
library. This tape library, with two tape drives, was able to back up and restore the ERP system
in 9 hrs; the same backup via the SAP gigabit network infrastructure required 22 hrs.
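
As an illustration of how such a consistent clone can be triggered on the SAN Volume
Controller, the following sketch creates a FlashCopy consistency group covering the
source/target vdisk pairs of the SCM system and starts them at a single point in time. The vdisk
and group names and the cluster address are placeholders, not the names used in the test-box.

#!/bin/ksh
# Sketch only: SVC CLI commands are issued via ssh to the SVC cluster.
SVC="ssh admin@svc-cluster svctask"

# One consistency group for all disks belonging to the SCM system
$SVC mkfcconsistgrp -name scm_clone

# One FlashCopy mapping per source/target vdisk pair (data, log, ...)
$SVC mkfcmap -source scm_data1 -target scm_data1_cl -consistgrp scm_clone
$SVC mkfcmap -source scm_log1 -target scm_log1_cl -consistgrp scm_clone

# Prepare the cache and trigger all mappings at one consistent point in time
$SVC startfcconsistgrp -prep scm_clone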


Software Versions
Operating System: AIX 5.2 ML2
SCM Database: Oracle 9.2.0.4.0
SCM liveCache: 7.5.0.12
SAN Volume Controller: 1.1.4

SAN Environment

SAN Volume Controller 1.1.4


SAN File System 2.1.0
AIX 5.2 ML02
IBM 2109-F16 Switches 3.1.1.C
FAStT
ESS
SAP APO 4.0 - 4.1
Oracle 9.2
LiveCache 7.4.3 – 7.5

Hardware Environment
Storage Systems
ESS Model F20
The IBM ESS storage subsystem is a proven platform for very high
availability, disaster tolerance, accessibility of mission-critical
information, high performance, flexibility, high scalability,
efficient manageability, data consolidation, and connectivity.
The IBM ESS storage subsystem used for these tests was a model
ESS F20 with a capacity of more than 3 TB for data, 2 clusters,
internally built up with 16 ranks of 36 GB drives, read and write
caches, fibre channel attachments and software functions such as
'consistency group flash-copy'. The older model ESS F20 used for
the tests is not the fastest IBM ESS model. The newer ESS 800
Enterprise Storage Server is around twice as fast and is IBM’s most powerful disk storage
server. Developed using IBM’s Seascape architecture, the ESS 800 provides unmatched
functions for all the e-business servers of the new IBM server brand.

FastT600
The FAStT600 is a mid-level storage server that can scale to over sixteen
terabytes of fibre channel disk. It uses the latest in storage networking
technology to provide an end-to-end 2 Gbps Fibre Channel solution. The
system used for this PoC supported the new FlashCopy with VolumeCopy,
a new function for a complete logical volume copy within the FAStT
Storage Server, and had a storage capacity of 1.5 terabytes.


SAN Volume Controller


The IBM TotalStorage SAN Volume
Controller is a scalable hardware and
software solution to allow aggregation of storage from different disk subsystems. It provides
storage virtualization and thus a consistent view of storage across a Storage Area Network
(SAN). The SAN Volume Controller provides in-band storage virtualization by creating a
pool of managed disks from attached back-end disk storage subsystems. These managed disks
are then mapped to a set of virtual disks for use by various host computer systems.

SAN FS, SAN File System


SAN File System is a multiplatform, robust, scalable, and highly available file system and
storage management solution that works with Storage Area Networks (SANs). It uses SAN
technology, which allows an enterprise to connect a large number of computers and share a
large number of storage devices via a high-performance network.
With SAN File System, heterogeneous clients can access shared data directly from large,
high-performance, high-function storage systems, such as IBM TotalStorage Enterprise
Storage Server (ESS) and IBM TotalStorage SAN Volume Controller (SVC). The SAN File
System is built on a Fibre Channel network, and is designed to provide superior I/O
performance for data sharing among heterogeneous computers.

IBM Ultrium External Tape Library


The IBM 3581 is an excellent solution for businesses looking for
high density and performance tape autoloader storage in
constrained rack or desktop environments. It is designed to
provide autoloading capabilities with IBM LTO Ultrium 2 tape
drives, including 2 Gbps Fibre Channel and LVD Ultra160 SCSI. Optional features are available and
designed to help enable the operation of the autoloader as a small library.

Server System
pSeries p690 High-End Unix Server
An 8- to 32-way UNIX/Linux server of the enterprise class offering high-
end performance and reliability as well as on-demand
functionality. The server can be used as a large SMP or logically
partitioned into multiple logical systems. The p690 supports
dynamic logical partitioning, which allows resources to be
reconfigured in running partitions: adapters, storage, and CPUs
can be moved between partitions without disrupting running
applications.

Processor: POWER4
CPU Speed: 1.3 / 1.9 GHz
System Memory: 8 GB / 1 TB
Internal Storage: 72.6 GB / 18.7 TB


APPENDIX
RTE_config File
########## RTEHSS_config.txt ##########
# set environment vatriables req. by
# RTEHSS_init()
# created by Oliver Goos (oliver.goos@de.ibm.com)
# created on 05/19/03
#######################################

# Copy Server Services


# choose either FC or PPRC
CSmode FC
#CSmode PPRC

# OS specific install path for ibm2105cli or ibm2145cli


# Ibmclidir /usr/opt/ibm/ibm2105cli
Ibmclidir /usr/opt/ibm/ibm2145cli
# HomeDir of SAP live cache utils
Ibmsapapodir /usr/opt/ibm/ibmsap

# Master liveCache Server


## log Volume
MlCLogVoldev # raw device of logical volume
## Data Volume
MlCDataVoldev
## Case of SVC vdisk_id or vdisk_name
MlCLogVdiskID 68
MlCDataVdiskID 67

# 1st Standby liveCache Server


## log Volume
SlCLogVoldev # raw device of logical volume
## Data Volume
SlCDataVoldev
## Case of SVC vdisk_id
SlCLogVdiskID 68
SlCDataVdiskID 27

# 2nd Standby liveCache Server


## log Volume
SSlCLogVoldev # raw device of logical volume
## Data Volume
SSlCDataVoldev
## Case of SVC vdisk_id
SSlCLogVdiskID
SSlCDataVdiskID

# Copy Server
## IP adress
CSaIP 192.168.100.10
## User ID (admin)
CSaUID sdb_rsa
## User password, blank for SVC
CSapwd

# Copy Server backup


## IP adress
CSbIP 192.168.100.10


# list all HSS_NODE_00x in this section (max. 3)


HSS_NODE_001 enhot2
HSS_NODE_002 enhot1
HSS_NODE_003

# copy server tasks


# specify task name which is used to copy data volume from current MASTER
(HS_NODE_00x) to requesting STANDBY (HS_NODE_00y)
EstDataCST_001_002 sapdat_1_2
EstDataCST_001_003
EstDataCST_002_001 sapdat_2_1
EstDataCST_002_003
EstDataCST_003_001
EstDataCST_003_002

# in case of remote copy / PPRC


# specify task name which is used to copy log volume from current MASTER
(HS_NODE_00x) to requesting STANDBY (HS_NODE_00y)
EstLogCST_001_002
EstLogCST_001_003
EstLogCST_002_001
EstLogCST_002_003
EstLogCST_003_001
EstLogCST_003_002

TermDataCST_001_002 sapdat_1_2
TermDataCST_001_003
TermDataCST_002_001 sapdat_2_1
TermDataCST_002_003
TermDataCST_003_001
TermDataCST_003_002

TermLogCST_001_002
TermLogCST_001_003
TermLogCST_002_001
TermLogCST_002_003
TermLogCST_003_001
TermLogCST_003_002

LiveCache Parameters
KERNELVERSION KERNEL 7.5.0 BUILD 012-123-071-164
INSTANCE_TYPE LVC
MCOD NO
_SERVERDB_FOR_SAP YES
_UNICODE YES
DEFAULT_CODE ASCII
DATE_TIME_FORMAT INTERNAL
CONTROLUSERID CONTROL
CONTROLPASSWORD
MAXLOGVOLUMES 2
MAXDATAVOLUMES 11
LOG_BACKUP_TO_PIPE NO
MAXBACKUPDEVS 2
BACKUP_BLOCK_CNT 8
LOG_MIRRORED NO
MAXVOLUMES 14
_MULT_IO_BLOCK_CNT 8


_DELAY_LOGWRITER 0
LOG_IO_QUEUE 50
_RESTART_TIME 600
MAXCPU 20
MAXUSERTASKS 50
_TRANS_RGNS 8
_TAB_RGNS 8
_OMS_REGIONS 8
_OMS_RGNS 33
OMS_HEAP_LIMIT 0
OMS_HEAP_COUNT 1
OMS_HEAP_BLOCKSIZE 10000
OMS_HEAP_THRESHOLD 100
OMS_VERS_THRESHOLD 2097152
HEAP_CHECK_LEVEL 0
_ROW_RGNS 8
_MIN_SERVER_DESC 21
MAXSERVERTASKS 21
_MAXTRANS 292
MAXLOCKS 2920
_LOCK_SUPPLY_BLOCK 100
DEADLOCK_DETECTION 0
SESSION_TIMEOUT 900
OMS_STREAM_TIMEOUT 30
REQUEST_TIMEOUT 180
_IOPROCS_PER_DEV 2
_IOPROCS_FOR_PRIO 0
_USE_IOPROCS_ONLY NO
_IOPROCS_SWITCH 2
LRU_FOR_SCAN NO
_PAGE_SIZE 8192
_PACKET_SIZE 36864
_MINREPLY_SIZE 4096
_MBLOCK_DATA_SIZE 32768
_MBLOCK_QUAL_SIZE 16384
_MBLOCK_STACK_SIZE 16384
_WORKSTACK_SIZE 8192
_WORKDATA_SIZE 8192
_CAT_CACHE_MINSIZE 262144
CAT_CACHE_SUPPLY 3264
INIT_ALLOCATORSIZE 245760
ALLOW_MULTIPLE_SERVERTASK_UKTS YES
_TASKCLUSTER_01 tw;al;ut;100*bup;10*ev,10*gc;
_TASKCLUSTER_02 ti,100*dw;3*us,2*sv;
_TASKCLUSTER_03 equalize
_DYN_TASK_STACK NO
_MP_RGN_QUEUE YES
_MP_RGN_DIRTY_READ YES
_MP_RGN_BUSY_WAIT YES
_MP_DISP_LOOPS 2
_MP_DISP_PRIO YES
XP_MP_RGN_LOOP 0
MP_RGN_LOOP 100
_MP_RGN_PRIO YES
MAXRGN_REQUEST 3000
_PRIO_BASE_U2U 100
_PRIO_BASE_IOC 80
_PRIO_BASE_RAV 80
_PRIO_BASE_REX 40
_PRIO_BASE_COM 10
_PRIO_FACTOR 80


_DELAY_COMMIT NO
_SVP_1_CONV_FLUSH NO
_MAXGARBAGE_COLL 10
_MAXTASK_STACK 1500
MAX_SERVERTASK_STACK 100
MAX_SPECIALTASK_STACK 100
_DW_IO_AREA_SIZE 50
_DW_IO_AREA_FLUSH 50
FBM_VOLUME_COMPRESSION 50
FBM_VOLUME_BALANCE 10
_FBM_LOW_IO_RATE 10
CACHE_SIZE 100000
_DW_LRU_TAIL_FLUSH 25
XP_DATA_CACHE_RGNS 0
_DATA_CACHE_RGNS 32
CONVERTER_REGIONS 8
MAXPAGER 32
SEQUENCE_CACHE 1
_IDXFILE_LIST_SIZE 2048
_SERVER_DESC_CACHE 74
_SERVER_CMD_CACHE 22
VOLUMENO_BIT_COUNT 8
OPTIM_MAX_MERGE 500
OPTIM_INV_ONLY YES
OPTIM_CACHE NO
OPTIM_JOIN_FETCH 0
JOIN_SEARCH_LEVEL 0
JOIN_MAXTAB_LEVEL4 16
JOIN_MAXTAB_LEVEL9 5
_READAHEAD_BLOBS 25
RUNDIRECTORY /sapdb/data/wrk/HOTSVC
OPMSG1 /dev/console
OPMSG2 /dev/null
_KERNELDIAGFILE knldiag
KERNELDIAGSIZE 800
_EVENTFILE knldiag.evt
_EVENTSIZE 0
_MAXEVENTTASKS 1

_MAXEVENTS 100
_KERNELTRACEFILE knltrace
TRACE_PAGES_TI 2
TRACE_PAGES_GC 20
TRACE_PAGES_LW 5
TRACE_PAGES_PG 3
TRACE_PAGES_US 10
TRACE_PAGES_UT 5
TRACE_PAGES_SV 5
TRACE_PAGES_EV 2
TRACE_PAGES_BUP 0
KERNELTRACESIZE 916
_AK_DUMP_ALLOWED YES
_KERNELDUMPFILE knldump
_RTEDUMPFILE rtedump
_UTILITY_PROTFILE dbm.utl
UTILITY_PROTSIZE 100
_BACKUP_HISTFILE dbm.knl
_BACKUP_MED_DEF dbm.mdf
_MAX_MESSAGE_FILES 64
_EVENT_ALIVE_CYCLE 0
_SHMCHUNK 256


_SHAREDDYNDATA 100301
_SHAREDDYNPOOL 22653
_SHMKERNEL 833002
LOG_VOLUME_NAME_001 /dev/rLCLOG1
LOG_VOLUME_TYPE_001 R
LOG_VOLUME_SIZE_001 262144
DATA_VOLUME_NAME_0001 /dev/rLCDATA1
DATA_VOLUME_TYPE_0001 R
DATA_VOLUME_SIZE_0001 1048576
DATA_VOLUME_GROUPS 1
__PARAM_CHANGED___ 0
__PARAM_VERIFIED__ 2004-05-19 14:31:52
DIAG_HISTORY_NUM 2
DIAG_HISTORY_PATH /sapdb/data/wrk/HOTSVC/DIAGHISTORY
_DIAG_SEM 1
SHOW_MAX_STACK_USE NO
LOG_SEGMENT_SIZE 87381
SUPPRESS_CORE YES
FORMATTING_MODE PARALLEL
FORMAT_DATAVOLUME YES
OFFICIAL_NODE HOTLC
LOAD_BALANCING_CHK 0
LOAD_BALANCING_DIF 10
LOAD_BALANCING_EQ 5
HS_STORAGE_DLL libHSSibm2145
HS_NODE_002 ENHOT1
HS_NODE_001 ENHOT2
HS_DELAY_TIME_002 0
HS_DELAY_TIME_001 0
HS_SYNC_INTERVAL 10
USE_OPEN_DIRECT NO
SYMBOL_DEMANGLING NO
EXPAND_COM_TRACE NO
JOIN_OPERATOR_IMPLEMENTATION YES
JOIN_TABLEBUFFER 128
SET_VOLUME_LOCK NO
SHAREDSQL YES
SHAREDSQL_EXPECTEDSTATEMENTCOUNT 1500
SHAREDSQL_COMMANDCACHESIZE 32768
MEMORY_ALLOCATION_LIMIT 0
USE_SYSTEM_PAGE_CACHE YES
USE_COROUTINES NO
USE_STACK_ON_STACK NO
USE_UCONTEXT YES
MIN_RETENTION_TIME 60
MAX_RETENTION_TIME 480
MAX_SINGLE_HASHTABLE_SIZE 512
MAX_HASHTABLE_MEMORY 5120
HASHED_RESULTSET NO
HASHED_RESULTSET_CACHESIZE 262144
AUTO_RECREATE_BAD_INDEXES NO
