BMC APPLICATION RESTART CONTROL
A return on investment analysis
WWW.OVUM.COM
WHITE PAPER
Contents
Executive summary
Introduction
Methodology
The business problem
The BMC solution: APPLICATION RESTART CONTROL
BMC APPLICATION RESTART CONTROL
AR/CTL for DB2
AR/CTL for IMS
AR/CTL for VSAM
Customer experiences
Customers' general and site-specific assessment of BMC's tools and support
BMC APPLICATION RESTART CONTROL return on investment
Performance savings
Problem resolution savings
Savings in staff utilization
Increased systems availability
Migration savings
Risk mitigation/business value
Analysis and conclusion
Appendix
Supporting customer evidence on the value and capabilities of BMC APPLICATION RESTART CONTROL
Published 09/2010
EXECUTIVE SUMMARY
The batch environment is still of major importance in delivering mainframe-based, mission-critical business
applications. As transaction rates and volumes have increased, the capacity of batch to cope with the updates
needed to support important online environments has been squeezed. Being able to run batch applications in
parallel, and concurrently with online applications, has become vital for many organizations. To do that,
batch activities need to avoid contention and avoid locking out other processes. When problems arise they need
to be resolved effectively and automatically. That is why BMC developed APPLICATION RESTART
CONTROL (AR/CTL) for IMS, DB2, and VSAM: to coordinate the restart and recovery of batch processes
that have abended or are causing other problems. At the core of BMC AR/CTL is the way it manages checkpoint
strategies, helping to automate the insertion of checkpoints and pacing their use so that, no matter how many
checkpoints are issued, only a sensible number of them are actually processed. This saves large amounts
of CPU time whilst ensuring that there are secure points from which to restart programs rather than restarting
from the beginning.
For this paper we approached a number of major BMC customers to see how highly they valued BMC
AR/CTL and to put a monetary value on their experiences, in order to determine a potential return on
investment (ROI) for those that have yet to adopt the technology.
We found that the different sites often used the wide-ranging capabilities of AR/CTL in different ways, with some
not using features that were of critical value to others, such as the Virtual Storage Access Method (VSAM)
support. However, the universal response was that BMC AR/CTL was a critical element in
enabling their online systems to maintain the service levels required, and that without it the batch stream
would often be unable to process the background updates required to support them.
The ROI calculations were based on anecdotal and measured evidence from the customers and from our
own judgment on what would be realistic to expect for a significant site taking on BMC AR/CTL if they had
some or all of the problems that the sites in question had before its adoption.
We looked at the following areas and gave our assessment of what we felt was a realistic possible return
based on what these companies had told us.
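Area                                Realistic possible return
Performance savings                 $1,000,000
Problem resolution savings          $250,000
Savings in staff utilization        $150,000
Increased systems availability      $1,000,000
Migration savings                   $2,000,000
Risk mitigation/business value      $1,500,000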
We can't say that all sites would get close to a $4 million payback (assuming they don't require the VSAM
element) through the adoption of BMC AR/CTL, and clearly many will have resolved or avoided the problems
in other ways, but we would say that BMC's solution was regarded as indispensable by these sites, that they
found it of immense value, and that the support given by BMC was found to be exceptionally good.
INTRODUCTION
METHODOLOGY
To better understand how BMC AR/CTL provides business value to those that make it a key part of their
mainframe systems management strategy, Ovum undertook interviews with a number of users of these
products. These interviews covered companies in a wide range of industries including a medical lab, a
government agency, logistics, and financial and insurance services.
The code may have been tuned for a generation of mainframes that had significantly less processing
power than is currently available, such that the checks and controls built into it no longer
run efficiently.
They may have been built for earlier generations of operating environment from which, in an ideal world,
one would wish to migrate, but the cost of such migration and redevelopment might be prohibitive.
The scale of work undertaken by these batch programs may have grown exponentially over the years
such that each batch run is now performing a massive number of important updates and takes a
considerable time to run. Restarting them from the beginning in the event of a problem may simply be
impossible given the available batch capacity.
The need for 24/7 online applications that access or share information that is updated by batch programs
dictates that there will be a conflict of interest in managing the locks and the access to that information
which needs to be very carefully managed.
The temptation may be to leave the environment alone and run these programs as they have always run;
maybe overnight in a batch window or at least in an environment that isolates problems from your main online
systems. In the past you may have had on-call supervisors/analysts who would notice an abended batch
process and have the appropriate knowledge regarding the best way to restart it, or you may have waited
until the morning to re-run failed programs. Either way this risked holding up important updates and was a
waste of time and resources.
Figure 1: Delayed restart (Source: BMC)
Figure 1 makes the point that the time delay involved in manually resolving an abend and restarting from the
beginning is wasteful. For example, if the think and recover time was seven hours then a five-hour job that
abended after four hours has just become a sixteen-hour job once completed.
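The arithmetic behind the sixteen-hour figure can be written out as a small sketch. The function name and parameter names are illustrative only; the three input values are the ones quoted in the paragraph above.

```python
def rerun_from_start_elapsed(failed_after_h, recovery_h, job_length_h):
    """Elapsed hours when an abended job is rerun from the beginning.

    The work done before the abend is lost, so the elapsed time is the time
    already spent, plus the manual think-and-recover delay, plus a full rerun.
    """
    return failed_after_h + recovery_h + job_length_h


# Figures quoted in the text: a five-hour job abends after four hours and
# takes seven hours to diagnose and recover manually.
print(rerun_from_start_elapsed(failed_after_h=4, recovery_h=7, job_length_h=5))  # 16
```

The point of the sketch is that the recovery delay and the lost work are additive, so shortening either one shortens the total directly.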
This time delay may well be acceptable at some sites. But if the batch processes perform critical updates
necessary for the smooth running of your business it may be totally unacceptable: for example, if you need
these to have completed before you can run your critical online systems, or if you are running them
concurrently with the online environment and may be increasing the risk of further lock contention and other
clashes.
So what mechanisms exist to help resolve these issues or avoid them?
When a batch job fails, it must be restarted at the point of failure or at the beginning of the job (after recovery
of affected databases and files). It is not practical to back out everything and start from the beginning because
backing out updates can take twice as long as running the application. Nor is it practical to restart from the
wrong point, then back out errors and start all over again. The only way to restart at the point of failure is to
ensure that the application takes periodic checkpoints of the contents of application working storage areas
and knows the position of flat files so that they can be repositioned correctly at restart.
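The idea of checkpointing working storage together with the flat-file position can be illustrated with a minimal Python sketch. It is an analogy under stated assumptions, not how AR/CTL or any mainframe facility is implemented: the JSON checkpoint file, the function names, and the simulated abend are all hypothetical.

```python
import json
import os
import tempfile


def run_job(input_path, checkpoint_path, interval=3, fail_after=None):
    """Sum the integers in a flat file, checkpointing every `interval` records.

    The checkpoint holds the program state ("working storage") and the byte
    offset of the input file, so a restart can reposition the file correctly.
    """
    state = {"offset": 0, "records": 0, "total": 0}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)              # restore saved working storage

    with open(input_path, "rb") as src:
        src.seek(state["offset"])             # reposition the flat file
        while True:
            line = src.readline()
            if not line:
                break
            if fail_after is not None and state["records"] == fail_after:
                raise RuntimeError("simulated abend")
            state["total"] += int(line.decode().strip())
            state["records"] += 1
            state["offset"] = src.tell()
            if state["records"] % interval == 0:
                with open(checkpoint_path, "w") as f:
                    json.dump(state, f)       # take a checkpoint

    if os.path.exists(checkpoint_path):
        os.remove(checkpoint_path)            # job completed, checkpoint no longer needed
    return state["total"]


def demo():
    """Abend partway through, then restart from the last checkpoint."""
    with tempfile.TemporaryDirectory() as tmp:
        data = os.path.join(tmp, "input.txt")
        ckpt = os.path.join(tmp, "job.ckpt")
        with open(data, "w") as f:
            f.write("\n".join(str(n) for n in range(1, 11)) + "\n")
        try:
            run_job(data, ckpt, interval=3, fail_after=7)  # fails on the 8th record
        except RuntimeError:
            pass
        return run_job(data, ckpt, interval=3)             # resumes after record 6


print(demo())  # 55
```

On restart only records 7 to 10 are reprocessed; the total and the file position for records 1 to 6 come from the checkpoint, which is why both have to be captured at the same moment.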
Taking too many checkpoints wastes CPU resources, but taking too few lengthens the time required to
recover the failed job. So there is an optimal level to the number of checkpoints that you would want to take,
and indeed this might vary during the course of the day and night.
You also have to ensure that you have checkpoints that cover all the applications and subsystems that might
cause contentions or problems including IMS and DB2.
If you get the balance right and have checkpoints that cover all the different potentially conflicting areas then
you will be able to back out only a minimum of updates or put a hold on a process whilst the conflicts are
removed and then restart, or reattach and restart, without wasting time or effort. Most importantly, as this can
be done automatically, you are not then waiting on manual supervisor intervention.
Figure 2: Automated restart (Source: BMC)
Figure 2 shows the much more comfortable situation where an abend was correctly checkpointed and the
five-hour batch application could be restarted automatically from checkpoint 15. This job, which failed after
four and a half hours, may well have completed in a little over six hours.
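A rough sketch of the Figure 2 arithmetic follows. Only the five-hour job length and the failure after four and a half hours come from the text; the position of checkpoint 15 (4.25 hours into the job) and the one-hour restart delay are assumptions chosen purely for illustration.

```python
def restart_from_checkpoint_elapsed(failed_after_h, last_checkpoint_h,
                                    job_length_h, restart_delay_h):
    """Elapsed hours when a job resumes from its last checkpoint.

    Only the work after the last checkpoint has to be repeated.
    """
    remaining_h = job_length_h - last_checkpoint_h
    return failed_after_h + restart_delay_h + remaining_h


# Assumed values, not taken from the source.
ASSUMED_LAST_CHECKPOINT_H = 4.25
ASSUMED_RESTART_DELAY_H = 1.0

with_checkpoint = restart_from_checkpoint_elapsed(
    failed_after_h=4.5,
    last_checkpoint_h=ASSUMED_LAST_CHECKPOINT_H,
    job_length_h=5.0,
    restart_delay_h=ASSUMED_RESTART_DELAY_H,
)
# Same failure and delay, but rerunning the whole job from the start.
from_start = 4.5 + ASSUMED_RESTART_DELAY_H + 5.0

print(with_checkpoint, from_start)  # 6.25 10.5
```

Under these assumptions the checkpointed restart finishes in a little over six hours, while a rerun from the beginning with the same delay would take more than ten.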
Automatic restart checkpoint selection: ensures integrity and shortens the restart time by determining
which checkpoint to restart from.
Application working storage checkpointing: can capture and restore an application program's working
storage areas in main memory. This allows the program to resume processing at the last checkpoint. It
can also capture and restore saved areas of virtual storage for subprograms executing under the main
program.
Application reattach: improves the operational stability of many application environments by providing
automation to react to certain types of abend conditions. Abends often result from lock contention. Often
the resolution to these conditions is well understood and can be applied automatically. This makes it
possible to schedule update processes to run in parallel rather than serially because when locks occur
they do not cause significant difficulties.
Checkpoint and restart coordination: coordinates DB2, IMS, and CICS/VSAM restarts so that there is a
synchronized restart of batches that touch multiple environments.
Automatic checkpoint: simplifies and speeds the process of implementing checkpoint/restart logic in
application programs.
Program exception handling: automatically redirects bad input data that causes S0C7 abends into a
reject file and lets the application continue. Redirected records can be cleaned up later and resubmitted.
Flat files: automatically manages flat files and ensures that the contents of the files are synchronized
with database activity when checkpoints are issued. During restart processing, the files are automatically
repositioned to their state as of the latest checkpoint.
Suspend and resume processing: obtains a point of consistency required for reorganization or
recovery with the following products by forcing a well-organized abend and then restarting the following
batch processes when contention is no longer a problem:
AR/CTL FOR DB2
SQL return code handling: can intercept a defined SQL return code received during application program
processing and issue a user-defined user abend code and reason code. This can be used to standardize
-911 processing throughout an entire application environment.
Cursor repositioning: most checkpoint/restart solutions can effectively save working storage, but
AR/CTL for DB2 can also return the application to the proper position within the cursor. This removes the need
to add logic to your DB2 applications to track and store the cursor position for use in a checkpoint restart.
Batch attachment facility: performs the attachment to DB2 on behalf of the application and can run in an
attach-only mode to provide the DB2 attach facility for programs not using checkpoint/restart services.
AR/CTL FOR IMS
Restart with no code changes: fully supports and enhances the IMS Extended Restart facility,
requires no application code or JCL changes, and eliminates the need to change application code to call
a third-party restart program.
Flat file management: supports and manages IMS generalized sequential access method (GSAM)
files and native file techniques; there is no need to convert flat files to GSAM.
Checkpoint management: externally filters excessive checkpoint activity to provide significant savings
in elapsed time and CPU consumption. Many legacy applications were developed to run on slower
processors and the checkpoint intervals were never recalibrated for hardware upgrades.
Database recovery control (DBRC) conversion aid: can automatically provide a logging environment
to avoid having to retrofit Data Language/I (DL/I) job control language (JCL) when
converting an application to run under DBRC.
AR/CTL FOR VSAM
Local VSAM access services: for VSAM data sets that are accessed exclusively by a batch VSAM
application program, provides checkpoint support and automatic backout support for VSAM files.
VSAM file sharing: supports remote VSAM file sharing between batch applications and CICS regions
executing on the same or different z/OS images. This allows batch application programs to update
VSAM files while they are online to CICS and in full update mode, and makes it possible to avoid
converting a VSAM file to DB2 or IMS to provide 24x7 access to the file.
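The program exception handling feature described in the list above (redirecting bad input to a reject file so that the application can continue) can be sketched by analogy. This is not AR/CTL code: Python's `ValueError` on non-numeric input stands in for an S0C7 data exception, and the function and variable names are hypothetical.

```python
import io


def process_with_reject(records, reject_file):
    """Sum numeric records; divert records that fail conversion to a reject file.

    A ValueError here plays the role of a data exception on bad numeric input.
    Instead of ending the run, the offending record is written out for later
    clean-up and processing continues with the next record.
    """
    total = 0
    rejected = 0
    for record in records:
        try:
            total += int(record)
        except ValueError:
            reject_file.write(record + "\n")
            rejected += 1
    return total, rejected


reject = io.StringIO()  # stands in for a reject data set
total, rejected = process_with_reject(["100", "2O0", "300"], reject)
print(total, rejected, reject.getvalue().strip())  # 400 1 2O0
```

The second record contains a letter O instead of a zero, so it is diverted rather than stopping the run, and it remains available to be corrected and resubmitted.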
CUSTOMER EXPERIENCES
CUSTOMERS' GENERAL AND SITE-SPECIFIC ASSESSMENT OF BMC'S TOOLS
AND SUPPORT
Ovum has a high regard for BMC's mainframe management solutions, and this is clearly supported by its
client base.
Figure 3
Figure 3 shows the customers' average assessment of the overall quality of the BMC mainframe
management tools and of the quality of support. They were asked to score these on a scale where 1 represents
poor quality or support and 5 represents excellence.
Figure 4
To create Figure 4 we asked the customers to review the specific usage of the features of BMC AR/CTL at
their own sites and to assess the relative value to them based on the following guidelines: if
they didn't use a feature we scored it as 1; if they did use the feature they should score it 2 if it was of
little use, rising to 5 if it was regarded as extremely important and useful to the site. A
maximum of 25 would then be possible for a site that used and found valuable every feature of AR/CTL that
we identified in the list.
You can see from the diagram that only one site made extensive use of the VSAM support features and that
it rated them highly. The main strengths of the solution were what we might have expected, in that it
reduced the elapsed time needed to run batch jobs and increased the availability of the online services.
What also became clear from these scores is that AR/CTL reduced the effort and problems caused by
the sort of issues that it was developed to fix, reducing the burden on staff and making the response to
problems much faster and easier.
Figure 5
Taking these figures and averaging them provides the summarized view of site-specific value given in Figure
5. We stress again that this chart shows the relative value of the features to these sites rather than
an analyst's technical evaluation of their particular capabilities. It is clear from the results that, although the
VSAM support is not particularly useful to most sites (while being invaluable to those that need it because it
removed the need for an expensive migration), the general set of features is highly regarded and seen to
offer significant value. This is something we will explore in the next section of this paper.
Perhaps we should finish this section by quoting the software support manager at the government agency,
who said, "I have loved this product ever since we have had it. It's really easy to maintain and update and
because I was one of the people who brought it in I feel that I know it inside out."
PERFORMANCE SAVINGS
Michael Pope at Safeco has some 6,000 batch jobs run daily that are registered with BMC AR/CTL. For
every thousand checkpoints requested, the pacing mechanism executes perhaps only ten. The average
number of checkpoints requested is roughly 5,000 per batch job. Without AR/CTL they would obviously have
worked hard to reduce the checkpoints some other way, but we think it fair to suggest that they might have got
the number down to 50 per thousand rather than ten, so that AR/CTL is saving them at least 40 checkpoints per
thousand requested per batch run. Michael said that checkpoints were sub-second, and we will take 0.1 of a
second for this estimate. We will also take a conservative estimate of the cost of CPU usage.
Cost of CPU = $0.15 per second
The value of the CPU time saved per day through the use of the pacing function is thus:
Number of jobs run * thousands of checkpoints requested per job (pre-pacing) * checkpoints saved
per thousand * time per checkpoint in seconds * cost per second of CPU
6,000 * 5 * 40 * 0.1 * 0.15 = $18,000 per day or $5,400,000 per annum
The medical laboratory had measured its actual CPU savings and said that the cost savings were:
Number of processes run per day * savings through pacing per process in minutes * 60 * CPU cost
per second
29 * 12 * 60 * 0.15 = $3,132 per day or $939,600 per annum
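The two calculations can be reproduced with a short sketch. The function and constant names are illustrative. The figure of 300 processing days per year is not stated in the paper; it is inferred here because both annual totals equal 300 times the corresponding daily value.

```python
CPU_COST_PER_SECOND = 0.15   # dollars, as quoted in the text
PROCESSING_DAYS_PER_YEAR = 300  # inferred from the annual totals, not stated


def safeco_daily_saving(jobs=6000, thousands_requested_per_job=5,
                        checkpoints_saved_per_thousand=40,
                        seconds_per_checkpoint=0.1):
    """Daily CPU cost avoided by pacing, using the Safeco estimate."""
    checkpoints_saved = jobs * thousands_requested_per_job * checkpoints_saved_per_thousand
    return checkpoints_saved * seconds_per_checkpoint * CPU_COST_PER_SECOND


def lab_daily_saving(processes_per_day=29, minutes_saved_per_process=12):
    """Daily CPU cost avoided by pacing, using the medical laboratory figures."""
    return processes_per_day * minutes_saved_per_process * 60 * CPU_COST_PER_SECOND


for name, daily in (("Safeco", safeco_daily_saving()),
                    ("Medical laboratory", lab_daily_saving())):
    annual = daily * PROCESSING_DAYS_PER_YEAR
    print(f"{name}: ${daily:,.0f} per day, ${annual:,.0f} per annum")
```

Both results scale linearly with the assumed CPU cost per second, so a site with a different cost can substitute its own figure for `CPU_COST_PER_SECOND`.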
At Michael Pope's major site at Safeco, some 200 problems a day are resolved automatically by
reattaches. Michael said, "I can only say that if each of these was an incident, i.e., a job failure with on-call
support and action, then reattach is providing significant savings in time and effort." We suggest that in his
case the saving may be many times the cost experienced at Hermes.
MIGRATION SAVINGS
Only one of the sites we questioned had used the VSAM/DB2 capabilities of BMC AR/CTL to enable them to
run VSAM-based applications against DB2 without the need to migrate those environments and rewrite the
programs.
The medical laboratory in question suggested that "It would be a huge effort to convert to DB2 and would
need outside help. There are 3 million lines of code which would cost well over $1 million and the database
conversion would be additional; certainly the full cost would be over $2 million."
A typical migration saving for a site with 3 million lines of code might be in the order of $2 million. So
although this is a rather specialized issue, it has enormous value to the sites where such savings are applicable.
In all the calculations discussed here it's often difficult to put a value on the opportunity cost, or on the
business value lost through the reduced availability or complete loss of the company's critical applications.
According to several analyst houses, including Ovum Enterprise IT, businesses lose on average between
$84,000 and $108,000 for every hour of system downtime, and according to Dun & Bradstreet, 59% of
Fortune 500 companies experience a minimum of 1.6 hours of downtime per week.
A more detailed recent study gives typical hourly cost of downtime by industry for those areas that suffer the
most critical losses. Clearly these larger figures are for a total collapse in service availability.
Industry             Typical hourly cost of downtime
Brokerage service    $6.48 million
Energy               $2.80 million
Telecom              $2.00 million
Manufacturing        $1.60 million
Retail               $1.10 million
Healthcare           $0.64 million
Media                $0.90 million
Average value        $2.22 million
Sources: Network Computing, the Meta Group, and Contingency Planning Research. All figures in US dollars.
Many of the customers clearly recognized that failure to process the batch work would have
severe consequences. This might involve damage to their capacity to make critical business decisions
through a delay in the production of reports, disruption of business-critical online services, or a delay in the
collection of payments. There was no doubt, among those we questioned, that the introduction of BMC AR/CTL
has reduced that likelihood considerably. If we use the Dun & Bradstreet figure for actual downtime and
suggest that without BMC AR/CTL that downtime might be increased by as little as 1%, then taking the
average for the industries from the table above we get a value for the reduced risk per year (when using BMC
AR/CTL) of:
Average cost of downtime per hour * number of hours reduced per week * number of weeks in a
year
= $2.22 million * (1.6 * 1/100) * 52 = $1.847 million
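The risk figure can be checked with a few lines of Python. The industry values are those in the table above; the variable names are illustrative, and the average is rounded to two decimal places as in the paper before it is used.

```python
# Typical hourly cost of downtime by industry, in millions of US dollars.
HOURLY_DOWNTIME_COST_MUSD = {
    "Brokerage service": 6.48,
    "Energy": 2.80,
    "Telecom": 2.00,
    "Manufacturing": 1.60,
    "Retail": 1.10,
    "Healthcare": 0.64,
    "Media": 0.90,
}

DOWNTIME_HOURS_PER_WEEK = 1.6   # Dun & Bradstreet figure quoted in the text
EXTRA_DOWNTIME_FRACTION = 0.01  # the paper's "as little as 1%" assumption
WEEKS_PER_YEAR = 52

average_cost = round(
    sum(HOURLY_DOWNTIME_COST_MUSD.values()) / len(HOURLY_DOWNTIME_COST_MUSD), 2
)
risk_reduction = (average_cost * DOWNTIME_HOURS_PER_WEEK
                  * EXTRA_DOWNTIME_FRACTION * WEEKS_PER_YEAR)

print(f"Average hourly cost: ${average_cost:.2f} million")
print(f"Reduced risk per year: ${risk_reduction:.3f} million")
```

Because the result is linear in `EXTRA_DOWNTIME_FRACTION`, doubling the assumed increase in downtime to 2% would double the estimated value.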
Area                              Specific example      Potential value/saving per annum
Performance savings               Safeco                $5,400,000
                                  Medical laboratory    $939,600
Problem resolution savings        Government agency     $468,000
                                  Hermes                $31,200
Savings in staff utilization      Medical laboratory    $239,000
                                  Hermes                $150,000
Increased systems availability                          $365,000
                                  Hermes                $14,400,000
Migration savings                 Medical laboratory    $2,000,000
Risk mitigation/business value                          $369,855
                                                        $1,847,000
Source: BMC
The table above summarizes the calculations that we have made based on the anecdotal evidence provided by
the clients and our analysis of the impact that BMC AR/CTL is likely to make on each possible area of savings.
Performance savings are going to vary enormously between sites. But all those we questioned said that they
made significant savings and it would not be unreasonable to expect a saving of $1 million at a major
installation based on the evidence we were given.
We calculated the value of problem resolution in terms of staff effort rather than business impact in order not
to double count BMC AR/CTL's contribution to alleviating downtime or enabling higher systems availability,
but it still showed a value of something like $250,000 on average.
On staff utilization, the sites almost unanimously took the view that they were saving at least
one, if not two, full-time equivalents (FTEs) in headcount, so it is easy to justify a value of $150,000 for
this saving.
Increased systems availability was harder to determine as most systems managers did not have a firm grasp
of the value per hour of their systems to the business, although they all suggested that BMC AR/CTL was a
valuable contributor to improving their availability. However, one customer had experienced measurable
savings that resulted in a figure of $14.4 million per annum because BMC AR/CTL was the major enabler of
their ability to extend their online availability. We think it would be safer to suggest that on average
organizations may gain at least $1 million in additional value from higher online availability.
The avoidance of migration issues was worth over $2 million to one site. We think that this is tremendously
valuable to those for whom it is relevant, but we will not assume that most would gain any benefit from it.
Risk mitigation was, however, clearly a valuable factor in the use of BMC AR/CTL, particularly for sites where
the volume of transactions continued to grow and the pressure and risk of failure had been mounting.
We think that to say that BMC AR/CTL had a value of $1.5 million in reducing this risk would be a fair
assessment.
In conclusion then, a typical major installation might see a return on investment approaching $4 million from
the use of BMC AR/CTL if they had issues similar to those found on the sites of the clients that we
interviewed.
APPENDIX
SUPPORTING CUSTOMER EVIDENCE ON THE VALUE AND CAPABILITIES OF
BMC APPLICATION RESTART CONTROL
BMC's clients proved to be quite voluble on both the general and specific value they saw from the use of
BMC AR/CTL. They also talked about using those tools in conjunction with other BMC products. We will
concentrate here on the main product set under review but will also mention other products where they seem
relevant.
Performance savings
At the Safeco arm of Liberty Mutual there are two mainframes with IMS and DB2 using BMC AR/CTL.
Hundreds of job steps are registered with AR/CTL, whose use is standard with IMS BMPs and DB2
processing.
Performance metrics are not kept, but Michael Pope of the database services team said, "We do run a daily
audit report on pacing statistics and on many days have over 6000 entries in both test and production, with
most showing pacing saves between 90 and 100% of checkpoints requested."
For example, the number on the right is actual completed checkpoints for the applications described.
BMC150165I
BMC150165I
BMC150165I
BMC150165I
188
2
88
Michael said that Safeco didn't use checkpoint processing before AR/CTL, so in some ways they are using
more CPU cycles. Whenever there is a unit of work the programmers are instructed to request a checkpoint.
If there are 900,000 units of work there are likely to be 900,000 checkpoint requests. Multiple pacing
parameters are used to protect the online environments and to ensure that there are not a large number of locks
held at any given time. Pacing is based on CPU time, number of calls, number of updates, etc., and typically
only a small percentage of checkpoints are therefore applied (maybe as little as 1%).
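The pacing behaviour described above can be sketched as a simple filter. This is an illustration of the general technique, not AR/CTL's implementation: the class, its two thresholds (elapsed time and number of updates), and the simulated workload are all assumptions.

```python
class CheckpointPacer:
    """Let a requested checkpoint through only when a threshold is reached.

    A checkpoint is actually taken when either `max_interval_ms` has elapsed
    or `max_updates` updates have accumulated since the last one taken.
    """

    def __init__(self, max_interval_ms, max_updates):
        self.max_interval_ms = max_interval_ms
        self.max_updates = max_updates
        self.last_taken_ms = 0
        self.pending_updates = 0
        self.requested = 0
        self.taken = 0

    def request(self, now_ms, updates=1):
        """Record a checkpoint request; return True if it is actually taken."""
        self.requested += 1
        self.pending_updates += updates
        due = (now_ms - self.last_taken_ms >= self.max_interval_ms
               or self.pending_updates >= self.max_updates)
        if due:
            self.last_taken_ms = now_ms
            self.pending_updates = 0
            self.taken += 1
        return due


# Hypothetical workload: one checkpoint request per unit of work,
# with units of work completing one millisecond apart.
pacer = CheckpointPacer(max_interval_ms=1000, max_updates=100)
for unit in range(1, 10001):
    pacer.request(now_ms=unit)

print(pacer.requested, pacer.taken)  # 10000 100
```

With these assumed thresholds 1% of the requested checkpoints are taken, which is the order of magnitude quoted for Safeco. The application code still requests a checkpoint for every unit of work; the filtering happens outside it.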
Michael said, "We would have difficulty running jobs with all these checkpoints. The slower an application
runs the more checkpoints get requested, extending the runtime, and conversely the faster it runs the more the
overhead is reduced as fewer checkpoints actually get applied. We are saving a dramatic number of possible
checkpoints by using the pacing capabilities available with this product."
Michael estimated that a checkpoint was a sub-second activity, "but we have not seriously measured what the
real CPU savings are. Without pacing, developers would use an alternative method to reduce the number of
checkpoints; the benefits of AR/CTL are simplicity and a common programming interface."
An insurance company running a huge IMS site has recently turned pacing on for a fairly significant
amount of its IMS BMP workload. A BMP (batch message processing program) is a hybrid that can process
files, but the databases, buffering, and log belong to the control region, which enables batch activity to run
concurrently with online systems. They measured CPU usage before and after they turned on pacing and have
been able to identify a 25-30% CPU reduction over all the jobs that are running within that workload. Their
programs were issuing an average of four to five checkpoints each second. By moving that interval out to one
checkpoint per second they were able to achieve this significant saving. BMC projects that they will get some
more incremental benefit by eventually moving the interval out to three seconds, but they are already in the
area of diminishing returns just by going to a one-second interval because that removed 75% of the checkpoints
and the associated overhead.
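The diminishing returns mentioned here follow directly from the arithmetic, as this small sketch shows. It takes the lower quoted rate of four checkpoints per second as the starting point; the function name is illustrative.

```python
def fraction_removed(requests_per_second, pacing_interval_seconds):
    """Fraction of checkpoints removed when at most one is taken per interval."""
    taken_per_second = 1 / pacing_interval_seconds
    return 1 - taken_per_second / requests_per_second


one_second = fraction_removed(requests_per_second=4, pacing_interval_seconds=1)
three_seconds = fraction_removed(requests_per_second=4, pacing_interval_seconds=3)

print(f"1-second interval removes {one_second:.1%} of checkpoints")
print(f"3-second interval removes {three_seconds:.1%} of checkpoints")
print(f"Additional gain from 1 s to 3 s: {three_seconds - one_second:.1%}")
```

Going from no pacing to a one-second interval removes three quarters of the checkpoints, while tripling the interval again adds only about 17 percentage points, which is why most of the benefit is already captured at one second.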
The systems manager at the medical laboratory said that checkpoint pacing eliminates 94.35% of the
issued checkpoints on his site. "We don't have to hit the catalog all those times; the checkpoints are
automated, not explicit. Pacing is set to a ten-second interval (for example, in one run we issued 194 actual
checkpoints through AR/CTL).
"Although it would commit over 1,000 times, total time would have been 19 minutes and with AR/CTL it took
seven minutes. We have 29 similar processes on a daily basis (each saving 12 minutes) and a total of 43
registered under AR/CTL that also run at weekends. Daily saving is then 348 minutes (5.8 hours!)."
He summarized by saying, "It's a wonderful tool for the optimization of our batch system."
Problem resolution savings
Michael Pope of Safeco knows that the automated restart facilities are saving him considerable effort after
abends. There are as many as 200 reattaches in a given day. Typically a developer uses reattach to reset
conditions where a delay of a few seconds can clear the issue. Without reattach, developers would design
differently or else they would be called out in the middle of the night. It clearly adds value to detect and
delay and, in almost all cases, resolve the issue with this capability.
In the IMS world an abend without checkpoints requires a failed job step to back out all the way to the
beginning of the job step, rather than reprocessing from the last completed checkpoint. There are
fundamental time savings in preserving the work already processed. And if each of these became an
incident (job failure with on-call support and action) then reattach is providing significant savings in time and
effort.
The software support manager at a government agency uses BMC AR/CTL with manual checkpointing for
their COBOL and Natural applications against ADABAS databases. 300 of their critical batch programs out of
a total of 1,500 are registered with BMC AR/CTL.
The software support manager said, "We often update two different databases in one program so restarts need to
realign the databases from the last commit, otherwise we'd have to back out everything to the beginning. We
get about 15 abends a week due to different things such as messing with JCL, or programs not being
registered, or even if there is an empty data set where they might want an orderly abend because that tells
them what to run next. Without BMC AR/CTL the critical abends would require a lot of human intervention and
analysis to resolve. Typically, we'd need a lot of analysis to see where we were in the processing because it's
very complicated to work out where we are. We have to go back to where we were and look at locks from that
point. It would take a good six hours to recover from one critical program that abends."
She said, "Tons of senior and expensive people are involved and it wastes a lot of CPU time and also ruins
the batch window because we then have to work out what can't run if we are to run our online systems."
Published 09/2010
Petra Kopp at Euler Hermes Kreditversicherungs-AG said, "AR/CTL can suspend a batch program if it is
doing something crazy like looping. We are able to suspend it in an elegant way through an interrupt facility,
back it out, and then resolve and restart the program. Mostly we do this when a batch job has been running
for hours. This often happens due to DB2 database growth because the indexes weren't set up for that
size. We may need to rebind them, or program changes make the access paths different. This happens
mostly with the Data Warehouse applications that are growing. It happens quite a lot there as they often need
to pull in other databases; this implies changes to access paths. It's happening about five times a week and
the time saved by being able to make the change and restart from an elegant checkpointed restart is an
average of one hour each time."
Petra's final comment here was, "AR/CTL has definitely had a visible impact on our performance and the
degree of satisfaction with our department in the eyes of the business."
The central systems database leader of the global courier said, "Automated restart for 777 abends is
used extensively; the operators would have had to deal with restarts manually before we put this in. These
are either cross-system between DB2 and IMS or within the IMS system. Errors like this used to delay
execution of several of our most critical jobs, as they would require manual intervention. This is not so much
about staff productivity, but the fact is that once a job has failed it does nothing until someone has the time
to go in and deal with it. If it restarts automatically we don't have to worry about it unless it fails repeatedly
(we have the limit set to three, which is rarely reached)."
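The behavior described here, restarting a failed job automatically and escalating only after repeated failures, can be sketched as a bounded retry loop. This is an illustrative sketch, not AR/CTL's actual mechanism: `run_with_auto_restart`, `max_restarts` and `delay_seconds` are hypothetical names, and treating "three" as the number of automatic restarts (rather than total attempts) is an assumption.

```python
import time


def run_with_auto_restart(job, max_restarts=3, delay_seconds=0.0):
    """Run job(); on failure, wait briefly and restart, up to max_restarts times.

    Returns (result, attempts). Re-raises the last error once the restart
    limit is exhausted, which is the point where a person has to step in.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return job(), attempts
        except RuntimeError:
            if attempts > max_restarts:
                raise  # failed repeatedly: hand over to manual intervention
            time.sleep(delay_seconds)  # a short delay can let contention clear


def make_flaky_job(failures_before_success):
    """Build a stand-in job that fails a fixed number of times, then succeeds."""
    state = {"calls": 0}

    def job():
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise RuntimeError("simulated deadlock abend")
        return "ok"

    return job


# A job that hits contention twice succeeds on its third attempt, unattended.
result, attempts = run_with_auto_restart(make_flaky_job(2))
```

A job that keeps failing would exhaust the three restarts and raise the error on its fourth attempt, which corresponds to the case the operators still have to look at.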
Staff utilization savings
Michael Pope of Safeco said that "The other main benefit of BMC AR/CTL is file repositioning, which is not a
characteristic of basic checkpoint processing. But AR/CTL does this for you and that's a significant value."
Managing these definitions in the Safeco environment does take some resource, but is centralized so any
developer can take advantage of the product's capabilities.
The systems manager at the medical laboratory says that "AR/CTL has automated the elimination of
contention problems that give 911 abends. Our batch runs very much in serial and our schedule was reduced
without doing anything. It would take 20 minutes to resolve a negative 911 manually and on any given day we
might get 12 or 13 of these. We saw a red flag as DB2 tables were clobbering each other and it went on for a
year. The tables were growing and the batch window was processing more data. Other jobs scheduled to run
at certain times were beginning to overlap and thus there was even more contention.
"Often a number of people were involved in resolution. Although technically one could handle it, we often had
four or five looking at it. So we often let other jobs finish and then we'd restart them as we didn't have AR/CTL
to automate restarts from checkpoints. Now life is so good! No one even sees any problems as the reattaches
work and we get return codes of zero. This saves CPU time as there are no hiccups, and lots of people time,
so they can work on other things."
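The figures quoted above imply a rough daily total for manual resolution. The following is a back-of-the-envelope sketch only: it assumes each abend was resolved separately and sequentially at the quoted 20 minutes, and it counts elapsed time rather than the combined time of the four or five people who were often involved.

```python
MINUTES_PER_ABEND = 20      # quoted manual resolution time
ABENDS_PER_DAY = (12, 13)   # quoted daily range

# Elapsed minutes per day spent resolving contention abends manually.
low_minutes, high_minutes = (n * MINUTES_PER_ABEND for n in ABENDS_PER_DAY)

print(f"{low_minutes}-{high_minutes} minutes per day")
print(f"about {low_minutes / 60:.1f}-{high_minutes / 60:.1f} hours per day")
```

On these assumptions the manual effort comes to roughly four hours of elapsed time a day.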
The systems manager also said that "Maintenance is very manageable and easy to use. We only need one or
two people and the danger is that you really ought to have more than one who knows how to use it, as if there
is an AR/CTL problem you need to know what to do. We've written down procedures that helped with this. On
balance I'd say it has reduced staff (probably by one unit)."
Petra Kopp at Euler Hermes said, "We have about five restarts in a day. Every program change can cause a
mistake to be made. Most of the time we can use the restart facility and not often would we need more
specialist knowledge. The automated restart facility is simple to use. If done manually, it's not the time but
the complexity of the task that is the problem, particularly if the person is not familiar with IMS. If done by a
scheduler with limited knowledge, they would need to know which checkpoint to use and whether it was
possible and necessary to make backups, etc., in which case it would be much more difficult.
"We also use the Batch Backout Facility implemented in AR/CTL and that makes it much safer because the
backup of the IMS database is very critical. If batch jobs abend during the night, the database might not be
available for other jobs and for the online systems, so it is vital that any problems concerning backout are
solved as soon as possible. Again, if staff who only schedule jobs had to solve this problem manually,
there might well be mistakes. Now it's automatic and we don't have to worry.
"We'd probably need to employ at least one additional person, particularly to cover jobs in the night that
would otherwise impact the start of the online system in the morning. There is a lot of pressure to make sure
the online system is available."
The central systems database leader of the global courier saw programmer productivity savings. He said,
"There are programmer savings on doing the testing: i.e., all that's required is a JCL parameter change to
force an abend to happen, rather than either altering the data to force a known error or validation
condition that could trigger an event, or changing the code to test it and then changing it back again, which
wouldn't really be a valid test! That added a couple of hours' work each time they tested and, with version
control, etc., it can all be quite a lot of work."
Increased systems availability
Safeco uses BMPs with AR/CTL to run transactions and batch processing simultaneously. "This provides a
higher level of availability, which is another true advantage. We knew that we had to add checkpoint logic if
we were to run BMPs to ensure that there is no runaway batch activity and to protect the integrity of the online
environment. AR/CTL was a good match because whether using IMS or DB2 the calls are very similar, better
and more consistent for developers, particularly as a lot of our programs access IMS and DB2
simultaneously."
The government agency currently has a daily batch window, although they will soon use BMC AR/CTL to
enable online updates to run in parallel to the batch. The software support manager said, "We still get
contention between batch programs about three times a week when they change the schedule and don't
realize that they have a utility running that needs exclusive access to data." We estimate that using BMC
AR/CTL to enable parallel processing is going to give them two hours' extra availability per day for their
online systems as well as resolving these contentions.
Petra Kopp at Euler Hermes said, "AR/CTL is very important for our main daily business because we aim to
have a 7-day, 24-hour online system and we currently provide a 23-hour online service both in our offices and
globally. The online customers work during the night to enter their contracts. Overnight batch runs must not
lock databases so that they can do this.
"We started with AR/CTL so that we could have online transactions in parallel with batch during the night.
Without AR/CTL we couldn't have offered the online opportunity. Without checkpoint restart there was no
possibility of parallel working as databases could be locked for too long. There are 10,000 online users
(registered) but maybe only 2,000 using it in any one day. There are around 1,200 in-house users with maybe
500 using it on a daily basis.
"Online customers are very important customers as they do a lot of the work for themselves and we must take
great care of them. In-house users work online until 19:00, remote users till 06:30: i.e., we enabled a further
11 hours' access for them. The remaining one hour is used for image copies for backup, etc. We have 400,000
transactions in a day."
I asked her what she thought the cost of delaying the start of the online system in the morning might be per
hour and she estimated roughly 12,000.
Migration avoidance
The systems manager at the medical laboratory said that his site had used AR/CTL to avoid migration
from VSAM to DB2 and had undertaken no conversions for four years.
He said, "BMC regards us as an exceptional site because we have 30 million records per VSAM file, some with
nine indices, and we had to get BMC to raise the 4MB limit to 8MB for AR/CTL.
"It would be a huge effort to convert to DB2 and would need outside help. There are 3 million lines of code,
which would cost well over $1 million, and the database conversion would be additional: certainly the full cost
would be over $2 million."
Risk mitigation and increased business value
The systems manager at the medical laboratory said, "We have a tight and restricted batch window. If we
fail to complete then the problems piggyback. Before we had AR/CTL we were close to missing our SLA as
we were within minutes of missing our batch window. After an abend, even with all hands on deck, we
wouldn't guarantee to complete in time." We asked the systems manager what the effect of missing that
window was. He said, "If we miss the batch SLA we go into the next day. We may miss weekend processing
or month end, as our invoices are outstanding for 54 days on average (24 days to process and 30-day payment
terms), totaling $3-4 million a day. If we miss it there is downtime for the users doing billing. If we miss the
month end it could lengthen the time to receipt of payment to 84 days. It's happened and it hurts, but now it's
most unlikely to ever happen again."
Petra Kopp at Euler Hermes has a site that runs 1,000 batch jobs overnight to aggregate data, mainly in
support of data warehouse application tools. At the month end they run 2,000 batch jobs. Euler Hermes
needs BMC AR/CTL because they have a three-day window at the end of the month and that window is short.
They also have to launch some batch jobs during the day. If they get clashes or problems they have to restart
these batches, and without checkpointing it would be impossible, as starting from the beginning would cost too
much time.
Petra said, "Some batch jobs are short, some are long, running from one to many hours; it depends. If a
three-hour job had to be restarted from the beginning, we wouldn't have the time required to complete all our
work in the three-day window. This is very stressful. If we missed the batch window people would have to wait,
especially the DW application, until the batches had finished before they could run their reports. At the
beginning of the month they need a lot of reports. If they have to wait, their managers cannot make decisions
based on the processed information. There might be as much as one, two or even three days' delay whilst we
catch up. These reports are the basis of decisions and without them we will lose thousands of euros."
The central systems database leader at the global courier, where they use BMC AR/CTL to resolve
problems that could affect their tracking systems, said, "We get dozens and dozens of deadlocks in a day,
although many of these are CICS transactions in CICS DBCTL (we don't use data sharing). They are mostly
background CICS tasks, although some involve foreground CICS tasks, but not very many.
"Of the batch jobs that are affected, some are scheduled automatically as messages come in from
outside users, so they are not under operator control, and we just have to react to the failures they may
cause. These external messages are run ad hoc at random times during the day. If these updates fail, then
the stream of messages creates a backlog. In business terms some of these jobs are critical and a backlog of
15 minutes is bad news. Users get frustrated as they expect a response to these messages in a timely
fashion (within an hour) and these users could be anywhere in the world.
"We track 1.8 million consignments per day with a 20% increase at Christmas, but as rates are generally
increasing we expect that by March next year that may be the norm. If we send data through as a plane takes
off, it all needs to be there by the time the plane lands. If it isn't, then there's difficulty. They can carry on
operating because the most fundamental info is on the labels of the goods to enable delivery, but it wouldn't
feed into the tracking system and so there would be uncertainty with regard to each package's location.
They would be scanning packages locally without the supporting info and would have to match these up when
the info eventually came through. In the meantime the database wouldn't show a proper status for that
consignment, which would make it difficult to interpret online. The mainframe isn't the only link in the chain,
and other problems may compound and extend the delay, so it's important that the mainframe part is fast. It
would be damaging to our reputation. We could lose customers."
Table 2: Contact Details
Corporate Headquarters
BMC Software
Houston
Texas 77042
USA
www.bmc.com
London Office
Assurance House
Vicarage Road
Egham, Surrey
TW20 9UY
UK
www.bmc.com/uk
Email: databaseadministrationsolution@bmc.com
Source: BMC
OVUM
Ovum's Knowledge Centers are new premium services offering the entire suite of Ovum information in fully interactive formats.
To find out more about Knowledge Centers and our research, contact us:
Ovum Europe
119 Farringdon Road
London, EC1R 3DA
United Kingdom
t: +44 (0)20 7551 9000
f: +44 (0)20 7551 9090/1
e: info@ovum.com
Ovum Australia
Level 5, 459 Little Collins Street
Melbourne 3000
Australia
t: +61 (0)3 9601 6700
f: +61 (0)3 9670 8300
e: info@ovum.com