Considerations for Load Stress Tests

v1.05

The most common problems with load tests are caused by unrealistic expectations, poorly
structured tests, or incorrectly configured load generators. This document is intended as a
reference for recommendations and considerations that improve the effectiveness of a load
test. Note: it is not intended to be an implementation guide for testing a PeopleSoft system.

While Oracle PeopleSoft GCS provides support for PeopleSoft product-related issues,
Oracle GCS does not provide support for test-specific or implementation-related issues.
Unfortunately, in the context of a test it is often difficult to distinguish between the two.
In many cases, GCS may adopt a "best effort" approach to help identify whether the
problem is truly a product-related issue. If necessary, Oracle GCS may refer customers
to Oracle Consulting Services or the load generator vendor for additional support.

NOTE: The material contained in this document has not been subjected to rigorous
internal review. Oracle assumes no responsibility for its accuracy or completeness. The
use of this information or the implementation of any of the techniques described herein
is the responsibility of those implementing them and often will depend upon their
ability to evaluate and integrate them into their test and operational environments.

THE PLANNING STAGE

Ensure that your team knows your load generator tool intimately and how to properly
script for PeopleSoft applications.

Know how to quickly get help from the load generator vendor's support team.
Understand that the vendor provides support for the tool and for the scripts it generates.

Clearly define the purpose of the test. Do you want to:

- Identify the system's "weakest link"?
- Determine how long the system can run without restarting certain components?
- Build confidence that the hardware is sized adequately?
- Confirm that the batch can be processed within a certain window?
- Confirm that a certain critical batch process will not extend the batch cycle unnecessarily?
- Determine the impact of changing database optimizer parameters?

Define specific metrics that support the purpose of the test. When you reach those goals,
stop. Scope creep is a constant problem in load testing.

Plan enough time to respond to "lessons learned." Indeed, a valuable lesson may be that
the system is not ready to be deployed in its present configuration. You need to be able to
react to the significant implications of tests.

Consider how you can prepare to run the test early in the implementation process,
perhaps even running repeated, incremental tests. When left as one of the last steps in the
implementation schedule, tests often become "hurry up drills" and the results are
badly undermined.

Have your best experts involved at every level during the test, even if that means using
short-term consulting talent. Do not use a test as a time for novices to "learn PeopleSoft."
Leverage the team that will be supporting your production system.

Make note of skill gaps that appear during a load test. You may become aware that your
team doesn't understand one of the components as well as it should, or that only one
resource truly does. Be proactive about training your internal support teams.

Allocate adequate funding to do a test correctly. Licensing issues (e.g. not procuring
enough virtual users) and hardware limitations (e.g. not having all of the web
infrastructure used in production) often put pressure on the test team to cut corners and
skew results.

Be committed to changing your architecture or application as needed. Have a clearly
defined mechanism to capture, evaluate, and adopt recommendations that come from the test.

Consider the impact on batch when designing a test. The batch load can significantly
affect the system's throughput and is often ignored during a user load test. This oversight
can be disastrous when you move into production.

Keep the load generator team closely involved and accessible during a test. Several times
we have observed the load generating team perform a test in a "fire and forget" fashion.
In the early stages of testing, the hardware is often overwhelmed, resulting in invalid
test results. Test parameters often need to be changed, and if these changes can be made
in real time, the effectiveness of a test improves dramatically.

Ensure that you understand how to translate results obtained in a test environment into
predictions about the eventual behavior expected in production. Ideally you would use an
identical environment, but that may not be possible.

As you plan, don't forget to understand the impact of a test on other systems you use
(potentially including external systems that you depend on).

Plan for your tests to have iterations in which a small number of factors change from
test to test, so that the impact of each change can be recorded. It is unreasonable to define
only one test unless every day in your production environment is exactly like every
other.

SCRIPTING CONSIDERATIONS
Build a transaction mix representing realistic usage. Unless you are scripting for a single
transaction, produce a blend of transactions that represents what the system will process
during a given period. Ensure that the mix is highly "visible" and be ready to share it
with support analysts.
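
The mix can be captured directly as data in the driver script, which also makes it easy to publish and verify. A minimal sketch in Python; the transaction names and weights are illustrative assumptions, not a recommended blend:

    import random

    # Illustrative weights: the fraction of iterations each transaction should
    # receive. Names and percentages are assumptions for this sketch only.
    TRANSACTION_MIX = {
        "enter_voucher":   0.50,
        "inquire_voucher": 0.30,
        "run_report":      0.15,
        "update_vendor":   0.05,
    }

    def pick_transaction():
        """Sample one transaction name according to the weighted mix."""
        r = random.random()
        cumulative = 0.0
        for name, weight in TRANSACTION_MIX.items():
            cumulative += weight
            if r < cumulative:
                return name
        return name  # guard against floating-point rounding at the boundary

    # Each virtual-user iteration asks the mix which transaction to run next.
    counts = {}
    for _ in range(10_000):
        t = pick_transaction()
        counts[t] = counts.get(t, 0) + 1
    print(counts)  # the tallies should approximate the declared mix

Declaring the mix as one visible data structure means the same artifact can be handed to support analysts and later checked against post-test database counts.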

Consider the impact of different "real" users doing similar transactions differently. In the
case of self-service users, these differences may be dramatic and even include "bad"
activities, such as saving erratically, backing up unpredictably, or abandoning the session
entirely.

Understand the user authentication transaction intimately. The user authentication
process can be very complicated and often uses several external components. The
process of identifying roles and permissions within PeopleSoft is (relatively speaking)
expensive, and a well-executed test will compensate for this load when ramping up virtual
users. Interestingly, it is very common for the login transaction to derail an entire test for
weeks at a time as the team works through the impact of "bad" login scripting while
trying to get focus back on the other, much more problematic, areas in the application
and test scripts.

Consider your virtual users' security profiles before you start scripting. Don't record all
scenarios with a super user, e.g. VP1 or PS. As a rule, the more permission lists and roles
you add to a user, the more effort will be required to log in and process transactions.

Ensure that the iterations-to-login ratio is reasonable. For example, a call center user may
log in, enter 100 cases over four hours, and then log out. If your script were instead to
log in, enter one case, and log out, the load presented to the system would include an
abnormal number of logins and logouts.
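
A minimal sketch of the correct shape, one login amortized over many iterations; the login, enter_case, and logout helpers are hypothetical stand-ins for your load generator's scripting API:

    # Hypothetical helpers; real implementations come from your tool's API.
    def login(user): ...
    def enter_case(user): ...
    def logout(user): ...

    CASES_PER_SESSION = 100  # mirrors the real user's iterations-to-login ratio

    def call_center_session(user):
        """One session = one login, many cases, one logout."""
        login(user)
        for _ in range(CASES_PER_SESSION):
            enter_case(user)
        logout(user)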

Ensure that the scenario is properly "parameterized." Correctly parameterizing the script is
no longer a trivial exercise, since many of the values exchanged between the browser and
the web server are computed values and hence not obvious when doing a simple "search
and replace" in the script. Ensure that your load generator vendor understands this
behavior and provides tools that allow you to handle it properly.
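
As an illustration of the idea rather than a vendor-specific recipe, the sketch below captures a server-computed hidden field from one response and replays it on the next request. ICSID is the PeopleSoft page-state token, but the URLs, component name, and regular expression here are assumptions that would need adjusting for your environment and tools release:

    import re
    import requests  # assumption: the third-party requests package

    session = requests.Session()

    # Placeholder PIA URL for the page being scripted.
    page = session.get("https://test.example.com/psp/ps/EMPLOYEE/HRMS/h/?tab=DEFAULT")

    # Capture the computed hidden field instead of hard-coding the recorded value.
    match = re.search(r"name=['\"]ICSID['\"][^>]*value=['\"]([^'\"]+)['\"]", page.text)
    if match is None:
        raise RuntimeError("ICSID not found - the correlation rule needs updating")
    icsid = match.group(1)

    # Replay the captured value on the follow-up POST; SOME_COMPONENT.GBL and
    # the ICAction value are placeholders.
    session.post(
        "https://test.example.com/psc/ps/EMPLOYEE/HRMS/c/SOME_COMPONENT.GBL",
        data={"ICSID": icsid, "ICAction": "#ICSave"},
    )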

Avoid reducing "think time" in order to scale load. By reducing "think time," load
generators produce "hyperactive" virtual users, which represent real user load very poorly.
For example, consider a transaction that a real user takes 5 minutes to complete because
of "think time." If all "think time" is removed after recording, the system might process
the same actions in 6 seconds, about 2% of the real user's elapsed time. It is unreasonable
to think that you can simulate 100 real users with one virtual user simply by removing
the "think time."
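
Little's Law makes the arithmetic explicit: concurrent users = throughput x (response time + think time). The sketch below shows why stripping think time from the 5-minute transaction above does not turn a handful of virtual users into 100 real ones:

    # Little's Law: N = X * (R + Z), where N = concurrent users,
    # X = throughput (tx/sec), R = response time (s), Z = think time (s).
    def users_needed(throughput_tps, response_s, think_s):
        return throughput_tps * (response_s + think_s)

    target_tps = 100 / 300.0  # 100 real users, one 300-second transaction each

    # With realistic think time, sustaining that throughput takes ~100 users:
    print(users_needed(target_tps, response_s=6, think_s=294))  # -> 100.0

    # With think time removed, ~2 hyperactive users deliver the same tx/sec,
    # but with none of the sessions, connections, or arrival pattern of 100:
    print(users_needed(target_tps, response_s=6, think_s=0))    # -> 2.0

The throughput matches, but almost everything else (login volume, state held per session, cache behavior) does not.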

Consider testing the browser/OS combinations that your real users will actually use. Along
with IE, the Firefox and Safari browsers are also supported. When the browser interacts
with the web server, the browser/OS platform combination is reported with each request,
and in many cases the PeopleSoft system will produce different HTML depending on the
browser being used.
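
Because the browser/OS combination travels in the User-Agent request header, a driver script can present a realistic spread of browser identities. A minimal sketch; the header strings are abbreviated examples and should be replaced with what your users really run:

    import random
    import requests  # assumption: the third-party requests package

    # Example User-Agent strings (abbreviated); substitute your real population.
    USER_AGENTS = [
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64; Trident/7.0; rv:11.0) like Gecko",
        "Mozilla/5.0 (Windows NT 10.0; rv:102.0) Gecko/20100101 Firefox/102.0",
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 Safari/605.1.15",
    ]

    def new_virtual_user_session():
        """Each virtual user keeps one consistent browser identity."""
        s = requests.Session()
        s.headers["User-Agent"] = random.choice(USER_AGENTS)
        return s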

INFRASTRUCTURE SETUP

Ensure the test is run on supported platforms and versions of each component. Refer to
the "Supported Platforms" pages on Customer Connection for additional details.

Ensure that the complexity of your test environment mimics the hardware you intend to
deploy. If "multiples" are present in production, echo that complexity in your test
equipment, e.g. if production will have a cluster of web servers, ensure that a cluster of at
least two is configured during the test.

Ensure that all the components of the web infrastructure are present and properly
configured as well: load balancers, SSL accelerators, VIPs, reverse proxy servers, etc.

Use the same technique for managing load balancer session affinity in the test as you
will in production.

Ensure that SSL configurations are correct for the test environment. Since certificates
must match the actual hardware they run on, it is critical to ensure that SSL is
configured correctly for the test.

Mimic the complexities of the production network. Some things to consider:

- Do external users route through a DMZ?
- Are all the servers in the same subnet?
- Are all the servers in the same subdomain?
- Is SSL used in conversations between servers?

Ensure that your script represents the various ways that users will enter the application.
Some things to consider:

- Will all the users enter at a single URL?
- Externally via the web? Via VPN?
- From multiple offices through different network access points?

Consider the impact of guest logins, which can be significant. Switching to authenticated
users can be a very intensive process.

Ensure that all external interfaces are set up and capable of supporting the test loads.

Ensure that your LDAP servers can handle a login spike that exceeds production levels.

Ensure that your Integration Broker is configured properly and that all of the nodes and
messages are generated properly for the test environment.

Ensure that a web profile configured similarly to the production one is used during the test.
Ensure that the State Discard Interval, the web cache intervals, and the Cache Purge All
Hit Count have been set appropriately.

Enable "Log Error Report=Y" in the psappsrv.cfg. This simple change will log out errors
that are typically application related to the APPSRV_xxyy.log file.

Consider the impact of the database's transaction logging mode. To save space, cost, and
effort, most tests are run with no archive logging. Your production system will certainly
have this mode enabled, so it is reasonable to test with it enabled. This configuration has
far-reaching implications.

Consider configuring a separate web server and app server specifically for capturing
performance information during the test. The trace levels on these domains can be set up
to capture SQL and PeopleCode traces. One very interesting approach involves sending a
very small portion of the virtual user load (1%) through this domain under load. Another
approach is to use that domain only as part of the "+1" (explained elsewhere) test phase.

Generate a set of SQL scripts that captures the number of user-generated "events" in the
database. For example, if you are entering vouchers, count the number of voucher
headers, lines, and details generated during the test. Measure performance at the database
level, not simply at the load generator level. Include this information in each
test's archive files.
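
A sketch of such a counting script, assuming an Oracle database, the python-oracledb driver, and the usual PeopleSoft AP voucher records (verify the table names against your application release). Run it before and after the test and put the delta in the archive:

    import oracledb  # assumption: the python-oracledb driver is installed

    # Assumed voucher record names; confirm them for your release.
    EVENT_COUNTS = {
        "voucher_headers": "SELECT COUNT(*) FROM PS_VOUCHER",
        "voucher_lines":   "SELECT COUNT(*) FROM PS_VOUCHER_LINE",
        "distrib_lines":   "SELECT COUNT(*) FROM PS_DISTRIB_LINE",
    }

    def snapshot(conn):
        """Return a dict of event counts; call before and after the test."""
        results = {}
        with conn.cursor() as cur:
            for name, sql in EVENT_COUNTS.items():
                cur.execute(sql)
                results[name] = cur.fetchone()[0]
        return results

    conn = oracledb.connect(user="sysadm", password="...", dsn="testdb")
    before = snapshot(conn)
    # ... run the load test ...
    after = snapshot(conn)
    print({k: after[k] - before[k] for k in EVENT_COUNTS})  # events generated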

Ensure you know the time difference between all your machines. Ideally, each machine
should be synchronized with a common time server regularly to minimize the error. This
will be important should any problem require investigation on more than one machine.
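
A quick way to quantify the skew on each machine, assuming the third-party ntplib package and reachable NTP infrastructure:

    import ntplib  # assumption: pip install ntplib

    def clock_offset(server="pool.ntp.org"):
        """Return this machine's clock offset in seconds versus the NTP server."""
        response = ntplib.NTPClient().request(server, version=3)
        return response.offset

    # Run on every machine in the test bed and record the offsets in the test
    # README so multi-machine log timelines can be reconciled later.
    print(f"offset vs NTP: {clock_offset():+.3f} s")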

BEFORE YOU RUN A TEST

Ensure that you can easily rerun a test and get the same results. For a test to be
meaningful, its results must be reproducible. At some point, early in the test cycle,
typically after "smoke tests" are complete, ensure that you can run back-to-back tests and
produce similar results. If not, understand why not and correct any problems.

Capture single-virtual-user traces of each transaction for diagnostics. At a minimum, you
should capture a PeopleTools trace with SQL (the first five options) and a PeopleCode
trace (PeopleCode Program Starts and External Calls). Additionally, when you are under
load, the "+1" user should generate the same traces for comparison purposes.

Instrumenting all of the components and monitoring their behavior during a test is
critical. At a minimum, capture these metrics on each server as appropriate (an OS-level
sampling sketch follows the list):

- OS - CPU, memory usage, I/O rates, IPC queue usage and throughput, network rates and volumes, socket usage, disk usage, etc.
- JVM - active threads, detailed GC, access logs, etc.
- App servers - active processes, queuing, connections, errors, dumps, etc.
- Database - AWR, blocking, contention, transaction rates, archive performance, etc.
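
At the OS level, even a small homegrown sampler is better than nothing. A sketch using the third-party psutil package, appending one CSV row per interval:

    import csv
    import time
    import psutil  # assumption: pip install psutil

    def sample_os_metrics(path="os_metrics.csv", interval_s=10, samples=360):
        """Record CPU, memory, disk, and network counters every interval."""
        with open(path, "a", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["epoch", "cpu_pct", "mem_pct",
                             "disk_read_bytes", "disk_write_bytes",
                             "net_sent_bytes", "net_recv_bytes"])
            for _ in range(samples):
                cpu = psutil.cpu_percent(interval=interval_s)  # blocks one interval
                mem = psutil.virtual_memory().percent
                disk = psutil.disk_io_counters()
                net = psutil.net_io_counters()
                writer.writerow([int(time.time()), cpu, mem,
                                 disk.read_bytes, disk.write_bytes,
                                 net.bytes_sent, net.bytes_recv])
                f.flush()  # keep the file usable even if the test is cut short

    sample_os_metrics()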

Understand that monitoring comes at a cost. When you measure something, you alter it,
typically not for the better. Recognize which monitoring techniques will heavily skew
test results and apply these sparingly. For example, tracing at the domain level will
dramatically affect the performance of all users on that domain.

Ensure that all of the caches (app server, web server) are in a consistent state when
you start each test. Depending on the test, this may mean pre-populating these files.

EXECUTING A TEST

Plan for a graduated approach to load. Initial tests should apply a very small fraction of
the load to ensure that the scripts are valid and that the system is behaving as expected,
a.k.a. "smoke testing." Then produce a very modest load to look for unexpected contention
and to identify the "weak links" in the infrastructure - they certainly will exist. Then,
only after correcting any problems found, increase the load incrementally.
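
One way to make the graduation explicit is to drive the test from a declared step schedule instead of ad-hoc load changes. A sketch; the step sizes and hold times are illustrative only:

    # Illustrative step schedule: (virtual users, hold time in minutes).
    # Climb to the next step only after the current one runs clean.
    RAMP_SCHEDULE = [
        (5,   15),   # smoke test: validate scripts and basic behavior
        (50,  30),   # modest load: look for contention and weak links
        (150, 30),
        (300, 60),
        (500, 60),   # target load
    ]

    for users, minutes in RAMP_SCHEDULE:
        print(f"hold {users:>4} virtual users for {minutes} min")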

When any one component saturates during a test, do not continue to increase load. Stop
the test and resolve the contention before rerunning the same test. As trivial as it may
sound, if a domain begins to queue at 100 users, then increasing the load to 500 users will
provide no meaningful information.

Always measure a human user's experience while the system is under load. Do not rely
merely on the load generator metrics. We refer to this as "n+1" testing, where "n" is the
virtual user load provided by the load generator and the "+1" is the human experience
measured alongside it. This user should closely follow a carefully monitored script, and
those metrics should be collected with each test as appropriate. If possible, use a distinct
username for this session so it can easily be tracked in all the logs.

ANALYZING THE RESULTS OF A TEST

Give each test run a discrete iteration number, e.g. Test #1, Test #2, etc. Be consistent
and never reuse a test name.

At the end of the test, before you begin the analysis of the data, archive all of the log files
and relevant information for that test. Clearly label the archive with the test number. We
highly encourage scripting this process for speed and consistency.
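
A sketch of such an archive script; the log paths are placeholders for your environment:

    import tarfile
    import time
    from pathlib import Path

    # Placeholder paths; point these at your real web/app/load-generator logs.
    LOG_SOURCES = [
        Path("/opt/psoft/appserv/LOGS"),
        Path("/opt/psoft/webserv/logs"),
        Path("/var/log/loadgen"),
    ]

    def archive_test(test_number, dest_dir=Path("/archive/loadtests")):
        """Bundle all logs for one run under a never-reused test number."""
        dest_dir.mkdir(parents=True, exist_ok=True)
        stamp = time.strftime("%Y%m%d_%H%M%S")
        archive = dest_dir / f"test_{test_number:03d}_{stamp}.tar.gz"
        with tarfile.open(archive, "w:gz") as tar:
            for src in LOG_SOURCES:
                if src.exists():
                    tar.add(str(src), arcname=f"test_{test_number:03d}/{src.name}")
        return archive

    print(archive_test(1))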

Add a README to each test archive. Include in it a list of the files in the archive, the
purpose of the test and any observed results, and a boilerplate section that describes the
server names, domain names, OS versions, patch levels, etc. Record when the test
started and ended (local time) and when during the test any anomalies occurred. If you
do not use local time, record the time zone offset from GMT. Include the number of
database transactions involved and the load generator tool's test results.

Measure and verify data cardinality after the test. After a test it is not uncommon to find
highly skewed data sets that are uncommon in the real world. Such skew represents
a scripting error and needs to be addressed.
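
A sketch of a cardinality probe, again assuming the python-oracledb driver; the table and column names are examples to adapt:

    import oracledb  # assumption: the python-oracledb driver is installed

    # Columns that real users would naturally spread out; adapt to your test.
    CARDINALITY_CHECKS = [
        ("PS_VOUCHER", "VENDOR_ID"),
        ("PS_VOUCHER", "INVOICE_ID"),
    ]

    conn = oracledb.connect(user="sysadm", password="...", dsn="testdb")
    with conn.cursor() as cur:
        for table, column in CARDINALITY_CHECKS:
            cur.execute(f"SELECT COUNT(DISTINCT {column}), COUNT(*) FROM {table}")
            distinct, total = cur.fetchone()
            # 10,000 vouchers that all share one INVOICE_ID would be a
            # scripting error, not a realistic data set.
            print(f"{table}.{column}: {distinct} distinct values in {total} rows")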

Open the log files and look for errors. Record these in the README file for the test run
(a scripted scanner for these searches is sketched after the list).

- Search the APPSRV_xxyy.log for the string "error report". These will typically be application-related issues.
- Search the APPSRV_xxyy.log for the string "dump", which will point you to domain process crashes.
- Look for Java exceptions in the web server log files.
- Look at the system log files for underlying OS issues.
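
These searches are easy to script so that they run identically after every test. A minimal sketch; the log directory is a placeholder:

    from pathlib import Path

    # Signatures called out above; extend as you learn your environment.
    PATTERNS = ["error report", "dump"]

    def scan_logs(log_dir="/opt/psoft/appserv/LOGS", pattern="APPSRV_*.log"):
        """Print every log line matching a known error signature."""
        for log_file in Path(log_dir).glob(pattern):
            with open(log_file, errors="replace") as f:
                for lineno, line in enumerate(f, start=1):
                    lowered = line.lower()
                    if any(p in lowered for p in PATTERNS):
                        print(f"{log_file.name}:{lineno}: {line.rstrip()}")

    scan_logs()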

RECOVERING FROM A TEST RUN

Have a plan in place to quickly restore the system to its "pre-test" state. For example:
web servers are restarted and cache files removed, app servers are restarted and all cache
files are restored from pre-test copies, interfaces and external scripts are reset, data is
removed from the database and transaction log files are purged, file systems are checked
for free space, process scheduler IDs are reset, report repositories are purged, etc.

After each run, reset each and every component to the pre-test state.

If you choose not to do a full database restore, preferring simply to delete the data, ensure
that the tables and indexes are also coalesced to their original state.
