VOLUME 5 • JANUARY 2008 • $8.95 • www.stpmag.com
VOLUME 5 • ISSUE 1 • JANUARY 2008

Contents

12 COVER STORY
Lights, Camera, ALM 2.0!
When it comes to life cycle management, with its newly automatic synchronization of metadata and development artifacts, ALM 2.0 is already a star—and the tester is the director. Roll ’em! By Brian Carroll
EDITORIAL
Editor: Edward J. Correia, +1-631-421-4158 x100, ecorreia@bzmedia.com
Editorial Director: Alan Zeichick, +1-650-359-4763, alan@bzmedia.com

ART & PRODUCTION
Art Director: LuAnn T. Palazzo, lpalazzo@bzmedia.com
Art/Production Assistant: Erin Broadhurst, ebroadhurst@bzmedia.com

SALES & MARKETING
Publisher: Ted Bahr, +1-631-421-4158 x101, ted@bzmedia.com
Associate Publisher: David Karp, +1-631-421-4158 x102, dkarp@bzmedia.com
Advertising Traffic: Phyllis Oakes, +1-631-421-4158 x115, poakes@bzmedia.com
Director of Marketing: Marilyn Daly, +1-631-421-4158 x118, mdaly@bzmedia.com
List Services: Lisa Fiske, +1-631-479-2977, lfiske@bzmedia.com
Reprints: Lisa Abelson, +1-516-379-7097, labelson@bzmedia.com
Accounting: Viena Ludewig, +1-631-421-4158 x110, vludewig@bzmedia.com

READER SERVICE
Director of Circulation: Agnes Vanek, +1-631-443-4158, avanek@bzmedia.com
Customer Service/Subscriptions: +1-847-763-9692, stpmag@halldata.com

Cover Illustration by P. Avlen

BZ Media LLC
President: Ted Bahr
Executive Vice President: Alan Zeichick
7 High Street, Suite 407, Huntington, NY 11743
+1-631-421-4158, fax +1-631-421-4130
www.bzmedia.com, info@bzmedia.com

Software Test & Performance (ISSN #1548-3460) is published monthly by BZ Media LLC, 7 High Street, Suite 407, Huntington, NY 11743. Periodicals postage paid at Huntington, NY, and additional offices. Software Test & Performance is a registered trademark of BZ Media LLC. All contents copyrighted 2008 BZ Media LLC. All rights reserved. The price of a one-year subscription is US $49.95, $69.95 in Canada, $99.95 elsewhere. POSTMASTER: Send changes of address to Software Test & Performance, PO Box 2169, Skokie, IL 60076. Subscriber services may be reached at stpmag@halldata.com or by phone at +1-847-763-9692.

Using Software To Test Software
By Edward J. Correia

Happy New Year! As we enter 2008, it’s time once again to contemplate our accomplishments of the year gone by, to consider our goals for the year ahead and, for some, to ponder the many paradoxes that vex our existence.

Why, for instance, do we build software to test other software? This question has never before occurred to me, nor does it parallel such mysteries as people who are financially wealthy but short on values. But it does bear some discussion.

The idea was brought to me by testing consultant Elfriede Dustin, who credits a conference-goer with reminding her of a concept she had pondered many times before. So why is it that we develop software to test software?

The practice of automating software testing itself involves a software development life cycle, complete with its own set of requirements, a design, and the actual development and testing. And as Dustin points out, the major advances in testing tools since the 1990s include the ability to recognize object properties beyond their x,y coordinates. This has made automation more viable because scripts can be more useful and less fragile.

The open source community also has emerged in the last 15 years as a prolific source of high-quality test automation tools. As evidence, consider the FitNesse (fitnesse.org) acceptance testing framework; Watir (wtr.rubyforge.org), the Ruby-based automated browser testing libraries; and Python and Perl. Just last month, this magazine ran an excellent tutorial on building your own XML-based test automation framework. Other open source test automation frameworks are available, such as STAF/STAX, which provides useful services …

The whole issue reminds me of an absurdity I came upon while working as a support technician for Windows magazine in the 1990s. We were gathering requirements for the editorial and production network from the staff, which insisted on using Windows-based hardware and software to publish the magazine. “We’re a magazine about Windows, and we need to be published on Windows” was the philosophy.

And while that kind of idealism might have looked good on the pages of the magazine, the state of desktop publishing on the Windows platform at the time was immature, to be generous. Their operating system was Windows for Workgroups, the first such installation in the company.

My belief at the time was the same as it is today: The magazine should have used the best tool available at the time, regardless of its content. The parent company, now called CMP Media, made its bones publishing dozens of periodicals, which at one time all used a mainframe-style publishing system called Atex. And not a single one of its publications was about Atex.

Why? Because one has nothing to do with the other. “We want to eat our own dog food,” they might have said. And for Windows magazine to use Macintosh computers was unthinkable. “Ridiculous,” I would have said (and probably did, in private). I lost that battle and would ultimately not be involved in the deployment. It was just as well, because the team struggled mightily.

Software is very good at automating things. So when automated testing is the need, why not use the best tool for the job? For the practice of automating software testing, the best tool happens to be software.
YOGANANDA JEPPU and AMBALAL PATEL are scientists at IFCS Aeronautical Development Agency in Bangalore, India. Beginning on page 25, the colleagues take a Shakespearean approach to the reduction of defects to help keep your project from becoming a Tragedy of Errors. Yogananda has many published works, mostly relating to real-time systems test methodologies for performance and quality assurance in aeronautics-industry control systems. In his current post since 2000, Ambalal holds degrees in mechanical engineering and instrumentation and a Ph.D. in fuzzy control systems from the Indian Institute of Technology, Kharagpur.
… deploy a build into staging or production once that deployment is approved. Think of the automation of ALM as replacing the “cut and paste” that occurs today to keep development tools in sync, though the newer spec adds a lot more than that. ALM 2.0 helps all the stakeholders in the software development process communicate and collaborate more efficiently. ALM frameworks provide the glue that integrates stakeholders and their development tools across the life cycle, and open opportunities for process improvement and quality improvement through the use of better tools. For more on ALM 2.0, see the sidebar “ALM’s Second Coming.”

Eclipse Already There

The Eclipse development community is often quick to exploit new specifications or development ideas. For example, the Eclipse Application Lifecycle Framework (ALF) project (www.eclipse.org/alf) already implements an open source ALM 2.0 framework. Though still in the incubation phase, ALF provides event-driven tool integration and orchestration using standards such as Web services and BPEL, the OASIS Web Services Business Process Execution Language. Tools emit an event when something changes within a tool that may affect related data in another tool (for example, once a build has been completed or a bug has been approved to be fixed in the current release).

The events go through the ALF Event Manager, which filters the events of interest and routes them to the appropriate BPEL process. A BPEL process is like a flowchart that indicates the sequence and data to be passed among tools to keep them in sync. ALF also contains a standards-… artifacts stored within different tools.

Finally, ALF provides a set of best practices for tool integration and vocabularies. Vocabularies define the core data structures, events and services that tools should expose in different stages of the life cycle. An ALF vocabulary describes the essential events and services for configuration management, requirements, testing and so on. Vocabularies make it simpler to define orchestrations and to substitute development tools with minimal disruption to the tool integrations. However, ALF also will work if tools do not support the vocabularies.

Based on standards and implemented as open source, ALF eliminates the nightmare that happens when an organization upgrades a tool in its development stack and the proprietary point-to-point integrations break. This can often bring development to a grinding halt. Not only is the underlying source code of the interoperability platform available to the shop, but the logic of the integrations is expressed in the high-level standard workflow language of BPEL. The development team (most likely the configuration management team) can make the fixes themselves without having to wait for the vendor to get a patch into the next maintenance release cycle.

Brian Carroll is lead developer on the Eclipse Application Lifecycle Framework (ALF) project and a Serena Fellow.

ALM on Testing

Another positive byproduct of ALM 2.0 is that it will make testers’ lives easier.
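The event flow just described (a tool emits an event, an event manager filters it and routes it to the appropriate process) can be sketched in a few lines of Python. This is an illustrative stand-in for what a BPEL process would do in ALF, not ALF code: the class, event fields and handler are all invented.

```python
# Minimal sketch of event-manager-style routing: tools emit events, a
# manager filters the ones of interest and routes them to a handler.
# All names here are illustrative, not ALF or BPEL APIs.

class EventManager:
    def __init__(self):
        self.routes = []  # list of (predicate, handler) pairs

    def subscribe(self, predicate, handler):
        self.routes.append((predicate, handler))

    def emit(self, event):
        # Route the event to every handler whose filter matches it.
        for predicate, handler in self.routes:
            if predicate(event):
                handler(event)

manager = EventManager()
log = []

# "Process": when a build completes, tell the test tool to run its suite.
manager.subscribe(
    lambda e: e["type"] == "build.completed",
    lambda e: log.append(f"run tests for build {e['build_id']}"),
)

manager.emit({"type": "build.completed", "build_id": 42})
manager.emit({"type": "bug.approved", "bug_id": 7})  # no route: filtered out
```

In ALF itself the routing logic would live in BPEL rather than application code, but the filter-then-route shape is the same.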
Director’s Notes

1. If you don’t already, it is critical that the QA team participate in the design of your shop’s ALM solution and process. Otherwise, you could be automating a flawed process, such as one that doesn’t distinguish between changes that need human review and those that can be propagated automatically.

2. ALM 2.0 can automate changes among tools to increase efficiency, but the QA team must monitor the execution links among tools to ensure they make sense.

3. Because ALM 2.0 can automate sets of tools, consider incorporating tools, such as generic code quality scanners and security vulnerability tools, that you may not have had the resources to support previously.

4. Test the process as well as the product. Make sure the integrations work as planned. Don’t be like the masons who build walls using a crooked level; ensure that your orchestrations behave as they’re supposed to, giving special attention to error handling and edge conditions.

5. Learn the capabilities that SOA technologies provide. Start by learning SOA concepts, then BPEL. Being able to conceive of the “killer integration” for your development practices will make your job easier, and perhaps even garner some recognition.

6. As with the process, start by incorporating simple measurements into your ALM solution, knowing which changes help and which hurt your development process; then incorporate full-blown initiatives such as PPM to improve the process.

7. Consider applying Complex Event Processing to improve the quality of the development process and track how it improves.

8. Don’t get stuck in a mix of great and not-so-great tools. ALM 2.0 makes switching tools easier. Be wary of vendors that try to lock you into their platform with proprietary “best-of-breed” suites.
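Exercising an orchestration’s error handling, not just its happy path, might look like the following sketch; the orchestration runner and its failing step are invented for illustration.

```python
# Sketch: verify that an orchestration reports a step failure instead of
# silently continuing. The runner and steps are toy stand-ins.

def run_orchestration(steps):
    """Run (name, step) pairs in order; stop and report on first failure."""
    completed = []
    for name, step in steps:
        try:
            step()
            completed.append(name)
        except Exception as exc:
            return {"ok": False, "failed_step": name,
                    "error": str(exc), "completed": completed}
    return {"ok": True, "completed": completed}

def fetch_requirement():
    pass  # succeeds

def broken_sync():
    raise RuntimeError("defect tracker unreachable")  # simulated edge case

result = run_orchestration([("fetch", fetch_requirement),
                            ("sync", broken_sync)])
```

A test suite for the orchestration would assert on `result`, checking that the failure is surfaced with the right step name rather than swallowed.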
With the new spec, tool integration is based on the development process, such that updates to project requirements are synchronized across all the tools used in the development cycle. So when a requirement is deferred to a later release, the test management tool will have been automatically notified to disable tests related to that requirement. The expected improvement in system quality resulting from that synchronization is what sells the technology to development managers.

ALM 2.0 can eliminate the rote work needed to synchronize data across tools, but tool integrations are generally driven by mechanical transformation rules and don’t yet understand the semantics (or meaning) of development artifacts. Perhaps such understanding will come in ALM 3.0. For now, the current spec propagates changes (for example, a requirement is deleted) once those links are established, but if there are changes to the wording of a requirement that subtly change its meaning, the automated transformation rules won’t catch that change. So the human intelligence of a QA team is still needed to interpret a requirement and determine which tests are appropriate to verify the requirement has been satisfied (Director’s Note 1).

Certain changes in requirements or bug reports will require changes in application code and the tests themselves. Improved communication between tools—and therefore all the roles in development—may make metadata more accessible to testers, but crafting a flexible test and identifying edge conditions to be tested still requires the skills of a journeyman tester.

However, with ALM 2.0, the QA department will have more time to design and craft tests, and spend less time on creating custom automations and reporting on the tests. ALM integrations will perform those tasks (Director’s Note 2).

ALM also impacts how integrations among tools are expressed. Today, scripting with command-line interfaces is the common way to tie together a sequence of tools. However, it’s difficult to integrate tools that run on different platforms (for example, Linux or the mainframe), or expose their interfaces over the Web rather than the command line. With ALM 2.0, BPEL can sequence and integrate tools that run on different platforms and that don’t expose command-line interfaces at all. And the orchestration can be expressed using a BPML or BPEL editor rather than as script code using a text editor.

Unit and Component Testing

The development of unit tests will be largely unaffected by ALM 2.0. However, unit tests will be incorporated into much larger sequences of testing as part of the continuous integration movement. Why run just unit tests continuously when you can incorporate code scanning, security scanning and other types of quality assurance into the process?

An early ALF prototype incorporated security scanning with traditional test management, and combined the reporting from both into a single “Deploy to Production” report. It has long been a strange irony of our industry that the automated test suite is run manually. With ALM 2.0, the automated test suite can run automatically after, say, a successful build (Director’s Note 3).

Integration vs. Application Testing

While the focus of testing will be on the application delivered to the end user, the development process and its integrations will become more … Web services. That means that the WSDL describes the events and services of tool interfaces, and BPEL is used to describe process automations. And why not? Software development has been building SOA-based systems for the business; why not apply SOA to improve the way software development operates? Shouldn’t the shoemaker’s children have shoes? (Director’s Note 5)

Use Complex Event Processing

Prior to ALM 2.0, logged information … presentation by QA and development managers (Director’s Note 6).

ALF also provides some vocabularies—and you can participate in developing new vocabularies for the tools you use, or try your hand at extending some existing ones. Download ALF from www.eclipse.org/alf and get a head start on improving your development organization’s quality and productivity today.

ALM 2.0: The Second Coming

The following is an excerpt from Carey Schwaber’s August 2006 Forrester report that introduced the term ALM 2.0.

Tomorrow’s ALM is a platform for the coordination and management of development activities, not a collection of life-cycle tools with locked-in and limited ALM features. These platforms are the result of purposeful design rather than rework following acquisitions. The architectural ingredients of ALM 2.0 are:

• Practitioner tools assembled out of plug-ins. An à la carte approach to product packaging provides customers with simpler, cheaper tools. Far from being a pipe dream, this approach is a reality today. IBM has done the most to exploit this concept, currently providing many different grades of development and modeling tools that are all available as perspectives in Eclipse, as well as the ability to install only selected feature packs in each of these tools. This approach has not yet been successfully applied outside of development and modeling tools. For example, today, customers must choose between defect management that’s too tightly coupled with test management and software configuration management (SCM) or defect management in a stand-alone tool.

• Common services available across practitioner tools. Vendors are identifying features that should be available from within multiple practitioner tools—notably, collaboration, workflow, security, and reporting and analytics—and driving them into the ALM platform. Telelogic has started to make progress on this front for administrative functionality like licensing and installation. Microsoft has gone even further: Visual Studio Team System leverages SharePoint Server for collaboration and Active Directory for authentication, and because it uses SQL Server as its data store, it can leverage SQL Server Analysis Services and SQL Server Report Builder for reporting and analytics.

• Repository neutrality. At the center of ALM 2.0 sits not one repository, but many. Instead of requiring use of the vendor’s SCM solution for storage of all life cycle assets, tomorrow’s ALM will be truly repository-neutral, with close to functional parity, no matter where assets reside. IBM, for example, has announced that in coming years, its ALM solution will integrate with a wide variety of repositories, including open source version-control tools like Concurrent Versions System (CVS) and Subversion. This will drive down ALM implementation costs by removing the need to migrate assets—a major obstacle for many shops—and will bring development on mainframe, midrange, and distributed platforms into the same ALM fold.

• Use of open integration standards. Two means of integration—use of Web services APIs and use of industry standards for integration—will ease and deepen integration between a single vendor’s tools, as well as between its tools and third-party tools. Many vendors still don’t offer Web-services-based APIs, but this will change with time. In addition, new standards for life-cycle integration, including Eclipse projects like the Test and Performance Tools Project (TPTP) and Mylar, promise to simplify tools integration. One case in point: SPI Dynamics reports that its integration with IBM Rational ClearQuest Test Manager took one-third of the time it would have taken if both tools didn’t leverage TPTP.

• Microprocesses and macroprocesses governed by externalized workflow. The ability to create and manage executable application development process descriptions is one of the big wins for ALM 2.0. When processes are stored in readable formats like XML files, they can be versioned, audited and reported upon. This facilitates incremental process improvement efforts and the application of common process components across otherwise discrete processes. For example, Microsoft Visual Studio Team System process templates are implemented in XML and contain work-item-type definitions, permissions, project structure, a project portal and a version control structure.

There is no solution on the market that possesses all of these characteristics, but this is the direction in which most vendors are moving. However, it will be at least two years before any vendor offers a solution that truly fulfills the vision of ALM 2.0.

Source: Forrester Research, Inc.
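Complex Event Processing looks for patterns across many logged events rather than at single log entries. A toy rule in that spirit, with invented event shapes and thresholds, might look like this:

```python
# Toy complex-event-processing rule: flag a streak when three builds fail
# within a five-build window. Event shape and thresholds are illustrative.

from collections import deque

def failing_streaks(events, window=5, threshold=3):
    recent = deque(maxlen=window)  # sliding window of recent events
    alerts = []
    for event in events:
        recent.append(event)
        failures = [e for e in recent if e["status"] == "failed"]
        if len(failures) >= threshold:
            alerts.append(f"{len(failures)} failures in last {len(recent)} builds")
            recent.clear()  # avoid re-alerting on the same cluster
    return alerts

events = [{"status": s} for s in
          ["ok", "failed", "failed", "ok", "failed", "ok", "ok"]]
```

A real CEP engine would correlate events from many tools and streams; the point here is only that the pattern, not any single event, triggers the alert.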
time. For example, throughput can be defined for a typical hour, peak hour and non-peak hour for each particular kind of load. In some cases, you’ll need to further detail what the load is hour-by-hour.

The number of users doesn’t, by itself, define throughput. Without defining what each user is doing and how intensely (i.e., throughput for one user), the number of users doesn’t make much sense as a measure of load. For example, if 500 users are each running one short query each minute, we have a throughput of 30,000 queries per hour. If the same 500 users are running the same queries, but only one query per hour, the throughput is 500 queries per hour. So there may be the same 500 users, but a 60X difference between loads (and at least the same difference in hardware requirements for the application—probably more, considering that not many systems achieve linear scalability).

Response Times: Review of Research

As long ago as 1968, Robert B. Miller’s paper “Response Time in Man-Computer Conversational Transactions” described three threshold levels of human attention [1]. J. Nielsen believes that Miller’s guidelines are fundamental for human-computer interaction, so they are still valid and not likely to change with whatever technology comes next [2]. These three thresholds are:

• Users view response time as instantaneous (0.1-0.2 second): They feel that they directly manipulate objects in the user interface; for example, the time from the moment the user selects a column in a table until that column highlights, or the time between typing a symbol and its appearance on the screen. Miller reported that threshold as 0.1 seconds. According to P. Bickford, 0.2 second forms the mental boundary between events that seem to happen together and those that appear as echoes of each other [3]. Although it’s quite an important threshold, it’s often beyond the reach of application developers. That kind of interaction is provided by the operating system, browser or interface libraries, and usually happens on the client side without interaction with servers (except for dumb terminals, which are rather an exception for business systems today).

• Users feel they are interacting freely with the information (1-5 seconds): They notice the delay, but feel the computer is “working” on the command. The user’s flow of thought stays uninterrupted. Miller reported this threshold as one second. Using the research that was available to them, several authors recommended that the computer should respond to users within two seconds [1, 4, 5]. Another research team reported that with most data entry tasks, there was no advantage to having response times faster than one second, and found a linear decrease in productivity with slower response times (from one to five seconds) [6]. With problem-solving tasks, which are more like Web interaction tasks, they found no reliable effect on user productivity up to a five-second delay.

The complexity of the user interface and the number of elements on the screen both impact thresholds. Back in the 1960s through 1980s, the terminal interface was rather simple, and a typical task was data entry, often one element at a time. Most earlier researchers reported that one to two seconds was the threshold to keep maximal productivity. Modern complex user interfaces with many elements may have higher response times without adversely impacting user productivity. According to Scott Barber, even users who are accustomed to a sub-second response time on a client/server system are happy with a three-second response time from a Web-based application [7].

P. Sevcik identified two key factors impacting this threshold [8]: the number of elements viewed and the repetitiveness of the task. The number of elements viewed is the number of items, fields, paragraphs, etc. that the user looks at. The amount of time the user is willing to wait appears to be a function of the perceived complexity of the request. Users also interact with applications at a certain pace depending on how repetitive each task is. Some tasks are highly repetitive; others require the user to think and make choices before proceeding to the next screen. The more repetitive the task, the better the expected response time.

That is the threshold that gives us response-time usability goals for most user-interactive applications. Response times above this threshold degrade productivity. Exact numbers depend on many difficult-to-formalize factors, such as the number and types of elements viewed or the repetitiveness of the task, but a goal of three to five seconds is reasonable for most typical business applications.

• Users are focused on the dialog (8+ seconds): They keep their attention on the task. Miller reported this threshold as 10 seconds. Anything slower needs a proper user interface (for example, a percent-done indicator as well as a clear way for the user to interrupt the operation). Users will probably need to reorient themselves when they return to the task after a delay above this threshold, so productivity suffers.

A Closer Look at User Reactions

Peter Bickford investigated user reactions when, after 27 almost instantaneous responses, there was a two-minute wait loop for the 28th repetition of the same operation. It took only 8.5 seconds for half the subjects to either walk out or hit the reboot. Switching to a watch cursor during the wait delayed the subjects’ departure for about 20 seconds. An animated watch cursor was good for more than a minute, and a progress bar kept users waiting until the end.
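The attention thresholds reviewed above can be read as a simple classifier for measured response times. The band labels below are paraphrases of the descriptions in the text, not standard terminology, and the cut-offs are the article’s round numbers:

```python
# Classify a measured response time against the attention thresholds
# discussed above (0.1-0.2 s, 1-5 s, 8+ s). Labels are paraphrases.

def attention_band(seconds):
    if seconds <= 0.2:
        return "instantaneous"        # direct-manipulation feel
    if seconds <= 5.0:
        return "interacting freely"   # flow of thought uninterrupted
    if seconds < 8.0:
        return "degrading"            # between thresholds; productivity drops
    return "focus on the dialog"      # needs progress indicator and cancel

samples = [0.15, 2.0, 6.5, 12.0]
bands = [attention_band(s) for s in samples]
```

Such a classifier is useful when post-processing load-test results: instead of a single average, you can report how many transactions landed in each band.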
Bickford’s results were widely used for setting response time requirements for Web applications. C. Loosley, for example, wrote: “In 1997, Peter Bickford’s landmark paper, ‘Worth the Wait?’ reported research in which half the users abandoned Web pages after a wait of 8.5 seconds. Bickford’s paper was quoted whenever Web site performance was discussed, and the ‘eight-second rule’ soon took on a life of its own as a universal rule of Web site design.”

A. Bouch attempted to identify how long users would wait for pages to load [10]. Users were presented with Web pages that had predetermined delays ranging from two to 73 seconds. While performing the task, users rated the latency (delay) for each page they accessed as good, average or poor. Latency was defined as the delay between a request for a Web page and the moment when the page was fully rendered. The Bouch team reported the following ratings:

Good: up to 5 seconds
Average: from 6 to 10 seconds
Poor: more than 10 seconds

In a second study, when users experienced a page-loading delay that was unacceptable, they pressed a button labeled “Increase Quality.” The overall average time before pressing the “Increase Quality” button was 8.6 seconds.

In a third study, the Web pages loaded incrementally, with the banner first, text next and graphics last. Under these conditions, users were much more tolerant of longer latencies. The test subjects rated the delay as “good” with latencies up to 39 seconds, and “poor” for those of more than 56 seconds.

This is the threshold that gives us response-time usability requirements for most user-interactive applications. Response times above this threshold cause users to lose focus and lead to frustration. Exact numbers vary significantly depending on the interface used, but it looks like response time should not be more than eight to 10 seconds in most cases. Still, the threshold shouldn’t be applied blindly; in many cases, significantly higher response times may be acceptable when an appropriate user interface is implemented to alleviate the problem.

Not-So-Traditional Performance Requirements

While they’re considered traditional and absolutely necessary for some kinds of systems and environments, some requirements are often missed or not elaborated enough for interactive distributed systems.

Concurrency is the number of simultaneous users or threads. It’s important: Connected but inactive users still hold some resources. For example, the requirement may be to support up to 300 active users, but the terminology used to describe the number of users is somewhat vague. Typically, three metrics are used:

• Total or named users: All registered or potential users. This is a metric of the data the system works with. It also indicates the upper potential limit of concurrency.

• Active or concurrent users: Users logged in at a specific moment in time. This is the real measure of concurrency in the sense it’s used here.

• Really concurrent users: Users actually running requests at the same time. While that metric looks appealing and is used quite often, it’s almost impossible to measure and rather confusing: the number of “really concurrent” requests depends on the processing time for the request. For example, let’s assume that we got a requirement to support up to 20 “concurrent” users. If one request takes 10 seconds, 20 “concurrent” requests mean a throughput of 120 requests per minute. But here we get an absurd situation: if we improve processing time from 10 seconds to one second and keep the same throughput, we miss our requirement because we have only two “concurrent” users. To support 20 “concurrent” users with a one-second response time, you really need to increase throughput 10 times, to 1,200 requests per minute.

It’s important to understand which users you’re discussing: The difference between each of these three metrics may be drastic for some systems. Of course, it depends heavily on the nature of the system.

Performance and Resource Utilization

The number of online users (the number of parallel sessions) looks like the best metric for concurrency (complementing throughput and response time requirements). Finding the number of concurrent users for a new system can be tricky, but information about real usage of similar systems can help to make the first estimate.

Resources. The amount of available hardware resources is usually a variable at the beginning of the design process. The main groups of resources are CPU, I/O, memory and network. When resource requirements are measured as resource utilization, they’re related to a particular hardware configuration. It’s a good metric when the hardware the system will run on is known. Often such requirements are part of a generic policy; for example, that CPU utilization should be below 70 percent. Such requirements won’t be very useful if the system deploys on different hardware configurations, and especially for “off-the-shelf” software.

When specified in absolute values, like the number of instructions to execute or the number of I/Os per transaction (as sometimes used, for example, for modeling), it may be considered a performance metric of the software itself, without binding it to a particular hardware configuration. In the mainframe world, MIPS was often used as a metric for CPU consumption, but I’m not aware of such a widely used metric in the distributed systems world.

The importance of resource-related requirements will increase again with the trends of virtualization and service-oriented architectures. When you depart from the “server(s) per application” model, it becomes difficult to specify requirements as resource utilization, as each application will add only incrementally to resource utilization for each service used.

Scalability is a system’s ability to meet the performance requirements as demand increases (usually by adding hardware). Scalability requirements may include demand projections such as an increasing number of users, transaction volumes, data sizes, or adding new workloads.

The Difference Between Goals and Requirements

One issue, as Barber notes, is goals versus requirements [11]. Most response time “requirements” (and sometimes other kinds of performance requirements) are goals (and sometimes even dreams), not requirements: something that we want to achieve, but missing them won’t necessarily prevent deploying the system.

You may have both goals and requirements for each of the performance metrics, but for some metrics and systems they are so close that, from the practical point of view, you can use one. Still, in many cases, especially for response times, there’s a big difference between goals and requirements (the point when stakeholders agree that the system can’t go …). … combination of circumstances. Instead, specify goals (making sure that they make sense) and only then, if they’re not met, make the decision about what to do with all the information available.

Knowing What Metrics to Use

Another question is how to specify response time requirements or goals. For example, such metrics as the average, the maximum, different kinds of percentiles, and the median can be used. Percentiles are more typical in SLAs (service-level agreements); for example, “99.5 percent of all transactions should have a response time of less than five seconds.” While that may be sufficient for most systems, it doesn’t answer all questions. What happens with the remaining 0.5 percent? Does this 0.5 percent of transactions finish in six to seven seconds, or do all of them time out? You may need to specify a combination of requirements: for example, 80 percent below four seconds, 99.5 …
requirements perspective,
scalability means that you
can complicate into production with
such performance).
percent below six seconds, 99.99 per-
cent below 15 seconds (especially if we
should specify perform- For many interactive know that the difference in perform-
ance requirements not
your process. Web applications, re- ance is defined by distribution of
only for one configura- sponse time goals are two underlying data). Other examples may
tion point, but as a func-
tion, for example, of load
or data.
For example, the
• to five seconds, and
requirements may be
somewhere between eight
seconds and one minute.
be average four seconds and max 12
seconds, or average four seconds and
99 percent below 10 seconds.
Things get more complicated when
requirement may be to One approach may be there are many different types of trans-
support throughput to define both goals and actions, but a combination of per-
increase from five to 10 transactions per requirements. The problem? Require- centile-based performance and avail-
second over the next two years, with ments are very difficult to get. Even if ability metrics usually works fine for
response time degradation not more stakeholders can define performance interactive systems. While more
than 10 percent. Most scalability requirements, quite often go/no-go sophisticated metrics may be necessary
requirements I’ve seen look like “to sup- decisions are based not on the real for some systems, in most cases sophis-
port throughput increase from five to requirements, but rather on second-tier tication can make the process over-
10 transactions per second over next goals. complicated and difficult to analyze.
two years without response time degra- In addition, using multiple per- There are efforts to make an
dation”—that’s possible only with addi- formance metrics that only together objective user-satisfaction metric. One
tion of hardware resources. provide the full picture can compli- is Application Performance Index
Other contexts. It’s very difficult to cate your process. For example, you (www.Apdex.org). Apdex is a single
consider performance (and, there- may state that you have a 10-second metric of user satisfaction with the per-
fore, performance requirements) requirement and you took 15 seconds formance of enterprise applications.
without context. It depends, for exam- under full load. But what if you know The Apdex metric is a number
ple, on hardware resources provided, that this full load is the high load on between 0 and 1, where 0 means that
the volume of data operated on and the busiest day of year, that the max no users were satisfied, and 1 means all
the functionality included in the sys- load for other days falls below 10 sec- users were satisfied.
tem. So if any of that information is onds, and you see that it is CPU-con- The approach introduces three
known, it should be specified in the re- strained and may be fixed by a hard- groups of users: satisfied, tolerating
quirements. ware upgrade? and frustrated. Two major parameters
While the hardware configuration Real response time requirements are introduced: threshold response
may be determined during the design are so environment- and business- times between satisfied and tolerating
stage, the volume of data to keep is dependent that for many applications, users T, and between tolerating and
usually determined by the business it’s cruel to force people to make hard frustrated users F 12. There probably is
and should be specified. decisions in advance for each possible a relationship between T and the
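A combined percentile requirement like the one above ("80 percent below four seconds, 99.5 percent below six seconds, 99.99 percent below 15 seconds") is easy to check mechanically against measured response times. A minimal sketch in Python; the function names and the nearest-rank percentile method are illustrative choices, not from the article:

```python
import math

def percentile(samples, pct):
    """Return the pct-th percentile using the nearest-rank method."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(len(ordered) * pct / 100))
    return ordered[rank - 1]

def meets_requirements(response_times, reqs):
    """reqs is a list of (percentile, max_seconds) pairs; True if all hold."""
    return all(percentile(response_times, p) <= limit for p, limit in reqs)

# The article's example combination of requirements:
reqs = [(80, 4.0), (99.5, 6.0), (99.99, 15.0)]
times = [1.2, 2.8, 3.1, 3.9, 4.5, 2.2, 1.7, 3.3, 5.8, 2.9]
print(meets_requirements(times, reqs))  # -> True
```

With only 10 samples the high percentiles all fall on the slowest observation; in practice you'd evaluate this over thousands of measured transactions.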
Determining specific performance requirements is another large topic that is difficult to formalize. Consider the approach suggested by Sevcik for finding T, the threshold between satisfied and tolerating users. T is the main parameter of the Apdex (Application Performance Index) methodology, providing a single metric of user satisfaction with the performance of enterprise applications. Sevcik defined 10 different methods (see Table 1).

[Table 1, listing Sevcik's methods, is not reproduced here; one example is to verify that the system's response times aren't worse than the response times of similar or competitor systems.]

The idea is to use several (say, three) of these methods for the same system. If all come to approximately the same number, they give us T. While the approach was developed for production monitoring, there is definitely a strong correlation between T and the response time goal (having all users satisfied sounds like a pretty good goal) and between F and the response time requirement. So the approach probably can be used for getting response time requirements with minimal modifications.

While some specific assumptions, like four seconds for the default or the F = 4T relationship, may be up for argument, the approach itself conveys the important message that there are many ways to determine a specific performance requirement, which, for validation purposes, is best derived from several sources. Depending on your system, you can determine which methods from the above list (or maybe some others) are applicable, calculate the metrics and determine your requirements.

…specific design, technology or usability requirement, thus limiting the number of available design choices. If we consider a Web system, for example, it's probably possible to squeeze all the information into a single page or have a sequence of two dozen screens. All information can be saved at once, or each page of these two dozen can be…

The third category, technological requirements, comes from the chosen design and used technology. Some technological requirements may be known from the beginning if some design elements are used, but others are derived from business and usability requirements throughout the design process and depend on…

Requirements Verification: Performance vs. Bugs
Requirement verification presents another subtle issue: how to differentiate performance issues from functional bugs exposed under load. Often, additional investigation is required before you can determine the cause of your observed results. Small anomalies from expected behavior are often signs of bigger problems, and you should at least figure out why you get them.

When 99 percent of your response times are three to five seconds (with the requirement of five seconds) and 1 percent of your response times are five to eight seconds, it usually isn't a problem. But it probably should be investigated if 1 percent fail or have strangely high response times (for example, more than 30 seconds, with 99 percent at three to five seconds) in an unrestricted, isolated test environment. This isn't due to some kind of arti-… become clear.

These two situations look similar, but are completely different in nature: 1.) The system is missing a requirement, but results are consistent: This is a business decision, such as a cost vs. response time tradeoff; and 2.) Results aren't consistent (while requirements can even be met): This may indicate a problem, but its scale isn't clear until investigated.

Unfortunately, this view is rarely shared by development teams too eager to finish the project, move it into production, and move on to the next project. Most developers aren't very excited by the prospect of debugging code for small memory leaks or hunting for a rare error that's difficult to reproduce. So the development team becomes very creative in finding "explanations." For example, growing memory and periodic long-running transactions in Java are often explained as a garbage collection issue. That's false in most cases. Even in the few instances when it is true, it makes sense to tune garbage collection and prove that the problem is gone.

Teams can also make fatal assumptions, such as thinking all is fine when the requirements stipulate that 99 percent of transactions should be below X seconds, and less than 1 percent of transactions fail in testing. Well, it doesn't look fine to me. It may be acceptable in production over time, considering network and hardware failures, OS crashes, etc. But if the performance test was run in a controlled environment and no hardware/OS failures were observed, it may be a bug. For example, it could be a functional problem for some combination of data.

When some transactions fail under load or have very long response times in testing, what if these few failed transactions are a view page for your largest customer, and you won't be able to create an order until it's fixed?

In functional testing, as soon as you find a problem, you usually can figure out how serious it is. This isn't the case for performance testing: Usually you have no idea what caused the observed symptoms or how serious it is, and quite often the original explanations turn out to be wrong. Michael Bolton described the situation concisely:13

As Richard Feynman said in his appendix to the Rogers Commission Report on the Challenger space shuttle accident, when something is not what the design expected, it's a warning that something is wrong. "The equipment is not operating as expected, and therefore there is a danger that it can operate with even wider deviations in this unexpected and not thoroughly understood way." When a system is in an unpredicted state, it's also in an unpredictable state.

Raising Performance Consciousness
We need to specify performance requirements at the beginning of any project for design and development (and, of course, reuse them during performance testing and production monitoring). While performance requirements are often not perfect, forcing stakeholders just to think about performance increases the chances of project success.

What exactly should be specified—goal vs. requirements (or both), average vs. X percentile vs. Apdex, etc.—depends on the system and environment, but all requirements should be both quantitative and measurable. Making requirements too complicated may hurt here. You need to find meaningful goals/requirements, not invent something just to satisfy a bureaucratic process.

If you define a performance goal as a point of reference, you can use it throughout the whole development cycle and testing process, tracking your progress from a performance engineering viewpoint. Tracing this metric in production will give you valu-
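The Apdex scoring described in this article (satisfied, tolerating and frustrated users, separated by thresholds T and F) reduces to a few lines of code. A minimal sketch, using the F = 4T default mentioned in the text; the function name is an illustrative choice:

```python
# Apdex = (satisfied + tolerating / 2) / total, yielding a score in [0, 1].
def apdex(response_times, t, f=None):
    f = 4 * t if f is None else f  # the F = 4T relationship from the text
    satisfied = sum(1 for r in response_times if r <= t)
    tolerating = sum(1 for r in response_times if t < r <= f)
    return (satisfied + tolerating / 2) / len(response_times)

# With T = 4s (so F = 16s): 6 satisfied, 2 tolerating, 2 frustrated
# out of 10 samples gives (6 + 2/2) / 10 = 0.7.
times = [1, 2, 3, 3, 4, 4, 6, 10, 20, 30]
print(apdex(times, t=4))
```

A score of 1 means every sampled response was at or below T; frustrated users (above F) contribute nothing to the score.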
If Shakespeare were to write a play about the field of safety-critical development, where we do…

…numerical limits will cause a test to fail.

The error bounds are generated taking into consideration the electronic noise in the sensor inputs and the hardware characteristics and actuator outputs. Here we consider all the noise in the electronic circuits from the sensor until the point where the software takes over at the input end. We also take into account the noise in the electronics from where the software generates the digital commands until the actuator begins to drive. The noise here is from the sensor electronics, the signal conditioners, the linear/rotary variable differential transformers, analog-to-digital converters and digital-to-analog converters. The effects of hardware characteristics such as offset, gain and biases come into the picture here.

If we inject a rate signal of 10.0 degrees per second, we're likely to get, say, 9.123, 11.843 or any random value between these two bounds. The noise is also dependent on the amplitude of the signal. Higher amplitude yields…

…of the bounds be optimal to catch bugs. This requires a tool tailor-made for the application.

A Comparison Tool
A specialized tool has been developed for the test activity. We call it the Evaluation Tool, or EVTOOL for short. It's a simple name for complex software. The EVTOOL block schematic is shown in Figure 2.

An input generator generates a set of values for all the sensors based on the… to 3g, etc. The set of inputs selected for a test is injected into a tolerance bound generator. Based on the particular hardware characteristics and the sensor noise, bias offset and gain, three values are generated representing the upper, nominal and lower limits of the sensor output.

The three values of each sensor variable are injected into the Control Law and Airdata module, which includes an algorithm provided by the designers. The Control Law and Airdata module defines the embedded controller functionality in the form of various interconnected paths, with signals getting added, subtracted, multiplied or divided, as the case may be. Each path has controller elements such as saturation limits, nonlinear blocks, gains and switches. Since we're testing the system in a static mode, we don't take into account the filters and rate limiters. They're considered as unity gains for constant inputs. Neither will we go into Laplace transforms, etc.
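The tolerance bound generator's job is to turn one nominal sensor value into lower/nominal/upper limits derived from noise, bias offset and gain. A simplified sketch follows; the specific noise model here is an illustrative assumption, since the real EVTOOL derives these numbers from measured hardware characteristics:

```python
# Illustrative tolerance bound generator: (lower, nominal, upper) from a
# nominal sensor value. The bias, gain-error and noise-floor figures are
# assumptions for illustration, not the article's actual EVTOOL parameters.
def tolerance_bounds(nominal, bias=0.05, gain_err=0.02, noise_floor=0.1):
    # Noise grows with signal amplitude, as the article notes.
    spread = abs(bias) + abs(nominal) * gain_err + noise_floor
    return (nominal - spread, nominal, nominal + spread)

# e.g., a 10.0 deg/s pitch-rate input expands to a three-value "witch" triple
lo, nom, hi = tolerance_bounds(10.0)
print(lo, nom, hi)
```

Each sensor variable in a test case would be expanded this way before being injected into the Control Law and Airdata module.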
FIG. 4: ADD OR SUBTRACT SIGNALS [figure: Signal A and Signal B feed a summing junction; Output Y = A+B]

Event triggers such as aircraft switch inputs are injected separately. The Control Law and Airdata system outputs are three values of the output variables that define the pass/fail bound. When the test case is executed on the Iron Bird, it should lie within the upper and lower bounds of these output values for the test case to pass. Any anomaly is automatically declared as a fail requiring further analysis.

The Three Witches
The Tolerance Bound Generator generates three values of the input signal. Let's say we want to test the controller with a pitch rate input of 10 deg/s. The output of the module would be something like 10.897, 10.0 and 9.456. Imagine these as the three witches in "Macbeth" who are flying through the Airdata System and Control Law modules (as in a game of Quidditch).

Now imagine the Control Law and Airdata module as a factory that bangs on the items passing through it or stretches them with tongs. As the witches fly through the long tube, they're battered around based on specific logic, and come out in a different size (see Figure 3). It's possible that the lower limit would come out as the upper limit. We'll discuss this in detail as we go along.

Do the Math
A drop of water in the breaking gulf
And take unmingled that same drop again
Without addition or diminishing.
—"Comedy of Errors"

In the Control Law, two signals can add up or subtract, as shown in Figure 4. The output is equal to the sum of signal A and signal B. The bounds for the output signals are computed from the bounds of the signals A and B. For example, say that signal A has three witches (components): aL the lower bound, a0 the nominal value and aU the upper bound. Similarly, the signal B has its bounds defined by [bL, b0, bU]. The output Y with its three components is computed by the following equations:

Y = [yL, y0, yU] = [(aL + bL), (a0 + b0), (aU + bU)] for addition Y = A+B

Y = [yL, y0, yU] = [(aL − bU), (a0 − b0), (aU − bL)] for subtraction Y = A−B

Please note here that for the difference of two signals (A−B), the upper limit of signal B is subtracted from the lower limit of signal A to give the lower limit of the output signal. These worst-case bounds, however, may give you a wider estimate of the bound, thus passing a case that should have failed. This occurs even more frequently in cases where random noise is present.

We find that using a root sum square (RSS) representation helps in these cases. For the case Y = A+B, where the variables are defined as above, the RSS bounds are defined as:

yL = y0 − √((a0 − aL)² + (b0 − bL)²)
y0 = a0 + b0
yU = y0 + √((aU − a0)² + (bU − b0)²)

In case the signals are subtracted, Y = A−B, you can change the sign of B to −B and interchange the upper and lower limits of B. A "wide band" or "narrow band" is the question; you'll have to decide which formula to use for your specific problem.

Two signals can divide or multiply; for example, Y = A/B or Y = A×B. In cases like this, a Kronecker tensor product is computed. Leopold Kronecker incidentally believed that "God made the integers, all else is the work of man." The product, a human work, is basically a combination of the upper, lower and nominal values of signals A and B, as given below:

Y = [yL, y0, yU] = [min(Z), {a0b0 or a0/b0}, max(Z)]

This is the absolute worst-case tolerance bound, which is quite wide.

Signal and Gain
The multiplication of a gain with a signal can be considered a multiplication of two signals (see Figure 5). This gives a wider bound. An RSS approach gives a better result in such cases. Let the signal X be defined by the three components, and the gain G be similarly defined by the three components. The bounds for the output are then defined as shown in Formula 1.

FORMULA 1: OUTPUT BOUNDS [formula not reproduced]

Linear Interpolation
Nonlinear blocks are normally specified in a Control Law as two sets of variables; say, X and Y. Each of these is a vector of points, {x1, x2, x3 …}, {y1, y2, y3 …}, known as the breakpoints of the nonlinearities. These nonlinear blocks can form a positive slope or a negative slope, as shown in Figure 6.

The output is dependent on the characteristics of this slope. In the case of a positive slope, you'll observe that the lower limit of the output corresponds to the lower limit of the input. But this isn't true for a negative slope, where the upper limit of the output corresponds to the lower limit of the input. Notice how the limits were interchanged, as indicated earlier.

Notice also that the slopes shown in the figures are linear. There could be two or three different slopes in the nonlinear block (see Figure 7). In such cases, the components would have different ranges for the output. A large slope for the upper limit of the input could increase the upper limit of the output drastically. This would widen or reduce the bounds, changing the shapes of the witches as they fly through the tube.

SHAKESPEAREAN TRAGEDY
If Shakespeare had written a play about software development and testing, the story might open like this:

ACT I, SCENE 1: A Day in the Software House
Narrator: Diseased Nature oftentimes breaks forth: In strange eruptions. – "Henry IV"
Certification Agent: All is not well; I doubt some foul play. – "Hamlet"
Project Manager: Find out the cause of this effect, Or rather say, the cause of this defect, For this effect defective comes by cause. – "Hamlet"
Prove it before these varlets here, thou honourable man; prove it. – "Measure for Measure"
Test Lead: I will prove it legitimate, sir, upon the oaths of judgment and reason. – "Twelfth Night"
Tester A: O hateful error, melancholy's child, Why dost thou show to the apt thoughts of men: The things that are not? – "Julius Caesar"
Tester B: When sorrows come, they come not single spies, But in battalions. – "Hamlet"
Tester C: The nature of bad news infects the teller. – "Antony and Cleopatra"
Whispers in Coffee Room: What's this, what's this? Is it her fault or mine? The tempter, or the tempted, who sins most, ha? – "Measure for Measure"
Project Manager: Condemn the fault, and not the actor of it? – "Measure for Measure"
Test Lead: But since correction lieth in those hands, Which made the fault that we cannot correct… – "Richard II"
Test Team: Unarm, Eros; the long day's task is done, And we must sleep. – "Antony and Cleopatra"

Just don't get caught sleeping on the job, or your project might end up like the Ariane 5's Flight 501.

Two-Dimensional Lookups
The gain "G" defined in the "Signal and Gain" section above could be a constant or an output of a two-dimensional lookup table similar to Table 1. These feedback gains are used in aircraft to ensure performance over the flight envelope. The gains are "scheduled" based on the speed and altitude. Refer to "Thou Shalt Experiment With Thy Software" in the June 2007 issue of Software Test & Performance magazine for an article on the testing of these gain tables.

TABLE 1: A TABLE OF GAINS
Altitude/Speed   100m    1000m   5000m   10000m
0.0              2.354   4.363   3.456   3.567
0.1              3.235   5.347   4.575   3.567
0.2              4.354   6.474   5.374   3.879

Consider two signals A and B as inputs to the lookup table. A = [aL, a0, aU] and B = [bL, b0, bU] are defined with their tolerance bounds. The two input signals to the lookup table would be altitude and speed, each signal having its three components. The gain thus computed would also have a bound. In these cases we normally compute the gains for all combinations of the two signals. The interpolated gain output Y = [yL, y0, yU] with tolerance bounds is computed as described below. We consider the set of all combinations of inputs with their three components as below:

C = {(aL, bL), (aL, b0), (aL, bU), (a0, bL), (a0, b0), (a0, bU), (aU, bL), (aU, b0), (aU, bU)}

We then compute the gain output using the lookup table for all these combinations of inputs as:

Z = {z1, z2, z3, z4, z5, z6, z7, z8, z9}

The nominal value is z5, corresponding to the input combination (a0, b0). The upper and lower bounds for the computed gains are then given by taking the maximum and minimum of Z as:

Y = [yL, y0, yU] = [min(Z), z5, max(Z)]
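The combination rules in this article (worst-case addition and subtraction, RSS bounds, and the min/nominal/max rule over all component combinations for products) translate directly into code. A sketch with illustrative function names, treating each signal as a (lower, nominal, upper) triple:

```python
import math

# Each signal is a (lower, nominal, upper) triple, as in the article.

def add_worst_case(a, b):
    # Y = [aL+bL, a0+b0, aU+bU]
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])

def sub_worst_case(a, b):
    # For A-B, the upper limit of B gives the lower limit of the output.
    return (a[0] - b[2], a[1] - b[1], a[2] - b[0])

def add_rss(a, b):
    # Root-sum-square bounds: narrower than worst case when noise is random.
    y0 = a[1] + b[1]
    lo = y0 - math.hypot(a[1] - a[0], b[1] - b[0])
    hi = y0 + math.hypot(a[2] - a[1], b[2] - b[1])
    return (lo, y0, hi)

def mul_bounds(a, b):
    # All nine combinations of components; min and max bound the output.
    z = [x * y for x in a for y in b]
    return (min(z), a[1] * b[1], max(z))

a = (9.0, 10.0, 11.0)
b = (1.5, 2.0, 2.5)
print(add_worst_case(a, b))  # -> (10.5, 12.0, 13.5)
print(sub_worst_case(a, b))  # -> (6.5, 8.0, 9.5)
print(mul_bounds(a, b))      # -> (13.5, 20.0, 27.5)
```

Comparing `add_rss(a, b)` with `add_worst_case(a, b)` shows the "narrow band" effect the article describes: the RSS interval sits strictly inside the worst-case one.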
Cutting costs is the goal of every company. And software testing is a practice that has been ripe for the offshore harvest. Given the inherent risks of using offshore resources for testing and adapting offshore teams to agile practices, it's important to ensure that the best practices of the onshore team are being replicated elsewhere.

Through an analysis of the approach used by one company—Primavera Systems—to align software quality efforts around the globe, you can learn how to perform a quality audit that is equally applicable to teams inside the four walls of an organization.

We considered Primavera a good case study because it has internal development centers in the U.S. (Bala Cynwyd, Pa. and San Francisco), Israel and London, as well as offshore development/QA centers in India and Eastern Europe.

The idea behind a quality audit is to perform a systematic examination of the practices a team uses to build and validate software. This is important, as issues of quality are represented across the software development life cycle. The audit aims not to uncover software defects, but to understand how well a team comprehends and executes the defined quality practices.

The quality audit can be used to assess if a process is working and if things are being done the way they're supposed to be. The audit is also an excellent way of measuring the effectiveness of implemented procedures. Management can use audit results as a tool to identify weaknesses, risks and areas of improvement.

The audit is as much about people as it is about the procedures in place. The more team members understand their roles and how they relate to quality, the more likely the team will grasp and adhere to the defined practices. The ultimate objective, of course, is to deliver high-quality software (as defined by the organization).

Quality audits are typically performed at regular intervals. The initial audit develops a quality procedure baseline and determines areas that need remediation. Subsequent audits are monitoring-oriented to help teams identify and address gaps in the quality process. Remediation involves articulating effective actions that address deficiencies in the daily process of conducting quality practices.

A quality audit can be aided by the use of software tools but, as stated above, it's as much about people mentoring and teaching as anything else. Any team, offshore or otherwise, will best respond…

The audit as described throughout this document consists of five phases:

Pre-assessment planning. This phase includes setting expectations, creating a timeline and getting executive sponsorship for the project audit. The deliverable for this phase is agreement with and buy-in of the audit process and a follow-up commitment for improvement. Typically this is capped by a meeting with the audit sponsor and key stakeholders on the audit process and objectives.

Data gathering. This phase involves developing interview questions and surveys and gathering all documentation (bug reports, test cases) prior to the interview process.

Assessment. The assessment phase involves conducting interviews and developing preliminary findings. Confidentiality is crucial, and team members must clearly understand the process. A meeting explaining the process and the reasons behind it should be held with the entire team.

Post audit. After reviewing documents and interview notes, the analyzed information is synthesized into a list of findings and prioritized remediation steps.

Presentation of findings with sponsor and team. Findings are presented and agreement is reached on the highest-priority improvement areas.

In Primavera's case, the quality audit and the entire software development quality management life cycle are tied to the ISO 9001:2000 standard. There are a variety of reasons for this, as explained below.

Aligning Scrum With ISO 9001
Primavera Systems, which has practiced Scrum since June 2003, found itself with an interesting dilemma in the beginning of 2005. Increasingly, prospective customers were inquiring about the ISO certification level of Primavera's development processes. In fact, quality auditing is an important element in ISO's quality system standard. The most recent ISO standard has shifted the focus of audits from procedural adherence only to measuring the effectiveness of quality practices on delivered results.

While this makes sense, implementing and assessing the usefulness of ISO in an agile environment is a challenge. After all, the Agile Manifesto declares "working software over com-

Photos by Stefan Klein
OFFSHORE PLAYBOOK
release management. In other words, the The team decided it was more impor-
documentation described Primavera’s tant to audit the quality practices them-
agile methods across development selves vs. the quality of the required arti-
domains. facts. While both are important, the qual-
During the same time period, ity of the artifacts can be improved
Primavera was engaged in creating two through training and education, but if
offshore development Scrum teams in the processes aren’t in place, not much
Bangalore. This emerging documenta- can be done. This approach is less threat-
tion, which articulated development ening to the teams being audited if the
processes, proved to be a highly useful quality of the work is not the primary
resource when ramping up the offshore focus.
team. Since the offshore team was unfa- The audit is meant to be foremost a
miliar with Scrum methodologies in gen- fact-finding and learning experience.
eral and Primavera’s practices in particu- With a baseline in place, the ongoing
lar, being pointed to relevant sections of review process can look at the quality of
the documentation helped orient them the artifacts and determine ways to help
agile thought leaders don’t consider ISO and made them productive more quick- the team make improvements. The team
a priority. ly. It also eased communication, since was most interested in objectively look-
Primavera addressed this issue inter- everyone was using the same practices, ing for evidence that the quality process
nally by focusing the team on common metrics and terminology. was being followed.
core objectives and the firm’s mission to Throughout the process, the team
deliver the best possible quality software. Preparing for the Audit made sure to consult with management,
The firm sells software to highly regulat- The quality policy described in the qual- initially to acquire and maintain leader-
ed markets, so the need to support cus- ity management system called for annual ship buy-in. Meetings were held with
tomer requirements, including ISO, is an audits of all quality processes, on- and development managers and executives,
important consideration. Bottom line is offshore. Regardless of the documenta- both on- and offshore, to discuss the
that the quality audit made sense and provided the opportunity to better align the on- and offshore teams.
As a first step, Primavera engaged an outside consultant to assess the current software development life cycle and information technology procedures as they relate to alignment with ISO 9001:2000 standards. Associated with this was the delivery of a gap analysis between the current SDLC and the standards.
The results were a little surprising. While Primavera wasn't producing all of the documentation to meet the letter of the law as it relates to the ISO standard, existing processes were highly aligned with the spirit and intent of the standard. The consultant felt that by organizing many of the existing artifacts into a quality management system and creating a limited amount of additional documentation, the team could safely declare, "We're aligned with the ISO 9001:2000 standard" and provide enough documentation to back this up.
An important side benefit was the ability to use the assembled information as a valuable reference for developers and a great resource for helping new employees ramp up. The documents, created by the consultant, described key processes used in the software development life cycle: software testing, configuration management, defect tracking, internationalization, product maintenance, requirements management and release management.
In addition, development management considered it important to ensure that the quality practices being used offshore duplicated those being used at home. Better to address quality issues before there's an actual problem.
Given the number of development locations, the audit team needed to develop a repeatable process for the project audit. For example, what exactly did the team want to measure, what data was needed, how would the data be collected and what rules would be used to generate the metrics? Also, would it be necessary to normalize the data?
The team developed a checklist to qualitatively measure how well the offshore teams were conforming to agile practices. Audit criteria were selected from various sections of the quality manual along with input from members representing multiple domains. This resulted in a 43-point audit covering requirements management, design and implementation, configuration management, testing, defect tracking and release management.
… audit and its goals and to solicit input. The team discovered that the word audit evoked some uncomfortable responses in offshore teams. Because of this, it was important to set the context for the audit as an exercise in self-improvement and as a retrospective tool to help drive positive change. Positioning the audit in this light helped ensure the team's cooperation and active participation. Since agile methods generally provide a high degree of transparency and are constantly being refined, the teams were comfortable with (and appreciated) the focus.
With management support and the offshore team's understanding of the context of the audit, the audit team began gathering and examining some of the related artifacts, documents that had been delivered by the offshore team over time. The idea was to look at existing requirements and attempt to trace them through the quality process. In total, this involved looking for the code, test plans, automation and test results, both automated and manual, related to the delivered requirement.
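Tracing a delivered requirement through to its artifacts can be mechanized with a simple checklist walk. The sketch below is illustrative only: the `ARTIFACTS` registry, the `trace_gaps` helper and the requirement IDs are hypothetical stand-ins for the real code, test-plan, automation and results repositories, not Primavera's actual systems.

```python
# Illustrative requirement-traceability check: for each requirement ID,
# verify that code, a test plan, automated tests and test results exist.
# ARTIFACTS is hypothetical data standing in for real repositories.
ARTIFACTS = {
    "REQ-101": {"code": True, "test_plan": True, "automation": True, "results": True},
    "REQ-102": {"code": True, "test_plan": False, "automation": True, "results": False},
}

REQUIRED = ("code", "test_plan", "automation", "results")

def trace_gaps(req_id):
    """Return the artifact types that are missing for a requirement."""
    found = ARTIFACTS.get(req_id, {})
    return [kind for kind in REQUIRED if not found.get(kind)]

for req in sorted(ARTIFACTS):
    gaps = trace_gaps(req)
    print(req, "fully traceable" if not gaps else "missing: %s" % gaps)
```

A run like this gives the auditor a starting list of requirements whose quality trail is incomplete before the on-site interviews begin.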
There were several reasons for doing this. First, it was valuable to gain an understanding of how transparent the process was, based on the exchange of materials from teams distributed around the world. This was also a learning exercise to become familiar with the team's work processes and communication style. The team was able to identify a variety of specific issues that required detailed examination during the on-site audit.

The Audit Checklist And Evaluation Scheme
The audit checklist was developed to measure how effectively a team has implemented and follows the quality guidelines. There are a number of ways Primavera used the results of the audit, so the checklist had to be synced to the objectives. This included:
To baseline the current state of development within an organization prior to introducing Primavera's development process.
To monitor the progress of adoption of Primavera's documented Quality Management System and identify areas where the implemented practices were satisfactory as well as areas that could be improved.
The criteria used in the audit are grouped under the following high-level headings:
• Requirements management
• Design and implementation
• Configuration management
• Testing
• Defect tracking
• Release management
While the overall goal of conducting a project audit is to provide a qualitative view of the suitability of and adherence to the development process, a quantitative view is also necessary. For example, the auditor conducting the evaluation may find it useful to flag certain criteria or results that he feels the need to emphasize. Similarly, the team being audited may have certain concerns that they want to have the auditor scrutinize.
The Primavera audit team adopted the following scheme for measuring each of the 43 audit points:
1.00 If the process fully satisfies the criterion
0.75 If the testing process largely satisfies the criterion
0.50 If the testing process partially satisfies the criterion
0 If the testing process does not satisfy the criterion
This scheme must be used with some caution and is provided only as a guideline. The reason for this is simply that auditors interpret things differently, making it difficult to determine the precise meaning of any particular score. A degree of subjectivity occurs during the audit process that must be taken into account along with the objective measures.
The audit criteria are listed below. This is not the full, detailed spreadsheet used by auditors on-site, but rather the higher-level categories. Keep in mind that this checklist was developed to address the specific needs of Primavera and the objectives of the audit. The terms used below along with the referenced tools are also specific to Primavera's use of agile practices.

Requirements management
• Primavera PM used to track requirements
• Collaboration with Product Owner
• Requirements estimated using accepted techniques (Ideal Team Days)
• Sprint 0 to refine estimates
• Sprint -1 to determine initial requirement estimates

Design and implementation
• Feature specifications (created as design documents and updated "as built" when requirement is completed)
• Code tested and passes unit tests prior to check-in
• Formal code reviews requested by programming manager for appropriately complex areas
• Peer or buddy review of code for code check-ins
• Requesting schema changes through schema change process
• Technical designs where appropriate
• Designs and coding consider internationalization
• Online help and printed manuals updated for new requirements

Configuration management
• Builds automated
• Builds automatically deployed twice daily
• Automated process replicates all client server, Web and Group Server code to all four ClearCase servers
• After a compile of the merged code, JUnit and FitNesse tests are run. In the event of a failure, an e-mail notification is sent and the merge is rejected and deferred to the next day.
• FitNesse tests run automatically with daily builds

Testing
• Acceptance tests cover requirements and are automated
• Test procedures documented in Mercury Quality Center
• Developer unit tests written
• Automated Silk tests run to validate code integration
• Test cases and test results can be traced to requirements
• Internationalization cases considered during testing
• Performance testing
• Test results records stored
• Test cases peer reviewed
• Active system tests conducted

Defect tracking
• In-process defects "scrummed" appropriately
• Defect reporting using Mercury Quality Center
• Defect threshold counts used for sprint entry/exit criteria

Release management
• Sprint review meetings held
• Release management team reviews
• Sprint retrospectives
• Sprint closeout processes (backlogs updated)
• Tracking progress with burn-downs
• Attend Scrum Master meetings
• Daily team meetings
• Product Owner sets sprint priorities
• Co-located teams
• Obstacles removed daily
• Sprint planning meetings
• Task granularity (e.g., no more than 16 hrs.)

Performing the Audit
It's worth noting that, except for the largest organizations, it's not necessary or desirable to employ specific auditors. It's best to choose auditors from within the team and across disciplines. In fact, there are good arguments for cycling the auditing team so many resources get the opportunity to interact with their peers and view the quality process from a different perspective. Auditor training is recommended so that the results can be as normalized as possible.
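The four-level scheme lends itself to simple aggregation, and a weighting factor for fundamental criteria is a natural extension. The sketch below is illustrative only: `category_score`, `VALID_SCORES` and the example scores and weights are hypothetical, not Primavera's actual 43-point checklist or tooling.

```python
# Illustrative audit scoring using the article's four-level scheme,
# with an optional weight per criterion so that fundamental practices
# count more heavily toward the category score.
VALID_SCORES = {1.00, 0.75, 0.50, 0.0}

def category_score(scores, weights=None):
    """Weighted average of per-criterion scores for one audit category."""
    if weights is None:
        weights = [1.0] * len(scores)  # unweighted: every criterion equal
    for s in scores:
        if s not in VALID_SCORES:
            raise ValueError("invalid score: %r" % s)
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)

# Example: four criteria in a testing category; the first (automated
# acceptance tests, say) is weighted double as a fundamental practice.
scores = [1.00, 0.75, 0.50, 1.00]
weights = [2.0, 1.0, 1.0, 1.0]
print(round(category_score(scores, weights), 3))  # 0.85
```

Restricting inputs to the four defined levels keeps auditors honest about the scheme, while the weights make shortfalls in fundamental areas more visible in the aggregate number.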
The mantra for the audit was not only "tell me," but "show me." Listening to individuals describe the process and what was done is important, but so is viewing the actual artifacts. Since a quality process is interconnected across the development life cycle, time was spent looking at related, adjacent links to other areas of the process.
It was important to set and then manage expectations when reviewing people's work. Reminding everyone that the focus was on improvement helped us achieve a balanced auditor/auditee environment.
Making sure that all participants gain value from the experience is also important. Everyone involved learned something new about the processes in use, the rationale behind them and how providing traceability adds value. Being able to take a feature and trace it back through test results, automation, unit tests and requirements makes obtaining a detailed understanding of a feature much easier than walking in cold.
Teams shouldn't expect to achieve a perfect score, so this is another area that requires expectation management. In Primavera's case, the teams fared very well. Most of the shortcomings were not unexpected, since the on- and offshore teams worked closely together on a daily basis. Interestingly, several valuable insights into other …

Wrapping Up the Audit
Audit findings need to be documented and problems reported for further action. A date should be established for the correction, and the next audit should ensure that issues were remedied. The audit documentation doesn't need to be complicated, but should include the audit plan, the audit notes and the audit report. The notes are the items the auditor wrote down during the audit, and can include specific findings, responses to questions, key documents reviewed and comments.
The audit report is the "official" document used to report the findings of the audit. A template for this document should be prepared by the audit team, amended as necessary and consistently used by all auditors. The document should include details of the audit, date, auditors' names and findings. Once agreed to, the audit report should include the remediation plan.
The process audit is concerned with both the validity and overall reliability of the process. For example, does the process consistently produce the intended results? It's important to identify non-value-added steps that the team may have added to the process. Once identified, … remediation plan with the same team that met to kick off the audit process; this included stakeholders and management. As in the on-site review, each area that didn't score a 1 was discussed.

Lessons Learned
Setting the stage for the audit as an in-depth retrospective aligned with the agile goal of continuous improvement helped Primavera secure individual buy-in for the audit.
Since it seemed that people weren't prepared for the "show me" mentality employed during the interview process, better communication and expectation setting with the team makes sense. Discussing the style to be used during the audit also helps make those involved more comfortable.
The scoring system is a work in progress and will be improved over time. It remains subjective, which is acceptable, but the addition of a weighting system is under consideration.
The use of weighting for areas that are fundamental to the development process will provide better results. Since each area of the quality management process isn't equal in importance, weighting will expose those areas of concern more visibly.
The bottom line is that the entire development organization now realizes …
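The report structure described above (details of the audit, date, auditors' names, findings and, once agreed to, the remediation plan) is easy to capture in a shared template. The sketch below is illustrative only; the `AuditReport` class and its field names are hypothetical, not an actual Primavera document.

```python
from dataclasses import dataclass, field

# Illustrative audit-report template mirroring the structure described
# in the article. Field names and sample values are hypothetical.
@dataclass
class AuditReport:
    audit_details: str           # scope of the audit
    date: str                    # when the audit was performed
    auditors: list               # auditors' names
    findings: list = field(default_factory=list)
    remediation_plan: str = ""   # added once the findings are agreed to

report = AuditReport(
    audit_details="Offshore team project audit, 43-point checklist",
    date="2008-01-15",
    auditors=["auditor A", "auditor B"],
)
report.findings.append("Test cases not consistently peer reviewed")
report.remediation_plan = "Peer review all test cases; verify at next audit"
print(len(report.findings))
```

Keeping the template in one agreed form, amended only by the audit team, is what lets results from different auditors be compared from one audit to the next.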
…ion that typifies most successful techies, Savoia says we're all constantly faced by a more or less binary set of choices about our lives that can be described by the equation X + Y = total well-being, financial and otherwise. X, usually a responsible position at an established company, is associated with guaranteed income. Y is associated with choice imbued with those elusive qualities of emotional engagement, satisfaction of intellectual curiosity and overall happiness. Once financial security is achieved, he continues, it always makes sense to pursue Y when confronted with an either-or choice about what to do next.
The logic, which is unassailable, is not what's interesting here. Rather, it's that for Savoia, happiness is more about mucking about testing lines of code than, say, fishing in Montana, going to cooking school in the south of France, or in my case, buying courtside seats and losing myself in Big 10 basketball … ting the jackpot with a best-selling book and then opting to stay at the rewrite desk for years thereafter—for fun. I've interviewed enough developers to suspect that even among the I-dream-in-code crowd, Savoia is an outlier.

Limited Commercial Appeal
More concrete reason for skepticism comes from several of the other interviewees for this column, including one at industry giant Cisco Systems.
"Right now Cisco is in a latency curve with unit testing," says Andy Chessin, a technical leader at the San Jose, Calif.-based company. "Unit testing has a huge barrier to entry. If the group really isn't passionate about it, doesn't understand it or doesn't have the time to get started with it, they probably won't be driven to take the plunge."
"Right now, few people really understand what [API-level] unit testing involves or how to get started with it," he …
"Simplicity is essential to a core unit testing framework; for this reason, the most successful unit test frameworks tend to stay small, in code size and in team size," says David Saff, an MIT doctoral student who is now one of the lead maintainers of JUnit. "In comparison to other successful open source projects, this promotes a proliferation of third-party extensions, while limiting contributions to the core framework. I think this somewhat limits the chance for a company to arise that would do the equivalent for, say, JUnit, that Covalent has done for Apache."
In other words, unit testing is destined to remain by and for a modest group of undoubtedly smart specialists. So while the worlds of writing and programming will always have a small priesthood obsessed with quality and constant revision, most of the rest of us will continue to muddle along as best we can—at least until we can hang up …
Get Development In Sync With QA
Adam Kolawa

Development is writing code for the next release of the application. QA has a regression test suite that tests the previous release, but they're waiting for the end of the current development iteration before they start to update the test suite to cover the new and modified functionality. As a result, the code base is evolving significantly during the development phase, but the regression test suite is not… so by the time that QA receives the code from development, the code and the regression test suite are totally out of sync.
Once QA has the new version in hand, they try to run the old regression test suite against the new version of the application. It runs, but an overwhelming number of test case failures are reported because the code has changed so much.
At that point, QA often thinks, "Instead of trying to modify these test cases or test scripts for the new version, we might as well go ahead and test it by hand because it's the same amount of work, and even if I update it now, I'll still have to update it all over again for the next version." So they end up testing by hand, and typically come to the conclusion that automation is overrated.
That's how automation goes to hell in QA.

Keep in Sync
As a result of the divide between QA and development, the regression test suite is treated as an afterthought. To keep the test suite in sync with the code, the team needs to treat the regression test suite like it's a key part of the application—the part of the application that verifies whether the implemented functionality actually works. This has a couple of implications. One is that team leaders must allocate sufficient budget and time for regression test suite development and maintenance. I've found that the best results are achieved when there's roughly a 50/50 distribution of effort between writing code that represents the functionality of the application and writing code that verifies that functionality.
The other implication is that the team needs to modify their workflow so that QA works in parallel with development: updating the regression test suite as the developers update the code base. To achieve this, QA must become more tightly integrated with development. To start, development has to build test cases as they write code. This means that the team needs to define and enforce a policy that every time development implements a feature or use case, they add a test case to check that it's functioning correctly.
The role of QA, then, is to review these test cases as soon as they're written, as part of the code review procedure. Their goal here is to verify whether the test case actually represents the use case that is implemented in the code. If QA and development work together in this manner, the test suite is constantly updated in sync with the application.
To keep this up, you need to ensure that the test suite is constantly tested against the application. This means it must become part of the nightly build and test process. Every night, after the application is built, the regression test suite executes. If test failures are reported, then the test suite might be growing out of sync with the application. Whenever that happens, the developers need to spend just a few minutes reviewing any test failures reported for their code, and then either updating the test cases (if the test failed because the functionality changed intentionally) or fixing the code (if the test failed because a modification introduced a defect).
In my humble opinion, this is the only way that QA can be automated.

Looking Back, Looking Ahead
Having been in this industry for 20 years now, I've witnessed many changes. Languages have come and gone, the level of programming abstraction has elevated, and development processes have grown increasingly compressed and iterative. Yet, from assembly code to SOA, one thing remains the same: the need for an effective, reliable way to determine if code changes negatively impact application functionality.
One way of helping organizations overcome this challenge is to provide technologies that enable development teams to quickly build a regression test suite that includes unit tests, static analysis tests, load tests and anything else that can be used to identify changes. Our goal here is to help teams identify and evaluate functionality changes with as little effort as possible so that keeping the test suite in sync with the application is not an overwhelming chore.
The other part is to optimize the development "production line" to support these efforts. This involves implementing infrastructure components (including source control systems, nightly build systems, bug tracking systems, requirements management systems, reporting systems and regression testing systems) to suit the organization's existing workflow, linking these components together to establish a fully automated building/testing/reporting process, and mentoring the organization on how to leverage this infrastructure for process improvement. The result is greater productivity and fewer software defects.

Dr. Adam Kolawa is founder and CEO of Parasoft.