VOLUME 5 • JANUARY 2008 • $8.95 • www.stpmag.com
VOLUME 5 • ISSUE 1 • JANUARY 2008

Contents

12 COVER STORY
Lights, Camera, ALM 2.0!
When it comes to life cycle management, with its newly automatic synchronization of metadata and development artifacts, ALM 2.0 is already a star—and the tester is the director. Roll ’em! By Brian Carroll
EDITORIAL
Editor: Edward J. Correia, +1-631-421-4158 x100, ecorreia@bzmedia.com
Editorial Director: Alan Zeichick, +1-650-359-4763, alan@bzmedia.com

ART & PRODUCTION
Art Director: LuAnn T. Palazzo, lpalazzo@bzmedia.com
Art/Production Assistant: Erin Broadhurst, ebroadhurst@bzmedia.com

SALES & MARKETING
Publisher: Ted Bahr, +1-631-421-4158 x101, ted@bzmedia.com
Associate Publisher: David Karp, +1-631-421-4158 x102, dkarp@bzmedia.com
Advertising Traffic: Phyllis Oakes, +1-631-421-4158 x115, poakes@bzmedia.com
Director of Marketing: Marilyn Daly, +1-631-421-4158 x118, mdaly@bzmedia.com
List Services: Lisa Fiske, +1-631-479-2977, lfiske@bzmedia.com
Reprints: Lisa Abelson, +1-516-379-7097, labelson@bzmedia.com
Accounting: Viena Ludewig, +1-631-421-4158 x110, vludewig@bzmedia.com

READER SERVICE
Director of Circulation: Agnes Vanek, +1-631-443-4158, avanek@bzmedia.com
Customer Service/Subscriptions: +1-847-763-9692, stpmag@halldata.com

Cover Illustration by P. Avlen

BZ Media LLC
President: Ted Bahr
Executive Vice President: Alan Zeichick
7 High Street, Suite 407, Huntington, NY 11743
+1-631-421-4158, fax +1-631-421-4130
www.bzmedia.com, info@bzmedia.com

Software Test & Performance (ISSN #1548-3460) is published monthly by BZ Media LLC, 7 High Street, Suite 407, Huntington, NY 11743. Periodicals postage paid at Huntington, NY, and additional offices. Software Test & Performance is a registered trademark of BZ Media LLC. All contents copyrighted 2008 BZ Media LLC. All rights reserved. The price of a one-year subscription is US $49.95, $69.95 in Canada, $99.95 elsewhere. POSTMASTER: Send changes of address to Software Test & Performance, PO Box 2169, Skokie, IL 60076. Subscriber services may be reached at stpmag@halldata.com or by phone at +1-847-763-9692.

Using Software To Test Software
By Edward J. Correia

Happy New Year! As we enter 2008, it’s time once again to contemplate our accomplishments of the year gone by, to consider our goals for the year ahead and, for some, to ponder the many paradoxes that vex our existence.

Why, for instance, do we build software to test other software? This question has never before occurred to me, nor does it parallel such mysteries as people who are financially wealthy but short on values. But it does bear some discussion.

The idea was brought to me by testing consultant Elfriede Dustin, who credits a conference-goer with reminding her of a concept she had pondered many times before. So why is it that we develop software to test software?

The practice of automating software testing itself involves a software development life cycle, complete with its own set of requirements, a design, and the actual development and testing. And as Dustin points out, the major advances in testing tools since the 1990s include the ability to recognize object properties beyond their x,y coordinates. This has made automation more viable because scripts can be more useful and less fragile.

The open source community also has emerged in the last 15 years as a prolific source of high-quality test automation tools. As evidence, consider the FitNesse (fitnesse.org) acceptance testing framework; Watir (wtr.rubyforge.org), the Ruby-based automated browser testing libraries; and Python and Perl. Just last month, this magazine ran an excellent tutorial on building your own XML-based test automation framework. Other open source test automation frameworks are available, such as STAF/STAX, which provides useful services …

The whole issue reminds me of an absurdity I came upon while working as a support technician for Windows magazine in the 1990s. We were gathering requirements for the editorial and production network from the staff, which insisted on using Windows-based hardware and software to publish the magazine. “We’re a magazine about Windows, and we need to be published on Windows” was the philosophy.

And while that kind of idealism might have looked good on the pages of the magazine, the state of desktop publishing on the Windows platform at the time was immature, to be generous. Their operating system was Windows for Workgroups, the first such installation in the company.

My belief at the time was the same as it is today: The magazine should have used the best tool available at the time, regardless of its content. The parent company, now called CMP Media, made its bones publishing dozens of periodicals, which at one time all used a mainframe-style publishing system called Atex. And not a single one of its publications was about Atex.

Why? Because one has nothing to do with the other. “We want to eat our own dog food,” they might have said. And for Windows magazine to use Macintosh computers was unthinkable. “Ridiculous,” I would have said (and probably did, in private). I lost that battle and would ultimately not be involved in the deployment. It was just as well, because the team struggled mightily.

Software is very good at automating things. So when automated testing is the need, why not use the best tool for the job? For the practice of automating software testing, the best tool happens to be software.
YOGANANDA JEPPU and AMBALAL PATEL are scientists at IFCS Aeronautical Development Agency in Bangalore, India. Beginning on page 25, the colleagues take a Shakespearean approach to the reduction of defects to help keep your project from becoming a Tragedy of Errors. Yogananda has many published works, mostly relating to real-time systems test methodologies for performance and quality assurance in aeronautics-industry control systems. In his current post since 2000, Ambalal holds degrees in mechanical engineering and instrumentation and a Ph.D. in fuzzy control systems from the Indian Institute of Technology, Kharagpur.
… deploy a build into staging or production once that deployment is approved. Think of the automation of ALM as replacing the “cut and paste” that occurs today to keep development tools in sync, though the newer spec adds a lot more than that. ALM 2.0 helps all the stakeholders in the software development process communicate and collaborate more efficiently. ALM frameworks provide the glue that integrates stakeholders and their development tools across the life cycle, and open opportunities for process improvement and quality improvement through the use of better tools. For more on ALM 2.0, see the sidebar “ALM’s Second Coming.”

Eclipse Already There

The Eclipse development community is often quick to exploit new specifications or development ideas. For example, the Eclipse Application Lifecycle Framework (ALF) project (www.eclipse.org/alf) already implements an open source ALM 2.0 framework. Though still in the incubation phase, ALF provides event-driven tool integration and orchestration using standards such as Web services and BPEL, the OASIS Web Services Business Process Execution Language. Tools emit an event when something changes within a tool that may affect related data in another tool (for example, once a build has been completed or a bug has been approved to be fixed in the current release).

The events go through the ALF Event Manager, which filters the events of interest and routes them to the appropriate BPEL process. A BPEL process is like a flowchart that indicates the sequence and data to be passed among tools to keep them in sync. ALF also contains a standards-… artifacts stored within different tools.

Finally, ALF provides a set of best practices for tool integration and vocabularies. Vocabularies define the core data structures, events and services that tools should expose in different stages of the life cycle. An ALF vocabulary describes the essential events and services for configuration management, requirements, testing and so on. Vocabularies make it simpler to define orchestrations and to substitute development tools with minimal disruption to the tool integrations. However, ALF also will work if tools do not support the vocabularies.

Based on standards and implemented as open source, ALF eliminates the nightmare that happens when an organization upgrades a tool in its development stack and the proprietary point-to-point integrations break. This can often bring development to a grinding halt. Not only is the underlying source code of the interoperability platform available to the shop, but the logic of the integrations is expressed in the high-level standard workflow language of BPEL. The development team (most likely the configuration management team) can make the fixes themselves without having to wait for the vendor to get a patch into the next maintenance release cycle.

Brian Carroll is lead developer on the Eclipse Application Lifecycle Framework (ALF) project and a Serena Fellow.

ALM on Testing

Another positive byproduct of ALM 2.0 is that it will make testers’ lives easier.
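The event flow just described (a tool emits an event, an event manager filters it and routes it to the appropriate process) can be sketched in a few lines of Python. This is an illustrative stand-in for what a BPEL process would do in ALF, not ALF code: the class, event fields and handler are all invented.

```python
# Minimal sketch of event-manager-style routing: tools emit events, a
# manager filters the ones of interest and routes them to a handler.
# All names here are illustrative, not ALF or BPEL APIs.

class EventManager:
    def __init__(self):
        self.routes = []  # list of (predicate, handler) pairs

    def subscribe(self, predicate, handler):
        self.routes.append((predicate, handler))

    def emit(self, event):
        # Route the event to every handler whose filter matches it.
        for predicate, handler in self.routes:
            if predicate(event):
                handler(event)

manager = EventManager()
log = []

# "Process": when a build completes, tell the test tool to run its suite.
manager.subscribe(
    lambda e: e["type"] == "build.completed",
    lambda e: log.append(f"run tests for build {e['build_id']}"),
)

manager.emit({"type": "build.completed", "build_id": 42})
manager.emit({"type": "bug.approved", "bug_id": 7})  # no route: filtered out
```

In ALF itself the routing logic would live in BPEL rather than application code, but the filter-then-route shape is the same.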
Director’s Notes

1. If you don’t already, it is critical that the QA team participate in the design of your shop’s ALM solution and process. Otherwise, you could be automating a flawed process, such as one that doesn’t distinguish between changes that need human review and those that can be propagated automatically.

2. ALM 2.0 can automate changes among tools to increase efficiency, but the QA team must monitor the execution links among tools to ensure they make sense.

3. Because ALM 2.0 can automate sets of tools, consider incorporating tools, such as generic code quality scanners and security vulnerability tools, that you may not have had the resources to support previously.

4. Test the process as well as the product. Make sure the integrations work as planned. Don’t be like the masons who build walls using a crooked level; ensure that your orchestrations behave as they’re supposed to, giving special attention to error handling and edge conditions.

5. Learn the capabilities that SOA technologies provide. Start by learning SOA concepts, then BPEL. Being able to conceive of the “killer integration” for your development practices will make your job easier, and perhaps even garner some recognition.

6. As with the process, start by incorporating simple measurements into your ALM solution, knowing which changes help and which hurt your development process; then incorporate full-blown initiatives such as PPM to improve the process.

7. Consider applying Complex Event Processing to improve the quality of the development process and track how it improves.

8. Don’t get stuck in a mix of great and not-so-great tools. ALM 2.0 makes switching tools easier. Be wary of vendors that try to lock you into their platform with proprietary “best-of-breed” suites.
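Exercising an orchestration’s error handling, not just its happy path, might look like the following sketch; the orchestration runner and its failing step are invented for illustration.

```python
# Sketch: verify that an orchestration reports a step failure instead of
# silently continuing. The runner and steps are toy stand-ins.

def run_orchestration(steps):
    """Run (name, step) pairs in order; stop and report on first failure."""
    completed = []
    for name, step in steps:
        try:
            step()
            completed.append(name)
        except Exception as exc:
            return {"ok": False, "failed_step": name,
                    "error": str(exc), "completed": completed}
    return {"ok": True, "completed": completed}

def fetch_requirement():
    pass  # succeeds

def broken_sync():
    raise RuntimeError("defect tracker unreachable")  # simulated edge case

result = run_orchestration([("fetch", fetch_requirement),
                            ("sync", broken_sync)])
```

A test suite for the orchestration would assert on `result`, checking that the failure is surfaced with the right step name rather than swallowed.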
With the new spec, tool integration is based on the development process, such that updates to project requirements are synchronized across all the tools used in the development cycle. So when a requirement is deferred to a later release, the test management tool will have been automatically notified to disable tests related to that requirement. The expected improvement in system quality resulting from that synchronization is what sells the technology to development managers.

ALM 2.0 can eliminate the rote work needed to synchronize data across tools, but tool integrations are generally driven by mechanical transformation rules and don’t yet understand the semantics (or meaning) of development artifacts. Perhaps such understanding will come in ALM 3.0. For now, the current spec propagates changes (for example, a requirement is deleted) once those links are established, but if there are changes to the wording of a requirement that subtly change its meaning, the automated transformation rules won’t catch that change. So the human intelligence of a QA team is still needed to interpret a requirement and determine which tests are appropriate to verify the requirement has been satisfied (Director’s Note 1).

Certain changes in requirements or bug reports will require changes in application code and the tests themselves. Improved communication between tools—and therefore all the roles in development—may make metadata more accessible to testers, but crafting a flexible test and identifying edge conditions to be tested still requires the skills of a journeyman tester.

However, with ALM 2.0, the QA department will have more time to design and craft tests, and spend less time on creating custom automations and reporting on the tests. ALM integrations will perform those tasks (Director’s Note 2).

ALM also impacts how integrations among tools are expressed. Today, scripting with command-line interfaces is the common way to tie together a sequence of tools. However, it’s difficult to integrate tools that run on different platforms (for example, Linux or the mainframe), or expose their interfaces over the Web rather than the command line. With ALM 2.0, BPEL can sequence and integrate tools that run on different platforms and that don’t expose command-line interfaces at all. And the orchestration can be expressed using a BPML or BPEL editor rather than as script code using a text editor.

Unit and Component Testing

The development of unit tests will be largely unaffected by ALM 2.0. However, unit tests will be incorporated into much larger sequences of testing as part of the continuous integration movement. Why run just unit tests continuously when you can incorporate code scanning, security scanning and other types of quality assurance into the process?

An early ALF prototype incorporated security scanning with traditional test management, and combined the reporting from both into a single “Deploy to Production” report. It has long been a strange irony of our industry that the automated test suite is run manually. With ALM 2.0, the automated test suite can run automatically after, say, a successful build (Director’s Note 3).

Integration vs. Application Testing

While the focus of testing will be on the application delivered to the end user, the development process and its integrations will become more … Web services. That means that the WSDL describes the events and services of tool interfaces, and BPEL is used to describe process automations. And why not? Software development has been building SOA-based systems for the business; why not apply SOA to improve the way software development operates? Shouldn’t the shoemaker’s children have shoes? (Director’s Note 5)

Use Complex Event Processing

Prior to ALM 2.0, logged information … presentation by QA and development managers (Director’s Note 6).

ALF also provides some vocabularies—and you can participate in developing new vocabularies for the tools you use, or try your hand at extending some existing ones. Download ALF from www.eclipse.org/alf and get a head start on improving your development organization’s quality and productivity today.

ALM 2.0: The Second Coming

The following is an excerpt from Carey Schwaber’s August 2006 Forrester report that introduced the term ALM 2.0.

Tomorrow’s ALM is a platform for the coordination and management of development activities, not a collection of life-cycle tools with locked-in and limited ALM features. These platforms are the result of purposeful design rather than rework following acquisitions. The architectural ingredients of ALM 2.0 are:

• Practitioner tools assembled out of plug-ins. An à la carte approach to product packaging provides customers with simpler, cheaper tools. Far from being a pipe dream, this approach is a reality today. IBM has done the most to exploit this concept, currently providing many different grades of development and modeling tools that are all available as perspectives in Eclipse, as well as the ability to install only selected feature packs in each of these tools. This approach has not yet been successfully applied outside of development and modeling tools. For example, today, customers must choose between defect management that’s too tightly coupled with test management and software configuration management (SCM) or defect management in a stand-alone tool.

• Common services available across practitioner tools. Vendors are identifying features that should be available from within multiple practitioner tools—notably, collaboration, workflow, security, and reporting and analytics—and driving them into the ALM platform. Telelogic has started to make progress on this front for administrative functionality like licensing and installation. Microsoft has gone even further: Visual Studio Team System leverages SharePoint Server for collaboration and Active Directory for authentication, and because it uses SQL Server as its data store, it can leverage SQL Server Analysis Services and SQL Server Report Builder for reporting and analytics.

• Repository neutrality. At the center of ALM 2.0 sits not one repository, but many. Instead of requiring use of the vendor’s SCM solution for storage of all life cycle assets, tomorrow’s ALM will be truly repository-neutral, with close to functional parity, no matter where assets reside. IBM, for example, has announced that in coming years, its ALM solution will integrate with a wide variety of repositories, including open source version-control tools like Concurrent Versions System (CVS) and Subversion. This will drive down ALM implementation costs by removing the need to migrate assets—a major obstacle for many shops—and will bring development on mainframe, midrange, and distributed platforms into the same ALM fold.

• Use of open integration standards. Two means of integration—use of Web services APIs and use of industry standards for integration—will ease and deepen integration between a single vendor’s tools, as well as between its tools and third-party tools. Many vendors still don’t offer Web-services-based APIs, but this will change with time. In addition, new standards for life-cycle integration, including Eclipse projects like the Test and Performance Tools Project (TPTP) and Mylar, promise to simplify tools integration. One case in point: SPI Dynamics reports that its integration with IBM Rational ClearQuest Test Manager took one-third of the time it would have taken if both tools didn’t leverage TPTP.

• Microprocesses and macroprocesses governed by externalized workflow. The ability to create and manage executable application development process descriptions is one of the big wins for ALM 2.0. When processes are stored in readable formats like XML files, they can be versioned, audited and reported upon. This facilitates incremental process improvement efforts and the application of common process components across otherwise discrete processes. For example, Microsoft Visual Studio Team System process templates are implemented in XML and contain work-item-type definitions, permissions, project structure, a project portal and a version control structure.

There is no solution on the market that possesses all of these characteristics, but this is the direction in which most vendors are moving. However, it will be at least two years before any vendor offers a solution that truly fulfills the vision of ALM 2.0.

Source: Forrester Research, Inc.
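Complex Event Processing looks for patterns across many logged events rather than at single log entries. A toy rule in that spirit, with invented event shapes and thresholds, might look like this:

```python
# Toy complex-event-processing rule: flag a streak when three builds fail
# within a five-build window. Event shape and thresholds are illustrative.

from collections import deque

def failing_streaks(events, window=5, threshold=3):
    recent = deque(maxlen=window)  # sliding window of recent events
    alerts = []
    for event in events:
        recent.append(event)
        failures = [e for e in recent if e["status"] == "failed"]
        if len(failures) >= threshold:
            alerts.append(f"{len(failures)} failures in last {len(recent)} builds")
            recent.clear()  # avoid re-alerting on the same cluster
    return alerts

events = [{"status": s} for s in
          ["ok", "failed", "failed", "ok", "failed", "ok", "ok"]]
```

A real CEP engine would correlate events from many tools and streams; the point here is only that the pattern, not any single event, triggers the alert.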
time. For example, throughput can be defined for a typical hour, peak hour and non-peak hour for each particular kind of load. In some cases, you’ll need to further detail what the load is hour-by-hour.

The number of users doesn’t, by itself, define throughput. Without defining what each user is doing and how intensely (i.e., throughput for one user), the number of users doesn’t make much sense as a measure of load. For example, if 500 users are each running one short query each minute, we have a throughput of 30,000 queries per hour. If the same 500 users are running the same queries, but only one query per hour, the throughput is 500 queries per hour. So there may be the same 500 users, but a 60X difference between loads (and at least the same difference in hardware requirements for the application—probably more, considering that not many systems achieve linear scalability).

Response Times: Review of Research

As long ago as 1968, Robert B. Miller’s paper “Response Time in Man-Computer Conversational Transactions” described three threshold levels of human attention [1]. J. Nielsen believes that Miller’s guidelines are fundamental for human-computer interaction, so they are still valid and not likely to change with whatever technology comes next [2]. These three thresholds are:

• Users view response time as instantaneous (0.1-0.2 second): They feel that they directly manipulate objects in the user interface; for example, the time from the moment the user selects a column in a table until that column highlights, or the time between typing a symbol and its appearance on the screen. Miller reported that threshold as 0.1 seconds. According to P. Bickford, 0.2 second forms the mental boundary between events that seem to happen together and those that appear as echoes of each other [3]. Although it’s quite an important threshold, it’s often beyond the reach of application developers. That kind of interaction is provided by the operating system, browser or interface libraries, and usually happens on the client side without interaction with servers (except for dumb terminals, which are rather an exception for business systems today).

• Users feel they are interacting freely with the information (1-5 seconds): They notice the delay, but feel the computer is “working” on the command. The user’s flow of thought stays uninterrupted. Miller reported this threshold as one second. Using the research that was available to them, several authors recommended that the computer should respond to users within two seconds [1, 4, 5]. Another research team reported that with most data entry tasks, there was no advantage to having response times faster than one second, and found a linear decrease in productivity with slower response times (from one to five seconds) [6]. With problem-solving tasks, which are more like Web interaction tasks, they found no reliable effect on user productivity up to a five-second delay.

The complexity of the user interface and the number of elements on the screen both impact thresholds. Back in the 1960s through 1980s, the terminal interface was rather simple, and a typical task was data entry, often one element at a time. Most earlier researchers reported that one to two seconds was the threshold to keep maximal productivity. Modern complex user interfaces with many elements may have higher response times without adversely impacting user productivity. According to Scott Barber, even users who are accustomed to a sub-second response time on a client/server system are happy with a three-second response time from a Web-based application [7].

P. Sevcik identified two key factors impacting this threshold [8]: the number of elements viewed and the repetitiveness of the task. The number of elements viewed is the number of items, fields, paragraphs, etc. that the user looks at. The amount of time the user is willing to wait appears to be a function of the perceived complexity of the request. Users also interact with applications at a certain pace depending on how repetitive each task is. Some tasks are highly repetitive; others require the user to think and make choices before proceeding to the next screen. The more repetitive the task, the better the expected response time.

That is the threshold that gives us response-time usability goals for most user-interactive applications. Response times above this threshold degrade productivity. Exact numbers depend on many difficult-to-formalize factors, such as the number and types of elements viewed or the repetitiveness of the task, but a goal of three to five seconds is reasonable for most typical business applications.

• Users are focused on the dialog (8+ seconds): They keep their attention on the task. Miller reported this threshold as 10 seconds. Anything slower needs a proper user interface (for example, a percent-done indicator as well as a clear way for the user to interrupt the operation). Users will probably need to reorient themselves when they return to the task after a delay above this threshold, so productivity suffers.

A Closer Look at User Reactions

Peter Bickford investigated user reactions when, after 27 almost instantaneous responses, there was a two-minute wait loop for the 28th repetition of the same operation. It took only 8.5 seconds for half the subjects to either walk out or hit the reboot. Switching to a watch cursor during the wait delayed the subjects’ departure for about 20 seconds. An animated watch cursor was good for more than a minute, and a progress bar kept users waiting until the end.
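The attention thresholds reviewed above can be read as a simple classifier for measured response times. The band labels below are paraphrases of the descriptions in the text, not standard terminology, and the cut-offs are the article’s round numbers:

```python
# Classify a measured response time against the attention thresholds
# discussed above (0.1-0.2 s, 1-5 s, 8+ s). Labels are paraphrases.

def attention_band(seconds):
    if seconds <= 0.2:
        return "instantaneous"        # direct-manipulation feel
    if seconds <= 5.0:
        return "interacting freely"   # flow of thought uninterrupted
    if seconds < 8.0:
        return "degrading"            # between thresholds; productivity drops
    return "focus on the dialog"      # needs progress indicator and cancel

samples = [0.15, 2.0, 6.5, 12.0]
bands = [attention_band(s) for s in samples]
```

Such a classifier is useful when post-processing load-test results: instead of a single average, you can report how many transactions landed in each band.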
Bickford’s results were widely used for setting response time requirements for Web applications. C. Loosley, for example, wrote: “In 1997, Peter Bickford’s landmark paper, ‘Worth the Wait?’ reported research in which half the users abandoned Web pages after a wait of 8.5 seconds. Bickford’s paper was quoted whenever Web site performance was discussed, and the ‘eight-second rule’ soon took on a life of its own as a universal rule of Web site design.”

A. Bouch attempted to identify how long users would wait for pages to load [10]. Users were presented with Web pages that had predetermined delays ranging from two to 73 seconds. While performing the task, users rated the latency (delay) for each page they accessed as good, average or poor. Latency was defined as the delay between a request for a Web page and the moment when the page was fully rendered. The Bouch team reported the following ratings:

Good: up to 5 seconds
Average: from 6 to 10 seconds
Poor: more than 10 seconds

In a second study, when users experienced a page-loading delay that was unacceptable, they pressed a button labeled “Increase Quality.” The overall average time before pressing the “Increase Quality” button was 8.6 seconds.

In a third study, the Web pages loaded incrementally, with the banner first, text next and graphics last. Under these conditions, users were much more tolerant of longer latencies. The test subjects rated the delay as “good” with latencies up to 39 seconds, and “poor” for those of more than 56 seconds.

This is the threshold that gives us response-time usability requirements for most user-interactive applications. Response times above this threshold cause users to lose focus and lead to frustration. Exact numbers vary significantly depending on the interface used, but it looks like response time should not be more than eight to 10 seconds in most cases. Still, the threshold shouldn’t be applied blindly; in many cases, significantly higher response times may be acceptable when an appropriate user interface is implemented to alleviate the problem.

Not-So-Traditional Performance Requirements

While they’re considered traditional and absolutely necessary for some kinds of systems and environments, some requirements are often missed or not elaborated enough for interactive distributed systems.

Concurrency is the number of simultaneous users or threads. It’s important: Connected but inactive users still hold some resources. For example, the requirement may be to support up to 300 active users, but the terminology used to describe the number of users is somewhat vague. Typically, three metrics are used:

• Total or named users: All registered or potential users. This is a metric of the data the system works with. It also indicates the upper potential limit of concurrency.

• Active or concurrent users: Users logged in at a specific moment in time. This is the real measure of concurrency in the sense it’s used here.

• Really concurrent users: Users actually running requests at the same time. While that metric looks appealing and is used quite often, it’s almost impossible to measure and rather confusing: the number of “really concurrent” requests depends on the processing time for the request. For example, let’s assume that we got a requirement to support up to 20 “concurrent” users. If one request takes 10 seconds, 20 “concurrent” requests mean a throughput of 120 requests per minute. But here we get an absurd situation: if we improve processing time from 10 seconds to one second and keep the same throughput, we miss our requirement because we have only two “concurrent” users. To support 20 “concurrent” users with a one-second response time, you really need to increase throughput 10 times, to 1,200 requests per minute.

It’s important to understand which users you’re discussing: The difference between each of these three metrics may be drastic for some systems. Of course, it depends heavily on the nature of the system.

Performance and Resource Utilization

The number of online users (the number of parallel sessions) looks like the best metric for concurrency (complementing throughput and response time requirements). Finding the number of concurrent users for a new system can be tricky, but information about real usage of similar systems can help to make the first estimate.

Resources. The amount of available hardware resources is usually a variable at the beginning of the design process. The main groups of resources are CPU, I/O, memory and network. When resource requirements are measured as resource utilization, they’re related to a particular hardware configuration. It’s a good metric when the hardware the system will run on is known. Often such requirements are part of a generic policy; for example, that CPU utilization should be below 70 percent. Such requirements won’t be very useful if the system deploys on different hardware configurations, and especially for “off-the-shelf” software.

When specified in absolute values, like the number of instructions to execute or the number of I/Os per transaction (as sometimes used, for example, for modeling), it may be considered a performance metric of the software itself, without binding it to a particular hardware configuration. In the mainframe world, MIPS was often used as a metric for CPU consumption, but I’m not aware of such a widely used metric in the distributed systems world.

The importance of resource-related requirements will increase again with the trends of virtualization and service-oriented architectures. When you depart from the “server(s) per application” model, it becomes difficult to specify requirements as resource utilization, as each application will add only incrementally to resource utilization for each service used.

Scalability is a system’s ability to meet the performance requirements as demand increases (usually by adding hardware). Scalability requirements may include demand projections such as an increasing number of users, transaction volumes, data sizes, or adding new workloads.

The Difference Between Goals and Requirements

One issue, as Barber notes, is goals versus requirements [11]. Most response time “requirements” (and sometimes other kinds of performance requirements) are goals (and sometimes even dreams), not requirements: something that we want to achieve, but missing them won’t necessarily prevent deploying the system.

You may have both goals and requirements for each of the performance metrics, but for some metrics and systems they are so close that, from the practical point of view, you can use one. Still, in many cases, especially for response times, there’s a big difference between goals and requirements (the point when stakeholders agree that the system can’t go …). … combination of circumstances. Instead, specify goals (making sure that they make sense) and only then, if they’re not met, make the decision about what to do with all the information available.

Knowing What Metrics to Use

Another question is how to specify response time requirements or goals. For example, such metrics as the average, the maximum, different kinds of percentiles, and the median can be used. Percentiles are more typical in SLAs (service-level agreements); for example, “99.5 percent of all transactions should have a response time of less than five seconds.” While that may be sufficient for most systems, it doesn’t answer all questions. What happens with the remaining 0.5 percent? Does this 0.5 percent of transactions finish in six to seven seconds, or do all of them time out? You may need to specify a combination of requirements: for example, 80 percent below four seconds, 99.5 …
requirements perspective,
scalability means that you
can complicate into production with
such performance).
percent below six seconds, 99.99 per-
cent below 15 seconds (especially if we
should specify perform- For many interactive know that the difference in perform-
ance requirements not
your process. Web applications, re- ance is defined by distribution of
only for one configura- sponse time goals are two underlying data). Other examples may
tion point, but as a func-
tion, for example, of load
or data.
For example, the
• to five seconds, and
requirements may be
somewhere between eight
seconds and one minute.
be average four seconds and max 12
seconds, or average four seconds and
99 percent below 10 seconds.
Things get more complicated when
requirement may be to One approach may be there are many different types of trans-
support throughput to define both goals and actions, but a combination of per-
increase from five to 10 transactions per requirements. The problem? Require- centile-based performance and avail-
second over the next two years, with ments are very difficult to get. Even if ability metrics usually works fine for
response time degradation not more stakeholders can define performance interactive systems. While more
than 10 percent. Most scalability requirements, quite often go/no-go sophisticated metrics may be necessary
requirements I’ve seen look like “to sup- decisions are based not on the real for some systems, in most cases sophis-
port throughput increase from five to requirements, but rather on second-tier tication can make the process over-
10 transactions per second over next goals. complicated and difficult to analyze.
two years without response time degra- In addition, using multiple per- There are efforts to make an
dation”—that’s possible only with addi- formance metrics that only together objective user-satisfaction metric. One
tion of hardware resources. provide the full picture can compli- is Application Performance Index
Other contexts. It’s very difficult to cate your process. For example, you (www.Apdex.org). Apdex is a single
consider performance (and, there- may state that you have a 10-second metric of user satisfaction with the per-
fore, performance requirements) requirement and you took 15 seconds formance of enterprise applications.
without context. It depends, for exam- under full load. But what if you know The Apdex metric is a number
ple, on hardware resources provided, that this full load is the high load on between 0 and 1, where 0 means that
the volume of data operated on and the busiest day of year, that the max no users were satisfied, and 1 means all
the functionality included in the sys- load for other days falls below 10 sec- users were satisfied.
tem. So if any of that information is onds, and you see that it is CPU-con- The approach introduces three
known, it should be specified in the re- strained and may be fixed by a hard- groups of users: satisfied, tolerating
quirements. ware upgrade? and frustrated. Two major parameters
While the hardware configuration Real response time requirements are introduced: threshold response
may be determined during the design are so environment- and business- times between satisfied and tolerating
stage, the volume of data to keep is dependent that for many applications, users T, and between tolerating and
usually determined by the business it’s cruel to force people to make hard frustrated users F 12. There probably is
and should be specified. decisions in advance for each possible a relationship between T and the
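A combined percentile requirement like the one above ("80 percent below four seconds, 99.5 percent below six seconds, 99.99 percent below 15 seconds") is easy to check mechanically against measured response times. A minimal sketch in Python; the function names and the nearest-rank percentile method are illustrative choices, not from the article:

```python
import math

def percentile(samples, pct):
    """Return the pct-th percentile using the nearest-rank method."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(len(ordered) * pct / 100))
    return ordered[rank - 1]

def meets_requirements(response_times, reqs):
    """reqs is a list of (percentile, max_seconds) pairs; True if all hold."""
    return all(percentile(response_times, p) <= limit for p, limit in reqs)

# The article's example combination of requirements:
reqs = [(80, 4.0), (99.5, 6.0), (99.99, 15.0)]
times = [1.2, 2.8, 3.1, 3.9, 4.5, 2.2, 1.7, 3.3, 5.8, 2.9]
print(meets_requirements(times, reqs))  # -> True
```

With only 10 samples the high percentiles all fall on the slowest observation; in practice you'd evaluate this over thousands of measured transactions.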
Determining specific performance requirements is another large topic that is difficult to formalize. Consider the approach suggested by Sevcik for finding T, the threshold between satisfied and tolerating users. T is the main parameter of the Apdex (Application Performance Index) methodology, providing a single metric of user satisfaction with the performance of enterprise applications. Sevcik defined 10 different methods (see Table 1).

[Table 1, listing Sevcik's methods, is not reproduced here; one example is to verify that the system's response times aren't worse than the response times of similar or competitor systems.]

The idea is to use several (say, three) of these methods for the same system. If all come to approximately the same number, they give us T. While the approach was developed for production monitoring, there is definitely a strong correlation between T and the response time goal (having all users satisfied sounds like a pretty good goal) and between F and the response time requirement. So the approach probably can be used for getting response time requirements with minimal modifications.

While some specific assumptions, like four seconds for the default or the F = 4T relationship, may be up for argument, the approach itself conveys the important message that there are many ways to determine a specific performance requirement, which, for validation purposes, is best derived from several sources. Depending on your system, you can determine which methods from the above list (or maybe some others) are applicable, calculate the metrics and determine your requirements.

…specific design, technology or usability requirement, thus limiting the number of available design choices. If we consider a Web system, for example, it's probably possible to squeeze all the information into a single page or have a sequence of two dozen screens. All information can be saved at once, or each page of these two dozen can be…

The third category, technological requirements, comes from the chosen design and used technology. Some technological requirements may be known from the beginning if some design elements are used, but others are derived from business and usability requirements throughout the design process and depend on…

Requirements Verification: Performance vs. Bugs
Requirement verification presents another subtle issue: how to differentiate performance issues from functional bugs exposed under load. Often, additional investigation is required before you can determine the cause of your observed results. Small anomalies from expected behavior are often signs of bigger problems, and you should at least figure out why you get them.

When 99 percent of your response times are three to five seconds (with the requirement of five seconds) and 1 percent of your response times are five to eight seconds, it usually isn't a problem. But it probably should be investigated if 1 percent fail or have strangely high response times (for example, more than 30 seconds, with 99 percent at three to five seconds) in an unrestricted, isolated test environment. This isn't due to some kind of arti-… become clear.

These two situations look similar, but are completely different in nature: 1.) The system is missing a requirement, but results are consistent: This is a business decision, such as a cost vs. response time tradeoff; and 2.) Results aren't consistent (while requirements can even be met): This may indicate a problem, but its scale isn't clear until investigated.

Unfortunately, this view is rarely shared by development teams too eager to finish the project, move it into production, and move on to the next project. Most developers aren't very excited by the prospect of debugging code for small memory leaks or hunting for a rare error that's difficult to reproduce. So the development team becomes very creative in finding "explanations." For example, growing memory and periodic long-running transactions in Java are often explained as a garbage collection issue. That's false in most cases. Even in the few instances when it is true, it makes sense to tune garbage collection and prove that the problem is gone.

Teams can also make fatal assumptions, such as thinking all is fine when the requirements stipulate that 99 percent of transactions should be below X seconds, and less than 1 percent of transactions fail in testing. Well, it doesn't look fine to me. It may be acceptable in production over time, considering network and hardware failures, OS crashes, etc. But if the performance test was run in a controlled environment and no hardware/OS failures were observed, it may be a bug. For example, it could be a functional problem for some combination of data.

When some transactions fail under load or have very long response times in testing, what if these few failed transactions are a view page for your largest customer, and you won't be able to create an order until it's fixed?

In functional testing, as soon as you find a problem, you usually can figure out how serious it is. This isn't the case for performance testing: Usually you have no idea what caused the observed symptoms or how serious it is, and quite often the original explanations turn out to be wrong. Michael Bolton described the situation concisely:13

As Richard Feynman said in his appendix to the Rogers Commission Report on the Challenger space shuttle accident, when something is not what the design expected, it's a warning that something is wrong. "The equipment is not operating as expected, and therefore there is a danger that it can operate with even wider deviations in this unexpected and not thoroughly understood way." When a system is in an unpredicted state, it's also in an unpredictable state.

Raising Performance Consciousness
We need to specify performance requirements at the beginning of any project for design and development (and, of course, reuse them during performance testing and production monitoring). While performance requirements are often not perfect, forcing stakeholders just to think about performance increases the chances of project success.

What exactly should be specified—goal vs. requirements (or both), average vs. X percentile vs. Apdex, etc.—depends on the system and environment, but all requirements should be both quantitative and measurable. Making requirements too complicated may hurt here. You need to find meaningful goals/requirements, not invent something just to satisfy a bureaucratic process.

If you define a performance goal as a point of reference, you can use it throughout the whole development cycle and testing process, tracking your progress from a performance engineering viewpoint. Tracing this metric in production will give you valu-
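The Apdex scoring described in this article (satisfied, tolerating and frustrated users, separated by thresholds T and F) reduces to a few lines of code. A minimal sketch, using the F = 4T default mentioned in the text; the function name is an illustrative choice:

```python
# Apdex = (satisfied + tolerating / 2) / total, yielding a score in [0, 1].
def apdex(response_times, t, f=None):
    f = 4 * t if f is None else f  # the F = 4T relationship from the text
    satisfied = sum(1 for r in response_times if r <= t)
    tolerating = sum(1 for r in response_times if t < r <= f)
    return (satisfied + tolerating / 2) / len(response_times)

# With T = 4s (so F = 16s): 6 satisfied, 2 tolerating, 2 frustrated
# out of 10 samples gives (6 + 2/2) / 10 = 0.7.
times = [1, 2, 3, 3, 4, 4, 6, 10, 20, 30]
print(apdex(times, t=4))
```

A score of 1 means every sampled response was at or below T; frustrated users (above F) contribute nothing to the score.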
If Shakespeare were to write a play about the field of safety-critical development, where we do…

…numerical limits will cause a test to fail.

The error bounds are generated taking into consideration the electronic noise in the sensor inputs and the hardware characteristics and actuator outputs. Here we consider all the noise in the electronic circuits from the sensor until the point where the software takes over at the input end. We also take into account the noise in the electronics from where the software generates the digital commands until the actuator begins to drive. The noise here is from the sensor electronics, the signal conditioners, the linear/rotary variable differential transformers, analog-to-digital converters and digital-to-analog converters. The effects of hardware characteristics such as offset, gain and biases come into the picture here.

If we inject a rate signal of 10.0 degrees per second, we're likely to get, say, 9.123, 11.843 or any random value between these two bounds. The noise is also dependent on the amplitude of the signal. Higher amplitude yields…

…of the bounds be optimal to catch bugs. This requires a tool tailor-made for the application.

A Comparison Tool
A specialized tool has been developed for the test activity. We call it the Evaluation Tool, or EVTOOL for short. It's a simple name for complex software. The EVTOOL block schematic is shown in Figure 2.

An input generator generates a set of values for all the sensors based on the… to 3g, etc. The set of inputs selected for a test is injected into a tolerance bound generator. Based on the particular hardware characteristics and the sensor noise, bias offset and gain, three values are generated representing the upper, nominal and lower limits of the sensor output.

The three values of each sensor variable are injected into the Control Law and Airdata module, which includes an algorithm provided by the designers. The Control Law and Airdata module defines the embedded controller functionality in the form of various interconnected paths, with signals getting added, subtracted, multiplied or divided, as the case may be. Each path has controller elements such as saturation limits, nonlinear blocks, gains and switches. Since we're testing the system in a static mode, we don't take into account the filters and rate limiters. They're considered as unity gains for constant inputs. Neither will we go into Laplace transforms, etc.
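The tolerance bound generator's job is to turn one nominal sensor value into lower/nominal/upper limits derived from noise, bias offset and gain. A simplified sketch follows; the specific noise model here is an illustrative assumption, since the real EVTOOL derives these numbers from measured hardware characteristics:

```python
# Illustrative tolerance bound generator: (lower, nominal, upper) from a
# nominal sensor value. The bias, gain-error and noise-floor figures are
# assumptions for illustration, not the article's actual EVTOOL parameters.
def tolerance_bounds(nominal, bias=0.05, gain_err=0.02, noise_floor=0.1):
    # Noise grows with signal amplitude, as the article notes.
    spread = abs(bias) + abs(nominal) * gain_err + noise_floor
    return (nominal - spread, nominal, nominal + spread)

# e.g., a 10.0 deg/s pitch-rate input expands to a three-value "witch" triple
lo, nom, hi = tolerance_bounds(10.0)
print(lo, nom, hi)
```

Each sensor variable in a test case would be expanded this way before being injected into the Control Law and Airdata module.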
FIG. 4: ADD OR SUBTRACT SIGNALS [figure: Signal A and Signal B feed a summing junction; Output Y = A+B]

Event triggers such as aircraft switch inputs are injected separately. The Control Law and Airdata system outputs are three values of the output variables that define the pass/fail bound. When the test case is executed on the Iron Bird, it should lie within the upper and lower bounds of these output values for the test case to pass. Any anomaly is automatically declared as a fail requiring further analysis.

The Three Witches
The Tolerance Bound Generator generates three values of the input signal. Let's say we want to test the controller with a pitch rate input of 10 deg/s. The output of the module would be something like 10.897, 10.0 and 9.456. Imagine these as the three witches in "Macbeth" who are flying through the Airdata System and Control Law modules (as in a game of Quidditch).

Now imagine the Control Law and Airdata module as a factory that bangs on the items passing through it or stretches them with tongs. As the witches fly through the long tube, they're battered around based on specific logic, and come out in a different size (see Figure 3). It's possible that the lower limit would come out as the upper limit. We'll discuss this in detail as we go along.

Do the Math
A drop of water in the breaking gulf
And take unmingled that same drop again
Without addition or diminishing.
—"Comedy of Errors"

In the Control Law, two signals can add up or subtract, as shown in Figure 4. The output is equal to the sum of signal A and signal B. The bounds for the output signals are computed from the bounds of the signals A and B. For example, say that signal A has three witches (components): aL the lower bound, a0 the nominal value and aU the upper bound. Similarly, the signal B has its bounds defined by [bL, b0, bU]. The output Y with its three components is computed by the following equations:

Y = [yL, y0, yU] = [(aL + bL), (a0 + b0), (aU + bU)] for addition Y = A+B

Y = [yL, y0, yU] = [(aL − bU), (a0 − b0), (aU − bL)] for subtraction Y = A−B

Please note here that for the difference of two signals (A−B), the upper limit of signal B is subtracted from the lower limit of signal A to give the lower limit of the output signal. These worst-case bounds, however, may give you a wider estimate of the bound, thus passing a case that should have failed. This occurs even more frequently in cases where random noise is present.

We find that using a root sum square (RSS) representation helps in these cases. For the case Y = A+B, where the variables are defined as above, the RSS bounds are defined as:

yL = y0 − √((a0 − aL)² + (b0 − bL)²)
y0 = a0 + b0
yU = y0 + √((aU − a0)² + (bU − b0)²)

In case the signals are subtracted, Y = A−B, you can change the sign of B to −B and interchange the upper and lower limits of B. A "wide band" or "narrow band" is the question; you'll have to decide which formula to use for your specific problem.

Two signals can divide or multiply; for example, Y = A/B or Y = A×B. In cases like this, a Kronecker tensor product is computed. Leopold Kronecker incidentally believed that "God made the integers, all else is the work of man." The product, a human work, is basically a combination of the upper, lower and nominal values of signals A and B, as given below:

Y = [yL, y0, yU] = [min(Z), {a0b0 or a0/b0}, max(Z)]

This is the absolute worst-case tolerance bound, which is quite wide.

Signal and Gain
The multiplication of a gain with a signal can be considered a multiplication of two signals (see Figure 5). This gives a wider bound. An RSS approach gives a better result in such cases. Let the signal X be defined by the three components, and the gain G be similarly defined by the three components. The bounds for the output are then defined as shown in Formula 1.

FORMULA 1: OUTPUT BOUNDS [formula not reproduced]

Linear Interpolation
Nonlinear blocks are normally specified in a Control Law as two sets of variables; say, X and Y. Each of these is a vector of points, {x1, x2, x3 …}, {y1, y2, y3 …}, known as the breakpoints of the nonlinearities. These nonlinear blocks can form a positive slope or a negative slope, as shown in Figure 6.

The output is dependent on the characteristics of this slope. In the case of a positive slope, you'll observe that the lower limit of the output corresponds to the lower limit of the input. But this isn't true for a negative slope, where the upper limit of the output corresponds to the lower limit of the input. Notice how the limits were interchanged, as indicated earlier.

Notice also that the slopes shown in the figures are linear. There could be two or three different slopes in the nonlinear block (see Figure 7). In such cases, the components would have different ranges for the output. A large slope for the upper limit of the input could increase the upper limit of the output drastically. This would widen or reduce the bounds, changing the shapes of the witches as they fly through the tube.

SHAKESPEAREAN TRAGEDY
If Shakespeare had written a play about software development and testing, the story might open like this:

ACT I, SCENE 1: A Day in the Software House
Narrator: Diseased Nature oftentimes breaks forth: In strange eruptions. – "Henry IV"
Certification Agent: All is not well; I doubt some foul play. – "Hamlet"
Project Manager: Find out the cause of this effect, Or rather say, the cause of this defect, For this effect defective comes by cause. – "Hamlet"
Prove it before these varlets here, thou honourable man; prove it. – "Measure for Measure"
Test Lead: I will prove it legitimate, sir, upon the oaths of judgment and reason. – "Twelfth Night"
Tester A: O hateful error, melancholy's child, Why dost thou show to the apt thoughts of men: The things that are not? – "Julius Caesar"
Tester B: When sorrows come, they come not single spies, But in battalions. – "Hamlet"
Tester C: The nature of bad news infects the teller. – "Antony and Cleopatra"
Whispers in Coffee Room: What's this, what's this? Is it her fault or mine? The tempter, or the tempted, who sins most, ha? – "Measure for Measure"
Project Manager: Condemn the fault, and not the actor of it? – "Measure for Measure"
Test Lead: But since correction lieth in those hands, Which made the fault that we cannot correct… – "Richard II"
Test Team: Unarm, Eros; the long day's task is done, And we must sleep. – "Antony and Cleopatra"

Just don't get caught sleeping on the job, or your project might end up like the Ariane 5's Flight 501.

Two-Dimensional Lookups
The gain "G" defined in the "Signal and Gain" section above could be a constant or an output of a two-dimensional lookup table similar to Table 1. These feedback gains are used in aircraft to ensure performance over the flight envelope. The gains are "scheduled" based on the speed and altitude. Refer to "Thou Shalt Experiment With Thy Software" in the June 2007 issue of Software Test & Performance magazine for an article on the testing of these gain tables.

TABLE 1: A TABLE OF GAINS
Altitude/Speed   100m    1000m   5000m   10000m
0.0              2.354   4.363   3.456   3.567
0.1              3.235   5.347   4.575   3.567
0.2              4.354   6.474   5.374   3.879

Consider two signals A and B as inputs to the lookup table. A = [aL, a0, aU] and B = [bL, b0, bU] are defined with their tolerance bounds. The two input signals to the lookup table would be altitude and speed, each signal having its three components. The gain thus computed would also have a bound. In these cases we normally compute the gains for all combinations of the two signals. The interpolated gain output Y = [yL, y0, yU] with tolerance bounds is computed as described below. We consider the set of all combinations of inputs with their three components as below:

C = {(aL, bL), (aL, b0), (aL, bU), (a0, bL), (a0, b0), (a0, bU), (aU, bL), (aU, b0), (aU, bU)}

We then compute the gain output using the lookup table for all these combinations of inputs as:

Z = {z1, z2, z3, z4, z5, z6, z7, z8, z9}

The nominal value is z5, corresponding to the input combination (a0, b0). The upper and lower bounds for the computed gains are then given by taking the maximum and minimum of Z as:

Y = [yL, y0, yU] = [min(Z), z5, max(Z)]
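The combination rules in this article (worst-case addition and subtraction, RSS bounds, and the min/nominal/max rule over all component combinations for products) translate directly into code. A sketch with illustrative function names, treating each signal as a (lower, nominal, upper) triple:

```python
import math

# Each signal is a (lower, nominal, upper) triple, as in the article.

def add_worst_case(a, b):
    # Y = [aL+bL, a0+b0, aU+bU]
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])

def sub_worst_case(a, b):
    # For A-B, the upper limit of B gives the lower limit of the output.
    return (a[0] - b[2], a[1] - b[1], a[2] - b[0])

def add_rss(a, b):
    # Root-sum-square bounds: narrower than worst case when noise is random.
    y0 = a[1] + b[1]
    lo = y0 - math.hypot(a[1] - a[0], b[1] - b[0])
    hi = y0 + math.hypot(a[2] - a[1], b[2] - b[1])
    return (lo, y0, hi)

def mul_bounds(a, b):
    # All nine combinations of components; min and max bound the output.
    z = [x * y for x in a for y in b]
    return (min(z), a[1] * b[1], max(z))

a = (9.0, 10.0, 11.0)
b = (1.5, 2.0, 2.5)
print(add_worst_case(a, b))  # -> (10.5, 12.0, 13.5)
print(sub_worst_case(a, b))  # -> (6.5, 8.0, 9.5)
print(mul_bounds(a, b))      # -> (13.5, 20.0, 27.5)
```

Comparing `add_rss(a, b)` with `add_worst_case(a, b)` shows the "narrow band" effect the article describes: the RSS interval sits strictly inside the worst-case one.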
Cutting costs is the goal of every company. And software testing is a practice that has been ripe for the offshore harvest. Given the inherent risks of using offshore resources for testing and adapting offshore teams to agile practices, it's important to ensure that the best practices of the onshore team are being replicated elsewhere.

Through an analysis of the approach used by one company—Primavera Systems—to align software quality efforts around the globe, you can learn how to perform a quality audit that is equally applicable to teams inside the four walls of an organization.

We considered Primavera a good case study because it has internal development centers in the U.S. (Bala Cynwyd, Pa. and San Francisco), Israel and London, as well as offshore development/QA centers in India and Eastern Europe.

The idea behind a quality audit is to perform a systematic examination of the practices a team uses to build and validate software. This is important, as issues of quality are represented across the software development life cycle. The audit aims not to uncover software defects, but to understand how well a team comprehends and executes the defined quality practices.

The quality audit can be used to assess if a process is working and if things are being done the way they're supposed to be. The audit is also an excellent way of measuring the effectiveness of implemented procedures. Management can use audit results as a tool to identify weaknesses, risks and areas of improvement.

The audit is as much about people as it is about the procedures in place. The more team members understand their roles and how they relate to quality, the more likely the team will grasp and adhere to the defined practices. The ultimate objective, of course, is to deliver high-quality software (as defined by the organization).

Quality audits are typically performed at regular intervals. The initial audit develops a quality procedure baseline and determines areas that need remediation. Subsequent audits are monitoring-oriented to help teams identify and address gaps in the quality process. Remediation involves articulating effective actions that address deficiencies in the daily process of conducting quality practices.

A quality audit can be aided by the use of software tools but, as stated above, it's as much about people mentoring and teaching as anything else. Any team, offshore or otherwise, will best respond…

The audit as described throughout this document consists of five phases:

Pre-assessment planning. This phase includes setting expectations, creating a timeline and getting executive sponsorship for the project audit. The deliverable for this phase is agreement with and buy-in of the audit process and a follow-up commitment for improvement. Typically this is capped by a meeting with the audit sponsor and key stakeholders on the audit process and objectives.

Data gathering. This phase involves developing interview questions and surveys and gathering all documentation (bug reports, test cases) prior to the interview process.

Assessment. The assessment phase involves conducting interviews and developing preliminary findings. Confidentiality is crucial, and team members must clearly understand the process. A meeting explaining the process and the reasons behind it should be held with the entire team.

Post audit. After reviewing documents and interview notes, the analyzed information is synthesized into a list of findings and prioritized remediation steps.

Presentation of findings with sponsor and team. Findings are presented and agreement is reached on the highest-priority improvement areas.

In Primavera's case, the quality audit and the entire software development quality management life cycle are tied to the ISO 9001:2000 standard. There are a variety of reasons for this, as explained below.

Aligning Scrum With ISO 9001
Primavera Systems, which has practiced Scrum since June 2003, found itself with an interesting dilemma in the beginning of 2005. Increasingly, prospective customers were inquiring about the ISO certification level of Primavera's development processes. In fact, quality auditing is an important element in ISO's quality system standard. The most recent ISO standard has shifted the focus of audits from procedural adherence only to measuring the effectiveness of quality practices on delivered results.

While this makes sense, implementing and assessing the usefulness of ISO in an agile environment is a challenge. After all, the Agile Manifesto declares "working software over com-

Photos by Stefan Klein
OFFSHORE PLAYBOOK
release management. In other words, the The team decided it was more impor-
documentation described Primavera’s tant to audit the quality practices them-
agile methods across development selves vs. the quality of the required arti-
domains. facts. While both are important, the qual-
During the same time period, ity of the artifacts can be improved
Primavera was engaged in creating two through training and education, but if
offshore development Scrum teams in the processes aren’t in place, not much
Bangalore. This emerging documenta- can be done. This approach is less threat-
tion, which articulated development ening to the teams being audited if the
processes, proved to be a highly useful quality of the work is not the primary
resource when ramping up the offshore focus.
team. Since the offshore team was unfa- The audit is meant to be foremost a
miliar with Scrum methodologies in gen- fact-finding and learning experience.
eral and Primavera’s practices in particu- With a baseline in place, the ongoing
lar, being pointed to relevant sections of review process can look at the quality of
the documentation helped orient them the artifacts and determine ways to help
agile thought leaders don’t consider ISO and made them productive more quick- the team make improvements. The team
a priority. ly. It also eased communication, since was most interested in objectively look-
Primavera addressed this issue inter- everyone was using the same practices, ing for evidence that the quality process
nally by focusing the team on common metrics and terminology. was being followed.
core objectives and the firm’s mission to Throughout the process, the team
deliver the best possible quality software. Preparing for the Audit made sure to consult with management,
The firm sells software to highly regulat- The quality policy described in the qual- initially to acquire and maintain leader-
ed markets, so the need to support cus- ity management system called for annual ship buy-in. Meetings were held with
tomer requirements, including ISO, is an audits of all quality processes, on- and development managers and executives,
important consideration. Bottom line is offshore. Regardless of the documenta- both on- and offshore, to discuss the
that the quality audit made sense and provided the opportunity to better align the on- and offshore teams.
As a first step, Primavera engaged an outside consultant to assess the current software development life cycle and information technology procedures as they relate to alignment with ISO 9001:2000 standards. Associated with this was the delivery of a gap analysis between the current SDLC and the standards.
The results were a little surprising. While Primavera wasn't producing all of the documentation to meet the letter of the law as it relates to the ISO standard, existing processes were highly aligned with the spirit and intent of the standard. The consultant felt that by organizing many of the existing artifacts into a quality management system and creating a limited amount of additional documentation, the team could safely declare, "We're aligned with the ISO 9001:2000 standard" and provide enough documentation to back this up.
An important side benefit was the ability to use the assembled information as a valuable reference for developers and a great resource for helping new employees ramp up. The documents, created by the consultant, described key processes used in the software development life cycle: software testing, configuration management, defect tracking, internationalization, product maintenance, requirements management and release management.
In addition, development management considered it important to ensure that the quality practices being used offshore duplicated those being used at home. Better to address quality issues before there's an actual problem.
Given the number of development locations, the audit team needed to develop a repeatable process for the project audit. For example, what exactly did the team want to measure, what data was needed, how would the data be collected and what rules would be used to generate the metrics? Also, would it be necessary to normalize the data?
The team developed a checklist to qualitatively measure how well the offshore teams were conforming to agile practices. Audit criteria were selected from various sections of the quality manual along with input from members representing multiple domains. This resulted in a 43-point audit covering requirements management, design and implementation, configuration management, testing, defect tracking and release management.
… audit and its goals and to solicit input. The team discovered that the word audit evoked some uncomfortable responses in offshore teams. Because of this, it was important to set the context for the audit as an exercise in self-improvement and as a retrospective tool to help drive positive change. Positioning the audit in this light helped ensure the team's cooperation and active participation. Since agile methods generally provide a high degree of transparency and are constantly being refined, the teams were comfortable with (and appreciated) the focus.
With management support and the offshore team's understanding of the context of the audit, the audit team began gathering and examining some of the related artifacts, documents that had been delivered by the offshore team over time. The idea was to look at existing requirements and attempt to trace them through the quality process. In total, this involved looking for the code, test plans, automation and test results, both automated and manual, related to the delivered requirement.
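Tracing a delivered requirement through to its artifacts can be mechanized with a simple checklist walk. The sketch below is illustrative only: the `ARTIFACTS` registry, the `trace_gaps` helper and the requirement IDs are hypothetical stand-ins for the real code, test-plan, automation and results repositories, not Primavera's actual systems.

```python
# Illustrative requirement-traceability check: for each requirement ID,
# verify that code, a test plan, automated tests and test results exist.
# ARTIFACTS is hypothetical data standing in for real repositories.
ARTIFACTS = {
    "REQ-101": {"code": True, "test_plan": True, "automation": True, "results": True},
    "REQ-102": {"code": True, "test_plan": False, "automation": True, "results": False},
}

REQUIRED = ("code", "test_plan", "automation", "results")

def trace_gaps(req_id):
    """Return the artifact types that are missing for a requirement."""
    found = ARTIFACTS.get(req_id, {})
    return [kind for kind in REQUIRED if not found.get(kind)]

for req in sorted(ARTIFACTS):
    gaps = trace_gaps(req)
    print(req, "fully traceable" if not gaps else "missing: %s" % gaps)
```

A run like this gives the auditor a starting list of requirements whose quality trail is incomplete before the on-site interviews begin.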
There were several reasons for doing this. First, it was valuable to gain an understanding of how transparent the process was, based on the exchange of materials from teams distributed around the world. This was also a learning exercise to become familiar with the team's work processes and communication style. The team was able to identify a variety of specific issues that required detailed examination during the on-site audit.

The Audit Checklist And Evaluation Scheme
The audit checklist was developed to measure how effectively a team has implemented and follows the quality guidelines. There are a number of ways Primavera used the results of the audit, so the checklist had to be synced to the objectives. This included:
To baseline the current state of development within an organization prior to introducing Primavera's development process.
To monitor the progress of adoption of Primavera's documented Quality Management System and identify areas where the implemented practices were satisfactory as well as areas that could be improved.
The criteria used in the audit are grouped under the following high-level headings:
• Requirements management
• Design and implementation
• Configuration management
• Testing
• Defect tracking
• Release management
While the overall goal of conducting a project audit is to provide a qualitative view of the suitability of and adherence to the development process, a quantitative view is also necessary. For example, the auditor conducting the evaluation may find it useful to flag certain criteria or results that he feels the need to emphasize. Similarly, the team being audited may have certain concerns that they want to have the auditor scrutinize.
The Primavera audit team adopted the following scheme for measuring each of the 43 audit points:
1.00 If the process fully satisfies the criterion
0.75 If the testing process largely satisfies the criterion
0.50 If the testing process partially satisfies the criterion
0 If the testing process does not satisfy the criterion
This scheme must be used with some caution and is provided only as a guideline. The reason for this is simply that auditors interpret things differently, making it difficult to determine the precise meaning of any particular score. A degree of subjectivity occurs during the audit process that must be taken into account along with the objective measures.
The audit criteria are listed below. This is not the full, detailed spreadsheet used by auditors on-site, but rather the higher-level categories. Keep in mind that this checklist was developed to address the specific needs of Primavera and the objectives of the audit. The terms used below along with the referenced tools are also specific to Primavera's use of agile practices.

Requirements management
• Primavera PM used to track requirements
• Collaboration with Product Owner
• Requirements estimated using accepted techniques (Ideal Team Days)
• Sprint 0 to refine estimates
• Sprint -1 to determine initial requirement estimates

Design and implementation
• Feature specifications (created as design documents and updated "as built" when requirement is completed)
• Code tested and passes unit tests prior to check-in
• Formal code reviews requested by programming manager for appropriately complex areas
• Peer or buddy review of code for code check-ins
• Requesting schema changes through schema change process
• Technical designs where appropriate
• Designs and coding consider internationalization
• Online help and printed manuals updated for new requirements

Configuration management
• Builds automated
• Builds automatically deployed twice daily
• Automated process replicates all client server, Web and Group Server code to all four ClearCase servers
• After a compile of the merged code, JUnit and FitNesse tests are run. In the event of a failure, an e-mail notification is sent and the merge is rejected and deferred to the next day.
• FitNesse tests run automatically with daily builds

Testing
• Acceptance tests cover requirements and are automated
• Test procedures documented in Mercury Quality Center
• Developer unit tests written
• Automated Silk tests run to validate code integration
• Test cases and test results can be traced to requirements
• Internationalization cases considered during testing
• Performance testing
• Test results records stored
• Test cases peer reviewed
• Active system tests conducted

Defect tracking
• In-process defects "scrummed" appropriately
• Defect reporting using Mercury Quality Center
• Defect threshold counts used for sprint entry/exit criteria

Release management
• Sprint review meetings held
• Release management team reviews
• Sprint retrospectives
• Sprint closeout processes (backlogs updated)
• Tracking progress with burn-downs
• Attend Scrum Master meetings
• Daily team meetings
• Product Owner sets sprint priorities
• Co-located teams
• Obstacles removed daily
• Sprint planning meetings
• Task granularity (e.g., no more than 16 hrs.)

Performing the Audit
It's worth noting that, except for the largest organizations, it's not necessary or desirable to employ specific auditors. It's best to choose auditors from within the team and across disciplines. In fact, there are good arguments for cycling the auditing team so many resources get the opportunity to interact with their peers and view the quality process from a different perspective. Auditor training is recommended so that the results can be as normalized as possible.
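The four-level scheme lends itself to simple aggregation, and a weighting factor for fundamental criteria is a natural extension. The sketch below is illustrative only: `category_score`, `VALID_SCORES` and the example scores and weights are hypothetical, not Primavera's actual 43-point checklist or tooling.

```python
# Illustrative audit scoring using the article's four-level scheme,
# with an optional weight per criterion so that fundamental practices
# count more heavily toward the category score.
VALID_SCORES = {1.00, 0.75, 0.50, 0.0}

def category_score(scores, weights=None):
    """Weighted average of per-criterion scores for one audit category."""
    if weights is None:
        weights = [1.0] * len(scores)  # unweighted: every criterion equal
    for s in scores:
        if s not in VALID_SCORES:
            raise ValueError("invalid score: %r" % s)
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)

# Example: four criteria in a testing category; the first (automated
# acceptance tests, say) is weighted double as a fundamental practice.
scores = [1.00, 0.75, 0.50, 1.00]
weights = [2.0, 1.0, 1.0, 1.0]
print(round(category_score(scores, weights), 3))  # 0.85
```

Restricting inputs to the four defined levels keeps auditors honest about the scheme, while the weights make shortfalls in fundamental areas more visible in the aggregate number.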
The mantra for the audit was not only "tell me," but "show me." Listening to individuals describe the process and what was done is important, but so is viewing the actual artifacts. Since a quality process is interconnected across the development life cycle, time was spent looking at related, adjacent links to other areas of the process.
It was important to set and then manage expectations when reviewing people's work. Reminding everyone that the focus was on improvement helped us achieve a balanced auditor/auditee environment.
Making sure that all participants gain value from the experience is also important. Everyone involved learned something new about the processes in use, the rationale behind them and how providing traceability adds value. Being able to take a feature and trace it back through test results, automation, unit tests and requirements makes obtaining a detailed understanding of a feature much easier than walking in cold.
Teams shouldn't expect to achieve a perfect score, so this is another area that requires expectation management. In Primavera's case, the teams fared very well. Most of the shortcomings were not unexpected, since the on- and offshore teams worked closely together on a daily basis. Interestingly, several valuable insights into other …

Wrapping Up the Audit
Audit findings need to be documented and problems reported for further action. A date should be established for the correction, and the next audit should ensure that issues were remedied. The audit documentation doesn't need to be complicated, but should include the audit plan, the audit notes and the audit report. The notes are the items the auditor wrote down during the audit, and can include specific findings, responses to questions, key documents reviewed and comments.
The audit report is the "official" document used to report the findings of the audit. A template for this document should be prepared by the audit team, amended as necessary and consistently used by all auditors. The document should include details of the audit, date, auditors' names and findings. Once agreed to, the audit report should include the remediation plan.
The process audit is concerned with both the validity and overall reliability of the process. For example, does the process consistently produce the intended results? It's important to identify non-value-added steps that the team may have added to the process. Once identified, … remediation plan with the same team that met to kick off the audit process; this included stakeholders and management. As in the on-site review, each area that didn't score a 1 was discussed.

Lessons Learned
Setting the stage for the audit as an in-depth retrospective aligned with the agile goal of continuous improvement helped Primavera secure individual buy-in for the audit.
Since it seemed that people weren't prepared for the "show me" mentality employed during the interview process, better communication and expectation setting with the team makes sense. Discussing the style to be used during the audit also helps make those involved more comfortable.
The scoring system is a work in progress and will be improved over time. It remains subjective, which is acceptable, but the addition of a weighting system is under consideration.
The use of weighting for areas that are fundamental to the development process will provide better results. Since each area of the quality management process isn't equal in importance, weighting will expose those areas of concern more visibly.
The bottom line is that the entire development organization now realizes …
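The report structure described above (details of the audit, date, auditors' names, findings and, once agreed to, the remediation plan) is easy to capture in a shared template. The sketch below is illustrative only; the `AuditReport` class and its field names are hypothetical, not an actual Primavera document.

```python
from dataclasses import dataclass, field

# Illustrative audit-report template mirroring the structure described
# in the article. Field names and sample values are hypothetical.
@dataclass
class AuditReport:
    audit_details: str           # scope of the audit
    date: str                    # when the audit was performed
    auditors: list               # auditors' names
    findings: list = field(default_factory=list)
    remediation_plan: str = ""   # added once the findings are agreed to

report = AuditReport(
    audit_details="Offshore team project audit, 43-point checklist",
    date="2008-01-15",
    auditors=["auditor A", "auditor B"],
)
report.findings.append("Test cases not consistently peer reviewed")
report.remediation_plan = "Peer review all test cases; verify at next audit"
print(len(report.findings))
```

Keeping the template in one agreed form, amended only by the audit team, is what lets results from different auditors be compared from one audit to the next.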
…ion that typifies most successful techies, Savoia says we're all constantly faced by a more or less binary set of choices about our lives that can be described by the equation X + Y = total well-being, financial and otherwise. X, usually a responsible position at an established company, is associated with guaranteed income. Y is associated with choice imbued with those elusive qualities of emotional engagement, satisfaction of intellectual curiosity and overall happiness. Once financial security is achieved, he continues, it always makes sense to pursue Y when confronted with an either-or choice about what to do next.
The logic, which is unassailable, is not what's interesting here. Rather, it's that for Savoia, happiness is more about mucking about testing lines of code than, say, fishing in Montana, going to cooking school in the south of France, or in my case, buying courtside seats and losing myself in Big 10 basketball … ting the jackpot with a best-selling book and then opting to stay at the rewrite desk for years thereafter—for fun. I've interviewed enough developers to suspect that even among the I-dream-in-code crowd, Savoia is an outlier.

Limited Commercial Appeal
More concrete reason for skepticism comes from several of the other interviewees for this column, including one at industry giant Cisco Systems.
"Right now Cisco is in a latency curve with unit testing," says Andy Chessin, a technical leader at the San Jose, Calif.-based company. "Unit testing has a huge barrier to entry. If the group really isn't passionate about it, doesn't understand it or doesn't have the time to get started with it, they probably won't be driven to take the plunge."
"Right now, few people really understand what [API-level] unit testing involves or how to get started with it," he …
"Simplicity is essential to a core unit testing framework; for this reason, the most successful unit test frameworks tend to stay small, in code size and in team size," says David Saff, an MIT doctoral student who is now one of the lead maintainers of JUnit. "In comparison to other successful open source projects, this promotes a proliferation of third-party extensions, while limiting contributions to the core framework. I think this somewhat limits the chance for a company to arise that would do the equivalent for, say, JUnit, that Covalent has done for Apache."
In other words, unit testing is destined to remain by and for a modest group of undoubtedly smart specialists. So while the worlds of writing and programming will always have a small priesthood obsessed with quality and constant revision, most of the rest of us will continue to muddle along as best we can—at least until we can hang up …
Get Development In Sync With QA
Adam Kolawa

Development is writing code for the next release of the application. QA has a regression test suite that tests the previous release, but they're waiting for the end of the current development iteration before they start to update the test suite to cover the new and modified functionality. As a result, the code base is evolving significantly during the development phase, but the regression test suite is not… so by the time that QA receives the code from development, the code and the regression test suite are totally out of sync.
Once QA has the new version in hand, they try to run the old regression test suite against the new version of the application. It runs, but an overwhelming number of test case failures are reported because the code has changed so much.
At that point, QA often thinks, "Instead of trying to modify these test cases or test scripts for the new version, we might as well go ahead and test it by hand because it's the same amount of work, and even if I update it now, I'll still have to update it all over again for the next version." So they end up testing by hand, and typically come to the conclusion that automation is overrated.
That's how automation goes to hell in QA.

Keep in Sync
As a result of the divide between QA and development, the regression test suite is treated as an afterthought. To keep the test suite in sync with the code, the team needs to treat the regression test suite like it's a key part of the application—the part of the application that verifies whether the implemented functionality actually works. This has a couple of implications. One is that team leaders must allocate sufficient budget and time for regression test suite development and maintenance. I've found that the best results are achieved when there's roughly a 50/50 distribution of effort between writing code that represents the functionality of the application and writing code that verifies that functionality.
The other implication is that the team needs to modify their workflow so that QA works in parallel with development: updating the regression test suite as the developers update the code base. To achieve this, QA must become more tightly integrated with development. To start, development has to build test cases as they write code. This means that the team needs to define and enforce a policy that every time development implements a feature or use case, they add a test case to check that it's functioning correctly.
The role of QA, then, is to review these test cases as soon as they're written, as part of the code review procedure. Their goal here is to verify whether the test case actually represents the use case that is implemented in the code. If QA and development work together in this manner, the test suite is constantly updated in sync with the application.
To keep this up, you need to ensure that the test suite is constantly tested against the application. This means it must become part of the nightly build and test process. Every night, after the application is built, the regression test suite executes. If test failures are reported, then the test suite might be growing out of sync with the application. Whenever that happens, the developers need to spend just a few minutes reviewing any test failures reported for their code, and then either updating the test cases (if the test failed because the functionality changed intentionally) or fixing the code (if the test failed because a modification introduced a defect).
In my humble opinion, this is the only way that QA can be automated.

Looking Back, Looking Ahead
Having been in this industry for 20 years now, I've witnessed many changes. Languages have come and gone, the level of programming abstraction has elevated, and development processes have grown increasingly compressed and iterative. Yet, from assembly code to SOA, one thing remains the same: the need for an effective, reliable way to determine if code changes negatively impact application functionality.
One way of helping organizations overcome this challenge is to provide technologies that enable development teams to quickly build a regression test suite that includes unit tests, static analysis tests, load tests and anything else that can be used to identify changes. Our goal here is to help teams identify and evaluate functionality changes with as little effort as possible so that keeping the test suite in sync with the application is not an overwhelming chore.
The other part is to optimize the development "production line" to support these efforts. This involves implementing infrastructure components (including source control systems, nightly build systems, bug tracking systems, requirements management systems, reporting systems and regression testing systems) to suit the organization's existing workflow, linking these components together to establish a fully automated building/testing/reporting process, and mentoring the organization on how to leverage this infrastructure for process improvement. The result is greater productivity and fewer software defects.

Dr. Adam Kolawa is founder and CEO of Parasoft.