4 HAILPERN AND SANTHANAM 0018-8670/02/$5.00 © 2002 IBM IBM SYSTEMS JOURNAL, VOL 41, NO 1, 2002
tation for this terminology, but it is not essential for
the basic understanding of problem definition). Note
that the terms “debugging,” “testing,” and “verifi-
cation” are not mutually exclusive activities, espe-
cially in everyday practice. The definitions draw dis-
tinctions, but the boundaries may actually be fuzzy.
We begin with a software program written in a pro-
gramming language (let P be the program written
in language L). The program is expected to satisfy
a set of specifications, where those specifications are
written in a specification language (call the set of specifications Φ = {φ1, φ2, φ3, . . . , φn} and the specification language Λ). In most real-world cases, the specification language (Λ) is the natural language of the development team (i.e., English, Spanish, etc.).
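The notation above can be made concrete with a small sketch (not from the paper; the toy program and specifications here are hypothetical): a program P is paired with a set Φ of executable predicates, each φi relating an input to the output P computes for it.

```python
# Illustrative sketch: a toy "program" P and a specification set
# Phi = {phi_1, ..., phi_n}, where each phi_i is an executable predicate
# over P's input/output behavior. All names here are hypothetical.

def P(x: int) -> int:
    """The program under scrutiny: absolute value."""
    return x if x >= 0 else -x

# Each specification phi_i maps (input, output) -> bool.
phi_1 = lambda x, y: y >= 0                 # the output is non-negative
phi_2 = lambda x, y: y == x or y == -x      # the output has the input's magnitude

Phi = [phi_1, phi_2]

def satisfies(program, specs, inputs):
    """Check, by sampling inputs (i.e., testing), that every spec holds."""
    return all(phi(x, program(x)) for x in inputs for phi in specs)

print(satisfies(P, Phi, range(-100, 101)))  # True on this sample
```

Note that even here the check is only a sampling of the input space; showing that P satisfies Φ for all inputs is the (much harder) verification problem discussed below.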
The first is during the coding process, when the programmer translates the design into executable code. During this process the errors made by the programmer in writing the code can lead to defects that need to be quickly detected and fixed before the code goes to the next stages of development. Most often, the developer also performs unit testing to expose any defects at the module or component level. The second place for debugging is during the later stages of testing, involving multiple components or a complete system, when unexpected behavior such as wrong return codes or abnormal program termination (“abends”) may be found. A certain amount of debugging of the test execution is necessary to conclude that the program under test is the cause of the unexpected behavior and not the result of a bad test case due to incorrect specification, inappropriate data, or changes in functional specification between different versions of the system. Once the defect is confirmed, debugging of the program follows and the misbehaving component and the required fix are determined. The third place for debugging is in production or deployment, when the software under test faces real operational conditions. Some undesirable aspects of software behavior, such as inadequate performance under a severe workload or unsatisfactory recovery from a failure, get exposed at this stage and the offending code needs to be found and fixed before large-scale deployment. This process may also be called “problem determination,” due to the enlarged scope of the analysis required before the defect can be localized.

Verification. In order to verify the functional correctness of a program, one needs to capture the model of the behavior of the program in a formal language or use the program itself. In most commercial software development organizations, there is often no formal specification of the program under development. Formal verification 8 is used routinely by only small pockets of the industrial software community, particularly in the areas of protocol verification and embedded systems. Where verification is practiced, the formal specifications of the system design (derived from the requirements) are compared to the functions that the code actually computes. The goal is to show that the code implements the specifications.

Testing. Testing is clearly a necessary area for software validation. Typically, prior to coding the program, design reviews and code inspections are done
as part of the static testing effort. 9 Once the code is written, various other static analysis methods based on source code can be applied. 5 The various kinds and stages of testing that target the different levels of integration and the various modes of software failures are discussed in a wide body of literature. 2,10,11 The testing done at later stages (e.g., external function tests, system tests, etc., as shown in Figure 3) is “black box” testing, based on external specifications, and hence does not involve the understanding of the detailed code implementations. Typically, system testing targets key aspects of the product, such as recovery, security, stress, performance, hardware configurations, software configurations, etc. Testing during production and deployment typically involves some level of customer-acceptance criteria. Many software companies have defined prerelease “beta” programs with customers to accomplish this.

Current state of technology and practice

In commercial hardware development, it is a common practice to capture the requirements and specifications in a formal manner and use them extensively in the development and testing of the products. The cost of bad or imprecise specification is high and the consequences are severe. In contrast, software poses feasibility challenges for the capture and use of such information. A software development organization typically faces:

● Ever-changing requirements (which, in many cases, are never written down)
● Software engineers’ lack of skills in formal techniques
● Enormous time pressure in bringing product to market
● Belief in the malleability of software—that whatever is produced can be repaired or enhanced later

Consequently, in most software organizations, neither the requirements nor the resulting specifications are documented in any formal manner. Even if written once, the documents are not kept up to date as the software evolves (the required manual effort is too burdensome in an already busy schedule). Specifications captured in natural languages are not easily amenable to machine processing.

Should one wish to go beyond fuzzy specifications written in a natural language, there is a long history of many intellectually interesting models and techniques 12–14 that have been devised to formally describe and prove the correctness of software: Hoare-style assertions, Petri nets, communicating sequential processes, temporal logic, algebraic systems, finite state specifications, model checking, and interval logics. A key aspect of formal modeling is that the level of detail needed to capture the adequate aspects of the underlying software program can be overwhelming. If all of the details contained in the program are necessary to produce the specification or test cases, then the model may well be at least as large as the program, thus lessening its attractiveness to the software engineers. For example, there have been several attempts to model software programs as finite state machines (FSMs). While FSMs have been successful in the context of embedded systems and protocol verification, state-based representation of software leads to an explosion in the number of states very quickly. This explosion is a direct result of software constructs such as unbounded data structures, unbounded message queues, the asynchronous nature of different software processes (without a global syn-
Even today, debugging remains very much an art. Much of the computer science community has largely ignored the debugging problem. 15 Eisenstadt 16 studied 59 anecdotal debugging experiences and his conclusions were as follows: Just over 50 percent of the problems resulted from the time and space chasm between symptom and root cause or inadequate debugging tools. The predominant techniques for finding bugs were data gathering (e.g., print statements) and hand simulation. The two biggest causes of bugs were memory overwrites and defects in vendor-supplied hardware or software.

To help software engineers in debugging the program during the coding process, many new approaches have been proposed and many commercial debugging environments are available. Integrated development environments (IDEs) provide a way to capture some of the language-specific predetermined errors (e.g., missing end-of-statement characters, undefined variables, and so on) without requiring compilation. One area that has caught the imagination of the industry is the visualization of the necessary underlying programming constructs as a means to analyze a program. 17,18 There is also con-

Verification. As discussed earlier, in order to verify the functional correctness of a program, one needs to capture the specifications for the program in a formal manner. This is difficult to do, because the details in even small systems are subtle and the expertise required to formally describe these details is great. One alternative to capturing a full formal specification is to formalize only some properties (such as the correctness of its synchronization skeleton) and verify these by abstracting away details of the program. For network protocols, reactive systems, and microcontroller systems, the specification of the problem is relatively small (either because the protocols are layered with well-defined assumptions, inputs, and outputs, or because the size of the program or the generality of the implementation is restricted) and hence tractable by automatic or semiautomatic systems. There is also a community that builds a model representing the software requirements and design 12,13 and verifies that the model satisfies the program requirements. However, this does not assure that the implemented code satisfies the property, since there is no formal link between the model and the implementation (that is, the program is not derived or created from the model).
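The flavor of verifying one abstracted property over a model, rather than the full program, can be sketched in a few lines. The model below is hypothetical (not from the paper): two processes contend for a single lock, and an exhaustive explicit-state search checks only the mutual-exclusion property, with all other program detail abstracted away.

```python
# Illustrative sketch (hypothetical model): verify one abstracted property,
# mutual exclusion, over a small finite-state model of two processes, by
# exhaustive explicit-state reachability analysis.

def successors(state):
    """Yield all states reachable in one step from ((p0, p1), lock_holder).

    Each process cycles "idle" -> "trying" -> "critical" -> "idle", and may
    enter "critical" only by acquiring the single shared lock.
    """
    procs, lock = state
    for i in (0, 1):
        local = list(procs)
        if procs[i] == "idle":
            local[i] = "trying"
            yield (tuple(local), lock)
        elif procs[i] == "trying" and lock is None:
            local[i] = "critical"
            yield (tuple(local), i)      # process i acquires the lock
        elif procs[i] == "critical" and lock == i:
            local[i] = "idle"
            yield (tuple(local), None)   # process i releases the lock

def check_mutual_exclusion():
    """Search every reachable state; the property must hold as an invariant."""
    init = (("idle", "idle"), None)
    seen, frontier = {init}, [init]
    while frontier:
        state = frontier.pop()
        if state[0] == ("critical", "critical"):
            return False                 # property violated in a reachable state
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return True                          # invariant holds everywhere reachable

print(check_mutual_exclusion())  # True
```

Here the state space is tiny and the search terminates immediately; the state-explosion problem noted earlier arises precisely when such models must track unbounded data or many asynchronous processes, and, as the text observes, a verdict about the model says nothing formal about the implemented code.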
very nature, be as complex as the actual program. Any simplification or abstraction may hide details that may be critical to the correct operation of the program. Similarly, any proof system that can automatically verify a real program must be able to handle very complex logical analyses, some of which are formally undecidable. The use of complex theorem-proving systems also requires a high skill level and does not scale to large programs. The human factor also enters into the equation: crafting a correct specification (especially one using an obscure formal system) is often much more difficult than writing the program to be proved (even one written in an obscure programming language). 20 To date, success in program verification has come in restricted domains where either the state space of the problem is constrained or only a portion of the program is actually verified. General theorem provers, model checkers, state machine analyzers, and tools customized to particular applications have all been used to prove such systems.

Testing. Dijkstra’s criticism, 21 “Program testing can be used to show the presence of bugs, but never to show their absence” is well known. From his point of view, any amount of testing represents only a small sampling of all possible computations and is therefore never adequate to assure the expected behavior of the program under all possible conditions. He asserted that “the extent to which the program correctness can be established is not purely a function of the program’s external specifications and behavior but it depends critically upon its internal structure.” However, testing has become the preferred process by which software is shown, in some sense, to satisfy its requirements. This is primarily because no other approach based on more formal methods comes close to giving the scalability and satisfying

Test metrics. As discussed earlier, testing is, indeed, a sampling of the program execution space. Consequently, the natural question arises: when do we stop testing? Given that we cannot really show that there are no more errors in the program, we can only use heuristic arguments based on the thoroughness and sophistication of the testing effort and trends in the resulting discovery of defects to argue the plausibility of the lower risk of remaining defects. Examples of metrics used 24 during the testing process that target defect discovery and code size are: product and release size over time, defect discovery rate over time, defect backlog over time, and so on. Some practical metrics that characterize the testing process are: test progress over time (planned, attempted, actual), percentage of test cases attempted, etc. In some organizations, test coverage metrics (e.g., code coverage) are used to describe the thoroughness of the testing effort. The use of the Orthogonal Defect Classification (ODC) methodology 25 allows a more semantic characterization of both the nature of the defects found and the thoroughness of the testing process.

Static testing—source code analysis. Analysis of source code to expose potential defects is a well-developed branch of software engineering. 26 The typical mode of operation is to make the target code available to a code analysis tool. The tool will then look for a class of problems and flag them as potential candidates for investigation and fixes. There are clear benefits in source code analysis: it can be done before an executable version of the program exists. A certain class of faults, for example memory leaks, is more easily exposed by analysis than by testing. A fault is more easily localized, since the symptoms tend to be close to the cause. Typical analyses performed will involve a compiler or parser, tied to the language of the program, that builds a representation, such