
Scenario-based assessment of nonfunctional requirements

Andreas Gregoriades, Alistair Sutcliffe, Member, IEEE


Abstract: This paper describes a method and a tool for validating non-functional requirements in
complex socio-technical systems. The System Requirements Analyser (SRA) tool validates system
reliability and operational performance requirements using scenario-based testing. Scenarios are
transformed into sequences of task steps and the reliability of human agents performing tasks with
computerised technology is assessed using Bayesian Belief Network (BN) models. The tool tests
system performance within an envelope of environmental variations and reports the number of tests
that pass a benchmark threshold. The tool diagnoses problematic areas in scenarios representing
pathways through system models, assists in the identification of their causes and supports comparison
of alternative requirements specifications and system designs. It is suitable for testing socio-technical
systems where operational scenarios are sequential and deterministic, in domains where designs are
incrementally modified so set-up costs of the BNs can be defrayed over multiple tests.

Index Terms: Non-Functional Requirements Validation, Scenario-Based Testing, Bayesian Belief Networks, Systems Engineering

1 INTRODUCTION

Scenarios have attracted considerable interest as a means of validating requirements specifications [4, 5, 9, 54]. Foundations of scenario-based approaches were laid by Hsia and Davis [8, 33], and by the influential work of Potts [48], who created the Inquiry Cycle and later the ScenIC [46] method for scenario-based requirements validation [46, 47, 48, 49]. The potential of scenario-based requirements validation has also been recognised by Anderson and Durley [1], Zhu and Jin [71], and Haumer [28].

Scenarios have been applied to the analysis of non-functional requirements (NFRs) using dependency tables to assess the relationships between different NFRs [43], and by modelling the dependencies between goals (representing functional requirements and non-functional requirements, also called soft goals) and the agents and tasks that achieve them in the i* language [70]. The satisficing or fulfilment of soft goals (i.e. NFRs) by functional requirements is assessed by inspecting strategic dependency and rationale models that show goals, agents, tasks and dependency relationships [40, 41, 70]. Although i* support tools do provide limited reasoning support for assessing dependencies, most validation still requires human expertise. The TROPOS [20] language supports more formal reasoning about i* models; however, it does not explicitly assess non-functional requirements.

Unlike functional requirements, which can be deterministically validated, NFRs are soft variables that cannot be implemented directly; instead, they are satisficed [40] by a combination of functional requirements. Since many NFRs are influenced by human properties, they inherit the diverse nature of human characteristics: for example, assessment of NFRs such as system reliability is influenced by human characteristics such as ability, stress, concentration, etc. Software engineering and systems engineering requirements validation methods do not take human factors into account, even though human factors are a critical cause of system failure [30, 31, 52].

In our previous work [61] we developed a method and software tool for scenario-based requirements validation that prompted designers with questions about potential
problems in a scenario event sequence. The tool used a psychology-based taxonomy
of failure causes [32] with a pathway expansion algorithm that generated alternative
paths from a single seed scenario. It supported an inspection-based process with probe
questions about possible problems and generic requirements as cures for the problems
it identified. However, evaluation of this approach showed that too many scenario
variations were generated and the software developers drowned in excessive detail.

To address this problem, we developed a semi-automated approach to requirements
validation [59], by transforming the taxonomy of human and system failure causes
into a model to predict potential errors in a system design. Bayesian Belief Nets
(BNs) provided a probabilistic reasoning mechanism to predict reliabilities, from
models composed of descriptions of system components and attributes of human
operators [21]. However, the output from the BN model was fed into a paper-based walkthrough for validating scenarios, which was still time-consuming. This motivated the research we report in this paper: to create a software tool for scenario-based requirements validation that automates as much of the process as possible.

The paper is organised in seven further sections. BN and uncertainty modelling is briefly described; this is followed by the methodology and the tool's architecture; the NFR assessment follows. A case study analysis of NFR compliance and validation of system-level components in a military command and control domain is presented in which the tool is applied; the BN evaluation is explained; and the paper concludes with a discussion and proposals for future development of our approach.

2 RELATED WORK

The SRA (System Requirements Analyser) tool described in this paper can be
regarded as a form of model checking which takes place early in the system
development life cycle, and uses BNs to reason about properties of system
components rather than more detailed models of system behaviour.

Model-checking techniques have been used extensively to verify and validate
requirements. However, despite the advantages, formal modelling suffers from a
communication problem between user-stakeholders and the model developers [7, 11],
since formal models are difficult to communicate to the stakeholders who set the
requirements in the first place. The software cost reduction (SCR) system used a
tabular notation for specifying requirements dependencies which is relatively easy for
software developers and end users to understand [34]. Tabular representation based on
the underlying SCR state transition formal model provided a precise, unambiguous
basis for communication among developers, coupled with automated analysis of
specifications. The approach hides the logic associated with most formal methods and
adopts a notation that developers find easier to use.

While tabular representations can improve communication of requirements, a
combination of visualisations, examples and simulation are necessary to explain
complex requirements to end users [6]. Scenario-based representations and animated
simulations help users see the implications of system behaviour and thereby improve
requirements validation [22]. Lalioti [36, 37] suggested potential benefits from
animating requirements validation including an interactive and user-friendly
validation environment for stakeholders.

Animation simulation tools integrated with formal model checkers were developed by Dubois with the ALBERT II [10] language and its associated requirements validation animator tool. The language preserves the structure of the
informal requirements expressed by stakeholders and maintains traceability links to
the formalised software requirements document. The animator validates the
requirements based on scenarios proposed by the stakeholders, allowing them to
cooperatively explore different possible behaviours of the future system. A similar
approach has been adopted in the KAOS language and supporting GRAIL tool which
enable formal reasoning about dependencies between goal model, required system
behaviour and obstacles or constraints [66, 67]. Another similar animator-validator
tool, TROLL [23], uses a formal object-oriented language for modelling information
systems, with syntax and consistency checker tools as well as an animator that generates executable prototypes that can be used for requirements validation. As with
SCR and ALBERT II animators, our approach employs a tabular and graphical
representation of results [29] and runs test scenarios against the system model to
identify problems with the requirements specifications.

Scenario-based requirements analysis methods, pioneered by Potts [46, 47, 48],
proposed that obstacles or difficulties which might prevent a goal being achieved
should challenge requirements and hence promote refinement of the requirements
specification to deal with such obstacles. This approach was developed by van
Lamsweerde [65, 67], who applied formal reasoning to requirements specifications to
infer whether goals could or could not be achieved given constraints imposed by
obstacles. Hierarchical goal decomposition produced specifications of the states to be
achieved and the system behaviour required to reach those states, so considerable
problem refinement is necessary before automated reasoning can be applied. These
approaches also assumed that a limited number of scenarios and their inherent
obstacles are tested. This raises the question of test data coverage, i.e. just what is a
sufficient set of scenarios to enable validation to be completed with confidence?
While we believe there is no quick answer to this vexing problem, one approach is to
automate the process as far as possible so more scenarios can be tested.

Methods for requirements validation in safety critical systems have adopted
hierarchical fault trees to represent the space of possible normal and abnormal system behaviours and their causal conditions (e.g. THERP [64]). While fault trees can be
formalised as state machines with temporal logic to reason about potential failures in
deterministic systems [27], the influence of human operators and the system
environment are generally not modelled. When they are represented, as performance
shaping factors [35], probabilistic modelling has to be used to reason about the
likelihood of failure of system components based on a description of their properties
and factors such as operator stress and fatigue [68].

Intent specifications provide a hierarchical model to facilitate reasoning about system
goals and requirements in safety critical systems [38]. Goals are decomposed in a
means-ends hierarchy, widely practised in requirements engineering [54, 67]. Intent
specification requirements are assessed by inspecting dependencies between
constraints, design principles and system goals to discover conflicts. Automated
support for reasoning about conflicting system states and behaviour is provided by the
SpecTRM-RL tool which uses a tabular format to represent relationships between
threat events and systems states, based on design assumptions and constraints.
However, intent specifications do not support assessment of human error in systems
or dependencies between human operators and user interfaces.

Assessment of non-functional system requirements, such as system reliability, has to
use probabilistic reasoning since the range of potential system behaviours is either
unknown, in the early requirements phase, or too large to specify. Bayesian Nets
(BNs) have been developed to assess software quality from properties of the code and
software engineering process [13, 15, 16, 18, 19], and for system risk analysis and
management [17]. Fenton and Littlewood's [16] approach predicts the number of
defects in the system. They estimate software reliability using BNs to reason about
quality probabilities based on information gathered during the software development
process, such as the difficulty of the problem, the complexity of the designed solution,
the programmer's skill, and the design methods employed. Fenton [17, 42] has
developed large BN models to assess risk at the system level, such as the reliability of
system engineering processes for developing ships, vehicles or the operational
reliability of air traffic control systems. This work has also produced methods and
tools for building large BN models to solve complex real world problems and
improved support for use of BN tools by end users. BNs have also been applied to
evaluating the confidence which might be assigned to different combinations of test strategies in assuring reliable software [72].

In summary, BNs have been widely applied as a probabilistic reasoning technique in
software engineering and other domains; however, previous work used single nets to
evaluate a set of discrete states pertaining to a software product or development
process. In our earlier work we extended the application of BNs for safety analysis in
systems engineering domains using a semi-automated scenario-based approach [21].
We then developed more automated tools for scenario analysis of NFR conformance
for requirements specifications with multiple BN tests [60]. This paper extends that work with: a more comprehensive tool architecture which can be configured with different types of BNs to analyse other non-functional requirements; a description of the scenario-based NFR evaluation method with different modes of using BNs in scenario analysis; and validation studies of the BNs. An extensive case study is reported in which the tool is used to analyse a requirements specification for an aircraft weapons loading system for a future aircraft carrier.

3 MODELLING UNCERTAINTY

Because of the uncertain nature of NFRs it is necessary to model them using techniques such as Bayesian probability, Dempster-Shafer theory, fuzzy sets or possibility theory. Following Wright and Cai's [69] review of the advantages and disadvantages of stochastic reasoning methods, we adopted Bayesian probability. They argued that Bayesian probability offered easier combination of multiple influences on probability than Dempster-Shafer, and a sounder reasoning mechanism
than fuzzy sets. Bayesian probability provides a decision theory of how to act on the
world in an optimal fashion under circumstances of uncertainty. It also offers a
language and calculus for reasoning about the beliefs that can be reasonably held, in
the presence of uncertainty, about future events, on the basis of available evidence
[45]. BNs are useful for inferring the probabilities of future events, on the basis of
observations or other evidence that may have a causal relationship to the event in
question [12, 19].

BNs are directed acyclic graphs of causal influences, where the nodes represent
variables, and the arcs represent (usually causal) relationships between variables [12].
The example in figure 1 shows two influences on agent stress loading: workload and
duty time. Variables can have any number of states in a BN, so the choice of measurement scale is left to the analyst's discretion. For this illustration we have assigned each variable one of two possible states: high or low.

Fig. 1: Fragment of the proposed BN model.

In the above example, when the duty time is high (bad) and the workload is high (bad), the probability of the agent's stress loading being high (i.e. a bad influence on the human agent) will be greater. In the BN we model this with a network probability table (NPT), as shown in table 1.
Table 1: A network probability table for the BN in figure 1.

Duty Time              High            Low
Workload               High    Low     High    Low
Stress-loading  High   1       0.6     0.4     0
                Low    0       0.4     0.6     1

Column 1 asserts that if the duty time of a human agent is high (bad) and his/her workload is high, then the probability of stress loading being high (bad) is 1, with zero probability of it being low. NPTs are configured by estimating the probabilities for the output variable over an exhaustive combination of the input variable states. BNs can accommodate both probabilities based on subjective judgements (elicited from domain experts) and objective data [17]. When the net and NPTs have been completed, Bayes' theorem is used to calculate the probability of each state of each node in the net. The theorem is shown in equation 1:

P(a|b) = P(b|a) * P(a) / P(b)    [1]

where:
P(a|b) = posterior (unknown) probability of a being true given b is true
P(b|a) = prediction term for b given a is true (from the NPT)
P(a) = prior (input) probability of a
P(b) = input probability of b

or, less formally:

Posterior_Probability = (Likelihood * Prior_Probability) / Evidence

Substituting data from the above example, the calculation is as follows. We want to calculate the probability that duty_time is high given that we have observed that the agent has a high workload. The likelihood of stress being high given workload = high and duty_time = high is 0.6, as given in the network probability table. To calculate the posterior probability P(duty_time = high | workload = high), we need the prior P(duty_time = high) = 0.5 and the evidence of workload being high, which is 0.42. This produces the following calculation (equation 2):

P(duty = high | load = high) = (P(load = high | duty = high) * P(duty = high)) / P(load = high)
                             = (0.6 * 0.5) / 0.42 = 0.71    [2]
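As a concrete check of the arithmetic, the posterior in equation 2 can be computed directly. The following is a minimal sketch in Python; the function and variable names are illustrative and not part of the SRA implementation:

```python
def posterior(likelihood: float, prior: float, evidence: float) -> float:
    """Bayes' theorem (equation 1): P(a|b) = P(b|a) * P(a) / P(b)."""
    return likelihood * prior / evidence

# Values from the worked example in equation 2:
# likelihood = 0.6 (from the NPT), prior P(duty = high) = 0.5,
# evidence P(load = high) = 0.42.
p = posterior(likelihood=0.6, prior=0.5, evidence=0.42)
print(round(p, 2))  # 0.71
```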

Input evidence values are propagated through the network, updating the values of
other nodes. The network predicts the probability of certain variable(s) being in
particular state(s), given the combination(s) of evidence entered. BN models are
extremely computation-intensive; however, recent propagation algorithms exploit graphical models' topological properties to reduce computational complexity [45]. These are used in several commercial inference engines such as HUGIN, which we used. BNs have to conform to a strict hierarchy, since cycles lead to recursive and non-terminating propagation of probabilities by the algorithm. This imposes some
compromises in modelling influences, which can be partially overcome by
introducing additional input nodes to model cyclic influences, although this increases
complexity of the network and the control process for the algorithm.

BNs are currently used in many applications to reason about probabilities of
properties given a set of existing (prior) states; however, they do not naturally lend
themselves to a time series analysis. We examined three possibilities. First was serial
evaluation using an extended net which contained an input node that accepted the
result from the previous run. Hence the output reliability from step 1 became an input
prior state for step 2. This approach had the advantage of being able to explicitly

model the interaction between events; for instance, a high probability of failure at step
1 may make completion of step 2 much more difficult. However, input of a posterior probability into a BN as a prior observation over many events has doubtful validity, and we were advised by an expert statistician to avoid this approach. The expert's argument was that each run should be assumed to be independent, which would not be
the case if we propagated results between runs. The second approach was to combine
the output probabilities from a sequential run; assuming a BN has been used to assess
the probability of failure in a multi-step scenario, how should N probability
judgements be combined into a single value? One possibility was to use the output
probabilities as input into a summariser net that combined all the inputs as prior
observations into a single probability, with the net structure organised to group events
into episodes in a scenario. However, this option also faced the same criticism as the
first, namely converting multiple posterior probabilities into input observations. Our expert advised that sampling runs, assuming they were independent, was possible, but this required the probabilities of sampling particular runs to be set. This introduced a subjective sampling bias; accordingly, we rejected this option as well.

The third option avoided the net combination problem by converting the output
probability into a Boolean variable by judging each step to have succeeded or failed.
The calculated output probability for each event was compared with a user-defined target value; if it surpassed the target it was counted as a survivor, otherwise as a failure and discounted. This option had the advantage of being able to pinpoint
particular steps in scenarios that were posing reliability problems. Furthermore,
sensitivity analyses could be carried out with multiple BN runs for each step by
varying the environmental conditions, and thus producing frequencies of survivors for
a set number of tests at each scenario event. This enabled investigation of the effect of
environmental conditions on design (x) with a set of scenarios (a, b, c) by counting
the number of surviving BN runs per step, having systematically varied all
combinations of the environmental variables from worst case to best case.
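A minimal sketch of this third option, assuming a generic assessment callback in place of the actual HUGIN inference call (all names below are illustrative):

```python
from itertools import product

def count_survivors(steps, env_vars, assess, threshold):
    """Run the BN for every task step under every combination of
    environmental variable settings; a run whose output probability
    passes the threshold counts as a survivor for that step."""
    survivors = {step: 0 for step in steps}
    for combo in product(*env_vars.values()):   # worst case .. best case
        env = dict(zip(env_vars, combo))
        for step in steps:
            if assess(step, env) >= threshold:
                survivors[step] += 1
    return survivors

# Illustrative use with a stand-in for the BN run:
env_vars = {"noise": ["worst", "best"], "sea_state": ["worst", "best"]}
fake_bn = lambda step, env: 0.9 if env["sea_state"] == "best" else 0.6
print(count_survivors(["monitor", "interpret"], env_vars, fake_bn, 0.85))
# {'monitor': 2, 'interpret': 2}
```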

The SRA tool currently has two BNs: one to evaluate reliability and one to evaluate
performance time. Each BN model has variants with different probability distributions
in the NPTs to deal with variations in the degree of automation between tasks. New
BNs can be added to the tool to evaluate a wide range of NFRs.

3.1 BN MODEL OF SYSTEM RELIABILITY

The BN model of system reliability is based on a taxonomy of influencing factors by
Sutcliffe and Rugg [62] and the slips/mistakes distinction from Reason [52], who
drew on earlier work by Norman [44]. Slips are attention-based lapses and omissions
in skilled behaviour, whereas mistakes are failures in plans and hence fit into
Rasmussen's [51] rule and knowledge levels of processing. The BN model
distinguishes between tasks that involve highly trained skills and are more prone to
slips (e.g. monitoring tasks) and knowledge-intensive tasks, such as analysis and
planning, that are more prone to mistakes.

According to human error theory [41], the system environmental variables have an indirect influence on an individual's ability through increasing fatigue and stress levels, as reflected in the BN model in figure 2. An individual's ability, however, has a direct effect on mistakes. Organisational factors (management culture, incentives) have a direct effect on an individual's motivation [39]. Finally, an individual's characteristics, such as domain and task knowledge, have a direct effect on mistake-type errors [53]. Slips are mainly influenced by the user interface, the constraints (time constraints, interruptions) and the individual's dedication [52]. Tasks of high cognitive complexity are considered to be more prone to mistake-errors, while tasks of physical complexity, such as complex manipulations involving precise movements and detailed co-ordination, are more prone to slip-errors [59].

[Figure 2 diagram: a directed acyclic graph in which environmental inputs (Noise, Lighting, Sea_State, Visibility, War_Peace, Comfort), technology attributes (Functionality, Performance, Usability, Functional UI_Design, Automation Effectiveness, Task support), task attributes (Cognitive complexity, Physical complexity, Task complexity), human agent attributes (Task knowledge, Domain knowledge, Inherent ability, Internal motivation) and operational variables (Duty_Time, Workload, Time constraints, Distractions, Management culture, Incentives, Organisation culture) feed intermediate nodes (External_Env factors, Internal_Env factors, Env_Context, Stress, Fatigue, Operational stress, Ability, Expertise, Knowledge, Motivation, Dedication, Enthusiasm, Ineffectiveness) leading to the output nodes Mistakes and Slips.]
Fig. 2: BN model for system reliability. Inputs 1-2 relate to the task, 3-6 are technology
attributes, 7-10 are human attributes and 11-22 are environmental variables. Appendix A
describes the nodes and summarises the NPT influences from parent to child nodes.

The first two inputs represent judgement of task complexity; for instance, operating radar is cognitively and physically easy, whereas interpreting an object on the radar is cognitively more complex (hence set to high). Inputs 3 to 6 describe technical component properties, which can be taken from historic data on similar equipment, or estimated. Inputs 7 to 10 are properties of human agents which are taken from training data and job descriptions. Input values for the agent's task knowledge, domain knowledge, motivation and so forth can be measured using aptitude and psychometric tests. The next six variables model influences on the human operational environment. These range from the short-term effects of time pressure, distractions and workload, which can be estimated from narrative scenario descriptions, to the longer-term influences of management culture and incentives. The final six inputs describe aspects of the system's operational environment (noise, lighting, comfort, sea state, visibility and war/peace status). All the inputs are held in databases containing attributes of human agents, technology components, and tasks. The environmental variables, sub-divided into human and system operational aspects, can be entered manually to reflect a particular scenario or systematically varied.

Human agent inputs can also be measured objectively using psychological questionnaires. For instance, general ability and accuracy/concentration can be measured by intelligence aptitude scales, and decision making and judgement by locus-of-control scales, whilst domain and task knowledge can be measured by creating simple tests for a specific task/domain. The longer-term influences of management culture and incentives are judged from contextual scenarios. All the inputs are contained in files that link the variables to human agents, technology components, task properties or the environment, sub-divided into human and system operational aspects. The input variables are all discrete states (best/worst case) which are derived from the measures detailed in appendix A.

The BN is run with a range of scenarios that stress-test the system design against operational variables. Scenarios can be taken from domain-specific operational procedures, elicited by interviewing users, or postulated to cover a variety of organisational and work situations that may occur in the domain. The BN produces two outputs: slip-type errors that apply to skilled tasks (recognise, interpret and act), and mistake errors pertinent to judgement-style tasks (analyse, plan and decide).
3.2 BN FOR OPERATIONAL PERFORMANCE TIME

The topology and components of the BN for performance time assessment are similar to those of the Reliability BN, since many of the influences on performance and error are the same. The Operational Performance Time model has a similar causal network to the Reliability BN, apart from having one output node (operational performance) rather than two. As with the Reliability BN illustrated in figure 2, the likelihood influences expressed in the BN model and its NPTs are based on the human factors performance literature [64] (see also appendix A). For example, a poor physical and operational environment (time on duty and workload) has an adverse influence on the agent's stress and fatigue levels, which in turn adversely influence the agent's concentration [3]. Input variables are set either from an expert's assessment of a quality, e.g. the information for decision support provided in a prototype, or from design alternatives: a functionally rich and more expensive design would have a higher rating for functionality, situation awareness support, etc. Different levels of automation are reflected in variations in the BNs. For example, highly automated tasks tend to be quicker and more reliable, but this only applies if the equipment is well designed and maintained. Hence maintenance has more influence in highly automated tasks than in minimally automated ones, and this is reflected in different NPTs based on the equipment types. Similarly, the type of task (manual, semi-automated) determines the degree of influence of technology.

Whereas the Reliability BN produces probabilities of reliable completion for each
task step, output from the Operational Performance BN is used to increase a best case
task completion time to reflect the less than ideal properties of human and machine
agents. Each task is assigned a best and worst case completion time, obtained from
domain experts. The estimated task completion time is calculated using the following
formula (equation 3):

ET = (Phigh * BT) + (Plow * WT)    [3]

where:
ET = estimated time
Phigh = probability of operational performance being high
Plow = probability of operational performance being low
BT = best task-completion time
WT = worst task-completion time

Hence, if the probability of high operational performance is equal to 1 then the probability of low operational performance will be 0; this will result in a best case completion time. On the other hand, if the probability of high operational performance is 0.57 and the best and worst times are 3 and 10 sec respectively, then the estimated time is (0.57*3) + (0.43*10) = 6.01 sec. If the threshold value is set

at 75% in the range best-worst case, then this is converted into time with the following formula (equation 4):

Thsec = (Th% / 100) * BT + (1 - Th% / 100) * WT    [4]

where:
Thsec = threshold in seconds
BT = best task-completion time
Th% = threshold as a percentage value
WT = worst completion time

Therefore, according to the above example (equation 5):

Thsec = 10 - (75 / 100) * (10 - 3) = 4.75 sec    [5]

Hence any task completion time less than 4.75 sec is acceptable. For each task-step the system counts the BN runs with task completion times below the threshold.
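The calculations in equations 3-5 are straightforward to reproduce; the following Python sketch (names illustrative, not the tool's code) combines them:

```python
def estimated_time(p_high: float, best: float, worst: float) -> float:
    """Equation 3: weight the best/worst completion times by the
    probabilities of high/low operational performance."""
    return p_high * best + (1.0 - p_high) * worst

def threshold_secs(th_pct: float, best: float, worst: float) -> float:
    """Equation 4: convert a percentage threshold on the best-worst
    range into a completion time in seconds."""
    f = th_pct / 100.0
    return f * best + (1.0 - f) * worst

et = estimated_time(p_high=0.57, best=3, worst=10)  # 6.01 sec
th = threshold_secs(75, best=3, worst=10)           # 4.75 sec
print(et <= th)  # False: this run does not count as a survivor
```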

To reflect the case of reverting to manual operation when an automated technology fails, highly automated tasks' worst completion times are generally set much higher than those of
the manual tasks. This is because the human operator has to diagnose the reason for
failure and then substitute the manual version of the task which will not be familiar.
Hence the worst-case time will be longer than the manual task alone. For instance, the
task Manually load weapons on trolley requires 120 sec to complete in best-case
situations and 180 sec in the worst case. On the other hand, the same task with
automated technology could be completed ideally in 70 seconds but in 320 sec in the
worst case. If the automated technology fails to load the weapons correctly then
intervention of a human agent is required to discover the reason for the failure and
then correct the misplacement or manually load the weapons.

4 SRA SYSTEM ARCHITECTURE

Analysis starts with the selection of the i* model to be evaluated and the creation of the test scenarios. Scenarios are narratives taken from real-life experience describing
operation of similar systems from which event sequences are extracted. This process
is explained in more depth in sections 5 and 6. A scenario editor tool is provided [24]
which allows the analyst to point to task nodes on the i* diagram; the tool then
presents a list of the technology and human agents which may be associated with the

task. The analyst picks the agents from the list to form a task tuple consisting of
<human agent, task, technology agent>. Scenarios are built up in this manner by
following task pathways through the i* model, which is illustrated in figure 3. The
analyst specifies the NFR threshold values, then selects the scenarios and system
database. The SRA loads the required information (for the task and agents in the
scenario) from the domain database. Because of differences between semi- and highly
automated tasks, the system evaluates operational performance for each type of task
using slightly different BN models. Nodes that do not apply to the equipment used are
left undefined and therefore have a neutral influence on operational performance. For
instance, tasks that are highly automated are more dependent on maintenance
compared with semi-automated tasks, whereas highly automated equipment is
generally more reliable as long as it is well designed and maintained. These influences
are reflected in the network probability tables of the BN models.
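The model-selection behaviour described above can be pictured as a simple dispatch on the task's automation level. The following sketch is hypothetical; the variant names and selection rule are illustrative assumptions, not the tool's actual configuration:

```python
# Hypothetical mapping from a task's automation level to the BN variant
# whose NPTs encode the appropriate influences (e.g. maintenance weighs
# more heavily for highly automated tasks).
BN_VARIANTS = {
    "manual": "reliability_manual.net",
    "semi-automated": "reliability_semi.net",
    "highly-automated": "reliability_auto.net",
}

def select_bn(task_automation: str) -> str:
    """Return the BN model file matching the task's automation level;
    inapplicable input nodes are later left undefined (neutral)."""
    return BN_VARIANTS[task_automation]

print(select_bn("highly-automated"))  # reliability_auto.net
```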

Fig. 3: System model for a navy command and control combat system represented in the i*
notation. To simplify the model only human agents are shown. Scenarios trace pathways
through the model from the radar operator to PWO and then to weapons directors EWD,
WDB or WDV for a response to the threat.

Furthermore, depending on the task type, the SRA assesses system reliability based on
two types of errors, slips and mistakes. Slips are more common in tasks that are
highly skilled or physical in nature, while mistakes occur in tasks that are cognitively
complex or knowledge-intensive, such as planning [50, 51]. For each BN run the tool
assesses the system reliability and compares it against the pre-defined threshold.
Throughout this process the system keeps track of the number of BN runs that pass
the threshold.

In its current form the tool assesses two NFRs, system reliability and operational
performance time. The BN models are used in a plug-and-play architecture that binds the BN models' input nodes to the System Requirements Analyser (SRA), enabling a
range of NFRs to be tested using the same set of scenarios.

The SRA tool is composed of the following software components (see figure 4):

- The Session Controller implements the user command interface for selecting designs and scenarios and executes the algorithm that assesses a set of scenarios with the BNs. It calls the system reliability or operational performance BN assessors to execute the BN runs with all possible environmental combinations.

- The i* model editor allows interactive construction of i* models with typical CASE tool-type functions.

- The Interactive Scenario Constructor produces test scenarios from the system model based on user directions. Scenarios are stored in a database in an array of tuples.

- The Model Controller controls the BN models. It selects the appropriate BN model for each task step, then populates the input nodes, runs the model and receives the belief distributions of the output nodes. The Model Controller also manages the back propagation of the BN model to identify required technology and agent characteristics.

- The BN assessor modules run the net by calling the HUGIN algorithm for each task step and for each set of environmental variable combinations. The output from each run is compared with the desired NFR threshold and the survivor runs are passed to the results visualiser.

[Figure 4 diagram: the i* model editor and the Scenario constructor produce i* system models and scenario task sequences; the Session controller takes the desired NFR thresholds and drives the Model controller, which selects appropriate BN models and controls runs via the HUGIN BN tool (the HUGIN BN editor supports configuration editing: adding new BN models and selection rules); the Domain database supplies agent and task properties; the Results visualiser reports survivor runs for each step/scenario/design.]

Fig. 4: System Requirements Analyser conceptual architecture and functional components.

- The Visualiser provides a visual summary of all qualified BN runs for a set of
scenarios for one or more system designs. This enables different designs to be
compared and problem areas in the requirements to be identified, i.e.
task/technical component combinations which show low potential NFR
assessments. The Visualiser displays results at three levels: System, Scenario
and Phase views based on our previous visualisation model [24].

The system can be configured with new BNs by creating a new net and NPTs using
the HUGIN tool. The new BN is then added to the Model and Session Controllers by
editing menus to allow selection of the new NFR analysis and adding any rules to the
Model Controller to select between different model sub-types and NPTs according to
task or agent/equipment types. Currently only one NFR can be analysed in a session;
however, several designs and scenarios can be analysed sequentially. The system
automatically aggregates results from lower-level phase views, to the scenario and
then system design level, allowing two or more designs to be compared using the

same set of scenarios. The system was developed in Java using JBuilder 9 (J2EE).
The user interface was implemented using Swing components while the model
controller interfaces with the HUGIN Decision Engine via the provided Java API. The
connection to the database uses JDBC.
5 NFR ANALYSIS METHOD

The process, illustrated in figure 5, starts by creating the system model, using the i*
modelling language, to describe the characteristics of agents, tasks, resources and soft
goals. Soft goals in this case constitute the NFRs under investigation, while resources
are the equipment used by the agent to perform the task. The domain knowledge
necessary for the development of the i* model is elicited from domain experts. NFRs
and their validation criteria are specified in the requirements specification (e.g. system
reliability should be >= 95% for a system design with a set of operational scenarios
1..n).

The next step converts scenarios, which are narrative stories, into a format that can be executed by the system. This is achieved by extracting the task sequences undertaken by agents from the narrative. For example, in the naval domain a missile attack scenario narrative is: "The enemy aircraft launches a missile, which is detected by the ship's radar. The Radar Operator (RO) reports a hostile contact, speed and bearing to the Tactical Picture Compiler (TPC), who estimates the direction and closing time of the threat and notifies the Principal Weapons Officer (PWO) of the incoming missile threat. The PWO decides to jam the missile's radar using electronic counter-measures and issues the command to the Electronic Weapons Director (EWD)" [continues]. Scenario narratives can contain implicit tasks which are not articulated because they are tacit or assumed knowledge; therefore we apply generic task patterns [58] to define the task sequence. In the above example the generic pattern for command and control consists of five tasks: Monitor (events), Interpret (threat), Analyse (situation), Plan (response), and Act.

Using the scenario editor with the i* system model, test scenarios are constructed by selecting the tasks that are explicit and implicit in the scenario narrative; for the above example, the task sequence from Monitor by the RO to Plan by the PWO, followed by Act by the EWD, would be selected. Scenarios are composed of a number of phases and
each phase is composed of a number of task-steps, each one modelled as an <Agent, Task, Technology> tuple.
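For illustration, this scenario structure maps directly onto a simple data representation. The sketch below (Python; the class and the role/task/technology names follow the naval example but are not the tool's internal format) shows one phase built from task tuples:

```python
from dataclasses import dataclass

@dataclass
class TaskStep:
    """One task-step: <human agent, task, technology agent>."""
    agent: str
    task: str
    technology: str

# First phase of the missile attack scenario (electronic counter-measures).
phase_1 = [
    TaskStep("RO", "Monitor", "radar"),
    TaskStep("TPC", "Interpret", "tactical picture display"),
    TaskStep("PWO", "Plan", "command console"),
    TaskStep("EWD", "Act", "ECM jammer"),
]
scenario = [phase_1]  # later phases: manoeuvre, decoys, defensive missile
```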

[Figure 5 diagram: the method steps Develop system model, Construct scenarios, Select designs/scenarios/NFRs, Compare designs, Pinpoint critical tasks, Assess environmental variables, Identify critical components and Identify improvements lead to Requirements changes; they are supported by the i* editor, the scenario constructor, survivor bar charts, back-propagation and the results visualiser, with descriptions of tasks, agents and goals as input.]

Fig. 5: NFR analysis method, processes and tool support. Ellipses denote method steps,
ordered in a dependency sequence; boxes show tool components that support the method
step they are connected to.

In the above missile attack example, the narrative has four phases, each one representing a command and control sequence: first, electronic counter-measures are tried; in the next phase the ship manoeuvres to avoid the threat; then decoys are fired; and finally the hostile missile is destroyed with a defensive missile. Phases are used to
structure task sequences that fulfil a higher order goal. Scenarios can be interactively
constructed by pointing to tasks on the system model editor display. The tool then
automatically creates a scenario task sequence by tracing the human and machine
agents involved with each task.

The Compare Design step finds the best system design using the system view bar
chart (see figure 6) to investigate the number of surviving runs for each task step.
Trade-offs between NFRs can be assessed by selecting different BN models (e.g.
reliability, performance time) from the Session Controller menu, while designs can be

19

compared by changing the database, which loads different technology and human
agents that represent a new design, and repeating the process. NFR thresholds can be
set at the user's discretion, so the tool allows the analyst to compare designs and
desired performance in a more flexible manner than if the variables had been hard
coded.

The best design will generally have more surviving BN runs (as defined in section 3);
however, it is also desirable that the design succeeds in all scenario steps. Each bar in
the system view (see figure 6) corresponds to the cumulative number of surviving
runs for each task-step in a scenario phase. The analyst can easily identify the best
design and pinpoint task steps with low NFR satisfaction rates by focusing on low
scores on the bar chart. Moving the cursor on top of any bar reveals the total number
of surviving runs for the task-step.
The bar chart identifies poorly performing task steps, which can be cross-referenced
to the human and machine agents involved. Right-clicking on any bar reveals the components involved. The domain database can then be queried to find the input
variables. The domain database has an annotation field so the analyst can record
reasons for settings, and refer to these when improvements may have to be made. The
BN models have a limited explanation facility of pop-up tool tips that summarise the
NPT influences (see appendix A) for each parent-child node combination. This
information is then used in the Identify Improvement step. Further advice on generic
requirements for technology to support particular tasks, and improving human
operation, is given in a related part of the toolset which we have described elsewhere
[57].

The best design also needs to be resilient to environmental conditions. This analysis is
supported by the results visualiser in the Assess Environment step. The results
visualiser uses colour coding to identify variables which adversely affect system risks
over a range of scenario steps. In the phase view the influences of environmental
variables on survivor runs are collated into a matrix (figure 6). Columns correspond to
the twelve environmental variables, and rows report the percentage scores that passed
the threshold. The impact of environmental variables is calculated using equation 6:

IEP(x) = (Qb EP(x) / Q EP(x) All) * 100    [6]

where:
Qb EP(x) = survivor runs with environmental variable (x) set to best case
Q EP(x) All = total survivor runs for all settings of (x)


The matrix's colour coding denotes the level of importance of each parameter; green designates a low-risk parameter, since it was assigned its worst-case setting most of the time in the surviving runs. On the other hand, red denotes high risk, due to the high percentage of surviving runs with best-case settings. Since environmental variables which were set to worst case did not degrade the NFR level below the threshold, if they are set to best case they can only have a positive effect on the NFR. Conversely, variables that were set to best case during the NFR assessment may, if set to worst case, decrease the NFR so that it fails to pass the threshold level; therefore they are indicated as a risk.
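A sketch of the impact calculation (equation 6) and the colour coding; the band boundaries below are illustrative assumptions, not the tool's documented thresholds:

```python
def iep(survivors_best_case: int, survivors_all: int) -> float:
    """Equation 6: percentage of surviving runs in which environmental
    variable x was at its best-case setting."""
    return 100.0 * survivors_best_case / survivors_all

def risk_colour(impact_pct: float) -> str:
    """Map impact to the visualiser's colour bands (boundaries assumed)."""
    if impact_pct >= 75: return "red"     # survived mostly at best case: high risk
    if impact_pct >= 50: return "orange"
    if impact_pct >= 25: return "yellow"
    return "green"                        # survived even at worst case: low risk

print(risk_colour(iep(80, 100)))  # red
```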

[Figure 6 screenshot annotations: the selected BN for the NFR and the test scenarios appear at the top; a dynamic task-steps display depicts the last phase of a scenario composed of three task-steps; the phase view and the system view comparing two different designs occupy the main panes.]

Fig. 6: System visualisation showing the system and phase view of the operational
performance assessment. The Incentives column (1) is worst case (coloured red in display),
whereas the Light column (2) is better than average (yellow) and other columns are average
(orange). In this run no best-case (green) runs survived.


In the Identify Improvements step, if an overall design or a particular task step fails to meet the desired NFR threshold, then back-propagation analysis is used: the desired NFR value is set and the BN is back-propagated to discover the necessary settings of agent or environmental variables to achieve that value. Back propagation can
be used in two modes: all input nodes unconstrained, in which case the BN calculates
the input values required to achieve the user-determined output NFR; or one/few input
nodes unconstrained, in which case the BN calculates the values for these nodes given
settings for the constrained nodes. Back-propagation is usually hypothesis-driven to
focus on where design improvement could be made, so many variables are left with
their original settings, with a few nodes left unconstrained.

The results from the back propagation are compared with the properties of the original
component in order to identify the level of improvement required. For instance, if the
usability of the radar is set to 0.65 (actual) in the database and the assessed usability
from the back propagation is 0.83 (estimated) to achieve the desired NFR for
reliability of 0.85, then the required level of improvement is 0.18, i.e. 0.83 minus
0.65.
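The improvement gap is a simple difference between the back-propagated estimate and the component's actual property value, as in this minimal sketch:

```python
def required_improvement(actual: float, estimated: float) -> float:
    """Gap between a component's current property value and the value
    the back-propagated BN indicates is needed to meet the NFR target."""
    return max(0.0, estimated - actual)

# Radar usability example from the text: 0.83 (estimated) - 0.65 (actual).
print(f"{required_improvement(0.65, 0.83):.2f}")  # 0.18
```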

Figure 7 depicts the back propagation of the Operational Performance model using an input set of environmental variables, the agent properties and the required NFR values defined by the requirements specification. The monitor windows on top of the system environment, human agent and NFR nodes show the input variables. The monitor windows on top of the technology influences depict the distribution of the output nodes.

[Figure 7 screenshot annotations: the desired NFR value is the input; the changed functionality variable is the output.]

Fig. 7: Back-propagating the BN to identify the cause of the NFR effect in terms of technology characteristics (the influence of each one). A sub-set of the Operational Performance Time net is illustrated.

6 CASE STUDY

This case study describes the application of the SRA tool in validating the operational
performance and system reliability of a complex socio-technical system. The
requirements question is to assess the impact of new automated technology on the
task of loading weapons on to aircraft in an aircraft carrier. A description of the
human roles used in the following scenario is provided in table 2 and the technology
components are listed in appendix B.

A request for an air mission arrives in the control room from Carrier Group Strategic
Command. The mix of weapons/fuel tanks/electronic counter-measures pods, etc. is
planned according to the mission type and aircraft assigned to the mission. The Air
Planning Officer (APO) plans the required weapons load and schedules the loading
with the Deputy Air Planning Officer (DAPO). The load plan is communicated to the
Magazine Weapons Supervisor (MWS). The MWS plans the retrieval of weapons
from the magazine and the Magazine Artificer (MA) retrieves the weapons and places
them on a trolley. The trolley is placed on the hoist which lifts it to the flight deck.
The trolley is then moved by the Weapons Artificer (WA) to the specified aircraft.
The Weapons Team Supervisor (WTS) is responsible for organising the WA teams. A
number of checks are performed by the Weapons Loading Controller (WLC) prior to
the loading of the weapons, e.g. check that the aircraft is properly grounded and
engine power is set to off; visually inspect the wing rack to ensure safety pins are
placed and the rack is locked; verify that all cockpit armament selectors are in the off
or safe position. On completion of safety checks the WA positions the trolley under
the aircraft wing, orients the trolley under the desired rack, lifts into position and
attaches the weapons. The trolley has a pneumatic pump to hoist the weapon up to the
wing; however, the final load-and-secure is manual and requires two or more WAs
depending on weapon weight. The process is repeated for the rest of the weapons. On
completion of the loading process the WLC tests the connections between the
weapons and the rack, then the WA removes the trolley. Finally the WLC inspects the
weapons before arming them and reporting completion to the Flight Deck Supervisor.
The process is usually carried out concurrently with two teams, one per aircraft wing.

Table 2. Description of the agent roles.

APO: Air Planning Officer. Responsible for planning the weapons load according to mission requirements.
DAPO: Deputy Air Planning Officer. Accountable to the APO; responsible for planning the weapons load and communicating the plan to the magazine.
MWS: Magazine Weapons Supervisor. Responsible for the effective management of the MAs and the planning of the weapons retrieval.
MA: Magazine Artificer. Responsible for the retrieval of weapons from the magazine and loading them on the transportation equipment.
WTS: Weapons Team Supervisor. Responsible for the effective management of the weapons loading team.
WA: Weapons Artificer. Responsible for handling weapon systems on the flight deck and elsewhere.
WLC: Weapons Loading Controller. Manages the flight deck weapons loading process.

The scenario task-steps and components used for two prospective designs are shown in appendix B. Tasks in Design 1 are manual or semi-automated, while in Design 2 they are semi- or fully automated; for instance, the task Transfer weapons to aircraft becomes specialised into Move trolley to aircraft and Drive autoload palette to aircraft. The autoload palette has image sensors to detect the correct position on the aircraft wing and knowledge of the aircraft and weapon type, so it can automatically hoist and connect the weapons. The second design saves manpower, since it can be operated by one WA, and is potentially more rapid to operate, but it is more expensive. The systems engineer needs to compare the two designs with a sensitivity analysis to test different assumptions.
The analyst can easily pinpoint the more reliable design by focusing on the
comparison in the system view. Overall most of the tasks were more reliable in
Design 2 (advanced technology) at the rear of the bar chart in figure 6; however, tasks
Schedule load and Report task completion had more survivors and hence better
reliability in Design 1. Also both designs had poor reliability for Move trolley to
aircraft and the following checking tasks, so these are critical tasks that warrant
further attention. The two designs have equal and acceptable reliability for the Load
Planning task even though Design 2 was automated. Inspection of the agents' properties and the BN tables shows that the information accuracy and maintenance
technology properties were set to poor because the planning system was a new
prototype, hence the improvement from automation was small. The poor reliability of
Move trolley to aircraft in both designs is a consequence of the effect of
environmental variables on human operation. This can be seen in the phase view in
figure 6 which shows that this task and load planning both suffer from adverse
environmental influences. Moving the trolley is primarily a manual task, so the
system selects the NPT tables which minimise the influence of the technology
component; in the Design 2 autoload palette, poor maintenance settings for new
technology reduce the advantage of automation. The adverse environmental
influences on human and machine agents are present for both designs, reflecting the
experience that manoeuvring equipment on a pitching aircraft carrier deck (sea
variable setting) is prone to error. Similarly the subsequent four checking tasks are all
manual and exposed to reliability influences from motivation (slips when not paying
attention) and interruptions in a busy flight deck environment (concurrency variable).
Solutions require human factors knowledge, which might suggest double checks to
improve reliability or improved design to support checking by augmented reality
display of reminders, location of objects to check, etc.


Fig. 8: Task completion time for each task in both designs. The lower part of the bar is the
best case time; the upper part is the estimated time taking agent and environment variables
into account.

When the operational performance times are compared (see lower bars at the rear of
figure 8), Design 2 is quicker for nearly all tasks, which is not surprising since it has
more automated tasks. The projected increase from the best case task completion
times for Design 1 reflects the effect of the same variables that also caused poor
reliability.

Completion times for Plan and Schedule load tasks are long for both designs, which
might seem strange since Design 2 partially automated both tasks. However, the best case time even after automation is still long, since human checking is necessary to verify
automated decision making. The projected actual times reflect the poor reliability of
both designs, which can be traced to poor rating of information provided by the
technology, reflecting uncertainty under operational conditions. Most tasks have more
rapid best-case and estimated times in Design 2 because automated processes are
quicker and the time advantage is not changed by the effect of poor reliability in some
tasks, e.g. Planning, Scheduling, and Move trolley to aircraft.

The next step is to consider the critical environmental variables for both designs,
illustrated in figure 9. Figure 9a shows that incentives, motivation, duty time, concurrency, and time constraints were all marked as vulnerable for Design 1. Design
2 (figure 9b) in contrast fares better with only motivation, concurrency and
maintenance marked as vulnerable. Maintenance becomes a concern for the second,
more highly automated design and this reflects the NPTs selected for different levels
of automation. Cures as before require human factors knowledge; however, some
suggestions which can be found in the system database are to increase motivation to
improve crew morale, or provide incentives for these roles. Concurrency is difficult to cure since so many tasks are prone to interruptions, while the effect of maintenance depends on the system engineer's judgement about the effectiveness of planned maintenance. The tool's role is to point out problems which can be cured by changed procedures and management decisions, such as increasing investment in low-maintenance equipment.

Fig. 9(a): Environmental influences for Design 1. The arrow points to critical task. Red (darker
shading) indicates adverse environmental variables.

Fig. 9(b): Environmental influences for Design 2. The arrow points to the critical task.

After identifying the most appropriate design, the problematic tasks and the critical
environmental variables, the analyst investigates the improvements required for the
Autoload palette component, which was the weakest link in Design 2. Using the back-propagation facility, the minimum acceptable reliability is set in the output node, and
the nodes where design or operational environmental changes can be made are left
unconstrained.


Fig. 10: Tuple components' suggested improvements for Design 2. The circled cells correspond to the required improvements for the generic task Drive autoload palette to aircraft. Dark-filled cells represent properties that are not applicable to the component.

In this case, equipment maintenance (already identified as a vulnerability) and the human operator's experience (the only way to overcome difficult carrier deck
operations) are selected. The BN shows that maintenance needs to be improved by
50%, and the operator's experience by 26% (see figure 10). Translating these into specific
needs requires domain expertise; however, the tool does quantify the degree of
improvement and this can be empirically tested by setting targets in a prototype
system.

7 VALIDATING THE BN MODELS

We used data mining techniques to test whether the assumptions embedded in the BN models matched the expected influences elicited from domain experts and theory. We simulated all possible permutations of the input model variables and created a database of reliability and performance time predictions for these runs. This produced an extensive set of test data; for example, for one scenario composed of four phases with six task steps in each phase the tool generated 4 * 6 * 3^12 records. The BN models' NPTs and causal influences were analysed with the following data mining
techniques: relevance analysis, association rules and classification [25]. Relevance
analysis ranks input parameters of the model based on their relevance to one of the
models output parameters (e.g. reliability in our BN). Association rules describe how
often two or more facts co-occur in a data set and were employed to check the causal
associations in our model. Classification partitions large quantities of data into sets
with common characteristics and properties and was used to provide a further check
on the structure of the BN models.
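To convey the scale of this test database, the short sketch below enumerates the exhaustive combinations for one four-phase, six-step scenario. The assumption of twelve three-state input variables is our reading of the 3^12 factor and is illustrative only:

```python
# Illustrative enumeration of the exhaustive test database for one
# scenario; twelve three-state input variables are assumed.
from itertools import product

VARIABLES = 12                      # environmental/agent input nodes (assumed)
STATES = ("low", "medium", "high")  # discretisation of each node (assumed)
PHASES, STEPS = 4, 6                # scenario structure from the text

def generate_records():
    """Yield one record per (phase, step, input-state combination)."""
    for phase in range(PHASES):
        for step in range(STEPS):
            for combo in product(STATES, repeat=VARIABLES):
                yield (phase, step) + combo

print(next(generate_records()))   # first record: phase 0, step 0, all-low states
# 4 * 6 * 3**12 = 12,754,584 records for a single scenario
print(PHASES * STEPS * len(STATES) ** VARIABLES)
```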

The initial assumptions made about influences on system reliability and operational
performance were mainly satisfied. However, the relevance analysis revealed that sea
state had only a minor influence on system error, although according to domain
experts it is a major influence on human error. Several intermediate nodes had diluted
the influence of sea state on the system error nodes, so it was necessary to alter the BN
causal diagram. The two BN models for assessing operational performance with
different levels of automation showed a similar influence of maintenance on
operational performance, which should not be the case. These inaccuracies were
addressed by altering the BNs' NPTs to increase the prior probability influence of
poor maintenance on automated tasks.
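A crude way to picture the relevance analysis that produced the sea-state finding is to rank each input by the spread of failure rates across its states; an input whose states barely change the failure rate, as sea state did here, ranks low. The sketch below is a simplified stand-in for the data mining package we actually used, with hypothetical field names:

```python
# Simplified relevance ranking: score each input variable by the
# spread in failure rate across its states; wider spread = more
# relevant to the output. Field names are hypothetical.
from collections import defaultdict

def relevance(records, inputs, output="Survived", fail="Fail"):
    scores = {}
    for var in inputs:
        totals, fails = defaultdict(int), defaultdict(int)
        for r in records:
            totals[r[var]] += 1
            fails[r[var]] += (r[output] == fail)
        rates = [fails[s] / totals[s] for s in totals]
        scores[var] = max(rates) - min(rates)  # failure-rate spread
    return sorted(scores.items(), key=lambda kv: -kv[1])

records = [
    {"SeaState": "High", "DutyTime": "High", "Survived": "Fail"},
    {"SeaState": "Low",  "DutyTime": "High", "Survived": "Fail"},
    {"SeaState": "High", "DutyTime": "Low",  "Survived": "Pass"},
    {"SeaState": "Low",  "DutyTime": "Low",  "Survived": "Pass"},
]
print(relevance(records, ["SeaState", "DutyTime"]))
# DutyTime shows full spread (1.0) while SeaState shows none (0.0),
# mirroring a diluted sea-state influence.
```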
Association analysis identified two rules with high significance levels that were not
explicitly defined in the model:
IF (DutyTime = High) THEN (Survived = Fail)
IF (Workload = High) THEN (Survived = Fail).
These rules indicated that the causal influences of Duty Time and Workload were
higher than those specified in the BN by the domain experts. To overcome this
problem we altered the NPT settings to reduce the weighting of these nodes and
increase the influence of the Distractions node, which appeared weak. Finally,
classification analysis pinpointed problems with the crew motivation and agent ability
nodes, which suggested changes to the BN model structure.
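To make the association-rule check above concrete: it reduces to measuring how often antecedent and consequent co-occur. As a minimal sketch, assuming dict-like records with hypothetical field names, the support and confidence of a candidate rule can be computed as follows:

```python
# Minimal support/confidence check for a candidate association rule
# such as IF DutyTime=High THEN Survived=Fail. Field names are
# hypothetical; real data mining tools also search the rule space.

def rule_stats(records, antecedent, consequent):
    """Return (support, confidence) of antecedent => consequent,
    where each side is a (field, value) pair over dict-like records."""
    a_field, a_value = antecedent
    c_field, c_value = consequent
    n = len(records)
    a_count = sum(1 for r in records if r[a_field] == a_value)
    both = sum(1 for r in records
               if r[a_field] == a_value and r[c_field] == c_value)
    support = both / n if n else 0.0
    confidence = both / a_count if a_count else 0.0
    return support, confidence

records = [
    {"DutyTime": "High", "Workload": "Low", "Survived": "Fail"},
    {"DutyTime": "High", "Workload": "High", "Survived": "Fail"},
    {"DutyTime": "Low", "Workload": "Low", "Survived": "Pass"},
]
print(rule_stats(records, ("DutyTime", "High"), ("Survived", "Fail")))
# -> approx (0.67, 1.0): a high-confidence rule signals an influence
#    stronger than the one encoded in the BN's probability tables.
```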
8 DISCUSSION AND CONCLUSIONS

The main contribution of this research has been to develop automated testing of
requirements specifications and designs for conformance to non-functional
requirements, using a set of scenarios and variations in the system environment. This
is a considerable advance over existing tools, which support validation of NFRs by
inspection of models [41]. Our automated scenario-based testing tool explicitly
considers environmental influences, and provides visualisations for pinpointing
problematic tasks and components within a design and scenario sequence. The
technology is applicable to problems where requirements are expressed as properties
of components, such as the human and machine agents in our systems engineering
domain. However, the configuration costs of the BNs will limit the cost effectiveness
of the technology for new, green-field requirements engineering problems; on the
other hand, it should pay back in brown-field domains where designs are incrementally
refined and the set-up costs can be amortised over many generations of testing.

More generally, the SRA could be applied to any class of component-based problems
where the selection of components needs to be optimised against non-functional
requirement criteria. The architecture is modular and scalable, allowing new NFRs to
be investigated by plugging in the appropriate BN. Our work presents a new view of
component-based model-checking using BNs, which could, in principle, be applied to
model-checking requirements at lower levels of granularity, such as black-box
software component configuration. The BN approach could apply to any domain
where requirements attributes can be synthesised into a predictive model of
performance, effectiveness, or other non-functional requirements. It can be applied to
problems that can be described by a set of sequential tasks, for instance checking
workflow systems expressed as sequential tasks/functions undertaken by a
collaboration between human and software agents.

The SRA tool was a development from our previous BN requirements analyser [26],
and has partially addressed the difficult problem of scenario-based testing [4, 63].
Although there is no substitute for domain expertise in generating or acquiring
scenarios, our approach can amplify scenario-based validation by systematically
testing a set of assumptions that are implicit within scenarios. This enables areas of
concern to be pinpointed, as well as enabling trade-off analysis between alternative
designs. However, the fidelity of testing depends on the accuracy and sophistication
of the BN models. There is no quick solution to validating complex models of human
error and environmental influences on system failure, since exhaustive experiments on
complex systems can never be complete; incorporating human factors into assessment
of systems or user interfaces has to rely on models constructed from theory and
domain expertise [30, 35, 53]. We have followed both approaches in constructing our
BN models.

The SRA tool is aimed at requirements investigation in complex socio-technical
systems, and hence it complements model-checking tools which are more appropriate
to later stages in development when specifications of agent behaviour are available,
e.g. SpecTM-RL [38], KAOS-GRAIL [66, 67]. Other scenario-based requirements
analysis tools such as ARTSCENE [56] help to automatically generate scenario
variations by pathway expansion algorithms that trace normal and
alternative/exception paths through use cases, but no validation support is provided
beyond suggestions for generic requirements which may be applicable to different
scenario events.

The use of BNs by Fenton et al. in their work on software metrics and risk analysis
[12, 15, 18] is closely related to our approach. However, they employed BNs to assess
the quality of software systems based on the properties of system specifications,
development processes and code. Their use of BNs assumes a static view, whereas we
have extended Bayesian tests to a dynamic view in operational scenarios by
introducing the notion of test survivors, to avoid the problems of Bayesian reasoning
over multiple sequence states. They do not consider operational testing with scenarios.
In the JSIMP tool, Fenton and Cates [14] provide predictions of project failures based
on BN analysis of project management practices. Users enter scenario information via
a questionnaire interface and obtain probability distributions of unknown variables
using the back-propagation facilities of BNs, which are also incorporated within our
tool. Although the JSIMP tool has an end-user interface that hides the complexities of
the BN from the user, it does not include visualisation facilities comparable to those of
our SRA tool, which allows the analyst to assess multiple model assessments over a
variety of scenario sequences and environmental conditions.
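To illustrate the test-survivors idea, the sketch below (our illustration, with a hypothetical stand-in for the per-step BN query) scores each simulated environment variation step by step and counts it as a survivor only if every step clears the reliability benchmark, so no Bayesian inference has to span the step sequence:

```python
# Sketch of the "test survivors" notion: a run survives only if every
# task step's predicted reliability clears the benchmark, so each step
# needs one independent BN query rather than inference across steps.
import random

BENCHMARK = 0.8
STEPS = 6

def step_reliability(step: int, env: dict) -> float:
    """Hypothetical stand-in for a single BN query: predicted
    reliability of one task step under the given environment."""
    penalty = 0.05 * env["sea_state"] + 0.1 * env["fatigue"]
    return max(0.0, 0.95 - penalty - 0.01 * step)

def survivors(n_runs: int = 1000) -> int:
    random.seed(42)  # reproducible demo
    count = 0
    for _ in range(n_runs):
        env = {"sea_state": random.randint(0, 3),  # sampled variations
               "fatigue": random.random()}
        if all(step_reliability(s, env) >= BENCHMARK for s in range(STEPS)):
            count += 1
    return count

print(f"{survivors()} / 1000 runs survived all {STEPS} steps")
```

Counting survivors against a benchmark is what allows alternative designs to be compared over many environmental variations.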

There is no shortage of scenario-based tools for requirements validation and
verification; however, all these tools use more detailed specifications of system
behaviour, which will not exist in the early stages of the requirements process or in
domains with black-box component-based design. For instance, Ryser and Glinz [55]
convert natural language scenarios into statecharts, which in turn are used to generate
test cases for system validation. In common with our tool, the scenario conversion
process is manual and labour-intensive, so one future direction in our work will be to
investigate information extraction tools [57], which may be able to partially automate
generation of scenario event sequences from text-based narratives. Like the
ARTSCENE environment, the SCENT method [55] only provides automated
derivation of possible test cases, and no assistance in validation of requirements
specifications. Zhu and Jin [71] also used formalised scenarios for validating
requirements, based on the principles of activity lists [2], but did not provide any
validation of non-functional requirements.

Although our approach has delivered an analysis tool for investigating system
requirements, there are some limitations in its applicability. First, we make the
assumption of single-threaded tasks. While this holds for highly trained military
domains with event-driven scenarios, it will not be the case in domains where
opportunistic behaviour is the norm. Another simplification is that we do not model
concurrency and communication in our scenarios. Since our scenarios are
single-threaded, concurrency is not a severe problem; furthermore, we argue that
because the SRA tool uses approximate models, its value lies not in diagnosis of a
completely realistic task model but rather in comparative assessment of two (or more)
different designs using the same set of scenarios and analysis approach. Given these
limitations, the SRA provides a reasonable trade-off between modelling effort and
diagnostic power. However, in our ongoing research we are investigating concurrent
scenarios and communication within the BN analysis.

REFERENCES

[1] J. S. Anderson and B. Durley, "Using scenarios in deficiency-driven requirements engineering," presented at Requirements Engineering RE'93, 1993.
[2] J. S. Annett and K. D. Duncan, "Task analysis and training design," Occupational Psychology, vol. 41, pp. 211-221, 1967.
[3] R. W. Bailey, Human Performance Engineering: A Guide for System Designers. Englewood Cliffs, NJ: Prentice Hall, 1982.
[4] J. M. Carroll, Scenario-Based Design: Envisioning Work and Technology in System Development. New York: Wiley, 1995.
[5] J. M. Carroll, M. B. Rosson, G. Chin, and J. Koenemann, "Requirements development in scenario-based design," IEEE Transactions on Software Engineering, vol. 24, pp. 1156-1170, 1998.
[6] K. Casey and C. Exton, "A Java 3D implementation of a geon-based visualization tool for UML," presented at PPPJ, Kilkenny, Ireland, 2003.
[7] S. J. Cunning, "Test scenario generation from structural requirements specification," presented at Symposium on Engineering of Computer-Based Systems (ECBS '99), Nashville, TN, USA, 1999.
[8] A. Davis and P. Hsia, "Giving voice to requirements engineering," IEEE Software, vol. 11, pp. 12-16, 1994.
[9] J. C. S. do Prado Leite and L. M. Cysneiros, "Nonfunctional requirements: from elicitation to conceptual models," IEEE Transactions on Software Engineering, vol. 30, pp. 328-350, 2004.
[10] P. Dubois, E. Dubois, and J. Zeippen, "On the use of a formal representation," presented at 3rd IEEE International Symposium on Requirements Engineering, Los Alamitos, CA, 1997.
[11] G. Engels, "Model-based verification and validation of properties," Electronic Notes in Theoretical Computer Science, vol. 82, 2003.
[12] N. Fenton, "Applying Bayesian belief networks to critical systems assessment," Critical Systems, vol. 8, pp. 10-13, 1999.
[13] N. Fenton, "A critique of software defect prediction models," IEEE Transactions on Software Engineering, vol. 25, pp. 675-689, 1999.
[14] N. Fenton and P. Cates, "JSIMP: BN model and tool for the SIMP project," Queen Mary (University of London), London, 30 July 2003.
[15] N. Fenton, P. Krause, and M. Neil, "Software measurement: uncertainty and causal modeling," IEEE Software, vol. 10, pp. 116-122, 2002.
[16] N. Fenton and B. Littlewood, Software Reliability and Metrics. Elsevier, 1991.
[17] N. Fenton and N. Maiden, Making Decisions: Using BNs and MCDA. London: Computer Science Dept, Queen Mary and Westfield College, 2000.
[18] N. Fenton and M. Neil, "Software metrics: successes, failures and new directions," Journal of Systems and Software, 2000.
[19] N. Fenton and S. L. Pfleeger, Software Metrics: A Rigorous Approach. London: International Thomson Computer Press, 1997.
[20] A. Fuxman, M. Pistore, J. Mylopoulos, and P. Traverso, "Model checking early requirements specifications in Tropos," presented at International Symposium on Requirements Engineering (RE'01), Toronto, Canada, 2001.
[21] J. Galliers, A. Sutcliffe, and S. Minocha, "An impact analysis method for safety-critical user interface design," IEEE Transactions on Software Engineering, vol. 6, pp. 341-369, 1999.
[22] A. Gemino, "Empirical comparison of animation and narration in requirements validation," Requirements Engineering, vol. 9, pp. 153-168, 2003.
[23] A. Grau and M. Kowsari, "A validation system for object-oriented specifications of information systems," presented at 1st East European Symposium on Advances in Databases and Information Systems (ADBIS '97), St Petersburg, 1997.
[24] A. Gregoriades, J. E. Shin, and A. G. Sutcliffe, "Human-centred requirements engineering," in Proceedings of RE'04, Kyoto, Japan. Los Alamitos, CA: IEEE Computer Society Press, pp. 154-164, 2004.
[25] A. Gregoriades, A. G. Sutcliffe, and H. Karanikas, "Evaluation of the SRA tool using data mining techniques," presented at CAiSE 2003, Klagenfurt/Velden, Austria, 2003.
[26] A. Gregoriades, A. G. Sutcliffe, and J. E. Shin, "Assessing the reliability of socio-technical systems," presented at 12th Annual Symposium INCOSE, Las Vegas, USA, 2002.
[27] K. M. Hansen, A. P. Ravn, and V. Stavridou, "From safety analysis to software requirements," IEEE Transactions on Software Engineering, vol. 24, pp. 573-584, 1998.
[28] P. Haumer, K. Pohl, and K. Weidenhaupt, "Requirements elicitation and validation with real world scenes," IEEE Transactions on Software Engineering, vol. 24, pp. 1036-1054, 1998.
[29] C. Heitmeyer, J. Kirby, and B. Labaw, "Applying the SCR requirements method to a weapons control panel: an experience report," presented at FMSP '98, Clearwater Beach, FL, USA, 1998.
[30] E. Hollnagel, Cognitive Reliability and Error Analysis Method. Elsevier Science, 1998.
[31] E. Hollnagel, Human Reliability Analysis: Context and Control. New York: Academic Press, 1993.
[32] E. Hollnagel, "The phenotype of erroneous actions: implications for HCI design," in Human-Computer Interaction and Complex Systems, G. Weir and J. Alty, Eds. London: Academic Press, 1990.
[33] P. Hsia, A. Davis, and D. Kung, "Status report: requirements engineering," IEEE Software, vol. 10, pp. 75-79, 1993.
[34] R. Jeffords and C. Heitmeyer, "A strategy for efficiently verifying requirements specifications using composition and invariants," presented at ESEC/FSE '03, Helsinki, Finland, 2003.
[35] B. I. Kirwan, A Guide to Practical Human Reliability Assessment. London: Taylor and Francis, 1994.
[36] V. Lalioti, "Animation for validation of business system specifications," presented at 30th Hawaii International Conference on System Sciences, Wailea, Hawaii, pp. 7-10, January 1997.
[37] V. Lalioti and P. Loucopoulos, "Visualisation of conceptual specifications," Information Systems, vol. 19, pp. 291-309, 1994.
[38] N. G. Leveson, "Intent specifications: an approach to building human-centered specifications," IEEE Transactions on Software Engineering, vol. 26, pp. 15-35, 2000.
[39] N. G. Leveson, Safeware: System Safety and Computers. Reading, MA: Addison-Wesley, 1995.
[40] J. Mylopoulos, L. Chung, and B. Nixon, "Representing and using non-functional requirements: a process-oriented approach," IEEE Transactions on Software Engineering, vol. 18, pp. 483-497, 1992.
[41] J. Mylopoulos, L. Chung, and E. Yu, "From object-oriented to goal-oriented requirements analysis," Communications of the ACM, vol. 42, pp. 1-7, 1999.
[42] M. Neil, N. Fenton, and L. Nielsen, "Building large-scale Bayesian networks," The Knowledge Engineering Review, vol. 15, pp. 257-284, 2000.
[43] B. Nixon, "Management of performance requirements for information systems," IEEE Transactions on Software Engineering, vol. 26, pp. 1122-1146, 2000.
[44] D. Norman, The Psychology of Everyday Things. New York: MIT Press, 1988.
[45] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. San Francisco: Morgan Kaufmann, 1988.
[46] C. Potts, "ScenIC: a strategy for inquiry-driven requirements determination," presented at RE'99: International Symposium on Requirements Engineering, Limerick, Ireland, 1999.
[47] C. Potts and A. Anton, "A representational framework for scenarios of system use," Requirements Engineering, vol. 3, pp. 219-241, 1998.
[48] C. Potts, K. Takahashi, and A. Anton, "Inquiry-based requirements analysis," IEEE Software, vol. 11, pp. 21-32, 1994.
[49] C. Potts, K. Takahashi, J. Smith, and K. Ota, "An evaluation of inquiry-based requirements analysis for an Internet service," presented at Second International Symposium on Requirements Engineering, York, UK, 1995.
[50] J. Rasmussen, "Human error and the problem of causality in analysis of accidents," Philosophical Transactions of the Royal Society of London, Series B: Biological Sciences, vol. 327, pp. 449-462, 1990.
[51] J. Rasmussen, "Skills, rules, knowledge; signals, signs, and symbols; and other distinctions in human performance models," IEEE Transactions on Systems, Man and Cybernetics, vol. 13, pp. 257-266, 1983.
[52] J. Reason, Human Error. New York: Cambridge University Press, 1990.
[53] J. Reason, Managing the Risks of Organizational Accidents. Aldershot: Ashgate, 2000.
[54] C. Rolland, C. Souveyet, and C. B. Achour, "Guiding goal modeling using scenarios," IEEE Transactions on Software Engineering, vol. 24, pp. 1055-1071, 1998.
[55] J. Ryser and M. Glinz, "A scenario-based approach to validating and testing software systems using statecharts," presented at 12th International Conference on Software and Systems Engineering and their Applications (ICSSEA '99), Paris, France, 1999.
[56] N. Seyff, P. Grunbacher, N. Maiden, and A. Toscar, "Requirements engineering tools go mobile," presented at International Conference on Software Engineering (ICSE '04), Scotland, 2004.
[57] J. E. Shin, A. Sutcliffe, and A. Gregoriades, "Scenario advisor tool for requirements engineering," Requirements Engineering, online, http://www.springerlink.com/app/home/journal.asp?wasp=m3tlwhruwl4u54qmhqvl&referrer=parent&backto=linkingpublicationresults,1:102830,1, 2004.
[58] A. G. Sutcliffe, The Domain Theory: Patterns for Knowledge and Software Reuse. Mahwah, NJ: Lawrence Erlbaum Associates, 2002.
[59] A. G. Sutcliffe, J. Galliers, and S. Minocha, "Human errors and system requirements," presented at 4th IEEE International Symposium on Requirements Engineering, Los Alamitos, CA, 1999.
[60] A. G. Sutcliffe and A. Gregoriades, "Validating functional system requirements with scenarios," in Proceedings of 1st IEEE Joint International Conference on Requirements Engineering (RE'02), Essen, Germany, September 2002, S. Greenspan, J. Siddiqi, E. Dubois, and K. Pohl, Eds. Los Alamitos, CA: IEEE Computer Society Press, pp. 181-190, 2002.
[61] A. Sutcliffe, N. Maiden, S. Minocha, and D. Manuel, "Supporting scenario-based requirements engineering," IEEE Transactions on Software Engineering, vol. 24, pp. 1072-1088, 1998.
[62] A. G. Sutcliffe and G. Rugg, "A taxonomy of error types for failure analysis and risk assessment," International Journal of Human-Computer Interaction, vol. 10, pp. 381-406, 1998.
[63] A. G. Sutcliffe and M. Ryan, "Assessing the usability and efficiency of design rationale," presented at Human-Computer Interaction: INTERACT '97, IFIP/Chapman and Hall, 1997.
[64] A. D. Swain and H. Guttmann, "Handbook of human reliability analysis with emphasis on nuclear power plant applications," Nuclear Regulatory Commission, Washington, DC, 1983.
[65] A. van Lamsweerde, "Goal-oriented requirements engineering: a guided tour," presented at Fifth IEEE International Symposium on Requirements Engineering (RE '01), 2001.
[66] A. van Lamsweerde, "Goal-oriented requirements engineering: a roundtrip from research to practice," presented at Requirements Engineering Conference, Kyoto, Japan, 2004.
[67] A. van Lamsweerde and E. Letier, "Handling obstacles in goal-oriented requirements engineering," IEEE Transactions on Software Engineering, vol. 26, pp. 978-1005, 2000.
[68] M. Visser and P. A. Wieringa, "PREHEP: human error probability based process unit selection," IEEE Transactions on Software Engineering, vol. 31, pp. 1-15, 2001.
[69] D. Wright and K. Cai, "Representing uncertainty for safety critical systems," City University, London, 1994.
[70] E. Yu and J. Mylopoulos, "Towards modelling strategic actor relationships for information systems development, with examples from business process reengineering," presented at 4th Workshop on Information Technologies and Systems, Vancouver, BC, Canada, 1994.
[71] H. Zhu and L. Jin, "Scenario analysis in an automated tool for requirements engineering," Requirements Engineering, vol. 5, pp. 2-22, 2000.
[72] H. Ziv and D. J. Richardson, "Constructing Bayesian-network models of software testing and maintenance uncertainties," presented at International Conference on Software Maintenance, Bari, Italy, September 1997.

Appendix A: BN models: summary of input nodes and measurements


Node | Description and measure | Worst-case settings
Noise | Ambient noise: decibels (dB) | >100 dB (good <50 dB)
Lighting | Ambient lighting: lux, or legibility of small 10 pt text | 10 pt text not legible at 20 cm
Comfort | Ambient temperature | Temperature <15°C or >35°C
War/peace | War or peace status on 1 to 4 scale, peacetime to war | War emergency
Sea state | Sea state and hence ship roll and pitch, measured on Beaufort scale 1 to 9 | Beaufort force >8
Visibility | Visibility from vessel in nautical miles | <1 nautical mile
Workload | Agent's workload | >3 concurrent tasks
Duty time | Agent's time on duty and at sea | >3 months continuously at sea
Fatigue | Time on watch, weighted by war/peace | >7 hours on duty at high alert
Time constraints | Time available to complete a task | Response necessary <1 min
Incentives | Incentives: measured by job satisfaction questionnaire | No incentives to improve; rating <2 on 1 to 7 (best) scale
Management culture | Management culture: job satisfaction questionnaire | No leadership, little motivation or responsibility; rating <2 on 1 to 7 (best) scale
Functionality | Support for user's task: equipment satisfaction questionnaire or expert assessment of technical specification | Rating of useful features <2 on 1 to 7 scale, where 7 is excellent
Performance | Expert assessment of technical performance | e.g. threat detection/destroy probabilities fail to meet minimum requirements
Reliability | Reliability history: mean time between failures | MTBF >1 in 10 hours operation
Usability | Usability measured by questionnaire rating or usability testing | >5 errors committed by 95% of users following test task
Distraction | Distractions to normal task operation | >5 interruptions/min
Internal motivation | Agent's internal motivation, assessed by questionnaire or task performance test | Rating <2 on 1 to 7 motivation questionnaire, where 7 is excellent
Cognitive complexity | Cognitive complexity of the task: NASA TLX | Cognitive complexity measure >10 on TLX scale
Physical complexity | Physical complexity, measured by number of manipulations, precision, and difficulty; expert assessment; or operational time | Physical complexity in upper 10% of distribution of tasks assessed
Inherited ability | Agent's inherited ability: IQ test or aptitude questionnaire | Agent's score <25% or in lowest 10% of test score distribution
Task knowledge | Agent's task knowledge: quiz score or performance test | Agent's score <25% or in lowest 10% of test score distribution
Domain knowledge | Agent's domain knowledge: quiz score | Agent's score <25% or in lowest 10% of test score distribution

Appendix B: Alternative designs for the aircraft carrier's aircraft weapons loading system
(Design 1: Manual; Design 2: Increased Automation)

Task | Design 1 Agent | Design 1 Technology | Design 2 Agent | Design 2 Technology
Plan weapons load for mission | APO | Weapons aircraft availability display | APO | Automated weapons aircraft allocation system
Schedule weapons load sequence | DAPO | Flight deck display | APO | Aircraft weapons load scheduler
Communicate load plan to flight deck and magazine | DAPO | Radio | DAPO | Data link
Plan weapons retrieval | MWS | Weapons layout chart | MWS | Weapons layout chart
Retrieve weapons from magazine | MA | Weapons retrieval trolley | MWS | Weapons retrieval robot
Load weapons onto transporter | MA | Weapons trolley | MA | Weapons autoload palette
Place transporter on hoist | MA | Weapons trolley | MA | Weapons autoload palette
Operate hoist | MA | Hoist | MA | Autoload hoist
Transfer weapons to aircraft | WA | Weapons trolley | WA | Weapons autoload palette
Check aircraft is grounded | WTS | Ground cable indicator | WTS | Ground cable indicator
Check safety pins | WTS | Safety pins | WTS | Safety pins
Check armament is set to off | WTS | Armament indicator | WTS | Armament indicator
Check power is off | WTS | Power indicator | WTS | Power indicator
Position weapons loading equipment under aircraft wing | WA | Weapons loading trolley | WA | Weapons autoload palette
Orient weapons loading equipment | WA | Weapons loading trolley | WA | Weapons autoload palette
Lift and position weapons on wing | WA | Weapons loading trolley | WA | Weapons autoload palette
Test weapons connection | WTS | Aircraft weapon mounts | WTS | Weapons autoload palette
Remove weapons loading equipment | WA | Weapons loading trolley | WA | Weapons autoload palette
Inspect weapons | WLC | Weapon racks and mounts | WLC | Weapon racks and mounts
Arm weapons | WTS | Weapon controls | WTS | Weapon controls
Report load completion | WTS | Radio | WTS | Data link

Andreas Gregoriades holds a PhD and MPhil in Computer Science from UMIST
(University of Manchester Institute of Science and Technology). Currently he is
employed as a Research Fellow at the Surrey Defence Technology Centre (DTC). His
research interests cover Artificial Intelligence for smart Decision Support, Systems
Engineering, Human Reliability Assessment and Software Engineering. He has been
involved in a number of EPSRC and European R&D projects in the areas of Complex
Socio-technical Systems Design, Business Process Modelling and Simulation,
Requirements Engineering and Systems Reliability Assessment. He has also acted as
a reviewer for IEEE Transactions on Knowledge and Data Engineering and for
various International Conferences and Workshops.

Alistair Sutcliffe is Professor of Systems Engineering in the School of Informatics,
University of Manchester. He has been principal investigator on numerous EPSRC
and European Union projects on requirements engineering, multimedia user
interfaces, safety critical systems and cognitive modelling for information retrieval.
He researches in Human-Computer Interaction and Software Engineering. In HCI his
particular interests are interaction theory and user interface design methods for web
sites, multimedia, virtual reality, safety critical systems, and the design of complex
socio-technical systems. In software engineering he specialises in requirements
engineering methods and tools, scenario-based design, knowledge reuse and theories
of domain knowledge. He is on the editorial boards of ACM TOCHI, REJ and JASE,
and is the editor of ISO standard 14915 part 3, on multimedia user interface design.
He has over 200 publications, including five books and several edited volumes of
papers, and was awarded the IFIP silver core in 2000.
