
Can Neural Networks be easily Interpreted in Software Cost Estimation?

Ali Idri and Taghi M. Khoshgoftaar
Empirical Software Engineering Laboratory
Department of Computer Science and Engineering
Florida Atlantic University
Boca Raton, FL 33431
E-mail: {aidri, taghi}@cse.fau.edu

Alain Abran
Software Engineering Management Laboratory
Ecole de Technologie Supérieure-ETS
1100, Notre-Dame Ouest,
Montréal, Canada, H3C 1K3
E-mail: abran.alain@uqam.ca

Abstract

Software development effort estimation with the aid of neural networks has generally been viewed with skepticism by a majority of the software cost estimation community. Although neural networks have shown their strengths in solving complex problems, their shortcoming of being 'black box' models has prevented them from being accepted as a common practice for cost estimation. In this paper, we study the interpretation of cost estimation models based on a Backpropagation three-layer Perceptron network. Our proposed idea consists mainly of the use of a method that maps this neural network to a fuzzy rule-based system. Consequently, if the obtained fuzzy rules are easily interpreted, the neural network will also be easy to interpret. Our case study is based on the COCOMO'81 dataset.

1. INTRODUCTION

Estimation models in software engineering are used to predict some important attributes of future entities such as software development effort, software reliability, and productivity of programmers. Among these models, those estimating software effort have motivated considerable research in recent years. The prediction procedure used by these software effort models can be based on a mathematical function, such as Effort = α × size^β, or on other techniques such as artificial neural networks, analogy-based reasoning, regression trees, and rule-based induction models. In this paper, we are concerned with cost estimation models that are based on artificial neural networks. The artificial neural networks approach is inspired by biological nerve nets. An artificial neural network is characterized by its architecture, its learning algorithm and its activation functions. In general, for software cost estimation modeling, the most commonly adopted architecture, learning algorithm and activation function are, respectively, the feed-forward multi-layer Perceptron, the Backpropagation algorithm and the Sigmoid function. Many researchers have applied the neural networks approach to estimate software development effort [5,13,16,17,18,19,20,21]. Most of their investigations have focused on the accuracy of the approach when compared to other cost estimation techniques. Table 1 shows a summary of a few software effort prediction studies that used artificial neural networks.

There are two main advantages when using estimation by artificial neural networks. First, it allows learning from previous situations and outcomes. This learning criterion is very important for cost estimation models because software development technology is continuously evolving. Second, it can model a complex set of relationships between the dependent variable (such as cost or effort) and the independent variables (or cost drivers). However, there are some shortcomings that prevent it from being accepted as a common practice in cost estimation modeling:

- The neural networks approach may be considered as a 'black box'. Consequently, it is not easy to understand and to explain its process to the users.
- The ability of neural networks to solve problems of high complexity has been proven in classification and categorization areas, whereas in the cost estimation field we deal with a generalization rather than a classification problem.
- There are no guidelines for the construction of the neural network topologies (number of layers, number of units per layer, initial weights, etc.).

In this work, we deal with the first limitation of neural networks. Many researchers in different fields are hesitant to use neural nets because of their shortcoming of being 'black boxes', that is, determining why an artificial neural network makes a particular decision is a difficult task. This is a significant weakness, for without the ability to produce comprehensible decisions, it is hard to trust the reliability of neural networks that address real-world problems. Consequently, we are convinced that the explanation and interpretation of the knowledge stored in the architecture and the synapse weights of the neural nets are very important to gain the acceptance of practitioners. Our proposed idea consists mainly of the use of a method that maps the neural network to a fuzzy rule-based system (FRBS). Consequently, if the obtained if-then fuzzy rules are easily interpreted, then the neural network may also be easily interpreted. Our case study is based on the COCOMO'81 historical dataset.

This paper is organized as follows: In Section 2, we present how the neural networks approach has been applied to software cost estimation. We also present the architecture of the network that will be used in our case study. In Section 3, we discuss the results obtained when using the network to

0-7803-7280-8/02/$10.00 ©2002 IEEE 1162


estimate the software development effort. In Section 4, we briefly outline the principle of Benitez's method, which will be used to extract the if-then fuzzy rules from our network. In Section 5, we apply Benitez's method to our network and we discuss the interpretation of the obtained fuzzy rules in software cost estimation. A conclusion and an overview of future work conclude this paper.

2. ARTIFICIAL NEURAL NETWORKS FOR SOFTWARE COST ESTIMATION

Many different models of neural networks have been proposed [13]. They may be grouped in two major categories. First, feed-forward networks, in which no loops in the network path occur. Second, feedback networks that have recursive loops. The feed-forward multi-layer Perceptron with the Backpropagation learning algorithm is the most commonly used in the cost estimation field. In these nets, neurons are arranged in layers and there are only connections from neurons in one layer to the following one. Figure 1 illustrates a possible network architecture configured for software development effort estimation. The network generates output (effort) by propagating the initial inputs (cost drivers or project attributes) through subsequent layers of processing elements to the final output layer. Each neuron in the network computes a nonlinear function of its inputs and passes the resultant value along its output. The favored activation function is the Sigmoid function:

$$f(x) = \frac{1}{1 + e^{-x}} \qquad (1)$$

Before the network is ready to make estimates for new software projects, it is trained by a set of combinations of inputs and outputs that are known as the training data. Our case study consists of estimating the software development effort by using the neural networks approach on the COCOMO'81 dataset. The COCOMO'81 dataset contains 63 software projects [2,3,4]. Each project is described by 17 attributes: the software size measured in KDSI (Kilo Delivered Source Instructions), the project mode defined as either 'organic', 'semi-detached' or 'embedded', and the remaining 15 attributes, which are measured on a scale composed of six linguistic values: 'very low', 'low', 'nominal', 'high', 'very high', and 'extra-high'. Among these 17 attributes, we have retained the KDSI and 12 other attributes that we had already represented by fuzzy sets [6]. The other attributes are not used in our case study because their description proved insufficient for fuzzification. The fuzzy sets associated to each selected attribute will be used in the interpretation of the if-then fuzzy rules deduced from our neural network.

[Figure 1 diagram: input layer (KDSI, RELY, ACAP, AEXP, ..., SCED), hidden layer, output layer (Effort)]
Figure 1: A neural network architecture for software development effort

As mentioned earlier, the use of a neural network approach to estimate software development effort requires certain decisions and choices about the architecture, the learning algorithm and the activation functions. In our case, the method that we use to generate the if-then fuzzy rules from the neural network requires that the architecture be a three-layer Perceptron in which the activation functions of the hidden layer and the output layer are the sigmoid and the identity functions, respectively. Our neural network has 13 inputs (COCOMO cost drivers) and one output (development effort). All the inputs as well as the output of the network are numeric. The inputs are normalized to speed up the training process of the network [14]. The network is trained by iterating through the training data many times. The learning algorithm used is Backpropagation with a learning rate equal to 0.03 and a maximum error equal to a power of ten whose exponent is illegible in the source.

| Study           | Learning algorithm | Dataset    | Number of projects | Predicting         | Results    |
| Venkatachalam   | Backpropagation    | COCOMO     | 63                 | Development effort | Promising  |
| Wittig & Finnie | Backpropagation    | Desharnais | 81                 | Effort             | MMRE = 17% |

TABLE 1: SUMMARY OF NEURAL NETWORK STUDIES [17]
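
The three-layer Perceptron described in this section (sigmoid hidden units, identity output unit) can be sketched in a few lines. This is only a minimal illustration, not the authors' C prototype: the function names, the random weights and the made-up project vector are assumptions introduced here, and no training is performed.

```python
import math
import random


def sigmoid(x):
    """Sigmoid activation of Equation 1: f(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_effort(inputs, hidden_weights, output_weights):
    """Forward pass of a three-layer Perceptron with an identity output unit.

    hidden_weights[j][i] is the weight from input i to hidden unit j;
    output_weights[j] is the weight from hidden unit j to the output.
    """
    hidden = [
        sigmoid(sum(w * x for w, x in zip(row, inputs)))
        for row in hidden_weights
    ]
    return sum(b * z for b, z in zip(output_weights, hidden))


# Hypothetical setup mirroring the case study sizes: 13 inputs, 13 hidden units.
rng = random.Random(0)
n_inputs, n_hidden = 13, 13
# Small random initial weights, as Backpropagation assumes before training.
hidden_weights = [
    [rng.uniform(-0.1, 0.1) for _ in range(n_inputs)] for _ in range(n_hidden)
]
output_weights = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
project = [rng.random() for _ in range(n_inputs)]  # made-up normalized cost drivers

effort = predict_effort(project, hidden_weights, output_weights)
print(effort)
```

With untrained weights the printed value has no meaning as an effort estimate; the sketch only shows how the inputs flow through the hidden layer to the single output.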



Finally, Backpropagation assumes that weights in the network are initialized to small, random values prior to training. According to the architecture of our network, the effort (output) is given by:

$$Effort = \sum_{j=1}^{h} z_j \beta_j \quad \text{with} \quad z_j = f\left(\sum_{i=1}^{n} w_{ij} x_i\right)$$

where $f$ is the sigmoid function, $w_{ij}$ are the weights of the connections from the input layer to the hidden layer, and $\beta_j$ are those from the hidden layer to the output layer.

3. OVERVIEW OF THE EMPIRICAL RESULTS

This section presents and discusses the results obtained when applying our neural network to the COCOMO'81 dataset. The calculations were made using a software prototype developed with the C language under a PC Microsoft Windows environment. The accuracy of the estimates is evaluated by using the magnitude of relative error, MRE, defined as:

$$MRE = \frac{|Effort_{actual} - Effort_{estimated}|}{Effort_{actual}}$$

The MRE is calculated for each project in the dataset. In addition, we use the prediction level measure Pred. This measure is often used in the literature. It is defined by:

$$Pred(p) = \frac{k}{N}$$

where N is the total number of observations and k is the number of observations with an MRE less than or equal to p. A common value for p is 0.25. Four other quantities are used in this evaluation: min of MRE, max of MRE, median of MRE, and mean of MRE (MMRE).

We have conducted several experiments to choose the number of hidden units. These experiments use the full COCOMO'81 dataset for training and testing the network. An acceptable accuracy is obtained with 13 hidden units and 300,000 learning iterations (Table 2).

| MRE (%) | NN with 13 hidden units and 300,000 learning iterations |
| Max     | 16.67                                                   |
| Mean    | 1.50                                                    |
| Min     | 0.00                                                    |

TABLE 2: ESTIMATES ACCURACY OF A NETWORK WITH 13 HIDDEN UNITS

As noted earlier, in this work we are not concerned with the accuracy of our network. The aim is to explain its process by mapping the network to a fuzzy rule-based system. The method developed by Benitez et al. will be used to extract the if-then fuzzy rules from our network [1]. In the next section, we present this method.

4. EQUIVALENCE BETWEEN NEURAL NETWORKS AND FUZZY RULES-BASED SYSTEMS: BACKGROUND

Since its foundation by Zadeh in 1965 [22], fuzzy logic has been the subject of many investigations. One of its main contributions to solving complex problems is undoubtedly the Fuzzy Rule-Based System. Basically, an FRBS is based on a set of if-then fuzzy rules. A fuzzy rule is an if-then statement where the premise and the consequence consist of fuzzy propositions, whereas in a classical production rule the premise and the consequence are crisp. An example of a fuzzy rule in cost estimation may be 'if the competence of the analysts is high then the effort is low'. The main advantage of fuzzy rules over classical rules is that they are more understandable for humans and may be easily interpreted. Indeed, fuzzy rules use, unlike classical rules, linguistic values instead of numerical data in their premises and consequences. As a result, some researchers have investigated the equivalence between neural networks and FRBSs [1,12]. These investigations aim to translate the knowledge embedded in the neural network into a more understandable language, namely if-then fuzzy rules. Benitez et al. have developed a method that proves the equality between a neural network, such as the one used in the previous section, and a fuzzy rule-based system which uses Sugeno rules [1]. We present this method below.

Let us consider a three-layer Perceptron neural network with the sigmoid function for the hidden units and the identity function for the output unit. This three-layer Perceptron neural network is equal to a fuzzy rule-based system that uses a set of fuzzy rules, $R_{jk}$, associated to all pairs of its units (hidden, output):

$$R_{jk}: \text{If } \sum_{i=1}^{n} x_i w_{ij} \text{ is } A \text{ then } y_k = \beta_{jk}$$

where $x_i$ are the inputs, $y_k$ is the output, $w_{ij}$ are the weights from the input units to the hidden units, $\beta_{jk}$ are the weights from the hidden units to the output unit, and $A$ is a fuzzy set whose membership function is the sigmoid function of the hidden units. In order to make the fuzzy rules $R_{jk}$ easily interpreted, Benitez et al. have shown that each of them can be given by:

$$R_{jk}: \text{If } x_1 \text{ is } A^{1}_{jk} * x_2 \text{ is } A^{2}_{jk} * \dots * x_n \text{ is } A^{n}_{jk} \text{ then } y_k = \beta_{jk}$$

where $A^{i}_{jk}$ are fuzzy sets obtained from $A$ and $w_{ij}$. Their membership functions are given by:

$$\mu_{A^{i}_{jk}}(x) = \mu_A(x w_{ij})$$

and * is the i-or operator defined by:



$$i\text{-}or(a_1, a_2, \dots, a_n) = \frac{a_1 a_2 \cdots a_n}{(1 - a_1)(1 - a_2) \cdots (1 - a_n) + a_1 a_2 \cdots a_n}$$

The i-or operator is a hybrid between a T-norm and a T-conorm. The fuzzy proposition '$x$ is $A^{i}_{jk}$' may be interpreted as 'x is approximately greater than $2.2/w_{ij}$' if $w_{ij}$ is positive, or as 'x is approximately not greater than $2.2/(-w_{ij})$' if $w_{ij}$ is negative.

5. VALIDATION AND INTERPRETATION OF THE FUZZY RULES

In this section, we apply the method developed by Benitez et al. to the neural network presented and discussed in Sections 2 and 3. The neural network that we consider has 13 hidden units and uses the full COCOMO'81 dataset for training. Consequently, the rule base obtained is composed of 13 fuzzy rules. Each fuzzy rule contains 13 fuzzy propositions. Each one is associated to one cost driver (an input of the network). The output of each fuzzy rule is a numerical value (positive or negative). These 13 fuzzy rules express the knowledge encoded into the synaptic weights of our network. The objective is to give a more comprehensible interpretation of these fuzzy rules in software cost estimation by using our previous experiments [6,7,8,9,10,11]. For simplicity, we discuss only two fuzzy rules (Table 3).

By analyzing these two rules, we notice that the output of the first rule, R1, is positive (6049.77) whereas that of the second rule, R2, is negative (-2979.21). These two values are among the synapse weights from the hidden layer to the output unit. Consequently, the natural interpretation that we can give to the outputs of the 13 fuzzy rules is that they can be considered as partial contributions to the total effort. They have the same role as the effort multipliers in the COCOMO'81 model. They can increase (positive value) or decrease (negative value) the total development effort. The only difference between them is that the outputs of the fuzzy rules are positive or negative. This is because the cost function of the FRBS uses the summation operator, whereas the COCOMO'81 cost function uses the multiplication operator, so its effort multipliers are greater than 1 (increase the effort) or lower than 1 (decrease the effort).

The premise of each fuzzy rule is composed of 13 fuzzy propositions combined by the i-or operator. Each proposition is expressed by a statement such as 'x is approximately greater than v' or 'x is approximately not greater than v'. The linguistic qualification 'approximately greater than v' is represented by a fuzzy set whose membership function is that of Equation 1.
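
The equivalence that underlies these rules can be checked numerically: the sigmoid of a weighted sum equals the i-or of the sigmoids of the individual terms, so summing the rule outputs weighted by their firing strengths reproduces the network output. The sketch below is an illustration under assumptions: the inputs and input-to-hidden weights are made up, only two hidden units are used, and the two hidden-to-output weights are borrowed from the rule outputs quoted above purely as sample numbers.

```python
import math


def sigmoid(x):
    """Sigmoid of Equation 1."""
    return 1.0 / (1.0 + math.exp(-x))


def i_or(values):
    """i-or operator: prod(a) / (prod(1 - a) + prod(a))."""
    prod_a = math.prod(values)
    prod_not_a = math.prod(1.0 - a for a in values)
    return prod_a / (prod_not_a + prod_a)


def network_output(x, w, beta):
    """Perceptron view: sum over hidden units of beta_j * f(sum_i w_ij * x_i)."""
    return sum(
        b * sigmoid(sum(wi * xi for wi, xi in zip(row, x)))
        for row, b in zip(w, beta)
    )


def rule_base_output(x, w, beta):
    """Fuzzy view: each rule fires with strength i-or(f(x_i * w_ij)) and adds beta_j."""
    total = 0.0
    for row, b in zip(w, beta):
        truth_values = [sigmoid(xi * wi) for wi, xi in zip(row, x)]
        total += b * i_or(truth_values)
    return total


# Hypothetical inputs and weights (not taken from the trained network).
x = [0.4, 1.5, 0.2]
w = [[0.8, -0.3, 1.1], [-0.5, 0.9, 0.2]]
beta = [6049.77, -2979.21]  # sample values reused from the two rule outputs

print(network_output(x, w, beta))
print(rule_base_output(x, w, beta))
```

The two printed values agree up to floating-point rounding, which is the sense in which the rule base and the network are the same function.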

[Table 3 body: only fragments of the rule premises survive; the threshold values and rule outputs are missing]
DATA is approximately greater than ...
VIRT min is approximately not greater than ...
TIME is approximately not greater than ...
STORE is approximately not greater than ...
VIRT maj is approximately not greater than ...
TURN is approximately not greater than ...

TABLE 3: TWO EXAMPLES OF THE OBTAINED FUZZY RULES

Figure 2: (a) Fuzzy set associated to the qualification 'approximately greater than 117.57'. (b) Fuzzy set associated to the qualification 'approximately not greater than 30.14'.
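
The threshold used in a qualification such as 'approximately greater than v' follows directly from the sigmoid membership function: a degree of 0.9 is reached when x times the weight equals ln(9), roughly 2.2. The sketch below illustrates this under assumptions: the helper names are hypothetical, and the two weights are made-up values chosen only so that the thresholds land near the orders of magnitude shown in Figure 2; the actual weights are not reported in the paper.

```python
import math


def sigmoid(x):
    """Sigmoid of Equation 1."""
    return 1.0 / (1.0 + math.exp(-x))


def membership(x, weight):
    """Membership degree of x in the fuzzy set induced by one weight: f(x * w)."""
    return sigmoid(x * weight)


def threshold(weight, degree=0.9):
    """Value v such that the membership at v is `degree` for a positive weight,
    or 1 - `degree` for a negative weight. With degree = 0.9 this is about 2.2 / |w|.
    """
    return math.log(degree / (1.0 - degree)) / abs(weight)


# Hypothetical weights, chosen for illustration only.
w_positive = 0.0187   # 'approximately greater than v'
w_negative = -0.073   # 'approximately not greater than v'

v_pos = threshold(w_positive)
v_neg = threshold(w_negative)
print(v_pos, membership(v_pos, w_positive))  # membership is about 0.9 at v_pos
print(v_neg, membership(v_neg, w_negative))  # membership is about 0.1 at v_neg
```

A small positive weight therefore yields a large threshold, which is how thresholds far outside a cost driver's natural range can arise.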



The value v is the one for which the membership function has a degree equal to 0.9 in the case of 'approximately greater than v', or equal to 0.1 in the case of 'approximately not greater than v'. According to Benitez et al., the values 0.1 and 0.9 are used in the neural literature to indicate the total absence of activation and full activation of the neurons, respectively. In our case, the numerical values of the 13 inputs of the network are positive. Thus, we only use the positive part of the domain of the fuzzy sets corresponding to the qualification 'approximately (not) greater than v'. Figure 2 shows two examples that illustrate the fuzzy sets used in the first two propositions of the rule R1.

The interpretation of each fuzzy proposition depends on the meaning of the cost driver used by this proposition. For instance, the fuzzy proposition 'DATA is approximately greater than 117.57' uses the DATA cost driver (Figure 3). The DATA attribute, with three other cost drivers, represents the effect of the size and the complexity of the database on the software development effort. Thus, the higher the value of DATA, the more it will increase the total effort. By using our fuzzification of the DATA attribute, we notice that the value 117.57 belongs to the linguistic value 'high' [6]. Consequently, the fuzzy proposition 'DATA is approximately greater than 117.57' may be considered as equivalent to the proposition 'DATA is high or very high'. The latter proposition is easily understood by the cost estimation community.

However, in many of the fuzzy rules obtained, the value v used in the proposition 'x is approximately (not) greater than v' is out of the range of values allowed for the cost driver. For example, in the first fuzzy rule R1, the proposition 'LEXP is approximately not greater than 921.91' uses a value (921.91) which is not in the interval of all possible values of the LEXP attribute. Indeed, LEXP represents the programming language experience of the programmers and is measured by the number of months of experience. In the COCOMO'81 model, the highest allowed value for LEXP is equal to 36 months. By using our fuzzification of the LEXP attribute, the proposition 'LEXP is approximately not greater than 921.91' of the rule R1 may be considered as equivalent to 'LEXP is Q(very low)', where Q is a linguistic modifier such as 'more'. In this example, we have ignored the parts of the fuzzy set representing 'approximately not greater than 921.91' which overlap with the linguistic values 'low', 'nominal' and 'high' (Figure 4).

The same situation can arise for the linguistic qualification 'approximately greater than'. For example, the proposition 'LEXP is approximately greater than 985.36' of the rule R2 may be considered as equivalent to the proposition 'LEXP is Q(high)', where Q is a linguistic modifier such as 'very' or 'extra'.

Until now, we have suggested a natural interpretation of the premises and the outputs of the obtained fuzzy rules. However, the meaning of the i-or operator, which is used to combine the fuzzy propositions of the premises, still remains to be explained. According to Benitez et al., the i-or operator has a natural interpretation and may be used in the evaluation of many real-world situations, such as the evaluation of scientific papers and the evaluation of the quality of a game played by two tennis players. By analyzing all the 13 fuzzy rules obtained, it seems that the i-or operator is not appropriate to evaluate the effect of the premises on the development effort. Indeed, the effect of one premise on the effort is calculated by considering all those associated to the 13 cost drivers. In cost estimation, the effect of one cost driver on the effort depends on its type (monotonically increasing or decreasing) and its importance. Moreover, according to its definition and its properties, the i-or operator cannot adequately model the complex set of relationships that exist between the cost drivers. There are other reasons that prevent the i-or operator from being easily interpreted:

- It is neither a t-norm nor a t-conorm operator. Also, there is no linguistic quantifier that can express its meaning.
- Let us consider that the truth-value of one fuzzy proposition is close to 1 and that, for all the other fuzzy propositions, the truth-values are in the vicinity of 0. In such a situation, when combining the various truth-values by the i-or operator, the firing strength of the rule will be close to 1. This contradicts our intuition, especially if the fuzzy proposition that has the truth-value close to 1 contains the least important cost driver.
- The i-or operator has the value 0.5 as a neutral element, whereas if the truth-value of one fuzzy proposition is equal to 0.5, the proposition should contribute to the evaluation of the firing strength of the premise.
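
Two of the properties listed above can be reproduced with a few lines of code. This is a hedged sketch with made-up truth-values, not data from the case study; in particular, the dominating truth-value is deliberately taken extremely close to 1, which is the regime in which a single proposition outweighs twelve low ones.

```python
import math


def i_or(values):
    """i-or operator: prod(a) / (prod(1 - a) + prod(a))."""
    prod_a = math.prod(values)
    prod_not_a = math.prod(1.0 - a for a in values)
    return prod_a / (prod_not_a + prod_a)


# Neutral element: a proposition with truth-value 0.5 does not change the result.
with_neutral = i_or([0.5, 0.7])
without_neutral = i_or([0.7])

# One proposition almost fully true, twelve others in the vicinity of 0
# (hypothetical values standing in for the 13 cost-driver propositions).
dominated = i_or([0.9999999999] + [0.2] * 12)

print(with_neutral, without_neutral)
print(dominated)
```

In this setting the firing strength stays above 0.99 even though twelve of the thirteen propositions are mostly false, which illustrates why the combined premise is hard to read in cost estimation terms.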

[Figure 3 plot: horizontal axis D/P with breakpoints 5, 10, 55, 100, 550, 1000]
Figure 3: Membership functions of fuzzy sets defined for the DATA cost driver [6]



[Figure 4 plot: fuzzy sets 'Nominal', 'High' and 'Approximately not greater than 921.91'; horizontal axis with breakpoints 0.5, 1, 2, 4, 8, 12, 36 and 921.91]
Figure 4: Example where the value used for the LEXP attribute is out of the range of allowed values

6. CONCLUSION AND FUTURE WORK

In this paper, we have studied one of the most important limitations of neural networks, namely that understanding why a neural network makes a particular decision is a very difficult task. Our study is intended for the cost estimation field. The neural network that we have used to predict the software development effort is the Backpropagation three-layer Perceptron with the sigmoid function in the hidden units and the identity function in the output unit. We have used the entire COCOMO'81 dataset to train and to test the network. The obtained accuracy of the network is acceptable.

After training and testing the network, we have applied Benitez's method to extract the if-then fuzzy rules from this network. These fuzzy rules express the information encoded in the architecture of the network. The interpretation of each fuzzy rule is determined by analyzing its premise and its output. Our case study shows that we can explain the meaning of the output and of the propositions composing the premise of each fuzzy rule. However, the i-or operator seems to be inappropriate to combine the effects of the various fuzzy propositions on the output of the fuzzy rule in software cost estimation. Consequently, we are currently exploring the use of other methods in order to extract more understandable fuzzy rules from the type of neural network used in this case study.

7. BIBLIOGRAPHY

[1] J. M. Benitez, J. L. Castro, I. Requena, "Are Artificial Neural Networks Black Boxes?", IEEE Transactions on Neural Networks, Vol. 8, No. 5, September, 1997, pp. 1156-1164
[2] B. W. Boehm, Software Engineering Economics, Prentice-Hall, 1981
[3] B. W. Boehm, et al., "Cost Models for Future Software Life Cycle Processes: COCOMO 2.0", Annals of Software Engineering on Software Process and Product Measurement, Amsterdam, 1995
[4] D. S. Chulani, "Incorporating Bayesian Analysis to Improve the Accuracy of COCOMO II and Its Quality Model Extension", Ph.D. Qualifying Exam Report, USC, February, 1998
[5] R. T. Hughes, "An Evaluation of Machine Learning Techniques for Software Effort Estimation", University of Brighton, 1996
[6] A. Idri, L. Kjiri, and A. Abran, "COCOMO Cost Model Using Fuzzy Logic", 7th International Conference on Fuzzy Theory & Technology, Atlantic City, NJ, February, 2000, pp. 219-223
[7] A. Idri, and A. Abran, "Towards A Fuzzy Logic Based Measures For Software Project Similarity", Sixth Maghrebian Conference on Computer Sciences, Fes, Morocco, November, 2000, pp. 9-18
[8] A. Idri, and A. Abran, "A Fuzzy Logic Based Measures For Software Project Similarity: Validation and Possible Improvements", 7th International Symposium on Software Metrics, IEEE Computer Society, 4-6 April, England, 2001, pp. 85-96
[9] A. Idri, and A. Abran, "Evaluating Software Project Similarity by using Linguistic Quantifier Guided Aggregations", 9th IFSA World Congress/20th NAFIPS International Conference, 25-28 July, Vancouver, 2001, pp. 416-421
[10] A. Idri, A. Abran, T. M. Khoshgoftaar, "Fuzzy Analogy: A new Approach for Software Cost Estimation", 11th International Workshop in Software Measurements, 28-29 August, Montreal, 2001, pp. 93-101
[11] A. Idri, T. M. Khoshgoftaar, A. Abran, "Estimating Software Project Effort by Analogy based on Linguistic values", to be presented in the 8th IEEE International Software Metrics Symposium, 4-7 June, Ottawa, Canada, 2002
[12] J. S. Jang, C. T. Sun, "Functional equivalence between radial basis function networks and fuzzy inference systems", IEEE Transactions on Neural Networks, Vol. 4, 1992, pp. 156-158
[13] M. Jorgensen, "Experience with Accuracy of Software Maintenance Task Effort Prediction Models", IEEE Transactions on Software Engineering, Vol. 21(8), 1995, pp. 674-681
[14] A. Lapedes, R. Farber, "Nonlinear signal prediction using neural networks: Prediction and System modeling", Los Alamos National Laboratory, Tech. Report, LA-UR-87-2662, 1987
[15] R. P. Lippman, "An Introduction to computing with neural nets", IEEE ASSP Mag, vol. 4, pp. 4-22, 1987
[16] B. Samson, D. Ellison, P. Dugard, "Software Cost Estimation using an Albus Perceptron", 8th International COCOMO Estimation meeting, Pittsburgh, 1993
[17] C. Schofield, "Non-Algorithmic Effort Estimation Techniques", Tech. Report TR9801, March, 1998
[18] C. Serluca, "An Investigation into Software Effort Estimation using a Backpropagation Neural Network", MSc. Thesis, Bournemouth University, 1995
[19] K. Srinivasan, D. Fisher, "Machine Learning Approaches to Estimating Software Development Effort", IEEE Transactions on Software Engineering, Vol. 21, No. 2, February, 1995, pp. 126-136
[20] A. R. Venkatachalam, "Software Cost Estimation using Artificial Neural Networks", International Joint Conference on Neural Networks, Nagoya, IEEE, 1993
[21] G. Wittig, G. Finnie, "Estimating Software Development Effort with connectionist Models", Information and Software Technology, vol. 39, 1997, pp. 469-476
[22] L. A. Zadeh, "Fuzzy Set", Information and Control, Vol. 8, 1965, pp. 338-35

