
1. WHAT IS SOFTWARE EFFORT ESTIMATION?

1.1 INTRODUCTION
Software development effort estimation is the process of predicting the most realistic
amount of effort required to develop or maintain software, based on incomplete, uncertain
and/or noisy input. Effort estimates may be used as input to project plans, iteration plans,
budgets, investment analyses, pricing processes and bidding rounds.
Many people have referred to estimation as a “black art.” This makes some intuitive
sense: at first glance, it might seem that estimation is a highly subjective process. One person
might take a day to do a task that might only require a few hours of another’s time. As a
result, when several people are asked to estimate how long it might take to perform a task,
they will often give widely differing answers. But when the work is actually performed, it
takes a real amount of time; any estimate that did not come close to that actual time is
inaccurate. To someone who has never estimated a project in a structured way, estimation
seems little more than attempting to predict the future. This view is reinforced when off-
the-cuff estimates are inaccurate and projects come in late. But a good formal estimation
process, one that allows the project team to reach a consensus on the estimates, can
improve the accuracy of those estimates, making it much more likely that projects will
come in on time. A project manager can help the team to create successful estimates for
any software project by using sound techniques and understanding what makes estimates
more accurate.

Three main challenges have been identified in effort estimation that a team must tackle
to improve existing models for more accurate prediction of software effort and cost:
1. Estimation models must be able to deal with vague information. Indeed, most of the
software project attributes are measured on a scale composed of linguistic values such
as low and high.
2. Estimation models must appropriately handle the uncertainty in estimates.
3. Estimation models must learn from previous situations to account for the
continuously evolving software development and maintenance technology.
Software researchers and practitioners have been addressing the problems of effort
estimation for software development projects since at least the 1960s; see, e.g., work by
Farr [3] and Nelson [4]. Some fundamental issues with which practitioners deal are:
• Which software cost estimation model should be used?
• Which software size measure should be used: lines of code (LOC), function
points, or feature points?
• What is a good estimate?

Most of the research has focused on the construction of formal software effort estimation
models. The early models were typically based on regression analysis or mathematically
derived from theories in other domains. Since then, a large number of model-building
approaches have been evaluated, such as approaches founded on case-based reasoning,
classification and regression trees, simulation, neural networks, Bayesian statistics,
lexical analysis of requirement specifications, genetic programming, linear programming,
economic production models, soft computing, fuzzy logic modeling, statistical
bootstrapping, and combinations of two or more of these models. Perhaps the most
common estimation products today, e.g., the formal estimation models COCOMO and
SLIM, have their basis in estimation research conducted in the 1970s and 1980s. The
estimation approaches based on functionality-based size measures, e.g., function points,
are also rooted in research from the 1970s and 1980s, but re-appear with modified size
measures under different labels, such as “use case points” [5] in the 1990s and COSMIC
in the 2000s.
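
To make the parametric-model flavor concrete, here is a minimal Python sketch of the
Basic COCOMO formula, effort = a * KLOC^b. The (a, b) coefficient pairs are the
standard published Basic COCOMO values; the 32 KLOC project size is an assumed example
input.

    # Basic COCOMO: effort in person-months = a * (KLOC ** b),
    # where (a, b) depend on the project mode.
    COCOMO_MODES = {
        "organic": (2.4, 1.05),        # small teams, well-understood problems
        "semi-detached": (3.0, 1.12),  # intermediate size and complexity
        "embedded": (3.6, 1.20),       # tight hardware/software constraints
    }

    def basic_cocomo_effort(kloc: float, mode: str = "organic") -> float:
        """Return the estimated development effort in person-months."""
        a, b = COCOMO_MODES[mode]
        return a * kloc ** b

    # Assumed example: a 32 KLOC organic-mode project.
    print(f"{basic_cocomo_effort(32):.1f} person-months")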

1.2 ESTIMATION APPROACHES

There are many ways of categorizing estimation approaches; see, for example, [6][7].
The top-level categories are the following:

• Expert estimation: the quantification step, i.e., the step where the estimate is
produced, is based on judgmental processes.
• Formal estimation model: the quantification step is based on mechanical
processes, e.g., the use of a formula derived from historical data.
• Combination-based estimation: the quantification step is based on a judgmental
or mechanical combination of estimates from different sources.

Below are examples of estimation approaches within each category.

Estimation approach | Category | Examples of supporting implementations
Analogy-based estimation | Formal estimation model | ANGEL
WBS-based (bottom-up) estimation | Expert estimation | MS Project, company-specific activity templates
Parametric models | Formal estimation model | COCOMO, SLIM, SEER-SEM
Size-based estimation models [11] | Formal estimation model | Function Point Analysis [12], Use Case Analysis, Story points-based estimation in Agile software development
Group estimation | Expert estimation | Planning poker, Wideband Delphi
Mechanical combination | Combination-based estimation | Average of an analogy-based and a work-breakdown-structure-based effort estimate
Judgmental combination | Combination-based estimation | Expert judgment based on estimates from a parametric model and group estimation

1.3 SELECTION OF ESTIMATION APPROACH

The evidence on differences in estimation accuracy between different estimation
approaches and models suggests that there is no “best approach” and that the relative
accuracy of one approach or model compared to another depends strongly on the context
[8]. This implies that different organizations benefit from different estimation
approaches. Findings summarized in [9] that may support the selection of an estimation
approach, based on its expected accuracy, include:

• Expert estimation is on average at least as accurate as model-based effort
estimation. In particular, situations with unstable relationships, and with
information of high importance that is not included in the model, may suggest the
use of expert estimation. This assumes, of course, that experts with relevant
experience are available.

• Formal estimation models not tailored to a particular organization’s own context
may be very inaccurate. Use of the organization’s own historical data is
consequently crucial if one cannot be sure that the estimation model’s core
relationships (e.g., formula parameters) are based on similar project contexts.

• Formal estimation models may be particularly useful in situations where the
model is tailored to the organization’s context (either through use of its own
historical data or because the model is derived from similar projects and
contexts), and/or where it is likely that the experts’ estimates will be subject to a
strong degree of wishful thinking.

The most robust finding, in many forecasting domains, is that a combination of estimates
from independent sources, preferably applying different approaches, will on average
improve estimation accuracy [10][11][12].
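
As a minimal illustration of mechanical combination, the Python sketch below averages
effort estimates from independent sources; the three input figures are assumed example
values, not data from any study.

    # Mechanical combination: average the effort estimates (person-months)
    # produced by independent approaches.
    estimates = {"analogy": 14.0, "parametric": 18.5, "expert": 12.0}

    combined = sum(estimates.values()) / len(estimates)
    print(f"Combined estimate: {combined:.1f} person-months")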

In addition, other factors, such as the ease of understanding and communicating the
results of an approach, the ease of use of an approach, and the cost of introducing an
approach, should be considered in the selection process.

Our research deals with software effort estimation via machine learning. Our purpose is
to make the effort estimation process easier and more reliable. Machine learning is a
scientific discipline concerned with the design and development of algorithms that allow
computers to evolve behaviors based on empirical data, such as data from sensors or
databases. A learner can take advantage of examples (data) to capture characteristics of
interest of their unknown underlying probability distribution. Data can be seen as
examples that illustrate relations between observed variables. A major focus of machine
learning research is to automatically learn to recognize complex patterns and make
intelligent decisions based on data; the difficulty lies in the fact that the set of all possible
behaviors given all possible inputs is too large to be covered by the set of observed
examples (training data). Hence the learner must generalize from the given examples, so
as to be able to produce useful output in new cases. Artificial intelligence is a closely
related field, as are probability theory and statistics, data mining, pattern recognition,
adaptive control, computational neuroscience and theoretical computer science.

There are many machine learning approaches, such as:

• FGRA (Fuzzy Grey Relational Analysis)
• CBR (Case-Based Reasoning)
• Neural Nets
• ANN (Artificial Neural Networks)
• GA (Genetic Algorithm)

2. WHAT IS A GENETIC ALGORITHM?

In this paper we explain the GA approach, which will be used for an effective software
effort estimation process.

2.1 INTRODUCTION
Genetic Algorithms (GAs) are adaptive heuristic search algorithms premised on the
evolutionary ideas of natural selection and genetics. The basic concept of GAs is to
simulate the processes in natural systems necessary for evolution, specifically those that
follow the principle of survival of the fittest first laid down by Charles Darwin. As such,
they represent an intelligent exploitation of a random search within a defined search
space to solve a problem.

First pioneered by John Holland in the 1960s, genetic algorithms have been widely
studied, experimented with, and applied in many fields of engineering. Not only do GAs
provide alternative methods for solving problems, they often outperform traditional
methods. Many real-world problems involve finding optimal parameters, which may
prove difficult for traditional methods but ideal for GAs. However, because of their
outstanding performance in optimization, GAs have been wrongly regarded as mere
function optimizers. In fact, there are many ways to view genetic algorithms. Perhaps
most users come to GAs looking for a problem solver, but this is a restrictive view.

Herein, we will examine GAs as a number of different things:

• GAs as problem solvers
• GAs as a challenging technical puzzle
• GAs as a basis for competent machine learning
• GAs as a computational model of innovation and creativity
• GAs as a computational model of other innovating systems
• GAs as a guiding philosophy

However, due to various constraints, we will only be looking at GAs as problem solvers
and as a basis for competent machine learning here. We will also examine how GAs are
applied to completely different fields.

Many scientists have tried to create living programs. These programs do not merely
simulate life but try to exhibit the behaviours and characteristics of real organisms in an
attempt to exist as a form of life. Suggestions have been made that artificial life (a-life)
would eventually evolve into real life. Such suggestions may sound absurd at the moment,
but they are certainly not implausible if technology continues to progress at present rates.
It is therefore worth, in our opinion, taking a paragraph to discuss how a-life is connected
with GAs and to see whether such a prediction is far-fetched and groundless.

2.2 METHODOLOGY
In a genetic algorithm, a population of strings (called chromosomes or the genotype of
the genome), which encode candidate solutions (called individuals, creatures, or
phenotypes) to an optimization problem, evolves toward better solutions. Traditionally,
solutions are represented in binary as strings of 0s and 1s, but other encodings are also
possible. The evolution usually starts from a population of randomly generated
individuals and happens in generations. In each generation, the fitness of every individual
in the population is evaluated, multiple individuals are stochastically selected from the
current population (based on their fitness), and modified (recombined and possibly
randomly mutated) to form a new population. The new population is then used in the next
iteration of the algorithm. Commonly, the algorithm terminates when either a maximum
number of generations has been produced, or a satisfactory fitness level has been reached
for the population. If the algorithm has terminated due to a maximum number of
generations, a satisfactory solution may or may not have been reached.

Genetic algorithms find application in bioinformatics, phylogenetics, computational
science, engineering, economics, chemistry, manufacturing, mathematics, physics and
other fields.

A typical genetic algorithm requires:

1. a genetic representation of the solution domain, and
2. a fitness function to evaluate the solution domain.

A standard representation of the solution is as an array of bits. Arrays of other types and
structures can be used in essentially the same way. The main property that makes these
genetic representations convenient is that their parts are easily aligned due to their fixed
size, which facilitates simple crossover operations. Variable length representations may
also be used, but crossover implementation is more complex in this case. Tree-like
representations are explored in genetic programming and graph-form representations are
explored in evolutionary programming.

The fitness function is defined over the genetic representation and measures the quality of
the represented solution. The fitness function is always problem-dependent. For instance,
in the knapsack problem one wants to maximize the total value of objects that can be put
in a knapsack of some fixed capacity. A representation of a solution might be an array of
bits, where each bit represents a different object, and the value of the bit (0 or 1)
represents whether or not the object is in the knapsack. Not every such representation is
valid, as the size of objects may exceed the capacity of the knapsack. The fitness of the
solution is the sum of values of all objects in the knapsack if the representation is valid,
or 0 otherwise. In some problems, it is hard or even impossible to define the fitness
expression; in these cases, interactive genetic algorithms are used.
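
A minimal Python sketch of the knapsack fitness function just described; the item values,
weights and capacity are assumed example data.

    # Knapsack fitness: a chromosome is a list of bits; bit i says whether
    # item i is packed. Overweight (invalid) solutions score 0.
    VALUES = [60, 100, 120, 30]   # assumed example data
    WEIGHTS = [10, 20, 30, 5]
    CAPACITY = 50

    def fitness(chromosome):
        total_weight = sum(w for w, bit in zip(WEIGHTS, chromosome) if bit)
        if total_weight > CAPACITY:
            return 0  # size of packed objects exceeds the knapsack capacity
        return sum(v for v, bit in zip(VALUES, chromosome) if bit)

    print(fitness([1, 1, 0, 1]))  # packs items 0, 1 and 3 -> 190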

Once the genetic representation and the fitness function are defined, a GA proceeds to
initialize a population of solutions randomly, and then to improve it through repetitive
application of the mutation, crossover, inversion and selection operators.

2.2.1 Initialization
Initially many individual solutions are randomly generated to form an initial population.
The population size depends on the nature of the problem, but typically contains several
hundreds or thousands of possible solutions. Traditionally, the population is generated
randomly, covering the entire range of possible solutions (the search space).
Occasionally, the solutions may be "seeded" in areas where optimal solutions are likely to
be found.
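
A sketch of random initialization for the bit-string representation discussed earlier; the
population size and chromosome length are assumed parameters.

    import random

    def init_population(pop_size, n_bits):
        """Generate pop_size random bit-string chromosomes."""
        return [[random.randint(0, 1) for _ in range(n_bits)]
                for _ in range(pop_size)]

    population = init_population(pop_size=100, n_bits=4)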

2.2.2 Selection
During each successive generation, a proportion of the existing population is selected to
breed a new generation. Individual solutions are selected through a fitness-based process,
where fitter solutions (as measured by a fitness function) are typically more likely to be
selected. Certain selection methods rate the fitness of each solution and preferentially
select the best solutions. Other methods rate only a random sample of the population, as
this process may be very time-consuming.

Most selection functions are stochastic and designed so that a small proportion of less fit
solutions are also selected. This helps keep the diversity of the population large,
preventing premature convergence on poor solutions. Popular and well-studied selection
methods include roulette wheel selection and tournament selection.
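
Sketches of the two selection methods named above, under the convention that larger
fitness values are better; both are standard formulations, with the tournament size k as an
assumed parameter.

    import random

    def roulette_wheel_select(population, fitnesses):
        """Pick one individual with probability proportional to fitness."""
        total = sum(fitnesses)
        if total == 0:                  # degenerate case: every fitness is 0
            return random.choice(population)
        pick = random.uniform(0, total)
        running = 0.0
        for individual, fit in zip(population, fitnesses):
            running += fit
            if running >= pick:
                return individual
        return population[-1]           # guard against floating-point slack

    def tournament_select(population, fitnesses, k=3):
        """Pick the fittest of k randomly sampled individuals."""
        contenders = random.sample(range(len(population)), k)
        return population[max(contenders, key=lambda i: fitnesses[i])]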

2.2.3 Reproduction
The next step is to generate a second generation population of solutions from those
selected through genetic operators: crossover (also called recombination), and/or
mutation.

For each new solution to be produced, a pair of "parent" solutions is selected for breeding
from the pool selected previously. By producing a "child" solution using the above
methods of crossover and mutation, a new solution is created which typically shares
many of the characteristics of its "parents". New parents are selected for each new child,
and the process continues until a new population of solutions of appropriate size is
generated. Although reproduction methods that are based on the use of two parents are
more "biology inspired", some research [1], [2] suggests more than two "parents" are
better to be used to reproduce a good quality chromosome.

These processes ultimately result in a next-generation population of chromosomes that is
different from the initial generation. Generally, the average fitness of the population will
have increased by this procedure, since only the best organisms from the first generation
are selected for breeding, along with a small proportion of less fit solutions, for the
reasons already mentioned above.
Although crossover and mutation are known as the main genetic operators, it is possible
to use other operators such as regrouping, colonization-extinction, or migration in genetic
algorithms [2].
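
Sketches of the two main operators for bit-string chromosomes, single-point crossover and
bit-flip mutation; the mutation rate is an assumed parameter.

    import random

    def single_point_crossover(parent_a, parent_b):
        """Exchange the tails of two parents at a random cut point."""
        point = random.randint(1, len(parent_a) - 1)
        return (parent_a[:point] + parent_b[point:],
                parent_b[:point] + parent_a[point:])

    def bit_flip_mutation(chromosome, rate=0.01):
        """Flip each bit independently with probability `rate`."""
        return [1 - bit if random.random() < rate else bit
                for bit in chromosome]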

2.2.4 Termination
This generational process is repeated until a termination condition has been reached.
Common terminating conditions are (a combined check is sketched after the list):

• A solution is found that satisfies minimum criteria
• A fixed number of generations is reached
• The allocated budget (computation time/money) is reached
• The highest-ranking solution's fitness has reached a plateau, such that successive
iterations no longer produce better results
• Manual inspection
• Combinations of the above
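
A minimal sketch of such a combined check; the generation cap, target fitness and plateau
length are assumed parameters.

    def should_terminate(generation, best_history, max_generations=200,
                         target_fitness=None, plateau_length=20):
        """Return True when any common stopping condition holds;
        best_history is the best fitness seen in each generation so far."""
        if generation >= max_generations:
            return True
        if target_fitness is not None and best_history and \
                best_history[-1] >= target_fitness:
            return True
        # Plateau: best fitness unchanged for plateau_length generations.
        if len(best_history) >= plateau_length and \
                len(set(best_history[-plateau_length:])) == 1:
            return True
        return False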


2.2.5 Genetic Algorithm Flowchart

Fig 1. GA flowchart

2.2.6 Genetic Algorithm Pseudo Code

Simple generational genetic algorithm pseudo code (made runnable in the sketch that
follows):

1. Choose the initial population of individuals.
2. Evaluate the fitness of each individual in that population.
3. Repeat on this generation until termination (time limit, sufficient fitness
achieved, etc.):
   a. Select the best-fit individuals for reproduction.
   b. Breed new individuals through crossover and mutation operations to give
      birth to offspring.
   c. Evaluate the individual fitness of the new individuals.
   d. Replace the least-fit individuals of the population with the new individuals.
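
Putting the pseudo code into runnable form, the sketch below implements a simple
generational variant in which the whole population is replaced each generation. It reuses
fitness, init_population, tournament_select, single_point_crossover and
bit_flip_mutation from the earlier sketches; population size and generation count are
assumed parameters.

    def run_ga(pop_size=100, n_bits=4, generations=50):
        population = init_population(pop_size, n_bits)             # step 1
        for _ in range(generations):                               # step 3 (loop)
            fitnesses = [fitness(c) for c in population]           # steps 2 / 3c
            next_population = []
            while len(next_population) < pop_size:
                parent_a = tournament_select(population, fitnesses)    # step 3a
                parent_b = tournament_select(population, fitnesses)
                child_a, child_b = single_point_crossover(parent_a, parent_b)  # step 3b
                next_population.append(bit_flip_mutation(child_a))
                next_population.append(bit_flip_mutation(child_b))
            population = next_population[:pop_size]                # step 3d
        return max(population, key=fitness)

    best = run_ga()
    print(best, fitness(best))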

GAs were introduced as a computational analogy of adaptive systems. They are modelled
loosely on the principles of evolution via natural selection, employing a population of
individuals that undergo selection in the presence of variation-inducing operators such as
mutation and recombination (crossover). A fitness function is used to evaluate
individuals, and reproductive success varies with fitness.

The paradigm of GAs described above is the one usually applied to most of the problems
presented to GAs. Though it might not find the best solution, more often than not it will
come up with a partially optimal solution.

2.3 WHO CAN BENEFIT FROM GA?


Nearly everyone can benefit from genetic algorithms, once they can encode solutions of a
given problem as chromosomes in a GA and compare the relative performance (fitness)
of solutions. An effective GA representation and a meaningful fitness evaluation are the
keys to success in GA applications. The appeal of GAs comes from their simplicity and
elegance as robust search algorithms, as well as from their power to discover good
solutions rapidly for difficult high-dimensional problems. GAs are useful and efficient
when:

• The search space is large, complex or poorly understood.
• Domain knowledge is scarce, or expert knowledge is difficult to encode to narrow
the search space.
• No mathematical analysis is available.
• Traditional search methods fail.

The advantage of the GA approach is the ease with which it can handle arbitrary kinds of
constraints and objectives; all such things can be handled as weighted components of the
fitness function, making it easy to adapt the GA to the particular requirements of a very
wide range of possible overall objectives.
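
A minimal sketch of that idea: objectives and constraint violations folded into one
weighted fitness score. The weights and the penalty style are assumed, illustrative
choices, not a prescribed scheme.

    def weighted_fitness(solution, objective, violation,
                         w_objective=1.0, w_penalty=10.0):
        """Combine an objective score and a constraint-violation measure
        into a single fitness value; the weights are tuning knobs."""
        return (w_objective * objective(solution)
                - w_penalty * violation(solution))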

GAs have been used for problem solving and for modelling. GAs are applied to many
scientific and engineering problems, as well as in business and entertainment, including:

• Optimization: GAs have been used in a wide variety of optimization tasks,
including numerical optimization, and combinatorial optimization problems such
as the traveling salesman problem (TSP), circuit design, job-shop scheduling and
video and sound quality optimization.
• Automatic programming: GAs have been used to evolve computer programs for
specific tasks, and to design other computational structures, for example, cellular
automata and sorting networks.

• Machine and robot learning: GAs have been used for many machine learning
applications, including classification and prediction, and protein structure
prediction. GAs have also been used to design neural networks, to evolve rules for
learning classifier systems or symbolic production systems, and to design and
control robots.

• Economic models: GAs have been used to model processes of innovation, the
development of bidding strategies, and the emergence of economic markets.
• Immune system models: GAs have been used to model various aspects of the
natural immune system, including somatic mutation during an individual's
lifetime and the discovery of multi-gene families during evolutionary time.
• Ecological models: GAs have been used to model ecological phenomena such as
biological arms races, host-parasite co-evolution, symbiosis and resource flow in
ecologies.
• Population genetics models: GAs have been used to study questions in population
genetics, such as "under what conditions will a gene for recombination be
evolutionarily viable?"
• Interactions between evolution and learning: GAs have been used to study how
individual learning and species evolution affect one another.
• Models of social systems: GAs have been used to study evolutionary aspects of
social systems, such as the evolution of cooperation, the evolution of
communication, and trail-following behaviour in ants.

3. FUTURE WORK

We will develop a mechanism using a GA to estimate software development effort.
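
One plausible shape for that mechanism, offered as a sketch under assumed data rather
than the finished design: a GA that searches for the coefficients a and b of a parametric
effort equation, effort = a * size^b, minimizing the mean magnitude of relative error
(MMRE) on historical projects.

    import random

    # Assumed historical data: (size in KLOC, actual effort in person-months).
    PROJECTS = [(10, 24), (25, 62), (40, 110), (60, 170)]

    def mmre(a, b):
        """Mean magnitude of relative error of effort = a * size**b."""
        errors = [abs(actual - a * size ** b) / actual
                  for size, actual in PROJECTS]
        return sum(errors) / len(errors)

    def evolve_coefficients(pop_size=50, generations=100):
        # Chromosome: a real-valued pair (a, b); Gaussian mutation only.
        pop = [(random.uniform(0.5, 5.0), random.uniform(0.8, 1.4))
               for _ in range(pop_size)]
        for _ in range(generations):
            pop.sort(key=lambda c: mmre(*c))       # lower MMRE is fitter
            survivors = pop[:pop_size // 2]        # truncation selection
            children = [(a + random.gauss(0, 0.10),
                         b + random.gauss(0, 0.02))
                        for a, b in survivors]
            pop = survivors + children
        return min(pop, key=lambda c: mmre(*c))

    a, b = evolve_coefficients()
    print(f"effort ≈ {a:.2f} * size^{b:.2f}  (MMRE = {mmre(a, b):.3f})")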

4. CONCLUSION

On the whole, developing a software project of acceptable quality, within budget and on
the planned schedule, is the main goal of every software development firm. Project
failure is mostly attributed to failure to fulfill customers’ quality expectations, or to
budget and schedule overruns. It is essential for a project manager to know the effort,
schedule and functionality of a project in advance. There is no point in starting a project
when there is not enough time to finish it or enough money to fund it, or if the quality
would be so inadequate that the end product would be useless and unmarketable.

However, project factors change over the duration of the project, and they may change a
lot. Worse, one can seldom predict how they will change, yet we need to know all of this
before we start. There is no way to calculate the values in advance and expect them to
remain correct. This does not render the estimates futile. On the contrary, it calls for
better-quality estimation techniques, which yield more accurate early results and guide us
toward more targeted and effective contingency plans. Software estimation is the act of
predicting the duration and cost of a project. It is a complex process with errors built into
its very fabric; however, it is very rewarding when done the right way. The estimation
process does not finish until the project finishes; this is the project manager’s answer to
the ever-changing conditions of the project. An accurate estimate is a critical part of the
foundation of an efficient software project.

5. REFERENCES

1. A. E. Eiben et al. Genetic algorithms with multi-parent recombination. PPSN III:
Proceedings of the International Conference on Evolutionary Computation / The
Third Conference on Parallel Problem Solving from Nature, 1994, pp. 78–87.
2. B. E. Anda, Angelvik and K. Ribu. Improving Estimation Practices by Applying
Use Case Models. 4th International Conference on Product Focused Software
Process Improvement, Finland, Springer-Verlag, 2002, pp. 383–397.
3. E. A. Nelson. Management Handbook for the Estimation of Computer
Programming Costs. Systems Development Corp., 1966.
4. H. Peter. Estimation Workbook 2. International Software Benchmarking
Standards Group (ISBSG), Estimation and Benchmarking Resource Centre,
26 Jun 2010.
5. L. C. Briand and I. Wieczorek. Resource estimation in software engineering. In:
Encyclopedia of Software Engineering. New York: John Wiley & Sons, 2002,
pp. 1160–1196.
6. L. Farr and B. Nanus. Factors that affect the cost of computer programming.
7. M. Jørgensen and M. Shepperd. A Systematic Review of Software Development
Cost Estimation Studies. ACM, New York, NY, USA, 2010, pp. 92–95.
8. M. Shepperd and G. Kadoda. Comparing software prediction techniques using
simulation. IEEE Transactions on Software Engineering, 2008, pp. 1014–1022.
9. P. Morris. Overview of Function Point Analysis. Total Metrics, Function Point
Resource Centre.
10. R. L. Winkler. Combining forecasts: A philosophical basis and some current
issues. ScienceDirect, 2010.
11. A. Idri, T. M. Khoshgoftaar and A. Abran. Can neural networks be easily
interpreted in software cost estimation? Proceedings of the IEEE International
Conference on Fuzzy Systems, 2002, pp. 1162–1167.

