All content following this page was uploaded by Samuel Sepúlveda on 17 August 2015.
TR-DCI-01-13 - V1.0
Samuel Sepúlveda
(samuel.sepulveda@ceisufro.cl)
Ania Cravero
(ania.cravero@ceisufro.cl)
ABSTRACT
Background: Systematic literature reviews (SLRs) have reached a considerable level of adoption in
software engineering (SE). However, protocol adaptations for implementation remain tangentially
addressed, thus preventing them from reaching their full potential as a research methodology and as a
source of information for the software industry.
Objective: To account for the use and adaptation of the SLR as a research methodology in SE, providing a
chronological study that includes its current status.
Methodology: A systematic literature search was performed, reviewing two sets of articles published between 2004
and 2011, using digital data sources recognized by the scientific community. The first set includes 151
articles that report SLRs in SE. In addition, 26 articles containing adaptations for conducting SLRs in SE
were reviewed, of which 11 were selected according to the inclusion/exclusion criteria.
Results: A chronological study is provided that includes the current state of the SLR as a research
methodology in SE and we show a summary of main proposals for protocol adaptations to conduct SLRs
in SE.
Conclusions: Although other papers have presented observations and critiques of SLRs in SE, no
evidence has been found of papers that specifically report results on their adaptations as a research
methodology applied in software engineering. The results indicate areas where the quantity and quality of
investigations need to be increased.
1. Introduction
The importance of research activity in Software Engineering (SE) lies in producing knowledge based on
the scientific method, and this has become one of the main challenges in strengthening the foundations of SE
as a discipline on its path to total maturity (Rodriguez 2005). This is not only relevant to the academic world:
industry has also benefited from the scientific method to validate its software technologies
(Zelkowitz, Wallace et al. 2003) and improve its software processes (Chrissis, Konrad et al. 2003).
Different types of experimental studies can be used in SE (Wohlin, Höst et al. 2006). Some proposals to
support the fulfillment of these studies can be found in the technical literature (Wohlin, Runeson et al. 2000).
Researchers have applied primary studies to improve the knowledge of SE (Basili, Shull et al. 1999) in
order to support the processes related to SE technologies, mainly those related to appraising the technology
(Shull, Carver et al. 2001). On the other hand, researchers also use secondary studies.
Secondary studies are those designed to produce or assemble systematic comparisons between the individual
investigations, scientifically selected within a series of primary studies that can support the creation of an
evidence-based body of knowledge (Kitchenham, Brereton et al. 2009).
Evidence-based research was developed initially in medicine, since research based on the expert opinion of
medical doctors is not as reliable as the results of scientific experiments (Dybå, Kitchenham et al. 2005).
Since then, many fields have adopted this approach, e.g. criminology, social policy, economics, and increasingly
over the last few years SE (Jørgensen and Shepperd 2007). Evidence-based Software Engineering (EBSE)
is designed to provide the means to obtain the best current evidence from an investigation, integrating
practical experience and human values into the decision-making with respect to software development and
maintenance (Dybå, Kitchenham et al. 2005), understanding the evidence as a synthesis of high-quality
scientific studies on a specific topic or research problem.
In 2004 the concept of EBSE was introduced as an approach that integrated academic research and industrial
practice in SE (Kitchenham, Dyba et al. 2004). EBSE was then presented from the point of view of the SE
practitioner (Dybå, Kitchenham et al. 2005) and was complemented with a practical way of teaching EBSE to
university students (Jorgensen, Dyba et al. 2005). A far-sighted view of the use of empirical methods and how
they could contribute to improving research and practice in SE was later presented, identifying the main
challenges; chief among them is the proposal to increase the resources available for empirical studies in SE,
in keeping with the importance of software systems in their social context (Sjøberg, Dybå et al. 2007).
By analogy with evidence-based medicine, five steps are needed to practice EBSE (Sackett, Rosenberg et al.
1996): (1) convert the need for information (regarding the practice of SE) into answerable questions, (2)
identify with maximum efficiency the best evidence to respond to these questions, (3) critically assess the
evidence for its validity and utility, (4) put the results of this evaluation into practice in SE and (5) evaluate the
performance of this implementation. The end point of EBSE is that professionals use the appropriate directives to
provide SE solutions in a specific context (Kitchenham, Brereton et al. 2009).
The preferred method for the application of steps 2 and 3 is the systematic literature review (SLR) (da Silva,
Santos et al. 2011). Unlike a peer review, a SLR is a rigorous methodological review of research results, the
aim of which is not only to provide all the existing evidence on a research question, but also to support the
development of evidence-based directives for professionals (Kitchenham, Charters et al. 2007).
It was Barbara Kitchenham (2004) who adapted the directives for implementing SLRs from medicine to SE.
Later, these directives were updated using concepts from the social sciences (Kitchenham, Charters et al.
2007). Nevertheless, a SLR process uses specific concepts and terms that may be unknown to researchers
who carry out ad-hoc literature reviews. In addition, SLRs require an additional steering effort, must be
planned prior to execution and the entire process must be documented, including the interim results
(Biolchini, Mian et al. 2005). This indicates the need to direct research efforts into the development of
planning and methodologies for execution, so as to guide researchers in carrying out the SLR process;
therefore, the need for adaptation to the field of SE must be considered.
The aim of this work is to account for the use and adaptation of the SLR as a research methodology in SE,
providing a chronological study that includes its current status. The motivation that guides this work
originates in the increase of SLRs conducted in SE, which is why it is interesting to show how the use of
SLRs has been adapted in SE to date. This article may be of interest particularly to researchers planning to
conduct additional studies on SLRs and their application in SE, as well as to industry professionals and new
researchers who wish to approach SLRs as a relevant source of information in SE.
The article is structured as follows: section 2 presents a set of related works. In section 3 the stages that comprise a
SLR are explained and briefly discussed. In section 4 the adaptations in the use of SLRs in SE are shown and
discussed. Finally, in section 5 the main conclusions and considerations of this work are presented.
There are two checkpoints in the systematic review process: (1) before executing the systematic review, it is
necessary to guarantee that the planning is adequate and (2) the protocol must be evaluated and if there are
problems, the investigator must return to the planning stage to review the protocol. Likewise, if problems
with respect to the Internet search engines are found in the execution stage, then a new systematic review
must be executed (Mian, Conte et al. 2005).
The aforementioned stages may seem sequential, but it is important to recognize that many of the stages
involve repetition. In particular, many activities begin during the protocol development stage, and they are
then refined and adapted to be carried out again (Kitchenham 2004; Brereton, Kitchenham et al. 2007).
3. SLR adaptation in SE
Next, we present the results of studying the adaptation made to the SLR in SE. First, the methodology applied
is reported, and then the adaptation of SLRs in SE is explained, demonstrating the selected proposals as well
as their main characteristics. Afterwards, some works are shown that, although they do not propose changes to
the SLR protocol, add elements to the discussion that must be considered when improving the design and use
of the SLR in SE in the near future. Finally, some threats are identified that may affect the validity of
the work being conducted, and then we end with a discussion regarding the topics treated in this section.
2004 (year of publication of the first SLR adaptation protocol for SE) and 2011. In addition, works were
compiled that provided proposals or changes to the protocol of how to carry out a SLR in SE.
- Research questions: The research questions to be answered with this work are basically two.
• RQ1: How has the use of the SLR methodology in SE increased?
• RQ2: How has the original protocol been modified for SLR implementation in SE? The latter was
divided into two questions:
o RQ2.1: How many protocol proposals to develop SLRs in SE or changes to these have been
published?
o RQ2.2: At what stages and activities are the proposals for changes to the protocol concentrated?
- Search for works: To attempt to answer the previous questions, the systematic search was based on
identifying: (1) SLRs performed between 2004 and 2011 and (2) the works that mention changes or proposals
to the protocol to guide SLRs in SE.
These works were sought in some of the sources most frequently used by the SE community (Brereton,
Kitchenham et al. 2007); in our case we consulted IEEEXplore, ACM Digital Library and Science Direct. In
these sources the following search strings were used: (1) initially “systematic literature review” OR
“systematic review” and (2) then refining with the string “software engineering”. In the case of the proposals
reporting changes to the protocol for conducting SLRs in SE, the concepts “guidelines”, “protocols”,
“lessons” and “studies” were added to the search string.
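The search strings described above can be sketched as a small query builder. This is only an illustration: the function name `build_query` is hypothetical, the way the additional concepts are combined (OR among themselves, AND with the rest) is an assumption not stated in this report, and each digital library has its own query syntax that may require adjustments.

```python
def build_query(base_terms, refinement, extra_terms=()):
    """Combine quoted phrases into one boolean search string (illustrative only)."""
    def quote(term):
        return f'"{term}"'

    # Step (1): alternative names for the review type, joined with OR.
    base = " OR ".join(quote(term) for term in base_terms)
    # Step (2): refine with the discipline.
    query = f"({base}) AND {quote(refinement)}"
    # Optional concepts for protocol proposals; OR-combination is an assumption.
    if extra_terms:
        extras = " OR ".join(quote(term) for term in extra_terms)
        query += f" AND ({extras})"
    return query


BASE_TERMS = ["systematic literature review", "systematic review"]
REFINEMENT = "software engineering"
PROTOCOL_TERMS = ["guidelines", "protocols", "lessons", "studies"]

slr_query = build_query(BASE_TERMS, REFINEMENT)
protocol_query = build_query(BASE_TERMS, REFINEMENT, PROTOCOL_TERMS)
print(slr_query)
print(protocol_query)
```

Keeping the terms in lists makes it easier to record in the protocol exactly which string was run against each source.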
- Selection of works: Once the data sources were identified and the queries executed according to the
defined search strings, the title, abstract and key words of each work were reviewed; works that reported the
results of carrying out SLRs on SE topics were compiled and those that did not were excluded.
The selection of the works reporting changes to the protocol to perform a SLR was much more detailed,
including additional reading about the methodology used, the work carried out by the researchers and the
results. This was necessary to verify that each work provided proposals to modify the original protocol
established by Kitchenham in 2004. Initially 26 works were compiled and 11 were selected; these are the ones
analyzed and described in detail in section 4.3.
Considering the fact that the inclusion or exclusion of the works was done by reviewing and interpreting the
text (which is potentially ambiguous), the reliability between the reviewers was calculated using Cohen’s
Kappa statistic (Gwet 2002). The results were satisfactory (K = 0.827), which indicates that the scale
presented in (Clark, Sammut et al. 2004) provides a basis of sufficiently clear criteria, and that it does not
induce significant differences between the reviewers. In any case, for those cases where the investigators had
doubts at the time of whether or not to include a certain work, this was subjected to an individual review and
then a decision was made by group consensus.
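For readers unfamiliar with the statistic, Cohen's Kappa for two reviewers can be sketched as below. The include/exclude decisions in the example are hypothetical and are not the data behind the K = 0.827 reported here; the function name `cohen_kappa` is likewise only illustrative.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who label the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(ratings_a)
    # Observed agreement: share of items on which both raters gave the same label.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of each rater's marginal proportions, summed per label.
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical screening decisions for ten candidate works.
reviewer_1 = ["include"] * 4 + ["exclude"] * 6
reviewer_2 = ["include"] * 3 + ["exclude"] * 7
kappa = cohen_kappa(reviewer_1, reviewer_2)
print(round(kappa, 3))
```

In this hypothetical sample the reviewers agree on 9 of 10 works, chance agreement is 0.54, and kappa is about 0.78.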
- Inclusion and exclusion criteria: The following states the criteria that established the relevance of the
articles compiled for their inclusion with respect to their approaches to the protocol to develop SLRs in SE.
(i) Inclusion criteria: included were all the works that approach the topic of the SLR and that specifically
mention aspects dealing with the modification of the protocol to carry out a SLR, i.e. how to conduct a SLR
and the stages/activities that this entails.
(ii) Exclusion criteria: excluded were all those works that deal with SLR topics, but do not suggest proposals
on how to carry out or modify the defined protocol to develop SLRs in SE.
- Data extraction and synthesis: Data extraction for the SLRs in SE consisted of counting the works that
reported undertaking a SLR in SE between 2004 and 2011 and identifying the sources of their publication.
The results of these counts are summarized in the graphs that appear in Figures 3 to 5.
As for the works that propose amendments to the SLR protocol in SE, a preliminary literature review
established the activities for which changes or proposals were sought, and then the proposal
for each defined activity was extracted from each paper. The results of this review are summarized in Tables 1
to 9 in section 4.3.
Figure 3. Number of publications with SLR in SE (sources: ACM, IEEE and Science Direct)
Simply to confirm what is shown in Figure 3, a complementary search was done using the
aforementioned search strings:
(1) A specific search in one of the selected sources, in this case Science Direct, which returned a total of 4314
matches and, once narrowed, 63 matches; the annual numbers for the period 2004-2011 are shown in
Figure 4. The difference in the adoption of SLRs between SE and other branches of science such as medicine
should be emphasized, because the latter have a history of performing SLRs since the mid-1960s.
(2) An extended search using Google Scholar as the reference, which returned a total of 30900 matches (without
considering references to other articles) and, once narrowed, 1035 matches; the annual numbers
for the period 2004-2011 are shown in Figure 5.
Figure 4. Number of publications with SLR in SE (sources: Science Direct)
Figure 5. Number of publications with SLR in SE (sources: Google Scholar)
Comparing the trends in the three previous figures, a significant increase in the number of SLR publications
in SE is observed in the 2007-2008 period.
From another perspective and according to the data shown in Figure 3, we can establish the level of increase
for SLRs in SE published between 2004-2011, verifying the increase of SLRs published between 2
consecutive years and obtaining an annual average rate of increase in publications. The absolute average
increase between 2 consecutive years is approximately 7 works and the average rate of increase between 2
consecutive years is approximately 44%.
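As a rough check on this arithmetic, the two averages can be sketched as follows. The yearly series is hypothetical: it matches only the endpoints (zero SLRs in 2004, 50 in 2011) and the total of 151 works reported in this study, not the actual values of Figure 3, so its growth rate is not expected to reproduce the 44% figure. Skipping years with a zero base when averaging rates is also an assumption, since a rate of increase from zero is undefined.

```python
def average_absolute_increase(counts):
    """Mean difference between consecutive yearly counts."""
    diffs = [later - earlier for earlier, later in zip(counts, counts[1:])]
    if not diffs:
        raise ValueError("need at least two yearly counts")
    return sum(diffs) / len(diffs)


def average_growth_rate(counts):
    """Mean relative change between consecutive years, skipping zero-base years."""
    rates = [(later - earlier) / earlier
             for earlier, later in zip(counts, counts[1:]) if earlier > 0]
    if not rates:
        raise ValueError("no consecutive pair with a non-zero base year")
    return sum(rates) / len(rates)


# Hypothetical counts for 2004..2011; only the endpoints and the total (151)
# follow the figures reported in this study.
yearly_counts = [0, 3, 6, 10, 20, 25, 37, 50]

avg_increase = average_absolute_increase(yearly_counts)
avg_rate = average_growth_rate(yearly_counts)
print(f"average absolute increase: {avg_increase:.1f} works per year")
print(f"average growth rate (hypothetical series): {avg_rate:.0%}")
```

Because consecutive differences telescope, the average absolute increase depends only on the first and last years: (50 - 0) / 7 is roughly 7.1, which agrees with the approximately 7 works stated above.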
Having selected the works and applied the inclusion/exclusion criteria, 11 works were found that present
proposals or modifications to the protocol to develop SLRs in SE.
In order to evaluate the impact of each work that presents proposals/changes for developing a SLR
in SE, Table 2 displays the total and annual number of citations between 2004 and 2011. These citation
counts were compiled from a public source, Google Scholar, and are ordered from the highest to the
lowest number of citations.
As Table 2 illustrates, the work with the greatest number of citations is the initial SLR protocol in SE defined
by (Kitchenham 2004), which could be attributed to this being the first work dealing with the subject of SLRs
and proposing a protocol of how to perform them in SE. It is also the longest standing in this selection and the
trend might indicate that it will continue to be cited as the standard work on this topic. In addition, the
first three on the list with the greatest number of citations were led by or counted on the participation of
Barbara Kitchenham; consequently, we can say, according to the evidence compiled to date, that she and her
research group are leading the work in terms of reviews of SLRs in SE. In the case of the work by (Caro, Ríos
et al. 2005), which has no citations, we believe this is due to the fact that it proposes an adaptation of
the SLR protocol for undergraduate students and, unlike the rest of the works, is published in Spanish.
As for the work by (Petersen, Feldt et al. 2007), although it specified the bases for developing systematic
mapping processes, it was included because it suggested a comparison between these and the SLRs,
establishing criteria and comments that positively influence the protocol design to carry out a SLR.
Also noteworthy is the proposal by (Mian, Conte et al. 2005), who suggested the use of templates to make the
SLR easier. The final work to be mentioned is (Grimán and Juristo 2007), who even changed the stages of the
protocol defined by Kitchenham, proposing an alternative with other stages and activities.
The idea is to reflect the changes that have been set out regarding the selected activities, knowing who
proposed them, which one they deal with and when they were created according to what is reported in the
literature. In addition, in each table of the stages reviewed there is a column called Code, which identifies
each proposal with the author and activity to which it is related so that it can be identified in the timelines that
appear in the following sections. This code is built from the first three letters of the last name of the main
author of the work, followed by the activity to which it alludes; when two works share the same three letters, a
sequential number is added between the letters and the activity. For example, for
activity 1 (A1), the work of (Brereton, Kitchenham et al. 2007) is identified by the code BreA1.
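The coding scheme can be sketched as follows. The function names are hypothetical, and two details are assumptions inferred from codes that appear in the tables (KitA4, Kit2A4, Kit3A4): the first work with a given three-letter stem carries no number, and later works are numbered in the order in which they are listed.

```python
def work_prefixes(first_author_last_names):
    """Assign each work a prefix: first three letters, numbered on repeats."""
    seen = {}
    prefixes = []
    for last_name in first_author_last_names:
        stem = last_name[:3]
        seen[stem] = seen.get(stem, 0) + 1
        # Assumption: the first occurrence is unnumbered, later ones get 2, 3, ...
        prefixes.append(stem if seen[stem] == 1 else f"{stem}{seen[stem]}")
    return prefixes


def proposal_code(prefix, activity_number):
    """Append the activity identifier, e.g. ('Bre', 1) gives 'BreA1'."""
    return f"{prefix}A{activity_number}"


# First authors of some of the selected works, listed in order of appearance.
authors = ["Kitchenham", "Brereton", "Kitchenham", "Staples", "Kitchenham"]
prefixes = work_prefixes(authors)
print(prefixes)
print(proposal_code(prefixes[1], 1))
print(proposal_code(prefixes[4], 3))
```

Assigning the prefix once per work, rather than once per activity, keeps the same identifier for a work across all the tables and timelines.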
It can be observed that generally the proposals mentioned here suggest: (1) a guideline to help define high
quality RQs, (2) guidelines to check that the defined RQs are indeed the most appropriate and (3) that the
RQs not be defined a priori, but rather as greater knowledge of the subject is gained.
- Identification of relevant works (A2): Table 4 contains the changes or proposals for activity A2 (Kitchenham
2004; Caro, Ríos et al. 2005; Mian, Conte et al. 2005; Brereton, Kitchenham et al. 2007; Grimán and Juristo
2007; Kitchenham, Charters et al. 2007; Petersen, Feldt et al. 2007; Zhang, Babar et al. 2011).
It can be observed that generally the proposals mentioned here suggest: (1) identification and selection of
relevant data sources, (2) definition and justification of a systematic search strategy according to the
defined RQs and (3) identification of categories for classification of the works identified.
- Selection of the relevant works (A3): Table 5 contains the changes or proposals for activity A3 (Kitchenham
2004; Caro, Ríos et al. 2005; Mian, Conte et al. 2005; Brereton, Kitchenham et al. 2007; Grimán and Juristo
2007; Kitchenham, Charters et al. 2007; Staples and Niazi 2007; Kitchenham, Brereton et al. 2010).
The first selection of articles is made based on the title and abstract. Then the detail of each work selected is reviewed. The work of two reviewers is suggested: one for the first stage and another for the second. | StaA3 | (Staples and Niazi 2007)
One must look beyond the abstracts, in SE and IT these are generally of low quality; the conclusions should also be reviewed. | BreA3 | (Brereton, Kitchenham et al. 2007)
Carried out in the first stage of the proposal on the basis of the title and abstract, obtaining an initial list of papers to review. The aim of the review and classification of the papers selected are determined in detail, executing a refined search that complements the list. | GriA3 | (Grimán and Juristo 2007)
Use of automated searches versus manual annotated searches, based on whether the number or quality of the works selected is desired. | Kit3A3 | (Kitchenham, Brereton et al. 2010)
It can be observed that generally the proposals mentioned here suggest: (1) definition of guidelines to
establish the inclusion/exclusion criteria, (2) guidelines to resolve disagreements between reviewers when
selecting works, (3) use of peer review to avoid bias when selecting a work and (4) review of other elements
of the paper such as the conclusions, because abstracts are usually of low quality.
- Evaluation of the quality of the works selected (A4): Table 6 contains the changes or proposals for activity
A4 (Kitchenham 2004; Dybå, Dingsøyr et al. 2007; Grimán and Juristo 2007; Kitchenham, Charters et al.
2007; Staples and Niazi 2007; Kitchenham, Brereton et al. 2010).
Table 6. Proposals for the stage “Evaluation of the quality of the works selected (A4)”
Changes - Proposals | Code | Authors
Suggests guidelines for the definition of a criterion that makes it possible to assess the quality of the selected works. Establishes hierarchies as far as types of works in SE, as well as the development and use of quality instruments. | KitA4 | (Kitchenham 2004)
Proposes a set of checklists with factors that can evaluate the quality of the selected works. | Kit2A4 | (Kitchenham, Charters et al. 2007)
The proposal suggests that the same investigator do A4 and A5 simultaneously. | StaA4 | (Staples and Niazi 2007)
It is a complete stage of the proposal, third and last that is carried out on the selected works to determine their quality, ensuring that they receive some particular treatment during the synthesis. | GriA4 | (Grimán and Juristo 2007)
Frame to evaluate the quality of the work selected based on eleven quality criteria including rigor, relevance and credibility. | DybA4 | (Dybå, Dingsøyr et al. 2007)
Quality evaluations should be based on the participation of three independent evaluators and include at least two rounds of discussion to settle disagreements in the evaluation. | Kit3A4 | (Kitchenham, Brereton et al. 2010)
It can be observed that generally the proposals mentioned here suggest: (1) guidelines and framework to
evaluate the quality of the selected work, (2) use of checklists with defined factors to evaluate the quality of
the work and (3) participation of multiple evaluators and discussion rounds to reach a consensus on criteria.
- Data extraction (A5): Table 7 contains the changes or proposals for activity A5 (Kitchenham 2004; Caro,
Ríos et al. 2005; Mian, Conte et al. 2005; Brereton, Kitchenham et al. 2007; Grimán and Juristo 2007;
Kitchenham, Charters et al. 2007; Petersen, Feldt et al. 2007; Staples and Niazi 2007).
It can be observed that generally the proposals mentioned here suggest: (1) design and use of forms to record
data, (2) use of software tools to support the documentation of data, (3) use of peer review and (4) recording
of the section of the article where the selected data is found.
- Data synthesis (A6): Table 8 contains the changes or proposals for activity A6 (Kitchenham 2004; Caro,
Ríos et al. 2005; Mian, Conte et al. 2005; Brereton, Kitchenham et al. 2007; Grimán and Juristo 2007;
Petersen, Feldt et al. 2007; Staples and Niazi 2007).
Niazi 2007)
Use of categorized table allows publication frequencies of each work to be obtained. Starting from the categories and aspects defined in the study, a bubble graph is constructed that shows the number of works on each topic in terms of the size of the bubble. | PetA6 | (Petersen, Feldt et al. 2007)
Use of tabulated data to facilitate their combination and thus to clarify how the data answer the RQs. | BreA6 | (Brereton, Kitchenham et al. 2007)
Forms part of the gathering phase, but here is called experiment codification. The aim is to synthesize the input data in as formalized a way as possible, avoiding investigator bias. | GriA6 | (Grimán and Juristo 2007)
It can be observed that generally the proposals mentioned here suggest: (1) guidelines for synthesizing data,
(2) summary with statistical results from quantitative and qualitative data, and (3) use of tables and databases
to facilitate data queries and analysis.
It can be observed that generally the proposals mentioned here suggest: (1) formats and guidelines to publish
results of the SLR and (2) the reviews and decisions made during the process must be reported.
criteria, (5) synthesizing the data, and obtaining statistical results from quantitative and qualitative data and
using tables and databases to facilitate their analysis, and finally (6) publishing the results and reporting the
reviews and decisions made in the process.
3.3.2 Timelines
From the stages and activities identified, as well as from the changes proposed for each of these activities, a
timeline has been prepared for each stage (planning, implementation and documentation) with the aim of
illustrating graphically at what point these proposals are concentrated. To this end, use will be made of the
previously defined acronyms for each work reviewed.
Fig. 6 Proposals for changes to the SLR protocol for the planning stage.
Fig. 7 Proposals for changes to the SLR protocol for the implementation stage.
Fig. 8 Proposals for changes to the SLR protocol for the documentation/report stage.
SLRs in SE has generally attained a certain acceptance and stability within the SE community and the
emphasis of the community is now migrating towards improving the quality of primary studies. It is beyond
the scope of this work to verify whether this hypothesis is true or false, but we believe that this may give rise
to a new type of research within the SLR and SE with respect to the quality of primary works and the need to
establish more tertiary studies that are dedicated to reviewing the quality of secondary studies.
- Abstracts: With respect to the use that can be made of abstracts in SE when selecting articles, (Staples and
Niazi 2007) criticize their low quality and how they may be considered a key element in making this decision.
For their part, (Jedlitschka and Pfahl 2005) emphasize the use of the structured abstract and suggest its use as
an important source of information that serves the readers in general, summarizes the main aspects of the
work and emphasizes it as the only section of the publication that is accessible free of charge. This is
complemented by the recommendations and considerations of the use of structured abstracts proposed in
(Budgen, Kitchenham et al. 2008).
- Searches: As for the search for relevant works using the search engines offered on the websites of the
main digital sources used by the SE community (IEEEXplore, ACM Digital Library, Springer Link, Science
Direct), it is necessary to use different search strings for the different sources, try them out and evaluate the
results (Kitchenham, Mendes et al. 2007; Chen, Ali Babar et al. 2009). This is also supported by (Staples and
Niazi 2007), who illustrate the fact that the search engines do not support the use of the search strings required to
conduct SLRs. Efforts have also been made to review the processes used to search for works,
comparing manual searches with broad automated searches and evaluating the
importance of grey literature (Kitchenham, Brereton et al. 2009). The work by (Kitchenham, Brereton et al.
2010), in addition to proposing changes in activities A3 and A4, presents: (1) a comparison between the
guidelines for medicine and SE in conducting SLRs, with respect to how to perform searches of relevant
works and (2) a glossary of terms adopted from evidence-based medicine that are not widely
known in SE, which can be of great help for those initiating SLRs. As far as having a unified source of SLRs
in SE, (Staples and Niazi 2007) pose the idea of generating a centralized SLR index in SE similar to the one
in medicine, the Cochrane Collaboration†.
- Quality: According to (Cruzes and Dybå 2011), the quality of the SLRs conducted can be positively
influenced if the challenges of synthesizing research in SE are better understood; in
addition, despite the focus being placed on SLRs, limited attention is given to synthesis, which needs to
become a central aspect of the SLR so as to increase its importance and utility both in the research and
practice of the discipline. For their part, (Staples and Niazi 2007) suggest a simplification of the original
criterion raised by Kitchenham to evaluate the quality of the work shown in each paper, thus facilitating the
undertaking. In the future, instruments should be developed that support the implementation and control of a
SLR, similar to the PRISMA‡ proposal for medicine (Moher, Liberati et al. 2010).
- Protocol and stages: The improvements or critiques regarding how to conduct a SLR expressed by
(Brereton, Kitchenham et al. 2007) also present a set of learning strategies that have accumulated with the
development of the SLR in SE. They also define the stages of a SLR and which of these are used “as-is” or
which need to be adapted to the field or practice of SE. By contrast, with respect to the original protocol for
SLR, (Staples and Niazi 2007) point to the lack of clarity in the directives for synthesizing data, and although
they agree with Brereton about the importance of running a pilot project, they also criticize Kitchenham
for not clarifying when to stop or when a pilot project must be run. Based on his experience, Staples discusses
the non-trivial nature of validating the protocol of a SLR because it is not easy to find reviewers, and he
attributes this to the paucity of experience in developing SLRs.
Finally, it should be emphasized that (Biolchini, Mian et al. 2006) also propose the use of templates to
conduct SLRs, and they additionally define an ontology that describes the knowledge of experimental studies; the
application of this template can be seen in the technical report (Biolchini, Mian et al. 2005). As far as the
reporting of results is concerned, (Jedlitschka and Pfahl 2005) provide guidelines about how to report results
in empirical SE and establish a comparison between the different guidelines for reporting results, which
include SLRs.
† http://www.cochrane.org/cochrane-reviews
‡ http://www.prisma-statement.org/index.htm
The final selection included 151 works that report having conducted a SLR on SE subjects; 11 works were
also included that report protocol proposals for carrying out SLRs in SE or changes to these, between 2004
and 2011. We think that the specificity of the latter topic has caused the sample to be rather small, and due to
this same specificity, the review provides a reliable overall view of the state of research in this area.
We are aware that there are some threats that may affect the validity of the findings discovered to date, the
most important being:
• Possible bias at the time of selecting works, such that we considered only a subgroup of the existing
SLRs.
• In addition, although data sources that are highly recognized within the SE
community were used (IEEEXplore, ACM Digital Library, Science Direct and Google Scholar), we
did not consider others that were equally relevant, basically due to constraints of scope and time.
• Limitations of the tools used to conduct the searches in the electronic data sources, as already mentioned
in previous sections.
We tried to mitigate these threats by means of an individual selection and a joint validation of the works, thus
avoiding individual bias. In order to avoid works being left out of the study as a result of the searches, the idea
was to review all the versions of a work, whether these were journals, conference proceedings or technical
reports.
- RQ1: How did increase the use of the SLR methodology in SE?
The sources consulted revealed a significant increase in the number of SLRs conducted, going from zero in
2004 to 50 in 2011; over the entire 2004-2011 period, 151 works were published. In addition, from 2007 on
the number of SLRs in SE published per year increased significantly. To confirm this upward trend in SLRs
published from 2004 to 2011, it should be emphasized that the average absolute increase between two
consecutive years is approximately 7 works, and the average rate of increase between two consecutive years
is approximately 44%. For details see Figures 1-3 and 9.
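The average absolute increase quoted for RQ1 can be checked from the endpoints alone, because the year-to-year
differences cancel out when summed. The following is a minimal sketch in Python, assuming only the two values
stated in the text (zero SLRs in 2004 and 50 in 2011); the function name is ours and purely illustrative. The
44% average rate of increase cannot be reproduced in this way, since it depends on the per-year counts shown
in the figures.

```python
def average_absolute_increase(first_count, last_count, first_year, last_year):
    """Mean year-to-year change; intermediate years cancel out (telescoping sum)."""
    intervals = last_year - first_year
    if intervals <= 0:
        raise ValueError("last_year must be later than first_year")
    return (last_count - first_count) / intervals


# Endpoints taken from the text: 0 SLRs in 2004, 50 SLRs in 2011.
avg = average_absolute_increase(0, 50, 2004, 2011)
print(f"{avg:.2f} works per year")  # 7.14
```

The result, about 7.14, is consistent with the "approximately 7 works" reported above.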
- RQ2: How has the original protocol been modified for SLR implementation in SE?
The protocol for the implementation of SLRs in SE was originally defined by (Kitchenham 2004), and later
works were published proposing changes to it, in one or more activities of the three stages included in the
original protocol.
In general, the proposals for changes to the SLR protocol in SE focus on defining guidelines for:
(1) supporting the definition of the RQs;
(2) identifying and selecting relevant data sources, defining a search strategy aligned with the RQs and
classifying the identified works by category;
(3) defining the inclusion/exclusion criteria, resolving disagreements between reviewers when selecting works
and taking care when relying only on abstracts, given their low quality;
(4) evaluating the quality of the selected works, involving several evaluators and reaching a consensus on the
criteria;
(5) synthesizing the data, obtaining statistical results from quantitative and qualitative data and using tables
and databases to facilitate their analysis; and
(6) publishing the results and reporting the reviews and decisions taken in the process.
- RQ2.1: How many protocol proposals to develop SLRs in SE or changes to these have been published?
From the evidence collected, it may be stated that altogether 11 of the reviewed works propose a
protocol, or changes to one, for conducting an SLR in SE; they cover the period between 2004 and 2011.
These 11 works contain 46 proposals, 25 of which were published in 2007, which means that 54% of the
proposals are concentrated in that year.
- RQ2.2: At what stages and activities are the proposals of changes to the protocol concentrated?
From the point of view of the stages of the process for conducting an SLR, the stage with the greatest number
of proposals is implementation, which concentrates 37, or 80% of all the proposals. With respect
to the activities, three were identified with the greatest number of proposals: Identification of relevant
works (A2), Selection of relevant works (A3) and Data Extraction (A5), with 8 proposals each, which is
equivalent to 17% in each case. The documentation/report stage presents the smallest number of proposals, 3,
or 7% of the total. Finally, it is worth noting that some works not only change some activities of the
protocol but also define different stages, which are executed in an order different from that of the other
proposals, as is the case with (Grimán and Juristo 2007). For details see Figures 6-8 and
Tables 3-9.
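The percentages reported for RQ2.1 and RQ2.2 follow directly from the counts given in the text, taken over the
total of 46 proposals. The following is a minimal sketch in Python that recomputes them; the dictionary labels
and the helper name are ours and only illustrative, while the counts are those stated above.

```python
TOTAL_PROPOSALS = 46  # total number of proposals stated in the text

# Counts as reported in the text; the labels are shortened for illustration.
reported_counts = {
    "published in 2007": 25,
    "implementation stage": 37,
    "activity A2 (identification)": 8,
    "activity A3 (selection)": 8,
    "activity A5 (data extraction)": 8,
    "documentation/report stage": 3,
}


def share(count, total=TOTAL_PROPOSALS):
    """Percentage of the total, rounded to the nearest whole number."""
    return round(100 * count / total)


shares = {label: share(n) for label, n in reported_counts.items()}
for label, pct in shares.items():
    print(f"{label}: {pct}%")
```

The recomputed values (54%, 80%, 17% for each of the three activities, and 7%) agree with the figures quoted
in the text.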
The reviewed data indicate that, in parallel with the significant increase in SLRs in SE in 2007-2008, a growth
rate that has been maintained to date, the number of proposals for changes to the protocol for performing SLRs
in SE has fallen drastically. This suggests that the protocol for implementing SLRs in SE has attained a certain
acceptance and stability within the SE community, and that the community's emphasis is shifting toward
improving the quality of primary studies. We do not have the evidence to confirm or refute this hypothesis, and
doing so is beyond the scope of this work, but we believe that it can give rise to a new line of research on SLRs
in SE concerning the quality of primary works and the need for more tertiary studies dedicated to reviewing the
quality of secondary studies.
In addition, a review of the authors and co-authors of the 11 articles with proposals for the SLR protocol in SE
shows that a set of six researchers is involved in 50% of them, which indicates that there is a group concerned
with improving the processes involved in performing SLRs in SE. Among these authors Barbara Kitchenham
stands out: in addition to having defined the protocol for developing SLRs in SE, she appears in four of the 11
works, in three as the author and in one as a co-author.
4. Related Work
In this section we present a compilation of works with observations and analyses of the SLRs performed in
SE. This compilation orders the works chronologically.
The literature review on research in SE carried out by (Glass 2002) suggested that this research is broad in the
topics treated and narrow in the approaches and research methods used. The study also shows the range of
research methods used in SE, and it is worth noting that only 1.1% of investigations in SE use the method
called literature review/analysis.
In 2004 (Glass, Ramesh et al. 2004) mentioned that one of the criticisms of the research conducted is that the
investigators in SE and computer science, particularly in contrast to those in information systems, make little
or no use of the methods and experiences available from other disciplines of reference.
In 2009 (Kitchenham, Brereton et al. 2009) published an evaluation of the impact of SLRs between 2004 and
2007, concluding that the thematic areas covered until that time were limited and that European
investigators, in particular those from the Simula Research Laboratory, seemed to be the main exponents of
SLRs.
Next in 2010 (Kitchenham, Pretorius et al. 2010) published the results of tertiary studies between 2007 and
2010 with the aim of providing a set of comments available to the investigators developing SLRs in SE, and
they concluded that the works had improved in quality, but could not yet be considered a principal research
method in SE.
In 2011 (da Silva, Santos et al. 2011) analyzed the quality, topics covered and potential impact of the SLRs
published in 2008 and 2009, both for education and for the practice of the discipline, concluding that although
the quality and number of investigations had improved, most SLRs did not appraise the quality of the primary
studies and did not provide guidelines for professionals, thereby reducing the potential impact on the
practice of SE. In the same year, (Ramey and Rao 2011) discussed the SLR as a methodology "imported" from
medicine, described the changes made to adapt it to other disciplines and finally presented an evaluation of
the strengths and weaknesses of the method as applied in SE.
Finally, (Zhang, Babar et al. 2011) conducted an empirical investigation into the use, adoption and advantages
of SLRs in SE research, helping SE researchers and professionals to understand the perceived
value and the current or potential impact of SLRs.
The works mentioned in this section present comments, observations and critiques of the SLRs conducted
in SE. None of these, however, examines the adaptation of SLRs as an applied research methodology in SE
from the perspective of the changes proposed to the initial protocol used to develop them. This article also
provides an account of the origin, development and current state of the SLR in SE.
5. Conclusions
The work presented covers aspects of the origin, development, use and adaptation of the SLR as a research
methodology in SE, providing a chronological frame of reference that includes its current status in the field.
In addition, the answers and evidence for the RQs posed at the beginning of the work have been reviewed. We
believe that this work may be of interest to industry professionals and new researchers who wish to approach
the SLR as a relevant source of information, as well as to researchers planning to conduct additional studies on
SLRs and their application in SE.
Although there are other works that present both a review and a set of observations and critiques of SLRs
conducted in SE, no evidence has been found of works that specifically report results on the adaptation of SLRs
as a research methodology applied in SE from the perspective of the changes proposed to the initial protocol
used to develop them. From this, we understand that more tertiary studies are required in this area to make it
possible to examine the topic in greater detail.
As future work, we suggest extending this study by adding and refining the RQs and including more data sources
in order to confirm the ideas put forward here. Furthermore, we propose the development of a prototype that
indexes and finds only works that include SLRs in SE, thus addressing the deficiency indicated by the
literature for our discipline.
Acknowledgements
This work was conducted with the support of Vicerrectoría de Investigación y Postgrado at the Universidad
de La Frontera, through Research Project # DI14-0065. Special thanks to Mauricio Bustamante for his useful
comments, reviews and technical advice on this work.
References
Basili, V., F. Shull, et al. (1999). "Building knowledge through families of experiments." IEEE Transactions
on Software Engineering 25(4): 456-473.
Biolchini, J., P. G. Mian, et al. (2005). "Systematic Review in Software Engineering." System Engineering
and Computer Science Department COPPE/UFRJ, Technical Report ES 679(05).
Biolchini, J. C., P. G. Mian, et al. (2006). "Scientific research ontology to support systematic review in
software engineering." Advanced Engineering Informatics 21(2): 133-151.
Brereton, P., B. A. Kitchenham, et al. (2007). "Lessons from applying the systematic literature review process
within the software engineering domain." Journal of Systems and Software 80(4): 571-583.
Budgen, D., B. A. Kitchenham, et al. (2008). "Presenting software engineering results using structured
abstracts: a randomised experiment." Empirical Software Engineering 13(4): 433-458.
Caro, M. A., A. R. Ríos, et al. (2005). "Análisis y revisión de la literatura en el contexto de proyectos de fin
de carrera: Una propuesta." Revista Sociedad Chilena de Ciencia de la Computación 6(1).
Chen, L., M. Ali Babar, et al. (2009). Variability Management in Software Product Lines: A Systematic
Review. 13th International Software Product Line Conference, Carnegie Mellon University.
Chrissis, M. B., M. Konrad, et al. (2003). CMMI: Guidelines for process integration and product
improvement, Addison-Wesley Professional.
Clark, T., P. Sammut, et al. (2004). "Applied metamodelling: a foundation for language driven development."
Cruzes, D. S. and T. Dybå (2011). "Research synthesis in software engineering: A tertiary study." Information
and Software Technology 53(5): 440-455.
da Silva, F. Q. B., A. L. M. Santos, et al. (2011). "Six years of systematic literature reviews in software
engineering: An updated tertiary study." Information and Software Technology 53(9): 899-913.
Dybå, T., T. Dingsøyr, et al. (2007). Applying systematic reviews to diverse study types: An experience
report. First International Symposium on Empirical Software Engineering and Measurement, ESEM
2007, IEEE.
Dybå, T., B. A. Kitchenham, et al. (2005). "Evidence-based software engineering for practitioners." Software,
IEEE 22(1): 58-65.
Glass, R. L. (2002). "Research in software engineering: an analysis of the literature." Information and
Software Technology 44(8): 491-506.
Glass, R. L., V. Ramesh, et al. (2004). "An Analysis of Research in Computing Disciplines."
Communications of the ACM 47(6): 89-94.
Grimán, A. and N. Juristo (2007). Proposal of a Review Process of Empirical Studies in Software
Engineering. International Doctoral Symposium on Empirical Software Engineering
(IDoESE2007): 25-32.
Gwet, K. (2002). "Inter-rater reliability: dependency on trait prevalence and marginal homogeneity."
Statistical methods for inter-rater reliability assessment 2: 1-9.
Jedlitschka, A. and D. Pfahl (2005). Reporting Guidelines for Controlled Experiments in Software
Engineering. International Symposium on Empirical Software Engineering, IEEE.
Jørgensen, M., T. Dybå, et al. (2005). Teaching evidence-based software engineering to university students.
Software Metrics, 11th IEEE International Symposium (METRICS'05).
Jørgensen, M. and M. Shepperd (2007). A systematic review of software development cost estimation studies.
IEEE Transactions on Software Engineering.
Kitchenham, B. (2004). Procedures for performing systematic reviews. Technical Report TR/SE-0401. S. E.
Group, Department of Computer Science, Keele University.
Kitchenham, B., P. Brereton, et al. (2009). "Systematic literature reviews in software engineering – A
systematic literature review." Information and Software Technology 51(1): 7-15.
Kitchenham, B., P. Brereton, et al. (2009). The Impact of Limited Search Procedures for Systematic
Literature Reviews – A Participant-Observer Case Study. Third International Symposium on
Empirical Software Engineering and Measurement.
Kitchenham, B., S. Charters, et al. (2007). Guidelines for performing Systematic Literature Reviews in
Software Engineering. EBSE Technical Report, EBSE-2007-01 Software Engineering Group, School
of Computer Science and Mathematics Keele University and Department of Computer Science
University of Durham.
Kitchenham, B., E. Mendes, et al. (2007). "Cross versus within-Company Cost Estimation Studies: A
Systematic Review." IEEE Transactions on Software Engineering 33: 316-329.
Kitchenham, B., R. Pretorius, et al. (2010). "Systematic literature reviews in software engineering – A tertiary
study." Information and Software Technology 52(8): 792-805.
Kitchenham, B. A., P. Brereton, et al. (2010). "Refining the systematic literature review process – two
participant-observer case studies." Empirical Software Engineering 15(6): 618-653.
Kitchenham, B. A., T. Dybå, et al. (2004). Evidence-based software engineering. 26th International
Conference on Software Engineering, IEEE.
Mian, P., T. Conte, et al. (2005). A systematic review process to software engineering. 2nd Experimental
Software Engineering Latin American Workshop (ESELAW'05), Brazil.
Moher, D., A. Liberati, et al. (2010). "Preferred reporting items for systematic reviews and meta-analyses:
The PRISMA statement." International Journal of Surgery 8(5): 336-341.
Petersen, K., R. Feldt, et al. (2007). Systematic Mapping Studies in Software Engineering. 12th International
Conference on Evaluation and Assessment in Software Engineering.
Ramey, J. and P. G. Rao (2011). The systematic literature review as a research genre. Professional
Communication Conference (IPCC), Cincinnati, OH, USA, IEEE.
Rodriguez, D. (2005). Empirical software engineering research: epistemological and ontological foundations.
First Workshop on Ontology, Conceptualizations and Epistemology for Software and Systems
Engineering (ONTOSE).
Sackett, D. L., W. Rosenberg, et al. (1996). "Evidence based medicine: what it is and what it isn't." British
Medical Journal (BMJ) 312(7023): 71-72.
Shull, F., J. Carver, et al. (2001). An empirical methodology for introducing software processes. Joint 8th
European Software Engineering Conference (ESEC) and 9th ACM SIGSOFT Foundations of
Software Engineering (FSE-9), Vienna, Austria.
Sjøberg, D. I. K., T. Dybå, et al. (2007). The Future of Empirical Methods in Software Engineering Research.
Future of Software Engineering, FOSE'07, IEEE CS.
Staples, M. and M. Niazi (2007). "Experiences using systematic review guidelines." Journal of Systems and
Software 80(9): 1425-1437.
Wohlin, C., M. Höst, et al. (2006). "Empirical research methods in Web and software Engineering." Web
Engineering: 409-429.
Wohlin, C., P. Runeson, et al. (2000). Experimentation in software engineering: an introduction, Kluwer
Academic Publisher.
Zelkowitz, M. V., D. R. Wallace, et al. (2003). "Experimental validation of new software technology."
Lecture Notes on Empirical Software Engineering 12: 229-263.
Zhang, H., M. A. Babar, et al. (2011). "Identifying relevant studies in software engineering." Information and
Software Technology 53(6): 625-637.