001 Zhang
Software Engineering
BACKGROUND: Systematic Literature Review (SLR) has become an important research methodology in
software engineering (SE) since 2004. One critical step in applying this methodology is to design and execute
an appropriate and effective search strategy. This is a time-consuming and error-prone step, which needs
to be carefully planned and implemented. There is an apparent need for a systematic approach to designing,
executing, and evaluating a suitable search strategy for optimally retrieving the target literature from digital
libraries.
OBJECTIVE: The main objective of the research reported in this paper is to improve the search step
of SLRs in SE by devising and evaluating systematic and practical approaches to identifying relevant
studies in SE.
OUTCOMES: We have systematically selected and analytically studied a large number of papers to
understand the state of the practice of search strategies in evidence-based software engineering (EBSE).
Having identified the limitations of the current ad-hoc nature of the search strategies used by SE
researchers for SLRs, we have devised a systematic approach to developing and executing optimal search
strategies in SLRs. The proposed approach incorporates the concept of a ‘quasi-gold standard’, which
consists of a collection of known studies, and the corresponding ‘quasi-sensitivity’ into the search process
for evaluating search performance. We report a case study and its findings to demonstrate that the approach
is able to improve the rigor of the search process in an SLR, and can serve as a supplement to the guidelines
for SLRs in EBSE. We plan to further evaluate the proposed approach using several case studies with
varying topics in software engineering.
Keywords: search strategy, quasi-gold standard, systematic literature review, evidence-based software engineering
no concrete strategy has been provided for search strategy evaluation.

Although neither the above papers nor the SLR guidelines include practical instructions about how to improve and evaluate the rigor and performance of a search strategy, some issues relating to literature search in SE have emerged and been reflected in SLR reports, such as

• How to design a rigorous search strategy that maximises the collection of relevant studies?
• What are the criteria of an affordable and reliable strategy that effectively balances search sensitivity (quality) and precision (effort)?
• Is it possible to evaluate a predefined search strategy and the corresponding search strings?

Moreover, the latest version of the guidelines, Kitchenham and Charters (2007), also encourages software engineering researchers to develop and publish such strategies, including the identification of relevant digital libraries. Hence, there is a need for validated search strategies for SLRs that optimise the retrieval of relevant studies from digital libraries and electronic databases for researchers and practitioners. This paper attempts to serve as a preliminary response to this need. We have devised a systematic and practical approach for search strategy development in order to improve the rigor of search processes in SLRs. This approach also strives to balance the retrieval of a validated set of relevant studies in SE against the effort consumed in this phase.

This paper is structured as follows. Section 2 introduces concepts related to search strategies for SLRs. In Section 3, we describe a systematic and practical approach for implementing a relatively rigorous literature search. This search approach is then demonstrated by a ‘replicated’ search (an observer-participant case study) and compared to its original SLR in Section 4. Finally, some discussion and our conclusion are presented in Sections 5 and 6.

2. SEARCH STRATEGY IN SYSTEMATIC LITERATURE REVIEWS

2.1. Defining Search Strategy

A necessary and crucial step of an SLR is the identification of as much literature relevant to the research questions as possible. Search strategy, which defines the methods to retrieve the relevant literature, has been developed in many ways, but the typical approach can be for information professionals (in subject matter) to use their combined knowledge of databases (digital libraries), search techniques, thesauri and the field of interest, to explore, often iteratively, combinations of terms which capture the concepts of interest White et al. (2001). An optimum search strategy is expected to provide effective solutions to a series of questions for the search process in an EBSE SLR:

1. Which approach is to be used in the search process (e.g. manual or automated search)?
2. Where (source or venue) to search, and which part of the article (field) should be searched?
3. What (subject, evidence type) is to be searched, and what are the inputs (search strings) to the search engines?
4. When is the search carried out, and what time span is to be searched?

Which approach(es)? The guidelines, Biolchini et al. (2005) and Kitchenham and Charters (2007), all emphasise literature search through the web search engines provided by digital libraries, i.e. automated search. However, in practice, many reported SLRs also employed manual search, alone or combined with automated search, in specific sources (e.g., Jorgensen and Shepperd (2007)).

In manual (hand) search, investigators scan the sources (e.g., journals or proceedings) paper by paper and issue by issue. This search method may ensure the capture of relevant studies in the specified sources, but at the same time consumes much effort in examining many irrelevant studies. Automated search instead uses search strings, which represent the identifiers of the subject, to retrieve results from search engines (digital libraries). Compared to manual search, this method is more efficient, but its performance depends on the quality of the search strings, the capability of the search engine, and the diversity of the subject.

Where to search? ‘Search source’ has been used as a general term for where relevant studies can be retrieved. In defining search strategies, we distinguish ‘search source’ from ‘search engine’. Automated search always retrieves results from a search engine; in contrast, ‘search source’ in this paper refers to the venues specified in citations (e.g., journals and proceedings), which are specified and scanned in manual search. As illustrated in Figure 1, generally speaking, there is a many-to-many relationship between them: one engine can cover multiple sources, while one source may also be retrievable from more than one engine.

What to search? Subject and article type, which are normally defined in the protocol, are two important filters for removing irrelevant and low-quality studies. For SLRs in SE, the most used subjects are ‘computer science’ and ‘software engineering’. Search strings, which are connected with logic operators, are the inputs to search engines in automated search. This paper
On Searching Relevant Studies in Software Engineering
Zhang • Ali Babar
2.3.1. Automated search vs. manual search
To investigate how search strategies are actually implemented in EBSE, we conducted a search of SLRs published in SE, which extends the SLR search reported in Kitchenham et al. (2009) with records updated to the end of 2008. This up-to-date search identified 38 SLRs. Of these, 68% (26 out of 38) reported using automated search; 39% (15 out of 38) used manual search; and 26% (10 out of 38) combined both. Several SLRs did not report the search method they used, or were conducted based on the studies identified by other SLRs, such as Hannay et al. (2007).

2.3.2. Search engines and search sources
Table 1-a summarizes the 11 engines (digital libraries) used more than once in SLRs for searching relevant studies in SE, ranked in order of frequency. Among them, IEEE Xplore and ACM Digital Library are the main search portals for most SLRs in SE. Table 1-b lists the top sources for manual search used twice or more in SLRs. The sources related to SE in general (e.g., IEEE Software, TSE, ICSE) and to empirical software engineering (e.g., ESEM, ISESE) were the most used in manual search in the previous SLRs.

2.4. Related Work in Software Engineering

Some previous researchers have discussed the issues related to literature search in software engineering. Brereton et al. (2007) identified several issues of electronic search derived from their experience in conducting SLRs. For instance, researchers must select and justify a search strategy that is appropriate for their research questions; primary studies could not be retrieved from a single source, etc.

Dieste and Padua (2007) investigated optimal search strategies using combinations of alternative search strings for automated search in SLRs. Nevertheless, the ‘gold standard’ used to calculate sensitivity was established from the studies already identified in another SLR by Sjoberg et al. (2005). In most SLRs, such a ‘gold standard’ is not available to researchers in the planning stage of their intended SLRs. In other words, a ‘gold standard’ of this kind does not help to evaluate the search strategy or to ensure the retrieval quality of relevant studies in SLRs.

So far, to the authors’ knowledge, neither a comprehensive definition and rigorous development method of search strategy nor a practical evaluation approach has been developed for retrieving relevant studies in SE.

3. QGS BASED SCIENTIFIC SEARCH APPROACH

Based on the concept of Quasi-Gold Standard (QGS), this section constructs a systematic, scientific, and also practical literature search approach for SE, which provides capability for search strategy development and evaluation.

Instead, we introduce the concept of ‘quasi-gold standard’, which is a set of known studies from key sources, e.g., domain-specific proceedings and journals recognized by the community in the subject, for a given time span. Note that compared to a gold standard, there are two more constraints associated with a ‘quasi-gold standard’: venues (where) and period (what time). In other words, a ‘quasi-gold standard’ can be regarded as a ‘gold standard’ in the conditions where these constraints apply. Accordingly, a more objective
Select library engines for automated search. The selection depends on the distribution of related sources across libraries, the coverage and overlap among them, and their accessibility to the searchers. Judging from most of the reported SLRs, however, IEEE Xplore and ACM Digital Library have become must-have literature portals and are recommended for consideration in any automated search of future SLRs in SE.

Given the many-to-many relationship between search sources and engines (Figure 1), an optimum combination of both should cover a maximum number of sources with a minimum set of search engines (libraries), in other words, eliminate as much overlap as possible.

3.2.2. Step 2: Establish QGS
The manual search is conducted by screening, one by one, all articles published in the selected sources (e.g., proceedings and journals) during a given period. The title-abstract-keywords fields of a paper are checked first. The inclusion and exclusion criteria should be explicitly defined in advance. As recommended in the guidelines Kitchenham and Charters (2007), the reliability of inclusion decisions should be assessed using the Kappa statistic between researchers, or reviewed by an external panel. If a selection decision cannot be made, other fields (such as the conclusion or even the full text) need to be examined further.

One important assumption underlying the manual search processes in the previous SLRs is that all relevant studies within the indicated sources can be identified by carefully screening all the articles. Hence, once the screening is completed and agreement on the selection is reached, all the identified studies are used to form the QGS.

As a quasi-gold standard is source- (engine) and period-specific, the sources selected in Step 1 can also be grouped by search engine. For a large scale SLR, in addition to an overall QGS, this step may produce more than one subset of the QGS, each of which corresponds to one dedicated search engine. These subsets enable testing a search string’s performance on an individual engine.

3.2.3. Step 3: Define or elicit search strings
Since search strings for automated search can be defined based on subjective expertise or elicited from the ‘quasi-gold standard’, the search process bifurcates at this step.

Subjective search string definition. Most previously reported SLRs in SE performed automated search in a subjective form. The reviewers defined their search strings based on their domain knowledge and past experience. Though the strings they choose can be evaluated later against the QGS, in the subjective approach they would ideally also be inspected by experts in the subject to reduce the number of possible iterations and save further effort.

As Figure 4 shows, there may be a backward link from the ‘decision’ to Step 3. In this case, the set of search terms has to be refined or enriched in order to capture more of the samples included in the QGS in the next round of automated search.

Objective search string elicitation. One of the uses of the QGS is to elicit recommended search strings using text mining. In the objective approach, a frequency analysis of the citation information of the studies in the QGS is undertaken, followed by a statistical analysis of the most frequently occurring words or phrases. This analysis determines which terms would best distinguish relevant studies from irrelevant ones.

Some textual analysis packages, such as SimStat and WordStat Provalia (2009), can facilitate the identification of frequently occurring terms in particular fields of the studies. For instance, the title-abstract-keywords of the papers in the QGS are imported into the analysis software for frequency analysis. This can produce a list of all words or phrases ranked by the number of records (cases) in which each appears. This technique is able to identify the candidate search terms, with the exception of some stop words (e.g., ‘the’ and ‘of’), which are deliberately excluded White et al. (2001).

Note that although statistical software for textual analysis can help the search string elicitation, especially for a large scale QGS, subjective judgement might also be needed to finally construct the string for automated search based on the frequency list generated through the computer-aided analysis.

3.2.4. Step 4: Conduct automated search
This step uses the strings for automated search, which are (subjectively) defined or (objectively) elicited. As the search syntax varies between search engines, the search strings need to be coded in advance by following the specific syntax and criteria of each search engine (library). Given the capability limitations of some search engines (for example, ACM, as reported in Dyba et al. (2007)), the automated search sometimes has to be implemented by splitting the combination of search terms into multiple simple ones. Note that, due to the overlap between engines (such as between IEEE and ACM), the duplicate studies retrieved from different search engines also need to be identified and removed in this step.

3.2.5. Step 5: Evaluate search performance
If the search strings for automated search are defined in the subjective approach, the search results need to be evaluated to ensure the quality of the automated search.

Calculate ‘quasi-sensitivity’. In EBSE, missing important studies from an SLR may lead to the generation
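As a rough illustration of the frequency analysis described under objective search string elicitation (Step 3), the following Python sketch ranks words by the number of QGS records in which they appear, after removing stop words. It is not a substitute for a textual analysis package; the stop-word list, the function name rank_candidate_terms and the sample records are assumptions made purely for illustration and are not taken from any QGS reported in this paper.

```python
import re
from collections import Counter

# Illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "of", "a", "an", "and", "in", "for", "to", "on", "with"}


def rank_candidate_terms(records, stop_words=STOP_WORDS):
    """Rank words by the number of records in which they appear."""
    record_counts = Counter()
    for text in records:
        # Use a set so that each word is counted at most once per record.
        words = set(re.findall(r"[a-z]+", text.lower()))
        record_counts.update(words - stop_words)
    # Descending record count, then alphabetical order for a stable ranking.
    return sorted(record_counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical title-abstract-keywords snippets (made up for this sketch).
qgs_records = [
    "A systematic review of software inspection techniques",
    "A survey of controlled studies in software maintenance",
    "Systematic literature review of the evidence on software testing",
]

ranking = rank_candidate_terms(qgs_records)
for term, count in ranking[:3]:
    print(f"{term}: appears in {count} record(s)")
```

The top of such a ranking only suggests candidate terms; as noted above, subjective judgement is still needed to combine them into a search string.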
The nominated sources were grouped into 5 libraries (Table 3), 4 of which were selected for the automated search, i.e. IEEE Xplore, ACM Digital Library, ScienceDirect and SpringerLink. Note that other libraries can be employed for automated search, but the QGS is only valid for evaluating the search through the four selected libraries.

4.2.2. QGS and automated search
In this case study, the searched articles should be ‘systematic reviews in software engineering’. Accordingly, we refined the inclusion and exclusion criteria reported in the original SLR Kitchenham et al. (2009). Two researchers independently screened all papers published in the sources from 2004 to 2008 in the manual search until they reached agreement on all included studies. In total, 21 studies were retrieved and 20 of them were used for building the QGS. Table 3 shows the source names and their numbers of relevant studies (by mid-2007 and by the end of 2008).

Table 3: Selected sources for manual search

Source             Library/publisher/engine   By mid-2007   By end of 2008
TSE                IEEE                       4             4
IEEE-SW            IEEE                       1             1
ESEM(’07,’08)      IEEE/ACM                   0             2
ISESE(’04-’06)     IEEE/ACM                   2             2
Metrics(’04,’05)   IEEE                       0             0
IST                Elsevier                   2             7
JSS                Elsevier                   2             2
EMSE               Springer                   0             2
EASE(’06-’08)      IEE/BCS                    0             1
Total                                         11            21

The case study implemented automated search by following the subjective definition approach, in which the search strings are nominated based on the authors’ knowledge relating to the subject of EBSE, and their observation of the studies included in the QGS. As we were looking for SLRs in SE, we intuitively initiated the automated search with the string (software AND systematic AND review) applied to the title-abstract-keywords fields through the above engines. The search strings were then coded to fit the syntax requirements and capability of each engine.

4.2.3. Evaluation and refinement
Table 4 summarizes the number of studies retrieved by each database with the initial and refined search strings. For example, the initial search through IEEE Xplore identified 12 studies, 5 of which are in the QGS. In total, 13 studies in the QGS were retrieved in the initial automated search. Relative to the size of the QGS, the ‘quasi-sensitivity’ was calculated to be 65% (13/20), which is unacceptable compared to the threshold (80%). As defined in Step 5, the search process had to go back to improve the string.

Table 4: Results from automated search

Search engine         #Results   #In quasi-gold   #Identified
Initial search
IEEE Xplore           146        5                12
ACM Digital Library   34         1                6
ScienceDirect         31         6                6
SpringerLink          42         1                6
Overall               253        13               30
Refined search
IEEE Xplore           270        8                15
ACM Digital Library   160        1                6
ScienceDirect         82         7                7
SpringerLink          145        1                6
Overall               657        17               34

By carefully checking the studies that are included in the QGS but were missed by the initial automated search, we found that most of them were published in the early years of the period (2004-2008), when the method ‘systematic review’ had just been introduced to SE. Their authors described the review studies using other terms (e.g., ‘survey’). So we refined the string as (software AND (systematic OR controlled OR structured OR exhaustive OR comparative) AND (review OR survey OR "literature search")), then performed the automated search again.

The revised automated search is able to capture 17 studies included in the quasi-gold standard, which increases the ‘quasi-sensitivity’ to 85% (i.e. acceptable). By combining the studies from manual search, the proposed search approach finally retrieves 38 SLRs for the tertiary study.

4.3. Performance Comparison

Although similar inclusion and exclusion criteria are employed in both the original and this replicated search, we exclude several ‘relevant’ studies that were selected in the original SLR during the manual search and selection, owing to differences in how strictly the inclusion/exclusion criteria were followed.

Because of this disagreement between the original and the replicated searches, and given the page limit, we do not directly compare the numbers of studies they identified. Instead, we focus on comparing the performance of the implementations of different search strategies. Table 5 shows the numbers of studies retrieved by following different strategies for the same research questions. The row headed ‘Manual only’ indicates how many studies can be identified by manually searching the sources given in Kitchenham et al. (2009) from 2004 to 2008. Two more SLRs could be found when screening their specified sources (which are more numerous than our sources in manual search). The ‘Automated only’ row shows the search performance of the search engines without refinement; the bottom row presents the results of the QGS based systematic search approach.

Table 5: Comparison among 3 strategies

Method                     SLRs identified   Quasi-sensitivity
Manual only                22                n/a
Automated only (initial)   30                65%
Systematic                 38                85%
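The quasi-sensitivity values reported in Section 4.2.3 can be reproduced with a few lines of Python. The sketch below assumes, consistently with the figures in the case study (13/20 and 17/20), that quasi-sensitivity is the share of QGS studies retrieved by the automated search, and that a value at or above the 80% threshold counts as acceptable. The function and variable names are illustrative only.

```python
def quasi_sensitivity(qgs_hits: int, qgs_size: int) -> float:
    """Share of quasi-gold standard studies retrieved by the automated search."""
    if qgs_size <= 0:
        raise ValueError("the QGS must contain at least one study")
    return qgs_hits / qgs_size


THRESHOLD = 0.80  # threshold quoted in the case study
QGS_SIZE = 20     # number of studies used to build the QGS

# QGS studies retrieved per engine ('#In quasi-gold' column of Table 4).
searches = {
    "initial": {"IEEE Xplore": 5, "ACM Digital Library": 1,
                "ScienceDirect": 6, "SpringerLink": 1},
    "refined": {"IEEE Xplore": 8, "ACM Digital Library": 1,
                "ScienceDirect": 7, "SpringerLink": 1},
}

outcome = {}
for name, hits_per_engine in searches.items():
    # Summing per-engine counts assumes that no QGS study is counted by two
    # engines, which matches the 'Overall' rows of Table 4 (13 and 17).
    value = quasi_sensitivity(sum(hits_per_engine.values()), QGS_SIZE)
    acceptable = value >= THRESHOLD  # '>=' is an assumption of this sketch
    outcome[name] = (value, acceptable)
    verdict = "acceptable" if acceptable else "refine the search string"
    print(f"{name} search: quasi-sensitivity {value:.0%} ({verdict})")
```

Used this way, the threshold check acts as the stopping rule for the iterative refinement: the search string is revised and the search repeated until the computed value reaches the threshold.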
5. DISCUSSION

The limitations of applying automated or manual search alone are illustrated in the case study. With manual search, it is difficult to scan a large number of sources within limited effort; on the other hand, the performance of automated search relies heavily on the quality of the search string, which may need continuous refinement in most cases. Although some previous SLRs employed both methods, most of them simply merged the search results. In contrast, the QGS based systematic search approach not only combines their results, but also establishes a linkage between them so that each supports the other with its own advantages. This approach also suggests a quantitative measurement for deciding when to stop the iterative refinement of automated search, and captures a considerable number of relevant studies with reasonable effort.

Some secondary studies related to a research topic (subject matter), which have already been screened and filtered by external researchers, could be introduced into the quasi-gold standard to further reduce the effort in manual search. For instance, some previous SLRs directly used the studies identified by Sjoberg et al. (2005) as their full set of primary studies. As another example, the results from the mapping study by Jorgensen and Shepperd (2007) can be used to build the QGS for more specific SLRs in software cost estimation. In such cases, the results may need to be tailored in terms of subject and time to conform to the new SLR.

As an alternative to search engine based search strategy, reference list based search strategy can be another option for retrieving relevant studies. This strategy has been extended with the concepts of co-citation and bibliographic coupling Skoglund and Runeson (2009). However, as most of the major digital libraries in SE are not designed to support this kind of search, manually retrieving studies from reference lists is very time-consuming. Thus this search approach is not yet practical enough in software engineering, but it is suggested as a supplementary source for a full SLR by Kitchenham and Charters (2007).

6. CONCLUSION

Systematic literature reviews have become an important empirical research methodology in software engineering, and more and more SLRs are being conducted and reported. In an SLR, an effective and rigorous literature search plays a critical role in evidence aggregation. In order to enhance the rigor and comprehensiveness of the methodology, with reference to the experience of SLRs in other disciplines (e.g., medicine and sociology), this paper proposes a systematic search approach based on the concept of quasi-gold standard for retrieving and identifying relevant studies in software engineering. The major contributions can be summarized as follows:

• Provide a clear scope of search strategy and its evaluation in searching relevant studies in SE.
• Introduce the concepts of ‘quasi-gold standard’ and ‘quasi-sensitivity’ for developing and evaluating the search strategy for a given SLR.
• Propose a systematic, scientific, and rigorous approach for practical search strategy development, implementation and evaluation.

Although the QGS based literature search approach is proposed for improving the search processes in SLRs and EBSE, it can be used in other literature reviews in SE, and can benefit researchers and practitioners who intend to retrieve a relatively comprehensive collection of relevant studies (for the subject and time given) within reasonable effort.

Currently this approach is being applied in some systematic reviews in SE. We will continue the evaluation and improvement of this approach by conducting more case studies (with the objective and subjective search string elicitation methods) on varying topics in software engineering. In addition, future methodological work in the ESE and EBSE community may include identifying other issues and limitations of the SLRs reported in software engineering, and suggesting practical improvements to the guidelines for systematic literature reviews.
REFERENCES

Brereton, Pearl and Kitchenham, Barbara A. and Budgen, David and Turner, Mark and Khalil, Mohamed (2007) Lessons from Applying the Systematic Literature Review Process within the Software Engineering Domain, Journal of Systems and Software, 80(1), 571-583.

Biolchini, Jorge and Mian, Paula Gomes and Natali, Ana Candida Cruz and Travassos, Guilherme Horta (2005) Systematic Review in Software Engineering, Universidade Federal do Rio de Janeiro.

Dyba, T. and Dingsoyr, T. and Hanssen, Geir K. (2007) Applying Systematic Reviews to Diverse Study Types: An Experience Report. In Proceedings of 1st International Symposium on Empirical Software Engineering and Measurement (ESEM’07), Madrid, Spain, September, pp. 225-234. IEEE Computer Society.

Dyba, T. and Kitchenham, Barbara and Jorgensen, M. (2005) Evidence-Based Software Engineering for Practitioners, IEEE Software, 22(1), 158-165.

Dieste, Oscar and Padua, Anna Griman (2007) Developing Search Strategies for Detecting Relevant Experiments for Systematic Reviews. In Proceedings of 1st International Symposium on Empirical Software Engineering and Measurement (ESEM’07), Madrid, Spain, September, pp. 215-224. IEEE Computer Society.

Dickersin, K. and Scherer, R. and Lefebvre, C. (1994) Systematic Reviews: Identifying Relevant Studies for Systematic Reviews, British Medical Journal, 309(6964), 1286-1291.

Hannay, Jo E. and Sjoberg, Dag I.K. and Dyba, Tore (2007) A Systematic Review of Theory Use in Software Engineering Experiments, IEEE Transactions on Software Engineering, 33(2), 87-107.

Jorgensen, Magne and Shepperd, Martin (2007) A Systematic Review of Software Development Cost Estimation Studies, IEEE Transactions on Software Engineering, 33(1), 33-53.

Kitchenham, Barbara and Brereton, O. Pearl and Budgen, David and Turner, Mark and Bailey, John and Linkman, Stephen (2009) Systematic Literature Reviews in Software Engineering: A Systematic Literature Review, Information and Software Technology, 51(1), 7-15.

Budgen, David and Kitchenham, Barbara A. and Charters, Stuart M. and Turner, Mark and Brereton, Pearl and Linkman, Stephen G. (2008) Presenting software engineering results using structured abstracts: A randomised experiment, Empirical Software Engineering, 13(4), 435-468.

Kitchenham, Barbara and Charters, Stuart (2007) Guidelines for Performing Systematic Literature Reviews in Software Engineering (version 2.3), Keele University and University of Durham.

Kitchenham, Barbara and Dyba, T. and Jorgensen, M. (2004) Evidence-Based Software Engineering. In Proceedings of 26th International Conference on Software Engineering (ICSE’04), Edinburgh, Scotland, May, pp. 273-284. IEEE Computer Society.

SimStat v.2.5 and WordStat v.5.1 (2009) Provalia Research, http://www.provalisresearch.com/.

Sjoberg, Dag I.K. and Hannay, Jo E. and Hansen, Ove and Kampenes, Vigdis By and Karahasanovic, Amela and Liborg, Nils-Kristian and Rekdal, Anette C. (2005) A Survey of Controlled Experiments in Software Engineering, IEEE Transactions on Software Engineering, 31(9), 733-753.

Skoglund, Mats and Runeson, Per (2009) Reference-based search strategies in systematic reviews. In Proceedings of 13th International Conference on Evaluation and Assessment in Software Engineering (EASE’09), Durham, England, April. BCS.

White, V.J. and Glanville, J.M. and Lefebvre, C. and Sheldon, T.A. (2001) A Statistical Approach to Designing Search Filters to Find Systematic Reviews: Objectivity Enhances Accuracy, Journal of Information Science, 27(6), 357-370.

Yin, Robert K. (2003) Case Study Research: Design and Methods (3rd edn). Sage Publications.