
On Searching Relevant Studies in Software Engineering

He Zhang
Lero Software Engineering Research Centre, UL, Ireland
National ICT Australia
[email protected]

Muhammad Ali Babar
IT University of Copenhagen, Denmark
[email protected]

BACKGROUND: Systematic Literature Review (SLR) has become an important research methodology in
software engineering since 2004. One critical step in applying this methodology is to design and execute
an appropriate and effective search strategy. This is a time-consuming and error-prone step, which needs
to be carefully planned and implemented. There is an apparent need for a systematic approach to designing,
executing, and evaluating a suitable search strategy for optimally retrieving the target literature from digital
libraries.

OBJECTIVE: The main objective of the research reported in this paper is to improve the search step
of doing SLRs in SE by devising and evaluating systematic and practical approaches to identifying relevant
studies in SE.

OUTCOMES: We have systematically selected and analytically studied a large number of papers to
understand the state-of-the-practice of search strategies in EBSE. Having identified the limitations of the
current ad hoc nature of search strategies used by SE researchers for SLRs, we have devised a systematic
approach to developing and executing optimal search strategies in SLRs. The proposed approach
incorporates the concept of ‘quasi-gold standard’, which consists of a collection of known studies, and the
corresponding ‘quasi-sensitivity’ into the search process for evaluating search performance. We report a
case study and its findings to demonstrate that the approach is able to improve the rigor of the search process in
an SLR, and can serve as a supplement to the guidelines for SLRs in EBSE. We plan to further evaluate
the proposed approach using several case studies with varying topics in software engineering.

Search strategy, quasi-gold standard, systematic literature review, evidence-based software engineering

© The Authors. Published by the British Informatics Society Ltd.

1. INTRODUCTION

Systematic reviews (also referred to as systematic literature reviews, SLRs) aim to identify, assess and combine the evidence from primary research studies using an explicit and rigorous method. This method has been widely implemented in some disciplines, such as medicine and sociology. Since the seminal paper on Evidence-Based Software Engineering (EBSE) was published in 2004 Kitchenham et al. (2004), systematic review has become an important methodology of EBSE, and many SLRs have been conducted and reported.

EBSE involves five distinct steps Dyba et al. (2005). The second step, ‘search the literature for the best available evidence to answer the question’, builds the basis for evidence aggregation, appraisal and further integration with decision-making practice. Kitchenham also states that the aim of an SLR is to find as many primary studies relating to the research questions as possible using an unbiased search strategy Kitchenham and Charters (2007). The rigor of the search process is one factor that distinguishes systematic reviews from traditional (ad hoc) literature reviews.

Similar to other disciplines, many researchers doing SLRs rely on searches of digital libraries for identification of relevant studies in software engineering (SE). However, these database searches have typically been designed using methods lacking in scientific rigor, instead often relying solely on the investigator’s past experience and knowledge of the subject matter Boynton et al. (1998). In practice, identifying primary studies can be difficult for several reasons, including an inadequate search strategy, heterogeneity of the language describing the subject matter, and a limited range of indexing terms describing study methodology Dickersin et al. (1994). Though Biolchini et al. suggest evaluating search engines to verify if they are capable of executing search strings during the planning phase Biolchini et al. (2005),
no concrete strategy has been provided for search strategy evaluation.

Despite the current state that neither the above EBSE papers nor the SLR guidelines include practical instructions about how to improve and evaluate the rigor and performance of a search strategy, some issues relating to literature search in SE have emerged and been reflected in SLR reports, such as

• How to design a rigorous search strategy that maximises the collection of relevant studies?
• What are the criteria of an affordable and reliable strategy to effectively balance the search sensitivity (quality) and precision (effort)?
• Is it possible to evaluate a predefined search strategy and corresponding search strings?

Moreover, the latest version of the guidelines Kitchenham and Charters (2007) also encourages software engineering researchers to develop and publish such strategies, including identification of relevant digital libraries. Hence, there is a need for validated search strategies for SLRs that optimise retrieval of relevant studies from digital libraries and electronic databases for researchers and practitioners. This paper attempts to serve as a preliminary response to this need. We have devised a systematic and practical approach for search strategy development in order to improve the rigor of search processes in SLRs. This approach also strives to balance the retrieval of a validated set of relevant studies in SE and the effort consumed in this phase.

This paper is structured as follows. Section 2 introduces concepts related to search strategies for SLRs. In Section 3, we describe a systematic and practical approach for implementing a relatively rigorous literature search. This search approach is then demonstrated by a ‘replicated’ search (an observer-participant case study) and compared to its original SLR in Section 4. Finally, some discussion and our conclusion are presented in Section 5.

2. SEARCH STRATEGY IN SYSTEMATIC LITERATURE REVIEWS

2.1. Defining Search Strategy

A necessary and crucial step of an SLR is the identification of as much literature relevant to the research questions as possible. Search strategy, which defines the methods to retrieve the relevant literature, has been developed in many ways, but the typical approach can be for information professionals (in subject matter) to use their combined knowledge of databases (digital libraries), search techniques, thesauri and the field of interest, to explore, often iteratively, combinations of terms which capture the concepts of interest White et al. (2001). An optimum search strategy is expected to provide effective solutions to a series of questions for the search process in an SLR:

1. Which approach is to be used in the search process (e.g. manual or automated search)?
2. Where (source or venue) to search, and which part of the article (field) should be searched?
3. What (subject, evidence type) is to be searched, and what are the inputs (search strings) to search engines?
4. When is the search carried out, and what time span is to be searched?

Which approach(es)? The guidelines, Biolchini et al. (2005), Kitchenham and Charters (2007), all emphasise the literature search through web search engines provided by digital libraries, i.e. automated search. However, in practice, many reported SLRs also employed manual search, alone or combined with automated search, in specific sources (e.g., Jorgensen and Shepperd (2007)).

In manual (hand) search, investigators scan the sources (e.g., journals or proceedings) paper by paper and issue by issue. This search method may ensure the capture of relevant studies in the specified sources, but at the same time consumes much effort in examining many irrelevant studies. In contrast, automated search uses search strings, which represent the identifiers of the subject, to retrieve results from search engines (digital libraries). Compared to manual search, this method is more efficient, but its performance depends on the quality of the search strings, the capability of the search engine, and the diversity of the subject.

Where to search? ‘Search source’ is used as a general term for where relevant studies can be retrieved. We distinguish ‘search source’ from ‘search engine’ in defining search strategies. Automated search always retrieves results from a search engine; in contrast, ‘search source’ in this paper denotes the sources specified in citations (e.g., journals and proceedings), which are specified and scanned in manual search. As illustrated in Figure 1, generally speaking, there is a many-to-many relationship between them: one engine can cover multiple sources, while one source may also be retrievable from more than one engine.

What to search? Subject and article type, which are normally defined in the protocol, are two important filters to remove irrelevant studies and low quality studies. For SLRs in SE, the most used subjects are ‘computer science’ and ‘software engineering’. Search strings, which are connected with logic operators, are inputs to search engines in automated search. This paper
proposes a systematic search approach that improves search string development and evaluation.

Figure 1: Search sources and engines

When and what time span to search? The time span of the studies in a search is determined by the purposes of an intended SLR and its focused research questions, for example, trend analysis for a given period, or synthesis of a collection of full evidence for answering a specified question. As it normally takes at least months from the initial search to the appearance of an SLR for public access, the search date(s) should be addressed in the report as well, i.e. when was the search conducted?

2.2. Evaluating Search Strategy

Subjective vs. objective evaluation. The performance of a search strategy can be evaluated by examining the answers to the above search design questions and the results retrieved from the search process in which the strategy applies. Roughly speaking, the evaluation is implemented in subjective and/or objective forms.

In subjective evaluation, some external experts review the predefined search strategy as a part of an SLR protocol before the stage of conducting the review. After the automated search, some pre-indicated studies (based on the experts’ awareness of domain knowledge) are compared to the search results. However, the reliability of subjective evaluation relies heavily on the experts’ personal knowledge of the specific domain, which is difficult to quantify. In contrast to the subjective approach, objective evaluation employs a set of quantitative criteria to assess the performance of a search strategy.

Sensitivity vs. precision. Two important criteria borrowed from medicine can be used for evaluating the quality and efficiency of a search strategy. Sensitivity for a given topic is defined as the proportion of relevant studies retrieved for that topic and precision is the proportion of retrieved articles that are relevant studies. Figure 2 shows different search strategies within the search universe and their relation with the gold standard.

Figure 2: Search sensitivity, precision, and gold standard

In automated search, given search strings, the selected search engine (library) retrieves a certain amount of results (studies). Then the sensitivity and precision corresponding to the search strings and engine can be calculated as:

\[ \text{Sensitivity} = \frac{\text{Number of relevant studies retrieved}}{\text{Total number of relevant studies}} \times 100\% \qquad (1) \]

\[ \text{Precision} = \frac{\text{Number of relevant studies retrieved}}{\text{Number of articles retrieved}} \times 100\% \qquad (2) \]

Gold standard. The ‘gold standard’ represents, as accurately as possible, the known set of identified primary studies in a collection according to the definition of research questions in an SLR. The gold standard normally plays two distinct roles in the evaluation framework. For SLRs, it is assumed to be truth in appraising the sensitivity of a search strategy; it is also a source of training samples for refining search strings White et al. (2001). In practice, it may be appropriate to bifurcate the gold standard for these two purposes.

A highly sensitive search strategy will retrieve most of the studies in the gold standard, but may also retrieve many unwanted articles (Figure 2). A highly precise search strategy will retrieve only a small portion of irrelevant articles, but may miss a large number of papers in the gold standard. A perfect search strategy would be 100% sensitive as well as 100% precise, capturing exactly the gold standard without any irrelevant articles.

The gold standard has been used for improving literature search in systematic reviews in other disciplines, such as medical and clinical research and social science Dickersin et al. (1994) and White et al. (2001). Nevertheless, as the retrieval of a real gold standard is impossible for most systematic reviews, this paper instead introduces the concept of ‘quasi-gold standard’, which is a set of known studies from literature sources related to the research topic.

2.3. State of the Practice

Since the introduction of EBSE and SLR, the number of SLRs in SE has been growing rapidly. This subsection briefly summarizes the state-of-the-practice of search strategies in EBSE from the above aspects.
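
Equations (1) and (2) from Section 2.2 can be made concrete with a short sketch. The Python below is only an illustration under stated assumptions: studies are represented as sets of identifiers, the function names `sensitivity` and `precision` are ours, and the study identifiers are hypothetical rather than taken from any SLR discussed in this paper.

```python
def sensitivity(retrieved: set[str], relevant: set[str]) -> float:
    """Equation (1): percentage of all relevant studies that the search retrieved."""
    if not relevant:
        raise ValueError("the set of relevant studies must not be empty")
    return 100.0 * len(retrieved & relevant) / len(relevant)


def precision(retrieved: set[str], relevant: set[str]) -> float:
    """Equation (2): percentage of retrieved articles that are relevant studies."""
    if not retrieved:
        raise ValueError("the set of retrieved articles must not be empty")
    return 100.0 * len(retrieved & relevant) / len(retrieved)


# Hypothetical identifiers, for illustration only (assumption, not data from the paper).
gold_standard = {"s1", "s2", "s3", "s4"}
search_results = {"s1", "s2", "s3", "x1", "x2", "x3", "x4", "x5", "x6", "x7"}

print(sensitivity(search_results, gold_standard))  # 75.0
print(precision(search_results, gold_standard))    # 30.0
```

In this made-up example the search finds three of the four relevant studies (75% sensitivity) but seven of its ten hits are irrelevant (30% precision), which is the trade-off that Figure 2 depicts.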


2.3.1. Automated search vs. manual search

To investigate the realistic implementation of search strategies in EBSE, we conducted a search of SLRs published in SE, which extends the SLR search reported in Kitchenham et al. (2009) with updated records to the end of 2008. This up-to-date SLR search identified 38 SLRs. Of these, 68% (26 out of 38) reported using automated searches; 39% (15 out of 38) used manual search; and 26% (10 out of 38) combined both. Several SLRs did not report the search method they used, or were conducted based on the studies identified by other SLRs, such as Hannay et al. (2007).

2.3.2. Search engines and search sources

Table 1-a summarizes 11 engines (digital libraries) used more than once in SLRs for searching relevant studies in SE, ranked in order of their frequencies. Among them, IEEE Xplore and ACM Digital Library are the main search portals for most SLRs in SE. Table 1-b lists the top sources for manual search used twice or more in SLRs. The sources related to SE in general (e.g., IEEE Software, TSE, ICSE) and empirical software engineering (e.g., ESEM, ISESE) were most used in manual search in the previous SLRs.

Table 1: Search engines and sources

Rank  Search engine        # of SLRs  % of SLRs
1     IEEE Xplore          24         92%
2     ACM digital library  21         81%
3     ScienceDirect        15         58%
4     ISI Web of Science   10         38%
5     EI Compendex         9          35%
6     SpringerLink         8          31%
6     Wiley InterScience   8          31%
6     Inspec               8          31%
9     Google Scholar       6          23%
10    SCOPUS               2          8%
10    Kluwer               2          8%
(a) search engines used more than once

Rank  Search source        # of SLRs  % of SLRs
1     IEEE Software        4          27%
1     ESEM                 4          27%
1     ISESE                4          27%
4     TSE                  3          20%
4     ICSE                 3          20%
4     JSS                  3          20%
4     IEEE Computer        3          20%
8     Metrics              2          13%
8     TOSEM                2          13%
8     ESE                  2          13%
8     WWW                  2          13%
8     ICSM                 2          13%
8     MISQ                 2          13%
(b) search sources used more than once

2.4. Related Work in Software Engineering

Some previous researchers have discussed the issues related to literature search in software engineering. Brereton et al. (2007) identified several issues of electronic search derived from their experience in conducting SLRs. For instance, researchers must select and justify a search strategy that is appropriate for their research questions; primary studies could not be retrieved from a single source, etc.

Dieste and Padua (2007) investigated the optimal search strategies using the combination of alternative search strings for automated search in SLR. Nevertheless, the ‘gold standard’ used to calculate sensitivity was established from the studies already identified in another SLR by Sjoberg et al. (2005). In most cases of SLR, such a ‘gold standard’ cannot be accessed by researchers in the planning stage of their intended SLRs. In other words, a ‘gold standard’ in this case provides no help in evaluating the search strategy or in ensuring the retrieval quality of relevant studies in SLRs.

So far, to the authors’ knowledge, neither a comprehensive definition and rigorous development method of search strategy nor a practical evaluation approach has been developed for retrieving relevant studies in SE.

3. QGS BASED SCIENTIFIC SEARCH APPROACH

Based on the concept of Quasi-Gold Standard (QGS), this section constructs a systematic, scientific, and also practical literature search approach for SE, which provides capability for search strategy development and evaluation.

3.1. Mechanism and Overview

To avoid the possible limitations of applying a single search method (automated or manual) in an SLR and to provide a practical and relatively rigorous method for search string evaluation, we propose a systematic literature search approach, as a complement to SLR guidelines, in support of retrieval of relevant studies. It recommends that an optimum search strategy should be an effective integration of manual and automated searches, which support each other.

3.1.1. QGS: quasi-gold standard

From our observation (which is confirmed by the results from the case study), most reported SLRs in SE developed their search strategies subjectively. Even for the well-conducted SLRs, search strategies were developed by teams with expertise and tested on collections of ‘well-known’ samples to assess the search performance. Unfortunately, such preset ‘well-known’ samples cannot replace the gold standard for evaluation, as a full set of primary studies cannot be accessed prior to the execution of an SLR.

Instead, we introduce the concept of ‘quasi-gold standard’, which is a set of known studies from key sources, e.g., domain-specific proceedings and journals recognized by the community in the subject, for a given time span. Note that compared to a gold standard, there are two more constraints associated with a ‘quasi-gold standard’: venues (where) and period (what time). In other words, a ‘quasi-gold standard’ can be regarded as a ‘gold standard’ in the conditions where these constraints apply. Accordingly, a more objective
method for devising and testing search strategies is developed and integrated into a systematic search process, which may rely on an analysis of information from the available records (QGS) rather than subjective input from searchers’ perceptions (as in some SLRs). On the other hand, for the subjective approach of search string design, QGS can also be used for evaluating the search strategy (see Section 4).

Figure 3 shows the mechanism underpinning the proposed search approach. The results (studies) from manual search are used for establishing a QGS, which can further elicit the search strings for automated search, or later evaluate the search strategy. In the opposite direction, automated search complements manual search, expanding the coverage and capturing most relevant studies in a relatively rigorous form.

Figure 3: Mechanism underpinning the approach

3.1.2. Approach overview

Figure 4 presents an overview of the proposed search approach, which starts with identifying sources for manual search and engines (libraries and databases) for automated search. The QGS is established by performing manual search in the selected sources, and the identified studies are then grouped by the libraries and databases in which they reside.

Figure 4: Proposed scientific search process

The design of search strings can take a subjective or objective form. In the subjective approach, the search strings are proposed by researchers according to their knowledge of the subject (as in many previous SLRs), then tested against the ‘quasi-gold standard’. The objective method elicits search strings automatically from articles in the QGS through word frequency or content analysis tools. These search strings are inputs to automated search, and the results will be combined with the QGS once they are assessed as ‘acceptable’ in evaluation.

3.2. The Search Process

3.2.1. Step 1: Identify related sources and engines

The literature search process starts with the identification of the sources (venues) of relevant publications. In SE, many digital libraries are available for automated search, and even more sources for manual search.

Select sources for manual search. Research questions for an SLR are motivated by the research in a particular subject matter (domain) in SE. For an experienced and knowledgeable researcher working in this area, the related domain-specific sources can be identified without much difficulty. These sources consist of a collection of proceedings of the conferences specialized in that domain and major journals where the community often publishes its research.

As manual search is time-consuming, a large number of selected sources may slow the overall progress of the SLR. In order to improve the efficiency of manual search, as well as to secure the quality of the QGS, the nominated sources for manual search also need to be evaluated by independent experts in this domain, and any emerging disagreements must be resolved before the next step.


Select library engines for automated search. The selection depends on the distribution of related sources across libraries, the coverage and overlap among them, and their accessibility to the searchers. Judging from the reported SLRs, however, IEEE Xplore and ACM Digital Library are must-have literature portals that are recommended for consideration in any automated search for future SLRs in SE.

Given the many-to-many relationship between search sources and engines (Figure 1), an optimum combination of both should cover a maximum number of sources with a minimum set of search engines (libraries), in other words, eliminate as much overlap as possible.

3.2.2. Step 2: Establish QGS

The manual search is conducted by screening all articles, one by one, published in the selected sources (e.g., proceedings and journals) during a given period. The title-abstract-keywords fields of a paper are checked first. The inclusion and exclusion criteria should be explicitly defined in advance. As recommended in the guidelines Kitchenham and Charters (2007), the reliability of the inclusion decision should be assessed using the Kappa statistic between researchers, or reviewed by an external panel. If a selection decision cannot be made, the other fields (like the conclusion or even the full text) need to be further examined.

One important assumption underlying the manual search processes in the previous SLRs is that all relevant studies within the indicated sources could be identified by carefully screening all the articles. Hence, once the screening is completed and agreement on the selection is reached, all these identified studies are used to form the QGS.

As the quasi-gold standard is source- (engine) and period-specific, the sources selected in Step 1 can also be grouped by search engines. For a large scale SLR, in addition to an overall QGS, this step may produce more than one subset of the QGS, each of which corresponds to one dedicated search engine. They enable testing a search string’s performance for an individual engine.

3.2.3. Step 3: Define or elicit search strings

Since search strings for automated search can be defined based on subjective expertise or elicited from the ‘quasi-gold standard’, the search process bifurcates at this step.

Subjective search string definition. Most previously reported SLRs in SE performed automated search in a subjective form. The reviewers defined their search strings based on their domain knowledge and past experiences. Though the strings they choose can be evaluated later against the QGS, in the subjective approach they should first be inspected by experts in the subject to reduce the number of possible iterations and save effort.

Figure 4 shows that there might be a backward link from the ‘decision’ to Step 3. In this case, the set of search terms has to be refined or enriched in order to capture more samples included in the QGS through the next round of automated search.

Objective search string elicitation. One of the uses of the QGS is to elicit the recommended search strings using text mining. In the objective approach, a frequency analysis of the citation information of the studies in the QGS is undertaken, followed by a statistical analysis of the most frequently occurring words or phrases. This analysis determines which terms would best distinguish relevant studies from irrelevant ones.

Some textual analysis packages, such as SimStat and WordStat Provalia (2009), are able to facilitate the identification of the frequently occurring terms in particular items of studies. For instance, the title-abstract-keywords of the papers in the QGS are imported into the analysis software for frequency analysis. This may produce all the words or phrases ranked according to the number of records in which each word appears. This technique is able to identify the candidate search terms, with the exception of some stop words which are deliberately excluded White et al. (2001) (e.g., ‘the’ and ‘of’).

Note that although statistical software for textual analysis can help the search string elicitation, especially for a large scale QGS, subjective judgement might also be needed to finally construct the string for automated search based on the frequency list generated through the computer-aided analysis.

3.2.4. Step 4: Conduct automated search

This step uses the strings for automated search, which are (subjectively) defined or (objectively) elicited. As the search syntax varies between search engines, the search strings need to be coded correspondingly in advance by following the specific syntax and criteria of each search engine (library). Given the capability limitations of some search engines (for example, ACM Dyba et al. (2007)), the automated search sometimes has to be implemented by splitting the combination of search terms into multiple simple ones. Note that due to the overlap (such as between IEEE and ACM), the duplicate studies retrieved from different search engines also need to be identified and removed in this step.

3.2.5. Step 5: Evaluate search performance

If the search strings for automated search are defined in the subjective approach, the search results need to be evaluated to secure the quality of the automated search.

Calculate ‘quasi-sensitivity’. In EBSE, missing important studies from an SLR may lead to the generation
of inaccurate evidence. Accordingly, compared to precision, sensitivity becomes the top criterion considered when evaluating the search performance in most SLRs. Unfortunately, as the gold standard for the subject is unknown, the corresponding sensitivity cannot be calculated (Equation 1) at this stage. Our search approach therefore uses the quasi-gold standard (from the manually selected sources) to measure sensitivity instead of the search universe (Figure 2).

Researchers count the number of relevant studies retrieved from the selected sources (Step 1) through automated search (Step 4). Obviously, this number must not be greater than the number of studies identified in Step 2. Dividing it by the pool size of the QGS gives the corresponding ‘quasi-sensitivity’.

Evaluate performance. The quasi-sensitivity could be 100% or less. It needs to be compared against a rational threshold to finally determine if the performance of the automated search is acceptable. Although sensitivity and precision are the important criteria for evaluating search strategies and a tradeoff is always being pursued between them, a high sensitivity is usually more desired than a high precision in terms of the goals of SLRs.

Table 2 displays the search strategy scales used for evaluating search terms in Dieste and Padua (2007), which were inferred from the sensitivity and precision ranges of SLRs in medicine. Based on the scales, we suggest a threshold between 75% and 85% as a reference for sensitivity evaluation of search performance.

Table 2: Search strategy scales

Strategy        Sensitivity  Precision  Comments
High recall     85-90%       7-15%      max sensitivity despite low precision
High precision  40-58%       25-60%     max precision rate despite low recall
Optimum         80-99%       20-25%     maximize both sensitivity & precision
Acceptable      72-80%       15-25%     fair sensitivity & precision

For example, if we choose 80% as the threshold for search string evaluation, then

\[ \text{quasi-sensitivity} \begin{cases} \geq 80\%, & \text{then move forward} \ldots \\ < 80\%, & \text{then go back to Step 3} \end{cases} \qquad (3) \]

If the search performance is considered acceptable (quasi-sensitivity ≥ 80%), the results from the automated search can be merged with the ‘quasi-gold standard’, and the search process terminates. Otherwise, the process has to go back to Step 3 for search string refinement, which may form an iterative improvement of search strings until the performance becomes acceptable.

4. CASE STUDY

This section investigates the proposed search approach using a participant-observer case study (defined by Yin (2003)), in which the literature search of a published SLR is performed again and compared with the original.

4.1. The Original SLR

In order to avoid any subjective bias during the search and screening process, the original SLR should be carefully selected as the reference. The following criteria were applied:

1. Relevant studies can be identified with the minimum possible ambiguity. That minimizes the subjective bias due to knowledge differences between the researchers in the original and the replicated searches.
2. The articles in the original SLR must be explicitly constrained to a definite time frame. SLRs with a search end date open ‘to present’ are excluded here.
3. The publication that reports the SLR must include the list of identified studies, which enables a detailed comparison with the results from the replicated search.

Based on the above criteria, the SLR by Kitchenham et al. (2009), which summarizes and reports the impact of SLRs in software engineering, is selected as the reference in the case study. This SLR performed a manual search in 13 sources with an explicit time span from Jan 2004 through the middle of 2007. As an SLR is a type of secondary study, their work can be regarded as a tertiary study. It retrieved 34 relevant studies, among which 20 SLRs were identified as secondary studies.

4.2. Search Implementation

4.2.1. Identification of search sources and engines

At the manual search stage, we chose the sources (journals and proceedings) related to empirical software engineering (ESE) and EBSE. By carefully considering the sources available in the SE community, 9 of them were selected by the authors for this study (Table 3). Note that the selected sources for manual search in this paper differ somewhat from those of the original SLR for two reasons: (1) though the replicated search strategy is designed for the same research questions, the authors may have a slightly different view of the ‘related’ sources from the original researchers; (2) the purpose of the manual search in this case study is to establish the quasi-gold standards, rather than to strive to capture as many relevant results as possible. Therefore, some originally used sources were ignored at the manual search stage, and two additional sources, EASE and ESEM, were added to the list because of their tight linkage to EBSE.

The nominated sources were grouped into 5 libraries (Table 3), 4 of which were selected for the automated search, i.e. IEEE Xplore, ACM Digital Library, ScienceDirect and SpringerLink. Note that other libraries can be employed for automated search, but the QGS is only valid for evaluating the search through the selected ones.

4.2.2. QGS and automated search

In this case study, the searched articles should be ‘systematic reviews in software engineering’. Accordingly, we refined the inclusion and exclusion criteria reported in the original SLR by Kitchenham et al. (2009). Two researchers independently screened all papers published in the sources from 2004 to 2008 in the manual search until they reached joint agreement on all included studies. In total, 21 studies were retrieved and 20 of them were used for building the QGS. Table 3 shows the source names and their numbers of relevant studies (by mid-2007 and by the end of 2008).

Table 3: Selected sources for manual search

Source              Library/publisher/engine   By mid-2007   By end of 2008
TSE                 IEEE                       4             4
IEEE-SW             IEEE                       1             1
ESEM (’07,’08)      IEEE/ACM                   0             2
ISESE (’04-’06)     IEEE/ACM                   2             2
Metrics (’04,’05)   IEEE                       0             0
IST                 Elsevier                   2             7
JSS                 Elsevier                   2             2
EMSE                Springer                   0             2
EASE (’06-’08)      IEE/BCS                    0             1
Total                                          11            21

The case study implemented the automated search by following the subjective definition approach, in which the search strings are nominated based on the authors’ knowledge of the subject of EBSE and their observation of the studies included in the QGS. As we were looking for SLRs in SE, we intuitively initiated the automated search with the string (software AND systematic AND review) applied to the title-abstract-keywords fields through the above engines. The search strings were then coded to fit the syntax requirements and capability of each engine.

4.2.3. Evaluation and refinement

Table 4 summarizes the number of studies retrieved by each database with the initial and refined search strings. For example, 12 relevant studies were identified among the 146 results returned by IEEE Xplore, 5 of which are in the QGS. In total, 13 studies in the QGS were retrieved in the initial automated search. Given the sample size of the QGS, the ‘quasi-sensitivity’ was calculated to be 65% (13/20), which is unacceptable compared to the threshold (80%). As defined in Step 5, the search process had to go back to improve the string.

Table 4: Results from automated search

Search engine         #Results   #In quasi-gold   #Identified
Initial search
IEEE Xplore           146        5                12
ACM Digital Library   34         1                6
ScienceDirect         31         6                6
SpringerLink          42         1                6
Overall               253        13               30
Refined search
IEEE Xplore           270        8                15
ACM Digital Library   160        1                6
ScienceDirect         82         7                7
SpringerLink          145        1                6
Overall               657        17               34

By carefully checking the studies included in the QGS but missed in the initial automated search, we found that most of them were published in the early years of the period (2004-2008), when the ‘systematic review’ method was just being introduced to SE. Their authors described the review studies using other terms (e.g., ‘survey’). So we refined the string as (software AND (systematic OR controlled OR structured OR exhaustive OR comparative) AND (review OR survey OR "literature search")), then performed the automated search again.

The revised automated search is able to capture 17 studies included in the quasi-gold standard, which increases the ‘quasi-sensitivity’ to 85% (i.e. acceptable). By combining the studies from the manual search, the proposed search approach finally retrieves 38 SLRs for the tertiary study.

4.3. Performance Comparison

Although similar inclusion and exclusion criteria are employed in both the original and this replicated search, we excluded several ‘relevant’ studies that were selected in the original SLR during the manual search and selection, due to the deviation caused by how strictly the inclusion/exclusion criteria were followed.

Because of the disagreement between the original and the replicated searches, we cannot directly compare the numbers of identified studies from them given the page limit. Instead, we focus on comparing the performance of the implementations of different search strategies. Table 5 shows the numbers of studies retrieved by following different strategies for the same research questions. The row headed ‘Manual only’ indicates how many studies can be identified by manually searching the sources given in Kitchenham et al. (2009) from 2004 to 2008. Two more SLRs could be found when screening their specified sources (which are more numerous than our sources in the manual search). The ‘Automated only’ row shows the search performance of the search engines without refinement; the bottom row presents the results of the QGS-based systematic search approach.

Table 5: Comparison among 3 strategies

Method                     SLRs identified   Quasi-sensitivity
Manual only                22                n/a
Automated only (initial)   30                65%
Systematic                 38                85%
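
As an illustration of the evaluation step applied in this case study, the following minimal Python sketch recomputes the quasi-sensitivity values of Table 5 from the QGS counts in Table 4 and applies the 80% threshold. The names `SearchRun`, `quasi_sensitivity` and `is_acceptable` are hypothetical and are not part of the original study; quasi-sensitivity is taken, as in Section 4.2.3, to be the number of QGS studies retrieved by the automated search divided by the size of the QGS (20).

```python
from dataclasses import dataclass

QGS_SIZE = 20      # studies used to build the quasi-gold standard (Section 4.2.2)
THRESHOLD = 0.80   # example threshold chosen in the case study


@dataclass(frozen=True)
class SearchRun:
    """One automated search round (hypothetical container, not from the paper)."""
    label: str
    in_qgs: int    # QGS studies retrieved, taken from the "Overall" rows of Table 4


def quasi_sensitivity(run: SearchRun, qgs_size: int = QGS_SIZE) -> float:
    """Share of the quasi-gold standard captured by the automated search."""
    return run.in_qgs / qgs_size


def is_acceptable(run: SearchRun, threshold: float = THRESHOLD) -> bool:
    """True if the search can stop; False means the string needs refinement."""
    return quasi_sensitivity(run) >= threshold


initial = SearchRun("initial", in_qgs=13)
refined = SearchRun("refined", in_qgs=17)

for run in (initial, refined):
    verdict = "accept" if is_acceptable(run) else "refine search string"
    print(f"{run.label}: {quasi_sensitivity(run):.0%} -> {verdict}")
```

Under these assumptions the initial string falls below the threshold and triggers a refinement round, while the refined string passes it and ends the iteration.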

5. DISCUSSION

The limitations of applying automated or manual search alone are illustrated in the case study. With manual search it is difficult to scan a large number of sources within a limited effort; on the other hand, the performance of automated search relies heavily on the quality of the search string, which may need continuous refinement in most cases. Although some previous SLRs employed both methods, most of them simply merged the search results. In contrast, the QGS-based systematic search approach not only combines their results, but also establishes a linkage between them so that each supports the other with its own advantages. This approach also suggests a quantitative measure for deciding when the iterative refinement of the automated search can stop, and captures a considerable number of identified studies with reasonable effort.

Some secondary studies related to a research topic (subject matter), which have already been screened and filtered by external researchers, could be introduced into the quasi-gold standard to further reduce the effort in manual search. For instance, some previous SLRs directly used the studies identified by Sjoberg et al. (2005) as their full set of primary studies. As another example, the results from the mapping study by Jorgensen and Shepperd (2007) can be used to build the QGS for more specific SLRs in software cost estimation. In such cases, the results may need to be tailored in terms of subject and time to conform to the new SLR.

As an alternative to the search engine based search strategy, a reference list based search strategy can be another option for retrieving relevant studies. This strategy was innovated with the concepts of co-citation and bibliographic coupling (Skoglund and Runeson, 2009). However, as most of the major digital libraries in SE are not designed to support this kind of search, manually retrieving studies from reference lists is very time-consuming. Thus this search approach is not yet practical enough in software engineering, but it is suggested as a supplementary source for a full SLR by Kitchenham and Charters (2007).

As ‘sensitivity’ is the top priority in defining search strategies in most SLRs, the other criterion, ‘precision’, is less discussed here due to the page limit. It is, however, important for measuring the productivity of the search process.

As automated search mostly consults the title-abstract-keywords fields, the search performance is also related to the quality and structure of these fields. An indicative title/abstract will increase search sensitivity. Budgen et al. (2008) investigated by experiment the possible influence of abstract quality on SLRs, and suggested structured abstracts for improving understanding and study identification, which may further improve search accuracy.

6. CONCLUSION

Systematic literature reviews have become an important empirical research methodology in software engineering, and more and more SLRs are being conducted and reported. In an SLR, an effective and rigorous literature search plays a critical role in evidence aggregation. In order to enhance the rigor and comprehension of the methodology, with reference to the experience of SLRs in other disciplines (e.g., medicine and sociology), this paper proposes a systematic search approach based on the concept of the quasi-gold standard for retrieving and identifying relevant studies in software engineering. The major contributions can be summarized as follows:

• Provide a clear scope of search strategy and its evaluation in searching relevant studies in SE.

• Introduce the concepts of ‘quasi-gold standard’ and ‘quasi-sensitivity’ for developing and evaluating the search strategy for a given SLR.

• Propose a systematic, scientific, and rigorous approach for practical search strategy development, implementation and evaluation.

Although the QGS-based literature search approach is proposed for improving the search processes in SLRs and EBSE, it can be used in other literature reviews in SE, and can benefit researchers and practitioners who intend to retrieve a relatively comprehensive collection of relevant studies (for the subject and time given) within reasonable effort.

Currently this approach is being effectively applied in some systematic reviews in SE. We will continue the evaluation and improvement of this approach by conducting more case studies (with the objective and subjective search string elicitation methods) on varying topics in software engineering. In addition, future methodological work in the ESE and EBSE community may include identifying other issues and limitations of the SLRs reported in software engineering, and further suggesting practical improvements to the guidelines for systematic literature reviews.

ACKNOWLEDGEMENTS

This work was supported, in part, by Science Foundation Ireland grant 03/CE2/I303_1 to Lero - the Irish Software Engineering Research Centre (www.lero.ie).

7. REFERENCES

Boynton, J. and Glanville, J. and McDaid, D. and Lefebvre, C. (1998) Identifying Systematic Reviews in MEDLINE: Developing An Objective Approach to Search Strategy Design, Journal of Information Science, 24(3), 137-154.

Brereton, Pearl and Kitchenham, Barbara A. and Budgen, David and Turner, Mark and Khalil, Mohamed (2007) Lessons from Applying the Systematic Literature Review Process within the Software Engineering Domain, Journal of Systems and Software, 80(1), 571-583.

Biolchini, Jorge and Mian, Paula Gomes and Natali, Ana Candida Cruz and Travassos, Guilherme Horta (2005) Systematic Review in Software Engineering, Universidade Federal do Rio de Janeiro.

Dyba, T. and Dingsoyr, T. and Hanssen, Geir K. (2007) Applying Systematic Reviews to Diverse Study Types: An Experience Report. In Proceedings of 1st International Symposium on Empirical Software Engineering and Measurement (ESEM’07), Madrid, Spain, September, pp. 225-234. IEEE Computer Society.

Dyba, T. and Kitchenham, Barbara and Jorgensen, M. (2005) Evidence-Based Software Engineering for Practitioners, IEEE Software, 22(1), 158-165.

Dieste, Oscar and Padua, Anna Griman (2007) Developing Search Strategies for Detecting Relevant Experiments for Systematic Reviews. In Proceedings of 1st International Symposium on Empirical Software Engineering and Measurement (ESEM’07), Madrid, Spain, September, pp. 215-224. IEEE Computer Society.

Dickersin, K. and Scherer, R. and Lefebvre, C. (1994) Systematic Reviews: Identifying Relevant Studies for Systematic Reviews, British Medical Journal, 309(6964), 1286-1291.

Hannay, Jo E. and Sjoberg, Dag I.K. and Dyba, Tore (2007) A Systematic Review of Theory Use in Software Engineering Experiments, IEEE Transactions on Software Engineering, 33(2), 87-107.

Jorgensen, Magne and Shepperd, Martin (2007) A Systematic Review of Software Development Cost Estimation Studies, IEEE Transactions on Software Engineering, 33(1), 33-53.

Kitchenham, Barbara and Brereton, O. Pearl and Budgen, David and Turner, Mark and Bailey, John and Linkman, Stephen (2009) Systematic Literature Reviews in Software Engineering: A Systematic Literature Review, Information and Software Technology, 51(1), 7-15.

Budgen, David and Kitchenham, Barbara A. and Charters, Stuart M. and Turner, Mark and Brereton, Pearl and Linkman, Stephen G. (2008) Presenting software engineering results using structured abstracts: A randomised experiment, Empirical Software Engineering, 13(4), 435-468.

Kitchenham, Barbara and Charters, Stuart (2007) Guidelines for Performing Systematic Literature Reviews in Software Engineering (version 2.3), Keele University and University of Durham.

Kitchenham, Barbara and Dyba, T. and Jorgensen, M. (2004) Evidence-Based Software Engineering. Proceedings of 26th International Conference on Software Engineering (ICSE’04), Edinburgh, Scotland, May, pp. 273-284. IEEE Computer Society.

SimStat v.2.5 and WordStat v.5.1 (2009) Provalis Research, http://www.provalisresearch.com/.

Sjoberg, Dag I.K. and Hannay, Jo E. and Hansen, Ove and Kampenes, Vigdis By and Karahasanovic, Amela and Liborg, Nils-Kristian and Rekdal, Anette C. (2005) A Survey of Controlled Experiments in Software Engineering, IEEE Transactions on Software Engineering, 31(9), 733-753.

Skoglund, Mats and Runeson, Per (2009) Reference-based search strategies in systematic reviews. Proceedings of 13th International Conference on Evaluation and Assessment in Software Engineering (EASE’09), Durham, England, April. BCS.

White, V.J. and Glanville, J.M. and Lefebvre, C. and Sheldon, T.A. (2001) A Statistical Approach to Designing Search Filters to Find Systematic Reviews: Objectivity Enhances Accuracy, Journal of Information Science, 27(6), 357-370.

Yin, Robert K. (2003) Case Study Research: Design and Methods (3rd edn). Sage Publications.
