Michael Felderer • Guilherme Horta Travassos
Editors

Contemporary Empirical Methods in Software Engineering

Editors
Michael Felderer, Department of Computer Science, University of Innsbruck, Innsbruck, Austria
Guilherme Horta Travassos, Department of Systems Engineering and Computer Science, COPPE, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
This Springer imprint is published by the registered company Springer Nature Switzerland AG.
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
M. Felderer (✉)
Department of Computer Science, University of Innsbruck, Innsbruck, Austria
Department of Software Engineering, Blekinge Institute of Technology, Karlskrona, Sweden
e-mail: [email protected]
G. H. Travassos
Department of Systems Engineering and Computer Science, COPPE, Federal University of Rio
de Janeiro, Rio de Janeiro, Brazil
e-mail: [email protected]
1 Introduction
The term software engineering originated in the early 1960s (Hey et al. 2014).
During the NATO Software Engineering Conferences held in 1968 and 1969,
participants made explicit that engineering software requires dedicated approaches
that are separate from those for the underlying hardware systems. Until that
“software crisis,” software-related research mostly focused on theoretical aspects,
e.g., algorithms and data structures used to write software systems, or practical
aspects, e.g., the efficient compilation of software for particular hardware systems (Guéhéneuc and Khomh 2019). Since then, these topics have been investigated in computer science, which is concerned with understanding and proposing theories and methods for the efficient computation of algorithms. Computer science thus differs from software engineering (research), which has become a very dynamic discipline in its own right since its foundation in the 1960s. IEEE (1990, 2010) defines software engineering
(SE) as: (1) The application of a systematic, disciplined, quantifiable approach to
the development, operation, and maintenance of software, that is, the application
of engineering to software, and (2) The study of approaches as in (1). Software
engineering also differs from other engineering disciplines due to the immaterial
nature of software not obeying physical laws and the importance of human factors
as software is written by people for people. Software engineering is fundamentally
an empirical discipline, where knowledge is gained by applying direct and indirect
observation or experience. Approaches to software development, operation, and
maintenance must be investigated by empirical means to be understood, evaluated,
and deployed in proper contexts. Empirical methods like experimentation are
therefore essential in software engineering to gain scientific evidence on software
development, operation, and maintenance, but also to support practitioners in their
decision-making and learning (Travassos et al. 2008). The application of empirical
methods makes software engineering more objective and precise, facilitating the transfer of software technologies to industry (Shull et al. 2001). Software
engineers learn by observing, exploring, and experimenting. The level of learning
depends on the degree of observation or intervention (Thomke 2003) promoted by
the experiences and studies performed.
Traditionally, empirical software engineering (ESE) is the area of research that
emphasizes the use of empirical methods in the field of software engineering.
According to Harrison and Basili (1996), “Empirical software engineering is the
study of software-related artifacts for the characterization, understanding, eval-
uation, prediction, control, management, or improvement through qualitative or
quantitative analysis. The quantitative studies may range from controlled experi-
mentation to case studies. Qualitative studies should be well-defined and rigorous.”
The role and importance of the different types of empirical methods in software
engineering have evolved since the foundation of software engineering. In this
chapter, we discuss the evolution of empirical methods in software engineering, taking key venues and books into account as they reflect that evolution.
The Evolution of Empirical Methods in Software Engineering
Empirical methods rely on collected data, which may be qualitative or quantitative. Some widely used qualitative data collection
methods in software engineering are interviews and participant observation (Seaman
1999). Some commonly used quantitative data collection methods are archival data,
surveys, experiments, and simulation (Wohlin et al. 2012). Once data are collected,
the researcher needs to analyze the data by using qualitative analysis methods,
e.g., grounded theory, thematic analysis, or hermeneutics, and quantitative analysis
methods, e.g., statistical analysis and mathematical modeling approaches.
In general, there are three widely recognized research processes: quantitative research, qualitative research, and semiquantitative research. An alternative option is the combination of qualitative and quantitative research, denoted as mixed research (Creswell and Creswell 2018). The distinction between qualitative and quantitative research comes not only from the type of data collected, but also from the objectives, the types of research questions posed, the analysis methods, and the degree of flexibility built into the research design (Wohlin and Aurum 2015).
Qualitative research aims to understand the reason (i.e., “why”) and mechanisms
(i.e., “how”) explaining a phenomenon. A popular method of qualitative research
is case study research, which examines a set of selected samples in detail to
understand the phenomenon illustrated by the samples. For instance, a qualitative
study can be conducted to understand the impediments of automating system
tests. Quantitative research is a data-driven approach used to gain insights about
an observable phenomenon. Data collected from observations are analyzed using
mathematical and statistical models to derive quantitative relationships between
different variables capturing different aspects of the phenomenon under study. A
popular method of quantitative research is the controlled experiment, which examines cause–effect relationships between different variables characterizing a phenomenon
in a controlled environment. For instance, different review techniques could be
compared via a controlled experiment. Mixed research collects quantitative and
qualitative data. It is a particular form of multi-method research, which combines
different research methods to test the same hypotheses, and is often used in empirical software engineering due to the lack of theories in software engineering with
which we interpret quantitative data and due to the need to discuss qualitatively the
impact of the human factor on any experiments in software engineering (Guéhéneuc
and Khomh 2019). Semiquantitative research deals with approximate rather than exact measurements (Bertin 1978). It seeks to understand the behavior of a system based on causal relations between the variables describing the system. Semiquantitative models allow one to express what is known without making inappropriate assumptions, simulating ranges of behavior rather than point values (Widman 1989). It has many applications in both the natural and social
sciences. Semiquantitative research supports cases where direct measurements are
not possible, but where it is possible to estimate an approximate behavior. In other
words, this type of study is applied in scenarios where the numerical values in the
mathematical relations governing the changes of a system are not known. In this
context, the direction of change is known, but not the size of its effect (Ogborn and
Miller 1994). Simulation-based studies in software engineering can benefit from
using semiquantitative research (Araújo et al. 2012).
The three major and well-established empirical methods in software engineering
are: survey, case study, and experiment (Wohlin et al. 2012). Primary studies using
such methods can be performed in vivo, in vitro, in virtuo, and in silico (Travas-
sos and Barros 2003). In vivo studies involve participants and projects in their
natural environments and contexts. Such studies are usually executed in software
development organizations throughout the software development process under real
working conditions. In vitro studies are performed in controlled environments, such
as laboratories or controlled communities, under configured working conditions. In
virtuo studies have the subjects interacting with a computerized model of reality.
The behavior of the environment with which subjects interact is described as a
to identify clusters of studies (that could form the basis of a fuller review with more
synthesis) and gaps indicating the need for more primary studies.
The scientific or industrial significance of empirical studies depends on their
validity, i.e., the degree to which one can trust the outcomes of an empirical
study (Kitchenham et al. 2015). Validity is usually assessed in terms of four
commonly encountered forms of threats to validity: internal, external, construct,
and conclusion validity (Shadish et al. 2002). Internal validity refers to inferences
that the observed relationship between treatment and outcome reflects a cause–
effect relationship. External validity refers to whether a cause–effect relationship
holds over other conditions, including persons, settings, treatment variables, and
measurement variables. Construct validity refers to how concepts are operational-
ized as experimental measures. Conclusion validity refers to inferences about the
relationship between treatment and outcome variables.
The accomplishment of empirical studies relies on performing well-defined and
evolutionary activities. The classical empirical study process consists of five phases:
definition, planning, operation, analysis and interpretation, as well as reporting and
packaging (Juristo and Moreno 2001; Malhotra 2016). The definition phase makes
the investigated problem and overall objectives of the study explicit. The planning
phase covers the study design and includes the definition of research questions and
hypotheses as well as the definition of data collection, data analysis, and validity
procedures. In the operation phase, the study is actually conducted. In the analysis
and interpretation phase, the collected data is analyzed, assessed, and discussed.
Finally, in the reporting and packaging phase, the results of the study are reported
(e.g., in a journal article, a conference paper, or a technical report) and suitably
packaged to provide study material and data. The latter has become more critical
recently due to the open science movement (see chapter “Open Science in Software
Engineering”).
In the early years of software engineering, empirical studies were rare, and the
only research model commonly in use was the analytical method, where different
formal theories were advocated without any empirical evaluation (Glass 1994).
According to a systematic literature review of empirical studies performed by
Zendler (2001), Grant and Sackman (1967) published the first empirical study in
software engineering in 1967. The authors conducted an experiment that compared
the performance of two groups of developers, one working with online access to
a computer through a terminal and the other with offline access in batch mode.
Another empirical study published early in the history of software engineering was
an article by Knuth (1971), in which the author studied a set of Fortran programs to understand what constructs developers actually use in practice. Akiyama (1971) describes
the first known “size law” (Bird et al. 2015), stating that the number of defects is a
function of the number of lines of code. The authors in these and other early studies
defined the goal of the study, the questions to research, and the measures to answer
these questions in an ad hoc fashion (Guéhéneuc and Khomh 2019). However, they
were pioneers in the application of empirical methods in software engineering.
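Akiyama's size law can be illustrated with an ordinary least-squares fit of defect counts against lines of code. The data below are synthetic, not Akiyama's original measurements, and the helper is a minimal sketch:

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit ys ≈ a + b * xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    return a, b

# Synthetic module sizes (LOC) and defect counts.
loc = [100, 250, 400, 600, 900]
defects = [4, 9, 13, 21, 30]

a, b = fit_line(loc, defects)
print(f"defects ≈ {a:.2f} + {b:.4f} * LOC")
```

On this synthetic data the fitted slope is positive, mirroring the qualitative claim of the size law that larger programs tend to contain more defects.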
In the second iteration, more empirical studies, mainly in vitro experiments,
were conducted. Prominent examples are experiments on structured program-
ming (Lucas et al. 1976), flowcharting (Shneiderman et al. 1977), and software
testing (Myers 1978). The second iteration is characterized by first attempts to pro-
vide a systematic methodology to define empirical studies in software engineering in
general and experiments in particular. These attempts culminated in the definition of
the Goal/Question/Metrics (GQM) approach by Basili and Weiss (1984). The GQM
approach helped practitioners and researchers to define measurement programs
based on goals related to products, processes, and resources that can be achieved
by answering questions that characterize the objects of measurement using metrics.
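The goal-question-metric hierarchy can be sketched as a simple tree structure. The example below is a hypothetical illustration of the idea, not a template taken from Basili and Weiss (1984); all names and fields are made up for the sketch:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Metric:
    name: str

@dataclass
class Question:
    text: str
    metrics: List[Metric] = field(default_factory=list)

@dataclass
class Goal:
    purpose: str    # e.g., "characterize", "evaluate", "improve"
    issue: str      # the quality focus
    object_: str    # the product, process, or resource under measurement
    viewpoint: str  # whose perspective the measurement takes
    questions: List[Question] = field(default_factory=list)

# Hypothetical GQM plan for code review effectiveness.
goal = Goal(
    purpose="evaluate",
    issue="effectiveness",
    object_="code review process",
    viewpoint="quality manager",
    questions=[
        Question("How many defects does a review find?",
                 [Metric("defects found per review")]),
        Question("How much effort does a review cost?",
                 [Metric("person-hours per review")]),
    ],
)
print(len(goal.questions))  # 2
```

The point of the structure is traceability: every metric collected can be traced upward to a question and a goal, so no data is gathered without a stated purpose.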
In the third iteration, not only experiments but also surveys (for instance,
by Burkhard and Jenster (1989) on the application of computer-aided software
engineering tools) and case studies (for instance, by Curtis et al. (1988) on the
software design process for large systems) were performed to some extent. Also,
the explicit discussion of threats to validity appeared in that iteration. One of the first
studies explicitly discussing its threats to validity was an article by Swanson and
Beath (1988) on the use of case study data in software management research. From
the late 1980s, researchers also started to analyze software data using algorithms
taken from artificial intelligence research (Bird et al. 2015). For instance, decision
trees and neural networks were applied to predict error-proneness (Porter and
Selby 1990), to estimate software effort (Srinivasan and Fisher 1995) and to model
reliability growth (Tian 1995).
In the fourth iteration, empirical studies began to attract the attention of several
research groups all over the world, who realized the importance of providing empir-
ical evidence about the developed and investigated software products and processes.
The experiences shared by NASA/SEL and the participation of several researchers
in conducting experiments together with NASA/SEL helped to strengthen the use
of different experimental strategies and the application of surveys.
The interest in the application of the scientific method by different researchers,
the identification of the need to evolve the experimentation process through sharing
of experimental knowledge among peers as well as the transfer of knowledge to
industry, among other reasons, led to the establishment of the International Software
Engineering Research Network (ISERN) in 1992. ISERN held its first annual
meeting in Japan in 1993 sponsored by the Graduate School of Information Science
at the Nara Institute of Science and Technology.
The need to share the ever-increasing number of studies and their results, and the growing number of researchers applying empirical methods in software engineering, led to the foundation of suitable forums. In 1993, the IEEE International Software
Metrics Symposium, in 1996, the Empirical Software Engineering International
Journal, and in 1997, the Empirical Assessments in Software Engineering (EASE)
event at Keele University were founded.
By the end of this iteration, several institutes dedicated to empirical software
engineering were established. In 1996, the Fraunhofer Institute for Experimental
Software Engineering (IESE) associated with the University of Kaiserslautern
(Germany) was established. In 1998, the Fraunhofer Center for Experimental
Software Engineering (CESE) associated with the University of Maryland, College
Park (USA) began operations. Also, other institutions and laboratories, such as
National ICT Australia as well as the Simula Research Laboratory and SINTEF
(both located in Norway), among others, started to promote empirical studies in
software engineering in the industry.
Finally, by the end of the 1990s, the publication of methodological papers on
empirical methods in software engineering started. Zelkowitz and Wallace (1998)
provided an overview of experimental techniques to validate new technologies,
Seaman (1999) provided guidelines for qualitative data collection and analysis, and
Basili et al. (1999) discussed families of experiments.
Since 2000 research methodology has received considerable attention, and therefore
the publication of methodological papers further increased. For instance, Höst et al.
(2000) discuss the usage of students as subjects in experiments, Shull et al. (2001)
describe a methodology to introduce software processes based on experimentation,
Pfleeger and Kitchenham (2001) provide guidelines on surveys in software engi-
neering, Lethbridge et al. (2005) provide a classification of data collection methods,
Kitchenham and Charters (2007) provide guidelines for performing systematic
literature reviews in software engineering, Shull et al. (2008) discuss the role
of replication in empirical software engineering, and Runeson and Höst (2009)
provide guidelines for case study research. In connection to the increased interest
in research methodology, also the first books on empirical research methods in
software engineering with a focus on experimentation written by Wohlin et al.
(2000) and Juristo and Moreno (2001) appeared around 2000 (see Sect. 5 for
a comprehensive overview of books on empirical software engineering). Also,
combining research methods and performing multi-method research became more
popular in the period. One of the first papers following a multi-method research
methodology was published by Espinosa et al. (2002) on shared mental models,
familiarity, and coordination in distributed software teams.
With the growing number of empirical studies, knowledge aggregation based
on these primary studies became more crucial to understand software engineering
The empirical evidence gathered by analyzing data collected from software repositories is nowadays considered an important support for the (empirical) software engineering community. There are even venues that focus on the
analysis of software data such as Mining Software Repositories (MSR), which was
organized for the first time in 2004 in Edinburgh (UK) and Predictive Models and
Data Analytics in Software Engineering (PROMISE), which was organized for the
first time in 2005 in St. Louis (USA).
In general, the growing interest in empirical software engineering in that period
resulted in projects such as the Experimental Software Engineering Research
Network (ESERNET) in Europe from 2001 to 2003 and the foundation of several
venues. In 2007, the first ACM/IEEE International Symposium on Empirical
Software Engineering and Measurement (ESEM) was held in Madrid (Spain).
ESEM is the result of the merger between the ACM/IEEE International Symposium
on Empirical Software Engineering, which ran from 2002 to 2006, and the IEEE
International Software Metrics Symposium, which ran from 1993 to 2005. In 2003,
Experimental Software Engineering Latin American Workshop (ESELAW) was
organized for the first time. Also, in 2003, the International Advanced School
on Empirical Software Engineering (IASESE) performed its first set of classes in
Rome (Italy). In 2006, the International Doctoral Symposium on Empirical Software
Engineering (IDoESE) was founded. Today, the ISERN annual meeting, IASESE,
IDoESE, and ESEM form the Empirical Software Engineering International Week
(ESEIW), which is held annually.
Fig. 1 International Advanced School on Empirical Software Engineering (IASESE) timeline and topics from 2003 to 2019
Since the first empirical studies in the 1960s, the field of empirical software
engineering has considerably matured in several iterations. However, the empirical
methods resulting from the five iterations presented in the previous section are
not the end of the story, and as in any scientific discipline, research methods
develop further. The chapters of this book discuss contemporary empirical methods
that impact the current evolution of empirical software engineering and form the
backbone of its next iteration. Admittedly, the description of the current situation and future trends is never complete and is always subjective to some extent. But
we think that the chapters covered in this book show several interesting trends
in contemporary empirical methods in software engineering, which we want to
summarize here.
The evolution of empirical software engineering leads to the continuous adoption
of empirical methods from other fields and the refinement of existing empirical
methods in software engineering. The resulting plurality of research methods
requires guidance in knowledge-seeking and solution-seeking (i.e., design science)
research. The chapter “Guidelines for Conducting Software Engineering Research”
presents guidelines for conducting software engineering research based on the
ABC framework, where ABC represents the three desirable aspects of research:
generalizability over actors (A), precise control of behavior (B), and realism of
context (C). Each empirical method has its strengths and weaknesses. It is beneficial
to utilize a mix of methods depending on the research goal or even to combine
methods. Case survey research combines case study and survey research, which rely
primarily on qualitative and quantitative data, respectively. The chapter “Guidelines
for Case Survey Research in Software Engineering” provides an overview of the
case survey method. While being an important and often used empirical method,
survey research has been less discussed on a methodological level than other types
of empirical methods. The chapter “Challenges in Survey Research” discusses
methodological issues in survey research for software engineering concerning
theory building, sampling, invitation and follow-up, statistical analysis, qualitative
analysis, and assessment of psychological constructs. Although software engineering is an engineering discipline, the design science paradigm was explicitly adapted to software engineering relatively late, by Wieringa (2014b), and the full
potential of the design science paradigm has not been exploited so far in software
engineering. The chapter “The Design Science Paradigm as a Frame for Empirical
Software Engineering” uses the design science paradigm as a frame for empirical
software engineering and uses it to support the assessment of research contributions,
industry-academia communication, and theoretical knowledge building.
It is generally acknowledged that software development is a human-intensive
activity as software is built by humans for humans. However, traditionally SE
research has focused on artifacts and processes without explicitly taking human
factors in general and the developer perspective in particular into account. If the
perspective on how developers work was considered, then it was mostly measured
Since 2000 research methodology has received considerable attention in the soft-
ware engineering research community. Therefore, plenty of literature is available
on empirical research methodology in software engineering. Molléri et al. (2019)
identified in a recent systematic mapping study 341 methodological papers on
empirical research in software engineering—and therefore, a complete overview
would exceed the scope of this book chapter. However, following the style of this
book chapter, we provide an overview of the available English text and special
issue books explicitly dedicated to empirical research methodology in software
engineering in chronological order of their publication.
Wohlin et al. (2000) published a book entitled “Experimentation in Software
Engineering,” which provides an overview of the core empirical strategies in
software engineering, i.e., surveys, experimentation, and case studies, and, as its main content, all steps in the experimentation process, i.e., scoping, planning, operation, analysis and interpretation, as well as presentation and packaging. The
book is complemented by exercises and examples, e.g., an experiment comparing
different programming languages. Consequently, the book targets students, teachers,
researchers, and practitioners in software engineering. In 2012, a revision of this popular book was published by Springer (Wohlin et al. 2012).
Juristo and Moreno (2001) published a book entitled “Basics of Software Engi-
neering Experimentation,” which presents the basics of designing and analyzing
experiments both to software engineering researchers and practitioners based on
SE examples like comparing the effectiveness of defect detection techniques. The
book presents the underlying statistical methods, including the computation of test
statistics in detail.
Endres and Rombach (2003) published “A Handbook of Software and Systems
Engineering. Empirical Observations, Laws, and Theories.” The book presents rules,
laws, and their underlying theories from all phases of the software development
lifecycle. The book provides the reader with clear statements of software and system
engineering laws and their applicability as well as related empirical evidence. The
consideration of empirical evidence distinguishes the book from other available
handbooks and textbooks on software engineering.
Juristo and Moreno (2003) edited “Lecture Notes on Empirical Software Engi-
neering,” which aims to spread the idea of the importance of empirical knowledge
in software development from a highly practical viewpoint. It defines the body of
empirically validated knowledge in software development to advise practitioners on
what methods or techniques have been empirically analyzed and what the results
were. Furthermore, it promotes “empirical tests,” which have traditionally been
carried out by universities or research centers, for application in industry to validate
software development technologies used in practice.
Shull et al. (2007) published the “Guide to Advanced Empirical Software Engi-
neering.” It is an edited book written by experts in empirical software engineering. It
covers advanced research methods and techniques, practical foundations, as well as knowledge creation approaches. The book at hand provides a continuation of that
seminal book covering recent developments in empirical software engineering.
Runeson et al. (2012) published a book entitled “Case Study Research in
Software Engineering: Guidelines and Examples,” which covers guidelines for
all steps of case study research, i.e., design, data collection, data analysis and
interpretation, as well as reporting and dissemination. The book is complemented
with examples from extreme programming, project management, quality monitoring
as well as requirements engineering and additionally also provides checklists.
Wieringa (2014b) published a book entitled “Design Science Methodology for
Information Systems and Software Engineering,” which provides guidelines for
practicing design science in software engineering research. A design process usually
iterates over two activities, i.e., first designing an artifact that improves something
for stakeholders, and subsequently empirically validating the performance of that
artifact in its context. This “validation in context” is a key feature of the book.
Menzies et al. (2014) published a book entitled “Sharing Data and Models in
Software Engineering.” The central theme of the book is how to share what has been
learned by data science from software projects. The book is driven by the PROMISE
(Predictive Models and Data Analytics in Software Engineering) community. It is
the first book dedicated to data science in software and mining software repositories.
Closely related to this book, Bird et al. (2015) published a book entitled “The Art
and Science of Analyzing Software Data,” which is driven by the MSR (Mining
Software Repositories) community and focuses mainly on data analysis based on
statistics and machine learning. Another related book published by Menzies et al.
(2016) covers perspectives on data science for software engineering by various
authors.
Kitchenham et al. (2015) published a book entitled “Evidence-based Software
Engineering and Systematic Reviews,” which provides practical guidance on how
to conduct secondary studies in software engineering. The book also discusses the
nature of evidence and explains the types of primary studies that provide inputs to a
secondary study.
Malhotra (2016) published a book entitled “Empirical Research in Software
Engineering: Concepts, Analysis, and Applications,” which shows how to imple-
ment empirical research processes, procedures, and practices in software engineer-
ing. The book includes many accompanying exercises and examples. The author also discusses the process of developing predictive models, such as defect prediction and change prediction, on data collected from source code repositories and, more generally, the application of machine learning techniques in empirical
software engineering.
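The defect-prediction workflow sketched above can be made concrete with a small example. Everything below is hypothetical: the churn values, the defect labels, and the single-threshold “model” merely stand in for the repository-mined features and machine learning techniques such a study would actually use.

```python
# Hypothetical sketch of a defect-prediction step of the kind Malhotra (2016)
# discusses: learn from repository history which modules are defect-prone.
# The data and the one-feature threshold model are illustrative only.

# Synthetic "repository" history: (lines changed in a module, had a defect?)
history = [
    (12, False), (430, True), (25, False), (310, True),
    (8, False), (220, True), (40, False), (150, False),
]

def train_threshold(data):
    """Pick the churn threshold that maximizes accuracy on the history."""
    best_t, best_acc = 0, 0.0
    for t, _ in data:
        acc = sum((churn > t) == defect for churn, defect in data) / len(data)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

threshold = train_threshold(history)   # 150 for this toy history

def predict(churn):
    """Flag a module as defect-prone if its churn exceeds the threshold."""
    return churn > threshold

# Precision on the training history: of the modules flagged, how many
# actually had defects?
flagged = [defect for churn, defect in history if predict(churn)]
precision = sum(flagged) / len(flagged)
```

A real study would use richer metrics, a proper learner, and held-out evaluation data; the point here is only the shape of the train/predict/evaluate loop.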
ben Othmane et al. (2017) published a book entitled “Empirical Research
for Software Security: Foundations and Experience,” which discusses empirical
methods with a special focus on software security.
Staron (2019) published a book entitled “Action Research in Software Engineer-
ing: Theory and Applications,” which offers a comprehensive discussion on the use
of action research as an instrument to evolve software technologies and promote
synergy between researchers and practitioners.
In addition to these textbooks, there are also edited books available that are
related to special events in empirical software engineering and cover valuable
methodological contributions.
Rombach et al. (1993) edited proceedings from a Dagstuhl seminar in 1992
on empirical software engineering entitled “Experimental Software Engineering
Issues: Critical Assessment and Future Directions.” The goal was to discuss the state
of the art of empirical software engineering by assessing past accomplishments,
raising open questions, and proposing a future research agenda at that time.
However, many contributions of that book are still relevant today.
Conradi and Wang (2003) edited a book entitled “Empirical Methods and Studies
in Software Engineering: Experiences from ESERNET,” which covers experiences
from the Experimental Software Engineering Research NETwork (ESERNET), a
thematic network funded by the European Union between 2001 and 2003.
20 M. Felderer and G. H. Travassos
6 Conclusion
Acknowledgements We thank all the authors and reviewers of this book on contemporary
empirical methods in software engineering for their valuable contribution.
References
Akiyama F (1971) An example of software system debugging. In: IFIP congress (1), vol 71. North-
Holland, Amsterdam, pp 353–359
Araújo MAP, Monteiro VF, Travassos GH (2012) Towards a model to support studies of software
evolution. In: Proceedings of the ACM-IEEE international symposium on empirical software
engineering and measurement (ESEM ’12). ACM, New York, pp 281–290
Arcuri A, Briand L (2014) A hitchhiker’s guide to statistical tests for assessing randomized
algorithms in software engineering. Softw Test Verification Reliab 24(3):219–250
Barros MO, Werner CML, Travassos GH (2004) Supporting risks in software project management.
J Syst Softw 70(1):21–35
Basili VR (1993) The experimental paradigm in software engineering. In: Experimental software
engineering issues: critical assessment and future directions. Springer, Berlin, pp 1–12
Basili VR, Weiss DM (1984) A methodology for collecting valid software engineering data. IEEE
Trans Softw Eng SE-10(6):728–738
Basili VR, Zelkowitz MV (2007) Empirical studies to build a science of computer science.
Commun Assoc Comput Mach 50(11):33–37
Basili VR, Caldiera G, Rombach HD (1994) Experience factory. Encycl Softw Eng 1:469–476
Basili VR, Shull F, Lanubile F (1999) Building knowledge through families of experiments. IEEE
Trans Softw Eng 25(4):456–473
Basili V, Rombach D, Schneider K, Kitchenham B, Pfahl D, Selby R (2007) Empirical software
engineering issues. In: Critical assessment and future directions: international workshop,
Dagstuhl Castle, June 26–30, 2006, Revised Papers, vol 4336. Springer, Berlin
ben Othmane L, Jaatun MG, Weippl E (2017) Empirical research for software security: foundations
and experience. CRC Press, Boca Raton
Bertin E (1978) Qualitative and semiquantitative analysis. Springer, Berlin, pp 435–457
Biolchini J, Mian PG, Natali ACC, Travassos GH (2005) Systematic review in software
engineering: relevance and utility. Technical report
Bird C, Murphy B, Nagappan N, Zimmermann T (2011) Empirical software engineering at
Microsoft research. In: Proceedings of the ACM 2011 conference on computer supported
cooperative work. ACM, New York, pp 143–150
Bird C, Menzies T, Zimmermann T (2015) The art and science of analyzing software data. Elsevier,
Amsterdam
Bjarnason E, Smolander K, Engström E, Runeson P (2016) A theory of distances in software
engineering. Inf Softw Technol 70:204–219
Boehm B, Rombach HD, Zelkowitz MV (2005) Foundations of empirical software engineering:
the legacy of Victor R. Basili. Springer, Berlin
Briand L, Bianculli D, Nejati S, Pastore F, Sabetzadeh M (2017) The case for context-driven
software engineering research: generalizability is overrated. IEEE Softw 34(5):72–75
Burkhard DL, Jenster PV (1989) Applications of computer-aided software engineering tools:
survey of current and prospective users. ACM SIGMIS Database Database Adv Inf Syst
20(3):28–37
Conradi R, Wang AI (2003) Empirical methods and studies in software engineering: experiences
from ESERNET, vol 2765. Springer, Berlin
Creswell JW, Creswell JD (2018) Research design: qualitative, quantitative, and mixed methods
approaches. SAGE, Los Angeles
Curtis B, Krasner H, Iscoe N (1988) A field study of the software design process for large systems.
Commun Assoc Comput Mach 31(11):1268–1287
de Mello RM, Da Silva PC, Travassos GH (2015) Investigating probabilistic sampling approaches
for large-scale surveys in software engineering. J Softw Eng Res Dev 3(1):8
Endres A, Rombach HD (2003) A handbook of software and systems engineering: empirical
observations, laws, and theories. Pearson Education, Old Tappan
Espinosa A, Kraut R, Slaughter S, Lerch J, Herbsleb J, Mockus A (2002) Shared mental
models, familiarity, and coordination: a multi-method study of distributed software teams. In:
Proceedings of ICIS 2002, p 39
Farzat F, Barros MO, Travassos GH (2019) Evolving JavaScript code to reduce load time. IEEE
Trans Softw Eng
Fink A (2003) The survey handbook. SAGE, Los Angeles
Glass RL (1994) The software-research crisis. IEEE Softw 11(6):42–47
Grant EE, Sackman H (1967) An exploratory investigation of programmer performance under on-
line and off-line conditions. IEEE Trans Hum Factors Electron 1:33–48
Guéhéneuc YG, Khomh F (2019) Empirical software engineering. In: Cha S, Taylor RN, Kang KC
(eds) Handbook of software engineering. Springer, Berlin, pp 285–320
Harman M, McMinn P, De Souza JT, Yoo S (2010) Search based software engineering: techniques,
taxonomy, tutorial. In: Empirical software engineering and verification. Springer, Berlin,
pp 1–59
Harrison W, Basili VR (1996) Editorial. Empir Softw Eng 1:5–10
Hey AJ, Hey T, Pápay G (2014) The computing universe: a journey through a revolution.
Cambridge University Press, Cambridge
Höst M, Regnell B, Wohlin C (2000) Using students as subjects—a comparative study of students
and professionals in lead-time impact assessment. Empir Softw Eng 5(3):201–214
IEEE (1990) 610.12-1990—IEEE standard glossary of software engineering terminology. IEEE,
New York
IEEE (2010) ISO/IEC/IEEE 24765:2010 systems and software engineering—vocabulary. IEEE,
Geneva
Ivarsson M, Gorschek T (2011) A method for evaluating rigor and industrial relevance of
technology evaluations. Empir Softw Eng 16(3):365–395
Johnson P, Ekstedt M (2016) The Tarpit–a general theory of software engineering. Inf Softw
Technol 70:181–203
Juristo N, Moreno AM (2001) Basics of software engineering experimentation. Springer, Berlin
Juristo N, Moreno AM (2003) Lecture notes on empirical software engineering, vol 12. World
Scientific, New Jersey
Kitchenham B (2004) Procedures for performing systematic reviews. Technical report
Kitchenham B, Brereton P (2013) A systematic review of systematic review process research in
software engineering. Inf Softw Technol 55(12):2049–2075
Kitchenham B, Charters S (2007) Guidelines for performing systematic literature reviews in
software engineering. Technical report
Kitchenham BA, Dybå T, Jorgensen M (2004) Evidence-based software engineering. In: Proceed-
ings of the 26th international conference on software engineering. IEEE Computer Society,
Silver Spring, pp 273–281
Kitchenham BA, Budgen D, Brereton P (2015) Evidence-based software engineering and system-
atic reviews, vol 4. CRC Press, Boca Raton
Knuth DE (1971) An empirical study of Fortran programs. Softw Pract Exp 1(2):105–133
Lethbridge TC, Sim SE, Singer J (2005) Studying software engineers: data collection techniques
for software field studies. Empir Softw Eng 10(3):311–341
Lucas J, Henry C, Kaplan RB (1976) A structured programming experiment. Comput J 19(2):136–
138
Lyu MR, et al (1996) Handbook of software reliability engineering, vol 222. IEEE Computer
Society Press, Los Alamitos
Malhotra R (2016) Empirical research in software engineering: concepts, analysis, and applica-
tions. Chapman and Hall/CRC, London
McGarry F, Pajerski R, Page G, Waligora S, Basili V, Zelkowitz M (1994) Software process
improvement in the NASA software engineering laboratory. Technical report, CMU/SEI-
94-TR-022. Software Engineering Institute/Carnegie Mellon University, Pittsburgh. http://
resources.sei.cmu.edu/library/asset-view.cfm?AssetID=12241
Menzies T, Kocaguneli E, Turhan B, Minku L, Peters F (2014) Sharing data and models in software
engineering. Morgan Kaufmann, Amsterdam
Menzies T, Williams L, Zimmermann T (2016) Perspectives on data science for software
engineering. Morgan Kaufmann, Amsterdam
Molléri JS, Petersen K, Mendes E (2019) CERSE—catalog for empirical research in software
engineering: a systematic mapping study. Inf Softw Technol 105:117–149
Münch J, Schmid K (2013) Perspectives on the future of software engineering: essays in honor of
Dieter Rombach. Springer, Berlin
Myers GJ (1978) A controlled experiment in program testing and code walkthroughs/inspections.
Commun Assoc Comput Mach 21(9):760–768
Nagappan N, Ball T (2005) Use of relative code churn measures to predict system defect density.
In: Proceedings of the 27th international conference on software engineering. ACM, New York,
pp 284–292
Ogborn J, Miller R (1994) Computational issues in modelling. The Falmer Press, Basingstoke
Ostrand TJ, Weyuker EJ, Bell RM (2004) Where the bugs are. In: ACM SIGSOFT software
engineering notes, vol 29. ACM, New York, pp 86–96
Thomke SH (2003) Experimentation matters: unlocking the potential of new technologies for
innovation. Harvard Business Press, Boston
Tian J (1995) Integrating time domain and input domain analyses of software reliability using
tree-based models. IEEE Trans Softw Eng 21(12):945–958
Travassos GH, Barros MO (2003) Contributions of in virtuo and in silico experiments for the
future of empirical studies in software engineering. In: 2nd workshop on empirical software
engineering the future of empirical studies in software engineering, pp 117–130
Travassos GH, dos Santos PSM, Mian PG, Neto ACD, Biolchini J (2008) An environment
to support large scale experimentation in software engineering. In: 13th IEEE international
conference on engineering of complex computer systems (ICECCS 2008). IEEE, Piscataway,
pp 193–202
Wagner S, Fernández DM, Felderer M, Vetrò A, Kalinowski M, Wieringa R, Pfahl D, Conte T,
Christiansson MT, Greer D, et al (2019) Status quo in requirements engineering: a theory and
a global family of surveys. ACM Trans Softw Eng Methodol 28(2):9
Widman L (1989) Expert system reasoning about dynamic systems by semi-quantitative simula-
tion. Comput Methods Prog Biomed Artif Intell Med 6(3):229–247
Wieringa R (2014a) Empirical research methods for technology validation: scaling up to practice.
J Syst Softw 95:19–31
Wieringa RJ (2014b) Design science methodology for information systems and software engineer-
ing. Springer, Berlin
Wohlin C, Aurum A (2015) Towards a decision-making structure for selecting a research design in
empirical software engineering. Empir Softw Eng 20(6):1427–1455
Wohlin C, Runeson P, Höst M, Ohlsson M, Regnell B, Wesslén A (2000) Experimentation in
software engineering: an introduction. Kluwer Academic Publishers, Norwell, MA
Wohlin C, Runeson P, Höst M, Ohlsson MC, Regnell B, Wesslén A (2012) Experimentation in
software engineering. Springer, Berlin
Zelkowitz MV, Wallace DR (1998) Experimental models for validating technology. Computer
31(5):23–31
Zendler A (2001) A preliminary software engineering theory as investigated by published
experiments. Empir Softw Eng 6(2):161–180
Zimmermann T, Zeller A, Weissgerber P, Diehl S (2005) Mining version histories to guide software
changes. IEEE Trans Softw Eng 31(6):429–445
Part I
Study Strategies
Guidelines for Conducting Software
Engineering Research
1 Introduction
useful reference works, there are several issues with the current state of literature on
methodology. First, there is a strong emphasis on a limited set of specific methods,
in particular experimentation, case studies, and survey studies. Although these are
the three most used empirical methods (Stol and Fitzgerald 2018), many other
methods exist that have not received the same level of attention. A second issue
is that the field has no agreement on an overall taxonomy of methods, which is
somewhat problematic as methods vary in terms of granularity and scope. This
makes a systematic comparison of methods very challenging. Furthermore, new
methods are being adopted from other fields. Grounded theory, for example, has
gained widespread adoption within the SE literature in the last 15 years or so
(Stol et al. 2016b). (We note that, like many other methods used in SE, grounded
theory is not a “new” method, as it was developed in the 1960s by social scientists
Glaser and Strauss—however, its application is relatively new to the SE domain.)
Other techniques and methods that are relatively new to the software engineering
field include the repertory grid technique (Edwards et al. 2009) and ethnography
(Sharp et al. 2016). With new methods and techniques being adopted regularly, it
becomes challenging to understand how these new methods compare to established
approaches. Further, numerous sources present a range of research methods, but
these presentations are limited to “shopping lists” of methods: definitions without
a systematic comparison. Rather than maintaining a list of definitions of research
methods, a more systematic approach is needed that allows us to reason about and
position existing methods, as well as new methods as they emerge. Hence, in this chapter we
present a taxonomy of research strategies.
There is an additional challenge within the software engineering research
community. Different methods have varying strengths and drawbacks, but it is
quite common to see unreasonable critiques of studies due to the research methods
employed. For example, a common complaint in reviews of case studies is that they
do not allow statistical generalizability. Similarly, experiments are often critiqued
on the basis that they involved computer science students solving “toy” problems,
thus rendering them unrealistic, and therefore not worthy of publication. Not
unreasonably, researchers may wonder which method, then, is the silver bullet that
can address all of these limitations.
The answer is none.
Instead of discussing research methods, we raise the level of abstraction and
have adopted the term research strategy. A research strategy can be considered a
category of research methods that have similar trade-offs in terms of generalizability
and the level of obtrusiveness or control of the research context—we return to these
two dimensions in a later section in this chapter. Previously, we outlined what we
have termed the ABC framework of research strategies and demonstrated how this
taxonomy is suitable for software engineering research (Stol and Fitzgerald 2018).
In this chapter we draw on this earlier work, elaborate on how the ABC framework
is related to Design Science, and provide general guidance for researchers to select
appropriate research strategies.
The remainder of this chapter is organized as follows. Section 2 starts with a
discussion of research goals, dimensions, and settings. This section presents the
Guidelines for Conducting Software Engineering Research 29
2 Foundations
This section introduces a number of concepts that together form the foundation
for the ABC framework. We first introduce the two modes of research in software
engineering: knowledge-seeking and solution-seeking research. These are two dis-
tinct modes representing different types of activities. This chapter focuses primarily
on one mode, namely knowledge-seeking research, but contrasts it with solution-
seeking research. In so doing, we draw a link to Design Science. Much has been
written on Design Science, which is why we do not discuss it in this chapter. Instead,
we refer interested readers to chapter “The Design Science Paradigm as a Frame for
Empirical Software Engineering” of this book.
We then return our attention to knowledge-seeking research and introduce
two key dimensions that are present in all knowledge-seeking studies: the level
of obtrusiveness and generalizability. Each research strategy represents a unique
combination along these two dimensions. This section ends with a discussion
of research settings, which refers to the environment in which the research is
conducted.
Fig. 1 Knowledge-seeking and solution-seeking research: positioning the ABC framework and
Design Science (the figure depicts an environment of technology, organizations, and people, from
which researchers draw observations and into which solutions are deployed)
Design Science to chapter “The Design Science Paradigm as a Frame for Empirical
Software Engineering”.
is: to what extent does a researcher “intrude” on the research setting, or simply make
observations in an unobtrusive way. Research methods can vary considerably in the
level of intrusion and resulting level of “control” over the research setting. Clearly, a
study that seeks to evaluate the efficiency or performance of a tool requires a careful
study set-up, whereas a case study that seeks to describe how agile methods are
tailored at one specific company does not (Fitzgerald et al. 2013).
The second dimension is the level of generalizability of research findings. This
is a recurring concern in software engineering research, in particular in the context
of case studies. Indeed, exploratory case studies, and other types of field studies, are
limited in that the researcher cannot draw any statistically generalizable conclusions
from such studies. However, generalization of findings is not the goal of such
studies—instead, exploratory case studies and other types of field studies aim to
develop an in-depth understanding rather than findings that generalize across
different settings. Exploratory case studies can be used to theorize and propose
hypotheses about other similar contexts.
It is worth noting that a broader view of generalizability beyond that of the
statistical sample-based one has also been identified (e.g., Yin 2014; Lee and
Baskerville 2003). Yin identifies Level 1 inference generalizability which has two
forms. The first is the widely known statistical generalizability from a representative
sample to a population. He also identifies another Level 1 inference, namely from
experimental subjects to experimental findings, which is also quite relevant to
our research strategies. However, Yin also suggests a further Level 2 inference
category, analytic generalizability, which involves generalizing to theory—whether
from a sample, from field study findings, or from experimental findings.
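Statistical (sample-to-population) generalizability can be illustrated with a short worked example; the survey numbers below are invented for illustration and carry no empirical claim.

```python
import math

# Hypothetical sample study: 384 developers sampled from a large population,
# 96 of whom report practicing pair programming. How far does the sample
# proportion plausibly generalize to the population?
n, successes = 384, 96
p_hat = successes / n            # sample proportion: 0.25

# 95% normal-approximation (Wald) confidence interval for a proportion
z = 1.96
margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
low, high = p_hat - margin, p_hat + margin
# The interval [~0.207, ~0.293] is the statistical generalization: given a
# representative sample, the population proportion is plausibly within it.
```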
Research takes place in different settings, that is, the environment or context within
which a researcher conducts research. McGrath (1984) identified four different types
of settings to conduct research. Building on the two dimensions described above,
these settings are positioned as four quadrants at a 45° angle to the main axes that
represent those two dimensions (see Fig. 3).
The first type of settings is natural settings, represented as Quadrant I. Natural
settings are those that naturally occur in the “field” and that exist independently from
the researcher conducting research; that is, settings that are host to the phenomenon
that a researcher wishes to study. For example, the “field” for a study on software
process improvement is likely to be a software development organization, whereas
the “field” can also be the online communication channels when the topic of study is
a particular open source software development project (Mockus et al. 2000)—after
all, for open source developers, these online channels are the (virtual) place where
they communicate and do work. Natural settings are always specific and concrete,
rather than abstract and general; hence, the quadrant representing natural settings is
Fig. 3 Research settings positioned as four quadrants: Quadrant I (natural settings), Quadrant II
(contrived settings), Quadrant III (neutral settings), and Quadrant IV (non-empirical settings).
The horizontal axis ranges from increasingly more universal contexts and systems to increasingly
more specific contexts and systems.
positioned on the right-hand side of Fig. 3. Researchers may still exert some level
of control over a natural setting (Quadrant I, above the x axis) or may simply make
empirical observations without manipulating the research setting (below the x axis).
In contrast to natural settings, contrived settings (represented as Quadrant II) are
created by a researcher for the study. In a software engineering research context,
contrived settings include laboratories with specific and dedicated equipment to
conduct an experiment on some algorithm or software tool. Contrived settings are
characterized by a significant degree of control by the researcher. This manifests as
the set-up of specialized equipment and measurement instruments that facilitate the
execution of a study. Many experimental studies within software engineering are
conducted in such contrived settings, whereby algorithms and tools are evaluated
for performance and precision. Contrived settings are created by a researcher to
either mimic some specific or concrete class of systems (the right-hand side of
Quadrant II), or a more abstract and generic class of systems (the left-hand side
of Quadrant II). Either way, a contrived setting is always specifically set up by a
researcher, implying the researcher has a high degree of control over the research—
hence, Quadrant II is positioned at the upper half of the x axis. A contrived setting is
essential to conduct the research—without measurement instruments and other tools
such as the design of scenarios or tasks for human participants, the study could not
be performed.
34 K.-J. Stol and B. Fitzgerald
There are, however, also studies that do not rely on a specific setting. Some
types of studies can take place in any setting, and so the setting is neutral—this
is represented in Fig. 3 as Quadrant III. Researchers may or may not manipulate the
research setting; in any case, because the research setting is neutral and not specific
to any concrete or specific instance, Quadrant III is positioned at the left-hand side
of the x axis.
Finally, the fourth type of setting is non-empirical, represented by Quadrant IV
in Fig. 3. That is, this type of research does not lead to any empirical observations.
Within software engineering, non-empirical research includes the development of
conceptualizations or theoretical frameworks, and computer simulations. While
software engineering as a field of study has not traditionally been strongly focused
on the development of theory, several initiatives have emerged in recent years to
address this (Stol and Fitzgerald 2015; Ralph 2015; Wohlin et al. 2015). Quadrant IV
is positioned at the bottom of Fig. 3 because the researcher does not “intrude” on any
empirical setting. Non-empirical research is typically conducted at the researcher’s
desk or in his or her computer, through the development of symbolic models and
computer programs that mimic real settings.
Having laid the foundations for the ABC framework, we now populate the grid
in Fig. 3 with eight archetypal research strategies. The result is what we have
termed the ABC framework (see Fig. 4). Several of the research strategies will sound
familiar; for example, field study, laboratory experiment, and sample study, which
includes survey studies. Other terms such as experimental simulation may be less
known within the SE field. Section 3 presents each of these eight strategies in detail.
We now turn our attention to the last aspect of the framework: the markers A, B,
and C.
The term “ABC” seeks to convey the fact that knowledge-seeking research
generally involves actors (A) engaging in behavior (B) in a particular context (C).
Within software engineering, actors include software developers, users, managers,
and when seeking to generalize over a “population,” can also include non-human
artifacts such as software systems, tools, and prototypes. Behavior can relate to
that of software engineers, such as coordination, productivity, motivation, and
also system behavior (typically involving quality attributes such as reliability and
performance). Context can involve industrial settings within organizations, open
source communities, or even classroom or laboratory settings.
In the context of our discussion on obtrusiveness and generalizability above,
researchers will want to maximize the generalizability of the evidence across
actor populations (A), while also exercising precise measurement and control over
behavior (B) being studied, in as realistic a context (C) as possible. However,
Fig. 4 The ABC framework positions eight archetypal research strategies along two dimensions:
generalizability of findings and obtrusiveness of the research (adapted from McGrath 1984). Field
studies and field experiments sit in Quadrant I (natural research settings); experimental simulations
and laboratory experiments in Quadrant II (contrived research settings); judgment studies and
sample studies in Quadrant III (neutral research settings); and formal theory and computer
simulations in Quadrant IV (non-empirical research settings). The corners of the framework mark
the maximum potential for precision of measurement of Behavior (B), for realism of Context (C),
and for generalizability over Actors (A)
limitation which can be overcome when using this research strategy. Therefore,
research studies adopting that strategy should not be criticized on that basis.
In this section we outline the eight archetypal research strategies that are positioned
in the ABC framework in Fig. 4. We discuss the research strategies as organized by
the quadrants discussed in Sect. 2.3, starting in Quadrant I. Table 1 summarizes
the discussion for each strategy, including a metaphor that might help in better
understanding the nature and essence of the research strategy, how that setting
manifests in software engineering research, and general suggestions as to when to
use that strategy.
Field studies are conducted in natural settings; that is, settings that pre-exist the
design of the research study. Field studies are best suited for studying specific
instances of phenomena, events, people or teams, and systems. This type of
research helps researchers to understand “what is going on,” “how things work,”
and tends to lead to descriptive and exploratory insights. Such descriptions are
useful because they provide empirical evidence of phenomena that are relevant
to software engineering practitioners, students, and researchers. The findings may
provide the basis for hypotheses, which can then be further studied using other
strategies. Typical examples in software engineering research are the case studies of
the Apache web-server (Mockus et al. 2000) and agile method tailoring (Fitzgerald
et al. 2013).
Field studies are relatively unobtrusive with respect to the research setting. The
setting for field studies is akin to a jungle, a natural setting that contains unexplained
phenomena, unknown tribes, and secrets that the researcher seeks to discover and
understand (see Fig. 5). Within a software engineering context, the researcher does
not manipulate the research setting, but merely collects data to describe and develop
an understanding of a phenomenon, a specific system, or a specific development
team. This is why field studies are best suited to offer a high degree of realism of
context as the researcher studies a phenomenon within its natural setting, and not
one that the researcher manipulated.
Typical research methods include the descriptive or exploratory case study, and
ethnography (Sharp et al. 2016), but archival studies of legacy systems also fall
within the category of field studies, for example, Spinellis and Avgeriou’s study
of the evolution of Unix’s system architecture (Spinellis and Avgeriou 2019). An
alternative metaphor for such archival studies is an archaeological site rather than
a jungle. Data collection methods for field studies include (but are not limited to)
Table 1 Research strategies in software engineering
Strategy Metaphor Setting in SE When to use
Field study Jungle: a natural setting that is ideally left Software engineering phenomena in a natural To understand phenomena: How does it work?
untouched, where creatures and plants can be context, such as pair programming in industry, open how and why do project teams do what they do?
observed in the wild with a great level of detail source software projects, etc. what are characteristics of a phenomenon?
Maximum potential to capture a realistic context
Field Nature reserve: a natural setting that has some level Industry or open source software projects or teams To measure “effects” of some intervention in a
experiment of manipulation, e.g., fences, barriers, closed-off with some level of researcher intervention; natural setting, acknowledging lack of precision
sections, sections treated with some intervention interventions could include different workflows, due to confounding factors that cannot be
Guidelines for Conducting Software Engineering Research

Experimental simulation
Metaphor: Flight simulator: a contrived environment to let pilots train on specifically programmed scenarios to evaluate their behavior and decisions. Realism varies depending on resources.
Setting: Realism varies from classroom to industry settings designed by researchers, with a specific set of tasks or scenarios which recruited participants are asked to process.
Purpose: To evaluate/measure the behavior of participants on a set of tasks in a setting that seeks to resemble a real-world setting.

Laboratory experiment
Metaphor: Cleanroom/test tube: a highly controlled setting allowing a researcher to make measurements with a high degree of precision.
Setting: Classroom or research laboratory settings with a specific set-up and instrumentation to measure, e.g., the performance of algorithms or tools.
Purpose: To make high-precision measurements, e.g., for comparing different algorithms and tools. Maximum potential for precision of measurement.

Judgment study
Metaphor: Courtroom: a neutral setting to present evidence/exhibits to a carefully selected panel, asking them for a response (e.g., guilty).
Setting: Online or offline setting to solicit input from carefully selected experts after presenting them with a question on a topic or an exhibit (e.g., a new tool).
Purpose: To get input (“judgment”) from experts on a given topic, which requires intense stimulus/response communication.

Sample study
Metaphor: Referendum: a process to collect a sample of data to seek generalizability over the population. Unusable (invalid) data must be filtered out before analysis.
Setting: Online surveys conducted among a population of (typically) developers, or data collected from a software repository. Data must be checked before analysis.
Purpose: To answer generalizability questions, incl. characterization of a dataset and correlation studies. Maximum potential for generalizability over findings.

Formal theory
Metaphor: Jigsaw puzzle: an attempt to make sense of, integrate, or fit different pieces into a coherent “picture.”
Setting: Given a set of related observations and evidence regarding a topic of interest, aim to find common patterns and codify these as a theory.
Purpose: To provide a framework that can describe, explain, or predict phenomena or events of interest, while remaining consistent (generalizable) across different events within some boundary.

Computer simulation
Metaphor: Weather forecasting system: a model of the real world is programmed, capturing as many parameters as possible. Scenarios are run to make informed predictions, but the model cannot anticipate events not programmed into the simulation.
Setting: A computer program that simulates a real-world phenomenon, capturing as many important parameters as possible. Different scenarios are programmed to “run,” to explore ranges of, and interactions between, parameters of interest.
Purpose: To develop an understanding of phenomena and settings that are too complex or expensive to create in the real world.
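The computer simulation strategy described above can be made concrete with a small sketch. The model below is our own invention for illustration; the function simulate_project, its coefficients, and the parameter ranges are all assumptions rather than anything drawn from the chapter. It programs a toy model of project effort and then "runs" scenarios across ranges of two parameters of interest, team size and defect rate, to explore their interaction:

```python
import itertools

def simulate_project(team_size: int, defect_rate: float, work_units: int = 100) -> float:
    """Toy model: estimate calendar days to finish a project.

    Each work unit takes one day of effort; defects inject rework
    proportional to the defect rate; larger teams incur pairwise
    coordination overhead. All coefficients are invented.
    """
    rework = work_units * defect_rate                       # extra units caused by defects
    coordination = 0.05 * team_size * (team_size - 1) / 2   # pairwise overhead factor
    total_units = (work_units + rework) * (1 + coordination)
    return total_units / team_size                          # assume full parallelism

# Explore ranges of, and interactions between, the parameters of interest.
scenarios = itertools.product([2, 5, 10, 20], [0.0, 0.1, 0.3])
results = {
    (team, rate): round(simulate_project(team, rate), 1)
    for team, rate in scenarios
}

for (team, rate), days in sorted(results.items()):
    print(f"team={team:2d} defect_rate={rate:.1f} -> {days:6.1f} days")
```

Even this toy run exhibits the kind of insight a simulation seeks: with the invented coordination-overhead term, calendar time does not shrink monotonically as the team grows, an interaction between parameters that would be expensive to create and observe in the real world.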
38 K.-J. Stol and B. Fitzgerald
Fig. 5 Field studies are conducted in pre-existing settings that are not manipulated by a researcher,
to study and observe natural phenomena and actors. Image credits: Public domain. Source:
maxpixel.net (no date)
Field experiments are also conducted in natural settings, but unlike field studies,
this type of study involves some type of manipulation, thus imposing a greater
degree of control. That is, the researcher introduces some form of experimental
set-up by making changes to some variables of interest. After making such changes,
the researcher may observe some effect. If field studies are conducted in a “jungle,”
then the setting for a field experiment is more like a nature reserve: a dedicated
area that may be very similar to a jungle, but the researcher can introduce specific
changes to study different aspects within the reserve. Figure 6 shows a jungle with
specific patches of trees cut down; the purpose of that study was to examine the effects
of different types of forest fragmentation on wind dynamics and seed dispersal
(Damschen et al. 2014).
It is important to remember that field experiments take place in natural settings,
which should be clearly distinguished from contrived settings that provide the
setting for experimental simulations and laboratory experiments, discussed later.
A range of methods are available to conduct field experiments. These are not
limited to the traditional controlled experiment that separates a population of actors
into two or more different groups so as to make comparisons. Experimentation also
occurs when a researcher adopts the action research method; with action research, a
researcher follows a recurring cycle of making changes and evaluating the results of
those changes. While not a randomized controlled experiment, action research can
still be considered a form of experimentation.
Ebert et al. (2001) conducted a field experiment to investigate three factors that
might impact the cost of rework in distributed software development: (1) the effect
Fig. 6 A field experiment in a natural setting: forest patches cut in different configurations, labeled “Corridor,” “Unconnected rectangle patch,” and “Unconnected winged patch” (Damschen et al. 2014)
of co-location on the efficiency and effectiveness of defect detection; (2) the effect
of coaching on software quality; and (3) the effects of changes to the development
process on teamwork, and of continuous builds on the management of distributed projects.
To evaluate these effects, Ebert et al. used project data that the company had
gathered for several years.
Despite careful measurement of a range of parameters, certain factors are hard
to measure, such as “culture.” Ebert et al. divided the projects into different sets,
e.g., “within one culture (i.e., Europe).” While we can certainly generalize that
there are common attributes across different European cultures, there is no single
European culture; each European culture is quite distinct, with significant differences
even between neighboring countries such as Ireland and the UK, or the Netherlands
and Germany. Furthermore, each of the projects in the data set used by Ebert et
al. will undoubtedly have had specific obstacles, such as particularly challenging
technology or changing requirements, and strengths, such as particularly talented
staff. These factors are very hard, if not impossible, to capture, reducing the precision
of measurement.
Fig. 7 Flight simulators are experimental simulations that facilitate training and study of the
behavior of pilots in pre-programmed scenarios. Image credits: SuperJet International, distributed
under CC BY-SA 2.0. Source: SuperJet International (2011)
more akin to a greenhouse, which mimics a warmer climate. The researcher is still
interested in natural processes (e.g., how flora flourish), but the setting in which
that process is observed is artificially created for that purpose.
The greenhouse metaphor links well to the jungle and nature reserve metaphors,
but another useful metaphor for the experimental simulation is a flight simulator (see
Fig. 7). Within the flight simulator, specific events can be introduced, such as heavy
storms and rainfall. Pilots in training would be asked to perform as they would in a
real aircraft, but the research setting is considerably easier and cheaper to plan.
The level of realism achieved in experimental simulations can vary
considerably, just as it does in flight simulators. These range from low-cost set-ups
consisting of a standard PC, a budget flight yoke, and rudder pedals to high-
end, full-motion flight simulators used by professional pilots that might cost
millions of dollars. The tasks that participants are asked to perform in an experimental
simulation may also vary in realism. Whereas such tasks are part of the normal daily life
of participants in field experiments, in experimental simulations participants
are recruited and invited to perform certain tasks designed by the researcher. The
process that the researcher wishes to study is simulated, facilitating systematic
measurement and comparison. The level of realism of the task can be very high,
such as debugging a program within a professional setting (Jiang et al. 2017), or it
can be as contrived as producing and trading colored shapes, such as blue squares
and red circles (Bos et al. 2004).