0% found this document useful (0 votes)

7 views30 pages

s11219-015-9292-4

Uploaded by

Он самый

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

7 views30 pages

s11219-015-9292-4

Uploaded by

Он самый

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 30

Software Qual J (2017) 25:343–372

DOI 10.1007/s11219-015-9292-4

Test-Driven Development in scientific software: a survey

Aziz Nanthaamornphong1 • Jeffrey C. Carver2

Published online: 21 September 2015

Springer Science+Business Media New York 2015

Abstract Scientific software developers are increasingly employing various software

engineering practices. Specifically, scientists are beginning to use Test-Driven Develop-
ment (TDD). Even with this increasing use of TDD, the effect of TDD on scientific
software development is not fully understood. To help scientific developers determine
whether TDD is appropriate for their scientific projects, we surveyed scientific developers
who use TDD to understand: (1) TDDs effectiveness, (2) the benefits and challenges of
using TDD, and (3) the use of refactoring practices (an important part of the TDD process).
Some key positive results include: (1) TDD helps scientific developers increase software
quality, in particular functionality and reliability; and (2) TDD helps scientific developers
reduce the number of problems in the early phase of projects. Conversely, some key
challenges include: (1) TDD may not be effective for all types of scientific projects; and (2)
Writing a good test is the most difficult task in TDD, particularly in a parallel computing
environment. To summarize, TDD generally has a positive effect on the quality of sci-
entific software, but it often requires a large effort investment. The results of this survey
indicate the need for additional empirical evaluation of the use of TDD for the develop-
ment of scientific software to help organizations make better decisions.

Keywords Software engineering Test-Driven Development Survey

& Jeffrey C. Carver

[email protected]
Aziz Nanthaamornphong
[email protected]
1
Department of Information and Communication Technology, Prince of Songkla University, Phuket
Campus, Phuket, Thailand
2
Department of Computer Science, University of Alabama, Tuscaloosa, AL, USA

123
344 Software Qual J (2017) 25:343–372

1 Introduction

Scientific software is software that seeks to support or advance scientific or engineering

research via computation, rather than via experiment. The development of scientific
software allows researchers to investigate questions that may be impossible to investigate
experimentally. The criticality of the problems addressed through scientific software (e.g.,
weather forecasting, global climate change, genomics, human health, and drug discovery)
make it an important part of the software ecosystem (Carver 2011). Among the key factors
that characterize scientific software development is the presence of highly dynamic
requirements. Due to the nature of the task, i.e., investigation of new scientific or engi-
neering principles, it is typically not possible for researchers to completely define all
requirements a priori. Furthermore, successful testing of scientific software is problematic
due to the lack of elicitation and specification of requirements in the early phases (Sletholt
et al. 2012). Therefore, scientific developers need to employ appropriate software engi-
neering (SE) practices to mitigate these challenges.
One such practice is Test-Driven Development (TDD), introduced as a part of eXtreme
Programming to minimize developer effort (Beck 2002). In TDD, developers first write
tests for individual units/functions, then write only enough code to make those tests pass,
then refactor (i.e., reorganize or cleanup) the code. We hypothesize that TDD can be useful
to support the development of scientific software for the following reasons:
1. Support for incremental development Rather than requiring all functionality to be
known at the beginning, TDD allows developers to work on one function at a
time (Erdogmus et al. 2005; Janzen and Saiedian 2005).
2. Dealing with unmanageable code Because TDD could quickly lead to unmanageable,
spaghetti code, the refactoring step allows developers to clean up as they proceed.
3. Increase confidence The quick iteration loop in TDD helps developers ensure that new
functionality does not break existing functionality (Beck and Andres 2004).
Several studies that investigated the impact of TDD on development of traditional, i.e.,
non-scientific, software have shown mixed results. For example, Desai et al. (2008) sur-
veyed academia to gather evidence of the usefulness of TDD and found that it improves
software quality and productivity. Conversely, a literature review by Causevic et al. (2011)
highlighted a number of challenges that limit its adoption in industry. In spite of the
potential benefits TDD could provide to the developers of scientific software, our previous
work identified some factors that could limit its adoption (Nanthaamornphong et al. 2014),
including:
1. difficulty writing test cases for a complete simulation;
2. refactoring requires considerable practice; and
3. lack of knowledge about how to refactor.
Therefore, there is a need for empirical evidence to better understand the effects of TDD on
the development of scientific software.
The objective of this work is to gather information from the scientific community about
the appropriateness of TDD for their projects. To address this objective, we conducted a
community survey targeting scientific developers who had experience with TDD. The goal
of the survey is to provide information to help scientific developers and teams decide
whether to adopt TDD in their own projects. Specifically, the survey sought to answer the
following questions:

123
Software Qual J (2017) 25:343–372 345

1. What are the effects of using TDD, from the developers’ perspective?
2. What are the difficulties of using TDD?
3. Which testing methods do developers use?
4. Which refactoring techniques do developers use?
Based on the results of the survey reported in the paper, the primary contributions of this
work are:
• The benefits of TDD reported by the survey respondents should encourage scientific
developers to consider adopting TDD to improve software quality.
• The problems reported by survey participants highlight the need for additional
empirical evaluations of adopting TDD in various contexts.
• The code improvement methods and refactoring techniques reported by the survey
respondents provide practical suggestions for other developers who wish to adopt TDD
in their projects.
• From a SE point of view, the results indicate that adopting SE practices like TDD in a
non-traditional environment is not always a positive experience.
The remaining of this paper is organized as follows. Section 2 provides the background
and concepts related to this work. Section 3 describes the methodology of our survey.
Section 4 presents the results of the survey. Section 5 discusses the results. Section 6
describes the study limitation. Section 7 concludes and discusses the future direction for
research.

2 Related work

Because there has been little work on the use of TDD for the development of scientific
software, other than our own, this section focuses on three related background topics: TDD
in a traditional SE context, characteristics of the scientific context that seem well-suited for
TDD, and the refactoring practice within TDD.

2.1 TDD in traditional SE

Our review of the literature found both positive and negative effects of using TDD in a
traditional software development environment. On the positive side, in their systematic
meta-analysis of 27 studies that investigated the impact of TDD on software development,
Rafique and Misic reported that TDD helped the development team improve the software
quality (Rafique and Misic 2013). Moreover, another study showed that TDD helped a
team at IBM increase the quality of their software products (Sanchez et al. 2007). These
results suggest that TDD can have a positive impact on software quality.
On the negative side, two systematic literature reviews regarding the adoption of TDD
in industry point out some challenges of using TDD. First, Causevic et al. (2011) identified
seven key factors that limit the adoption of TDD in traditional, i.e., non-scientific,
industrial environments:
1. increased development time;
2. insufficient TDD experience/knowledge;
3. insufficient design;
4. insufficient developer testing skills;
5. insufficient adherence to the TDD protocol;

123
346 Software Qual J (2017) 25:343–372

6. tool-specific limitations; and

7. legacy code.
Second, Kollanus (2010) found that there was not yet enough empirical evidence on the
effects of TDD on internal and external software quality.
These prior studies reveal some interesting concerns relative to the context of this
article: (1) most studies focus on the use of TDD in a traditional software development, and
(2) the effect of TDD might be different when used for scientific software development.
Therefore, there is a need to further investigate these effects to provide scientific devel-
opers with the information they need to make informed decisions.

2.2 How TDD fits the scientific context

There are two key requirements of scientific software development that match well with
the characteristics of TDD. First, rather than focusing significant effort on the up-front
requirements and design analysis, scientific developers repeatedly implement small
increments of functional code, taking into account any changes in the requirements or
context (Sletholt et al. 2011). TDD fits well into this type of environment because TDD
does not require developers to gather and document all requirements up-front. Second,
because simulations often need to be conducted quickly, many scientists work with limited
time and a defined schedule for each iteration (Sanders and Kelly 2008). Process-heavy
software development approaches conflict with this need for quick turn-around of results
for publications and program review deadlines (Carver et al. 2007). Lifecycle models like
the Waterfall model (Ruparelia 2010), are inappropriate for such a setting (Nan-
thaamornphong et al. 2013). Therefore, scientific software development must employ
flexible, lightweight processes like TDD.

2.3 Refactoring

One of the key aspects of TDD is refactoring (Opdyke 1992). The most widely cited
definition for refactoring is ‘‘a change made to the internal structure of software to make it
easier to understand and cheaper to modify without changing its observable behav-
ior’’ (Fowler 1999). In other words, refactoring improves the internals of the software
without changing the externals. There are a number of common reasons for performing
refactoring.
• To ease the addition of new code Developers can quickly add new code without
worrying about how well that code fits the overall structure of the system and then
clean up the code later through refactoring.
• To improve the design of existing code By continuously refactoring the design of code,
refactoring improves the quality of code. As a result, developers can easily extend the
maintained code.
• To help developers avoid defects Refactoring is a method to clean up code, which
minimizes the chances of introducing defects.
• To gain a better understanding of the code Unclear code is difficult for developers to
comprehend and must therefore be removed by refactoring.
Researchers have proposed a number of refactoring techniques for Fortran, a pro-
gramming language widely used in scientific software (Orchard and Rice 2013; Overbey
et al. 2005, 2009). The primary goal of these techniques is to improve system performance.

123
Software Qual J (2017) 25:343–372 347

Additionally, the automated refactoring tool Photran (Eclipse 2013) helps developers
perform refactoring. Photran is an IDE based on Eclipse that includes 39 refactoring
methods, such as replacing common block and block data subprograms with module
variables, removing computed goto, and requiring explicit interface blocks.
Therefore, refactoring is an important step in the TDD process. The needs of the
scientific software development context suggest that the use of refactoring could be quite
beneficial. The fact that there are a number existing tools is also encouraging when con-
sidering whether scientific software developers might choose to employ TDD.

3 Methodology

This section describes the methodology that we used to design, execute and analyze the
survey.

3.1 Survey design

To ensure we gathered responses only from scientific developers who had experience with
TDD, the first part of the survey contained a series of screening questions to assess the
respondent’s experience with TDD. Only those respondents with TDD experience were
presented with the TDD-specific questions, which asked them to assess the effectiveness of
employing TDD, including writing tests and refactoring. Additionally, we asked specific
questions about testing and refactoring activities (e.g., testing problems, refactoring
problems, refactoring techniques).
The survey contained questions of different formats: multiple choice, Yes/No, self-
assessment items (using a five-point scale), and open-ended questions. Some questions also
contained an ‘‘Other (specify)’’ option to allow the respondents to provide additional
information. To help ensure the quality and completeness of the survey, we employed an
external scientific expert to evaluate the questions and provide feedback, which we
incorporated into the final survey.
Appendix 1 provides a complete list of the survey questions. Table 1 maps those survey
questions to the research questions described in Sect. 1. Note that survey questions 1–9
were the screening questions and therefore do not map to a specific research question.

3.2 Survey population

We built a list of 300 unique email addresses for potential survey respondents that
included:

Table 1 Mapping between research questions and survey questions

Research questions Survey questions

RQ1. What are the effects of using TDD, from the developers’ perspective? 10, 11
RQ2. What are the difficulties of using TDD? 12, 14, 15, 19, 24, 25
RQ3. Which testing methods do developers use? 18, 20, 21, 22, 23
RQ4. Which refactoring techniques do developers use? 13, 16, 17

123
348 Software Qual J (2017) 25:343–372

• attendees of the Software Engineering for Science workshop series,1

• authors who had publications that either focused on scientific software development or
described the adoption of SE practices in scientific software development (134 email
addresses). To identify those papers, we used the following keywords to search the
IEEExplore, ACM Digital Library and SpringerLink databases: ‘‘scientific software
development’’, ‘‘agile scientific’’, and ‘‘testing scientific’’ (126 email addresses).
• members of other related scientific software email lists to which we had access (40
email addresses).

3.3 Survey pilot

We conducted a pilot study to ensure that the survey questions were comprehensible and
valid with respect to the research questions. We emailed the pilot survey to a randomly
selected 5 % of the targeted participants (15 of the 300 email addresses). Note that we did
not include the pilot participants in the solicitation for the main study. We evaluated the
responses from the four pilot subjects who responded to determine whether they under-
stood the questions and provided answers that were sufficient for analysis. We used the
qualitative analysis process described in Sect. 3.5 to check the responses. In addition, we
emailed the four respondents directly with the following questions:
• Were all of the words on the survey understandable? (If not, please describe)
• Were all response choices appropriate? (If not, please describe)
• Was the format and layout easy to follow? (If not, please describe)
• Did the survey logic (i.e., question skipping) make sense? (If not, please describe)
Based on the responses, we made some minor adjustments to the wording of a few
questions to fix typos and increase clarity.

3.4 Survey distribution

After the pilot, we emailed the survey’s web link to the remaining population (285 email
addresses). To increase the response rate, we sent reminder emails after one week and after
two weeks. After we sent the email to the target list, some recipients informed us that they
had forwarded the survey link to additional potential respondents and some posted the link
on scientific community web sties, such as Software Carpentry.2 Therefore, we are not able
to accurately determine the total number of people who were solicited to participate in the
survey beyond the 285 we emailed directly.

3.5 Survey analysis

To analyze the survey responses, we performed a qualitative analysis process, called ‘‘open
coding’’ (Anselm and Juliet 1990). In this commonly used analysis methodology,
researchers identify and tentatively name the conceptual categories into which the phe-
nomena observed can be grouped. This methodology includes the following steps:
(a) creating categories of answers, (b) identifying and coding each answer carefully,
(c) organizing the answers into categories, and (d) comparing each new answer to the

1
http://SE4Science.org/workshops.
2
http://www.software-carpentry.org.

123
Software Qual J (2017) 25:343–372 349

existing categories to determine whether it fits into one of them. The goal of the coding
process is to allow patterns and explanations emerge directly from the data. When ana-
lyzing the output of the coding process, we treated the results as nominal or ordinal and
reported counts and cross-tabulations. To ensure valid results and minimize bias, each
author coded the responses separately. In addition, each author coded the responses
multiple times, each time obtaining the same results. Then, the authors compared their
results and discussed to resolve any discrepancies.

4 Results

This section begins with an analysis of the demographics of the survey respondents. Then it
continues with the results relative to each of the four research questions described in
Sect. 1.

4.1 Demographics

We conducted the survey from December 2013 to January 2014. A total of 77 people
responded to the survey of which 64 reported experience with TDD. These 64 respondents
answered the detailed questions about the use of TDD on their projects. The diversity of
the sample can be characterized on the following attributes:
• Experience with TDD (Fig. 1);
• Geographical distribution (Fig. 2);
• Type of organization (Fig. 3);
• Type of project, where research indicates a main goal of publishing a paper and
production indicates a main goal of producing software for real users (Fig. 4).

4.2 RQ1. What are the effects of using TDD, from the developers’
perspective?

The following subsections address the effects of TDD on software quality (Sect. 4.2.1) and
the overall benefits of using TDD (Sect. 4.2.2).

Fig. 1 Level of experience

123
350 Software Qual J (2017) 25:343–372

Fig. 2 Distribution of
respondents’ locations

Fig. 3 Types of organizations

Fig. 4 Types of projects

4.2.1 Software quality

Survey question 10 asked the respondents to rank the relative importance of the eight
software quality attributes described by the ISO/IEC 25010:2011 standard (ISO/IEC
2011). To ensure that the respondents understood the meaning of each characteristic, the
survey included a description of each characteristic drawn from the standard. We used
these rankings to calculate the relative importance of each attribute by giving the #1 ranked
attribute 8 points, the #2 ranked attributed 7 points, on down to the #8 ranked attribute 1
point. Figure 5 shows the results of this analysis in order of importance based on overall
points.

123
Software Qual J (2017) 25:343–372 351

Fig. 5 The importance of software quality characteristics

Survey question 11 asked the respondents to describe the effectiveness of TDD relative
to the most important software quality attribute they indicated in Question 10. Our qual-
itative analysis of the responses to this open-ended question resulted in the following
categories. For each category, we provide a list of the types of answers that constitute that
category:
• Very effective answers contain phrases like ‘‘very effective’’ or ‘‘very excellent’’,
• Effective answers contain other positive words,

Fig. 6 The effectiveness of employing TDD on software quality

123
352 Software Qual J (2017) 25:343–372

• Neutral answers contain words like ‘‘fair’’ or ‘‘adequately’’,

• Ineffective answers contain other negative words, and
• Unable to evaluate the respondents either indicated they could not provide an answer or
included words like ‘‘no idea’’.
Figure 6 shows that overall the use of TDD was either effective or very effective relative to
the most important software quality attributes. Note that because we did not ask the
respondents about each specific attribute, this analysis applies to the responses as a whole
and not to specific attributes.

4.2.2 Benefits of TDD

Survey question 19 asked the respondents to describe the benefits and challenges of
employing TDD in their projects. This section only describes the benefits, whereas the
challenges are explained in Sect. 4.3.4. Using the qualitative data analysis process
described in Sect. 3.5, we grouped the responses into five primary benefits (labeled ‘B#’
below). The numbers in parentheses represent the number of responses grouped into each
category. Note that each respondent’s answer could have been coded into multiple
categories.
B#1. Ensure the quality of code (22) The respondents stated that while implementing the
system, they gain confidence by being able to test very specific code behavior. Tests are
written in conjunction with the development of functionality, providing valuable regression
notifications when combined with continuous integration testing. For example, one
respondent said: ‘‘The main benefit is that I have confidence in my software. Quite literally,
I sleep better at night’’.
TDD enforces the disciplines of writing testable code and systematically testing that
code during development. Many respondents also underscored that writing unit tests before
code is better than writing tests after coding, when it is more difficult to trust the code.
TDD allows developers to safely change the code and let it evolve without the worry of
breaking more parts of the code with each new change.
B#2. Improve software changeability (13) Use of TDD increased developers’ confi-
dence that new capabilities would not break existing capabilities, results, or interfaces. In
addition, the testing artifacts are checked over time to ensure that modifications do not
break the original function. A software package with thorough and easy-to-use tests can be
refactored with confidence over time, thereby improving the longevity of the product.
Conversely, software that lacks a solid test suite can became extremely fragile and cannot
be refactored with confidence.
B#3. Improve software maintainability (11) Refactoring makes it easy to add features to
the software, aids in debugging, and facilitates the fixing of future problems. The tests also
provide documentation describing test drivers, inputs/outputs, and examples of program
execution. For example, one respondent said: ‘‘Having unit tests greatly improves main-
tainability, assisting refactoring and new development, but this is true when tests were
developed before the functional code’’.
B#4. Identify problems early (10) The respondents reported that they could find prob-
lems/bugs earlier in the development process. With TDD, the respondents could remove
defects and improve correctness at the beginning of the project, reducing the overall
number of defects. One respondent reported that s/he could also find performance bot-
tlenecks quickly.

123
Software Qual J (2017) 25:343–372 353

B#5. Better understanding of software requirements (10) The process of developing test
cases first requires the developers to understand the requirements clearly before writing the
code. The developers gained a deeper understanding of the actual problem before coding.
They were also forced to think through edge cases before writing code and thus encoun-
tered fewer surprises later in the process.

4.3 RQ2. What are the difficulties of using TDD?

This section first reports the overall difficulties of using TDD in scientific projects
(Sect. 4.3.1). Then, it describes the problems and solutions specifically related to testing
(Sect. 4.3.1) and refactoring (Sect. 4.3.3).

4.3.1 Difficulties

Survey question 12 asked the respondents to rank the difficulty of the following TDD
activities: implementing the code, writing a test, and refactoring. The results show that
writing a test was the most difficulty activity, followed by implementing the code.
Theoretically, the TDD process does not require software design before the code is
implemented because the developers have to write only the code corresponding to the test.
In practice, however, the developers may need to design the system before testing or
implementing actual code. In terms of when they performed design activities, 63 %
designed the software before implementing the code (Survey question 14) and 67 %
designed the software while coding (Survey question 15), with some overlap between the
two groups. This evidence may imply that it is difficult to adhere strictly to the TDD
protocol, which does not require a design stage before implementing the code.

4.3.2 Testing problems and solutions

Survey question 24 asked the respondents which problems they encountered when writing
tests and how they solved those problems. The respondents noted a range of testing issues
when employing TDD. We grouped the results into six testing problems (TP) and five
testing solutions (TSL). The numbers in parentheses represent the number of respondents
whose answers were grouped into each category. Note that each respondent’s answer could
have been coded into more than one category.
TP#1. Difficulty of writing a test (19) The main problem when writing a test is to write a
good test. Many respondents explained that writing a good test can be more difficult than
implementing the actual code, particularly for tests that do not have to be changed when
the code changes. For example, one respondent said: ‘‘It’s hard to write good tests,
especially tests that won’t have to be changed if you change the internals of functions’’.
Furthermore, during refactoring, developers sometimes realize that the code requires
changes, which requires modifications to the tests and to the documentation (e.g., com-
ments, or developer’s documentation). The respondents also explained that testing without
a tool or framework made it more difficult to write tests; thus, developers were less likely
to do a good job. From the perspective of senior developers, the experience of each new
developer is different, so it is difficult to expect their tests to be written well.
TP#2. Complex application (14) Based on the responses, we classified these issues into
3 groups: complicated algorithm, numerical computations, and parallel computing. In
scientific software, there are many parts in the code that consist of complex algorithms. The

123
354 Software Qual J (2017) 25:343–372

complex code requires complicated tests. Many problems with complex code occur at
production runtime, such as deadlocks and repetitive connection drops; it is thus difficult to
test these problems in advance. Furthermore, the respondents explained that tests for
numerical computations are difficult because the developers do not know the right answer
with full confidence. Regarding parallel computing, the respondents explained that writing
tests to examine concurrency issues in parallel computing was difficult. In particular, the
‘right answer’ for a complete simulation is generally not known a priori.
TP#3. Code coverage (13) Theoretically, tests should cover most of the code, but
respondents explained that the code coverage does not always reach 100 %. In particular, it
is difficult to provide comprehensive coverage for scientific code with many dynamic parts.
One respondent reported that the code coverage analysis tool does not work for their C??
template code. In addition to the traditional definition of code coverage, these responses
also include issues of test validity, i.e., whether a test case tests what it is really intended to
test. In scientific software development, it is difficult to correlate verification tests and
issues with validation tests, so the developers focus on testing based on code functionality.
TP#4. Lack of SE practices, tools, and standard (7) Respondents thought that writing
the tests required them to understand SE practices. One problem of writing tests is
introducing unit testing when not all developers have experience. Scientific developers
from a research environment tend to be less disciplined in building and maintaining tests
than required. Additionally, existing automated testing tools, which were designed for
software engineers, are not always easy for scientists to use (e.g., wording in the menu, no
support for Fortran). Finally, a respondent explained that he/she must assume a tolerance
for the computed output, but there is no standard for creating the tolerance value.
Therefore, the tolerance may be considerably larger than would be ideal.
TP#5. Time consuming (6) It takes longer to write the test, which affects budgets and
planning. For example, one respondent said: ‘‘it is very hard to justify spending a lot of
time on (tests don’t produce ‘‘results’’)’’. A respondent indicated that there is more test
code than actual program code. The testing code becomes a major barrier to improving the
code. For example, a change in the program that might involve 500 LOC could require
changing 100 lines of testing code. This amount of code makes the developers less inclined
to make changes in the production code that require test modification. Adequate testing is a
full-time job, especially when a new change is introduced by the customers or users on a
tight schedule. In particular, the research environment often requires developers to change
their code in ways that require test modification as well.
TP#6. Code or requirement is changed (4) The survey participants indicated that the test
must change when the requirement or API changes. For example, if the algorithm changes,
the test may no longer be sufficient because the random number generator is used in a
different manner.
TSL#1: Use a suitable tool (11) The respondents believed that a good testing framework
or automated testing tool would facilitate writing tests. The respondents also suggested that
the tool would save time compared to writing the test manually. They also recommended
choosing appropriate tools for the projects.
TSL#2: Experience (11) Many respondents indicated that writing a good test requires
experience, particularly when testing complex code or algorithms. Furthermore, experi-
enced developers could confidently provide the team with the solutions to testing problems.
Scientific developers often need to learn the basics of why software is tested and what
constitutes a good test. A collection of best practices and examples would be quite helpful
in this regard. In particular, mentoring or coaching on test writing could be of great benefit.

123
Software Qual J (2017) 25:343–372 355

TSL#3: Redesign the test (5) When the code coverage percentage is lower than desired,
developers must redesign the unit tests. This process requires another developer to review
the test, in addition to the code. In some cases, using a combination of unit testing and
system testing can help solve the code coverage problem. Also, considering the number of
bugs would help developers redesign the unit test to increase the code coverage.
TSL#4: Understand the requirements (4) A clear understanding of the requirements is
necessary to write the test, particularly when the developers are working on complex
problems, such as numerical or multi-physics. As in-depth understanding could improve
the test coverage and reduce the required amount of time.
TSL#5: Gain confidence (3) When developers work on simulation software involving
floating-point calculations and parallelism, they must check simpler output to gain confi-
dence. For example, to ensure that the testing of a stochastic model is correct, scientists
need to run multiple simulations and analyze the output.
In summary, Table 2 presents the mapping between problems and solutions. The
checkmarks represents solutions suggested by survey participants. However, based on our
own experience, we also suggest additional solutions to some problems (represented by
exclamation marks in the table). The last column presents the number of solutions for each
problem, including respondents’ solutions and our suggestions.
Additionally, in the traditional software engineering literature, the following methods
have been proposed to address some of the problems identified in Table 2.
• To gain better understanding of the requirements and more confidence, the developers
can employ ‘‘specification by example’’ instead of abstract or prose (Koskela 2007).
• In Test-Driven Development: By Example (Beck 2002), Kent Beck recommends two
approaches to improve the code coverage, including: (1) to write more tests and (2) to
simplify the logic of the program.

4.3.3 Refactoring problems and solutions

Survey question 25 asked the respondents about the problems they encountered during
refactoring and their solutions to those problems. Our analysis resulted in five refactoring
problems (RP) and four refactoring solutions (RSL). The numbers in parentheses represent

Table 2 Testing problems and solutions U = solution suggested by participants ! = solution based on our
experience
Problems Use a Experience Redesign Understand Gain Total (survey,
suitable the test requirements confidence our experience)
tool

Writing a test U U – ! ! 4 (2, 2)

Complex application – U – U U 3 (3, 0)
Code coverage ! ! U U – 4 (2,2)
Lack of SE practices, U – – – – 1 (1, 0)
tools, and standard
Time consuming U ! - U – 3 (2, 1)
Requirement is – – – – – 0
changed

123
356 Software Qual J (2017) 25:343–372

the number of respondents whose answers were grouped into each category. Note that each
respondent’s answer could have been classified into more than one category.
RP#1. Dependence on unit tests (8) Because the implementation is often driven by the
tests rather than by a requirements specification, the respondents indicated that the process
of refactoring is difficult if the unit tests are not well designed. However, software is nearly
impossible to refactor without thorough unit tests that are readily available. For example,
one respondent said: ‘‘Bad unit tests were a hindrance to refactoring, because they tested
the implementation details, not the requirements’’. Similarly, it is also hard to perform
refactoring if the code coverage of the test suite is low.
RP#2. Dependence on the architecture design (7) Although testing is essential to
refactoring, poor architecture design makes refactoring difficult or impossible. Further-
more, the respondents indicated that if the initial architecture or system design is poor, it is
occasionally necessary to redesign the system and revisiting large portions of it.
RP#3. Dependence on the development environment (7) The development environment
includes the platform, programming languages, tools, and interacting components. For
example, one respondent reported that refactoring an application implemented with C is
more difficult than refactoring an application implemented with Python, because Python is
object-oriented and C is not. The respondents also indicated that it is difficult to refactor
without tools. For example, providing code to a large number of users requires a certain
level of API compatibility with older versions. Refactoring is impossible without the
appropriate tools.
RP#4. Lack of knowledge regarding when and how to refactor (6) One common
problem is developers’ failure to understand refactoring and recognize its benefits. There is
lack of knowledge of how to refactor (at all and/or efficiently) or why refactoring is
conducted (because it does not add functionality). Refactoring also may require knowledge
of advanced programming techniques, e.g., template code and functional programming.
The respondents also found that it is difficult to decide when to start refactoring. Problems
are exacerbated when refactoring is delayed. If the refactoring affects many developers’
code, it is difficult to mitigate the problem within a short period of time.
RP#5. Legacy code (3) Refactoring is difficult when the respondents are working with a
diverse legacy code base. More specifically, for legacy code, the developers may not
actually have adequate tests to ensure that a refactoring does not cause problems.
RSL#1. Coaching developers (8) Education about refactoring is necessary. In the con-
text of a research environment, the team leader may need to convince the members that
refactoring saves time and improves software quality. Experienced developers should
provide guidance and help the other developers when they find problems. Regarding the
knowledge about the appropriate time to refactor, generally, refactoring can be performed
at any time during the development process. There is no rule regarding the time for
refactoring. The respondents’ suggestions indicate there are three options for refactoring:
(1) performing several rounds of refactoring would provide better results than would
performing only a few rounds, (2) refactoring when the code begins to become disorga-
nized, or (3) refactoring shortly after release, before he/she becomes tangled in new fea-
tures for the next release. Additionally, to reduce the effort of differing experience among
developers, each developer should try to use a simple refactoring method initially.
RSL#2. Use refactoring tools (7) The respondents explained that automated refactoring
tools would be useful, particularly when working on large and complex applications.
Additionally, the refactoring tool should be able to integrate with the IDEs.
RSL#3. Redesign the software architecture (4) In some cases, it is very difficult to
refactor poor code. One solution is to redesign the software. The respondents indicated that

123
Software Qual J (2017) 25:343–372 357

redesigning the software architecture might save time compared to refactoring code in
some cases.
RSL#4. Redesign the unit tests (2) Revisiting the unit tests would help developers reduce
the time involved in refactoring, especially redesigning some of the worst tests.
In summary, Table 3 presents the mapping between problems and solutions. The
checkmarks represents solutions suggested by survey participants. We also suggest addi-
tional solutions to some problems based on our own experience (represented by excla-
mation marks in the table). For each problem the last column presents the number of
solutions, including respondents’ solutions and our suggestions.
In addition, the following methods have been proposed in the traditional software
engineering literature to address some of the problems identified in Table 3.
• Abdel-Hamid (2013) proposed the following legacy code refactoring method: (1)
quick-wins refactoring (e.g., remove dead code, remove duplicated code), (2)
Decompose into the components, and (3) create automated-tests for components.
• Mens and Tourwé (2004) provides an overview of refactoring techniques in various
dimensions, including refactoring activities, specific techniques, software artifacts
being refactored, and effects of refactoring. This work would help developers gain
more knowledge about the refactoring.

4.3.4 Challenges

Survey question 19 asked the respondents to describe the challenges of employing TDD in
their projects. The goal of this question was to understand the challenges of TDD as a
whole rather than the difficulties of each TDD step, which was the focus of Question 12.
We grouped the responses into four primary challenges (C). The numbers in parentheses
represent the number of responses grouped into each category. Note that each respondent’s
answer could have been classified into more than one category.
C#1. Spending an excessive amount of effort (32) Generally, several survey respondents
reported excessive time spent writing the test, and thus, the developers often skipped
testing in the final stages. Additionally, adding TDD to an existing project requires
enormous effort. One respondent noted that TDD can only be performed by developers

Table 3 Refactoring problems and solutions U = solution suggested by participants ! = solution based on
our experience
Problems Coaching Use Redesign the Redesign the Total (survey, our
developers refactoring software unit test experience)
tools

Dependence on unit – – – U 1 (1, 0)

tests
Dependence on the – – U – 1 (1, 0)
architecture design
Dependence on the – U – – 1 (1, 0)
environment
Lack of knowledge U – ! – 2 (1, 1)
Legacy code ! ! – ! 3 (0, 3)

123
358 Software Qual J (2017) 25:343–372

who understand what code does in detail and are willing to invest time in code quality.
TDD works well if the developers know exactly what the code should produce, which
requires the developers to spend an excessive amount of time on the requirements or
specifications of the project. In terms of project management, it might be difficult to
demonstrate overall progress to the customers or users. For example, one respondent said:
‘‘ TDD requires more time before code is written, harder to show progress of the ’whole’ in
the nearer term’’.
Additionally, the amount of code could be twofold or threefold greater than the amount
of code that the developers would write without TDD. Furthermore, the developers have to
maintain the test code. The increased amount of code has a higher cost of maintenance.
In an academic environment, writing a test case prior to writing actual code can detract
from time and resource constraints on obtaining and publishing research results in the short
term. Similarly, there is not a sufficient amount of research funding to write the amount of
testing code that real TDD would require. There is a trade-off between the rigorous
adoption of TDD and idea exploration for the development of functions with a significant
research component to them. In addition, many scientific software teams contain scientific
domain experts (rather than trained computer scientists). TDD does not come naturally to
many scientific developers. Sometimes TDD just does not fit the context of the develop-
ment environment. The adoption of TDD requires substantial effort to promote the vision
and maintain the necessary discipline.
C#2. Difficult to write tests (16) Three specific types of tests that are challenging to
write include:
1. Integration tests for large numerical computations or complex functions. In scientific
software, for example, obtaining output is often computationally expensive, there are
issues with numerical precision, and in some cases, it is impossible to compute the
expected output. Additionally, the context can change between the tests and code,
especially in the development of complex code. Several respondents noted that it is
extremely difficult to test concurrency issues.
2. Tests for research code because the expected results are typically unknown.
3. Tests that achieve high code coverage for complex capabilities.
C#3. Learning curve (12) TDD is a new concept in the scientific domain. The idea of
writing tests before code is a new experience for many scientists and scientific developers.
Therefore, there is a learning curve associated with TDD for many developers. It can also
be difficult to convince large organizations of the financial benefits of TDD, especially
those organizations that are not interested in using SE practices, which are required for
refactoring. For example, one respondent said: ‘‘it is difficult to learn, hard to justify to
some management types’’.
C#4. Need to set up a new environment (8) Some survey participants reported that the
organizations must create an adequate TDD environment and maintain a consistent level of
testing across all developers. TDD cannot be successful without support from management
and developers. Although some of the benefits of TDD tend to accrue to a development
team or research group as a whole over time, an individual developer might view it as
significant additional work with only a modest short-term payoff. Additionally, some
organizations employ sub-contractors to implement a system. TDD works when both
parties operate from a common set of rules. Therefore, the goal, process, and protocols
must be established for collaboration between the organization and sub-contractors.
In the scientific software development environment, the developers lack appropriate
tools to readily perform TDD. The currently available tools are typically not ideal for

123
Software Qual J (2017) 25:343–372 359

Table 4 Testing methods

Survey Testing Definition Agree
question methods (%)

Q21 Unit testing Testing the smallest testable units (e.g., class, module, function) in a 95.3
software in isolation. Usually done with a specialized unit testing
framework
Q22 Integration Testing that occurs after Unit Testing and is intended to ensure that the 89.1
testing units interact properly
Q23 Regression Returning test cases (which were successful in the past) to ensure that 85.9
testing changes to the code have not introduced bugs

working with numerical software because they do not handle tolerance issues adequately.
For example, in Fortran, only a few unit testing frameworks are currently maintained, and
few Fortran developers actually have and use any of these frameworks.

4.4 RQ3. Which testing methods do developers use?

The first step was to determine whether scientific developers defined testing in the same
way as traditional software engineers. Survey questions 21, 22, and 23 asked respondents if
the agreed with standard definitions for unit testing, integration testing, and regression
testing, respectively. Table 4 shows the definitions3 along with the level of agreement.
Overall, there was a very high level of agreement. The disagreements can be summarized
as follows:
• Unit testing Only three respondents disagreed with the given definition. One disagreed
with the second sentence, stating that the framework should be changed to ‘general’
rather than ‘specialized’. Two respondents thoughts that ‘smallest’ was difficult and too
restrictive.
• Integration testing Seven respondents disagreed with the given definition. Five did not
think that integration testing was separate from unit testing or necessary to perform
after unit testing. One respondent thought that ’properly’ had no meaning, and another
one did not think that integration testing was important in scientific software
development.
• Regression testing Nine respondents disagreed with the given definition. All of those
respondents said regression testing does not ensure that ‘bugs are not introduced’.
Instead, it only ensure that the tested invariants are preserved. Furthermore, regression
testing does not always fix everything.
Survey question 20 asked the respondents to explain the testing methods used in their
projects. The responses indicate that respondents interpreted this question in different
ways. Because many of the respondents did not provide a detailed explanation for their
answer and many of the terms could have multiple meanings, here we simply report the
answers provided and do not attempt to provide our own opinion on what the respondents
meant. As such, there were two types of answers:

3
definitions adapted from https://msdn.microsoft.com/en-us/library/aa292484(v=vs.71).aspx.

123
360 Software Qual J (2017) 25:343–372

1. Multiple Some respondents reported testing at multiple levels of the software lifecycle,
including the following combinations: (1) unit, regression, and integration testing, (2)
unit, regression, integration, and system testing, and (3) unit and regression testing.
2. Single Other respondents reported use of only one approach. This list of approaches is
quite heterogeneous, ranging from levels of testing to types of testing. The responses
included (1) unit testing, (2) performance testing, (3) comparing known values, (4)
verification testing, (5) validation testing, (6) evaluating the code coverage, (7)
automated testing tool, (8) white box, (9) smoke test, (10) pre-release testing, (11)
positive testing, (12) negative testing, (13) black box testing, and (14) ad-hoc testing.
Figure 7 summarizes the number of respondents that provided each answer.
Survey question 18 asked whether the respondents used automated testing tools. The
majority, 80 %, reported that they did use automated testing tools. The responses indicate
that there are 24 different automated testing tools. While a complete list of the tools
appears in Appendix 2, Fig. 8 presents the seven most commonly mentioned, including:
1. CMake 4 A tool designed to build, test and package software. It is used to control the
software compilation process. CMake is invoked on the project’s source directory,
parses the text files describing the build process, and generates a native build chain for
the desired platform and compiler. CMake provides options to the user with which the
build process can be customized.
2. CTest 5 CTest is an automated testing tool distributed with CMake. CTest can perform
several operations, including configure, build, and execute predefined tests. It also
includes advanced features for testing, such as code coverage and memory checking.
3. JUnit 6 JUnit quickly became the de facto standard framework for writing and running
automated testing in the Java programming language.
4. Google Test or GTest 7 It is a framework for writing the C/C?? tests on a variety of
platforms, including Linux, Window, and OS X. Gtest provides various options for
running the tests.
5. Python Nose 8 Nose is a Python package that provides an alternate test discovery and
running process for unit tests.
6. Python unit test 9 Python unit test is similar to JUnit, but for the Python language.
7. CxxTest 10 CxxTest is a unit testing framework for C?? that is similar to JUnit.
CxxTest supports a very flexible form of test discovery.

4.5 RQ4. Which refactoring techniques do developers use?

This section describes how respondents identified poor code in need of refactoring and how
they refactored their code.

4
http://www.cmake.org.
5
http://www.cmake.org/cmake/help/v2.8.8/ctest.html.
6
http://www.junit.org.
7
https://code.google.com/p/googletest.
8
http://nose.readthedocs.org.
9
http://docs.python.org.
10
http://cxxtest.com.

123
Software Qual J (2017) 25:343–372 361

Fig. 7 Testing approaches

Fig. 8 Automated testing tools

123
362 Software Qual J (2017) 25:343–372

4.5.1 Identifying poor code methods

Survey question 17 asked respondents to explain how they identified poor code or a poor
design. Respondents could provide more than one answer. Our analysis of the responses
resulted in the following answers. Note that each respondent’s answer could have been
grouped into more than one category.
Code or peer review (47) This code review occurs either daily or periodically. In many
cases, the respondents used IDEs with syntax highlighting to support the code review
process. These IDEs helped reviewers identify certain issues like repetitive copy-and-paste
code or routines that declare large numbers of poorly named variables. The respondents
reported the following approaches for code review:
1. Comparison with guidelines or best practices Respondents used either programming
language-specific or general programming guidelines as a standard. In addition,
respondents used commonly accepted good SE practices to identify poor code, e.g.,
duplicated code, lack of modularity, lack of separation of concern, large argument list
in procedure calls, and lack of encapsulation.
2. Finding complex code Complex code is not a positive sign when a system is critical to
the software performance. Additionally, if the code is hard to understand within a
reasonable amount of time, it is considered poor code.
Poor performance (16) The respondents used profiling tools to identify low-performing
portions of the code. In addition, the respondents also observed the system during exe-
cution to note problems with unreasonably long execution times.
Code that is difficult to modify (9) Respondents indicated that an indication of poor code
is when a developer must modify several portions of code to make a single change. For
example, one respondent said: ‘‘When modifying code, if locating the right point is hard,
then I first refactor’’. Another symptom describing poor code is that the existing code is not
extensible (e.g., adding or extending new features to the system). This symptom will hinder
or prevent further changes.
Number of bugs or defects (8) When the program returned incorrect results or displayed
unexpected behaviors (e.g., sporadic shutdown), the developers examined the code related
to that result. Similarly, the respondents tracked which code tended to cause crashes or
errors most often. Bug reports from colleagues and users were also used to identify poor
code.
Lack of documentation (7) A system that lacks design document tends to have poor code
because the developers do not understand the existing system well, and thus, modifications
will result in many problems. Rather than reviewing the code, reviewing the design helps
developers identify poor code earlier in the process. The software design is simultaneously
reviewed while inspecting the requirements. Good design should conform to the given
requirements. One respondent explained that design reviews with users could help
developers to identify poor design quickly.
Code that is difficult to test (3) The respondents stated that non-testable code was often
identified as poor code.

4.5.2 Refactoring methods

Survey question 13 asked respondents to indicate which well-known refactoring methods

they used on their projects. The choices along with the number of responses (in paren-
theses) are:

123
Software Qual J (2017) 25:343–372 363

• Breaking large methods up into smaller methods (48),

• Renaming methods, variables or classes (47),
• simplifying control structure (e.g., series of ‘if’ statements or nested loops, etc.) (46),
• splitting large classes (e.g., move parts of the code from an existing class into a new
class) (45),
• adding or removing parameters from a method (40),
• moving methods or fields of a class to a super class (33),
• moving methods or fields of a class to a sub-class (32),
• creating an encapsulated field (e.g., using setter methods to make public member data
private) (27), and
• applying design patterns (20).
Interestingly, 20 out of 64 respondents employed design patterns in their system. The
most common design pattern that respondents used is the Factory Method pattern (5). Two
design patterns, Surrogate and Object were not among the traditional GoF pat-
terns (Gamma et al. 1995).

5 Discussion

This section revisits the four research questions described in Sect. 1 to provide answers
based on the results described in Sect. 4.

5.1 RQ1. What are the effects of using TDD, from the developers’
perspective?

The results indicate that the effects of TDD impact some particular software quality
characteristics, including Functionality, Reliability, Performance, and Maintainability.
TDD gives developers confidence that their software performs all of its intended functions
correctly. Based on the results, TDD is more effective on general scientific projects than
for parallel computing projects. Using the TDD process, developers might produce twofold
to threefold more test code than actual code and may generate thousands of unit tests.
Moreover, testing and refactoring are difficult for large software products, particularly
when testing concurrency issues. For example, mathematical software frameworks provide
such a wide variety of components for the user to choose from that it is nearly impossible
to consider the full coverage of test cases. Therefore, TDD might not be suitable for large
scientific projects, except in the presence of robust tools and experienced developers.
Best practices or guidelines might help developers successfully employ TDD in sci-
entific projects. In large projects, the developers should perform refactoring often. The use
of multiple testing methods would also help developers ensure that nearly all of the code
was tested.

5.2 RQ2. What are the difficulties of using TDD?

The most difficult aspect of TDD in the scientific environment is writing good tests,
especially when the software includes floating-point calculations, numerical methods, and
parallelism. Writing good tests requires a thorough understanding of the requirements.
Furthermore, time pressure from users can make it difficult for developers to spend the
time required to define and implement adequate unit tests for new features.

123
364 Software Qual J (2017) 25:343–372

Ideally, when using TDD, developers should not have to change unit tests that the code
has successfully passed. Unfortunately, in many cases, it is not practical to leave these tests
unchanged, especially after the code is refactored. For example, when reducing the number
of parameters in a function, developers might later change the unit test calling the refac-
tored function. This modification is helpful for developers, but it breaks the refactoring
practices.
When working with legacy code, a common practice in scientific software, the lack of
existing unit tests makes it difficult or impossible for developers to demonstrate correct
behavior. This inability to easily test for correctness also makes incremental refactoring
difficult. Another challenge with refactoring legacy code is that the large amount of time
required to run validation tests, when they exist, reduces the efficiency of the refactoring
process. While TDD is not strictly an OO phenomenon, the ability to use classes for testing
is helpful. A large amount of legacy code uses languages that are purely procedural (e.g.,
C) or the earlier procedural version of a language that now includes OO (e.g., Fortran and
Cobol) that lack the ability to use classes. One method used to improve the legacy code is
managing the code dependencies carefully. However, it is difficult to break dependencies
in procedural code.
Section 2 described a number of challenges to the adoption of TDD in industry as
identified by Causevic et al. (2011). The following challenges are also present when
attempting to adopt TDD in the scientific environment (the location in our survey appears
in parentheses): (1) increased development time (TP#5 and C#1), (2) insufficient TDD
experience/knowledge (TP#4 and RP#4), (3) insufficient design (TP#1 and RP#2), (4)
insufficient developer testing skills (C#3), (5) insufficient adherence to the TDD protocol
(Sect. 4.3), (6) tool-specific limitations (TP#4 and RP#3), and (7) legacy code (RP#5).
In addition, our survey revealed some TDD adoption problems that have not been
reported in the traditional software engineering literature and may be specific to the sci-
entific domain, including:
1. Complex application (TP#2), code coverage (TP#3), and difficult to implement (C#2)
scientific software is more likely than traditional software to contain complex
numerical computations or advanced algorithms. Additionally, this problem occurs in
the testing process of the scientific environment, where parallel computing is often
utilized. SE researchers should carefully study this problem because existing testing
methods may not be suitable for the parallel computing environment.
2. Code or requirement is changed (TP#6) This problem might appear especially in the
scientific environment because scientists cannot know all of the system features in
advance. More specifically, the requirements are based on the results of experiments.
In contrast, in traditional software development, most often developers have a better
idea of the requirements at the beginning of the project. Although the requirements
may not be complete, in more traditional software development, developers should be
better able to predict the needs of customers than in scientific projects.
3. Dependence on unit tests (RP#1) In the scientific domain, this problem might result
from the generally low percent of code coverage obtained by the test suite (TP#3).
4. Need to set up a new environment (C#4) Scientific projects often involve scientists
with different levels of programming expertise. Effective adoption of TDD requires
most developers to have a good understanding of the TDD process and its benefits.
These problems indicate that the adoption of TDD in a scientific environment may
require more attention than in a traditional environment. To minimize problems, both
technical solutions and managerial strategies are necessary.

123
Software Qual J (2017) 25:343–372 365

5.3 RQ3. Which testing methods do developers use?

Scientific developers used three primary testing approaches: unit testing, regression testing,
and integration testing. Scientific developers appear to use similar automated testing tools as
used by traditional software developers. In addition, scientific developers commonly use the
CMake framework because it works in various environments. For other tools, we believe that
the programming language is an important key to select a specific tool, such as CTest and
JUnit. While some software testing frameworks have explicit support for distributed paral-
lelism, i.e., pFUnit, few of respondents reported use of these tools. Conversely, about 20 %
of the scientific developers did not use any automated testing tools. This result suggests that
the scientific developers may need some additional training in the use of testing tools.

5.4 RQ4. Which refactoring techniques do developers use?

Although many scientific developers did apply refactoring techniques, most were relatively
simple. Because complex refactoring techniques involve advanced SE practices, scientific
developers often do not use difficult techniques. In some cases, the developers employed Design
Patterns, which are relatively new to scientific. In addition to using the GoF design pat-
terns (Gamma et al. 1995), they have also developed design patterns for specific programming
languages like Fortran.
The refactoring phase is often overlooked in an academic environment due to the tight
schedules and funding constraints. Additionally, ensuring high-quality code is not always the
most important goal for scientific developers because they view the software simply as a means to
an end, i.e., scientific research. (Note: We are not endorsing this view, just reporting the results.)
In many cases, scientific developers still need to be convinced of the benefits of
refactoring. The developers benefit from coupling training with hands-on experience. The
evidence indicates that many scientific developers view refactoring tools as essential
during the refactoring process. Automated refactoring tools would help scientific devel-
opers save the time and effort. Unfortunately, some of existing tools are limited to a few
programming languages (e.g., Java, C??).

6 Threats to validity

We organize this section around the three common types of validity threats.

6.1 Internal validity

This study has two primary threats to internal validity. The first threat is a potential
selection bias. Our email distribution list consisted only of authors who had attended
related workshops, published papers about scientific software development, or were
members of a selected email list. Therefore, it omitted any scientific developers who may
have used TDD but had not published any papers or participated in the workshops or
mailing lists. As mentioned in Sect. 3.4, some of the recipients of the survey invitation
posted the survey link in community forums frequented by scientific developers. This
additional availability of the survey reduces (but does not eliminate) the selection bias.
The second threat relates to the qualitative analysis process. Each author coded the survey
responses separately, introducing the possibility of a confirmation bias, in which each author seeks
out evidence that supports his preconceived notions. To reduce the confirmation, each author

123
366 Software Qual J (2017) 25:343–372

performed the analysis multiple times, each time arriving at the same results. Moreover, the authors
compared their individual results and refined those results until there were no disagreements.

6.2 Construct validity

Construct validity is concerned with whether the concepts being studied are correct and whether
the survey can be understood by participants. To mitigate this threat, we asked SE and scientific
experts to evaluate the survey questions. In addition, we conducted a pilot study with a subset of
the survey population. The results of the pilot helped us refine the wording of the questions to
make them as easy to understand as possible. In addition, the high level of agreement with the
testing definitions provided in Table 4 provides additional confidence in our results.

6.3 External validity

External validity focuses on the generalizability of the study results. The main threat to
external validity is the sample size of 64 participants. The demographics in Sect. 4.1
suggest that overall the pool of respondents is diverse. The one characteristic that is a bit
skewed is the geographical distribution. While the scientific community is international,
more than 60 % of the survey responses came from North America. It is possible that the
level of knowledge and experience differs across geographical boundaries. While we have
no evidence to suggest this distribution biased the results, it is possible. Because we cannot
guarantee that the survey participants adequately represent the population, the results of
this study may not be generalizable to all scientific communities.

7 Conclusion

The results of this survey provide empirical evidence about the effectiveness of TDD for
scientific software development. In terms of software quality characteristics, the primary effect
of TDD is to improve functionality. TDD supports the addition of new functionality with little
cost. When developing software with testability in mind, the result is that the software is
extensible, flexible, and maintainable. Additionally, TDD helps developers identify and
remove defects early in the project, thereby reducing the overall number of defects.
Writing tests, especially those that do not have to change when the code is refactored, is
the most difficult task in the TDD process. Furthermore, writing tests to examine con-
currency issues in parallel computing is difficult. Poorly implemented unit tests also makes
the refactoring process difficult. Regardless of the benefits of refactoring, there is little
motivation for refactoring code in an academic environment. Researchers in the scientific
domain rarely revise previous code after the paper is published.
We believe that the results of this work will be beneficial for scientific developers and
SE researchers. The solutions of testing and refactoring problems provide practical sug-
gestions to other scientists who are using TDD in their projects. The results of this work
should be of interest to researchers, because there is a need for additional empirical
evaluations of adopting TDD in various contexts.

Acknowledgments The authors gratefully thank all participants in the survey for their time and contri-
butions. Jeffrey Carver would like to acknowledge partial support from NSF Grants 1243887 and 1445344.

Appendix 1: Survey questions

123
Software Qual J (2017) 25:343–372 367

1. For which type of organization does you currently work?

University
Government Laboratory
Private Company
Other

2. What type of projects do you typically work on? (Select more than one)
Research (main goal is to publish papers)
Production (main goal is to produce software for real users)
Other

3. Please describe any other signiﬁcant work experience in ﬁelds other than your
educational background.

....................................................................................................................................................

4. Please describe your educational background (i.e. list degrees and Majors B.S.
in Chemistry; M.S. in Chemistry, etc)

....................................................................................................................................................

5. How many years have you been developing real scientiﬁc software projects?

....................................................................................................................................................

6. Please rate your programming language skills.

Languages No Learn in a class or Used on one real Used on

experience book, but never project multiple
used or a real projects
project
C++
Fortran
C
MATLAB
Python
Java
Visual Basic
C#
Perl
Smalltalk
Haskell
Mathematica

7. Do you know Test-Driven Development (TDD)?

Yes No

8.What is your previous experience with TDD in a scientiﬁc project? (Check an

item that applied the most)
I have only learned TDD without implementing in any real project.
I have ever employed TDD as part of a course, but never used on any real project.
I have ever employed TDD on only one real project.
I have ever employed TDD on many real projects.

9. How have you obtained your Test-Driven Development skill?

123
368 Software Qual J (2017) 25:343–372

Reading books
Training course
Co-workers
Learning on my own from online resources
Other
10. Please rank the software quality based on what is important to your soft-
ware [1 - Most Important]
Compatibility
(The ability of two or more software components to exchange information and/or to perform
the required function while sharing the same hardware or software environment)
Functional suitability
(The degree to which the software product provides functions that meet stated and implied
needs when the software is used under specific conditions)
Maintainability (The degree to which the software product can be modified. Modification
may include corrections, improvements or adaption of the software to change in environment,
and in requirements and functional specifications)
Operability (The degree to which the software product can be understood, learned, used and
attractive to the user, when using under specific conditions)
Performance eﬃciency
(The degree to which the software product provides appropriate performance, relative to the
amount of resource used, under stated conditions)
Reliability
(The degree to which the software product can maintain a specified level of performance when
using under specific conditions)
Security
(The protection of system items from accidental or malicious access, use, modification, de-
struction, or disclosure)
Transferability
(The degree to which the software product can be transferred from one environment to another)

11. Based on Question 10, how was the eﬀectiveness of employing TDD on the
most important software quality?

....................................................................................................................................................

12. Please rank these activities in terms of diﬃculty relative to TDD. [1- Most
diﬃcult]
Write a test
Write code to make the test pass
Refactoring

13. Which techniques did you use to refactor the code? (Code refactoring is dis-
ciplined technique for restructuring an existing body of code, altering its internal structure
without changing its external behavior. Refactoring is undertaken in order to improve some of
the nonfunctional attributes of the software.)
Breaking, large methods up into smaller methods
Renaming methods, variables or classes
Simplifying control structure (e.g., series of if statements, or nested loops, etc.)
Creating encapsulated field (e.g., using getter and setter methods to make public member
data private)
Splitting large classes (Move part of the code from existing class into a new class)
Adding or removing parameters from a method
Moving methods or fields of a class to a super class
Moving methods or field of a class to sub-class
Applying the design pattern(s) (Please specify the used design patterns)
Others
14. When using TDD, how often do you design the software before writing the
code?

123
Software Qual J (2017) 25:343–372 369

(Software design is the process of defining software methods, functions, objects, and the overall
structure and interaction of your code e.g., create flow charts, class diagrams)
Very frequently
Frequently
Occasionally
Rarely
Very rarely
Never
15. When using TDD, how often do you perform any software design activities
during code development?
Very frequently
Frequently
Occasionally
Rarely
Very rarely
Never
16. Besides refactoring, did you use other techniques or approaches to improve
the code?
Yes
No
17. Overall, how did you identify poor code or poor design?
....................................................................................................................................................
18. Did you use any automated testing tools? (CMake, CTest, GTest, etc...)
Yes (Please specify the tools)
No
19. Based on your experience, what are the benefits and challenges of TDD?
....................................................................................................................................................
20. Please explain the testing method(s) that you used in the project?
....................................................................................................................................................
21. Do you agree with this given definition of ‘Unit Testing’ ?
Definition: Testing the smallest testable units (e.g., class, module, function) in a software sys-
tem in isolation. Usually done with a specialized unit testing framework.
Yes
No

22. Do you agree with this given deﬁnition of ‘Integration Testing’ ?

Deﬁnition: Testing that occurs after Unit Testing and is intended to ensure that the units
interact properly.
Yes
No

23. Do you agree with this given deﬁnition of ‘Regression Testing’ ?

Deﬁnition: Returning test cases (which were successful in the past) to ensure that changes to
the code have not introduced bugs.
Yes
No

24. What did you learn about the problem of writing tests in your project? How
did you solve such problems?
....................................................................................................................................................

25. What did you learn about the problem of refactoring the code in your project?
How did you solve such problems?
....................................................................................................................................................

123
370 Software Qual J (2017) 25:343–372

Appendix 2: Automated testing tools

Tools Languages URL

Boost C?? http://www.boost.org

Buildbot Python http://buildbot.net
CDash Global http://www.cdash.org
Clover Java https://www.atlassian.com/software/clover/overview
CMake Global http://www.cmake.org
CppUnit C?? http://cppunit.sourceforge.net
CTest Global http://www.cmake.org/cmake/help/v2.8.8/ctest.html
CxxTest C?? http://cxxtest.com
GCov Global http://gcc.gnu.org/onlinedocs/gcc/Gcov.html
GTest C?? https://code.google.com/p/googletest/
Jenkins Java http://jenkins-ci.org
Igloo C?? http://igloo-testing.org
JUnit Java http://www.junit.org
Lcov Global http://ltp.sourceforge.net/coverage/lcov.php
Marathon Java http://marathontesting.com
Nose Python http://nose.readthedocs.org
pFUnit Fortran http://pfunit.sourceforge.net
Pytest Python http://pytest.org
Python unit test Python http://docs.python.org
Qtest Global http://www.qasymphony.com/qtest.html
Sikuli Global http://www.sikuli.org
Silk TestPartner Global http://www.borland.com/products/silktestpartner/
Travis Global https://travis-ci.org
XUnit .NET http://xunit.codeplex.com

References
Abdel-Hamid, A. (2013). Refactoring as a lifeline: Lessons learned from refactoring. In Agile Conference
(AGILE), 2013, pp. 129–136
Anselm, L. S., & Juliet, M. C. (1990). Basics of qualitative research: Rounded theory procedures and
techniques. Newbury Park, CA: Sage Publications.
Beck, K. (2002). Test driven development: By example. Boston, MA: Addison-Wesley Longman Publishing
Co. Inc.
Beck, K., & Andres, C. (2004). Extreme programming explained: Embrace change (2nd ed.). Boston, MA:
Addison-Wesley Professional.
Carver, J. (2011). Development of a mesh generation code with a graphical front-end: A case study. Journal
of End User Computing, 23(4), 1–16.
Carver, J. C., Kendall, R. P., Squires, S. E., & Post, D. E. (2007). Software development environments for
scientific and engineering software: A series of case studies. In The 29th international conference on
software engineering (pp. 550–559). MN: Minneapolis.

123
Software Qual J (2017) 25:343–372 371

Causevic, A., Sundmark, D., & Punnekkat, S. (2011). Factors limiting industrial adoption of test driven
development: A systematic review. In The 4th international conference on software testing (pp.
337–346). Berlin: Verification and Validation.
Desai, C., Janzen, D., & Savage, K. (2008). A survey of evidence for Test-Driven Development in academia.
SIGCSE Bulletin, 40(2), 97–101.
Eclipse. (2013). Photran—An integrated development environment and refactoring tool for fortran. http://
www.eclipse.org/photran/. Accessed December 2013
Erdogmus, H., Morisio, M., & Torchiano, M. (2005). On the effectiveness of the test-first approach to
programming. IEEE Transactions on Software Engineering, 31(3), 226–237. doi:10.1109/TSE.2005.37
Fowler, M. (1999). Refactoring: Improving the design of existing code. Boston, MA: Addison-Wesley
Longman Publishing Co. Inc.
Gamma, E., Helm, R., Johnson, R., & Vlissides, J. (1995). Design patterns: Elements of reusable object-
oriented software. Boston, MA: Addison-Wesley Longman Publishing Co. Inc.
ISO IEC. (2011). Systems and software engineering: System and software quality requirements and eval-
uation (SQuaRE)—System and software quality models. ISO/IEC, 25010, 2011.
Janzen, D., & Saiedian, H. (2005). Test-Driven Development concepts, taxonomy, and future direction.
Computer, 38(9), 43–50. doi:10.1109/MC.2005.314
Kollanus, S. (2010). Test-Driven Development—Still a promising approach? In Proceedings of the 7th
international conference on the quality of information and communications technology (pp. 403–408).
Portugal: Porto.
Koskela, L. (2007). Test driven: Practical TDD and acceptance TDD for java developers. Greenwich, CO:
Manning Publications Co.
Mens, T., & Tourwé, T. (2004). A survey of software refactoring. IEEE Transactions on Software Engi-
neering, 30(2), 126–139.
Nanthaamornphong, A., Morris, K., Rouson, D., & Michelsen, H. (2013). A case study: Agile development
in the community laser-induced incandescence modeling environment (CLiiME). In The 5th inter-
national workshop on software engineering for computational science and engineering (pp. 9–18).
Nanthaamornphong, A., Carver, J., Morris, K., Michelsen, H., & Rouson, D. (2014). Building cliime via
Test-Driven Development: A case study. Computing in Science Engineering, 16(3), 36–46.
Opdyke, W. F. (1992). Refactoring object-oriented frameworks. PhD thesis, University of Illinois at Urbana-
Champaign, Champaign, Illinois, USA.
Orchard, D., & Rice, A. (2013). Upgrading fortran source code using automatic refactoring. Proceedings of
the international workshop on refactoring tools (pp. 29–32). Indiana: Indianapolis.
Overbey, J., Xanthos, S., Johnson, R., & Foote, B. (2005). Refactorings for fortran and high-performance
computing. In Proceedings of the 2nd international workshop on software engineering for high per-
formance computing system applications (pp. 37–39). Missouri: St. Louis.
Overbey, J. L., Negara, S., & Johnson, R. E. (2009). Refactoring and the evolution of fortran. In Proceedings
of the international workshop on software engineering for computational science and engineering (pp.
28–34). British Columbia: Vancouver.
Rafique, Y., & Misic, V. (2013). The effects of Test-Driven Development on external quality and pro-
ductivity: A meta-analysis. IEEE Transactions on Software Engineering, 39(6), 835–856.
Ruparelia, N. B. (2010). Software development lifecycle models. SIGSOFT Software Engineering Notes,
35(3), 8–13.
Sanchez, J., Williams, L., Maximilien, E. (2007). On the sustained use of a Test-Driven Development
practice at ibm. In Agile conference (AGILE), 2007 (pp 5–14)
Sanders, R., & Kelly, D. (2008). Dealing with risk in scientific software development. IEEE Software, 25(4),
21–28.
Sletholt, M., Hannay, J., Pfahl, D., & Langtangen, H. (2012). What do we know about scientific software
development’s agile practices? Computing in Science Engineering, 14(2), 24–37.
Sletholt, M. T., Hannay, J., Pfahl, D., Benestad, H. C., & Langtangen, H. P. (2011). A literature review of
agile practices and their effects in scientific software development. In Proceedings of the 4th inter-
national workshop on software engineering for computational science and engineering (pp. 1–9).
Hawaii: Honolulu.

123
372 Software Qual J (2017) 25:343–372

Aziz Nanthaamornphong earned his Ph.D. degree in Computer Sci-

ence from the University of Alabama. He is a lecturer and researcher in
the Department of Information and Communications Technology at
Prince of Songkla University, Phuket Campus, Thailand. His main
research interests include empirical software engineering, software
quality, software engineering in scientific software, and software
maintenance. He is a Member of the IEEE Computer Society and the
ACM. Contact him at [email protected]

Jeffrey C. Carver earned his Ph.D. degree in Computer Science from

the University of Maryland. He is an Associate Professor in the
Department of Computer Science at the University of Alabama. His
main research interests include empirical software engineering, soft-
ware quality, software engineering for computational science and
engineering, software architecture, human factors in software engi-
neering and software process improvement. He is a Senior Member of
the IEEE Computer Society and a Senior Member of the ACM.
Contact him at [email protected].

123

Document 2 (2)
No ratings yet
Document 2 (2)
24 pages
Test-Driven-Development-TDD.pptx
No ratings yet
Test-Driven-Development-TDD.pptx
12 pages
_4_adoption_TDD_in_industry-2011
No ratings yet
_4_adoption_TDD_in_industry-2011
10 pages
OceanofPDF.com Software Testing Automation - Saeed Parsa
No ratings yet
OceanofPDF.com Software Testing Automation - Saeed Parsa
765 pages
SEPQM - SE3010 (1)
No ratings yet
SEPQM - SE3010 (1)
2 pages
18 Test Driven Development TDD 18
No ratings yet
18 Test Driven Development TDD 18
15 pages
1 s2.0 S0950584916300222 Main
No ratings yet
1 s2.0 S0950584916300222 Main
10 pages
wbma2011
No ratings yet
wbma2011
12 pages
101 Singing Tips
No ratings yet
101 Singing Tips
102 pages
ATDD Guide 26-03-21
No ratings yet
ATDD Guide 26-03-21
4 pages
Is TDD Dead
No ratings yet
Is TDD Dead
3 pages
On Translating Arabic (Cultural Approach) - Opt
100% (1)
On Translating Arabic (Cultural Approach) - Opt
238 pages
C16 Effective Unit Testing
No ratings yet
C16 Effective Unit Testing
46 pages
Defect Detection Efficiency: Test Case Based vs. Ex-Ploratory Testing
No ratings yet
Defect Detection Efficiency: Test Case Based vs. Ex-Ploratory Testing
11 pages
TDD as a model for enhancing reliability-1
No ratings yet
TDD as a model for enhancing reliability-1
3 pages
Symflower_Whitepaper_TDD-Made-Easy
No ratings yet
Symflower_Whitepaper_TDD-Made-Easy
15 pages
Test Driven Development Method in Software Development Process
No ratings yet
Test Driven Development Method in Software Development Process
5 pages
Jurnal 6
No ratings yet
Jurnal 6
6 pages
Obstacles and Opportunities For Model-Based Testin
No ratings yet
Obstacles and Opportunities For Model-Based Testin
10 pages
Test-Driven Development in Practice
No ratings yet
Test-Driven Development in Practice
27 pages
Test Driven Development TDD Challenges A
No ratings yet
Test Driven Development TDD Challenges A
6 pages
BSE-20S-038, Essay On Listning Skill
No ratings yet
BSE-20S-038, Essay On Listning Skill
4 pages
How To - Write User Stories 15-11-22
No ratings yet
How To - Write User Stories 15-11-22
6 pages
EGV - Module5 SE 102
No ratings yet
EGV - Module5 SE 102
8 pages
Test Driven Development v5
No ratings yet
Test Driven Development v5
30 pages
Test-Driven Development: Dr. Christoph Steindl Senior It Architect and Method Exponent Certified Scrummaster
No ratings yet
Test-Driven Development: Dr. Christoph Steindl Senior It Architect and Method Exponent Certified Scrummaster
44 pages
Advanced Topic - Test-Driven Development - Scaled Agile Framework
No ratings yet
Advanced Topic - Test-Driven Development - Scaled Agile Framework
4 pages
Test Driven Lasse Koskela Chapter 1: The Big Picture: Paul Ammann
No ratings yet
Test Driven Lasse Koskela Chapter 1: The Big Picture: Paul Ammann
14 pages
Testing Software - What Is TDD - A Practical Introduction With Q&A - by Eric Elliott - JavaScript Scene - Medium
No ratings yet
Testing Software - What Is TDD - A Practical Introduction With Q&A - by Eric Elliott - JavaScript Scene - Medium
8 pages
Test Driven Development
No ratings yet
Test Driven Development
2 pages
Agile Testing
No ratings yet
Agile Testing
29 pages
Se TDD
No ratings yet
Se TDD
16 pages
Mastering Extreme Programming: A Comprehensive Guidebook
From Everand
Mastering Extreme Programming: A Comprehensive Guidebook
Rob Proutyon
No ratings yet
Test Driven Development
No ratings yet
Test Driven Development
4 pages
Unit Test Using Test Driven Development Approach To Support Reusability
No ratings yet
Unit Test Using Test Driven Development Approach To Support Reusability
3 pages
Test Driven Development - (TDD)
No ratings yet
Test Driven Development - (TDD)
21 pages
Introduction
No ratings yet
Introduction
2 pages
Ian Sommerville Notes
No ratings yet
Ian Sommerville Notes
5 pages
windmillswinekebab_LATE_126681_5585159_windmillswinekebab-Assignment3
No ratings yet
windmillswinekebab_LATE_126681_5585159_windmillswinekebab-Assignment3
15 pages
Forward-Looking Practices to Onboard Generation Z at Entry-Level
From Everand
Forward-Looking Practices to Onboard Generation Z at Entry-Level
Dr. Assefa Belay Wondim
No ratings yet
Tutorial 1
No ratings yet
Tutorial 1
11 pages
Ijcet: International Journal of Computer Engineering & Technology (Ijcet)
No ratings yet
Ijcet: International Journal of Computer Engineering & Technology (Ijcet)
10 pages
Test
No ratings yet
Test
3 pages
Continous Auto Regression Testing and Refactoring Instead of TDD
No ratings yet
Continous Auto Regression Testing and Refactoring Instead of TDD
14 pages
Solution Manual For Starting Out With C Early Objects 8 e 8th Edition 013336092x
100% (52)
Solution Manual For Starting Out With C Early Objects 8 e 8th Edition 013336092x
18 pages
9058
No ratings yet
9058
204 pages
CR 88 Intro 2022
No ratings yet
CR 88 Intro 2022
6 pages
CORDIS_project_688228_en
No ratings yet
CORDIS_project_688228_en
6 pages
Junit White Paper
No ratings yet
Junit White Paper
3 pages
Learning and Understanding: Test-Driven Development in Software Development
No ratings yet
Learning and Understanding: Test-Driven Development in Software Development
5 pages
Software Testing Automation: Saeed Parsa
No ratings yet
Software Testing Automation: Saeed Parsa
594 pages
VW10540-3_EN_2005-01-01
No ratings yet
VW10540-3_EN_2005-01-01
3 pages
A Briefly Introduction To Test-Driven Development
No ratings yet
A Briefly Introduction To Test-Driven Development
3 pages
GNSS-SDR: An Open Source Tool For Researchers and Developers
No ratings yet
GNSS-SDR: An Open Source Tool For Researchers and Developers
15 pages
haskell
No ratings yet
haskell
3 pages
Layering
No ratings yet
Layering
3 pages
TUGAS TUTORIAL KE-1 (Online) Kode Pbis4102/Cross Cultural Understanding/Sks 2 Program Studi Pendidikan Bahasa Inggris
100% (2)
TUGAS TUTORIAL KE-1 (Online) Kode Pbis4102/Cross Cultural Understanding/Sks 2 Program Studi Pendidikan Bahasa Inggris
4 pages
Principles of Test-Driven Development: Definitive Reference for Developers and Engineers
From Everand
Principles of Test-Driven Development: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
DevTest Engineering Foundations: Definitive Reference for Developers and Engineers
From Everand
DevTest Engineering Foundations: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
SEERAH OF THE PROPHE
No ratings yet
SEERAH OF THE PROPHE
2 pages
Objectives
No ratings yet
Objectives
1 page
Iep Edu 203
100% (1)
Iep Edu 203
16 pages
The Testing Paradox
From Everand
The Testing Paradox
Pasquale De Marco
No ratings yet
Physics_Crossword_Puzzle
No ratings yet
Physics_Crossword_Puzzle
1 page
Lesson 0: Approach of This Tutorial
No ratings yet
Lesson 0: Approach of This Tutorial
8 pages
Software Development Lifecycle Made Simple: A Practical Guide with Examples
From Everand
Software Development Lifecycle Made Simple: A Practical Guide with Examples
William E. Clark
No ratings yet
PowerTest User Manual V2.3
No ratings yet
PowerTest User Manual V2.3
523 pages
Https Shipkraken-S3.skydropx - Com Uploads Label Label File
No ratings yet
Https Shipkraken-S3.skydropx - Com Uploads Label Label File
3 pages
First Quarter Requirement in Eapp: Submitted By: Genesis Ryms B. Domallig Student Name
No ratings yet
First Quarter Requirement in Eapp: Submitted By: Genesis Ryms B. Domallig Student Name
6 pages
Early Warning Signs in Complex Projects
From Everand
Early Warning Signs in Complex Projects
Ole Jonny Klakegg
No ratings yet
SINIAALTO Maria. Test Driven Development Empirical Body of Evidence
No ratings yet
SINIAALTO Maria. Test Driven Development Empirical Body of Evidence
15 pages
Software Engineering: Concepts, Principles, and Practices
From Everand
Software Engineering: Concepts, Principles, and Practices
Pasquale De Marco
No ratings yet
BIOBASE Automatic Electronic Analytical Balance BA-C Series User Manual 202207 (16.6)
No ratings yet
BIOBASE Automatic Electronic Analytical Balance BA-C Series User Manual 202207 (16.6)
15 pages
Expository Writing Course Outline and Weekly Plan
No ratings yet
Expository Writing Course Outline and Weekly Plan
6 pages
Masters-proj-gantt
No ratings yet
Masters-proj-gantt
1 page
cat 1 physics key
No ratings yet
cat 1 physics key
1 page
Morality: What Is The Best Account of Objective Moral Values and Duties? (New York and
No ratings yet
Morality: What Is The Best Account of Objective Moral Values and Duties? (New York and
3 pages
Tugas 1 Sesi 3 Binggris
No ratings yet
Tugas 1 Sesi 3 Binggris
4 pages
Orange Money KYC FORM
No ratings yet
Orange Money KYC FORM
1 page
Research Proposal Rubrics - Oral Presentation
No ratings yet
Research Proposal Rubrics - Oral Presentation
2 pages
Module 2-Educ 8 Assessment
No ratings yet
Module 2-Educ 8 Assessment
3 pages
Toefl Dasar-77-78
No ratings yet
Toefl Dasar-77-78
2 pages
Cad1501 Assignment 02
No ratings yet
Cad1501 Assignment 02
5 pages
Divine Mercy Chaplet
No ratings yet
Divine Mercy Chaplet
1 page
TNPSC Group 4 Govt Notes - History and Culture of India - English
No ratings yet
TNPSC Group 4 Govt Notes - History and Culture of India - English
134 pages
Pali Textbook
No ratings yet
Pali Textbook
51 pages
Professional Test Driven Development with C#: Developing Real World Applications with TDD
From Everand
Professional Test Driven Development with C#: Developing Real World Applications with TDD
James Bender
No ratings yet
Mastering Test-Driven Development (TDD): Building Reliable and Maintainable Software
From Everand
Mastering Test-Driven Development (TDD): Building Reliable and Maintainable Software
Robert Johnson
No ratings yet
0.682428 1369727293 PDF
No ratings yet
0.682428 1369727293 PDF
12 pages
Information Technology Project Management Interview Questions: IT Project Management and Project Management Interview Questions, Answers, and Explanations
From Everand
Information Technology Project Management Interview Questions: IT Project Management and Project Management Interview Questions, Answers, and Explanations
Equity Press
4/5 (4)
Learning Software Engineering
From Everand
Learning Software Engineering
IT Campus Academy
No ratings yet
Contextualization of Project Management Practice and Best Practice
From Everand
Contextualization of Project Management Practice and Best Practice
Claude Besner
No ratings yet
Group Project Software Management: A Guide for University Students and Instructors
From Everand
Group Project Software Management: A Guide for University Students and Instructors
Tommy Yuan
No ratings yet
ISTQB Certified Tester Advanced Level Test Manager (CTAL-TM): Practice Questions Syllabus 2012
From Everand
ISTQB Certified Tester Advanced Level Test Manager (CTAL-TM): Practice Questions Syllabus 2012
Gabriel Awoyemi
No ratings yet
Agile Approaches on Large Projects in Large Organizations
From Everand
Agile Approaches on Large Projects in Large Organizations
Brian Hobbs
No ratings yet
Cdac Sample Paper
No ratings yet
Cdac Sample Paper
30 pages
Software Testing Interview Questions You'll Most Likely Be Asked
From Everand
Software Testing Interview Questions You'll Most Likely Be Asked
Vibrant Publishers
No ratings yet

s11219-015-9292-4

Uploaded by

s11219-015-9292-4

Uploaded by

Software Qual J (2017) 25:343–372

Test-Driven Development in scientific software: a survey

Aziz Nanthaamornphong1 • Jeffrey C. Carver2

Published online: 21 September 2015

Abstract Scientific software developers are increasingly employing various software

Keywords Software engineering Test-Driven Development Survey

& Jeffrey C. Carver

Scientific software is software that seeks to support or advance scientific or engineering

2.1 TDD in traditional SE

6. tool-specific limitations; and

2.2 How TDD fits the scientific context

3.1 Survey design

3.2 Survey population

Table 1 Mapping between research questions and survey questions

• attendees of the Software Engineering for Science workshop series,1

3.3 Survey pilot

3.4 Survey distribution

3.5 Survey analysis

Fig. 1 Level of experience

Fig. 3 Types of organizations

Fig. 4 Types of projects

4.2.1 Software quality

Fig. 5 The importance of software quality characteristics

Fig. 6 The effectiveness of employing TDD on software quality

• Neutral answers contain words like ‘‘fair’’ or ‘‘adequately’’,

4.2.2 Benefits of TDD

4.3 RQ2. What are the difficulties of using TDD?

4.3.2 Testing problems and solutions

4.3.3 Refactoring problems and solutions

Writing a test U U – ! ! 4 (2, 2)

Dependence on unit – – – U 1 (1, 0)

Table 4 Testing methods

4.4 RQ3. Which testing methods do developers use?

4.5 RQ4. Which refactoring techniques do developers use?

Fig. 7 Testing approaches

Fig. 8 Automated testing tools

4.5.1 Identifying poor code methods

4.5.2 Refactoring methods

Survey question 13 asked respondents to indicate which well-known refactoring methods

• Breaking large methods up into smaller methods (48),

5.2 RQ2. What are the difficulties of using TDD?

5.3 RQ3. Which testing methods do developers use?

5.4 RQ4. Which refactoring techniques do developers use?

6.1 Internal validity

6.2 Construct validity

6.3 External validity

Appendix 1: Survey questions

1. For which type of organization does you currently work?

6. Please rate your programming language skills.

Languages No Learn in a class or Used on one real Used on

7. Do you know Test-Driven Development (TDD)?

8.What is your previous experience with TDD in a scientiﬁc project? (Check an

9. How have you obtained your Test-Driven Development skill?

22. Do you agree with this given deﬁnition of ‘Integration Testing’ ?

23. Do you agree with this given deﬁnition of ‘Regression Testing’ ?

Appendix 2: Automated testing tools

Tools Languages URL

Boost C?? http://www.boost.org

Aziz Nanthaamornphong earned his Ph.D. degree in Computer Sci-

Jeffrey C. Carver earned his Ph.D. degree in Computer Science from

You might also like