Marat Boshernitsan
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission.
Program Manipulation via Interactive Transformations
by
Marat Boshernitsan
in
Computer Science
in the
GRADUATE DIVISION
of the
UNIVERSITY OF CALIFORNIA, BERKELEY
Committee in charge:
Professor Susan L. Graham, Chair
Senior Lecturer Michael Clancy
Professor Marti A. Hearst
Spring 2006
Program Manipulation via Interactive Transformations
Copyright 2006
by
Marat Boshernitsan
Abstract
Software systems are evolving artifacts. Keeping up with changing requirements, designs,
and specifications requires software developers to continuously modify the existing soft-
ware code base. Many conceptually simple changes can have far-reaching effects, requiring
numerous similar edits that make the modification process tedious and error-prone.
Repetitive and menial tasks demand automation. Given a high-level description of a
change, an automated tool can apply that change throughout the source code. We have
developed a system that enables developers to automate the editing tasks associated with
source code changes through interactive creation and execution of formally-specified source-
to-source transformations. We applied a task-centered design process to develop a language
for describing program transformations and to design a user-interaction model that assists
developers in creating transformations in this language. The transformation language com-
bines textual and graphical elements and is sufficiently expressive to deal with a broad range
of code-changing tasks. The transformation environment assists developers in visualizing
and directing the transformation process. Its “by-example” interaction model provides scaf-
folding for constructing and executing transformations on a structure-based representation
of program source code. We evaluated our system with Java developers and found that they
were able to learn the language quickly and to use the environment effectively to complete
a code editing task.
By enabling developers to manipulate source code with lightweight language-based pro-
gram transformations, our system reduces the effort expended on making certain types of
large and sweeping changes. In addition to making developers more efficient, this reduction
in effort can lessen developers’ resistance to making design-improving changes, ultimately
leading to higher quality software.
To Isabelle.
Contents
1 Introduction
1.1 Source Code Manipulation via Program Transformations
1.2 Thesis and Scope of This Research
1.3 Outline of the Dissertation
7 Conclusion
7.1 Contributions of This Research
7.2 Future Work
7.2.1 Engineering Challenges
7.2.2 Open Research Issues
7.3 Final Summary
Bibliography
Acknowledgments
The work presented in this dissertation could not have been completed without continuous
help and support from Susan Graham, my research advisor. Other members of my com-
mittee also contributed in no small way. Michael Clancy suggested early on to consider
the vast body of research in the psychology of programming. Marti Hearst critiqued early
prototypes of the user interface and encouraged me to think hard about usability.
Michael Van De Vanter has been a colleague, a mentor, and a friend whose thought-
provoking insights into software development influenced much of my work. I am indebted
to my employer Agitar Software, Inc. and to its CTO Alberto Savoia for allowing me to
complete this research despite my job obligations. Bob Evans, Russ Rufer, and Tracy Bialik
all helped tremendously by reviewing drafts of my papers and by making suggestions on
how to improve my prototypes.
Andy Begel, a comrade-in-arms through most of my graduate career, helped shape my
thinking about programming languages and software engineering. Many aspects of this work
emerged through our heated and passionate arguments. Other members of the Harmonia
research group worked very hard to keep our system together, especially during the past
few years when I had to abandon core Harmonia work and concentrate on my dissertation
research. Thank you, guys.
I would like to thank my friends and family for their support and encouragement. My
parents deserve special thanks for their unwavering belief that one day my formal education
will come to a close. Finally, I would like to thank my wife Milla for her love, patience, and
understanding. The weekends are finally yours.
The research described in this dissertation has been supported in part by the NSF
Graduate Student Fellowship, Sun Microsystems Graduate Fellowship, NSF Grants CCR-
9988531 and CCR-0098314, IBM Eclipse Innovation Grant, and by equipment donations
from Sun Microsystems.
Chapter 1
Introduction
Software systems are evolving artifacts. From the day the first line of source code appears on
the computer screen, the entire software system undergoes constant modification. Initially,
most of the changes to a software system are due to the evolving architecture and to the
refinement of the design and of the implementation. Later in the software lifecycle,
changes are frequently caused by changing requirements, bug fixes, and the addition of new
features.
Many changes are simple and isolated. A significant proportion of changes, however,
require large and sweeping modifications to source code. Making such changes
can be tedious and error-prone. A conceptually simple modification may
require a significant code editing effort. Examples of such changes abound in the many
tasks faced by developers. Consider the following:
• A maintenance update to a software system requiring that the use of one library
component is replaced with the use of another that provides equivalent functionality
through a different API. This change entails finding all of the uses of the old API and
systematically replacing them with the equivalent invocations of the new API.
• A change to a widely used function requiring that all calls to that function are enclosed
within a guard clause that checks its return value before continuing execution.
• A code cleanup effort requiring functions that return error codes to throw exceptions
with the error codes instead.
Performing any of these changes in a large piece of software may take hours of the developer’s
time and introduce bugs due to the manual nature of the change process.
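For instance, the second change above amounts to the same mechanical edit at every call site. The
following sketch shows the before and after shapes of one such call site; the class, method, and
variable names are hypothetical, not taken from any particular system.

class GuardClauseSketch {
    static final class Entry {
        final String value;
        Entry(String value) { this.value = value; }
    }

    // Stand-in for the widely used function whose return value must now be checked.
    static Entry lookup(String key) { return key.isEmpty() ? null : new Entry(key); }
    static void process(Entry e)    { System.out.println(e.value); }

    // Before: the return value of lookup() flows on without being checked.
    static void before(String key) {
        process(lookup(key));            // may hand a null to process()
    }

    // After: the call is enclosed in a guard clause that checks the return value
    // before execution continues; this edit must be repeated at every call site.
    static void after(String key) {
        Entry e = lookup(key);
        if (e != null) {
            process(e);
        }
    }
}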
Repetitive and menial tasks demand automation. Given a high-level description of a
change, an automated tool can apply that change throughout the source code. Creating
this description can be viewed as a form of metaprogramming. A metaprogram is a compu-
tational abstraction that operates on some representation of another (target) program. The
computational medium of such a metaprogram is a representation of the target program’s
source code. The output is a modified version of the target program’s source code with all
changes applied.
Metaprograms can be constructed using any programming paradigm, such as procedural
or functional programming. Describing systematic source code changes, however, is partic-
ularly amenable to rule-based programming. In this paradigm, the computation is described
using declarative rules that consist of patterns and actions. A pattern describes a part of
the computational data structure to which the rule applies. An action describes a trans-
formation of that data structure into a new form. For a rule-based metaprogram this data
structure is a representation of the target program’s source code.
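As an illustration only, and not the notation of any of the systems discussed later, a rule-based
metaprogram over a toy program representation might pair a pattern predicate with a rewriting
action as in the following Java sketch; the Call record, the Rule interface, and the rewrite loop
are all hypothetical.

import java.util.ArrayList;
import java.util.List;

public class RuleSketch {
    // A toy representation of one kind of source element: a method call.
    record Call(String receiver, String name, List<String> args) {}

    // A rule couples a pattern (does this element match?) with an action (how to rewrite it).
    interface Rule {
        boolean matches(Call c);   // the pattern
        Call apply(Call c);        // the action
    }

    // Example rule: rewrite v.removeElement(x) into v.remove(v.indexOf(x)).
    static final Rule REMOVE_ELEMENT = new Rule() {
        public boolean matches(Call c) {
            return c.name().equals("removeElement") && c.args().size() == 1;
        }
        public Call apply(Call c) {
            String x = c.args().get(0);
            return new Call(c.receiver(), "remove",
                            List.of(c.receiver() + ".indexOf(" + x + ")"));
        }
    };

    public static void main(String[] args) {
        List<Call> program =
            new ArrayList<>(List.of(new Call("v", "removeElement", List.of("a[0]"))));
        // The metaprogram walks the representation and applies the rule wherever the pattern matches.
        program.replaceAll(c -> REMOVE_ELEMENT.matches(c) ? REMOVE_ELEMENT.apply(c) : c);
        System.out.println(program);
    }
}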
In the context of this investigation we refer to systematic source code changes as source-
to-source program transformations. Transformations can be construed broadly. In addition
to systematically modifying existing code, transformations can also generate and insert
new code fragments based on linguistic structure or on meta-information embedded in
program source code. The tools developed in this dissertation can support these types of
transformations. In the scope of this work, however, we explore the application of program
transformations only to systematic source code editing.
sometimes challenging. The refactoring tools are lightweight, simple, and better integrated,
making them easy to use for an average programmer. The focus of our research is to bridge
that gap.
The main contribution of our work lies in making the concepts behind general-purpose
program transformation systems and their versatility accessible to developers for lightweight
source code manipulation.
The principal difficulty in using general-purpose program transformation systems stems
from these tools exposing a structural model of the target program. Transformation of the
program structure is necessary for many source code changes. But understanding a struc-
tural model is challenging, because it bears little resemblance to the programmer’s intuitive
perception of their program’s structure. Existing program transformation tools provide no
support to the developers for understanding and manipulating that model. Yet, completely
hiding this model, as is done in the automated refactoring and restructuring tools, limits the
capabilities of those tools. Thus, the developers must rely on some structural representation
of source code in order to specify transformations. We posit the following design require-
ments for any tool that strives to empower developers through program transformations:
• The source code structure that is exposed to the developers for manipulation must
correspond closely to the developers’ intuitive understanding of source code. This
means that parse trees and abstract syntax trees that are used by most language-based
tools are not a useful conceptual representation for code editing transformations.
• This tool must permit code editing by transformation “in-line” with other coding
activities. Experience suggests that many developers are hesitant to use tools that
require them to step “outside” of their usual program development environment.
1. A new program model for Java source code manipulation designed to be understand-
able and intuitive to software developers.
4. A prototype interactive program transformation tool that implements our visual trans-
formation language and our user-interaction model in the Eclipse IDE.
Chapter 4 presents the iXj transformation language. We describe a program model for
Java source code that was designed specifically to help developers understand and manipu-
late program transformations. We discuss the key concepts in the transformation language,
describe syntactic elements of patterns and actions, and explain the semantics of pattern
matching and action execution. We conclude with a case study that demonstrates the ex-
pressiveness of iXj by implementing several transformations that arise during refactoring of
object-oriented programs.
Chapter 5 describes our implementation of a source code transformation tool for Eclipse,
a popular Java development environment. We describe how users construct iXj transfor-
mations using a special-purpose structure editor and how our interaction model assists de-
velopers in learning a new language and understanding their transformations. We conclude
this chapter with a brief overview of the architecture of the iXj plug-in for Eclipse.
Chapter 6 presents a formal usability evaluation of our implementation. We describe
our evaluation strategy, experimental setup, and the evaluation metrics. We show that
the participants in our user study quickly understood the concepts underlying the design
of the iXj transformation language and were able to use our tool to complete a sample
transformation task.
Finally, in Chapter 7 we summarize the results of our investigation and reflect on the
success of our work in meeting its goals. We conclude by presenting a number of promising
directions for future exploration.
Chapter 2
Various proposals have been made for automating systematic modification to source code.
Few tools, however, have found their way to the “programming trenches.” Existing options
range from simple text-based substitution to sophisticated language-aware manipulation of
source code with special-purpose tools. In this chapter we discuss these options and present
several representative systems from a wide spectrum of tools that can be used to automate
repetitive and systematic changes. We discuss the capabilities and the shortcomings of these
systems, and use three of these tools in a case study of describing a source code maintenance
task with program transformations.
my earl!"), clearly not what Shakespeare intended.1 Care must be taken to work around
inconsequential variations in the program text, such as the non-uniform use of whitespace
and comments that might otherwise hide transformation sites.
Lexically-aware text processing tools, such as LSME [53], bring some language-awareness
to text processing. These tools incorporate a lexer that tokenizes the text stream before
processing. Tokenization allows patterns to be expressed over lexical tokens, rather than over individual char-
acters. It also helps to eliminate some of the linguistic mismatches, such as matching the
same pattern in a variable name and within a string constant. But this form of pattern
recognition is still limited to regular languages, whereas most programming languages ex-
hibit (mostly) context-free structure. For example, regular matching precludes the ability to
match an arithmetic expression at a particular nesting depth in the expression structure—a
pattern that might arise in the process of simplifying arithmetic expressions.
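The point can be made concrete with an ordinary regular-expression matcher (a standalone Java
illustration; the input strings are made up for the example):

import java.util.regex.Pattern;

public class NestingDemo {
    public static void main(String[] args) {
        // A character-level pattern for "a call to f with one argument", where the
        // argument is not allowed to contain parentheses.
        Pattern call = Pattern.compile("f\\(([^()]*)\\)");

        System.out.println(call.matcher("f(a + b)").find());     // true: a flat argument matches
        System.out.println(call.matcher("f(g(a) + b)").find());  // false: one level of nesting defeats the pattern
    }
}

Allowing parentheses inside the captured argument does not help, because a regular pattern cannot
keep opening and closing parentheses balanced to an arbitrary depth.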
Text-processing tools use a simple format for describing transformations. The descrip-
tion consists of two parts: the specification of what to replace (the target) and the replace-
ment text. The target may be specified in a variety of ways, ranging from a simple word
sequence to a complex regular expression pattern. Typically, the user provides the specifi-
cation in a text file or on the command line. This specification mechanism is far from ideal.
Research has shown that many users have difficulties creating and understanding regular
patterns, once their complexity extends beyond the trivial [4]. This problem is partially
alleviated in the “by-example” editing systems that have the ability to learn and generalize
editing actions from one or more examples. Blackwell’s SWYN [4] and Miller’s LAPIS [52]
are the two most recently developed systems that fall into this category.
This section explores the capabilities and the limitations of language-aware tools for
source code manipulation and provides specific examples that fall at different points along
the language tools continuum. The tools that we discuss here are merely representative
examples of program transformation systems; a more comprehensive and up-to-date list is
being maintained at http://www.program-transformation.org.
(MethodCall
(ObjectAccess
(Expression:$expr ["factorial"])
("(")
(ArgumentList:$param)
(")")))
The actions are specified in an AWK-like statement language (A*) or in the C programming
language (TAWK). While these action languages are not specifically directed at source
code manipulation, TAWK includes a C language library that offers a number of tree-
manipulation primitives.
As source-to-source transformation tools, both TAWK and A* offer some advantages
over their text-based brethren. For instance, the ability to express complex, structure-
based patterns is useful for describing many changes. But this flexibility comes at a price.
Creating patterns in a tree-centered representation requires a good understanding of syntax-
tree data structures. Unlike the pattern languages, the action languages offer very little in
the way of expressiveness. For instance, the most commonly used operation in the A* action
language is a print-statement.
A different class of tools traces its roots to the algebraic data types and pattern-matching
facilities found in languages such as Haskell and ML. Stratego/XT [66] is a relatively recent
representative example; many others have been developed over the years. These tools are similar
to A* and TAWK in their tree-based modeling of program source code, but they offer
better primitives for expressing transforming actions. Yet, reliance on a tree-centered repre-
sentation puts the skill required to use these tools beyond that of an ordinary programmer.
Many tools make an attempt to hide the details of tree construction. While internally
these tools still operate on tree-like structures, their pattern-matching facilities are often
designed to eliminate some of the complexity associated with manipulating trees directly.
Often, the patterns can be specified using an extended syntax of the underlying program-
ming language, which improves their readability and maintainability. In addition to the
purely syntactic information, some tools provide access to the static semantics of the source
programs. Representative examples in this category include REFINE [12], TXL [23], and
DMS [3]. We have used TXL in our case study of performing a code maintenance task with
existing tools (see Section 2.3).
Use of these general-purpose transformation tools requires both a good understanding
of the programming language syntactic structure and familiarity with a complex transfor-
mation language. JaTS [15] is a simpler tool, providing pattern-matching facilities that
can interpret source-code-like patterns. Its pattern language, however, is limited by a small
number of pattern variables for matching linguistic structures. Some tools, such as In-
ject/J [32], abandon a syntax-derived representation in favor of a high-level model centered
around concepts in the programming language. This approach makes it difficult to specify
transformations at the expression and statement level.
// Reads next token from the input and returns its type
public int nextToken();
}
(a) Excerpt from the interface of java.io.StreamTokenizer. Each subsequent token is accessed via the
nextToken() method. The value of the last token read from the input is stored in the sval field and
(possibly) in the nval field (only if the last token can be interpreted as a numeric value). The type of the
token (TT_NUMBER or TT_WORD) is stored in the ttype field.
Before                                     After

java.io.StreamTokenizer s;                 java.util.Scanner s;
x = (int) s.nval                           x = s.nextInt()
x = s.nval                                 x = s.nextDouble()
x = s.sval                                 x = s.next()
if (s.ttype == TT_NUMBER) ...              if (s.hasNextInt() ||
if (s.nextToken() == TT_NUMBER) ...            s.hasNextDouble()) ...
if (s.ttype == TT_WORD) ...                if (s.hasNext() &&
if (s.nextToken() == TT_WORD) ...              !s.hasNextInt() &&
                                               !s.hasNextDouble()) ...
s.nextToken()                              // no longer needed

Figure 2.2: Examples of transformations needed in source code to convert the uses of
java.io.StreamTokenizer to the uses of java.util.Scanner.
a new, more capable class that implements an input scanner based on pattern matching
(java.util.Scanner). Each of these classes takes a different approach to input tokeniza-
tion and their interfaces are not completely amenable to automatic translation. In many
situations, however, it is possible to translate most uses of java.io.StreamTokenizer to
the equivalent uses of the java.util.Scanner. Figure 2.1 summarizes the parts of both
interfaces that are relevant to this case study. Figure 2.2 presents several transformations
needed to transition from java.io.StreamTokenizer to java.util.Scanner.3
2.3.1 SED
SED, a Stream EDitor, transforms its input stream and produces an output stream. The
traditional way to use SED for source code transformation is to employ its ‘s’ command
that globally replaces each instance of text matching a regular expression. The substitution
string may be formed by referencing parts of the matching text with ‘\1’...‘\9’ metavariables
that refer to the nth group of characters in the pattern surrounded by \( and \). The
general format of the ‘s’ command is s/<pattern>/<substitution>/g (‘g’ indicates that
a substitution is to be performed globally, rather than on the first matching instance). SED
scripts may be stored in text files, which facilitates reuse of the transformations, provided
they are general enough to apply in a broader context.
The transformations listed in Figure 2.2 may be implemented with the following
SED script:
s/java.io.StreamTokenizer/java.util.Scanner/g
s/(int)\([_a-zA-Z][_a-zA-Z0-9]+\)\.nval/\1.nextInt()/g
s/\([_a-zA-Z][_a-zA-Z0-9]+\)\.nval/\1.nextDouble()/g
3. We are ignoring the changes required in the setup and the instantiation of each component. Those parts
are typically handled manually, because adaptation of the java.io.StreamTokenizer setup code requires
some creative thinking to match the semantics of java.util.Scanner.
s/\([_a-zA-Z][_a-zA-Z0-9]+\)\.ttype==TT_NUMBER/\1.hasNextInt()||\1.hasNextDouble()/g
s/\([_a-zA-Z][_a-zA-Z0-9]+\)\.nextToken()==TT_NUMBER/\1.hasNextInt()||\1.hasNextDouble()/g
s/\([_a-zA-Z][_a-zA-Z0-9]+\)\.ttype==TT_WORD/\1.hasNext()/g
s/\([_a-zA-Z][_a-zA-Z0-9]+\)\.nextToken()==TT_WORD/\1.hasNext()/g
s/[_a-zA-Z][_a-zA-Z0-9]+\.nextToken()//g
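Assuming the script above is saved to a file, it would typically be applied to each source file with an
invocation along the following lines (the file names here are hypothetical):

sed -f tokenizer-to-scanner.sed Old.java > New.java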
In addition to being virtually unreadable,4 this script is overly simplistic and does not
address the following situations that might arise in the source code:
• Whitespace and comments in the input. No provisions are made for syntacti-
cally insignificant material that might appear at the match location. For example,
s.nval gets properly transformed, whereas s . nval does not. Similarly, comments
appearing in the match string (and especially multiline /* */ comments, not easily
described by regular expressions) are not handled by this script.
• Expressions in place of identifiers. This script matches only a single identifier that
represents a java.io.StreamTokenizer instance, not allowing arbitrary expressions
in its place. This, for example, prevents getTokenizer().nval from being properly
replaced.
Some of these limitations may be addressed by extending this SED script to match more
complicated inputs. Yet, SED’s inability to perform any syntactic processing, such as
matching arbitrary Java expressions,5 remains the major limiting factor in using SED-like
tools for program transformation.
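The first of these limitations is easy to demonstrate with an ordinary regular-expression matcher
(a standalone Java sketch using java.util.regex rather than SED; the input strings are made up):

import java.util.regex.Pattern;

public class WhitespaceDemo {
    public static void main(String[] args) {
        // The same kind of pattern used in the SED script above.
        Pattern nval = Pattern.compile("([_a-zA-Z][_a-zA-Z0-9]*)\\.nval");

        System.out.println(nval.matcher("x = s.nval;").find());            // true
        System.out.println(nval.matcher("x = s . nval;").find());          // false: whitespace hides the match
        System.out.println(nval.matcher("x = s /* tok */ .nval;").find()); // false: a comment hides it too
    }
}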
2.3.2 TXL
TXL was originally conceived as a language for creating extensions for the Turing pro-
gramming language (in fact, TXL stands for Turing eXtender Language). Over time, TXL
evolved into a general-purpose language for program transformation. TXL is designed
around the paradigm of rule-based structural transformations, and supports unification,
implied iteration, and deep pattern matching.
The input to the TXL processor consists of the parsing and unparsing grammars, the
transformation specification (TXL script), and the source text. The TXL processor trans-
forms input text into output text according to a TXL program. Processing of input text
begins with the parsing phase that constructs the structural representation (parse tree) for
4. Regular expressions have been said to resemble “line-noise,” and from this example it should be
obvious that there is a good reason for that observation.
5. Matching nested expressions is outside of the expressive capabilities of regular-pattern matchers.
the input. The subsequent transformation phase rewrites that parse tree according to the
transformation rules described in the TXL program. The final processing phase “unparses”
the transformed tree into a textual representation. The parsing and unparsing grammars
need not represent the same language; cross-language translation is achieved by combining
two different grammars within a single description. Because TXL scripts are stored in text
files, they can be reused across different target programs, when applicable.
As a rule-based language, TXL permits very expressive specification of transformations.
Because a set of transformations is specified in conjunction with a parsing grammar, that
grammar needs to express only enough structure necessary for those transformations [25].
This permits manipulation of source text that is incomplete or incorrect with respect to
the full grammar for a source language. As long as enough structure is recovered, the
transformation engine can apply the rules and produce the desired result.6 The patterns
that constitute the transformation rules are specified textually in a form similar to the
source language. This leads to a more natural specification of the transformation rules than
those permitted by the tree-based systems such as TAWK and A*.
The TXL program7 to perform the transformations listed in Figure 2.2 begins as follows:
include "Java.Grm"
include "JavaCommentOverrides.Grm"
function main
replace [program]
P [program]
by
P [transformTokenizerToScanner]
[transformNextInt]
[transformNextDouble]
[transformNext]
[transformNumberTest]
[transformWordTest]
[removeNextTokenStatement]
end function
The include command incorporates the rules for parsing and unparsing Java source code
into the current TXL program. The main function identifies the transformations that should
be applied to the entire program. The replace clause applies to a particular structure, in
this case, the program non-terminal. The job of the replace clause is to break down the
subtree designated by the non-terminal, according to the pattern specified in the replace-
ment rule. The pattern P [program] indicates that it matches the entire program and that
the result of the match should be assigned to the pattern variable P. The second part of
6. The TXL distribution includes full language parsing grammars for many popular programming
languages. These grammars, however, do not permit any inconsistencies in the input.
7. This case study was performed using the release 10.4a of the TXL language and tools. TXL can be
downloaded from http://www.txl.ca/
the replace clause (following the by keyword) constructs the replacement for the matching
non-terminal. This replacement is created by applying several rules to the pattern variable
P in order. In effect, each rule rewrites P, yielding a modified program as a result. As there
are seven transformation rules, this transformation requires seven passes over the program
to complete. Following is the definition of the first of these rules:
rule transformTokenizerToScanner
replace [qualified_name]
java.io.StreamTokenizer
by
java.util.Scanner
end rule
This rule performs the first transformation from Figure 2.2. The pattern and the replace-
ment are specified as fragments of source code that are parsed according to the Java language
grammar. The transformation writer needs to specify which non-terminal from the gram-
mar is represented by the pattern and by the replacement (in this case, qualified_name).
This is a simple transformation with no variables in the pattern. Let us consider a more
complicated set of rules for performing the second transformation from Figure 2.2 from
(int) s.nval to s.nextInt():
rule transformNextInt
replace [expression]
(int) E [id] C [repeat component]
by
E C [transformNextIntInComponent]
end rule
rule transformNextIntInComponent
replace [repeat component]
.nval
by
.nextInt()
end rule
This transformation is somewhat complicated by the shape of the parse tree generated by
the Java grammar. This grammar represents qualified field access as an identifier terminal
followed by a sequence of “components,” terminals preceded by a . (dot) that separates
them from one another. This representation yields the following parse tree for qualified field
access (using brackets [ and ] to represent nesting):
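The original rendering of that tree is not reproduced here; the nesting looks approximately as follows
(an approximation using the bracket notation just described and the nonterminal names from the rules
below, not the exact output of the TXL Java grammar):

    [expression
        [id s]
        [repeat component
            [component . [id nval]]]]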
This means that prior to matching the .nval component of the qualified field access, the
parse tree for field access needs to be decomposed into the initial identifier that refers to
an instance of java.io.StreamTokenizer and the rest of the qualified name. This is the
purpose of the transformNextIntInComponent rule.
Due to the peculiarities of the TXL Java grammar, the rules above will not transform a
qualified field access in which the java.io.StreamTokenizer instance is represented as a
parenthesized Java expression. For example, the statement ((StreamTokenizer)s).nval
cannot be matched by the pattern of the transformNextInt rule. This limitation of the
TXL Java grammar can be addressed by an additional rule in the transformation description:
rule transformNextInt2
replace [expression]
(int) (E [expression]) C [repeat component]
by
(E) C [transformNextIntInComponent]
end rule
For the sake of brevity we will not present the remaining TXL transformation rules in this
case study. In Appendix A we include the full listing of the TXL program implementing
the transformations of Figure 2.2. As a source-to-source transformation tool, TXL exhibits
the following limitations:
• Disregard for the documentary structure of source code. The final stage of
TXL processing involves unparsing the tree structure into text. While the comments
are preserved by TXL as part of that structure, whitespace is not. This affects the vi-
sual layout of the source code, making TXL unacceptable as a tool for describing source
code changes. Van De Vanter [64] emphasizes the significance of the documentary
structure and reports that the commercial version of TXL, distributed by the Legasys
Corporation, implements a special strategy that incrementally updates the source text
to avoid disrupting that structure.
TXL is not well-suited for lightweight transformations. Despite the high-level rule language,
the users of TXL are still required to reason about grammars and trees at a level that is
beyond what most developers understand. Using the provided language grammars requires
understanding of the structure at a finer level of detail than necessary for most transforma-
tions. Moreover, TXL is purely structure-based; semantic attribution is not incorporated
into the parsing grammars and needs to be specified separately (or computed as part of
the transformation). The transformations, especially those involving complicated patterns,
are difficult to create. In fact, the TXL tutorial suggests a workflow, whereby a generic
transformation is created by iteratively generalizing a rule that applies to one specific in-
stance of the transformation in the source code. Debugging TXL scripts is challenging as
no debugger is supplied with the TXL processor. Recently, however, Shimozawa and Cordy
demonstrated some advances in this area in their Transformation Engineering Toolkit for
Eclipse (TETE) [59].
Encapsulate field: Make a public field private and provide accessors (Fowler [31],
p. 206).
We apply the Encapsulate Field refactoring to the nval and sval fields of the
java.io.StreamTokenizer class to wrap them in accessor methods. This enables a subsequent
redirection of accesses to these fields to the appropriate methods in java.util.Scanner.
This refactoring changes code as follows:
double x = s.nval;        double x = s.getNval();
...                   ⇒   ...
                          double getNval() {
                              return nval;
                          }
We do not encapsulate access to ttype. This is fully intentional: this field is typically used
for testing against one of the pre-defined type constants (TT_WORD or TT_NUMBER) and rather
than hiding just the access to that field, we want to encapsulate the entire test. This is
achieved in two subsequent steps:
Extract Method: Turn a code fragment that can be grouped together into a
method whose name explains the purpose of the method (Fowler [31], p. 110).
Move Method: Move a method into the class it uses most (Fowler [31], p. 142).
The Extract Method refactoring is applied to an expression that is to be isolated into a sep-
arate method, in this case s.ttype==TT_NUMBER and s.ttype==TT_WORD. Most automated
refactoring tools (including Eclipse) apply this refactoring to other instances of the same
expression occurring elsewhere in the source code.
if (s.ttype == TT_NUMBER) ...        if (isNum(s)) ...
...                              ⇒   ...
                                     boolean isNum(StreamTokenizer s) {
                                         return s.ttype == TT_NUMBER;
                                     }
Extract Method leaves the extracted method in the class where the body of the method was
located prior to extraction. We apply Move Method to relocate the extracted method into
the java.io.StreamTokenizer class. Following similar steps, we encapsulate testing for
the TT_WORD token type in the isWord() method.
At this point, we have isolated all uses of the low-level java.io.StreamTokenizer API
into high-level methods that implement concepts common to both java.io.StreamTokenizer
and java.util.Scanner. The next step involves rewriting the implementation of
these methods inside java.io.StreamTokenizer to use java.util.Scanner. This can be
done as follows:
Inline Method: Put the method’s body into the body of its callers and remove
the method (Fowler [31], p. 117).
We apply Inline Method to each of the methods that we modified in steps 2 through 6.
This refactoring removes these methods from java.io.StreamTokenizer and inlines their
implementation at every call site. At this point, the definition of the java.io.StreamTokenizer
class contains no useful public methods and no useful fields. This permits us to
apply one final refactoring to remove this (now) useless class:
Collapse Hierarchy: Merge a subclass with superclass when they are not very
different (Fowler [31], p. 344).
– In the presented sequence we assumed that we can modify the source code for
java.io.StreamTokenizer. While in practice this is almost never the case,
Eclipse allows us to create a copy of java.io.StreamTokenizer in the current
project that overrides the library version. This enables us to complete the pre-
scribed steps.
– This sequence relies on being able to “re-parent” java.io.StreamTokenizer to
inherit from java.util.Scanner. This is only possible because (a) java.io-
.StreamTokenizer has no declared superclass and (b) java.util.Scanner is
Notwithstanding the fact that the sequence of refactoring steps is fragile with respect to
the target program (which limits its reuse), most automated refactoring tools do not permit
creation of reusable transformation specifications. Recent versions of the Eclipse IDE can
preserve individual refactoring transformations in an off-line form as refactoring scripts.
These scripts can be packaged together with code that was subject to refactoring, so that
other code that relies on it can be updated accordingly.
2.4 Conclusion
This chapter presents several systems available to developers for automating application of
systematic changes to source code. The work presented here is closely related to the notion
of interactive program transformation that we present in the subsequent chapter. We also
set up the context for broader discussion of transformations in the rest of this dissertation.
8. In fact, java.util.Scanner is marked final, but we can circumvent that restriction by copying it into
the user's project, as we did with java.io.StreamTokenizer, and editing the source code to remove the
keyword final.
9. Because the equality operator evaluates the left-hand side before the right-hand side, replacing the
latter test with a call to a method extracted from the former test will change the evaluation order.
Chapter 3
A solution to this problem is to create a transformation system that scaffolds both the
construction of a transformation and the developers’ understanding of its effects on source
code. Thus, the transformation process becomes inherently interactive. The transformation
tool should “hold the developer’s hand” from the moment a transformation is conceived, to
the moment he or she returns to coding. Design of such a system must take into account
the limitations of the human cognitive apparatus.
To design iXj we adapted a user interface design technique called task-centered de-
sign [47]. Task-centered design focuses on the tasks that the users are expected to perform
with the system. The task-centered approach prescribes a sequence of steps necessary to
complete the design and the implementation. The rest of this chapter is loosely organized
along these steps. Each section begins with a brief summary adapted from Chapter 1 of
Lewis and Rieman [47].
• The developers use high-level linguistic notions, such as macros, variables, methods,
functions, and loops. This means that we can safely expose these notions in the
transformation language and expect developers to understand what they mean.
“Rename a few external variables to make their first six letters unique.”
“Put p:= link(p) into the loop of show_token_list, so that it doesn’t loop forever.”
“Change BI_* macros to BYTE_* for increased clarity; similarly for bi_* local vars.”
“Rename macros with XSTRING_* to string_* except for those that reference actual fields
in the Lisp_String object, following conventions used elsewhere.”
(b) Excerpts from XEmacs ChangeLog [73]
RCM: (Thinking out loud) “This doesn’t compile because we haven’t written the Throw
class.”
RSK: “Talk to me, Bob. The test is passing an integer, and the method expects a Throw
object. You can’t have it both ways. Before we go down the Throw path again, can you
describe its behavior?”
RCM: “Wow! I didn’t even notice that I had written f.add(5). I should have written
f.add(new Throw(5)), but that’s ugly as hell. What I really want to write is f.add(5).”
RSK: “Ugly or not, let’s leave aesthetics out of it for the time being. Can you describe
any behavior of a Throw object—binary response, Bob?”
RCM: “101101011010100101. I don’t know if there is any behavior in Throw; I’m beginning
to think a Throw is just an int. However, we don’t need to consider that yet, since we can
write Frame.add to take an int.”
(c) A transcript of a dialog between two programmers engaged in a pair-programming session [50]
Figure 3.1: Examples of developers describing changes to source code. The examples were
used for task and user analysis prior to designing iXj.
• The developers use patterns to describe classes of similar changes, for example BI_*
and bi_* in the XEmacs ChangeLog. This means that a pattern-based transformation
language will naturally fit with their perception of a systematic change.
• To describe a location in the source code, the developers use both the language con-
cepts (“in class Employee, method getName”) and the code fragments in Java (“replace
get(x) with ...”), switching between these two levels as they feel is appropriate. Con-
sequently, the transformation language should support both of these mechanisms for
talking about code.
• The terminology used by developers was specific to the Java programming language.
As a result of this observation, the iXj transformation language is tightly coupled
with Java. (We expect, however, that our design methodology can be applied to
other programming languages.)
corresponding enclosing method. For example, the developers might change the code as
follows:
public void main() {                     public void main() {
    ...                                      ...
    out.println("compute");        ⇒        out.println("main(): compute");
    ...                                      ...
}                                        }
This transformation operates on the lexical (sub-token) structure of the program. In order
to enable this transformation, the tool must support operations that can split and merge
existing tokens. In this case, such a token is the string literal in the print-statement.
1. Douglas Adams. The Hitchhiker’s Guide to the Galaxy.
This transformation must use static type information to ensure that the static type of s
is java.io.StreamTokenizer. Otherwise, it may inadvertently change access to the nval
field of another class.
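A minimal illustration of the ambiguity that purely syntactic matching cannot resolve (the classes
below are hypothetical stand-ins, not real library types):

// Both classes declare a field named nval, but only accesses through the
// tokenizer stand-in should be rewritten by the transformation.
class LegacyTokenizer { double nval; }   // plays the role of java.io.StreamTokenizer
class SensorReading   { double nval; }   // unrelated class that must be left untouched

class TypeCheckSketch {
    double readToken(LegacyTokenizer t) { return t.nval; }   // should be transformed
    double readSensor(SensorReading r)  { return r.nval; }   // must not be transformed
}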
need substantial support in constructing transformations, both because they are exposed
to the structural information that they do not normally perceive, and because they are
working with a new language. Since developers are creating transformations on existing
source code, we decided to use a “by-example” approach to developing a transformation.
Programming by example (also known as “programming by demonstration”) is an old
and recurring theme in computer science (see Lieberman [48] for a summary). We adapted
this mode of interaction to transformation development. In this mode of interaction, develop-
ers start by selecting a single source code fragment that they want to change. The trans-
formation editor provides assistance by creating an identity transformation that changes
nothing. The developers use that transformation as a starting point for adding wildcards,
making it more general. The transformation editor assists with this process by offering
context-sensitive help and showing the effects of the transformation after each modification
step.
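Schematically, for the removeElement example used in this chapter, the progression looks roughly as
follows (an informal sketch written in the experimental pattern notation that appears later in this
chapter, not final iXj syntax):

Step 0 (identity, captured from the selection):  r.removeElement(x)
Step 1 (wildcard the receiver):                  <seq: java.util.Vector expr>.removeElement(x)
Step 2 (wildcard the argument):                  <seq: java.util.Vector expr>.removeElement(<i: expr>)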
intended to serve as discussion tools, rather than to provide deeply analytic and quantifiable
measures. The dimensions are not independent. For instance, an improvement to the con-
sistency of an interface may lead to an increase in error-proneness. Thus, a design process
guided by the CDs framework necessarily entails a series of tradeoffs. The cognitive dimen-
sions provide a common vocabulary for the users and the designers of the system. Together,
the dimensions determine a cognitive profile of the system. This profile does not represent
the quality of the system; different activities and different systems require different profiles.
The following summary of the twelve (of the fourteen total) cognitive dimensions
that are relevant to our work is adapted from the cognitive dimensions tutorial created by
Blackwell and Green [5]. Where appropriate and necessary, we indicate how we want our
design to fare along a particular cognitive dimension.
Visibility and Juxtaposability. Visibility refers to the ability to view, scan, or skim
components of a notation. Juxtaposability indicates the ability to place components next to
one another for easy examination. Developing transformations with iXj involves operating
on complex source code structures. This makes high visibility and juxtaposability essential
to our design: the description of a transformation must be easily interpretable by the user.
Viscosity. A viscous system is one that is hard to modify. Blackwell and Green distin-
guish two types of viscosity: repetition viscosity, when a “single goal-related operation on
the information structure requires an undue number of individual actions” and knock-on
viscosity, when “one change ‘in the head’ entails further actions to restore consistency.”
Viscosity of a system can be quantified as the cost of making small interrelated changes
(resistance to change). We consider low viscosity to be one of the most important require-
ments in our design. By-example construction of transformations necessitates many small
incremental changes. These changes must be easy for the user to make.
must expose some of the program structure that the users typically perceive only intuitively.
The closer their intrinsic representation corresponds to that exposed by the tool, the easier
it is for the user to understand a description of the transformation.
Progressive Evaluation. The ability to check one’s work in progress prior to completing
a task is referred to as progressive evaluation. A canonical example of a system permitting
progressive evaluation is a spreadsheet: the values in cells are recomputed each time the
spreadsheet is modified. iXj must provide feedback to the users as the transformation is
developed. When making changes to a transformation, the developers must see how that
transformation affects source code. The immediate feedback can make the execution of
transformations transparent to the developers and can assist in their learning the transfor-
mation language.
Error-proneness. The degree to which a system invites the users to make mistakes is
called error-proneness. For example, a programming language not requiring variables to be
declared and defined makes it impossible to detect when a variable name is mistyped.
Hard Mental Operations. When a system puts high demand on the user’s cognitive
resources it fares badly on the dimension of hard mental operations. Having to remember
how to use an API is a typical hard mental operation faced by software developers.
transformations, they often do not know how to proceed. The ability to experiment and to
retract, if necessary, is essential to achieve user acceptance.
Figure 3.2 presents several of the key slides from the mockup. In this mockup we
experimented with an idea of “by-example” construction of a transformation. At that
point, we had not yet formally defined the language for describing patterns and actions.
Our first design of the transformation pattern language was inspired by SQL, a database
query language. We borrowed the notion of a “selection,” described by a select-statement
that selects statements for transformation. Classes, packages, and methods are selected
using name-based patterns. Statements and expressions are selected using code patterns.
This mockup was well-received by the audiences. It became evident that the “by-
example” approach to constructing a transformation was easy to follow. The audience
members also found that the ability to interactively manipulate transformation descriptions
simplified learning of the transformation language.
As the next step in our design, we formalized the transformation language by creating
more complete specifications for the representative tasks described in Section 3.3. This
activity uncovered several problems with the SQL-inspired approach. Consider the following
complete description of the transformation for converting L.removeElement(x) on
java.util.Vector to L.remove(L.indexOf(x)) on java.util.ArrayList:
collection removeCalls =
select from project MyProject
package *
class *
method *
code <seq: java.util.Vector expr>.removeElement(<i: expr>)
[Figure panels: editor views of Example1.java and Example2.java, each containing a call of the form
v.removeElement(...) on a java.util.Vector variable or field, with a context menu offering Cut, Copy,
Paste, and “Transform...”.]
(a) The developer starts with sample code that they want to transform by selecting a code fragment
and invoking “Transform...” from a context menu.
[Figure panel: the transformation scope shown as lists of selected packages, types, and members
(example, Example2, removeListener(Listener)), with an Accept button.]
Figure 3.2: First mockup of the transformation tool (continues on next page).
[Figure panels: the transformation editor showing an editable action area with an Accept button.]
(c) The developer begins by editing the action to indicate how the transformation tool should
transform the text matched by a pattern.
Figure 3.2: First mockup of the transformation tool (continues on next page).
[Figure panel: the action now reads “do replace with r.remove(r.indexOf(x))”.]
(f) The developer makes the action more general by introducing references to the pattern variables.
Figure 3.2: First mockup of the transformation tool (continues on next page).
[Figure panels: editor views of Example1.java and Example2.java after the transformation, with the
removeElement(...) calls rewritten to remove(indexOf(...)), shown alongside the transformation
assistant.]
(a) Context-sensitive assistance in the transformation editor. As the developer moves the cursor around
the transformation pattern, a context-sensitive transformation assistant (right) shows available opera-
tions. The developer can modify the package pattern directly, or through an input field in the assistant’s
panel. The list of packages shows all known packages in the system; those matching the pattern are
highlighted.
Figure 3.3: Second mockup of the transformation tool (continues on next page).
(b) When the developer proceeds to modify the pattern for the method pattern, the transformation
assistant displays all possible options for changing that pattern. As the developer selects different
options, the transformation pattern is updated to reflect the currently selected set. This mechanism
assists the developer in learning the transformation language.
Figure 3.3: Second mockup of the transformation tool (continues on next page).
[Figure panel: assistant options such as “Assign to metavariable” and “sequence”.]
(c) When the developer works with code, the transformation assistant displays different ways in which
a selected code fragment can be replaced by a wildcard.
[Figure: an annotated pattern illustrating the visual notation. Concept-based names (e.g., “function”,
“declaration”) assist users with identifying pattern structure; wildcards (*) in patterns represent “any”
match; multiplicity annotations read 0..* = zero or more and 0..1 = zero or one.]
Figure 3.4: This figure shows how the boxes that surround pattern structure annotate the
pattern without disturbing its source-code-like structure.
names ending with “cs”, such as cs and eecs. Syntactic (structural) patterns are bracketed
by [< and >], as in [<Expression>]. Patterns with * indicate iteration. For example,
[<Statement>*] matches a sequence of Java statements.
It quickly became clear that the addition of syntactic escapes makes the transformation
pattern difficult to read, defeating the reason for basing the pattern language on the Java
syntax. This led us to abandon a text-based notation altogether. Instead, we decided to
augment Java source code with graphical boxes that demarcate structure. These boxes
enable us to add annotations that are visually independent of the source code text, thereby
substantially improving readability of the transformation patterns.
Figure 3.4 summarizes some of the visual elements used in this format; Figure 3.5 shows
two example patterns. These examples were presented at the OOPSLA 2004 Poster Ses-
sion [11]. This format was also evaluated with the members of the Harmonia research group.
The key idea behind the graphical notation is to arrange information visually in such a way
that the structure of the original source code fragment is undisturbed. The information
about the pattern, such as names of the pattern elements and multiplicity of the wildcards,
is separated from the program text that the pattern is intended to match. This design
improves the visibility of the pattern notation and improves closeness of mapping. The
developer no longer needs to name pattern elements explicitly—each pattern box is titled
with a concept name dictated by its syntactic context. This eliminates hidden dependency
between names and reduces the viscosity. The diffuseness of the format is high due to the
need for structural boxes. We solved this problem by designing a user-interaction model
that hides most of the structure when the developers do not require it. We based the iXj
transformation language on this format.
We discuss the iXj transformation language in more detail in Chapter 4. We present a
user-interaction model for creating and manipulating transformations in Chapter 5.
[Figure panel: an iXj pattern drawn as nested boxes. An outer “function” box (void * ( ) { ... })
contains statement wildcards (0..*) surrounding a “method call” box whose argument (arg 0) is of
interest.]
(a) This pattern matches any function that contains calls to debugging print-statements. This pattern
can be used to transform the argument to the print() method to include the name of the enclosing
function. The name can be accessed in the transformation action by referring to the contents of the
pattern box titled with ‘name’.
[Figure panel: a second iXj pattern. An outer “class” box with name and interfaces sub-boxes contains
a body-declarations sequence (field, method, interface, or class, 0..*) and a “field” box that matches
any type and name.]
tion phase, the implementation strategy needs to anticipate both minor and major design
changes. We implemented an iXj-based transformation tool as a plug-in for Eclipse. This
allowed us to rely on many existing features of the Eclipse platform, limiting the implemen-
tation effort to the core technologies required for transformation of program structure. We
present our implementation in detail in Chapter 5.
Tracking and improving the design requires an evaluation of the implementation with
users. We evaluated iXj with several professional Java developers. Chapter 6 presents the
results of the evaluation. The evaluation revealed several areas where the design can be
improved. We discuss some of these improvements in Chapter 7. These improvements,
however, are beyond the scope of this dissertation.
Chapter 4
[Figure: diagrams (a) and (b), each built around an IfThenElseStmt node.]
Figure 4.1: Example of an abstract syntax tree (a) and of an abstract semantic graph (b).
code elements. Each tree node (entity) represents a source code element, such as a class,
a method, a statement, or an expression. The edges (relationships) between nodes reflect
their syntactic nesting. A syntax tree model corresponds to the structure of source code
represented in program text. This representation is often used in early processing stages of
compilers and other language-based tools. We show an example of an abstract syntax tree
in Figure 4.1a.
The second approach reflects the semantic composition of program elements. This model
often takes the shape of a graph data structure, where each node represents a program
element and the edges represent the semantic relationships between these elements. Those
edges that represent the containment relationship coincide with the edges in a tree-based
model. Other edges represent relationships such as class inheritance or name scoping.
This representation, sometimes called an abstract semantic graph (ASG), is used in the
later stages of language-based tools, following a static semantic analysis of source code.
Figure 4.1b presents an example of an abstract semantic graph.
Designers of language-based tools normally use a program model that is conducive to
the analyses and the manipulations that are performed by a tool. In contrast, we have
designed a new tree-based model that is appropriate for presentation to a human developer.
[Figure: UML class diagram showing IfElseStatement with condition, then, and else associations.]
Figure 4.2: iXj program model for the if-statement entity presented in UML [7]. IfElse-
Statement is a class representing the if-statement. It extends an abstract Statement class
and is linked to the components of an if-statement via three relationships (associations in
UML terminology)—condition, then, and else.
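A rough Java rendering of the model fragment in Figure 4.2 (the names follow the figure; the actual
iXj implementation may differ):

// Abstract entities of the program model.
abstract class Statement {}
abstract class Expression {}

// The if-statement entity, linked to its components by three associations.
class IfElseStatement extends Statement {
    Expression condition;   // the "condition" association
    Statement  thenBranch;  // the "then" association
    Statement  elseBranch;  // the "else" association ("else" is a Java keyword, hence the field name)
}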
tity. Our model makes a single exception to the syntactic nesting rule for entities that
represent Java names, such as class, method, and variable names. Every Java name is
treated as if it were uniquely qualified from the top level of the Java namespace. A
name for a method, for example, always includes its containing class and package, as
in java.util.Vector.toString(). A name for a local variable includes its containing
method’s name together with that method’s argument type signature, as in
java.util.Vector.add(Object).obj. This approach to names represents a point of departure from
a purely syntax-oriented program model. We explain the significance of this scheme when
we present the iXj pattern language.
(a) the source construct:

    if (s.equals("one")) {
        f(1);
    } else if (s.equals("two")) {
        f(2);
    } else {
        f(42);
    }

(b) the iXj model: an IfElseIfStmt node with two ElseIfBranch children and an Else child; each
ElseIfBranch holds a Test (s.equals("one"), s.equals("two")) and a Then (f(1), f(2)), and the Else
holds f(42).
Figure 4.3: iXj model representation (b) for the “if-else-if” construct (a). The syntactic
nesting is “flattened” and appears to the developer similarly to a switch-statement with
branches.
4.1.3 Summary
The iXj program model is reflected in the design of the pattern language for transforma-
tions. No a priori knowledge of the program model is expected of a programmer. The
understanding of our model is scaffolded by the transformation editor, making learning
the new model relatively transparent. We discuss this scaffolding as part of the iXj user
interaction model in Chapter 5.
Figure 4.4: Two basic iXj patterns: solid lines indicate fixed-length structures (a); dashed
lines represent variable-length sequence structures (b).
Figure 4.5: Patterns with wildcards (a), optional (b), and explicitly prohibited (c) elements.
Figure 4.6: Patterns with type-constrained wildcards. The expression wildcard (a) con-
strains the type of the matching expression to be an instance of java.io.Serializable.
The type reference wildcard (b) constrains the matching type reference to be a subtype of
java.io.Serializable.
that it does not matter whether an else-clause exists. The developers who specifically want
to prevent matching if-statements with an else-clause could use the pattern in Figure 4.5c.
Figure 4.7: A pattern with exposed name scoping information. This pattern illustrates
how type names (Player), method names (get), and local variable names (player) are
represented in the pattern language. Our pattern editor hides this structure, unless a
developer specifically wants to use scoping rules in a transformation.
The pattern editor hides this structure, unless the developer specifically wants
to write transformations that require the scoping rules.
Numeric Patterns
Numeric literals are matched using simple patterns that express numeric relationships.
When necessary, the developers can use familiar comparison and logical operators. For example: 42 or <42
or >=42 && <=54. A numeric pattern that begins with a comparison operator compares the value
being matched against the numeric literal in the pattern using that operator. Numeric
patterns are constructed according to the following grammar:
NumericPattern → Number
| != Number
| < Number
| > Number
| <= Number
| >= Number
| ( NumericPattern )
| ! NumericPattern
| NumericPattern && NumericPattern
| NumericPattern || NumericPattern
The precedence of operators in the numeric patterns corresponds to their precedence in the
Java programming language.
Modifier Patterns
Java modifiers are treated as boolean annotations that can be tested for their presence or
absence. Modifier patterns can be combined with conjunctions and disjunctions. For ex-
ample: public || protected or public && static && !final. The following grammar
formally defines construction rules for the modifier patterns:
Modifier → one of Java type, field, method, or variable modifiers
ModifierPattern → Modifier
| ( ModifierPattern )
| ! ModifierPattern
| ModifierPattern ModifierPattern
| ModifierPattern && ModifierPattern
| ModifierPattern || ModifierPattern
The rule “ModifierPattern ModifierPattern” is equivalent to “ModifierPattern
&& ModifierPattern”, but permits a more familiar specification of modifier conjunction,
such as public static final.
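As a concrete illustration (not iXj code), the pattern public && static && !final corresponds to the following reflective check; java.lang.reflect.Modifier supplies the individual tests, and Math.max is used here only as a convenient example method.

    import java.lang.reflect.Method;
    import java.lang.reflect.Modifier;

    // Illustration only: the modifier pattern "public && static && !final" holds for
    // a method exactly when the corresponding java.lang.reflect.Modifier tests hold.
    public class ModifierPatternSketch {
        public static void main(String[] args) throws NoSuchMethodException {
            Method m = Math.class.getMethod("max", int.class, int.class);
            int mods = m.getModifiers();
            boolean matches = Modifier.isPublic(mods)
                    && Modifier.isStatic(mods)
                    && !Modifier.isFinal(mods);
            System.out.println(matches);   // true: Math.max is declared public static and not final
        }
    }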
Type Patterns
Type patterns express constraints on Java types, such as the type constraints on wildcards shown
in Figure 4.6. Type patterns can be combined with conjunctions and disjunctions, just as other
patterns. For example: subtypeof Operator && subtypeof java.lang.Cloneable.
QualifiedTypeName → fully qualified Java name for the type
TypePattern → QualifiedTypeName
| subtypeof QualifiedTypeName
| supertypeof QualifiedTypeName
| ( TypePattern )
| ! TypePattern
| TypePattern && TypePattern
| TypePattern || TypePattern
The subtypeof operator ensures that the type being matched is the same as the type
specified or a subtype of it. The supertypeof operator is the reverse of subtypeof. This
operator checks that the type being matched is the same as the given qualified type name
or one of its base types.
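The intended semantics can be pictured with the standard reflection check; the sketch below is an illustration of the meaning, not the iXj implementation, and the classes used in it are chosen only as examples.

    // Illustration only: "subtypeof T" holds for a matched type M when M is T or a
    // subtype of T, i.e. T.class.isAssignableFrom(M.class); "supertypeof" reverses the test.
    public class TypePatternSketch {
        static boolean subtypeOf(Class<?> bound, Class<?> matched) {
            return bound.isAssignableFrom(matched);
        }

        static boolean supertypeOf(Class<?> bound, Class<?> matched) {
            return matched.isAssignableFrom(bound);
        }

        public static void main(String[] args) {
            System.out.println(subtypeOf(java.io.Serializable.class, String.class));   // true: String implements Serializable
            System.out.println(supertypeOf(String.class, CharSequence.class));         // true: CharSequence is a base type of String
        }
    }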
Figure 4.8: Two complete transformations that remove a boolean negation from an if-
statement and reverse its branches. This example illustrates how an action can be specified
either at the top level pattern element (a) or at one of the sub-patterns (b).
whose name starts with pack. The suffix of the matching method’s name is stored in a
pattern variable by the \( ... \) capturing group, which is referenced in the transforming
action as \1. This transformation can be generalized to rename all methods whose name
starts with pack by wildcarding the package, the type, and the signature in the pattern.
After this refactoring takes place every call to lastReading() contains the leftover casts
to Reading. The developers can use iXj to clean up these casts. The transformation
in Figure 4.10a implements this change by matching all casts to Reading whose casted
expression’s type is Reading. This transformation replaces the entire cast with the casted
expression, thereby dropping the unnecessary cast. A more general transformation shown
in Figure 4.10b can be used to remove all unnecessary casts for all possible types.
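A sketch of the change at a call site is shown below; the Reading class and the lastReading() method are assumed here, following the description above.

    // Assumed types for illustration; only the call sites matter.
    class Reading { }

    class Sensor {
        Reading lastReading() { return new Reading(); }

        Reading before() {
            return (Reading) lastReading();   // leftover cast: the expression already has type Reading
        }

        Reading after() {
            return lastReading();             // the transformation replaces the cast with the casted expression
        }
    }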
Figure 4.11: Example of the preserve whole object refactoring transformations in iXj.
By applying this transformation, the developer can use other features of the range object
in the withinRange() method. Unfortunately, when this method is called in many places
in the program, modifying every call site can be time consuming. The transformation for
implementing this change is shown in Figure 4.11. This transformation looks for all calls
to the withinRange() method and modifies its argument list to contain just the expression
that returns an instance of TempRange. The body of the withinRange() method needs to
be modified manually to account for its new argument signature. In a typical development
environment, the need for this change would be indicated by a compilation error.
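A sketch of the call-site change is shown below; the original argument list of withinRange() is not given in the text, so the low and high accessors of TempRange are assumptions used only for illustration.

    // Assumed types for illustration.
    class TempRange {
        int getLow()  { return 0; }
        int getHigh() { return 0; }
    }

    class HeatingPlan {
        // New signature: the whole range object is passed; the body is adjusted by hand.
        boolean withinRange(TempRange range) {
            return range.getLow() <= range.getHigh();
        }

        boolean callSite(TempRange range) {
            // Before the transformation a call site might have read:
            //     withinRange(range.getLow(), range.getHigh());
            // After the transformation every call site passes the expression of type TempRange:
            return withinRange(range);
        }
    }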
location at which the developer requests a completion. A code snippet parser is used by
the debugger for parsing small fragments of Java source code. Other parsers are used for
compilation, hyperlink navigation, language-aware source code searching, and so on.
The need for different program models is motivated by different contexts in which these
models are used. A code formatting tool needs to maintain code comments in its model;
a compiler does not. Some models need to be constructed from ill-formed, incomplete, or
inconsistent states of source code and represent only partial information about the program.
(Van De Vanter [63] describes this as the “I3” condition.) Even when the information that
needs to be represented is fixed, there is a considerable degree of variation in the way it can
be modeled. We explored some of these issues in our earlier work on an exchange format
for the Harmonia framework [10]. In our present work, the design of the program model for
program transformations was driven by the need to expose that model to the users of our
tool.
There has been considerable interest in finding the holy grail of program representations—
a canonical, universal all-purpose program model. Recent examples include the work on
JavaML [1], on srcML [20], and on the standardization of tool exchange formats [30, 38].
Yet, time and again we see the designers struggling with the decision of which features must
be included in the model and how to bring existing tools into compliance with the design.
Our work demonstrates that an application-centric program model can be more ap-
propriate. This notion was introduced in the early work on TXL [23] and REFINE [12].
Because these tools include a parser as part of the transformation engine, for any given
transformation the developers can use a grammar that derives the structure needed for
that transformation. More recently, TXL designers published a report on agile parsing—a
technique for adapting the language parsing grammar to the needs of a particular applica-
tion [25]. Our experience suggests that designing custom models on a case-by-case basis is
the right approach for building language-based tools. However, extracting model instances
from source code text and maintaining the correspondence between various models is a
topic for another dissertation.
Chapter 5
1. The developer selects a sample source code fragment needing transformation. The
system scaffolds the creation of a transformation by automatically generating an initial
pattern from the textual selection in source code. The generated pattern is concrete,
that is, it contains no wildcards and no matching constraints.
2. Using the generated pattern as a starting point, the developer interactively modifies
the transformation by inserting wildcards, matching constraints, and transforming
actions. The system displays a preview of the transformation as it is modified.
Figure 5.1: The flow of user’s interaction with the transformation editor.
3. When the developer is satisfied with the result of the transformation, he or she applies
the transformation to source code and returns to other coding tasks.
This workflow embodies two key principles of the interaction model: by-example construc-
tion and iterative refinement. By-example construction enables developers to start with a
single instance of a transformation and to generalize the description of that transformation
to apply to similar source fragments. In practice, a single conceptual change may require
several related transformations. We support this through a notion of a pending transfor-
mation set. This set groups related transformations prior to their application to source
code and helps avoid intermediate inconsistent states of source code. Any transformation
in the set can be modified further, if the developer discovers that it does not affect the
code as expected. When the developer is satisfied with all transformations in the set, the
transformation editor can apply all of these transformations at once.
Figure 5.1 presents the developer’s workflow at a finer level of detail. This figure illus-
trates iterative refinement of transformations, guided by the feedback from the transforma-
tion editor. The feedback helps the developers to generalize the transformation pattern and
Figure 5.2: iXj plug-in for Eclipse provides an interactive transformation editor. The labeled
panes include the transformation pattern with its transforming action (b), the context-sensitive
transformation assistant (c), and the pending transformations not yet applied to source code (d).
to specify the transforming action. It also enables developers to evaluate the set of pending
transformations and to decide when that set is complete.
The developer initiates the transformation process in the Eclipse source code editor by se-
lecting a sample source code fragment that needs to be changed. The selection is performed
using traditional textual selection operations.
1 Figures 5.2a-d designate the panes with comments labeled (a)-(d).
The selection is unconstrained; however, the transformation editor activates only when the
developer selects a structurally complete source code fragment, such as a statement or an
expression. When this happens, the system automatically generates an initial pattern from
the selection in source code. This pattern appears in the transformation editor pane.
The initial pattern matches the exact source code fragment selected by the developer and
all the other source code fragments that are textually similar to the selection (whitespace
and comments are ignored by the pattern matcher). Sub-patterns are not shown initially
to provide a less cluttered view.
The transformation assistant describes the selected structure and reminds the developer
what can be done next.
The developer can select another source code fragment if the selected structure does not
appear to be a good starting point for a transformation. The developer can also choose to
“accept this selection for transformation.”
Having accepted a selection, the developer can change the structure of the pattern to
make it more general. The transformation editor offers manipulation of the transforma-
tion structure through direct manipulation and through the transformation assistant. The
transformation engine uses annotations in the source code editor to provide feedback to the
developers. A complete transformation may be stored in the pending transformation list
prior to application. The following sections describe each of these mechanisms.
Direct Manipulation
The developer can manipulate the transformation descriptions directly by invoking visual
“handles” to modify pattern structure and by changing pattern and action text.
Figure 5.3: iXj pattern structures can be manipulated with “handles” attached to the
pattern boxes.
Handles. Each pattern box provides handles that enable the developer to modify pattern
structure (Figure 5.3). The handles permit expansion and collapse of the pattern structure
(5.3a), conversion of a pattern element to and from a wildcard (5.3b), cycling through the
optionality states of a pattern element (5.3c), and addition and removal of a transforming
action to a pattern element (5.3d). To reduce visual clutter, the handles appear on the
screen only when the developer passes a mouse pointer over the pattern box.
Expansion and collapse of the pattern structure. This handle controls the visual appearance
of a pattern, but does not change its structure. It permits “drilling down” into the pattern,
when the developer wants to manipulate a substructure that is not initially visible. For
example, the developer can expand the initial pattern to see more of its structure.
Conversion of a pattern box to and from a wildcard. This handle toggles between a concrete
pattern box and a “match-anything-of-this-type” wildcard. The syntactic type of the wild-
card is determined by its context in the pattern. For example, a concrete Java expression
in the initial pattern can be converted to a wildcard that will match any Java expression
at its position.
Every handle operation, including conversion to a wildcard, is reversible. The second toggle
of the wildcard handle converts the corresponding pattern element back to its unwildcarded
form.
Cycling through the optionality states of a pattern element. This handle only appears on
pattern boxes that are optional in the Java syntax, such as the else-clause in the if-
statement. Invoking this handle cycles through three optional states: (1) match anything
appearing in this pattern position, including nothing (zero-or-one match), (2) match nothing
(zero-match), and (3) match current pattern box, which may be a wildcard (one-match).
Our running example does not contain any optional elements. Assuming, however, that
the developer selected the entire variable declaration, we can see that a variable declaration
in Java can contain an optional variable initialization expression.
Addition and removal of the transforming action. This handle adds (or removes) the trans-
forming action for the pattern element. The action is initialized to an identity transforma-
tion that replaces the source code fragment matching that element with itself. To perform
the de Morgan law transformation, the developer adds an action to the bitwise-and expres-
sion.
As with other handles, the action may be removed by re-invoking its handle.
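For reference, the effect of the running example on matched source code looks roughly as follows; the operand names x and y are assumptions, and in the transforming action they would be referred to through pattern variables such as $binop.left$ and $binop.right$.

    // Assumed operand names; boolean operands make the bitwise-and a logical test.
    class DeMorganSketch {
        void before(boolean x, boolean y) {
            if (!(x & y)) {
                // ...
            }
        }

        void after(boolean x, boolean y) {
            if (!x | !y) {          // equivalent by de Morgan's law
                // ...
            }
        }
    }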
Pattern and Action Editing. The transformation editor permits free-form text editing
of the transforming action and parts of the transformation pattern. Because the trans-
forming action is specified in a text-based format, the developer can edit the action simply
by clicking on its text. Doing so creates an editable input field that accepts all standard
text editing commands. In order to reduce typing, long pattern variable names can be
abbreviated to their least unique qualified suffix. The suffix must contain a complete box
name, but this name need not be qualified beyond what is necessary to unambiguously
resolve that name in the currently visible expansion of the pattern. For instance, in our
running example it is sufficient to refer to $left$ and $right$, rather than $binop.left$
and $binop.right$. The names are expanded to their long unambiguous form when the
developer leaves (clicks outside of) the action editor.
The transformation editor permits free-form editing of the non-structural elements in the
transformation pattern. These elements include string and numeric literal patterns, modifier
patterns, and Java name patterns. Clicking on such an entity in the transformation editor
creates an editable input field. The developer can edit that field to specify a more complex
pattern using the text-based pattern language appropriate for that entity (see Section 4.2).
As the text is edited, it is verified against the specification of the corresponding pattern
language. If the pattern does not conform to the specification, the developer is notified by
a change in the pattern’s color. For example, suppose the developer wanted to apply the
de Morgan transformation to just those bitwise-and expressions whose first operand is a
variable that starts with a lower-case letter (contrived, but possible). In this case, he or she
could modify the pattern as follows.
Transformation Assistant
Any pattern element in the transformation editor can be selected by clicking anywhere
inside its box. The context-sensitive transformation assistant provides a description of the
selected pattern element and lists various actions that apply to it. When no specific pattern
element is selected, the transformation assistant describes each of the handles to remind
the developer of their purpose.
In addition to enabling all actions accessible via handles, the transformation assistant
includes several options that are not available through direct manipulation. These options
include specification of various matching constraints, such as a constraint on the Java type of
an expression that can be matched by an expression wildcard. For example, if the developer
wants to constrain the de Morgan transformation to apply only to the instances of java-
.lang.Byte,2 he or she can use the transformation assistant to specify this constraint.
The input field for the type constraint in the transformation assistant offers type-name
completion; a full list of known type names is accessible by following the “More...” hyperlink.
Immediate Feedback
As the pattern is edited, the pattern matcher runs continuously, providing visual feedback
to the developer. The pattern matcher highlights all matches in the source code editor. An
overview ruler on the right-hand side of the source code editor provides simple navigation to
the matches in the same source file that are not immediately visible. Tick-mark annotations
on source files and packages in Eclipse’s package explorer indicate presence of a match within
a source file.
The developers can associate a transforming action with a pattern at any time. Frequently,
the developers will experiment by adding an action to a concrete pattern, then making
the pattern more general by adding wildcards, and finally changing the action to introduce
missing pattern variables. The effect of the transformation is displayed immediately upon
specification of an action; however, the results are not yet “committed” to the source code.
For example, when the developer edits the action in the de Morgan law transformation, the
transformation engine immediately updates the view in the source editor:
2 This assumes that the program to which the transformation is applied relies on Java 5 automatic unboxing semantics.
The immediate feedback provided by the transformation engine makes the execution of
transformations transparent to the developers and assists in their learning the transforma-
tion language. The interface encourages experimentation by enabling the programmers to
view partial results and to visualize the effect of the transforming action.
The second transformation in this list cleans up double-negations left over after the de
Morgan law transformation.
(Diagram: the iXj pattern editor and transformation engine operate on the iXj program model,
which is kept synchronized with Eclipse’s view of the source code through the Harmonia syntax
tree; the pattern editor performs initial pattern construction, and the transformation engine
performs pattern matching and transformation.)
Figure 5.5: Component-level view of the Harmonia architecture. Language modules extend
the analysis kernel with language-specific information. An application, implemented in one
of the supported programming languages, uses the services of the analysis kernel through
bindings appropriate for that programming language.
The iXj pattern matcher uses a syntax-directed top-down tree traversal of the iXj pro-
gram model. The transformation engine applies the transformation textually to the part
of the source code text that corresponds to the match. Implementation of changes on the
text-based representation permits the least disruptive modification of source code with re-
spect to the non-syntactic material such as whitespace and comments. The text changes are
subsequently incorporated in the syntax tree-based representation by the Harmonia analysis
engine.
The iXj pattern editor uses the iXj program model to construct the initial representation
of the pattern structure. The editor uses a “box-and-glue” layout algorithm for presenting
pattern structure as nested graphical boxes. Our algorithm uses baseline-alignment con-
straints to ensure that the source code text contained in graphical boxes looks visually like
source code. This algorithm was inspired by the layout mechanisms of TEX [44]. Horizon-
tal layout and spacing are controlled by typesetting rules, similar to those formulated by
Baecker and Marcus for C programs [2]. This mechanism was derived from our earlier work
on displaying and editing source code in a programming environment [65]. When a pat-
tern spans multiple lines of source code, the algorithm uses a simple line-breaking strategy
that terminates the horizontal layout at appropriate points in the structural context (for
example, after the opening curly brace in the if-statement’s body).
(Diagram: the Harmonia language kernel provides analysis services, including tree access and
editing, consistency maintenance, an incremental lexer, an incremental parser, and incremental
semantic attribution, over the program representation used by a Harmonia application such as
Eclipse, together with run-time grammar information and an external analysis infrastructure.)
Incremental analyses. The analysis engine of the Harmonia framework provides fine-
grained incremental lexical and syntactic analyses that can both construct the syntax trees
from traditional text files and incorporate changes to the syntax trees incrementally, as they
are introduced. The framework includes an incremental GLR parser, an efficient incremental
scanner, and an infrastructure for building semantic analyses.
In order to give the developer needed flexibility in modifying programs, the framework
continues to provide services when programs are ill-formed, incomplete, or inconsistent.
The incrementality of the Harmonia parsing algorithms, together with history-sensitive
error recovery [67], naturally incorporates inconsistency into the syntax tree by enabling
syntax analysis to continue beyond malformed regions, and by enabling malformed regions
to contain well-formed substructure. The iXj transformation engine takes advantage of this
feature to enable transformations on all regions in source code that do not contain syntactic
or semantic errors.
Editing and analysis model. The Harmonia framework permits unrestricted editing of
the syntax tree data structures. The source code representation can be changed without
concern for transient ill-formedness introduced during an edit. A change discovery mech-
anism based on the persistent document representation is used to incrementally restore
consistency following a modification. The frequency of reanalysis is under control of the
analysis clients; in Eclipse, a reanalysis is invoked after a brief pause in the user’s typing.
This analysis model is instrumental for permitting text-based transformation of the
program structure. When the iXj transformation engine applies a transformation to source
code text, the program structure is recovered by a subsequent analysis.
2. Designing a program model for interactive transformations and implementing its con-
struction. Our program model can be used as the starting point for object-oriented
languages similar to Java. The analysis algorithms for constructing a program model
can use a standalone library (like Harmonia) or can rely on the analysis infrastructure
of an IDE (like Eclipse).
5. Implementing a pattern matching and transformation engine. For the most part,
iXj’s pattern matcher and transformation engine can be reused, though our
syntax-directed pattern matching algorithm includes language-specific components
that correspond to the iXj program model for Java. These components would have
to be reimplemented for a different programming language.
Interactive transformations can have many uses. Most modern programming languages and
most interactive development environments will benefit from having a tool for interactive
program transformations available for developers’ use. We hope that other researchers
will follow and bring the concept of interactive program transformations to programming
languages other than Java.
Chapter 6
In order to assess the ease of learning the visual transformation language and the usability
of the iXj user-interaction model, we conducted an evaluation of the Eclipse-based transfor-
mation environment through a usability study with five Java programmers. We trained the
participants to understand, construct, and evaluate iXj transformations. The participants
completed a short code editing task and filled out an evaluation questionnaire. This chapter
presents our methodology and discusses the results of the evaluation.
6.2 Participants
The participants in our study were proficient Java programmers with various levels of expe-
rience working with Eclipse. Our main selection criterion was programming proficiency with
Java—we specifically wanted to avoid novices who may not have enough experience with
code maintenance tasks. Three participants were professional Java programmers employed
in the software industry. Two were students (one graduate and one undergraduate) in the
Computer Science Department at the University of California, Berkeley. All participants
considered themselves expert Java programmers, with an average of eight years of Java
programming experience. Two participants reported being novices to Eclipse and having
little familiarity with its automated refactoring facilities. Three participants use Eclipse for
their day-to-day Java programming.
The pre-study interview was aimed at establishing common terminology and understand-
ing of source code maintenance. We defined maintenance as any programming activity that
does not involve adding new code to a software system (authoring). We distinguished three
forms of maintenance: adaptive (adding new features), corrective (fixing defects), and per-
fective (anticipating future changes). These definitions coincide with those established in
software engineering literature, such as in Swanson [61].
During the pre-study interview all participants reported regularly performing adaptive
and perfective source code maintenance. Three participants estimated that they spend 20%
of their coding time on source code maintenance, two participants reported that fraction to
approach 40%-50%, and one participant estimated that 80% of her time is spent performing
some form of maintenance of the existing code. All participants reported using some tools
to assist them with these tasks. Of these tools, the Java compiler was considered the most
ubiquitous for its ability to locate places in source code that are semantically or syntactically
inconsistent after a change. The participants indicated that they often structure their
maintenance activities to intentionally cause compilation errors by starting with the most
disruptive change. This practice enables them to use the resulting compiler error messages
as a “to-do” list. (One participant referred to this as a “chasing the errors” approach.)
Three of the participants reported routinely using refactoring tools in Eclipse to assist
them with code maintenance tasks. Only one of the participants was comfortable using
command-line text-processing tools (such as the sed utility or Perl scripts), although all
participants indicated that they were aware of these tools.
6.3 Training
The training session consisted of a walkthrough of the Eclipse user interface, emphasizing
the interaction with our transformation environment. We demonstrated iXj on a simple set
of de Morgan’s law transformations, similar to those presented in Section 5.1.2. During the
evaluation session the participants learned the following features of iXj and of our interaction
model:
• How to construct the initial transformation pattern “by-example” from the selection
in the source code editor.
• How to read pattern representation based on nested structure demarcated with graph-
ical boxes.
• How to expand and collapse pattern structure and how the visual alignment of the
pattern elements helps to see the relationship between the pattern and the source
code.
• How to convert a pattern element to a wildcard and how to undo the conversion by
re-invoking the wildcard handle.
• How to refer to the parts of the matched pattern using box names as pattern variables.
• How to evaluate the transformation based on the feedback shown in the source code
editor and the package explorer.
• How to add a transformation to the pending list, how to individually toggle the
preview of transformations on that list, and how to bring a transformation on the
pending list back into the transformation editor.
Figure 6.2: Transformations needed in the MineSweeper source code to convert the uses of
java.io.StreamTokenizer to the uses of java.util.Scanner.
We provided the participants with the material in Figure 6.1 and with a table showing sample transformations needed in the MineSweeper
source code (Figure 6.2). We instructed participants that they should not interpret the
transformations in the table literally. For example, t can stand for any expression whose
type is java.io.StreamTokenizer, and s can stand for any expression whose type is
java.util.Scanner. Likewise, the transformation T1 never appears in the MineSweeper
source code verbatim. Rather, the source code contains statements like if (t.ttype !=
StreamTokenizer.TT NUMBER) {...}, requiring the result of the transformation to include
negation. The participants were told that they could construct these transformations in
arbitrary order.
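For instance, the negated form of T1 as it occurs in the game source and its expected replacement look roughly as follows, where t denotes the java.io.StreamTokenizer and s the java.util.Scanner:

    import java.io.StreamTokenizer;
    import java.util.Scanner;

    class TokenTestSketch {
        void before(StreamTokenizer t) {
            if (t.ttype != StreamTokenizer.TT_NUMBER) {
                // handle a non-numeric token
            }
        }

        void after(Scanner s) {
            if (!s.hasNextInt()) {   // the negation from the original condition must be preserved
                // handle a non-numeric token
            }
        }
    }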
6.5 Metrics
During evaluation we measured time to completion for each of the transformations presented
in Figure 6.2. We also recorded total time to completion of the entire task, but we did not
find that metric very useful—some participants decided to perform several transformations
that were not on the list, because “they seemed appropriate.”
Following completion of the sample task, the participants were asked to evaluate the
transformation tool by completing a twelve-item questionnaire that consisted of both qual-
itative and quantitative questions. During our analysis of the results we were trying to
determine (1) the understandability of the transformation language vocabulary, (2) the in-
tuitiveness of the pattern structure, and (3) the ease of developing transformations. We
were also interested in classifying the most common mistakes that the participants made
while attempting to complete the code editing task.
The questionnaire was constructed using the Cognitive Dimensions framework presented
in Chapter 2. We have put the CDs framework to dual use: in addition to making it part
of our usability evaluation, we applied the framework in the early design stages to gain
additional insight into the problem. In our design, we attempted to achieve high marks along
each of the dimensions. Thus, in addition to the overall usability picture, the responses to
the CDs questionnaire provide feedback on our design targets. In order to make the results
easily quantifiable, we augmented the traditional qualitative CDs questionnaire [6] with a
seven-point semantic differential scale. We present the full questionnaire in Appendix C.
6.6 Hypotheses
The key hypothesis of our evaluation was that the participants would find the transformation
tool intuitive and easy to use. We expected them to perform well on the sample transfor-
mation task and to become reasonably proficient with the tool. After a brief exposure to
the transformation tool the participants should understand how to build a pattern, how to
create an action, and how to evaluate correctness and completeness of a transformation.
This was a tricky task because the non-negated version of this boolean expression does
not appear in source code. We expected the participants to realize that and create this
transformation as above.
The above transformation presents the most obvious solution that replaces the entire cast
expression. It is also possible to modify just the casted value. Such a transformation has
two forms:
or
The left version replaces the entire casted value. The right version modifies the casted
value without using any pattern variables in the action. Both of these transformations
leave unnecessary casts to int. These casts can be cleaned up with another transformation
similar to the ones presented in Section 4.4.2.
This was another tricky task because java.util.Scanner.next() not only returns the
next token from the scanner, but also advances the input position. This means that read-
ing from java.io.StreamTokenizer.sval several times in a row without interleaving calls
to java.io.StreamTokenizer.nextToken() is not equivalent to calling java.util.Scan-
ner.next(). Such a sequence occurs in the MineSweeper game. The correct transformation
is to assign the value of java.util.Scanner.next() to a temporary variable (such as
tempSVal) and replace accesses to the sval field with a temporary variable as follows:
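A minimal sketch of this change is shown below; the command strings are assumptions made only for illustration, and tempSVal is the temporary variable named above.

    import java.io.StreamTokenizer;
    import java.util.Scanner;

    class CommandDispatchSketch {
        // Before: sval can be examined repeatedly without advancing the input.
        void before(StreamTokenizer t) {
            if ("quit".equals(t.sval)) {
                // ...
            } else if ("mark".equals(t.sval)) {
                // ...
            }
        }

        // After: next() advances the input, so its result is captured once.
        void after(Scanner s) {
            String tempSVal = s.next();
            if ("quit".equals(tempSVal)) {
                // ...
            } else if ("mark".equals(tempSVal)) {
                // ...
            }
        }
    }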
6.7 Results
We found the overall opinion of the participants to be very favorable. All participants
were able to complete the task and were satisfied with their work. During the post-study
interview the participants indicated that they enjoyed working with our tool. In this section
we present the summary of participants’ performance, the results of the cognitive dimensions
questionnaire, and the analysis of common mistakes, errors, and misconceptions.
                        Participants
Transformation         1      2      3      4      5
T1        Init       135     88    157    112     91
          Fix         28     37                   41
          Total      163    125    157    112    132
T2        Init        84    174     93    136    219
          Total       84    174     93    136    219
T3        Init        75     80     43     91     44
          Fix                 8
          Total       75     88     43     91     44
T4        Init        46     48     63      –      –
          Fix         22
          Total       68     48     63      –      –
T5        Init       131    118      –     55    102
          Total      131    118      –     55    102
T6        Init        16    138     94     47     39
          Fix        166                   20     60
          Total      182    138     94     67     99
Figure 6.3: Time in seconds spent by each of the participants on each of the transformations
listed in Figure 6.2. Init represents the time spent on the initial attempt. Fix represents
the time spent on a subsequent correction, if any. Total represents total time spent for a
transformation. Not all participants attempted all transformations.
6.7.1 Performance
Figure 6.3 lists times (in seconds) spent by each of the participants on each of the trans-
formations listed in Figure 6.2. We recorded both the time spent on the first attempt to
construct a transformation (“Init”) and the time spent on any subsequent modification
(“Fix”). Subsequent modifications were necessary because some of the participants did not
introduce appropriate wildcards into a transformation pattern on the first attempt. After
realizing this, they went back to earlier transformations from the pending transformation
list to correct their mistakes. Total transformation times reflect the complexity of transfor-
mation, though there was a great amount of variation depending on the order in which the
participants attempted the transformations.
Figure 6.4 presents the time (in seconds) spent by each participant on each task in the
order these tasks were performed. (This is the same information as in Figure 6.3, reordered
for easier presentation.) Transformation task times demonstrate the participants’ increased
fluency with the transformations they wrote later. (We also confirmed this observation
subjectively when analyzing screen recordings.) For example, everyone attempted transfor-
mations T2 and T3 in sequence because these transformations occur close together in the
source code. These transformations are comparable in pattern complexity and the second
transformation in the sequence was always specified more quickly than the first one.
                                      Participants
      1                2                3                4                5
T6 Init   16     T6 Init  138     T1 Init  157     T6 Init   47     T2 Init  219
T1 Init  135     T4 Init   48     T6 Init   94     T2 Init  136     T3 Init   44
T4 Init   46     T1 Init   88     T4 Init   63     T6 Fix    20     T1 Init   91
T6 Fix   166     T1 Fix    37     T2 Init   93     T3 Init   91     T6 Init   39
T1 Fix    28     T2 Init  174     T3 Init   43     T1 Init  112     T6 Fix    60
T4 Fix    22     T3 Init   80                      T5 Init   55     T5 Init  102
T2 Init   84     T3 Fix     8                                       T1 Fix    41
T3 Init   75     T5 Init  118
T5 Init  131
Figure 6.4: Time in seconds spent by each of the participants on each of the transformations
listed in Figure 6.2 arranged in the order of completing a task. Init represents the time
spent on the initial attempt to construct a transformation. Fix represents the time spent
on a subsequent correction, if any.
Three of the participants missed one of the transformations from the list. We attribute
this to their inability to rely on the compiler to detect errors that would otherwise lead them
to places in the source code still needing transformation. (The compiler was not available
to them.) This limitation will be addressed in a future version of our tool.
Visibility: How easy is it to see or find various parts of the transformation description while
it is being constructed or changed?
Very Difficult Very Easy
1 2 3 4 5 6 7
In general the participants felt that visibility of the transformation description language was
good. One participant expressed concern about the long name references in the transformation
action, noting that they get hard to read when the name refers to a deeply nested
structure, as in $if.test.value.left.conditional.then$. This name is required, for
example, to refer to the variable y inside if (!(x ? y : z)) { ... }.
Viscosity: How easy is it to make changes to the parts of the transformation description
that you completed?
Very Difficult Very Easy
1 2 3 4 5 6 7
The participants felt that viscosity was low, with one participant noting that making changes
was “much easier than [he] expected”. Another participant appreciated the ability to make
changes to the completed transformations that needed adjustment by taking them out of
the pending transformation list.
Diffuseness: Does the transformation notation let you describe what you want reasonably
briefly or is it long-winded?
Very Long Very Brief
1 2 3 4 5 6 7
The participants reported low diffuseness, although two of the participants expressed concerns
about the long name
references in the transformation action. Coupled with another
participant’s feeling that this reduces visibility (above), this issue emerged as one of the
problems that we need to address.
Role Expressiveness: When looking at the transformation description, how easy is it to tell
the purpose of each part in the overall scheme?
Very Difficult Very Easy
1 2 3 4 5 6 7
iXj received high marks for role expressiveness. The participants stated that “it is obvious
where each part comes from,” and thanked us for “not showing these as a tree.”
Closeness of Mapping: How closely does the transformation description match your intuitive
understanding of program structure?
Not Close at All Very Close
1 2 3 4 5 6 7
The participants reported that iXj achieves close mapping between the transformation de-
scription and their understanding of the program structure. One participant noted: “The
pattern looks, visually, like source code. It makes sense to edit it as an example of the
change you want to make and convert things to wildcards where they are unnecessarily
specific.” Another participant commended iXj for “great indication of wildcarding.”
Progressive Evaluation: How easy is it to stop in the middle of creating a transformation
description and check your work so far?
1 2 3 4 5 6 7
Most of the participants were satisfied with the ability to evaluate a transformation-in-
progress. One participant noted that he had trouble “mak[ing] sure [he] had grabbed all
matches that [he] intended.” This problem was also mentioned by other participants in the
post-study interview.
Error Proneness: How often do you find yourself making small slips that make the trans-
formation process frustrating?
Very Often Never
1 2 3 4 5 6 7
Participants’ marks and comments on the error-proneness dimension confirmed some of the
observations that we make in Section 6.7.3. One participant indicated that “[he] was often
not sure that [he] made the pattern sufficiently generic.” Two other participants noted that
it is easy to mistype a variable name in the transformation action. Another participant
disliked small icons for pattern box handles.
Provisionality: When working on a transformation description, how easy is it to explore
various directions when you are not sure which way to go?
1 2 3 4 5 6 7
The participants felt that they could easily explore various directions because it was “fast
to make changes” and they could “see the code change right away.”
One participant par-
ticularly liked “going back and forth from wildcard to the original part [to see] the effect of
converting to a wildcard.”
Consistency: How would you rate the consistency of the transformation notation?
Very Inconsistent Very Consistent
1 2 3 4 5 6 7
Everyone felt that the transformation notation was consistent, with one participant emphasizing
that he “really liked that the pattern looks like the source code itself.”
Hard mental operations: What kinds of things require the most mental effort when con-
structing a transformation description?
In this category all participants listed the same two problems: (1) deciding which parts of
the pattern need to be “wildcarded” and (2) knowing when to stop adding wildcards.
Hidden Dependencies: Are there any parts in the transformation description that, when
changed, require you to make other related changes to other parts of the description?
Most participants did not notice any hidden dependencies in the transformation description.
Our subsequent analysis of the screen recordings, however, exposed two hidden dependencies
that caused confusion for the participants.
Mistakes
Forgetting to click outside of the action editor to preview transformation. Two of the partic-
ipants kept forgetting that they needed to click outside of the action editor to activate the
transformation and preview its results. This shortcoming can be addressed by introducing
a timeout that activates the transformation when the user stops modifying the action for a
given period of time.
Forgetting to accept transformation. Some participants kept forgetting to click on the “Add
to Pending Transformations” button to move a transformation to the pending transforma-
tion list. Others also found the concept of the currently edited transformation being
separate from other pending transformations confusing. We intend to redesign this part
of the user-interaction model to avoid confusion.
Insufficient wildcarding. Several participants indicated that they had trouble deciding when
the pattern has enough wildcards to match all places in the source code needing transfor-
mation. This problem was also mentioned under the cognitive dimension of hard mental
operations. This result contrasts with our initial design intuition that the users will be
able to “reason” about a transformation and will add all of the necessary wildcards based
on the intent of the transformation. For example, we assumed that if the users wanted to
Unintended wildcarding of a pattern element. This simple slip resulted from users clicking
on the wildcard handle on the wrong pattern box. This happened infrequently and we
attribute this slip to the size of the icons (one of the participants mentioned this problem
in his responses to the questionnaire). When the participants made this mistake, they were
very pleased by the ability to toggle the wildcarding and “undo” their mistake.
Picking wrong pattern element for replacement. This mistake occurred when the partici-
pants wanted to attach a transforming action to one of the nested pattern elements. In two
cases they were confused about the appropriate level at which the transforming action needs
to be specified. We believe that this problem can be addressed through better visualization
of which part of the source code is affected by an action.
Errors
Missed inversion of the method call result. Three of the participants missed variations be-
tween the source code and the transformations described in the API mapping table. They
mistakenly replaced s.ttype != StreamTokenizer.TT_NUMBER with s.hasNextInt(), not
realizing that the result of the method call needed to be inverted. This represents a con-
ceptual error that renders the transformed program incorrect. After introducing such an
error, the only possibilities of discovering it are through testing or through code inspec-
tion. We believe that this error is largely due to the participants being unfamiliar with the
java.io.StreamTokenizer and the java.util.Scanner interfaces and following the list
of transformations from the supplied table. In practice, the users are likely to have better
understanding of the transformation task because they would be the ones formulating it.
Still, the possibility of such errors is a concern.
Ignoring method call side-effects. Two participants introduced a conceptual error in the
program when replacing t.sval with s.next() (transformation T5). The method java-
.util.Scanner.next() has a side-effect of advancing the current position in the input
stream. But the doCommand() method in the MineSweeper source code repeatedly examines
the current input token as follows:
88
Replacing the accesses to the sval field with calls to next() is erroneous. Two of the partic-
ipants made that mistake. Again, we attribute this error to the participants’ unfamiliarity
with the java.util.Scanner’s behavior.
Misconceptions
Confusion about type constraints on the expression wildcard. Several participants expressed
hesitation prior to converting the instance argument expression to a wildcard. For example,
when generalizing getTokenizer().nextToken() to *.nextToken(), it was not clear to
them that this pattern will only match calls to nextToken() when the instance argument
is an instance of java.io.StreamTokenizer. In iXj patterns this restriction is implied by
the name scoping rules. nextToken is a method name that refers to a method in java.io-
.StreamTokenizer. Thus, the expression appearing as its instance argument can only be of
the java.io.StreamTokenizer type. This can be seen by expanding the pattern elements
representing nextToken:
This information, however, is not shown in the pattern editor unless the user “drills down”
into the nextToken pattern element. When this point was clarified, the participants seemed
more comfortable with wildcarding.
This problem represents an example of a hidden dependency, although the participants
did not list it as such on the cognitive dimensions questionnaire. We intend to implement
a solution to this problem by automatically generating a type constraint dictated by the
expression context when the wildcard is created. For example:
Because in the source code the java.io.StreamTokenizer part of that name exists solely
for the purpose of qualifying the reference to TT_NUMBER, in the pattern that information
became hidden inside the fully-qualified representation of that name. TT_NUMBER’s associ-
ation with java.io.StreamTokenizer can be seen only when the user “drills down” into
the TT_NUMBER pattern element:
The participant’s ensuing confusion illustrates another hidden dependency in the transfor-
mation pattern language. This is an example where our pursuit of consistency resulted in a
design error that led to a hidden dependency. We will address this problem by rethinking
this part of our design.
Wanting to wildcard an operator. When working on the transformation T1, four partici-
pants wondered if they could convert the token representing the inequality operator into
a wildcard. Their intent was to match both t.ttype == StreamTokenizer.TT_NUMBER and
t.ttype != StreamTokenizer.TT_NUMBER. We do not currently support wildcarding of indi-
vidual keywords and tokens in iXj. It is possible to extend the language to add this feature,
but it is not clear that it would be a worthwhile addition—those participants that wanted
to wildcard an operator subsequently realized that that would not be appropriate for their
transformation.
Confusion about operator precedence. Two of the participants expressed concern when
working on the transformation that introduces the instanceof operator (transformation
T3). They felt insecure about introducing an operator into the replacement string without
considering the expression context in which that string will appear in source code. If other
operators of higher evaluation precedence are present in that expression, the evaluation order
of that expression can change. This is a valid concern. Consider the following example:
if (!s.atEndOfFile()) {
...
}
This code sequence fails to compile because the negation operator (!) has higher precedence
(binds more tightly) than the instanceof operator. As a result, the negation becomes
associated with s.ioException(), which is not a boolean expression.
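A small sketch illustrates the point; the tested type is an assumption made only for illustration (java.util.Scanner.ioException() returns the last IOException, so it is not a boolean expression):

    import java.util.Scanner;

    class PrecedenceSketch {
        void sketch(Scanner s) {
            // Does not compile: "!" binds more tightly than "instanceof", so the negation is
            // applied to s.ioException(), which is not a boolean expression.
            //     if (!s.ioException() instanceof java.io.IOException) { ... }

            // Parenthesizing restores the intended meaning:
            if (!(s.ioException() instanceof java.io.IOException)) {
                // ...
            }
        }
    }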
There are two ways to fix this problem. The first option is to require the developers to
parenthesize the transformation action whenever there is a possibility of causing precedence
errors. This would constitute a hard mental operation. The second option is to enhance the
transformation engine to analyze the context in which a replacement expression is inserted
to ensure that the evaluation order is unchanged. We intend to implement this mechanism
in a future version of iXj.
In practice, the developers will be performing a much wider range of transformation tasks and will be working on their own
code bases. A more thorough evaluation of iXj should include a longitudinal case study of
developers using our tool on their own source code for an extended period of time.
“The notation comprises the perceived marks or symbols that are combined to
build an information structure. The environment contains the operations or
tools for manipulating those marks. The notation is imposed upon a medium,
which may be persistent, like paper, or evanescent, like sound.” [5]
When we applied the CDs framework to iXj, we considered the transformation description
language as the notation, the transformation editor as the environment and the computer
screen as the medium. Yet, we failed to distinguish it from a second notational layer formed
by the interaction language—a series of commands that the user uses to manipulate the
information structure. Just like the transformation notation, the interaction language also
has syntax and semantics embodied in our user-interaction model. This secondary layer
should be analyzed separately, but we did not consider this difference in our evaluation.
Furthermore, our evaluation was hampered by the participants’ limited exposure to
the system. Because the participants only worked on a few hand-selected transformation
tasks, they were only exposed to some aspects of the transformation tool. Given more time
with the tool, it is possible that the participants would have discovered additional hidden
dependencies and hard mental operations.
When our transformation tool is made available to a broader population of software
developers and when these developers use our system more extensively, it would be beneficial
to perform another round of cognitive dimensions evaluation.
we cannot claim that most transformations can be completed in 107 seconds. We can,
however, use this metric to judge the efficacy of interactive transformations in improving
developer productivity.
The keystroke level model (KLM) [14] is a technique for computing the time to perform
a task using keyboard and mouse. KLM is one of the family of related techniques included
in GOMS (Goals, Operators, Methods, Selection Rules) [41]. KLM has been used to model
expert performance in text editing [13] and program entry [72]. A KLM is constructed
using operators that express the time spent on individual actions taken by the user. The
operators of interest to us include clicking a mouse button (B–0.2 sec.), typing a character
(K–0.28 sec.), moving hand to mouse or keyboard (H–0.40 sec.), mental preparation (M–
1.35 sec.), and pointing with a mouse (P–1.10 sec.). Mental preparation is added at the
start of a task and whenever the user enters any user-defined value. These times are taken
from Card, et al. [13] and are based on empirical observations.
Consider the transformation T5: (int) t.nval ⇒ s.nextInt(). We computed the
average time spent by the participants on this transformation to be 102 seconds. Let us
assume that the developer is using compiler errors to guide him to the locations in source
requiring change. (In Eclipse, compilation is incremental and the errors appear in a separate
pane below the editor window as the program is edited.) The sequence of operations to
perform one change can be modeled as follows: (1) mental preparation–M, (2) move hand
to mouse–H, (3) point with mouse at an error–P, (4) click mouse button–B, (5) move hand
to keyboard–H, (6) three keystrokes to delete old text, assuming word-at-a-time deletion–K,
(7) mental preparation–M, (8) eleven keystrokes to type the new text–K. Expressing this
as a formula for predicted time yields T = 2M + 2H + P + B + 14K = 8.72 seconds. This
indicates that after only twelve changes (102 sec. / 8.72 sec.) the time spent on creating
a transformation is completely recovered. When developers change source code manually
there is also significant opportunity for introducing bugs and compilation errors. Fixing
those problems adds to the time developers spend on a change.
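The arithmetic behind this comparison can be reproduced directly; the following sketch simply encodes the operator times quoted above:

    // Sketch reproducing the keystroke-level model estimate from the text.
    public class KlmEstimate {
        public static void main(String[] args) {
            final double B = 0.20;   // press a mouse button
            final double K = 0.28;   // type one character
            final double H = 0.40;   // move a hand to the mouse or keyboard
            final double M = 1.35;   // mental preparation
            final double P = 1.10;   // point with the mouse

            double perManualChange = 2 * M + 2 * H + P + B + 14 * K;   // 8.72 seconds
            double averageT5Time = 102.0;                              // seconds spent on T5 in the study

            System.out.printf("Manual edit, per change: %.2f s%n", perManualChange);
            System.out.printf("Changes needed to recover the effort: %.1f%n",
                    averageT5Time / perManualChange);                  // about 12
        }
    }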
In addition to the immediate productivity benefits, the reduction in code changing effort
and the increased developer confidence enabled by iXj can lessen the developer’s resistance
to making design-improving changes. This can lead to improved developer productivity in
the long term, and, ultimately, to higher quality software.
This makes development of iXj’s transformations simpler, because the developer need not
anticipate future uses of their transformations. In contrast, the refactoring transformations,
for example, must be more general and more broadly applicable.
The transformations expressed in the visual transformation language are more concise
and more readable than solutions discussed in Chapter 2. While direct comparison is
difficult, we can observe that all transformations needed for the sample task fit on a single
printed page. This is in contrast to the TXL transformations for the same task that occupy
four printed pages (see Appendix A).
Clearly there are TXL and refactoring transformations that are not expressible in iXj.
Notably, those transformations that require control- and data-flow information, such as the
Extract Method refactoring (Fowler [31], p. 110), are not supported. One of the challenges in
designing iXj was the need to balance the expressiveness of the language with the ability of
developers to understand and manipulate artifacts in that language. Our design represents
a compromise between these two requirements and, we believe, it successfully addresses
its purpose of simplifying mundane and tedious source code editing operations.
Chapter 7
Conclusion
Making large and sweeping changes to source code can be a tedious and error-prone process,
requiring the developer to perform many systematic and menial source code edits. Our work
investigates the use of source-to-source program transformations in an interactive setting
as a potential solution for automating systematic source code editing. The thesis of our
research is that developers can use formal transformations of program source code effectively
and that the use of these transformations will reduce the effort expended on mundane and
time-consuming code editing tasks. This dissertation proves our thesis.
Program Model for Java Source Code Manipulation. In developing the concept of
interactive program transformations we designed a new program model for Java source code
that is more natural and understandable for human developers than existing tool-centric
approaches. Our program model facilitates visual presentation of the program structure. It
enables manipulation of program source code using entities and relationships that “make
sense” to a typical software developer. The model uses a structural representation
of program source code only where necessary, falling back to a text-based representation
where structure is not needed. This program model is embodied in a visual language for
transformation of program source code.
The language supports structural patterns involving type and scope information, as well as
non-structural text-based patterns, and it enables developers to associate transforming
behavior with any structural element in the pattern.
The design of the transformation language was influenced by the Cognitive Dimensions
(CDs) framework. Using this framework we designed a notation that exhibits high visibility,
low viscosity (resistance to change), and excellent closeness of mapping to the developers’
mental model of source code. We evaluated our transformation language as part of the
usability evaluation of the iXj prototype.
Task-centered Design for Software Development Tools. One of the indirect con-
tributions of this work is the validation of task-centered design as a viable and impor-
tant methodology for building software development tools. Software development tools are
unique in that their designers, developers, and users are often the same people. As a result,
it is unusual for development tool designers to employ any user-centric design process. In our
work, the task-centered design workflow was instrumental in creating a novel notation for
describing transformations and in devising a user-interaction model for manipulating that
notation. We benefited greatly from the iterative evaluation prescribed by task-centered
design. We introduced a new informal evaluation strategy into the methodology of task-
centered design. This strategy, based on the Cognitive Dimensions framework, helped us
refine the design and contributed to the user evaluation of our implementation. We con-
clude, from our experience, that task-centered design is an important technique for creating
new tools for software developers.
• The ability to save a constructed transformation set. One of the participants in the
evaluation inquired whether he would be able to reuse his java.io.StreamTokenizer
to java.util.Scanner transformations on other pieces of code that he might come
across. Doing so requires the ability to preserve transformations in an off-line form.
• The ability to use iXj code patterns for searching source code. This is often a pre-
requisite step for building a transformation. For example, two participants in the
evaluation wanted to find all uses of the java.io.StreamTokenizer class prior to
beginning their work on the transformations. While this type of search is supported
in Eclipse, they wanted to use iXj patterns to specify these searches.
• The ability to retract transformations after they are applied to source code. Having
constructed all of the necessary transformations, several participants in our evaluation
hesitated before hitting the “Apply” button. They inquired whether it is possible to
“undo” the application and go back to editing transformations, in case something did
not behave as expected.
In order to realize these engineering goals, we intend to release the current iXj prototype
to the open-source community. In addition to attracting new developers to our project,
we hope to achieve broader penetration of the ideas underlying iXj and to introduce more
software developers to the concept of interactive transformation of source code.
Control- and data-flow information in the program model. Our structural rep-
resentation of program source code incorporates typing and scoping rules. Some transfor-
mations, such as those needed for refactoring, require information about data and control
dependencies in source code. These dependencies are not currently exposed as part of the
program model. The challenge lies in incorporating this information in a way that makes
it easily understandable.
Support for more programming languages. We designed the iXj transformation lan-
guage specifically to support transformation of Java programs. The underlying concepts
of our design can also be extended to other programming languages. The challenge lies
in devising a program model for those languages that is sufficiently expressive to support
a broad range of transformations and sufficiently simple for presentation to developers.
Bibliography
[1] Greg J. Badros. JavaML: a markup language for Java source code. WWW9/Computer
Networks, 33(1–6):159–177, June 2000.
[2] Ronald M. Baecker and Aaron Marcus. Human Factors and Typography for More
Readable Programs. ACM Press, 1990.
[3] Ira D. Baxter, Christopher Pidgeon, and Michael Mehlich. DMS: Program transforma-
tions for practical scalable software evolution. In International Conference on Software
Engineering, pages 625–634, 2004.
[4] Alan Blackwell. SWYN: A visual representation for regular expressions. In Henry
Lieberman, editor, Your Wish Is My Command. Morgan Kaufmann, 2001.
[5] Alan F. Blackwell and Thomas R. G. Green. Cognitive dimensions of information arte-
facts: a tutorial, 1998. http://www.cl.cam.ac.uk/~afb21/CognitiveDimensions/CDtutorial.pdf.
[7] Grady Booch, James Rumbaugh, and Ivar Jacobson. The Unified Modeling Language
User Guide. Addison-Wesley Professional, 1998.
[10] Marat Boshernitsan and Susan L. Graham. Designing an XML-based exchange format
for Harmonia. In Working Conference on Reverse Engineering, pages 287–289, 2000.
[11] Marat Boshernitsan and Susan L. Graham. iXj: interactive source-to-source transfor-
mations for Java. In OOPSLA ’04: Companion to the 19th Annual ACM SIGPLAN Confer-
ence on Object-Oriented Programming, Systems, Languages, and Applications. ACM Press,
2004.
[12] Scott Burson, Gordon B. Kotik, and Lawrence Z. Markosian. A program transformation
approach to automating software reengineering. In Proceedings of the 14th Annual
International Computer Software and Applications Conference, pages 314–322. IEEE
Computer Society Press, 1990.
[13] Stuart K. Card, Thomas P. Moran, and Allen Newell. The Psychology of Human-
Computer Interaction. Erlbaum, Hillsdale, NJ, 1983.
[14] Stuart K. Card, Thomas P. Moran, and Allen Newell. The keystroke-level model for
user performance time with interactive systems. Commun. ACM, 23(7):396–410, 1980.
[15] Fernando Castor, Kellen Oliveira, Adeline Souza, Gustavo Santos, and Paulo Borba.
JATS: A Java transformation system. Brazilian Symposium on Software Engineering,
2001.
[16] Hock Chan, Keng Siau, and Kwok-Kee Wei. The effect of data model, system and
task characteristics on user query performance: an empirical study. SIGMIS Database,
29(1):31–49, 1997.
[17] Peter Pin-Shan Chen. The entity-relationship model – toward a unified view of data.
ACM Trans. Database Syst., 1(1):9–36, 1976.
[18] Steven Clarke. Evaluating a new programming language. In G. Kadoda, editor, PPIG
13, May 2001.
[19] Don Coleman, Joel Confino, Peter Koletzke, Brian McCallister, Tom Purcell,
and John Shepard. Java IDE shootout. http://developers.sun.com/learning/
javaoneonline/2004/corej2se/BUS-2864.pdf, 2004.
[20] Michael L. Collard, Jonathan I. Maletic, and Andrian Marcus. Supporting document
and data views of source code. In DocEng ’02: Proceedings of the 2002 ACM symposium
on Document engineering, pages 34–41, New York, NY, USA, 2002. ACM Press.
[21] James R. Cordy, Ian H. Carmichael, and Russell Halliday. The TXL Programming
Language: Version 10.4, 2005. http://txl.ca/docs/TXL104ProgLang.pdf.
[22] James R. Cordy, Thomas R. Dean, Andrew J. Malton, and Kevin A. Schneider. Soft-
ware engineering by source transformation-experience with TXL. In Source Code Anal-
ysis and Manipulation, pages 170–180. IEEE Computer Society, 2001.
[23] James R. Cordy, Charles D. Halpern-Hamu, and Eric Promislow. TXL: A rapid proto-
typing system for programming language dialects. Comput. Lang., 16(1):97–107, 1991.
[25] Thomas R. Dean, James R. Cordy, Andrew J. Malton, and Kevin A. Schneider. Agile
parsing in TXL. Autom. Softw. Eng., 10(4):311–336, 2003.
[26] Francoise Detienne. Software Design - Cognitive Aspects. Springer Verlag, 2001.
[27] Andrea A. diSessa and Hal Abelson. Boxer: a reconstructible computational medium.
Commun. ACM, 29(9):859–868, 1986.
[28] Dale Dougherty. sed & awk. O’Reilly & Associates, Inc., 1991.
[30] Rudolf Ferenc, Susan Elliott Sim, Richard C. Holt, Rainer Koschke, and Tibor Gyi-
mothy. Towards a standard schema for C/C++. In Working Conference on Reverse
Engineering, pages 49–58. IEEE Computer Society Press, October 2001.
[31] Martin Fowler. Refactoring: Improving the Design of Existing Code. Object Technology
Series. Addison-Wesley, June 1999.
[32] Thomas Genssler and Volker Kuttruff. Source-to-source transformation in the large.
In JMLC, pages 254–265, 2003.
[34] Thomas R. G. Green. Instructions and descriptions: some cognitive aspects of pro-
gramming and similar activities. In Advanced Visual Interfaces, pages 21–28, 2000.
[35] Thomas R. G. Green and Marian Petre. Usability analysis of visual programming
environments: A ‘cognitive dimensions’ framework. J. Vis. Lang. Comput., 7(2):131–
174, 1996.
[36] William G. Griswold, Darren C. Atkinson, and Collin McCurdy. Fast, flexible syntactic
pattern matching and processing. In A. Cimitile and H. A. Müller, editors, Proceedings:
Fourth Workshop on Program Comprehension. IEEE Computer Society Press, 1996.
[37] William G. Griswold and David Notkin. Automated assistance for program restruc-
turing. ACM Trans. Softw. Eng. Methodol., 2(3):228–269, 1993.
[38] Richard C. Holt, Andreas Winter, and Andy Schürr. GXL: Toward a standard exchange
format. In Working Conference on Reverse Engineering, pages 162–171, 2000.
[41] Bonnie E. John and David Kieras. The GOMS family of user interface analysis tech-
niques: Comparison and contrast. ACM Transactions on Computer-Human Interac-
tion, 3(4):320–351, December 1996.
[42] Brian Johnson and Ben Shneiderman. Tree-maps: a space-filling approach to the
visualization of hierarchical information structures. In VIS ’91: Proceedings of the 2nd
conference on Visualization ’91, pages 284–291, Los Alamitos, CA, USA, 1991. IEEE
Computer Society Press.
[43] Donald E. Knuth. The errors of TeX. Software: Practice and Experience, 19(7):607–
681, July 1989.
[44] Donald E. Knuth and Michael F. Plass. Breaking paragraphs into lines. Software:
Practice and Experience, 11(11):1119–1184, November 1981.
[45] Maria Kutar. A comparison of empirical study and cognitive dimensions analysis in
the evaluation of UML diagrams. In J. Kuljis, L. Baldwin, and R. Scoble, editors,
PPIG 14, June 2002.
[46] David A. Ladd and J. Christopher Ramming. A∗: A language for implementing
language processors. IEEE Transactions on Software Engineering, 21(11):894–901,
November 1995.
[47] Clayton Lewis and John Rieman. Task-Centered User Interface Design. Shareware,
1994. http://hcibib.org/tcuid/.
[49] Martin Lippert. Towards a proper integration of large refactorings in agile software
development. In XP, pages 113–122, 2004.
[50] Robert C. Martin and Robert S. Koss. Engineer notebook: An extreme programming
episode. In Robert C. Martin, editor, Advanced Principles, Patterns and Process of
Software Development. Prentice Hall, 2001.
[52] Robert C. Miller and Brad A. Myers. Interactive simultaneous editing of multiple
text regions. In Proceedings of the General Track: 2002 USENIX Annual Technical
Conference, pages 161–174, Berkeley, CA, USA, 2001. USENIX Association.
[53] Gail C. Murphy and David Notkin. Lightweight lexical source model extraction. ACM
Trans. Softw. Eng. Methodol., 5(3):262–292, 1996.
[55] Emmanuel Pietriga, Jean-Yves Vion-Dury, and Vincent Quint. VXT: A visual ap-
proach to XML transformations. In DocEng ’01: Proceedings of the 2001 ACM Sym-
posium on Document engineering, pages 1–10, New York, NY, USA, 2001. ACM Press.
[56] Peter G. Polson, Clayton Lewis, John Rieman, and Cathleen Wharton. Cognitive
walkthroughs: A method for theory-based evaluation of user interfaces. International
Journal of Man-Machine Studies, 36(5):741–773, 1992.
[58] Don Roberts, John Brant, and Ralph E. Johnson. A refactoring tool for Smalltalk.
Theory and Practice of Object Systems (TAPOS), 3(4):253–263, 1997.
[59] Derek M. Shimozawa and James R. Cordy. TETE: A non-invasive unit testing frame-
work for source transformation. In STEP 2005: 12th International Workshop on Soft-
ware Technology and Engineering Practice, 2005.
[61] E. Burton Swanson. The dimensions of maintenance. In ICSE ’76: Proceedings of the
2nd International Conference on Software Engineering, pages 492–497, Los Alamitos,
CA, USA, 1976. IEEE Computer Society Press.
[62] Lance Tokuda and Don Batory. Evolving object-oriented designs with refactorings.
Automated Software Engg., 8(1):89–120, 2001.
[63] Michael L. Van De Vanter. Practical language-based editing for software engineers.
Lecture Notes in Computer Science, 896, 1995.
[64] Michael L. Van De Vanter. The documentary structure of source code. Information &
Software Technology, 44(13):767–782, 2002.
[65] Michael L. Van De Vanter and Marat Boshernitsan. Displaying and editing source
code in software engineering environments. In Proceedings of Second International
Symposium on Constructing Software Engineering Tools, pages 39–48, Limerick, Ire-
land, 2000.
[66] Eelco Visser. Program transformation with Stratego/XT. In C. Lengauer et al., edi-
tors, Domain-Specific Program Generation, volume 3016 of Lecture Notes in Computer
Science, pages 216–238. Springer-Verlag, June 2004.
[67] Tim A. Wagner. Practical Algorithms for Incremental Software Development Environ-
ments. Ph.D. dissertation, University of California, Berkeley, March 11, 1998. Technical
Report UCB/CSD-97-946.
[68] Tim A. Wagner and Susan L. Graham. Efficient self-versioning documents. In Pro-
ceedings of 42nd IEEE International Computer Conference, San Jose, CA, 1997.
[69] Tim A. Wagner and Susan L. Graham. Incremental analysis of real programming
languages. In Proceedings of the 1997 ACM SIGPLAN Conference on Programming
Language Design and Implementation, pages 31–43, 1997.
[70] Larry Wall, Tom Christiansen, and Jon Orwant. Programming Perl. O’Reilly and
Assoc., 2000.
[71] Richard C. Waters. Program translation via abstraction and reimplementation. IEEE
Transactions on Software Engineering, 14(8):1207–1228, August 1988.
[72] Marian G. Williams and J. Nicholas Buehler. A study of program entry time predictions
for application-specific visual and textual languages. In Papers Presented at the Seventh
Workshop on Empirical Studies of Programmers, pages 209–223. ACM Press, 1997.
[73] Ben Wing. ChangeLog entry for 2002-05-05. The XEmacs ChangeLog, 2002.
http://cvs.xemacs.org/viewcvs.cgi/XEmacs/xemacs-20/src/ChangeLog.
[74] The XEmacs Project. XEmacs: the next generation of Emacs. http://www.xemacs.org/.
[75] Xrefactory. A C/C++ refactoring browser for Emacs and XEmacs. http://www.xref.sk/xrefactory.
Appendix A
This appendix presents a full listing of the TXL program used in the source code manip-
ulation case study in Chapter 2. This program implements the transformations specified
in Figure 2.2. We refer the reader to the TXL programming language manual [21] for
assistance in interpreting this TXL program.
include "Java.Grm"
include "JavaCommentOverrides.Grm"
function main
replace [program]
P [program]
by
P [transformTokenizerToScanner]
[transformNextInt]
[transformNextInt2]
[transformNextDouble]
[transformNextDouble2]
[transformNext]
[transformNext2]
[transformNumberTest]
[transformNumberTest2]
[transformWordTest]
[transformWordTest2]
[removeNextTokenStatement]
[removeNextTokenStatement2]
end function

rule transformTokenizerToScanner
    replace [qualified_name]
        java.io.StreamTokenizer
    by
        java.util.Scanner
end rule

rule transformNextInt
    replace $ [expression]
        (int) E [id] C [repeat component]
    by
        E C [transformNextIntInComponent]
end rule

rule transformNextInt2
    replace $ [expression]
        (int) (E [expression]) C [repeat component]
    by
        (E) C [transformNextIntInComponent]
end rule

rule transformNextIntInComponent
    replace $ [repeat component]
        .nval
    by
        .nextInt()
end rule

rule transformNextDouble
    replace $ [expression]
        E [id] C [repeat component]
    by
        E C [transformNextDoubleInComponent]
end rule

rule transformNextDouble2
    replace $ [expression]
        (E [expression]) C [repeat component]
    by
        (E) C [transformNextDoubleInComponent]
end rule

rule transformNextDoubleInComponent
    replace $ [repeat component]
        .nval
    by
        .nextDouble()
end rule

rule transformNext
    replace $ [expression]
        E [id] C [repeat component]
    by
        E C [transformNextInComponent]
end rule

rule transformNext2
    replace $ [expression]
        (E [expression]) C [repeat component]
    by
        (E) C [transformNextInComponent]
end rule

rule transformNextInComponent
    replace $ [repeat component]
        .sval
    by
        .next()
end rule

rule transformNumberTest
    replace [expression]
        E [id] C [repeat component] == TT_NUMBER
    by
        E C [transformNumberTestInComponentNI] ||
        E C [transformNumberTestInComponentND]
end rule

rule transformNumberTest2
    replace [expression]
        (E [expression]) C [repeat component] == TT_NUMBER
    by
        (E) C [transformNumberTestInComponentNI] ||
        (E) C [transformNumberTestInComponentND]
end rule

rule transformNumberTestInComponentNI
    replace [repeat component]
        .ttype
    by
        .hasNextInt()
end rule

rule transformNumberTestInComponentND
    replace [repeat component]
        .ttype
    by
        .hasNextDouble()
end rule

rule transformWordTest
    replace [expression]
        E [id] C [repeat component] == TT_WORD
    by
        E C [transformWordTestInComponent]
end rule

rule transformWordTest2
    replace [expression]
        (E [expression]) C [repeat component] == TT_WORD
    by
        (E) C [transformWordTestInComponent]
end rule

rule transformWordTestInComponent
    replace [repeat component]
        .ttype
    by
        .hasNext()
end rule

rule removeNextTokenStatement
    replace [statement]
        E [id] C [repeat component] ;
    where
        C [isNextToken]
    by
        ; % none
end rule

rule removeNextTokenStatement2
    replace [statement]
        (E [expression]) C [repeat component] ;
    where
        C [isNextToken]
    by
        ; % none
end rule

function isNextToken
    match [repeat component]
        .nextToken()
end function
Appendix B
This appendix presents a partial listing of the source code for the MineSweeper game that
we used for user evaluation. We only list those parts of the source code that were affected by
the transformations in Figure 6.2. The source code for the MineSweeper game was originally
obtained from http://www.dcs.qmul.ac.uk/~mmh/ItP/resources/MineSweeper/Notes.html
and modified to present more opportunities for transformation.
        ...
        } else if (getTokenizer().sval.equals("mark")) {
            ...
        } else if (getTokenizer().sval.equals("unmark")) {
            ...
        } else if (getTokenizer().sval.equals("help")) {
            ...
        } else if (getTokenizer().sval.equals("quit")) {
            ...
        } else {
            System.out.println("Unknown command -- try 'help'");
        }
    }

    if (st.ttype != StreamTokenizer.TT_NUMBER)
        throw new IllegalArgumentException();
    int x = (int) st.nval;
    st.nextToken();
    if (st.ttype != StreamTokenizer.TT_NUMBER)
        throw new IllegalArgumentException();
    int y = (int) st.nval;
    ...
}
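As a point of reference, the following minimal sketch, which is illustrative only and not
part of the MineSweeper sources, shows roughly how the second fragment above might read
after the java.io.StreamTokenizer to java.util.Scanner migration; the enclosing class and
method names are hypothetical. The explicit nextToken() call disappears because Scanner's
nextInt() consumes the token that it reads.

import java.util.Scanner;

// Illustrative approximation of the coordinate-reading fragment after migration to Scanner.
class ScannerMigrationSketch {
    static int[] readCoordinates(Scanner st) {
        if (!st.hasNextInt())                  // replaces st.ttype != StreamTokenizer.TT_NUMBER
            throw new IllegalArgumentException();
        int x = st.nextInt();                  // replaces (int) st.nval
        if (!st.hasNextInt())
            throw new IllegalArgumentException();
        int y = st.nextInt();
        return new int[] { x, y };
    }
}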
Appendix C
Cognitive Dimensions
Questionnaire for User Evaluation
This appendix presents a questionnaire that we constructed for evaluating iXj using the
Cognitive Dimensions framework (see Chapters 2 and 6). The language for the questionnaire
was adapted from the Cognitive Dimensions questionnaire optimized for users [6]. We
augmented the Blackwell and Green questionnaire with a seven-point semantic differential
scale in order to produce more easily quantifiable measures.
3. Does the transformation notation let you describe what you want
reasonably briefly or is it long-winded?
c. Are there any parts that you don’t really understand, but you put
them in because they just seem to be required? What are they?
a. Can you do this at any time you like? If not, why not?
b. Can you find out how much progress you have made and check at what
stage you are in your work? If not, what prevents you from doing so?
7. How often do you find yourself making small slips that make the
transformation process frustrating?
b. What features that would help you experiment are missing from
the transformation tool?
b. What are the places where some things ought to be similar, but the
notation makes them different?
10. What kinds of things require the most mental effort when
constructing a transformation description?
11. Are there any parts in the transformation description that, when
changed, require you to make other related changes to other parts of
the description? What are they?