
This paper investigates the impact of global variables on program dependence and the formation of dependence clusters through a transformation-based analysis algorithm. The study, which analyzes 21 programs totaling over 50K lines of code, finds that many programs contain global variables that significantly affect overall dependence and can lead to dependence clusters. The results highlight the complexities of program dependence and suggest that understanding these effects is crucial for software engineering practices.


The Journal of Systems and Software 83 (2010) 96–107

Contents lists available at ScienceDirect

The Journal of Systems and Software


journal homepage: www.elsevier.com/locate/jss

Assessing the impact of global variables on program dependence and dependence clusters
David Binkley *,1, Mark Harman, Youssef Hassoun, Syed Islam, Zheng Li
King’s College London, Centre for Research on Evolution, Search and Testing (CREST) Strand, London WC2R 2LS, UK

Article info

Article history:
Received 27 October 2008
Received in revised form 24 February 2009
Accepted 19 March 2009
Available online 1 April 2009

Keywords:
Dependence cluster
Program slice
Global variable

Abstract

This paper presents results of a study of the effect of global variables on the quantity of dependence in general and on the presence of dependence clusters in particular. The paper introduces a simple transformation-based analysis algorithm for measuring the impact of globals on dependence. It reports on the application of this approach to the detailed assessment of dependence in an empirical study of 21 programs consisting of just over 50K lines of code. The technique is used to identify global variables that have a significant impact upon program dependence and to identify and characterize the ways in which global variable dependence may lead to dependence clusters. In the study, over half of the programs include such a global variable and a quarter have one that is solely responsible for a dependence cluster.

© 2009 Elsevier Inc. All rights reserved.

1. Introduction

Global variables are generally deprecated in advice to programmers, with many authors arguing that they have negative effects (Marshall and Webber, 2000; Sward and Chamillard, 2004; Wulf and Shaw, 1973). The use of global variables has been argued to have harmful effects on many aspects of software engineering, including maintainability (Yu et al., 2004) and correctness (Barnes, 2003). Some programming practitioners have gone so far as to suggest, perhaps only semi-seriously, that programmers might be fired for using global variables (Tennberg, 2002). Of course, the introduction of global variables can produce potential efficiency gains (Sestoft, 1989), but such global-introduction transformations are performed as a meaning-preserving compiler optimization, not as an approach to source-level code improvement.

Though the typical view in programming language texts (for example Stroustrup’s C++ book (Stroustrup, 2000)) is that global variables may often be a source of problems, this view is not universally held. Some authors have even suggested that global variables should be used in place of local variables (Fisher, 1983). However, despite much debate and advice on the use of global variables over several decades, there remains little empirical study of the effects of global variables.

Program dependence, which captures the influence of one program component on another, is important because it has a bearing on so many aspects of software engineering. For example, program dependence has been linked to the ease with which a program can be understood, in work on program comprehension (Balmas, 2002; Deng et al., 2001). The effect of program dependence is also felt in software maintenance and re-engineering, where it delimits the changes that may be performed (Gallagher and Lyle, 1991; Tonella, 2003) and captures the impact that such changes will have (Black, 2001).

Dependence analysis forms the cornerstone for many software engineering activities that rely upon program analysis, such as program comprehension (Balmas, 2002; Deng et al., 2001), impact analysis and reduction (Black, 2001; Tonella, 2003), reuse (Cimitile et al., 1996), software maintenance (Gallagher and Lyle, 1991), testing and debugging (Binkley, 1997; Harman et al., 2004; Podgurski and Clarke, 1990), virus detection (Lakhotia and Singh, 2003), and restructuring and reverse engineering (Beck and Eichmann, 1993; Lakhotia and Deprez, 1998).

This paper presents a technique used to study the effects of global variables on dependence as well as results from empirical studies of these effects. The impact of even a single global can be very far-reaching; in some of the programs studied, a single global was found to account for most of the program’s overall dependence connectivity.

The paper is also concerned with the effect that a global variable has on the presence or absence of large dependence clusters. A dependence cluster is a set of program statements, all of which are dependent upon one another. Recent work (Binkley and Harman, 2005b) has shown that dependence clusters are surprisingly prevalent. Therefore, one of the additional goals of the study reported herein is to explore the ways in which global variables can lead to dependence clusters.

* Corresponding author. E-mail address: [email protected] (D. Binkley).
1 On sabbatical leave from Loyola College in Maryland.

0164-1212/$ - see front matter © 2009 Elsevier Inc. All rights reserved.
doi:10.1016/j.jss.2009.03.038

As this paper will show, global variables can be the cause of dependence clusters. The ability to identify the causes of dependence clusters has implications for work related to these analyses. For example, in program comprehension, several authors have considered hierarchical decomposition as a navigation mechanism that manages the cognitive complexity of program dependence (Balmas, 2002; Deng et al., 2001). However, the presence of a dependence cluster will lead to a degenerate collapsed hierarchy, in which such decomposition will be difficult. The ability to identify causes of clusters may allow such navigation tools to either avoid them or to treat them as special cases. Furthermore, the ability to break some clusters will improve the applicability of these approaches.

The paper makes the following primary contributions:

(1) It introduces an algorithm for variable substitution that allows the dependence due to a particular global variable to be ignored. This is used to assess and measure the effects of the global variable on dependence.
(2) The paper presents quantitative results that assess the effect on dependence of 849 global variables in 21 programs. The study reveals that more than half the programs considered have individual global variables that have a significant impact on overall program dependence.
(3) The paper presents further qualitative results of the effect these high dependence globals have on large dependence clusters. In some cases, a single global was found to be the sole cause of a cluster, establishing evidence for a link between the use of globals and the presence of large dependence clusters. The study also investigates the categories into which these effects fall and the source code constructs that cause them.
(4) Finally, the paper presents a case study in which sets of global variables collectively combine to cause dependence clusters.

The rest of the paper is organised as follows: Section 2 presents background material on dependence clusters. Section 3 introduces the algorithm for eliminating dependence due to a global variable and uses this to measure the effect each global has on overall program dependence. Sections 4 and 5 present the five research questions addressed by the experiments and the experimental design. Sections 6 and 7 present the results of the quantitative and qualitative studies of global variable dependence effects, while Section 8 presents the case study of multi-global variable dependence effects. Sections 9 and 10 consider threats to validity and related work, and finally, Section 11 summarises the paper.

2. Dependence clusters

A dependence cluster is a maximal set of program points all of which are mutually inter-dependent. Being maximal, a dependence cluster is not contained within any other dependence cluster. One complexity in identifying dependence clusters is the non-transitive nature of dependence for certain language constructs such as threads (Krinke, 1998) and procedures (Horwitz et al., 1990). This means that dependence clusters are not simply strongly connected components within a dependence graph. Fortunately, context-sensitive interprocedural dependence analysis captures the necessary calling-context information (Horwitz et al., 1990).

To visualize dependence clusters the Monotone Slice-size Graph (MSG) was introduced (Binkley and Harman, 2005b). A slice is a sub-program that captures a semantically meaningful sub-computation from a program. Slices can be efficiently computed using the System Dependence Graph (SDG) (Horwitz et al., 1990). An MSG is a graph of all a program’s slice sizes plotted in monotonically increasing order on the x-axis. The y-axis measures slice size. For ease of comparison the MSGs shown in this paper use the percentage of the entire program on the y-axis. Thus, an MSG plots a landscape of monotonically increasing slice sizes, in which dependence clusters correspond to sheer-drop cliff faces followed by a long flat plateau (e.g., see the black line in Fig. 1).

To illustrate cluster identification using an MSG, Fig. 1 shows four example MSGs. As explained below, each graph includes two MSGs: an original MSG in black and a post reduction MSG in grey. For example, the MSG in black in the left chart shows (reading along the horizontal axis) that approximately the first 20% of slices are very small (containing about 10% of the program), after which the MSG reveals a sharp increase to almost 80% of the program. Most of the remaining slices are all of essentially the same size. This is visual evidence of a dependence cluster in the program. The MSG for a program devoid of clusters is shown in grey under this line.

Fig. 1. Two kinds of reduced dependence. In the left chart, the dependence cluster is broken. In the right chart, the cluster size is reduced, but the cluster remains unbroken.

3. Assessing the impact of global variables on dependence

The impact of global variable g is measured in terms of the area under the MSG. That is, the difference in the area for the MSG constructed from the program with and without the dependence due to g will be deemed to denote the impact of g on overall program dependence.

Of course, reducing the quantity of overall program dependence may not mean the breaking of a dependence cluster, for which MSG area reduction is a necessary, but not sufficient, condition. This is illustrated in Fig. 1 where the two charts show the two kinds of
reduction. Both original (black) MSGs include a considerable cluster. The (lower) grey MSG in the left chart shows the result of ignoring the dependence associated with the global variable encodings; this clearly breaks the cluster. In the chart on the right the (lower) grey MSG shows only a reduction in the size of the slices making up the cluster. In this example the cluster itself is still present.

In order to measure the impact of ignoring the dependences associated with a given global variable, the approach adopted in this paper transforms the program to remove all such dependence. However, although this may be simple in principle, it turns out that the program analysis task of ‘ignoring dependence’ is not entirely trivial in practice.

The goal is to ignore all dependence that can be caused by definitions and references of the global variable. To achieve this, it is insufficient simply to mark as untraversable the dependences due to a given global variable. The reason for this is that such an edge-ignoring approach will not take into account the secondary effects that a global variable has, for example, upon computations that support the identification of dependence, such as dependencies due to pointers and the summary edge computation used by CodeSurfer (Grammatech Inc., 2002) to build an SDG.

To illustrate, consider ignoring the dependence caused by global variable g2 in the code fragment and partial SDG shown in Fig. 2. Because p may point to g1 or g2, the assignment *p = 23 does not definitely kill the assignment g1 = 42. However, in the absence of g2, *p = 23 kills the assignment g1 = 42 because p only points to g1. This obviates the need for the data dependence edge from g1 = 42 to local = g1.

Fig. 2. Illustration where marking edges is insufficient.

As this example shows, simple edge marking is insufficient. To overcome this, the paper introduces a simple transformation to syntactically delete the global variable from the AST. Of course, such a transformation is not meaning preserving. It is not intended to be. It merely serves the purpose of ignoring all dependence due to the global variable under consideration. The resulting SDG is used to analyse the impact of the global by comparison to the original SDG (with all dependence due to the global included). In this way, the global variable’s impact upon dependence can be fully assessed.

To study the impact of ignoring the dependences due to a global variable, it is pragmatic to observe two principles:

(1) The size of the program should not be changed more than necessary.
(2) The effects of dependences not due to the global should remain unaffected.

Taking these two concerns into account, the transformation rules replace each occurrence of a global in such a way that dependence structure is otherwise unchanged while having minimum impact on program size. For example, consider the assignment a = g + b where only the dependencies associated with global variable g are to be ignored. A minimum impact replacement, assuming that g is an int variable, would be to replace g with a constant of the appropriate type (e.g., 0). While this would clearly change the meaning of the program (unless g happens to always have the value 0), the revised statement a = 0 + b preserves the dependence of a on b and also the size of the program (in this case right down to the character level); its sole effect is to remove the dependence of the statement on global variable g.

In the general case, any type-appropriate constant will suffice. In the formalization of the replacement, the name ‘TAC’ is used to denote this type-appropriate constant. For example, the constant ((int *)0) is used for a global variable of type int *.

To further illustrate the transformation, consider the replacement of the expression ‘‘g++”. This is a two step process. The first step replaces g++ with g and the second replaces g with TAC. The resulting expression removes dependencies associated with g, but does not change the number of SDG vertices used to represent the code; thus, in terms of the analysis presented in the next section, this change does not impact the program’s size.

A careful analysis of the grammar for C allows program size reduction to be kept to a minimum. For example, C treats the array expression a[i] internally as the pointer expression *(a + i), which is equivalent to *(i + a) and hence i[a]. This allows occurrences of a global array g to be replaced with a constant and thus removes the dependence associated with the global array. For example, global_array[i++] is transformed into TAC[i++] where the resulting program omits dependencies associated with global_array, but maintains the dependencies associated with i.

The formal rules for when a type-appropriate constant is suitable are given by the function DL (which identifies expressions that denote a named location). When this function returns bottom (⊥) then the use of TAC is permitted. Otherwise, the expression involving the global must be deleted from the program. In the transformation rules, this latter case leads to the global being rewritten to the empty string, λ. In most cases, such a replacement does not impact the program’s (SDG’s) size. Its effect is primarily on the source represented by a given SDG vertex.

The remainder of the section formalizes the global variable dependence-elimination transformation. The algorithm, which operates on C programs, is presented as a set of transformation rules. Modification for other similar languages (e.g., C++) is straightforward.

Transformation rules are written as a stack of three parts (the constraint set, then the input source, then the output source) and make use of the following notation:

• TAC denotes a type-appropriate constant;
• λ denotes the empty string;
• [x|y] denotes the selection of x or y;
• var(declarator) denotes the variable declared. For example, var(a[]) and var(*a) are both a;
• g denotes the global targeted by the transformation.

The rules transform declarations and expressions that involve g.

In general, C language declarations involve storage class and type specifiers followed by a list of declarators. The transformation, shown in the upper left of Fig. 3, removes declarators that resolve to g (e.g., g, g[10], or **g). While syntactically valid, if this leaves an otherwise empty declaration, the tool removes the entire declaration.

For most expression occurrences of g it is sufficient to replace g with a TAC using the rule at the upper right of Fig. 3. However, there are four special cases involving lvalues that require special treatment because an lvalue denotes a modifiable memory location (Ritchie, 1975). Lvalues are defined by the following production from the C grammar (Ritchie, 1975):

lvalue → identifier
       → * expression
       → primary ‘[’ expression ‘]’
       → primary -> identifier
       → lvalue . identifier
       → ( lvalue )

Only lvalues that denote (part of) the location represented by g require special treatment. Those that simply involve g in denoting some other location do not. For example, no special treatment is needed for g[i], which is replaced by TAC[i] to correctly retain the use of i. As noted above, this is legal in C and thus a convenient method for achieving the minimum impact removal of the dependence induced by the global. In the four special cases, the function DL identifies those lvalues that denote a location directly associated with an identifier:

fun DL identifier = identifier
  | DL * expression = ⊥
  | DL primary ‘[’ expression ‘]’ = ⊥
  | DL primary -> identifier = ⊥
  | DL lvalue . identifier = DL(lvalue)
  | DL ( lvalue ) = DL(lvalue)

Thus, DL(a[i++]) = ⊥ because there is no identifier associated with the incremented memory location, while DL(p.x) = p because part of the location referred to by p is modified.

The four special cases are shown as the bottom four rules in Fig. 3; they occur when (some part of) g’s address is taken, when it is modified by an assignment operator, and when an increment or decrement operator is involved. In the first case, NULL is used rather than TAC. For the second, the assignment portion of the assignment expression is removed. That is, ‘‘g += i++” is transformed into ‘‘i++”. For the final two cases, the increment or decrement operator is simply removed.

It is interesting to note that, for occurrences of g such as *g = 2, the general rule is sufficient because g itself is not being modified. In practice this leaves an anonymous location being updated (through TAC). Such locations do not cause dependencies in the SDG, and thus the transformation retains only the effect of the right-hand side of the assignment expression. There are other similar cases when the transformation removes parts of statements that have no effect on the program’s dependencies. For example, the transformation of ‘‘g++” leaves ‘‘TAC”. Further examples are shown in Fig. 4.

Fig. 3. Transformations for removing variable g.

Fig. 4. Transformation examples showing the replacement of g with TAC.

4. Research questions

The empirical analysis addresses three sets of research questions, concerning the qualitative and quantitative effects of single globals and the combined effects of sets of globals (multiple globals). The study reports results on the quantitative effects that global variables have upon program dependence in general and the specific qualitative effect that high dependence globals have upon the program’s large dependence clusters.

The majority of the results concern the effects of single globals because these potentially produce the most valuable information to the software engineer. That is, if it is possible to identify a single global variable that can be deemed to be responsible for a dependence cluster, then the engineer has a chance to consider the meaning of this variable and ways in which it might be possible to account for, ameliorate or otherwise reduce the impact of the dependence cluster. Clearly, where clusters have multiple causes, such amelioration may also be possible, but it is likely that dealing with single causes will be preferable.

Research Question RQ1.1:
Overall quantitative effect of single global
Over all programs, what proportion of dependence is due to global variables and how many globals have a significant impact (as defined in Section 5.2) on total program dependence?

Research Question RQ1.2:
Quantitative effect of single global on each program
For each program considered separately, how many globals have a significant impact (again using the statistical tests described in Section 5.2) on dependence and what is the magnitude of their effect? (This question makes more sense knowing the results for RQ1.1.)

Research Question RQ2.1:
Qualitative assessment of effect of single global on dependence clusters
What effects do global variables have upon dependence clusters?

Research Question RQ2.2:
Qualitative assessment of the causes in source code
What source code patterns are found that correspond to the effects of global variables on dependence clusters?

The four research questions above concern the effects of single global variables. Research Question RQ3 concerns the effects of multiple globals on dependence clusters.

Research Question RQ3:
Qualitative and quantitative effect of multiple globals on each program
What effects can be found where multiple global variables participate collectively in causing dependence clusters?
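Returning briefly to the function DL of Section 3, its six cases can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tuple encoding of lvalues and the helper name needs_special_treatment are hypothetical and are not taken from the tool used in the study; None stands in for bottom.

```python
# Hypothetical lvalue encoding, assumed purely for illustration:
#   ("id", name)              identifier
#   ("deref", expr)           * expression
#   ("index", primary, expr)  primary [ expression ]
#   ("arrow", primary, field) primary -> identifier
#   ("dot", lvalue, field)    lvalue . identifier
#   ("paren", lvalue)         ( lvalue )

def dl(lvalue):
    """Return the identifier whose location the lvalue denotes, or None for bottom."""
    kind = lvalue[0]
    if kind == "id":
        return lvalue[1]
    if kind in ("deref", "index", "arrow"):
        # The location is reached through a pointer or an index,
        # so no identifier is directly associated with it.
        return None
    if kind in ("dot", "paren"):
        return dl(lvalue[1])
    raise ValueError(f"not an lvalue: {lvalue!r}")

def needs_special_treatment(lvalue, g):
    """True when the lvalue denotes (part of) the location of global g."""
    return dl(lvalue) == g

# The two examples given in the text: DL(a[i++]) is bottom, DL(p.x) is p.
a_i = ("index", ("id", "a"), ("postinc", ("id", "i")))
p_x = ("dot", ("id", "p"), "x")
print(dl(a_i), dl(p_x))
```

Under this sketch g[i] yields bottom, so the general TAC replacement applies, whereas g.x or (g) yields g and falls under the special cases.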

5. Experimental design

This section briefly describes the programs studied and the tests used to assess whether a global variable has a significant impact upon dependence.

5.1. Programs studied and their MSGs

The 21 programs used as subjects in the study are described in Fig. 5. They were taken from online repositories such as directory.fsf.org and planet-source-code.com, except for space, which is code created for the European Space Agency. For each program, the figure includes the name of the program, the number of slices taken when analyzing it, its size in lines of code as counted by the unix utility wc and the number of non-blank non-comment lines of code as counted by sloc (Wheeler, 2005), and finally a brief description. These programs cover a variety of application domains such as programming utilities, terminal software, booking systems, simulations, games, interpreters, and code transformation tools.

Fig. 5. The 21 programs studied.

Fig. 6. Histogram showing the counts (on the y-axis) of globals leading to different ranges of dependence reduction. As the left-hand histogram shows, most global variables have little effect; they fall into the 0–10% range. The right-hand histogram zooms in on the remaining data which shows reductions from 10% to 100%.

5.2. Statistical tests applied

The statistical tests used in this paper follow the well-established approach to measuring the significance of effects adopted in work on control limits theory (Reid and Sanders, 2007). Two statistical tests are applied to determine those global variables that have significant impact on overall program dependence. The first is based on the notion of outliers, while the second is based on variance. A global variable is taken to have a significant impact upon program dependence if it is determined to be significant by both tests. The reason for considering both tests is to reinforce each individual test. Furthermore, tests based upon outliers are more robust against non-normality in the data distribution. Finally, note that most common statistical tests, such as a t-test, are not applicable in this case. Many such tests assume that the data come from a normal population and compare population means but are not designed to test if a given value is expected to lie outside a population.

Using the outlier approach (Armitage and Berry, 1994), a point is significant if it lies three times the interquartile range below (or above) the mean. Thus, using this approach, a global variable has a significant impact upon program dependence if it causes a reduction in the area under the MSG that is more than three times the interquartile range below the mean. In the variance approach (Armitage and Berry, 1994), a point more than two standard deviations below (or in general above) the mean is significantly different, while a point three or more standard deviations below the mean is taken as highly significantly different. Thus, using this approach, a global variable has a significant impact upon program dependence if it causes a reduction in the area under the MSG that is two standard deviations below the mean. Furthermore, it has a highly significant impact if it causes a reduction in the area under the MSG that is three standard deviations below the mean.

In the data analysis the interquartile range and the standard deviation are computed using two different samples. The first includes all the data from all the programs collectively; a global variable is considered to be significant iff it causes a significant impact on the quantity of dependence, relative to the mean of all 849 variables considered in the whole study. The second approach considers each program in isolation; thus, a global variable is considered to be significant iff it causes a significant effect on dependence, relative to the mean of all global variables in that program.

It turns out that, for all programs, the variance approach is a stricter test for significance than the outlier approach: if a result is significant according to the variance approach, then it is also significant with respect to the outlier approach (though not necessarily vice versa). In the experiment, while both tests were applied and the resulting sets of global variables intersected, the result was
always the same as if only the more strict variance approach was used.

6. Impact on dependence levels

The quantitative study is concerned with addressing Research Questions RQ1.1 and RQ1.2. It provides results of the assessment of the impact of each global variable (of 849 in total) on the dependence in the programs in which these globals reside.

6.1. RQ1.1: overall effects of single global causes

For each of the programs, the MSG was first computed using the unmodified program. Then, it was re-computed 849 times. Each re-computation ignored the dependence due to one of the global variables. Dependence effects were then assessed by comparing the area under the MSGs.

Over all globals in the study, the average area remaining after a global variable’s dependences are ignored was 97.4% with a standard deviation of 8.6%. Fig. 6 shows a histogram of results for the reduction due to globals. Clearly, most globals cause little reduction: 800 of the 849 global variables cause no more than a ten percent reduction. However, more importantly, as can be seen in the right-hand detailed histogram, there do exist some global variables that have a considerable impact on the quantity of dependence in the programs in which they reside.

Fig. 7 shows the 30 of the 849 global variables that have a significant impact, with those having a highly significant impact shown in black. The y-axis shows the remaining dependence for each global variable shown on the x-axis. This represents 3.5% of all the global variables. Therefore, the answer to RQ1.1 is that there are a few individual global variables that have a significant effect on the quantity of program dependence, but most globals have little effect.

6.2. RQ1.2: per program effects of single global causes

The answer to Research Question RQ1.1 suggests that there are only a few global variables that have a significant effect on dependence. A natural question to ask is whether these are concentrated in a few programs that are in some way ‘special cases’, or whether there are many programs that contain a global that causes a significant effect on dependence. This is the question raised by Research Question RQ1.2.

Over all 21 programs, twelve programs include at least one of the variables identified in the previous section as causing an overall significant dependence reduction. Fig. 8 shows the reductions for the top four variables from each of these twelve programs (the breakdown of the reduction kinds is discussed in the next section). It is noticeable from this figure that the second and subsequent global variables have considerably lower impact than that of the first global variable. These results provide an intriguing answer to Research Question RQ1.2: though there may be few significant global variables overall, more than half the programs studied have at least one of them.

Section 5.2 describes two statistical tests that are applied to determine if a global variable causes a significant difference. These

Fig. 7. Impact of ignoring dependencies for each global variable. The chart measures the remaining dependence on the y-axis for the 3.5% of the global variables that have a
significant impact. Highlighted in black are those that have a highly significant impact.

Fig. 8. Impact of ignoring dependencies for each global variable. The chart shows the four reduction causing global variables for programs where at least one significant global
variable exists.
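The two significance tests of Section 5.2 can be sketched as follows. This is a minimal illustration rather than the authors' analysis scripts: the function name classify_globals, the use of the population standard deviation, and the default quartile method of Python's statistics module are all assumptions, since the text does not specify them, and the input data is made up.

```python
import statistics

def classify_globals(remaining_area):
    """Apply the outlier and variance tests to per-global MSG areas.

    remaining_area maps a global's name to the area under the MSG (as a
    percentage of the original area) that remains once its dependence is
    ignored. Returns the globals flagged by both tests and their level.
    """
    values = list(remaining_area.values())
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)                 # assumption: population std dev
    q1, _, q3 = statistics.quantiles(values, n=4)  # assumption: default quartile method
    iqr = q3 - q1

    outlier_cutoff = mean - 3 * iqr    # outlier test
    significant_cutoff = mean - 2 * sd # variance test, significant
    highly_cutoff = mean - 3 * sd      # variance test, highly significant

    result = {}
    for name, area in remaining_area.items():
        # A global counts only if both tests flag it.
        if area < outlier_cutoff and area < significant_cutoff:
            level = "highly significant" if area < highly_cutoff else "significant"
            result[name] = level
    return result

# Made-up data: twenty globals with almost no effect and one with a large effect.
areas = {f"v{i}": 99.0 + 0.5 * (i % 3) for i in range(20)}
areas["encodings"] = 20.0
print(classify_globals(areas))
```

Passing the values for all 849 globals corresponds to the first sample described in Section 5.2, while passing only the globals of one program corresponds to the per-program sample.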

Fig. 9. For each program, the mean reduction leading to the number of globals causing a significant reduction (penultimate column). Each line of the table includes the mean area when ignoring the dependence of each global, the standard deviation of the mean, the cutoff for a significant impact, and the two counts. The final column is the number of significant globals using the mean over the entire collection of programs, which is shown in the first line. The program replace has no globals and the program space has only one.

specific impact of global variables on the dependence clusters of the programs studied.

7.1. RQ2.1: overall categorization of the effects of single global causes

From Section 2, a reduction in area under the MSG is a necessary, but not sufficient, condition for the breaking of a dependence cluster. Research Question RQ2.1 asks if the reduction in area under the MSG is accompanied by a corresponding breaking of clusters. This is a more subjective determination.

To address this question the 849 MSGs were examined. Four patterns emerge, which will be denoted: break, partial break, sub-cluster break, and drop. Representative examples of the four are shown in Fig. 11 where each graph shows two MSGs: the original MSG in black and the MSG after ignoring the dependence of the given global variable in grey. For example, the MSGs for barcode shown at the far left illustrate the break case where a large dependence cluster disappears. Next to this is a partial break. Here the MSG (when ignoring buffr of pc2c) shows a clear but partial breaking of the dependence clusters of pc2c. In this case approximately three fifths of the cluster disappears. Next to this is a sub-cluster break where the one large cluster in the MSG for ctags fractures into a collection of smaller clusters (evidenced by the stair step pattern in the MSG). The final MSG illustrates the drop case where no cluster breaking occurs; the size of the slices making up the cluster is simply reduced, though the cluster remains. Thus, the characteristic cliff-face and plateau are still present but with reduced height.
As shown in Section 6 using the data over all programs, 29 glo-
bal variables are significant, while using the per program data, 26
global variables are identified. Those global variables identified by
the per-program data that are not by the all-program data are all
drops of small magnitude.
Fig. 12 summarizes the categorization of the effect of each sig-
nificant global. Numerically, the 16 global variables significant in
both approaches produced three breaks, six partial breaks, two
sub-cluster breaks, and five drops. Thus, in total, ignoring the
dependence associated with just over half of the significant global
variables and 1.3% of all globals (11 of the 849), led to the breaking
of clusters. Fig. 8 includes the categorization for each program
using the data over all programs.
Over half the global variables that are considered significant
Fig. 10. Number of programs with various numbers of global variables having a (either per program or overall) play an important role in the forma-
significant impact. tion of a dependence cluster. Furthermore, as can be seen from
Fig. 8, all of the programs that contain a global variable that has
a significant effect on dependence also contain a global variable
with a non-trivial role to play in the formation of large dependence
tests can be applied to the entire pool of global variables (as done clusters.
above) or to the globals of each programs independently. This sec-
ond version asks not whether a variable, g is significant among all
variables studied, but whether g is significant among only those 7.2. RQ2.2: source code features that appear to cause clusters
globals that reside in the same program as g. The results serve to
strengthen confidence in the answer to RQ1.2. To answer Research Question RQ2.2 the source code of each
Fig. 9 shows a comparison of the two approaches to assessing program with one or more significant global variable was in-
significance. Fig. 10 shows the visual relationship between the spected in an attempt to identify source code patterns that cause
two approaches to testing for significance. As can be seen from the use of globals to lead to the presence of dependence clusters.
Fig. 9, apart from one case (the program time), all those programs Four patterns emerge. These will be denoted: central data structure,
that contain global variables that are significant among all globals, top-up, lazy programmer, and library. Each is now defined and illus-
also contain globals that are significant in the program in which trated using case study examples from the code studied.
they reside. This lends additional evidence to support the belief Examples of a central data structure include the board in a game,
that many programs may contain globals with a significant effect the memory registers used by interpreter, the current instruction pro-
on dependence. cessed by the assembler fass, and current state of the parser within
the pretty printer indent. In most cases, ignoring the dependence
7. Impact on dependence clusters associated with such a global variable simply causes a drop, but
not a breaking of dependence clusters. This is primarily because
The previous section considered the quantitative impact of glo- dependencies involving other variables continue to tie the dispa-
bal variables on dependence in general. This section focuses on the rate parts of the cluster together.
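The central data structure pattern can be illustrated with a minimal C sketch. The fragment is not taken from any of the programs studied; the names board, moves_made, place_piece, count_pieces, and board_is_full are hypothetical. Every routine reads or writes the global board, so the board contributes dependence to each of their slices. However, place_piece and board_is_full are also linked through a second global, moves_made, so ignoring the dependences of board alone would be expected to shrink the slices (a drop) rather than separate the routines.

```c
#include <assert.h>
#include <stddef.h>

#define BOARD_SIZE 9

/* Hypothetical central data structure: a game board shared by the routines below. */
static int board[BOARD_SIZE];

/* A second global that also connects the routines (assumption for illustration). */
static int moves_made;

/* Clears the board and the move counter. */
void reset_board(void) {
    for (size_t i = 0; i < BOARD_SIZE; i++) {
        board[i] = 0;
    }
    moves_made = 0;
}

/* Writes the board. Returns 1 on success, 0 if the cell is out of range or taken. */
int place_piece(size_t cell, int player) {
    assert(player == 1 || player == 2);
    if (cell >= BOARD_SIZE || board[cell] != 0) {
        return 0;
    }
    board[cell] = player;
    moves_made++;
    return 1;
}

/* Reads the board: its result depends on every earlier write made by place_piece. */
int count_pieces(int player) {
    int n = 0;
    for (size_t i = 0; i < BOARD_SIZE; i++) {
        if (board[i] == player) {
            n++;
        }
    }
    return n;
}

/* Never touches board, yet still depends on place_piece through moves_made. */
int board_is_full(void) {
    return moves_made == BOARD_SIZE;
}
```

For instance, after reset_board(), a call place_piece(0, 1) succeeds while a second call place_piece(0, 2) on the same cell is rejected; both outcomes flow through board, whereas board_is_full() is connected to them only through moves_made.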

Fig. 11. Examples from the classification of MSGs.

Fig. 12. Effect of global variables with a significant impact on dependence clusters.

However, in some cases, the central data structure is all that binds the cluster; ignoring its dependencies breaks it. For example, in fass, the current instruction is held in a central data structure. Ignoring this global variable’s dependencies disconnects the code for processing each kind of instruction.
The second pattern, top up, is caused by a variable that adds an often small increment to a large collection of slices. The most common cause of this pattern is an input buffer where the reading of the input is part of most slices and gets ‘cut off’ from each of these slices when ignoring the input buffer’s dependencies.
In four of the twelve occurrences of this pattern, the input was subsequently used in sufficient decision logic to also lead to some evidence of cluster breaking. With three of the four, this applies to only a small number of slices. The fourth is buffr from pc2c. As shown in the upper right chart of Fig. 11, approximately three fifths of the large cluster is broken up by ignoring the (decision based) dependencies of buffr.
The third pattern, lazy programmer, occurs when a single global variable is used in place of a collection of locals. Often this pattern is obvious from the global variable’s name (e.g., temp or xxindex). This pattern causes needless dependence connections between functions; thereby, linking together otherwise unrelated parts of the program.
An example of this pattern is the global FRS (Function Return String) from pc2c. Its dependencies’ impact on the MSG (a drop) can be seen in the lower right of Fig. 11. Other examples include (loop) counters and (input) file pointers.
The final pattern, library, occurs in three of the programs. It is the only one that was always associated with at least partial breaking of dependence clusters in this study. This pattern is similar to the central data structure pattern except that the data structure is read only. Instances include
struct encoding encodings[]; from barcode,
static parserDefinition **LanguageTable; from ctags, and
char *output_format; from time.
For example, the array encodings holds a pointer to each of the kinds of barcode that the barcode program is able to encode. Similarly, ctags can produce tags for a variety of languages. The selection is achieved by indexing into the global array LanguageTable. Finally, output_format is used by the output function of time. The format is iterated over; thus, tying together the code for all the various output formats.
Library variables cause dependence clusters primarily because static analysis tools cannot determine that the particular element chosen will not change. For example, from the static analysis point of view, it is possible for barcode to switch encoding functions in the middle of an encoding. This serves to link together the different encoders into one large cluster. The (external) insight that only one
encoder is ever used for any given encoding (execution of the program) would allow a comprehension tool, for example, to break the cluster. However, such a tool would require sophisticated domain knowledge, placing this beyond the abilities of current dependence analysis technology.

8. Multiple global causes

This section considers the effects of ignoring the dependencies associated with combinations of globals. Several of the programs shown in Fig. 8 include such sets of multiple significant global variables. Program pc2c is used as an illustrative case study. Similar patterns exist in compress, file_server, protest, and sudoku. The case study considers the two significant global variables: FRS and buffr and the ‘almost significant’ global variable buffw. The dependence reductions achieved by ignoring various combinations are shown on the y-axis of the chart at the top left of Fig. 13. Next to the percent reduction bar chart, the top row of Fig. 13 shows pc2c’s original MSG and the MSG resulting from ignoring the dependence of all three variables. The interesting thing to note in the final MSG is the complete breaking of the clusters. The second row shows the MSGs for buffr, FRS, and their combination, which is interesting because it shows more than simply the combined effect. That is, in addition to the ‘break’ and ‘drop’, it also shows evidence of a sub-cluster break. The final three charts show the MSG produced when ignoring dependence associated with buffw, buffw and FRS, and buffr and buffw.

Fig. 13. MSGs for selected global variables of pc2c showing the impact of ignoring dependences associated with combinations of global variables.

9. Threats to validity

This section considers threats to the external, internal, and construct validity of the results presented. The main external threat arises from the possibility that the selected programs are not representative of programs in general, with the result that
the findings of the experiments do not apply to ‘typical’ programs. The programs studied include a wide variety of different tasks including applications, utilities, games, and system code. There is, therefore, reasonable cause for confidence in the results obtained and the conclusions drawn from them. However, all of the programs studied were C or C++ programs. Therefore, it would be premature to infer that the results necessarily apply to other programming languages.
Internal validity is the degree to which conclusions can be drawn about the causal effect of the independent variable on the dependent variable. In this experiment, the primary threat comes from the potential for faults in the tools used to gather the data. A mature and widely used slicing tool (CodeSurfer; Grammatech Inc., 2002) was used to mitigate the concern.
Construct validity assesses the degree to which the variables used in the study accurately measure the concepts they purport to measure. Note that in the presence of human judgments, construct validity is a more serious concern. In this study the only measurement is of slice size, which can be done accurately.

10. Related work

This paper considers the role global variables play in source-level dependence in general and dependence clusters in particular. Though global variables have long been regarded as a potential cause of problems (Wulf and Shaw, 1973), there has been no previous work that has provided a quantitative assessment of their impact on dependence. Previous work on dependence clusters in software engineering (Eisenbarth et al., 2001; Mahdavi et al., 2003; Mitchell and Mancoridis, 2006) has focused on higher levels of abstraction, such as models or functions. Previous work on source-level dependence clusters has been primarily carried out in support of compiler analysis where semantics preservation is a key requirement (Fischer and LeBlanc, 1988; Jones and Muchnick, 1981; Reps, 1994).
Dependence analysis has been shown to be effective at reducing the computational effort required to automate the test-data generation process (Harman et al., 2007). Using dependence analysis, it is possible to reduce both the amount of code to be tested and the size of the input domain. However, the presence of large dependence clusters will mean that no such reduction can be achieved when testing any part of the program that lies in a cluster. Identifying and busting these clusters can therefore be thought of as a step towards improving testability.
In software maintenance, dependence analysis is used to protect a software maintenance engineer against the potentially unforeseen side effects of a maintenance change. This can be achieved by measuring the impact of the proposed change (Black, 2001) or by attempting to identify portions of code for which a change can be safely performed, free from side effects (Gallagher and Lyle, 1991; Tonella, 2003). Unfortunately, all statements in a dependence cluster transitively impact all other statements in the cluster. Therefore, the ripple effect for these statements will be large and any attempt to perform a modification will be challenging.
The transformation based approach to assessment of dependence due to globals and the effects on dependence clusters is similar to Krinke’s barrier slicing (Krinke, 2003) and the ‘wedge’ transformation of Lakhotia and Deprez (1998). In barrier slicing, barriers are used to prevent consideration of any dependence past the barrier. In tucking, a wedge is inserted into the code to ‘cap off’ dependence before the wedge so that the code may be split out and folded into smaller, ideally more cohesive, sub units.
The work reported in this paper is part of a research agenda, currently being pursued by two of the present authors (Harman and Binkley) and their colleagues and collaborators. For this agenda, dependence analysis is advocated as a way to provide tools and techniques for empirical assessment of dependence structures. The present paper is an invited submission to a special issue of JSS and it was largely through this work that the invitation to submit the present paper arose. The authors have been encouraged by the editor to include a brief overview of this previous work in this section. To achieve this, the following paragraphs set out the results so far established in this on-going ‘empirical dependence analysis’ research agenda.
Binkley and Harman (2003b) presented the first study which aimed to answer the question: ‘How big is a typical program slice?’. Though slicing had been first proposed some 24 years previously by Weiser (1979), there had not been any subsequent attempt to systematically slice a large code base using every possible slicing criterion, thereby providing baseline data on slice size. This paper aimed to do just that. The paper constructed slices for 43 programs containing just over one million lines of code. To date, this remains the largest empirical study of slicing in the literature. It also considered the impact that calling context and structure field expansion has on slice size. In order to construct the large number of slices required, several novel slice efficiency techniques were introduced (Binkley and Harman, 2003c; Binkley et al., 2007b). The paper was later extended (Binkley et al., 2007a) to provide a larger study that also considered the effects of dead code, pointer analysis, and slice granularity on the size of slices produced.
Subsequently, Binkley and Harman noticed that, though forward and backward slicing are dual notions of dependence, there is an interesting difference in the distribution of size of slices. That is, because of the duality of forward and backward dependence, the average size of a set of forward slices of procedure or program p will be identical to the average size of the backward slices of p. However the distributions of these slices are very different; the forward slices tend to be smaller. This was demonstrated empirically, where it was shown that the difference in slice-size distribution is entirely due to the effects of control dependence in structured languages (Binkley and Harman, 2005a). This realization lends to forward slicing a hitherto unrealized importance, making it all the more surprising that forward slicing has been largely overlooked in the literature, by comparison with its much more widely-studied counterpart: backward slicing.
Though forward slices have been demonstrated to be smaller than backward slices, the sad fact remains that all static slices, forward or backward, tend to be rather large. That is, a programmer faced with a million line program is unlikely to be consoled by a 300,000 line slice; though the slice may be smaller than the original program, the threshold at which comprehension support becomes realistic remains some way off. In order to address this slice size problem, a new form of dependence analysis called Key Statement Analysis was introduced (Harman et al., 2002) and recently empirically studied (Binkley et al., 2008a). In Key Statement Analysis, dependence computation is used to target those few statements upon which the program’s dependence revolves. The empirical findings demonstrate that key statement analysis can be used to identify the few statements in a program that capture most of the impact of the whole program’s dependence.
A separate study presented the effects of formal parameters and global variables on levels of predicate dependence (Binkley and Harman, 2003a; Binkley and Harman, 2004a). Predicate dependence was considered a subject worthy of study because the predicates of a program capture its essential logical intention, the comprehension of which underpins so many software engineering activities. The primary finding of this work was the observation that, as the number of formal parameters available to a predicate increases, the proportion upon which it depends tends to decrease. This was noticed in many of the programs studied and it was a trend that was borne out by statistical analysis. No such trend was observed for backward slicing. This result may indicate that
as functions increase the number of formal parameters available, they tend to become less cohesive.
Subsequent work (Jiang et al., 2008; Tao et al., 2008) exploited this observation regarding cohesion to develop Search Based Software Engineering techniques for automating the process of slicing procedures, guided by fitness functions that capture dependence interactions.
Previous work has also considered other potential harmful effects of dependence structures that can be uncovered using static analysis. Chief among these ‘dependence anti patterns’ (Binkley et al., 2008b) are dependence clusters (Binkley and Harman, 2005b). This work provided a definition of several forms of anti pattern and dependence-based techniques for locating them. The empirical results indicated how these techniques found examples of anti patterns in open source and production industrial code. Other work illustrated the way in which the normally static nature of these forms of dependence analysis could be brought to life in animations that provide a ‘fourth dimension’ to dependence visualization (Binkley et al., 2006).
Previous work has also presented empirical results on the relationship between high level program concepts (such as credit, undercarriage and holiday entitlement) and low level dependence at the statement level (Binkley et al., 2008; Gold et al., 2006). This work revealed that code which is conceptually similar also has a tighter, more cohesive dependence structure.
In 2004, Binkley and Harman provided a detailed survey of empirical results on program slicing (Binkley and Harman, 2004b), to which the reader is referred for a more detailed account of related work and results concerning program slicing and program dependence.

11. Summary and future work

This paper is concerned with the effect of global variables on program dependence. It introduces a technique for measuring the effect of a global variable on the quantity of dependence present in a program and uses this to study the effects of 849 global variables from 21 programs. The results show that, while most global variables have essentially no impact on program dependence, there are a few that have a large and significant effect. Moreover, though there may be few such global variables, many programs (more than half those studied) have at least one such significant variable.
The paper also studies the way in which dependencies due to some global variables hold together large dependence clusters. The results show that globals can be the sole cause of such clusters. The paper presents a categorization of these effects and examines the source code patterns behind the clusters.
The empirical results presented in the paper are findings from a study of C and C++ programs. Future work will consider other programs and programming paradigms to assess the degree to which these results generalize. It will also investigate the opportunities for dependence cluster-breaking refactoring suggested by the finding that global variables may be the sole cause of some dependence clusters.
Future work will consider the relationships between the dependence clusters of a program, techniques for helping the programmer to break them into smaller, more manageable clusters, empirical assessment of their effects on program comprehension and other potential causes of dependence clusters.

References

Armitage, P., Berry, G., 1994. Statistical Methods in Medical Research. Macmillan, London.
Balmas, F., 2002. Using dependence graphs as a support to document programs. In: Second IEEE International Workshop on Source Code Analysis and Manipulation. IEEE Computer Society Press, Los Alamitos, California, USA, pp. 145–154.
Barnes, J., 2003. High Integrity Software: The SPARK Approach to Safety and Security. Addison Wesley, New York, NY.
Beck, J., Eichmann, D., 1993. Program and interface slicing for reverse engineering. In: IEEE/ACM 15th Conference on Software Engineering (ICSE’93). IEEE Computer Society Press, Los Alamitos, California, USA, pp. 509–518.
Binkley, D.W., 1997. Semantics guided regression test cost reduction. IEEE Transactions on Software Engineering 23 (8), 498–516.
Binkley, D.W., Harman, M., 2003a. An empirical study of predicate dependence levels and trends. In: 25th IEEE International Conference on Software Engineering (ICSE 2003). IEEE Computer Society Press, Los Alamitos, California, USA, pp. 330–339.
Binkley, D.W., Harman, M., 2003b. A large-scale empirical study of forward and backward static slice size and context sensitivity. In: IEEE International Conference on Software Maintenance. IEEE Computer Society Press, Los Alamitos, California, USA, pp. 44–53.
Binkley, D.W., Harman, M., 2003c. Results from a large-scale study of performance optimization techniques for source code analyses based on graph reachability algorithms. In: IEEE International Workshop on Source Code Analysis and Manipulation (SCAM 2003). IEEE Computer Society Press, Los Alamitos, California, USA, pp. 203–212.
Binkley, D.W., Harman, M., 2004a. Analysis and visualization of predicate dependence on formal parameters and global variables. IEEE Transactions on Software Engineering 30 (11), 715–735.
Binkley, D.W., Harman, M., 2004b. A survey of empirical results on program slicing. Advances in Computers 62, 105–178.
Binkley, D., Harman, M., 2005a. Forward slices are smaller than backward slices. In: Fifth IEEE International Workshop on Source Code Analysis and Manipulation. IEEE Computer Society Press, Los Alamitos, California, USA, pp. 15–24.
Binkley, D., Harman, M., 2005b. Locating dependence clusters and dependence pollution. In: 21st IEEE International Conference on Software Maintenance. IEEE Computer Society Press, Los Alamitos, California, USA, pp. 177–186.
Binkley, D., Harman, M., Krinke, J., 2006. Animated visualisation of static analysis: characterising, explaining and exploiting the approximate nature of static analysis. In: Sixth International Workshop on Source Code Analysis and Manipulation (SCAM 06), Philadelphia, Pennsylvania, USA, pp. 43–52.
Binkley, D.W., Gold, N., Harman, M., 2007a. An empirical study of static program slice size. ACM Transactions on Software Engineering and Methodology 16 (2), 1–32.
Binkley, D.W., Harman, M., Krinke, J., 2007b. Empirical study of optimization techniques for massive slicing. ACM Transactions on Programming Languages and Systems 30, 3:1–3:33.
Binkley, D., Gold, N., Harman, M., Li, Z., Mahdavi, K., 2008a. Evaluating key statements analysis. In: 8th International Working Conference on Source Code Analysis and Manipulation (SCAM’08), IEEE Computer Society, Beijing, China, pp. 121–130.
Binkley, D., Gold, N., Harman, M., Li, Z., Mahdavi, K., Wegener, J., 2008b. Dependence anti patterns. In: Fourth International ERCIM Workshop on Software Evolution and Evolvability (Evol’08), L’Aquila, Italy, pp. 25–34.
Binkley, D., Gold, N., Harman, M., Li, Z., Mahdavi, K., 2008. An empirical study of the relationship between the concepts expressed in source code and dependence. Journal of Systems and Software 81 (12), 2287–2298.
Black, S.E., 2001. Computing ripple effect for software maintenance. Journal of Software Maintenance and Evolution: Research and Practice 13, 263–279.
Cimitile, A., De Lucia, A., Munro, M., 1996. A specification driven slicing process for identifying reusable functions. Software Maintenance: Research and Practice 8, 145–178.
Deng, Y., Kothari, S., Namara, Y., 2001. Program slice browser. In: 9th IEEE International Workshop on Program Comprehension. IEEE Computer Society Press, Los Alamitos, California, USA, pp. 50–59.
Eisenbarth, T., Koschke, R., Simon, D., 2001. Locating features in source code. IEEE Transactions on Software Engineering 29 (3) (special issue on ICSM 2001).
Fischer, C.N., LeBlanc, R.J., 1988. Crafting a Compiler, Benjamin/Cummings Series in Computer Science. Benjamin/Cummings Publishing Company, Menlo Park, CA.
Fisher, D.L., 1983. Global variables versus local variables. Software – Practice and Experience 13 (5), 467–469.
Gallagher, K.B., Lyle, J.R., 1991. Using program slicing in software maintenance. IEEE Transactions on Software Engineering 17 (8), 751–761.
Gold, N., Harman, M., Li, Z., Mahdavi, K., 2006. An empirical study of executable concept slice size. In: 13th Working Conference on Reverse Engineering (WCRE 06), Benevento, Italy, pp. 103–114.
Grammatech Inc., 2002. The codesurfer slicing system. <www.grammatech.com>.
Harman, M., Gold, N., Hierons, R.M., Binkley, D.W., 2002. Code extraction algorithms which unify slicing and concept assignment. In: IEEE Working Conference on Reverse Engineering (WCRE 2002). IEEE Computer Society Press, Los Alamitos, California, USA, pp. 11–21.
Harman, M., Hu, L., Hierons, R.M., Wegener, J., Sthamer, H., Baresel, A., Roper, M., 2004. Testability transformation. IEEE Transactions on Software Engineering 30 (1), 3–16.
Harman, M., Hassoun, Y., Lakhotia, K., McMinn, P., Wegener, J., 2007. The impact of input domain reduction on search-based test data generation. In: ACM Symposium on the Foundations of Software Engineering (FSE’07), Association for Computing Machinery, Dubrovnik, Croatia, pp. 155–164.
Horwitz, S., Reps, T., Binkley, D.W., 1990. Interprocedural slicing using dependence graphs. ACM Transactions on Programming Languages and Systems 12 (1), 26–61.
Jiang, T., Harman, M., Hassoun, Y., 2008. Analysis of procedure splitability. In: 15th Working Conference on Reverse Engineering (WCRE’08), Antwerp, Belgium, pp. 247–256.
Jiang, T., Gold, N., Harman, M., Li, Z., 2008. Locating dependence structures using search based slicing. Journal of Information and Software Technology 50 (12), 1189–1209.
Jones, N., Muchnick, S. (Eds.), 1981. Program Flow Analysis: Theory and Applications. Prentice-Hall, Englewood Cliffs, NJ.
Krinke, J., 1998. Static slicing of threaded programs. In: ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools and Engineering (PASTE’98), pp. 35–42.
Krinke, J., 2003. Barrier slicing and chopping. In: IEEE International Workshop on Source Code Analysis and Manipulation (SCAM 2003). IEEE Computer Society Press, Los Alamitos, California, USA, pp. 81–87.
Lakhotia, A., Deprez, J.-C., 1998. Restructuring programs by tucking statements into functions. Information and Software Technology Special Issue on Program Slicing 40 (11–12), 677–689.
Lakhotia, A., Singh, P., 2003. Challenges in getting formal with viruses. Virus Bulletin, 15–19.
Mahdavi, K., Harman, M., Hierons, R.M., 2003. A multiple hill climbing approach to software module clustering. In: IEEE International Conference on Software Maintenance. IEEE Computer Society Press, Los Alamitos, California, USA, pp. 315–324.
Marshall, L.F., Webber, J., 2000. Gotos considered harmful and other programmers taboos. In: Blackwell, A., Bilotta, E. (Eds.), 12th Psychology of Programmers Interest Group Annual Workshop (PPIG 12), pp. 171–180.
Mitchell, B.S., Mancoridis, S., 2006. On the automatic modularization of software systems using the bunch tool. IEEE Transactions on Software Engineering 32 (3), 193–208.
Podgurski, A., Clarke, L., 1990. A formal model of program dependences and its implications for software testing debugging and maintenance. IEEE Transactions on Software Engineering 16 (9), 965–979.
Reid, D., Sanders, N.R., 2007. Operations Management: An Integrated Approach, third ed. Wiley.
Reps, T.W., 1994. Solving demand versions of interprocedural analysis problems. In: Fritzon, P. (Ed.), Compiler Construction, Fifth International Conference of Lecture Notes in Computer Science, vol. 786. Springer, Edinburgh, UK, pp. 389–403.
Ritchie, D., 1975. The C reference manual. <cm.bell-labs.com/cm/cs/who/dmr/cman.pdf>.
Sestoft, P., 1989. Replacing function parameters by global variables. In: Fourth International Conference on Functional Programming Languages and Computer Architecture, Imperial College, London, IFIP and ACM, ACM Press and Addison-Wesley, pp. 39–53.
Stroustrup, B., 2000. The C++ Programming Language, third ed. Addison-Wesley.
Sward, R.E., Chamillard, A.T., 2004. Re-engineering global variables in Ada. ACM SIGADA Ada Letters 24 (4), 29–34.

David Binkley is a Professor of Computer Science at Loyola College in Maryland where he has worked since earning his doctorate from the University of Wisconsin in 1991. From 1993 to 2000, Dr. Binkley was a visiting faculty researcher at the National Institute of Standards and Technology (NIST), where his work included participating in the Unravel program slicer project. While on leave from Loyola in 2000, he worked with Grammatech Inc. on the System Dependence Graph (SDG) based slicer CodeSurfer and in 2008 he joined the researchers at the CREST Centre of King’s College London to work, among other things, on dependence cluster analysis. Dr. Binkley’s recent NSF-funded research continues to focus on improving semantics-based software engineering tools. This work has recently broadened from exclusively considering programming-language semantics to now also consider natural-language semantics through the use of Information Retrieval techniques applied to source code and its supporting documents. His recent work also includes a seven school collaborative project aimed at increasing the representation of women and minorities in computer science.

Mark Harman is professor of Software Engineering in the Department of Computer Science at King’s College London. He is widely known for work on source code analysis and testing and he was instrumental in the founding of the field of Search Based Software Engineering, a field that currently has active researchers in 24 countries and for which he has given 14 keynote invited talks. Professor Harman is the author of over 140 refereed publications, on the editorial board of 7 international journals and has served on 90 programme committees. He is director of the CREST centre at King’s College London, which has a current research grant portfolio in excess of £3m. More details are available from the CREST website: crest.dcs.kcl.ac.uk.

Youssef Hassoun is a research associate at the CREST Centre, Department of Computer Science of King’s College London. His research interests include search-based approaches to Software Engineering, program analysis and testing.

Syed Islam has recently started a PhD at King’s College London, supervised by Jens Krinke and Mark Harman. Upon completion of his undergraduate degree at London Metropolitan University in 2006, he started a Masters in Advanced Software Engineering at King’s College London. There, under the supervision of Mark Harman, he started working on dependence clusters, where he analyzed real world programs to show the qualitative and quantitative effects of global variables on the formation of dependence clusters. Later, following completion of his masters, he moved to Bangladesh to join BRAC University as a lecturer, where he worked till Jan ’09. He recently moved back to London to do his PhD and is currently working on the SLIM (SLIcing state based Model) project at CREST (Centre for Research on Evolution, Search & Testing).
Tennberg, P., 2002. Refactoring global objects in multithreaded applications. C/C++
Users Journal 20 (5), 20–24. Zheng Li obtained his PhD from King’s College, London CREST centre in 2009 under
Tonella, P., 2003. Using a concept lattice of decomposition slices for program the supervision of Mark Harman. He is currently a post doctoral researcher in the
understanding and impact analysis. IEEE Transactions on Software Engineering CREST centre at King’s College. He was co-guest editor for two journal special issues
29 (6), 495–509. on software testing (one in Software Testing Verification and Reliability and one in
Weiser, M., 1979. Program slices: formal, psychological, and practical investigations the Journal of Systems and Software) and is on the programme and organising
of an automatic program abstraction method. Ph.D. Thesis, University of committee for the 9th IEEE International Working Conference on Source Code
Michigan, Ann Arbor, MI. Analysis and Manipulation and the organising committee of the 2nd IEEE Interna-
Wheeler, D.A., 2005. SLOC count user’s guide. <http://www.dwheeler.com/ tional Conference on Software Testing. His interests are slicing, testing and SBSE, on
sloccount/sloccount.html>. which topics he has published 8 papers, including those in Information and Soft-
Wulf, W., Shaw, M., 1973. Global variables considered harmful. ACM SIGPLAN ware technology, the Journal of Systems and Software, and IEEE Transactions on
Notices 8 (2), 28–34.
Software Engineering. The paper he co-authored for Fundamental Approaches to
Yu, L., Schach, S.R., Chen, K., Offutt, A.J., 2004. Categorization of common coupling
Software Engineering 2009 was awarded the prize for best theory paper at ETAPS.
and its application to the maintainability of the linux kernel. IEEE Transactions
on Software Engineering 30 (10), 694–706.
