Ch7 Refactoring
7.1 General Idea
• Developers continuously modify, enhance and adapt software.
• As software evolves and strays away from its original design,
three things happen.
– Decreased understandability
– Decreased reliability
– Increased maintenance cost
• Decreased understandability is due to
– Increased complexity of code
– Out-of-date documentation
– Code not conforming to standards
• The remedy is to decrease the complexity of software and improve its
internal quality by restructuring it.
• Restructuring applied on object-oriented software is called
refactoring.
• Restructuring means reorganizing software (source code +
documentation) to give it a different look, or structure.
• Source code is restructured to improve some of its non-
functional requirements:
– Readability
– Extensibility
– Maintainability
– Modularity
• Restructuring does not modify the software’s functionalities.
• Restructuring can be performed while adding new features.
• Software restructuring is informally defined as the
modification of software to make it
– easier to understand;
– easier to change;
– easier to change its documentation;
– less susceptible to faults when changes are made to it.
• A higher-level goal of restructuring is to increase the value of the
software
– external software value: customers see software with fewer faults as
better
– internal software value: a well-structured system is less expensive to
maintain
• Simple examples of restructuring
– Pretty printing
– Meaningful names for variables
– One statement per line of source code
• Developers and managers need to be aware of restructuring
for the following reasons
– better understandability
– keep pace with new structures
– better reliability
– longer lifetime
– automated analysis
• Characteristics of restructuring and refactoring
– The objective of restructuring and refactoring is to improve the internal
and external values of software.
– Restructuring preserves the external behavior of the original program.
– Restructuring can be performed without adding new requirements.
– Restructuring generally produces a program in the same language.
• Example: a C program is restructured into another C program.
7.2 Activities in a Refactoring Process
• To restructure a software system, one follows a process with
well-defined activities.
– Identify what to refactor.
– Determine which refactoring to apply.
– Ensure that refactoring preserves the software’s behavior.
– Apply the refactorings to the chosen entities.
– Evaluate the impacts of the refactorings.
– Maintain consistency.
7.2.1 Identify what to refactor
• The programmer identifies what to refactor from a set of high-
level software artifacts.
– source code;
– design documents; and
– requirements documents.
• Next, focus on specific portions of the chosen artifact for
refactoring.
– Specific modules, functions, classes, methods, and data can be
identified for refactoring.
• The concept of code smell is applied to source code to detect
what should be refactored.
• A code smell is any symptom in source code that possibly
indicates a deeper problem.
• Examples of code smell are:
– duplicate code;
– long parameter list;
– long methods;
– large classes;
– message chain.
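As an illustration of one smell from the list, the sketch below shows a message chain and one common remedy, a delegating method. The classes Customer, Address, and City are hypothetical and exist only for this example.

```java
// Hypothetical classes, used only to illustrate the "message chain" smell.
class City {
    private final String name;
    City(String name) { this.name = name; }
    String getName() { return name; }
}

class Address {
    private final City city;
    Address(City city) { this.city = city; }
    City getCity() { return city; }
}

class Customer {
    private final Address address;
    Customer(Address address) { this.address = address; }
    Address getAddress() { return address; }

    // Delegating method: callers no longer need to know about Address or City.
    String cityName() { return address.getCity().getName(); }
}

class MessageChainDemo {
    // Smell: the caller walks through three objects to reach one value.
    static String viaChain(Customer c) {
        return c.getAddress().getCity().getName();
    }

    // After the refactoring, the caller depends on Customer only.
    static String viaDelegate(Customer c) {
        return c.cityName();
    }

    public static void main(String[] args) {
        Customer c = new Customer(new Address(new City("Waterloo")));
        System.out.println(viaChain(c) + " / " + viaDelegate(c));
    }
}
```

Both methods return the same value; only the number of classes the caller is coupled to changes.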
• Entities to be refactored at the design level
– software architecture;
• class diagram;
• statechart diagram; and
• activity diagrams;
– global control flow; and
– database schemas.
7.2.2 Determine which refactorings to apply
• Referring to Figure 7.1, some refactorings are
– R1: Rename method print to process in class PrintServer.
– R2: Rename method save to process in class FileServer. (R1 and R2
are to be done together.)
– R3: Create a superclass Server from PrintServer and FileServer.
– R4: Pull up method accept from PrintServer and FileServer to the
superclass Server.
– R5: Move method accept from PrintServer to class Packet, so that data
packets themselves will decide what actions to take.
– R6: Move method accept from FileServer to Packet.
– R7: Encapsulate field receiver in Packet so that another class cannot
directly access this field.
– R8: Add parameter p of type Packet to method print in PrintServer to
print the contents of a packet.
– R9: Add parameter p of type Packet to method save in class FileServer
so that the contents of a packet can be saved.
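A minimal sketch of what the design might look like once R1, R2, R3, and R4 have been applied. Figure 7.1 is not reproduced here, so the fields, constructor arguments, return values, and method bodies are assumptions made for illustration only.

```java
// Hypothetical reconstruction; only the class and method names come from
// the refactorings listed above.
class Packet {
    final String receiver;   // assumed: name of the destination server
    final String contents;
    Packet(String receiver, String contents) {
        this.receiver = receiver;
        this.contents = contents;
    }
}

// R3: common superclass created from PrintServer and FileServer.
abstract class Server {
    protected final String name;
    Server(String name) { this.name = name; }

    // R4: accept is pulled up. This is possible because R1 and R2 gave
    // both subclasses a method with the same name, process.
    String accept(Packet p) {
        if (p.receiver.equals(name)) {
            return process(p);
        }
        return "forwarded";
    }

    abstract String process(Packet p);
}

class PrintServer extends Server {
    PrintServer(String name) { super(name); }
    @Override
    String process(Packet p) { return "printed: " + p.contents; }
}

class FileServer extends Server {
    FileServer(String name) { super(name); }
    @Override
    String process(Packet p) { return "saved: " + p.contents; }
}
```

The sketch also suggests why R1 and R2 must precede R3 and R4: accept can live in the superclass only if it can call one uniformly named method.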
• R1 to R9 indicate that a large number of refactorings can be
identified even for a small system.
• A subset of the entire set of refactorings needs to be carefully
chosen for the following reasons.
– Some refactorings must be applied together.
• Example: R1 and R2 are to be applied together.
– Some refactorings must be applied in certain orders.
• Example: R1 and R2 must precede R3.
– Some refactorings can be individually applied, but they must follow an
order if applied together.
• Example: R1 and R8 can be applied in isolation. However, if both of
them are to be applied, then R1 must occur before R8.
– Some refactorings are mutually exclusive.
• Example: R4 and R6 are mutually exclusive.
• Tool support is needed to identify a feasible subset of
refactorings.
• The following two techniques can be used to analyze a set of
refactorings to select a feasible subset.
– Critical pair analysis
• Given a set of refactorings, analyze each pair for conflicts. A pair is
said to be conflicting if both cannot be applied together.
– Example: R4 and R6 constitute a conflicting pair.
– Sequential dependency analysis
• Before a refactoring can be applied, one or more other refactorings
may have to be applied first.
• If one refactoring has already been applied, a mutually exclusive
refactoring cannot be applied anymore.
– Example: after applying R1, R2, and R3, R4 becomes applicable. Now,
if R4 is applied, then R6 is not applicable anymore.
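The two analyses can be sketched as a simple feasibility check over a proposed sequence of refactorings. The class RefactoringPlanner and its tables are hypothetical. They encode only the constraints stated in the examples above (R3 needs R1 and R2, R4 needs R3, R4 and R6 conflict) and leave out the "applied together" and "R1 before R8" rules.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class RefactoringPlanner {
    // Sequential dependencies: refactoring -> refactorings that must come first.
    static final Map<String, Set<String>> REQUIRES = Map.of(
            "R3", Set.of("R1", "R2"),
            "R4", Set.of("R3"));

    // Conflicting pairs, stored in both directions.
    static final Map<String, Set<String>> CONFLICTS = Map.of(
            "R4", Set.of("R6"),
            "R6", Set.of("R4"));

    // Returns true if the sequence can be applied in the given order.
    static boolean isFeasible(List<String> sequence) {
        Set<String> applied = new HashSet<>();
        for (String r : sequence) {
            // Sequential dependency analysis: prerequisites already applied?
            if (!applied.containsAll(REQUIRES.getOrDefault(r, Set.of()))) {
                return false;
            }
            // Critical pair analysis: conflicts with anything already applied?
            if (!Collections.disjoint(applied, CONFLICTS.getOrDefault(r, Set.of()))) {
                return false;
            }
            applied.add(r);
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isFeasible(List.of("R1", "R2", "R3", "R4")));
        System.out.println(isFeasible(List.of("R1", "R2", "R3", "R4", "R6")));
    }
}
```

A real tool would derive these tables from the program model rather than hard-code them.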
7.2.3 Ensure that refactoring preserves the
software’s behavior.
• Ideally, the input/output behavior of a program after
refactoring is the same as the behavior before refactoring.
• In many applications, preservation of non-functional
requirements is necessary.
• A non-exclusive list of such non-functional requirements is as
follows:
– Temporal constraints: A temporal constraint over a sequence of operations
is that the operations occur in a certain order.
• For real-time systems, refactorings should preserve temporal
constraints.
– Resource constraints: The software after refactoring does not demand
more resources: memory, energy, communication bandwidth, and so on.
– Safety constraints: It is important that the software does not lose its safety
properties after refactoring.
• Two pragmatic ways of showing that refactoring preserves the
software’s behavior.
– Testing
• Exhaustively test the software before and after applying refactorings,
and compare the observed behavior on a test-by-test basis.
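A test-by-test comparison can be sketched as follows. The pricing methods are hypothetical stand-ins for the code before and after a refactoring, and the helper firstMismatch is an illustrative name, not part of any testing framework.

```java
import java.util.OptionalInt;
import java.util.function.IntUnaryOperator;

class BehaviorCheck {
    // Hypothetical "before" version: price in cents with a 10% bulk discount.
    static int priceBefore(int quantity) {
        int total = quantity * 250;
        if (quantity >= 10) {
            total = total - total / 10;
        }
        return total;
    }

    // Hypothetical "after" version: the discount rule was extracted into a helper.
    static int priceAfter(int quantity) {
        return applyBulkDiscount(quantity, quantity * 250);
    }

    private static int applyBulkDiscount(int quantity, int total) {
        return quantity >= 10 ? total - total / 10 : total;
    }

    // Runs both versions on every test input and reports the first input
    // on which the observed outputs differ, if any.
    static OptionalInt firstMismatch(IntUnaryOperator before,
                                     IntUnaryOperator after,
                                     int[] inputs) {
        for (int x : inputs) {
            if (before.applyAsInt(x) != after.applyAsInt(x)) {
                return OptionalInt.of(x);
            }
        }
        return OptionalInt.empty();
    }

    public static void main(String[] args) {
        int[] inputs = {0, 1, 9, 10, 11, 100};
        System.out.println(firstMismatch(BehaviorCheck::priceBefore,
                                         BehaviorCheck::priceAfter, inputs));
    }
}
```

An empty result only shows agreement on the inputs that were tried; it is evidence of behavior preservation, not proof.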
7.2.4 Apply the refactorings to chosen entities
• The class diagram of Fig. 7.2(a) has been obtained from Fig.
7.1 by
7.2.5 Evaluate the impacts of the
Refactorings on Quality
• In general, refactoring techniques are highly specialized, with
one technique improving a small number of quality attributes.
• For example,
– some refactorings eliminate code duplication;
– some raise reusability;
– some improve performance; and
– some improve maintainability.
• By measuring the impacts of refactorings on internal qualities,
their impacts on external qualities can be measured.
• Example of measuring external qualities
– Some examples of software metrics are coupling, cohesion, and size.
– Decreased coupling, increased cohesion, and decreased size are likely
to make a software system more maintainable.
– To assess the impact of a refactoring technique for better
maintainability, one can evaluate the metrics before refactoring and
after refactoring, and compare them.
7.2.6 Maintain consistency
• Rather than evaluate the impacts after applying refactorings,
one selects refactorings such that the program after
refactoring possesses better quality attributes.
• The concept of a soft-goal graph helps select refactorings.
• Example: A soft-goal graph for a quality attribute (e.g., maintainability)
is a hierarchical graph rooted at the desired change in the
attribute, for example, high maintainability.
• The internal nodes represent successive refinements of the
attribute and are basically the soft goals.
• The leaf nodes represent refactoring transformations which
contribute positively/negatively to soft-goals which appear
above them in the hierarchy.
• A partial example of a soft-goal graph with one leaf node, namely,
Move, is illustrated in Fig. 7.3.
• The dotted lines between the leaf node Move and three soft goals
(High Modularity, High Module Reuse, and Low Control Flow Coupling)
imply that the Move transformation impacts those three soft goals.
7.3.1 Assertions
• Programmers make assumptions about the behavior
of programs at specific points, and those
assumptions can be tested by means of assertions.
• An assertion is specified as a Boolean expression
which evaluates to true or false.
• Three kinds of assertions:
– invariants;
– preconditions; and
– postconditions.
• Invariant
– An invariant is an assertion that evaluates to true
wherever in the program it is invoked.
– A class invariant is an invariant that all instances of that
class must satisfy.
• Precondition
– A precondition is a condition that must be satisfied
before a computation is performed.
• Postcondition
– A postcondition is a condition that must be satisfied
after a computation is performed.
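The three kinds of assertions can be sketched on a small class. Account is a hypothetical example. The checks are written as explicit exceptions because Java's built-in assert statement is disabled at run time unless the JVM is started with -ea.

```java
class Account {
    private long balance;   // in cents

    // Class invariant: every instance has a non-negative balance.
    private boolean invariant() {
        return balance >= 0;
    }

    Account(long openingBalance) {
        // Precondition of the constructor.
        if (openingBalance < 0) {
            throw new IllegalArgumentException("precondition: openingBalance >= 0");
        }
        balance = openingBalance;
        check(invariant(), "invariant");
    }

    void withdraw(long amount) {
        // Precondition: must hold before the computation is performed.
        if (amount <= 0 || amount > balance) {
            throw new IllegalArgumentException("precondition: 0 < amount <= balance");
        }
        long old = balance;
        balance -= amount;
        // Postcondition: must hold after the computation is performed.
        check(balance == old - amount, "postcondition");
        check(invariant(), "invariant");
    }

    long balance() {
        return balance;
    }

    private static void check(boolean condition, String what) {
        if (!condition) {
            throw new IllegalStateException(what + " violated");
        }
    }
}
```

If a refactoring of withdraw still satisfies the same precondition, postcondition, and invariant checks, that is some evidence that its behavior was preserved.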
• Invariants, preconditions, and postconditions can be
applied to test the behavior preserving property of
refactorings.
• Examples of invariants in the context of database schema
transformation are:
– All instance variables of a class, whether defined or
inherited, have distinct names.
– All methods of a class, whether defined or inherited,
have distinct names.
• Note: Static checking of preconditions,
postconditions, and invariants is computationally
expensive.
7.3.2 Graph Transformation
• Programs, class diagrams, and statecharts can be
viewed as graphs, and refactorings can be viewed as
graph production rules.
• Classes (C), method signatures (M), block structures
(B), variables (V), parameters (P), and expressions
(E) are represented by typed nodes in a graph.
• The possible relationships among the nodes are:
– method lookup (l);
– inheritance (i);
– membership (m);
– (sub)type (t);
– expression (e);
– actual parameter (ap);
– formal parameter (fp);
– cascaded expression (•);
– call (c);
– variable access (a); and
– update (u).
• Figure 7.4 shows an example program graph.
7.3 Formalisms for Refactoring
7.3.3 Software Metrics
• Two metrics considered are:
– cohesion; and
– coupling.
7.4 More Examples of Refactoring
• More examples are intuitively explained here.
– Substitute algorithm;
– Replace parameter with method;
– Push down method;
– Parameterize method.
• Substitute algorithm
– Replace algorithm X with algorithm Y when: (i) the implementation of Y
is clearer than that of X; (ii) Y performs better than X; or (iii) standardization
bodies want X to be replaced with Y.
– Algorithm substitution is easier if both X and Y have the same input-
output behaviors.
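A small sketch of algorithm substitution, under the assumption that the input array is sorted in ascending order. Algorithm X is a linear scan and algorithm Y is the standard library's binary search; the class name SortedLookup is hypothetical.

```java
import java.util.Arrays;

class SortedLookup {
    // Algorithm X: linear scan over the array.
    static boolean containsLinear(int[] sorted, int key) {
        for (int value : sorted) {
            if (value == key) {
                return true;
            }
        }
        return false;
    }

    // Algorithm Y: binary search. It has the same input-output behavior
    // as X only under the assumption that the array is sorted.
    static boolean containsBinary(int[] sorted, int key) {
        return Arrays.binarySearch(sorted, key) >= 0;
    }

    public static void main(String[] args) {
        int[] data = {1, 3, 5, 7};
        System.out.println(containsLinear(data, 5) + " " + containsBinary(data, 5));
    }
}
```

Because both methods share a signature and agree on sorted input, callers do not change when X is replaced with Y.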
• Replace parameter with method
Consider the following code segment, where the method
bodyMassIndex has two formal parameters.
int person;
:
// person is initialized here;
:
int bodyMass = getMass(person);
int height = getHeight(person);
int BMI = bodyMassIndex(bodyMass, height);
:
The above code segment can be rewritten such that the
new bodyMassIndex method accepts one formal
parameter, namely, person, and internally computes the
values of bodyMass and height.
The refactored code segment is shown below:
int person;
:
// person is initialized here;
:
int BMI = bodyMassIndex(person);
:
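A runnable sketch of both versions side by side. The lookup tables, the units (kilograms and centimetres), and the integer formula are assumptions added so that the example can execute; only the method names getMass, getHeight, and bodyMassIndex come from the code segments above.

```java
class BmiExample {
    // Hypothetical data store: the array index plays the role of person.
    private static final int[] MASS_KG = {70, 82};
    private static final int[] HEIGHT_CM = {175, 180};

    static int getMass(int person) {
        return MASS_KG[person];
    }

    static int getHeight(int person) {
        return HEIGHT_CM[person];
    }

    // Before: the caller fetches both values and passes them in.
    // BMI = mass in kg divided by the square of the height in metres,
    // computed here with integer arithmetic on centimetres.
    static int bodyMassIndex(int bodyMass, int height) {
        return bodyMass * 10000 / (height * height);
    }

    // After: the method takes person and obtains the values itself.
    static int bodyMassIndex(int person) {
        return bodyMassIndex(getMass(person), getHeight(person));
    }

    public static void main(String[] args) {
        int person = 0;
        System.out.println(bodyMassIndex(person));
    }
}
```

The one-parameter version shortens every call site and keeps the knowledge of which values the computation needs inside the method.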
7.5 Initial Work on Software Restructuring
• Software restructuring dates back to the mid 1960s,
as soon as programs were written in Fortran.
• Topics of discussion in this section are:
– Factors influencing software structure
– Classification of restructuring approaches
– Restructuring techniques
• Elimination-of-goto approach
• Localization and information hiding approach
• System sandwich approach
• Clustering approach
• Program slicing approach
7.5.1 Factors Influencing Software Structure
• Software structure is a set of attributes of the
software such that the programmer gets a good
understanding of software.
• Any factor that can influence the state of software or
the programmer’s perception might influence
software structure.
• One view of the factors that influence software
structure has been shown in Fig. 7.9.
– Code
– Documentation
– Tools
– Programmers
– Managers and policies
– Environment
• Code
– Code quality at all levels of detail (e.g., variables, constants,
statements, functions, and modules) impacts code understanding.
– Adherence to coding standards improves code quality.
– Adoption of common architectural styles enhances code understanding.
• Documentation
– Internal documentation (also known as in-line documentation)
– External documentation
• Requirements documents
• Design documents
• User manuals
• Test cases
• Tools – Programming environment
– Development tools help programmers better understand the code.
• Tracing of source code helps in understanding the dynamic behavior
of the code.
• Animation of algorithms helps in understanding the dynamic strategy
adopted in algorithms.
• Cross referencing of global variables reveals interactions among
modules.
• Tools can reformat code for better readability via pretty printing,
highlighting of key words, and color coding of source code.
• Programmers
– Qualities of programmers influence their perception of structure.
– Examples of programmer qualities
• Individual capabilities
• Education
• Experience and training
• Aptitude
• Managers and policies
– Management can play an influential role in establishing a good initial
structure and sustaining it by designing policies and allocating resources.
– Examples
• Management can design general policies to adhere to standards.
• Management can tie the annual performance review with the
programmer’s adherence to standards.
• Environment
– This refers to the general working environment of programmers.
– Example: Physical facilities and availability of resources when needed
7.5.2 Classification of Restructuring Approaches
• A broad classification of software restructuring
approaches has been shown in Fig. 7.10.
• Approaches not involving code changes
• Approaches involving code changes
– Practices: Some examples of restructuring practices are:
• Restructuring code with preprocessors.
• Making code understandable by means of inspection.
• Formatting code by adhering to standards and style guidelines.
• Restructuring code for reusability.
– Techniques: Some approaches are based on defined techniques.
• Incremental restructuring
• Goto-less approach
• Case-statement approach
• Boolean flag approach
• Clustering approach
– Tools
• Eclipse IDE, IntelliJ IDEA, jFactor, Refactorit, and Clone Doctor
7.5.3 Restructuring Techniques
• Restructuring techniques
– They were developed in the mid-70s, before object-oriented
programming.
– The techniques are applied at different levels of abstraction.
• Examples of restructuring techniques
– Elimination-of-goto Approach
– Localization and Information Hiding Approach
– System Sandwich Approach
– Clustering Approach
– Program Slicing Approach
• Elimination-of-goto Approach
– Before the onset of structured programming, much code was written in
the ‘70s with goto statements.
– Structured programming puts emphasis on the following control
constructs: for, while, until, and if-then-else.
– Those constructs make occurrences of loop and branching clear.
– It has been shown that every flowchart program with goto statements
can be transformed into a functionally equivalent goto-less program by
using while statements.
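The idea can be sketched on a tiny flowchart program. Java has no goto statement, so the original program appears only in the comment. The first method shows one well-known mechanical translation, in which a variable records the next label and a single while loop dispatches on it. The second shows the hand-structured equivalent. The greatest-common-divisor example and the names are illustrative assumptions.

```java
class GotoElimination {
    // Original flowchart program (pseudocode with goto):
    //   L1: if b == 0 goto L3
    //   L2: t = a mod b; a = b; b = t; goto L1
    //   L3: return a

    // Mechanical translation: one while loop plus a "program counter"
    // variable that holds the label to execute next.
    static int gcdWithProgramCounter(int a, int b) {
        int pc = 1;
        while (pc != 3) {
            switch (pc) {
                case 1:
                    pc = (b == 0) ? 3 : 2;
                    break;
                case 2: {
                    int t = a % b;
                    a = b;
                    b = t;
                    pc = 1;
                    break;
                }
                default:
                    throw new IllegalStateException("unknown label " + pc);
            }
        }
        return a;
    }

    // Hand-structured equivalent: the loop is now visible at a glance.
    static int gcdStructured(int a, int b) {
        while (b != 0) {
            int t = a % b;
            a = b;
            b = t;
        }
        return a;
    }

    public static void main(String[] args) {
        System.out.println(gcdWithProgramCounter(48, 18) + " " + gcdStructured(48, 18));
    }
}
```

The mechanical version is functionally equivalent but not more readable, which is why restructuring usually aims for the second form.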
• Localization and Information Hiding Approach
– Localization
• It is a process of collecting the logically related computational
resources in one physical module.
– Functions, procedures, operations, and data types are computational
resources.
• By localizing computational resources into separate modules,
programmers can restructure a program into a loosely coupled
system of sufficiently independent modules.
• Sometimes, localization is difficult to achieve.
– A variable may be imported into a module by means of the include
statement.
– Data sharing among functions is not explicitly represented in source
code.
– Information Hiding
• The details of implementations of computational resources can be
hidden to make it easier to understand the program.
• For example, a queue is a high level concept which can be
implemented by means of a variety of low level data structures.
– Singly linked list
– Doubly linked list
– Arrays
• A programmer can design a function by using enqueue and
dequeue calls without any concern for their actual implementations.
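The queue example can be sketched with an interface and two interchangeable implementations, one on a singly linked list and one on an array. All type and method names other than enqueue and dequeue are hypothetical.

```java
import java.util.NoSuchElementException;

// The only thing client code is allowed to know about a queue.
interface IntQueue {
    void enqueue(int x);
    int dequeue();
    boolean isEmpty();
}

// Implementation 1: singly linked list.
class LinkedIntQueue implements IntQueue {
    private static final class Node {
        final int value;
        Node next;
        Node(int value) { this.value = value; }
    }

    private Node head;
    private Node tail;

    public void enqueue(int x) {
        Node n = new Node(x);
        if (tail == null) { head = n; } else { tail.next = n; }
        tail = n;
    }

    public int dequeue() {
        if (head == null) { throw new NoSuchElementException(); }
        int x = head.value;
        head = head.next;
        if (head == null) { tail = null; }
        return x;
    }

    public boolean isEmpty() { return head == null; }
}

// Implementation 2: circular array that doubles when full.
class ArrayIntQueue implements IntQueue {
    private int[] buf = new int[4];
    private int head = 0;
    private int size = 0;

    public void enqueue(int x) {
        if (size == buf.length) { grow(); }
        buf[(head + size) % buf.length] = x;
        size++;
    }

    public int dequeue() {
        if (size == 0) { throw new NoSuchElementException(); }
        int x = buf[head];
        head = (head + 1) % buf.length;
        size--;
        return x;
    }

    public boolean isEmpty() { return size == 0; }

    private void grow() {
        int[] bigger = new int[buf.length * 2];
        for (int i = 0; i < size; i++) {
            bigger[i] = buf[(head + i) % buf.length];
        }
        buf = bigger;
        head = 0;
    }
}

class QueueClient {
    // Written against the interface only; works with either implementation.
    static int drainAndSum(IntQueue q) {
        int sum = 0;
        while (!q.isEmpty()) { sum += q.dequeue(); }
        return sum;
    }
}
```

QueueClient.drainAndSum never mentions nodes or arrays, so either implementation can be swapped in without touching it.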
– A restructuring process based on localization of variables and functions
• Localization of variables
– Organize global variables and functions which refer to those global
variables into package-like groups.
– This organization can be achieved by applying the concept of closure of
functions to a set of global variables.
– This leads to groups of functions and global variables referred to by
those functions.
• Localization of functions
– Put locally called functions and the calling function in the same group.
• Information hiding and hierarchical structuring
– Organize groups of functions into hierarchical package structures
based on the visibility of functions within groups.
– Those functions and variables which are only externally referable and
visible to other packages constitute the package specification.
• System Sandwich Approach
– This approach is applied to software that cannot be restructured
with any hope of success but needs to be retained for its outputs.
– As illustrated in Fig. 7.11, write a new front-end interface and a new
back-end database so that:
• it is easy to interface with the program; and
• the program’s outputs are recorded in a more structured way.
• Clustering Approach
– Similarity metrics
• Distance measures
– Euclidean distance
– Manhattan distance
• Association coefficients
– Simple matching coefficient
– Jaccard coefficient
– Examples of association coefficients
• Let x and y be two entities. Let:
a = number of features present in both x and y.
b = number of features present in x but not in y.
c = number of features present in y but not in x.
d = number of features absent from both x and y.
• Simple matching coefficient: simple(x, y) = (a + d)/(a + b + c + d).
• Jaccard coefficient: Jaccard(x, y) = a/(a + b + c).
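The two coefficients can be computed directly from the counts a, b, c, and d. In this sketch each entity is represented as a boolean feature vector, which is an assumption about the representation. Returning 0 when the Jaccard denominator is zero is likewise a choice made here, not part of the definition.

```java
class Similarity {
    // Returns {a, b, c, d} for two feature vectors of equal length.
    static int[] counts(boolean[] x, boolean[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("feature vectors differ in length");
        }
        int a = 0, b = 0, c = 0, d = 0;
        for (int i = 0; i < x.length; i++) {
            if (x[i] && y[i]) { a++; }          // present in both
            else if (x[i]) { b++; }             // present in x only
            else if (y[i]) { c++; }             // present in y only
            else { d++; }                       // absent from both
        }
        return new int[] {a, b, c, d};
    }

    // simple(x, y) = (a + d) / (a + b + c + d)
    static double simpleMatching(boolean[] x, boolean[] y) {
        int[] n = counts(x, y);
        int total = n[0] + n[1] + n[2] + n[3];
        return total == 0 ? 0.0 : (double) (n[0] + n[3]) / total;
    }

    // Jaccard(x, y) = a / (a + b + c)
    static double jaccard(boolean[] x, boolean[] y) {
        int[] n = counts(x, y);
        int denominator = n[0] + n[1] + n[2];
        return denominator == 0 ? 0.0 : (double) n[0] / denominator;
    }

    public static void main(String[] args) {
        boolean[] x = {true, true, false, false, true};
        boolean[] y = {true, false, false, true, true};
        // Here a = 2, b = 1, c = 1, d = 1.
        System.out.println(simpleMatching(x, y) + " " + jaccard(x, y));
    }
}
```

For the vectors in main, the simple matching coefficient is 3/5 and the Jaccard coefficient is 2/4. The difference comes from d: features absent from both entities count as agreement only in the simple matching coefficient.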
– Clustering algorithms: four broad categories of algorithms are applied.
• Graph theoretical algorithms
• Construction algorithms
• Optimization algorithms (aka iterative and improvement algorithms)
• Hierarchical algorithms
– Divisive algorithms (See Figure 7.12)
– Agglomerative algorithms (See Figure 7.13)
– The general structure of an agglomerative algorithm
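A minimal sketch of the agglomerative idea: start with one cluster per entity and repeatedly merge the two most similar clusters. It is not a reproduction of the figure in the original slides. The single-linkage rule for comparing clusters, the precomputed similarity matrix, and the stopping parameter k are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

class Agglomerative {
    // sim[i][j] is the similarity between entities i and j;
    // k is the number of clusters to stop at.
    static List<List<Integer>> cluster(double[][] sim, int k) {
        if (k < 1) {
            throw new IllegalArgumentException("k must be at least 1");
        }
        // Step 1: every entity starts in its own cluster.
        List<List<Integer>> clusters = new ArrayList<>();
        for (int i = 0; i < sim.length; i++) {
            List<Integer> single = new ArrayList<>();
            single.add(i);
            clusters.add(single);
        }
        // Step 2: repeatedly merge the two most similar clusters.
        while (clusters.size() > k) {
            int bestI = 0;
            int bestJ = 1;
            double best = Double.NEGATIVE_INFINITY;
            for (int i = 0; i < clusters.size(); i++) {
                for (int j = i + 1; j < clusters.size(); j++) {
                    double s = linkage(sim, clusters.get(i), clusters.get(j));
                    if (s > best) {
                        best = s;
                        bestI = i;
                        bestJ = j;
                    }
                }
            }
            // bestJ > bestI, so removing bestJ does not shift bestI.
            List<Integer> merged = clusters.remove(bestJ);
            clusters.get(bestI).addAll(merged);
        }
        return clusters;
    }

    // Single linkage: similarity of the closest pair of members.
    private static double linkage(double[][] sim, List<Integer> c1, List<Integer> c2) {
        double best = Double.NEGATIVE_INFINITY;
        for (int i : c1) {
            for (int j : c2) {
                best = Math.max(best, sim[i][j]);
            }
        }
        return best;
    }
}
```

Recording the merges instead of stopping at k would yield the full hierarchy. A similarity matrix could be filled with, for example, pairwise Jaccard coefficients.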
7.6 Summary
• General Idea