Reverse Engineering
Abstract: Software size estimation is an important task for successful software development. There are a
number of systematic methods for estimating the size of software systems, among which function point analysis
has achieved wide acceptance. Among the various function point analysis methods, COSMIC-FFP is a relatively
new approach, and it is also easier to apply than most methods. The estimation is based on requirement
analysis artifacts such as use cases, sequence diagrams, or class diagrams. However, these artifacts are often
insufficient to verify the precision of the estimation methods, let alone to adjust these methods to different
domains and organizations. In this study, we develop a tool that reverse engineers Java source code to
retrieve information at the level of detailed sequence diagrams and calculates the function points.
Regression analyses are then applied to verify the applicability of function point estimation methods to a
particular system.
Key-Words: Software Size Estimation, Function Points, Reverse Engineering, Regression Analysis.
Such conversion is not straightforward. The factors affecting the LOC corresponding to a function point unit include the experience of the development team, programming language, application domain, and development method [3].

The sources of these methods are outcomes from requirement analysis, mostly use cases, sequence diagrams, or class diagrams. During the requirement analysis phase, these artifacts may not be complete or sufficiently refined to support rigorous measurement and analysis of function points [4]. Even when the project is completed, some of the artifacts may still be incomplete or insufficiently refined, making postmortem analysis difficult. Constructing these artifacts thoroughly and in more detail is therefore the key to validating these methods.

Since the class diagrams and sequence diagrams obtained by reverse engineering the source code contain more low-level detail than those generated in the requirement phase, we can use these diagrams to retrieve the information needed to calculate the associated function point size.

In this work, a tool was developed that applies reverse engineering techniques to the source code of object-oriented systems to retrieve information at the level of sequence diagrams. Based on specific rules, the tool then calculates the function points from the recovered information and applies regression analysis to verify the applicability of function point estimation methods and to learn how far a function point size estimation method may deviate when applied to the system.

2 Function Points Analysis

A function point analysis method commonly used in the early days is Use Case Points, proposed by Karner [5]. Some tools have been developed to compute Use Case Points, but they can only produce basic IFPUG function points from Unified Modeling Language use cases [6]. The complexity classification of function points still must be completed manually.

There are other function point size measurement procedures, and most of them are not yet augmented with automated tools [7] [8] [9] [10] [11]. Worse still, estimation results from most of these methods are not analyzed quantitatively, so the reliability and accuracy of their results cannot be verified.

Among various function point analysis methods, COSMIC-FFP (The Common Software Measurement International Consortium – Full Function Points) is a relatively new approach, and it is also easier to apply than most methods. Moreover, its reliability, or reproducibility across different estimators, is reported in several studies [8] [12].

COSMIC-FFP is based on the experience of earlier function point analysis methods and is intended to apply to a wider application spectrum than traditional information systems, for example, real-time systems. The calculation of COSMIC-FFP is quite simple and straightforward.

COSMIC-FFP exhibits object-oriented traits by looking at the system from the data aspect as well as the function aspect. Its data unit is called a data group, which is equivalent to the normalized entity in information systems. One can also use a more elaborate data unit called an “attribute”, which is equivalent to the normal definition of an attribute in information systems.

System function is modeled through functional processes, which can be further divided into two kinds of sub-processes: data movement and data manipulation. The data movement sub-process is the main subject of function point computation. It can be divided into four categories:

Entry – data movement from a user to a functional process;
Exit – data movement from a functional process to a user;
Read – data movement from a persistent storage to a functional process. The persistent storage must be a part of the system;
Write – data movement from a functional process to a persistent storage. Again, the persistent storage must be a part of the system.

In a functional process, the number of data group movements in the aforementioned four categories is presented as the function points, in a unit called Cfsu (COSMIC functional size unit). For a functional process that involves complex computation, it is suggested to count additional function points for the data manipulation sub-process, but there is no precise instruction on how to calculate such additional function points.

All in all, the principle of this method is quite intuitive and simple, but it presupposes a functional specification with a certain level of detail. To deal with estimation in the initial stage, it is recommended to first find all functional processes, categorize them according to their complexity, and then set their function point value to the average size of a functional process obtained from historical statistics.

In this research, we use the COSMIC-FFP method to do the function points estimation.
On the other hand, if the P-value is greater than or equal to 0.05, the linear regression relationship is not supported.

The regression model is shown in Table 1. The P-value of the F test is 9.34E-05, less than 0.05. So, the linear relationship of the model is established. The coefficient of determination, which indicates the goodness of fit for the model, is 0.7480. The slope coefficient is 39.08, which means that a function point could be estimated to be equal to 39 LOC for the first case.

Table 1 The First Regression Model.
            coefficient   standard error   t test   P-value
intercept   0             N/A              N/A      N/A
X1          39.08         6.55             5.97     6.52E-05

The second case study is an open-source online chat system from SourceForge. The P-value of the F test is 9.337E-05, less than 0.05. So, the linear relationship of our model is established. We are not familiar with the system, so we randomly select 24 methods as the starting methods to perform the analysis.

The regression model is shown in Table 2. The P-value of the F test is 1.32E-28, less than 0.05. Again, the linear relationship is established. The coefficient of determination is 0.9964, and a function point is estimated to be around 24 LOC.

Table 2 The Second Regression Model.
            coefficient   standard error   t test   P-value
intercept   0             N/A              N/A      N/A
X1          23.75         0.30             79.91    1.26E-29

The results of the two case studies suggest that different kinds of applications can significantly influence the residuals between actual software code size and estimated software code size. The software in the first case study involves complex decisions and computations. In such a case, it is suggested to count additional function points. Our results confirm the necessity of adjustment.

However, there is no precise instruction on how to calculate additional function points [18]. Therefore, our static analysis fails to account for this part, and the precision of the estimated values can only reach about 75%. On the other hand, there is no complex decision or computation in the second system, and the precision of the estimated values reaches a high value of 99.6%.

The correlation between COSMIC function points and actual software code size is noted by Lind and Heldal [3]. They develop a linear model to estimate software code size from function points. However, their function point calculation is based on component diagrams, which cannot convey behavior specifications as sequence diagrams do, and thus fails to capture function points inside a component. As a result, the number of LOC for a function point, 148, is much higher in their case. However, the subject of their study is the electronic control units of a distributed network in a vehicle, which may be more isolated and simpler in behavior.

They also find that different development teams affect software code size, by investigating another development team that develops software components of different electronic control units. Although there is also high correlation for the second team, it is also clear that the linear model developed for one development team will not perform well for the other development team, just as in our case.

6 Conclusions

Among various function point analysis methods, COSMIC-FFP is an approach more suitable for object-oriented systems, and its estimation is simple and straightforward. However, it is based on outcomes from requirement analysis, which may be incomplete or insufficiently refined, making postmortem analysis difficult.

We develop a tool that applies reverse engineering techniques to the source code of object-oriented systems to retrieve information at the level of sequence diagrams. Based on specific rules, the tool then calculates the function points from the recovered information. Then, function points and the corresponding numbers of LOC are analyzed to produce regression models. Two case studies are presented to verify the effectiveness of our method.

The results show that different kinds of applications can influence the accuracy of the COSMIC-FFP method significantly. For systems without complex decisions and computations, the goodness of fit for the model reaches a very high value, and such information can be consulted to adjust the estimation when applied to a new but similar system. The estimation of systems that involve complex decisions and computations may suffer from the lack of a mechanism that adjusts the weighting of function points. In the future, we plan to establish a software size estimation model combining COSMIC-FFP function points with some complexity factors.

Acknowledgements

This work was supported in part by the R.O.C. Ministry of Science and Technology under Grant MOST 103-2221-E-017-007.
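As an illustration of the regression models in Tables 1 and 2, whose intercept is fixed at 0, the following sketch fits LOC = slope × function points by least squares through the origin. The data points are hypothetical, the function names are ours, and the uncentered form of the coefficient of determination is an assumption about how a zero-intercept fit is typically reported; the sketch does not reproduce the reported values.

```python
def fit_through_origin(fp, loc):
    """Least-squares fit of loc = slope * fp with the intercept fixed at 0.

    Returns (slope, r2), where r2 is the uncentered coefficient of
    determination 1 - SSE / sum(y^2) (an assumption, see the lead-in).
    """
    if len(fp) != len(loc) or not fp:
        raise ValueError("fp and loc must be non-empty and of equal length")
    sxx = sum(x * x for x in fp)
    syy = sum(y * y for y in loc)
    if sxx == 0 or syy == 0:
        raise ValueError("fp and loc must each contain a non-zero value")
    slope = sum(x * y for x, y in zip(fp, loc)) / sxx
    sse = sum((y - slope * x) ** 2 for x, y in zip(fp, loc))
    return slope, 1.0 - sse / syy


def estimate_loc(function_points, slope):
    """Estimate code size from a function point count and a fitted slope."""
    return slope * function_points


# Hypothetical (function points, LOC) pairs, one per analyzed starting method.
fp = [2, 4, 5, 8]
loc = [50, 95, 118, 190]

slope, r2 = fit_through_origin(fp, loc)
print(f"slope = {slope:.2f} LOC per function point, R^2 = {r2:.4f}")

# Applying the slope reported in Table 1 to a hypothetical 10-point process.
print(estimate_loc(10, 39.08))
```

A slope fitted on one system should only be reused for a new system of a similar kind, since the two case studies yield quite different LOC per function point.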