
Recent Researches in Applied Informatics

Applying Reverse Engineering Techniques to Verify the Estimation of Software Code Size using COSMIC Full Function Point
DOWMING YEH, YI-HONG CHEN, CHIH-YING YANG
Department of Software Engineering
National Kaohsiung Normal University
No.62, Shenzhong Rd., Yanchao Dist., Kaohsiung
TAIWAN
[email protected], {e007926, jrying1992}@gmail.com

Abstract: - Software size estimation is an important task for successful software development. There are a
number of systematic methods to estimate the size of software systems, among which function point analysis has
achieved wide acceptance. Among the various function point analysis methods, COSMIC-FFP is a relatively new
approach and is easier to apply than most methods. The estimation is based on requirement
analysis artifacts such as use cases, sequence diagrams, or class diagrams. However, these artifacts are often
insufficient to verify the precision of the estimation methods, let alone adjust these methods to different
domains and organizations. In this study, we develop a tool that reverse engineers Java source code to
retrieve information at the level of detailed sequence diagrams and calculates the function points. Regression
analyses are then applied to verify the applicability of function point estimation methods to a particular
system.

Key-Words: Software Size Estimation, Function Points, Reverse Engineering, Regression Analysis.

1 Introduction
Software size estimation is an important task for successful software development. It provides the
starting point for other tasks such as cost estimation, development arrangement, and human resource
allocation at the very beginning of a project [1]. Software size estimation almost always depends on
the experience that the estimating engineers gained from similar projects in which they were involved.
However, estimation approaches guided by human intuition, or lacking concrete historical data, can
easily lead to inaccurate size estimates.

There are a number of systematic methods to estimate the size of software systems, among which
function point analysis has achieved wide acceptance in estimating the scale of enterprise
information systems. Function point analysis is a size estimation method that breaks down a software
system into various parts that correspond to the definition of function points. This decomposition
usually proceeds iteratively until it reaches the abstraction level of function points. Each
component is then assigned a function point value based on its complexity and function type. Some
analysis methods such as IFPUG then multiply the sum of the function points by an influence
adjustment factor to produce the final function point count.

The use of function points is promoted by many industry experts for its advantages over approaches
based on lines of code, since it is independent of implementation details such as the programming
language. Based on the size estimation, function point analysis can also predict the human resource
cost and duration of the project.

Although there are standard methods, considerable experience is often required to apply these
methods to calculate the estimated function point value with some precision from a requirement
specification [2]. Therefore, academia and industry are working to propose all sorts of methods and
develop tools so that even a novice can follow certain steps to carry out the estimation work and
still get fairly reliable results without extensive experience.

Besides obtaining a count of function points with some accuracy and reliability, it is often
necessary to transform the function point value into the number of lines of code (LOC) of the
targeted software. To estimate the effort for building a software system, most models such as COCOMO
take LOC as the unit of software size instead of function points. In embedded software, function
points must also be converted to LOC to estimate the amount of memory needed.

ISBN: 978-1-61804-313-9 219



Such conversion is not straightforward. The factors affecting the LOC corresponding to a function
point unit include the experience of the development team, programming language, application domain,
and development method [3].

The inputs to these methods are outcomes of requirement analysis, mostly use cases, sequence
diagrams, or class diagrams. During the requirement analysis phase, these artifacts may not be
complete or sufficiently refined to support rigorous measurement and analysis of function points [4].
Even when the project is completed, some of the artifacts may still be incomplete or insufficiently
refined, making postmortem analysis difficult. So constructing these artifacts thoroughly and in
more detail is the key to validating these methods.

Since the class diagrams and sequence diagrams obtained from reverse engineering the source code
contain more lower-level details than those generated in the requirement phase, we can use these
diagrams to retrieve the information needed to calculate the associated function point size.

In this work, a tool was developed to apply reverse engineering techniques to the source code of
object-oriented systems to retrieve information at the level of sequence diagrams. The tool then
calculates the function points from the recovered information based on specific rules, and applies
regression analysis to verify the applicability of function point estimation methods and to learn the
extent of possible deviation when a function point size estimation method is applied to the system.

2 Function Points Analysis
A function point analysis method commonly used in the early days is Use Case Points, proposed by
Karner [5]. Some tools have been developed to compute Use Case Points, but they can only produce
basic IFPUG function points from Unified Modeling Language use cases [6]. The complexity
classification of function points still must be completed manually.

There are other function point size measurement procedures, and most of them are not yet augmented
with automated tools [7] [8] [9] [10] [11]. Worse still, estimation results from most of these
methods are not analyzed quantitatively, so the reliability and the accuracy of their results cannot
be verified.

Among various function point analysis methods, COSMIC-FFP (The Common Software Measurement
International Consortium – Full Function Points) is a relatively new approach and is easier to apply
than most methods. Moreover, its reliability, or reproducibility by different estimators, is reported
in several studies [8] [12].

COSMIC-FFP is based on the experience of earlier function point analysis methods and is intended to
apply to a wider application spectrum than traditional information systems, for example, real-time
systems. The calculation of COSMIC-FFP is quite simple and straightforward.

COSMIC-FFP exhibits object-oriented traits by looking at the system from the data aspect as well as
the function aspect. Its data unit is called a data group, which is equivalent to the normalized
entity in information systems. One can also use a more elaborate data unit called "attribute", which
is equivalent to the normal definition of attribute in information systems.

System function is modeled through functional processes, which can be further divided into two kinds
of sub-processes: data movement and data manipulation sub-processes. The data movement sub-process is
the main subject of function point computation. It can be divided into four categories:
• Entry – data movement from a user to a functional process;
• Exit – data movement from a functional process to a user;
• Read – data movement from a persistent storage to a functional process. The persistent storage
  must be a part of the system;
• Write – data movement from a functional process to a persistent storage. Again, the persistent
  storage must be a part of the system.

In a functional process, the number of data group movements in the aforementioned four categories is
taken as the function points, in a unit called Cfsu (COSMIC functional size unit). For a functional
process that involves complex computation, it is suggested to count additional function points for
the data manipulation sub-process. But there is no precise instruction on how to calculate such
additional function points.

All in all, the principle of this method is quite intuitive and simple, but it presupposes a
functional specification with a certain level of detail. To deal with estimation in the initial
stage, it is recommended to first find all functional processes, categorize them according to their
complexity, and then set their function point value to the average size of a functional process
obtained from historical statistics.

In this research, we use the COSMIC-FFP method to do the function point estimation.
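The counting rule for data movements described in this section can be illustrated with a minimal Java sketch. The class name CfsuTally, the Movement enum, and the sample functional process are hypothetical names chosen for illustration; they are not part of the COSMIC-FFP definition or of the tool described in this paper. The sketch assumes only what the text states: each data movement, whatever its category, contributes one Cfsu, and data manipulation is not counted.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

public class CfsuTally {
    /** The four data movement categories named in the text. */
    enum Movement { ENTRY, EXIT, READ, WRITE }

    /** Counts the data movements of one functional process per category. */
    static Map<Movement, Integer> tally(List<Movement> movements) {
        Map<Movement, Integer> counts = new EnumMap<>(Movement.class);
        for (Movement m : Movement.values()) {
            counts.put(m, 0);                    // start every category at zero
        }
        for (Movement m : movements) {
            counts.merge(m, 1, Integer::sum);    // one more movement of this category
        }
        return counts;
    }

    /** Functional size in Cfsu: one unit per data movement, regardless of category. */
    static int sizeInCfsu(List<Movement> movements) {
        return movements.size();
    }

    public static void main(String[] args) {
        // Hypothetical functional process: the user submits a request, the process
        // reads and updates persistent storage, then reports back to the user.
        List<Movement> process =
                List.of(Movement.ENTRY, Movement.READ, Movement.WRITE, Movement.EXIT);
        System.out.println(tally(process) + " -> " + sizeInCfsu(process) + " Cfsu");
    }
}
```

Keeping the per-category tally separate from the total mirrors the text: the categories describe the movements, while the size itself is simply their number.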


3 Reverse Engineering
To construct a fine-grained sequence diagram or class diagram of an object-oriented system, we need
the detailed structure and behavior information of the system. Without such information, one can
only resort to reverse engineering the source code to obtain as much information as possible. Many
CASE tools are equipped with some reverse engineering capabilities to extract the basic static
object-oriented architecture of the system, such as class diagrams. However, some details still need
a breakthrough, such as how to distinguish the aggregation relationship from a general association
relationship [13].

In terms of size estimation, sequence diagrams provide more important information than class
diagrams since they describe the functional behavior of a system [8] [9] [10]. Compared to the
reverse engineering of system structures, reverse engineering of the system behavior is more
difficult. There are two major challenges: one is detecting the control flow that corresponds to the
two major interaction operators, alt and loop, in the sequence diagram, and the other is merging
various execution traces [14] [15].

Control flow can be detected statically by analyzing the control structures in the source code. The
merging of execution traces can also be analyzed in a static manner, but the results are not complete
[16]. As we aim to obtain the information related to estimating function point size rather than a
concise sequence diagram, there is actually no need to overcome this challenge, as described in the
following sections.

3.1 Generating Sequence Diagrams
A message in a sequence diagram corresponds to a method invocation in the source code. Therefore, if
call graphs among the various classes of objects are established, we can choose some method in the
source, say m, as the first message in a sequence diagram and produce a series of message
interactions from the method invocations; the result is a sequence diagram describing the sequence
of messages after the first message m. The call graphs can be constructed statically, for example,
with the points-to method by Rountev [17].

Let us illustrate through the following code as an example:

    class X { ... }
    class A {
        public void m(X a, int b) {
            a.p1();
            X c = this.m2(a);
            c.p4();
            X d = this.m4();
            d.p6();
            X e = d;
            if (b > 0)
                { e = new X(); e.p7(); }
            e.p8();
        }
        public X m2(X f) {
            f.p2();
            X q = this.m3(f);
            return q;
        }
        public X m3(X g) {
            g.p3();
            return g;
        }
        public X m4() {
            this.fld.p5();
            return this.fld;
        }
        private X fld = new X();
    }

Assuming that no method of class X calls any other method, and that method m was chosen as the
starting point for the analysis of a sequence diagram, the generated diagram is shown in Fig. 1. The
opt part marked in Fig. 1 represents the optional behavior under a certain condition. It can be
generated by statically analyzing the conditional structure in the source code. For the purpose of
calculating function points, such structures need not be distinguished, since the function point
values in different options are summed up to give the final function point value.

Fig. 1 Converted sequence diagram (source: [13]).
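The way a message sequence follows from a call graph can be sketched in Java for the example class A. This is only an illustration under stated assumptions, not the authors' tool (which is written in Ruby): the call graph is written out by hand rather than derived by points-to analysis, the names MessageTrace, CALLS, and trace are hypothetical, the constructor call new X() is assumed to count as a message, and the conditional in m is ignored so that the optional branch is simply included. There is no guard against recursive calls, which the example does not contain.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class MessageTrace {
    // Hand-written call graph of the example: each method maps to the invocations
    // in its body, in source order. Methods of X call nothing (assumed in the text).
    static final Map<String, List<String>> CALLS = Map.of(
            "A.m",  List.of("X.p1", "A.m2", "X.p4", "A.m4", "X.p6",
                            "X.<init>", "X.p7", "X.p8"),
            "A.m2", List.of("X.p2", "A.m3"),
            "A.m3", List.of("X.p3"),
            "A.m4", List.of("X.p5"));

    /** Returns the messages of the sequence diagram that starts with the given method. */
    static List<String> trace(String start) {
        List<String> messages = new ArrayList<>();
        visit(start, messages);
        return messages;
    }

    // Depth-first walk: record the invocation, then everything it invokes in turn.
    private static void visit(String method, List<String> out) {
        out.add(method);
        for (String callee : CALLS.getOrDefault(method, List.of())) {
            visit(callee, out);
        }
    }

    public static void main(String[] args) {
        List<String> messages = trace("A.m");
        System.out.println(messages);
        System.out.println(messages.size() + " messages");
    }
}
```

Starting from A.m, the walk records 13 invocations: the starting message m plus twelve further messages, the branch guarded by b > 0 included. Return arrows are never recorded because only invocations appear in the call graph.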


The sequence diagram in Fig. 1 is not entirely correct. On closer examination, one finds that
objects a, f, g, and c are in fact the same, and objects fld and d can also be merged into one
object. From the viewpoint of function point computation, however, merging or not makes no
difference to the result, because the only concern is the number of messages transmitted in the
sequence diagram; which object receives a message does not matter. Therefore, there is no need for
complex analysis to merge different object names. The analysis simply uses the names of objects in
the source code directly, as this example shows, to compute the function point size.

3.2 Calculating Function Point Values
The calculation of function point values from a sequence diagram is based on that proposed by Jenner
[10]. Arrows directed from actors to interface objects correspond to COSMIC-FFP entries, while
arrows from interface objects to actors correspond to COSMIC-FFP exits. Arrows between objects in
the system correspond to reads or writes, but arrows representing return values are not considered.

Let us take the sequence diagram in Fig. 1 to illustrate. There is one entry and twelve reads or
writes. Therefore, the total COSMIC-FFP size is 13 Cfsu (COSMIC functional size units). That is,
disregarding the different categories of functional sub-process, the total COSMIC-FFP size is the
number of messages in the sequence diagram (except return messages).

Finally, a regression analysis is performed on the calculated function point values and the
corresponding lines of source code from various sequence diagrams. The result of the analysis
reveals approximately how many LOC correspond to one unit of COSMIC-FFP and the possible deviation
from such an approximation.

4 Design and Implementation
The system architecture of the tool is shown in Fig. 2. The function point analysis tool is
implemented in Ruby for its strong regular expression processing capability. Currently, only systems
implemented in Java can be analyzed by our tool.

Fig. 2 System architecture.

The Java source code is first analyzed by the SFIM module to generate profiles for all methods. The
profile of a method includes its name, the return type, argument types, lines of code of its body
(without comments), as well as the class and package the method belongs to. Then, the engineer
selects one of the functional processes (system functionalities, if you will) and inputs the
starting method and the associated class to the function point analysis tool. The function point
analysis module then starts counting all the function points in the functional process. While the
function points are counted, the LOC of each called method is also summed up by consulting the
profile of the method. The analysis result consists of the total function points in the process and
the total LOC in the process.

There is, however, a complication in the function point analysis that results from the polymorphism
feature in Java. That is, for polymorphic methods, the called method cannot be determined
statically. It depends on the instantiation of the object at run time. To deal with polymorphic
methods, we sum up the function points of all these polymorphic methods, since polymorphism is
actually a selection mechanism that invokes an appropriate method based on the class type of the
current object.

After sufficient analysis results are collected, the RAM module performs statistical analysis to
build a regression model. The function point values serve as the independent variable in the
regression analysis, and the value of LOC is the dependent variable. The model represents a
regression model of software size based on COSMIC-FFP.

5 Case Studies
The first case study is a complexity analysis tool from another project in our laboratory. The data
for the regression analysis consist of 13 records of functional processes in the tool, which were
collected by the process described in the previous section. Next, we perform regression analysis on
these values. The intercept constant is set to zero since no function point means no LOC.

The F test examines the validity of the overall regression model with the Analysis of Variance
(ANOVA). If the P-value of the F test is less than a designated significance level, usually 0.05,
then there is a linear relationship between the independent variables and the dependent variable. On
the other
hand, if the P-value is greater than or equal to 0.05, the linear regression relationship is not
supported.

The regression model is shown in Table 1. The P-value of the F test is 9.34E-05, less than 0.05. So,
the linear relationship of the model is established. The coefficient of determination, which
indicates the goodness of fit for the model, is 0.7480. The slope coefficient is 39.08, which means
that a function point could be estimated to be equal to 39 LOC for the first case.

Table 1 The First Regression Model.
             coefficient   standard error   t test   P-value
intercept    0             #N/A             #N/A     #N/A
X1           39.08         6.55             5.97     6.52E-05

The second case study is an open-source online chat system from SourceForge. We are not familiar
with the system, so we randomly select 24 methods as the starting methods to perform the analysis.

The regression model is shown in Table 2. The P-value of the F test is 1.32E-28, less than 0.05.
Again, the linear relationship is established. The coefficient of determination is 0.9964, and a
function point is estimated to be around 24 LOC.

Table 2 The Second Regression Model.
             coefficient   standard error   t test   P-value
intercept    0             #N/A             #N/A     #N/A
X1           23.75         0.30             79.91    1.26E-29

The results of the two case studies suggest that different kinds of applications can significantly
influence the residuals between actual and estimated software code size. The software in the first
case study involves complex decisions and computations. In such cases, it is suggested to count
additional function points. Our results confirm the necessity of this adjustment.

However, there is no precise instruction on how to calculate additional function points [18].
Therefore, our static analysis fails to account for this part, and the goodness of fit of the
estimated values reaches only about 75%. On the other hand, there are no complex decisions or
computations in the second system, and the goodness of fit of the estimated values reaches a high
value of 99.6%.

The correlation between COSMIC function points and actual software code size is noted by Lind and
Heldal [3]. They develop a linear model to estimate software code size from function points.
However, their function point calculation is based on component diagrams, which cannot convey
behavior specification as sequence diagrams do, and thus fails to capture function points inside a
component. As a result, the number of LOC for a function point, 148, is much higher in their case.
However, the subject of their study is electronic control units of a distributed network in a
vehicle, which may be more isolated and simpler in behavior.

They also find that different development teams affect software code size, by investigating another
development team that develops software components of different electronic control units. Although
there is also a high correlation for the second team, it is also clear that the linear model
developed for one development team will not perform well for the other development team, just as in
our case.

6 Conclusions
Among various function point analysis methods, COSMIC-FFP is an approach more suitable for
object-oriented systems, and its estimation is simple and straightforward. However, it is based on
outcomes from requirement analysis, which may be incomplete or insufficiently refined, making
postmortem analysis difficult.

We develop a tool that applies reverse engineering techniques to the source code of object-oriented
systems to retrieve information at the level of sequence diagrams. The tool then calculates the
function points from the recovered information based on specific rules. Then, function points and
the corresponding numbers of LOC are analyzed to produce regression models. Two case studies are
presented to verify the effectiveness of our method. The results show that different kinds of
applications can influence the accuracy of the COSMIC-FFP method significantly. For systems without
complex decisions and computations, the goodness of fit for the model reaches a very high value, and
such information can be consulted to adjust the estimation when applied to a new but similar system.
The estimation of systems that involve complex decisions and computations may suffer from the lack
of a mechanism that adjusts the weighting of function points. In the future, we plan to establish a
software size estimation model combining COSMIC-FFP function points with some complexity factors.

Acknowledgements
This work was supported in part by the R.O.C. Ministry of Science and Technology under Grant MOST
103-2221-E-017-007.


References:
[1] Fenton, N., 1994. Software measurement: A necessary scientific basis. Software Engineering,
    IEEE Transactions on, 20(3), 199-206.
[2] Jeng, B., Yeh, D., Wang, D., Chu, S. L., Chen, C. M., 2011. A Specific Effort Estimation Method
    Using Function Point. Journal of Information Science and Engineering, 27(4), 1363-1376.
[3] Lind, K., Heldal, R., 2009. Estimation of real-time software code size using COSMIC FSM. In
    Object/Component/Service-Oriented Real-Time Distributed Computing, 2009. ISORC'09. IEEE
    International Symposium on (pp. 244-248). IEEE.
[4] Bévo, V., Lévesque, G., Abran, A., 1999. UML notation for functional size measurement method.
    In Proc. 9th International Workshop on Software Measurement, Canada (pp. 230-242).
[5] Karner, G., 1993. Resource estimation for objectory projects. Objective Systems SF AB, 17.
[6] Clemmons, R. K., 2006. Project estimation with use case points. The Journal of Defense Software
    Engineering, 18-22.
[7] Bertolami, M. A., Oliveros, A., 2005. Estimate of the functional size in the requirements
    elicitation. Journal of Computer Science & Technology, 5.
[8] Condori-Fernández, N., Abrahão, S., Pastor, O., 2007. On the estimation of the functional size
    of software from requirements specifications. Journal of Computer Science and Technology,
    22(3), 358-370.
[9] Habela, P., Głowacki, E., Serafinski, T., Subieta, K. 2011. 3.4 Adapting the Use Case Model for
    COSMIC FFP-Based Measurement. COSMIC Function Points: Theory and Advanced Practices, 204.
[10] Jenner, M. S., 2001. COSMIC-FFP and UML: Estimation of the Size of a System Specified in
    UML–Problems of Granularity. In Proc. the Fourth European Conference on Software Measurement
    and ICT Control (pp. 173-184).
[11] Tavares, H., Carvalho, A., Castro, J., 2002. Function points measurement from requirement
    specification. In Proc. 5th Workshop Engineering Requirements, Valency, Spain (pp. 278-298).
[12] Top, O. O., Demirors, O., Ozkan, B., 2009. Reliability of COSMIC functional size measurement
    results: A multiple case study on industry cases. In Software Engineering and Advanced
    Applications, 2009. SEAA'09. 35th Euromicro Conference on (pp. 327-334). IEEE.
[13] Yeh, D., Sun, P. C., Chu, W., Lin, C. L., Yang, H., 2007. An empirical study of a reverse
    engineering method for the aggregation relationship based on operation propagation. Empirical
    Software Engineering, 12(6), 575-592.
[14] Lo, D., Maoz, S., Khoo, S. C., 2007. Mining modal scenario-based specifications from execution
    traces of reactive systems. In Proceedings of the twenty-second IEEE/ACM international
    conference on Automated software engineering (pp. 465-468). ACM.
[15] Ziadi, T., Da Silva, M. A. A., Hillah, L. M., Ziane, M., 2011. A fully dynamic approach to the
    reverse engineering of UML sequence diagrams. In Engineering of Complex Computer Systems
    (ICECCS), 2011 16th IEEE International Conference on (pp. 107-116). IEEE.
[16] Rountev, A., Milanova, A., Ryder, B. G., 2001. Points-to analysis for Java using annotated
    constraints. In ACM SIGPLAN Notices (Vol. 36, No. 11, pp. 43-55). ACM.
[17] Rountev, A., Connell, B. H., 2005. Object naming analysis for reverse-engineered sequence
    diagrams. In Proceedings of the 27th international conference on Software engineering (pp.
    254-263). ACM.
[18] Dowming Yeh, Yi-Hong Chen, Chih-Ying Yang, Li-Wei Chen, Ying-Hsiu Wang, and Kai-Wei Chen, 2014.
    Applying reverse engineering and complexity analysis to refine a cost estimation model based on
    function point. International Computer Symposium 2014, Taichung, Taiwan.
