
SPECIAL SECTION ON NEW TRENDS IN BRAIN SIGNAL PROCESSING AND ANALYSIS

Received January 8, 2019, accepted January 14, 2019, date of current version February 20, 2019.
Digital Object Identifier 10.1109/ACCESS.2019.2894871

A Particle Swarm Optimized Learning Model of Fault Classification in Web-Apps
DEEPAK KUMAR JAIN¹, AKSHI KUMAR², SAURABH RAJ SANGWAN², GIA NHU NGUYEN³, AND PRAYAG TIWARI⁴
¹Key Laboratory of Intelligent Air-Ground Cooperative Control for Universities in Chongqing, College of Automation, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
²Department of Computer Science and Engineering, Delhi Technological University, New Delhi 110042, India
³Graduate School, Duy Tan University, Danang 550000, Vietnam
⁴Department of Information Engineering, University of Padova, Padua, Italy

Corresponding author: Gia Nhu Nguyen ([email protected])


This work was supported in part by the Key Laboratory of Intelligent Air-Ground Cooperative Control for Universities in Chongqing and
the Key Laboratory of Industrial IoT and Networked Control, Ministry of Education, College of Automation, Chongqing University of
Posts and Telecommunications, Chongqing, China.

ABSTRACT The term web-app defines the current dynamic pragmatics of the website, where the user has
control. Finding faults in such dynamic content is challenging, because whether a fault is exposed or not
depends on its execution path. Moreover, the complexity and uniqueness of each web application make
fault assessment an extremely laborious and expensive task. Also, artificial fault injection models are
run in controlled and simulated environments, which may not be representative of the real-world fault
data. Classifying faults can intelligently enhance the quality of the web-apps by the assessment of
probable faults. In this paper, an empirical study is conducted to classify faults in bug reports of three
open-source web-apps (qaManager, bitWeaver, and WebCalendar) and reviews of two play store web-apps
(Dineout: Reserve a Table and Wynk Music). Five supervised learning algorithms (naïve Bayesian, decision
tree, support vector machines, K-nearest neighbor, and multi-layer perceptron) were first evaluated
based on the conventional term frequency–inverse document frequency (tf-idf) feature extraction method,
and subsequently, a feature selection method to improve classifier performance is proposed using particle
swarm optimization (a nature-inspired, meta-heuristic algorithm). This paper is a preliminary exploratory
study to build an automated tool, which can optimally categorize faults. The empirical analysis validates
that particle swarm optimization for feature selection in the fault classification task outperforms the tf-idf
filter-based classifiers with an average accuracy gain of about 11% and nearly 26% average feature reduction.
The highest accuracy of 93.35% is shown by the decision tree after feature selection.

INDEX TERMS Classification, fault, feature selection, particle swarm, web-apps.

2169-3536 © 2019 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
VOLUME 7, 2019

I. INTRODUCTION
With the increasing size of the indexed Web, superior technology and optimal browser performance, the Web has been transformed from an anachronistic static content repository into a turbulent, interactive, responsive content space. Websites now rely on programmatic user input and data processing. The term web-based application, or simply web-app [1], defines the current dynamic pragmatics of the website, where the user has control. Technologically, current-generation websites are more like web-based software: they store data in, or interact with, a database on the back end, and they process business logic and information in a more convoluted way. They have a web interface, but web development here is not limited to developing an alluring interface; it means creating web-based software. Web-based software development thus consists of three primary ingredients, namely the development of websites, web application development and the development of web services [2]. “Agile development practices”, “big data”, “security”, “open source” and “customer-first design” are some of the key terms that characterize the latest technology trends in software development. A typical web application development workflow is similar to conventional software development (Figure 1). It involves five phases: (i) brainstorm: requirement analysis; (ii) design: design document and prototype; (iii) development: iterations, demo and feedback;
(iv) quality assurance: identify defects and resolve bugs; and (v) deployment: production and technical support.

FIGURE 1. Web-app development workflow.

Web quality is defined as the degree to which web-based software meets the specified requirements, is accessible, provides reliable information and meets user needs and expectations [2]. Quality assurance (QA), popularly known as QA testing, is a systematic process of determining whether a product or service meets specified requirements and customer expectations. This process-driven approach is key to ensuring the performance and reliability of the product. QA's primary goal is to track and resolve deficiencies before product release, and to do so proactively; that is, the purpose of QA is to prevent defects from entering the system. Thus, quality assessment models in this new development setting call for a set of acceptance criteria that define the accessibility and usability of web-apps, demonstrating their effectiveness in terms of user experience. The acceptance of websites/apps by end-users depends on a variety of factors. The most measurable aspect of software quality is the number of faults, or bugs, discovered in a software product; it is an absolute quantifier of quality. Formally, a fault is defined as an “incorrect step, process, or data definition in a computer program” [3]. For example, attributes such as dead links or browser compatibility [2] are direct indicators of faults and of compromised quality.

The source code and bug reports of open-source projects are readily available but are poorly structured and unlabeled. Similarly, the Google and Apple app stores allow users to give feedback in the form of reviews. An app review typically includes a star rating followed by a comment. Negative opinion polarity within these reviews is a strong indicator of a fault/drawback/shortcoming. For example, the comment “The link for few songs are dead... play now and download both not working for them!!! The text is not readable as the font is too small. Moreover, the premium version is a waste of money as compared to....” clearly conveys the opinion of the reviewer (negative in this case) and the aspects of the negative opinion. These reviews are a vital source of information that app developers and vendors can use for debugging and version control. But here too, most reviews are unstructured, and mining useful analytical information from them requires great effort. Proactively finding categories of faults in such open-source projects and real-time data, and doing so manually, is arduous and expensive. Researchers have recognized the limits of manual fault classification and have investigated automation solutions. Empirical fault detection is one such promising research direction, and it relies on fault classification as an imperative prerequisite task [4]. An automated tool that classifies faults into pre-defined categories can assist in fault-based testing of web-apps. The primary purpose of any such tool is to prevent faults and to find as many faults as possible, as early as possible.

The vital sub-task of the fault classification process is feature extraction, which converts the input data (unstructured textual data indicative of faults) into an array of representative features. Feature extraction is commonly done using intrinsic “filtering” methods: fast, classifier-independent methods that rank features according to predetermined numerical functions measuring the “importance” of the terms. A variety of scoring functions, such as tf-idf, chi-square, mutual information, information gain and cross-entropy, have been used as statistical measures to pick the features with the highest scores [5]. Further, past literature confirms that optimal feature selection [6] improves classifier performance (in terms of speed, predictive power and simplicity of the model), reduces dimensionality, removes noise and helps visualize the data for model selection. In feature selection, the features are kept intact and the n best features are chosen among them, removing redundant and collinear features. This sub-task of selecting the relevant subset of features and discarding the non-essential ones is computationally challenging and expensive. Population-based meta-heuristics, especially nature-inspired ones, have been proposed in prominent literature as wrapper methods for feature selection, selecting the best possible subset of features for a given model.

Swarm intelligence (SI) algorithms are contemporary computational and behavioral metaphors for solving search and optimization problems; they take the collective biological patterns of social insects (ants, termites, bees, wasps, moths etc.) and other animal societies (fish, birds, grey wolves etc.) as a stimulus for modeling algorithmic solutions. Several algorithms inspired by natural phenomena have been proposed in past years, and among them, some population-based meta-heuristic search algorithms have shown satisfactory capability to handle high-dimensional optimization problems. The work presented in this paper follows this research trend, which builds on the adaptive learning and collective intelligence behavioral models of swarm-based algorithms. Swarm-based feature optimization using particle swarm optimization (PSO) is demonstrated to
cater to the dimensionality, complexity and fuzziness of unstructured data.

The research presented in this paper is an empirical study that puts forward an optimized learning model to reduce feature dimensionality in order to optimize fault prediction in real-time web-apps. Bug reports of three open-source web-app projects (qaManager¹, bitWeaver², WebCalendar³) and reviews from two app-store applications (Dineout: Reserve a Table⁴, Wynk Music⁵) have been considered to intelligently mine faults, which are annotated using the seven categories given by Sampath et al. [7]. A comparative analysis of five supervised learning algorithms, namely naïve Bayesian (NB), decision tree (DT), support vector machine (SVM), K-nearest neighbor (K-NN) and multi-layer perceptron (MLP), is carried out to find the best predictive classifier. The contributions of this research toward an optimal fault prediction model are as follows:
• Implement five supervised learning algorithms to predict faults using tf-idf feature extraction: NB, DT, SVM, K-NN, MLP
• Implement the five supervised learning algorithms using a feature selection method: tf-idf + PSO
• Performance analysis on the basis of accuracy

¹ qaManager: https://sourceforge.net/projects/qamanager/
² bitWeaver: https://sourceforge.net/projects/bitweaver/
³ WebCalendar: https://sourceforge.net/projects/webcalendar/
⁴ Dineout App: https://play.google.com/store/apps/details?id=com.dineout.book&hl=en_IN
⁵ Wynk Music App: https://itunes.apple.com/lk/app/wynk-music/id845083955?mt=8

The predictive model will give testing practitioners insights for seeding similar types of faults in fault-based testing of web-apps. This study does not compare various feature selection algorithms; rather, it demonstrates the benefit of adding a feature selection optimization process to the fault classification task to enhance classifier accuracy.

The organization of the paper is as follows: Section II discusses related work in the domain. Section III describes the system architecture along with the details of the proposed optimal predictive learning model using particle swarm optimization. Section IV discusses the results and, finally, Section V presents the conclusion of the empirical study.

II. RELATED WORK
Few pertinent studies have been conducted in the area of fault classification in web applications. Sampath et al. [7] conducted fault detection experiments by seeding realistic faults, and classified them into five categories, namely data store faults, logic faults, form faults, appearance faults and link faults. Guo and Sampath [4] carried out an exploratory study on web application fault classification using the induction method, in which they state the classification dimension, and used the work in [7] as the baseline. They used the actual location of the fault as a dimension for classification, taking into consideration presentation, logic and data store faults. They defined a new fault type, known as compatibility fault, refined the classification of logic faults into browser interaction faults, session faults, paging faults, server-side faults, encoding-decoding faults, locale faults and others, and calculated the frequency of faults in Roller Weblogger and qaManager. Elbaum et al. [8] seeded three types of faults, namely scripting, form and database query faults, to evaluate the performance of web testing techniques. Li and Tian [9] adapted the orthogonal defect classification (ODC) approach, covering problems related to timing, interfaces or algorithms. Kumar et al. [10] used supervised machine learning techniques on three open-source web applications. They used the area under the ROC curve to analyze and compare the performance of various machine learning techniques and concluded that multinomial naïve Bayesian gave the best results. A review of the application of a computational evolutionary method, the genetic algorithm, to automatically search for software errors was given by Mantere and Alander [11].

III. SYSTEM ARCHITECTURE
The longer a fault goes undetected, the more expensive it is to repair. Fault classification is intended to offer feedback about the web development process. On this basis, this research puts forward a fault prediction model using optimal feature selection. A variety of supervised learning algorithms are evaluated to find the best model for fault classification. The goal of using classification schemes is to devise a preventive strategy for finding as many faults as possible, as early as possible.

The preliminary step is to gather the required data: bug reports of open-source web-apps and reviews of app-store apps. Pre-processing is then done to clean the dataset of noise. Noise here basically connotes the language irregularities often present in text, since such noisy, unstructured data affects the quality of the fault classification task. After pre-processing, the representative features are extracted using the tf-idf filter. Particle swarm optimization (PSO) is then applied to the resulting matrix to generate an optimal feature matrix. These optimal features are then used to train the classifier. Figure 2 shows the systematic flow of the model. The following sub-sections expound the details.

A. DATASET ACQUISITION
To evaluate the system, two types of datasets have been considered. Firstly, three open-source web-app projects, namely qaManager, bitWeaver and WebCalendar, have been considered. Their bug reports have been acquired from sourceforge.net. A bug is evidence of a fault in the program, so bug reports are an obvious choice for fault diagnosis. We deliberately consider Java/PHP-based applications, as both languages are the most popular and widely used for web development. The applications with a larger number of closed bugs were considered, since
comments given in the closed bug section facilitated easy bug identification. The total number of bugs was also a significant selection criterion.

FIGURE 2. System Architecture.

Secondly, reviews of two app-store apps (one from Google Play and one from the Apple App Store), namely Dineout: Reserve a Table and Wynk Music, were acquired. The Dineout app had 1135 reviews, whereas the Wynk Music app had 8470 reviews. The reviews typically include a star rating followed by a comment. Negative opinion polarity within these reviews is a strong indicator of a fault/drawback/shortcoming. These reviews are a vital source of information that app developers and vendors can use for debugging and version control. These reviews are therefore analyzed for negative opinion polarity using the AFINN-111 sentiment lexicon, a list of 2477 English words labeled with sentiment strength [12]. Sentiment refers to the use of polarities (positive and/or negative) in written text [13]–[16]. Each word is assigned an integer polarity score ranging from −5 (negative) to +5 (positive).
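
The negative-review filter described above can be sketched as follows. This is a minimal illustration, assuming a simple sum-of-word-scores rule and a threshold of zero; the paper does not state its aggregation rule or cut-off. The lexicon entries below are hypothetical placeholders in the AFINN format (a word mapped to an integer in [−5, +5]), not values copied from AFINN-111, and the names `LEXICON`, `polarity` and `negative_reviews` are illustrative.

```python
import re

# Hypothetical lexicon in the AFINN format: word -> integer polarity in [-5, +5].
# These entries and scores are placeholders, NOT taken from AFINN-111.
LEXICON = {
    "dead": -3,
    "waste": -2,
    "broken": -2,
    "crash": -2,
    "slow": -1,
    "good": 3,
    "great": 3,
    "love": 3,
}


def polarity(review: str, lexicon: dict[str, int] = LEXICON) -> int:
    """Sum the lexicon scores of all tokens in a review; unknown words score 0."""
    tokens = re.findall(r"[a-z']+", review.lower())
    return sum(lexicon.get(tok, 0) for tok in tokens)


def negative_reviews(reviews: list[str], lexicon: dict[str, int] = LEXICON) -> list[str]:
    """Keep only the reviews whose summed polarity is below zero (assumed cut-off)."""
    return [r for r in reviews if polarity(r, lexicon) < 0]


reviews = [
    "The link for few songs are dead... premium version is a waste of money",
    "Great app, love the playlists",
    "Booking works fine",
]
print(negative_reviews(reviews))  # only the first review has a negative total
```

Loading the real AFINN-111 list would only change how `LEXICON` is populated; any multi-word entries it contains would need phrase matching, which this token-level sketch does not do.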

The AFINN-111 list also includes a number of words frequently used on the Internet, such as LOL (Laughing Out Loud), which indicate users' emotions, especially on social media portals. The negative reviews were then analyzed for faults.

Five different web-apps were deliberately considered to determine the robustness of the learning model. To analyze the faults within the applications' bug reports and negative reviews, the bugs were scraped using Google Web Scraper⁶ and the reports were annotated for faults from the seven categories given by Sampath et al. [7]. The categorization was based on the physical location of the fault: data store, form, logic, link, appearance, compatibility and others (Figure 3).

FIGURE 3. Fault categories.

Table 1 gives the number of faults in each of the five web-apps.

TABLE 1. No. of faults in Web-Apps.

B. PRE-PROCESSING
To extract structured text for analytics from the unstructured bug reports/reviews, pre-processing is done [17]. The bag-of-words model [18] is used to transform and represent the text; it takes into account the words and their frequency of occurrence in the sentence or document. The data is first tokenized to identify tokens and then cleaned by removing punctuation, numbers, special characters and stop words. Stemming is also carried out to reduce words to their root form. The University of Glasgow stop-word list⁷ and the Porter stemming algorithm [19] are used for stop-word removal and stemming, respectively.

⁶ Google Web Scraper: https://chrome.google.com/webstore/detail/webscraper
⁷ University of Glasgow stop-word list: http://ir.dcs.gla.ac.uk/resources/linguistic_utils/

C. FEATURE EXTRACTION AND SELECTION
In this work, tf-idf is used as a filter method, whereas PSO is used for optimal feature subset selection. These are briefly described as follows.

The conventional term frequency–inverse document frequency (tf-idf) method is used for calculating the weights and extracting the features. “Term frequency - inverse document frequency is a conventional statistical weight which measures how important a word is to a document” [20]. Moreover, it captures how relevant the keyword is throughout the corpus [21], [22].

The term frequency tf(t, d) is the count of term t in document d, normalized by the length of the document, and is calculated using equation (1):

$$\mathrm{tf}(t,d) = \frac{\text{no. of times term } t \text{ appears in document } d}{\text{total no. of terms in document } d} \qquad (1)$$

The inverse document frequency idf(t, D) measures how much information a term provides, i.e., whether the term is rare or common across the corpus. It is calculated using equation (2):

$$\mathrm{idf}(t,D) = \log\frac{\text{total no. of documents}}{\text{no. of documents containing term } t} \qquad (2)$$

Thus, tf-idf is calculated as given in equation (3):

$$\text{tf-idf}(t,d,D) = \mathrm{tf}(t,d)\cdot\mathrm{idf}(t,D) \qquad (3)$$

where t denotes a term, d denotes a document and D denotes the collection of documents.

Feature selection is done to reduce the size of the problem for learning algorithms, which may improve classification accuracy and reduces the computational requirement. It also speeds up the classification task, as the size of the data used to train the classifier is reduced [23], [24]. Many feature selection algorithms are available in the pertinent literature. Swarm intelligence (SI) algorithms have proven computational-intelligence capabilities for dealing with complex real-world problems [6]. They often take guidance from nature to search for the optimal solution. They are a class of nature-inspired, population-based meta-heuristics, based on the hunting, breeding and similar behaviors of birds, insects and the like. Ant colony optimization, cuckoo search and particle swarm optimization are examples of popular swarm intelligence algorithms.
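
A minimal pure-Python sketch of the pre-processing steps and of equations (1)–(3) follows. It is not the authors' code: the stop-word set is a small hypothetical stand-in for the University of Glasgow list, stemming is skipped (the paper used the Porter stemmer, which NLTK provides as `nltk.stem.PorterStemmer`), the logarithm in equation (2) is assumed to be natural, and the function names and sample reports are illustrative.

```python
import math
import re

# Small hypothetical stand-in for the University of Glasgow stop-word list.
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to"}


def preprocess(text: str) -> list[str]:
    """Lowercase, keep alphabetic tokens only (this drops punctuation, numbers and
    special characters) and remove stop words. Stemming is deliberately omitted."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def tf(term: str, doc: list[str]) -> float:
    """Equation (1): occurrences of `term` in `doc` divided by the number of terms in `doc`."""
    return doc.count(term) / len(doc) if doc else 0.0


def idf(term: str, corpus: list[list[str]]) -> float:
    """Equation (2): log(total documents / documents containing `term`)."""
    df = sum(1 for doc in corpus if term in doc)
    return math.log(len(corpus) / df)


def tfidf_matrix(corpus: list[list[str]]) -> tuple[list[str], list[list[float]]]:
    """Equation (3) for every document/term pair: one row per document, one column per term."""
    vocab = sorted({t for doc in corpus for t in doc})  # every term occurs at least once, so df >= 1
    weights = {t: idf(t, corpus) for t in vocab}        # idf depends only on the term
    rows = [[tf(t, doc) * weights[t] for t in vocab] for doc in corpus]
    return vocab, rows


# Hypothetical bug-report titles, for illustration only.
reports = [
    "Login form crashes on submit",
    "Broken link on the home page",
    "Form validation error on the login page",
]
docs = [preprocess(r) for r in reports]
vocab, X = tfidf_matrix(docs)
print(vocab)
print([round(v, 3) for v in X[0]])
```

Library implementations often differ from equations (1)–(3): scikit-learn's `TfidfVectorizer`, for instance, uses raw counts as tf, a smoothed idf and L2 row normalization by default, so its weights will not match this sketch number for number.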

Motivated by the adaptive learning and collective intelligence behaviors of SI algorithms, this work demonstrates the use of particle swarm optimization (PSO) for enhancing fault classification accuracy. PSO is an evolutionary computation algorithm that aims to find a globally optimized solution. It was developed in 1995 by Kennedy and Eberhart [25], [26], based on imitating the social behaviors and random movements of a flock of birds or a school of fish. The original version of PSO was proposed by modifying these initial imitations. Later, the inertia weight was introduced by Shi and Eberhart [27] to give the standard PSO algorithm. Initially, the algorithm is seeded with a population of random solutions, called “particles”. Each particle is a point in an S-dimensional search space. The ith particle is represented as X_i = (X_i1, X_i2, X_i3, ..., X_iS). Every particle has a memory to store its own best position, denoted by “pbest” (personal best). The best experience of particle i at time t is denoted by P_i(t). The best previous position (pbest, the position giving the best fitness value) of any particle is represented as P_i = (P_i1, P_i2, P_i3, ..., P_iS) and is recorded by the particle in its memory. The index of the particle with the best position among all particles in the population is represented by the symbol “gbest” (global best). At any moment t, the global best experience of the swarm is denoted by g(t). So, at any point in time, there is a pbest for every particle and a gbest for the complete swarm. The concept of a flying particle is illustrated in Figure 4.

FIGURE 4. The concept of flying particle.

In each iteration, the velocity and position of every particle are updated according to the standard (inertia-weight) PSO model:

$$V_{id}(t+1) = w\,V_{id}(t) + r_1 c_1\left(P_{id}(t) - X_{id}(t)\right) + r_2 c_2\left(g_d(t) - X_{id}(t)\right)$$

$$X_{id}(t+1) = X_{id}(t) + V_{id}(t+1)$$

Some considerations are necessary to fulfill the equations of the standard PSO model. These are as follows:
• r1 and r2 are random numbers drawn uniformly from [0, 1], i.e., r1, r2 ∈ [0, 1].
• d = 1, 2, 3, ..., S is the dimension.
• The term w·V_id(t) is the inertia term, and the coefficient w is the inertia coefficient or inertia weight. This term gives the particles a memory capability for exploring new positions in the search space while flying. The original version of PSO proposed in 1995 [25], [26] did not have an inertia term; in 1998, Shi and Eberhart [27] added this term to formulate the standard PSO model.
• c1 and c2 are the acceleration coefficients.
• The term r1·c1·(P_id(t) − X_id(t)) is called the cognitive component; it represents the private thinking of an individual particle.
• The term r2·c2·(g_d(t) − X_id(t)) is called the social component; it represents the collaboration among the particles.
• The cognitive and social components pull each particle towards its pbest and the gbest positions, respectively.
• The three components (inertia term, cognitive component and social component) are combined to create a new velocity vector V_id(t + 1).
• This new velocity vector moves the particle to a new position X_id(t + 1) in the search space, which is probably a better location for particle i.

In this way, all particles cooperate to find the best solution to an optimization problem, and every particle in the swarm obeys these rules. After a particle flies to a new position, its performance is measured according to a pre-defined fitness function. A maximum velocity, denoted by Vmax, is used to limit a particle's velocity in each dimension; it therefore determines the length of the steps a particle takes through the solution space in each iteration. A small Vmax may lead to little exploration beyond locally good regions, so the algorithm moves towards the target slowly and particles can become trapped in local optima. With a larger Vmax, particles move faster towards the global optimum because they take bigger steps in each iteration, but under such circumstances they might fly past good solutions. The pseudo code for standard PSO is given in Figure 5.

D. SUPERVISED LEARNING ALGORITHMS
Five supervised learning algorithms, namely naïve Bayesian, decision tree, support vector machine, K-nearest neighbors and multi-layer perceptron (neural network), have been implemented and analyzed to find the best predictive learning model for fault classification. The objective is to analyze the bug report data and negative reviews and classify them into the seven pre-defined categories given by Sampath et al. [7]. Table 2 describes these fault types in application code.

A total of 4577 bugs were scrutinized from the five web-apps, as given in Table 1. Descriptions of the classification algorithms used in this work are available in the machine learning literature [28], [29]. The training:test data split was 80:20, with 10-fold cross-validation.

Python 2.7 was used to implement the work. The experiments used the open-source Python library Natural Language Toolkit (NLTK)⁸ for the natural language processing tasks. The open-source libraries TensorFlow and Keras were used to build the neural network classifier. The other classifiers were implemented with the scikit-learn library.

⁸ Natural Language Toolkit: https://www.nltk.org
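
The paper does not spell out how PSO, whose update is defined over continuous positions, is used to pick a subset of the tf-idf features. One common option, assumed here, is the binary PSO variant: velocities follow the inertia-weight update with Vmax clamping, and each position bit is resampled as 1 with probability sigmoid(velocity), so every particle is a 0/1 mask over the features. In a wrapper setting the fitness would be a classifier's accuracy on the selected columns; the demo below uses a hypothetical toy fitness instead. The iteration count (100) and c1 = c2 = 0.35 follow the parameter setting in Section IV-A; w, Vmax, the swarm size and all names are assumptions.

```python
import math
import random
from typing import Callable, Sequence


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def binary_pso(
    n_features: int,
    fitness: Callable[[Sequence[int]], float],
    n_particles: int = 20,   # swarm size (assumed)
    iterations: int = 100,   # maximum iteration count from Section IV-A
    w: float = 0.7,          # inertia weight (assumed)
    c1: float = 0.35,        # cognitive coefficient, within the 0.3-0.4 range of Section IV-A
    c2: float = 0.35,        # social coefficient, within the 0.3-0.4 range of Section IV-A
    v_max: float = 4.0,      # velocity clamp Vmax (assumed)
    seed: int = 0,
) -> tuple[list[int], float, list[float]]:
    """Maximize `fitness` over 0/1 feature masks.

    Returns the global-best mask, its fitness, and the gbest fitness after each iteration.
    """
    rng = random.Random(seed)
    X = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_particles)]
    V = [[rng.uniform(-v_max, v_max) for _ in range(n_features)] for _ in range(n_particles)]
    pbest = [x[:] for x in X]
    pbest_fit = [fitness(x) for x in X]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    history = [gbest_fit]

    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * V[i][d]                              # inertia term
                     + r1 * c1 * (pbest[i][d] - X[i][d])      # cognitive component
                     + r2 * c2 * (gbest[d] - X[i][d]))        # social component
                V[i][d] = max(-v_max, min(v_max, v))          # clamp to [-Vmax, Vmax]
                # Binary position update: the bit becomes 1 with probability sigmoid(v).
                X[i][d] = 1 if rng.random() < sigmoid(V[i][d]) else 0
            f = fitness(X[i])
            if f > pbest_fit[i]:                              # new personal best
                pbest[i], pbest_fit[i] = X[i][:], f
                if f > gbest_fit:                             # new global best
                    gbest, gbest_fit = X[i][:], f
        history.append(gbest_fit)
    return gbest, gbest_fit, history


# Hypothetical toy fitness: reward the "informative" features 0, 2 and 5 and
# lightly penalize every other selected feature, so smaller subsets are preferred.
RELEVANT = {0, 2, 5}


def toy_fitness(mask: Sequence[int]) -> float:
    hits = sum(mask[j] for j in RELEVANT)
    return hits - 0.1 * (sum(mask) - hits)


best_mask, best_fit, hist = binary_pso(n_features=10, fitness=toy_fitness)
print(best_mask, best_fit)
```

Because each particle is a mask, `sum(best_mask)` gives the size of the selected feature subset, and the gbest history can only stay level or rise from one iteration to the next.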

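The 80:20 split with 10-fold cross-validation described for the experiments admits more than one reading; one plausible reading, assumed here, is a held-out 20% test set plus 10-fold cross-validation on the remaining 80%. The sketch below shows that protocol in plain Python with a small K-NN classifier (one of the five models) on hypothetical, well-separated 2-D points. The authors used scikit-learn; the strided fold assignment, k = 3 and all names here are illustrative.

```python
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Majority label among the k training rows closest to x (squared Euclidean distance)."""
    nearest = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train_X, train_y, test_X, test_y, k=3):
    """Fraction of test rows whose predicted label matches the true label."""
    hits = sum(knn_predict(train_X, train_y, x, k) == y for x, y in zip(test_X, test_y))
    return hits / len(test_y)


def take(seq, ids):
    return [seq[i] for i in ids]


def split_80_20(X, y, seed=0):
    """Shuffle indices once; the first 80% train, the remaining 20% form the held-out test set."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(0.8 * len(idx))
    return take(X, idx[:cut]), take(y, idx[:cut]), take(X, idx[cut:]), take(y, idx[cut:])


def k_fold_scores(X, y, folds=10, k=3):
    """Accuracy on each fold; fold f holds out rows f, f + folds, f + 2*folds, ..."""
    scores = []
    for f in range(folds):
        held_out = list(range(f, len(X), folds))
        kept = [i for i in range(len(X)) if i % folds != f]
        scores.append(accuracy(take(X, kept), take(y, kept),
                               take(X, held_out), take(y, held_out), k))
    return scores


# Hypothetical feature vectors for two fault categories (25 rows each).
X = [[0.01 * i, 0.0] for i in range(25)] + [[1.0 + 0.01 * i, 1.0] for i in range(25)]
y = ["link"] * 25 + ["form"] * 25

X_train, y_train, X_test, y_test = split_80_20(X, y)
cv_scores = k_fold_scores(X_train, y_train, folds=10)
print(sum(cv_scores) / len(cv_scores), accuracy(X_train, y_train, X_test, y_test))
```

With scikit-learn, the same protocol would typically use `train_test_split(X, y, test_size=0.2)` followed by `cross_val_score(model, X_train, y_train, cv=10)`.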
FIGURE 5. Pseudo code for PSO.

TABLE 2. Fault types in application code.

IV. RESULTS AND DISCUSSION
The empirical analysis is divided into four parts: (i) parameter setting for PSO, (ii) feature selection results, (iii) comparison of accuracy with and without feature selection, and (iv) reduction in the time for training the classifier and building the model.

A. PARAMETER SETTING FOR PSO
All the parameters were set to the best values as claimed and demonstrated by Aghdam and Heidari [30]. These values are given in Table 3. The maximum number of iterations and the fitness value were set to 100 and 0.95, respectively. The cognitive coefficient (c1) and the social coefficient (c2) were set between 0.3 and 0.4, since the sum of these two values (c1 + c2) is normally limited to 4, as given by Aghdam and Heidari [30].

TABLE 3. Parameters for PSO.

B. FEATURE SELECTION RESULTS
In the proposed work, 956 features were initially extracted using tf-idf. Feature selection was then carried out using PSO to obtain a reduced feature subset. Table 4 gives the number of features selected using tf-idf and tf-idf + PSO.

TABLE 4. Feature selection vs feature extraction.

The basic feature extraction based on the tf-idf filter used the same number of features, i.e., 956, for all classification algorithms. With PSO, the minimum number of selected features was 442, for K-NN, a 53.77% reduction. The maximum was 883 features, for both SVM and NN, only a 7.64% reduction. Figure 6 depicts the average feature reduction.

FIGURE 6. Feature reduction using PSO.

C. EFFECT OF FEATURE SELECTION ON CLASSIFICATION ACCURACY
This sub-section compares the classification algorithms used, based on performance accuracy
(misclassification error). It also demonstrates the advantage of the PSO-based feature selection technique over the conventional tf-idf technique. Table 5 gives the accuracy results.

TABLE 5. Accuracy obtained before and after optimization.
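
The percentages reported in this section and in the feature selection results are simple ratios. The helpers below, a sketch with illustrative names, reproduce the 53.77% (K-NN, 442 of 956 features) and 7.64% (SVM and NN, 883 of 956) reductions, and show how sensitivity and specificity can be computed for one fault class against the rest. The paper does not say how it aggregates these measures over its seven classes, so the one-vs-rest view and the toy labels are assumptions.

```python
def percent_reduction(baseline: int, selected: int) -> float:
    """Percentage of the baseline features removed by feature selection."""
    return 100.0 * (baseline - selected) / baseline


def one_vs_rest_counts(y_true: list[str], y_pred: list[str], cls: str) -> tuple[int, int, int, int]:
    """Return (TP, FN, TN, FP), treating `cls` as positive and every other label as negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    tn = len(pairs) - tp - fn - fp
    return tp, fn, tn, fp


def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


TFIDF_FEATURES = 956  # features extracted by tf-idf
print(round(percent_reduction(TFIDF_FEATURES, 442), 2))  # K-NN       -> 53.77
print(round(percent_reduction(TFIDF_FEATURES, 883), 2))  # SVM and NN -> 7.64

# Hypothetical labels for four bug reports, only to exercise the helpers.
y_true = ["link", "link", "form", "logic"]
y_pred = ["link", "form", "form", "link"]
counts = one_vs_rest_counts(y_true, y_pred, "link")
print(counts, sensitivity_specificity(*counts))
```

The reported accuracy gains (e.g., 25.633% for DT) are not recomputed here, because the text does not state whether a gain is an absolute difference in percentage points or a relative improvement.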

The results indicate that the maximum accuracy without optimization is achieved by SVM, i.e., 90.038%. The maximum accuracy gain was obtained by the decision tree (25.633%). Classification performance clearly improves with the PSO-selected feature subset: an average improvement of 10.74% is observed using PSO. Figure 7 shows the accuracy comparison graphically.

FIGURE 7. Comparison of accuracy with and without optimization.

The gain in accuracy is depicted graphically in Figure 8.

FIGURE 8. Accuracy Gain.

The performance of the training models is further evaluated using sensitivity (the true positive recognition rate) and specificity (the true negative recognition rate). Figure 9 depicts the result. DT has the highest true positive recognition rate, 98.9%.

FIGURE 9. Sensitivity and Specificity of training models.

D. REDUCTION IN TIME
The aim of feature selection and optimization methods is to reduce the computational time and complexity of the prediction model. After feature selection, the five baseline classifiers are trained with the reduced feature subset, and the time for building the models is compared in order to evaluate the reduction in time and the effectiveness of the feature selection approach. Figure 10 gives comparative results of the time for building the various classification models.

FIGURE 10. Reduction in model building time.

The maximum reduction in time, 0.02 seconds, is observed when building SVM, whereas all the other classification models show a reduction of 0.01 seconds.

V. CONCLUSION
The effectiveness of fault-based testing depends on the quality of the fault model and on analyzing faults in previous designs to predict and avert similar faults in future product designs. This research proposed a predictive learning model to classify

faults, which assists the assessment of probable faults, thus enhancing the quality of web-apps. An empirical study was conducted to classify faults in bug reports of three open-source web applications and reviews of two Play Store applications using five baseline classifiers. These baselines were initially trained using the conventional tf-idf feature extraction method and subsequently using an optimal feature selection method, the particle swarm optimization algorithm. A total of 4577 bugs were analyzed, with accuracy as the performance metric of the classifier. An average accuracy gain of about 11% was observed, with nearly 74% of the features selected on average. The empirical analysis validates that the PSO algorithm for feature selection optimization in the fault classification task outperforms the elementary classification based on feature extraction.

The results motivate exploring other meta-heuristic algorithms, such as elephant search, bacterial foraging, cuckoo search, the firefly algorithm, and wolf search algorithms. The results can also be analyzed using other intrinsic filters, such as information gain and chi-square, and their hybrids with swarm-based wrapper algorithms. Fuzzy-logic and evolutionary algorithms can also be investigated to build optimal and robust predictive learning models for fault classification in web-apps.

REFERENCES
[1] A. Kumar and R. Goel, ‘‘Event driven test case selection for regression testing Web applications,’’ in Proc. IEEE-Int. Conf. Adv. Eng., Sci. Manage., Mar. 2012, pp. 121–127.
[2] A. Kumar and D. Gupta, ‘‘Paradigm shift from conventional software quality models to Web based quality models,’’ Int. J. Hybrid Intell. Syst., vol. 14, no. 3, pp. 167–179, 2017.
[3] J. Tian, ‘‘Quality assurance alternatives and techniques: A defect-based survey and analysis,’’ SQP, ASQ, Dept. Comput. Sci. Eng., Southern Methodist Univ., Dallas, TX, USA, 2001, vol. 3, no. 3.
[4] Y. Guo and S. Sampath, ‘‘Web application fault classification-an exploratory study,’’ in Proc. 2nd ACM-IEEE Int. Symp. Empirical Softw. Eng. Meas. (ESEM), New York, NY, USA, 2008, pp. 303–305.
[5] G. Chandrashekar and F. Sahin, ‘‘A survey on feature selection methods,’’ Comput. Elect. Eng., vol. 40, no. 1, pp. 16–28, Jan. 2014.
[6] A. Kumar, R. Khorwal, and S. Chaudhary, ‘‘A survey on sentiment analysis using swarm intelligence,’’ Indian J. Sci. Technol., vol. 9, no. 39, Oct. 2016, doi: 10.17485/ijst/2016/v9i39/100766.
[7] S. Sampath, S. Sprenkle, E. Gibson, L. Pollock, and A. S. Greenwald, ‘‘Applying concept analysis to user-session-based testing of Web applications,’’ IEEE Trans. Softw. Eng., vol. 33, no. 10, pp. 643–658, Oct. 2007.
[8] S. Elbaum, G. Rothermel, S. Karre, and M. Fisher, II, ‘‘Leveraging user-session data to support Web application testing,’’ IEEE Trans. Softw. Eng., vol. 31, no. 3, pp. 187–202, Mar. 2005.
[9] M. Li and J. Tian, ‘‘Web error classification and analysis for reliability improvement,’’ J. Syst. Softw., vol. 80, no. 6, pp. 795–804, 2007.
[10] A. Kumar, R. Chugh, R. Girdhar, and S. Aggarwal, ‘‘Classification of faults in Web applications using machine learning,’’ in Proc. ACM Int. Conf. Intell. Syst., Metaheuristics Swarm Intell., 2017, pp. 62–67.
[11] T. Mantere and J. T. Alander, ‘‘Evolutionary software engineering, A review,’’ Appl. Soft Comput., vol. 5, no. 3, pp. 315–331, 2005.
[12] F. A. Nielsen, ‘‘A new ANEW: Evaluation of a word list for sentiment analysis in microblogs,’’ in Proc. ESWC, 2011, pp. 93–98.
[13] A. Kumar and T. M. Sebastian, ‘‘Sentiment analysis: A perspective on its past, present and future,’’ Int. J. Intell. Syst. Appl., vol. 4, no. 10, pp. 1–14, 2012.
[14] A. Kumar and T. M. Sebastian, ‘‘Sentiment analysis on twitter,’’ Int. J. Comput. Sci. Issues, vol. 9, no. 4, p. 372, 2012.
[15] A. Kumar and A. Sharma, ‘‘Socio-Sentic framework for sustainable agricultural governance,’’ Sustain. Comput., Inform. Syst., to be published. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S2210537918302336, doi: 10.1016/j.suscom.2018.08.006.
[16] A. Kumar, H. Ahuja, N. K. Singh, D. Gupta, A. Khanna, and J. J. P. C. Rodrigues, ‘‘Supported matrix factorization using distributed representations for personalised recommendations on twitter,’’ Comput. Elect. Eng., vol. 71, pp. 569–577, Oct. 2018.
[17] A. Kumar, V. Dabas, and P. Hooda, ‘‘Text classification algorithms for mining unstructured data: A SWOT analysis,’’ in International Journal of Information Technology. Delhi, India: Springer, 2018, pp. 1–11.
[18] Y. Zhang, R. Jin, and Z.-H. Zhou, ‘‘Understanding bag-of-words model: A statistical framework,’’ Int. J. Mach. Learn. Cybern., vol. 1, nos. 1–4, pp. 43–52, 2010.
[19] M. F. Porter, ‘‘An algorithm for suffix stripping,’’ Program, vol. 14, no. 3, pp. 130–137, 1980. [Online]. Available: https://tartarus.org/martin/PorterStemmer/
[20] M. P. S. Bhatia and A. K. Khalid, ‘‘Information retrieval and machine learning: Supporting technologies for Web mining research and practice,’’ Webology, vol. 5, no. 2, p. 5, 2008.
[21] M. P. S. Bhatia and A. Kumar, ‘‘A primer on the Web information retrieval paradigm,’’ J. Theor. Appl. Inf. Technol., vol. 4, no. 7, pp. 657–662, 2008.
[22] A. Kumar, A. Jaiswal, S. Garg, S. Verma, and S. Kumar, ‘‘Sentiment analysis using cuckoo search for optimized feature selection on Kaggle tweets,’’ Int. J. Inf. Retr. Res., vol. 9, no. 1, pp. 1–15, 2019.
[23] N. Omar, F. Jusoh, R. Ibrahim, and M. S. Othman, ‘‘Review of feature selection for solving classification problems,’’ J. Inf. Syst. Res. Innov., vol. 3, pp. 64–70, Feb. 2013.
[24] X. Wang, J. Yang, and X. Teng, ‘‘Feature selection based on rough sets and particle swarm optimization,’’ Pattern Recognit. Lett., vol. 28, no. 4, pp. 459–471, 2007.
[25] J. Kennedy and R. C. Eberhart, ‘‘Particle swarm optimization,’’ in Proc. IEEE Int. Conf. Neural Netw., Perth, WA, Australia, vol. 4, Nov./Dec. 1995, pp. 1942–1948, doi: 10.1109/ICNN.1995.488968.
[26] R. Eberhart and J. Kennedy, ‘‘A new optimizer using particle swarm theory,’’ in Proc. 6th Int. Symp. Micro Mach. Human Sci., Nagoya, Japan, Oct. 1995, pp. 39–43.
[27] Y. Shi and R. Eberhart, ‘‘A modified particle swarm optimizer,’’ in Proc. IEEE Int. Conf. Evol. Comput., Anchorage, AK, USA, May 1998, pp. 69–73.
[28] A. Kumar and A. Jaiswal, ‘‘Empirical study of twitter and tumblr for sentiment analysis using soft computing techniques,’’ in Proc. World Congr. Eng. Comput. Sci., vol. 1, 2017, pp. 1–5.
[29] J. Alzubi, A. Nayyar, and A. Kumar, ‘‘Machine learning from theory to algorithms: An overview,’’ in Proc. 2nd Nat. Conf. Comput. Intell. (NCCI), 2018, vol. 1142, no. 1, p. 012012.
[30] M. H. Aghdam and S. Heidari, ‘‘Feature selection using particle swarm optimization in text categorization,’’ J. Artif. Intell. Soft Comput. Res., vol. 5, no. 4, pp. 231–238, 2015.

DEEPAK KUMAR JAIN received the Bachelor of Engineering degree from Rajiv Gandhi Proudyogiki Vishwavidyalaya, India, in 2010, the Master of Technology degree from the Jaypee University of Engineering and Technology, India, in 2012, and the Ph.D. degree from the Institute of Automation, University of Chinese Academy of Sciences, Beijing, China. He is currently an Assistant Professor with the Chongqing University of Posts and Telecommunications, Chongqing, China. He has presented several papers in peer-reviewed conferences and has published numerous studies in science cited journals. His research interests include deep learning, machine learning, pattern recognition, and computer vision.


AKSHI KUMAR received the B.E. degree (Hons.) in computer science and engineering from Maharshi Dayanand University, Rohtak, in 2003, the M.Tech. degree (Hons.) in computer science and engineering from Guru Gobind Singh Indraprastha University, New Delhi, in 2005, and the Ph.D. degree in computer engineering in the area of web mining from the Faculty of Technology, University of Delhi, in 2011. She is currently an Assistant Professor with the Department of Computer Science and Engineering, Delhi Technological University (formerly Delhi College of Engineering). She has been with the university for the past 10 years. Her research interests include intelligent systems, user-generated big-data, social media analytics, and soft computing.

SAURABH RAJ SANGWAN received the bachelor's degree in computer science and engineering from DCRUST, Murthal, India, and the M.Tech. degree in software engineering from the Department of Computer Science & Engineering, Delhi Technological University, New Delhi, India, where he is currently a Research Scholar with the Web Research Group. His current research interests include intelligent systems, text mining, and social web.

GIA NHU NGUYEN received the Ph.D. degree in computer science from the Hanoi University of Science at Vietnam National University, Vietnam. He is currently the Dean of the Graduate School, Duy Tan University, Vietnam. He has a total academic teaching experience of 18 years with more than 50 publications in reputed international conferences, journals, and online book chapter contributions (indexed by: SCI, SCIE, SSCI, Scopus, and DBLP). His current research interests include network communication, security and vulnerability, network performance analysis and simulation, cloud computing, and biomedical image processing. He is currently an Associate Editor of the International Journal of Synthetic Emotions.

PRAYAG TIWARI received the M.S. degree from NUST MISIS, Moscow. He is currently pursuing the Ph.D. degree with the University of Padova, Italy, where he is also a Marie Sklodowska-Curie Researcher. He was a Research Assistant with NUST MISIS and has teaching and industrial work experience. He has several publications in journals, book series, and conferences of the IEEE, ACM, Springer, Elsevier, MDPI, Taylor & Francis, and IGI-Global. His current research interests include machine learning, deep learning, quantum machine learning, and information retrieval.
