Ensembles Based Combined Learning For Improved Software Fault Prediction: A Comparative Study
Abstract—Software Fault Prediction (SFP) research has made enormous endeavors to accurately predict the fault proneness of software modules in order to maximize precious software test resources, reduce maintenance costs, and help deliver software products on time and satisfy customers, all of which ultimately contribute to producing quality software products. In this regard, Machine Learning (ML) has been successfully applied to solve classification problems for SFP. Moreover, within ML, Ensemble Learning Algorithms (ELA) are known to improve the performance of single learning algorithms. However, no ELA alone handles the challenges created by redundant and irrelevant features and by the class imbalance problem in software defect datasets. Therefore, the objective of this paper is to independently examine and compare prominent ELA and to improve their performance by combining them with Feature Selection (FS) and Data Balancing (DB) techniques, in order to identify more efficient ELA that better predict the fault proneness of software modules. Accordingly, a new framework that efficiently handles these challenges in a combined form is proposed. The experimental results confirm the robustness of the combined techniques: in particular, the framework performs best when the bagging ELA is combined with DB on selected features. Therefore, as shown in this study, ensemble techniques used for SFP must be carefully examined and combined with both FS and DB in order to obtain robust performance.

Keywords-Software Fault Prediction, Ensemble Learning Algorithms, Feature Selection, Data Balancing.

I. INTRODUCTION

The growing demand for quality software in different industries has been igniting the Software Fault Prediction (SFP) research area, whereby quality can be cautiously inspected and managed before releasing the software. SFP aims to inspect and detect the fault proneness of software modules and to help testers focus on those specific modules predicted as faulty, so as to manage resources efficiently and reduce the number of faults occurring during operation. In this regard, statistical and Machine Learning (ML) techniques have been employed for SFP in most studies [2-14]. Among ML techniques, Ensemble Learning Algorithms (ELA) have been demonstrated to be useful in different areas of research [7, 15-18], all confirming that ELA can effectively solve classification problems with better performance than an individual classifier. As illustrated in the literature [6, 14, 32, 35], to make wise decisions, people may consult many experts in the area and take their opinions into consideration rather than depend only on their own judgment. In fault prediction, a predictive model generated by ML can likewise be considered an expert. Therefore, a good approach to making decisions more accurately is to combine the outputs of different predictive models, so that the combination improves on, or at least equals, the predictive performance of an individual model [6, 14, 32, 35]. Therefore, in this study, we develop a new framework to compare eminent ELA, namely bagging [16, 30, 32, 35] and AdaBoost.M1 [16, 31, 32, 35], with the J48 Decision Tree (DT) as the base classifier. In addition, we use McCabe and Halstead static code metrics [19, 22] datasets for the experimental analysis.

In ML, ELA are known to improve the predictive performance of individual classifiers, but neither of these ensemble techniques alone solves the data skewness (class imbalance) problem [16, 26] or the existence of redundant and irrelevant features, which are common in defect datasets [26]. Thus, to deal with these issues, an ensemble-based combined framework has to be designed specifically. Therefore, in this study, we combine ELA with Feature Selection (FS) [9, 12-14, 20, 21] and Data Balancing (DB) [11, 20, 23-25] techniques. FS removes less important and redundant features, so that only important features are left for training the predictive models and the performance of ELA can be improved. Moreover, as software defect datasets are composed mostly of Not Fault Prone (NFP) instances with only a small percentage of Fault Prone (FP) instances, DB is carried out to resolve this skewed nature of defect datasets, so that building SFP models on balanced data can improve ELA performance.

Therefore, this paper aims to independently examine and compare ELA and to realize their performance improvement when combined with FS and DB, in order to identify efficient techniques that perform better for SFP. Hence, the main contribution of this study is the empirical analysis of multiple ELA in combination with FS and DB. Interestingly, the proposed framework has exhibited the robustness of the combined techniques; in particular, it performs best when combining ensemble techniques with DB on selected features, which constitutes a primary contribution of this study.
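To make the combined design concrete, the following minimal sketch assembles the framework's three stages (FS, DB, ELA) into one cross-validatable model. It is an illustrative Python analogue using scikit-learn and imbalanced-learn, not the authors' WEKA implementation: mutual information stands in for Information Gain, a bagged CART decision tree stands in for Bagging with J48, and the parameter values (feature count, tree count, fold count) are assumptions.

```python
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline  # sampler-aware pipeline
from sklearn.ensemble import BaggingClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

def build_combined_model(k_features=8, n_trees=10, seed=0):
    """FS + DB + ELA, analogous to the paper's SMOTEIGDTBagging setup."""
    return Pipeline([
        # FS: mutual information plays the role of Information Gain here.
        ("fs", SelectKBest(mutual_info_classif, k=k_features)),
        # DB: SMOTE synthesizes minority (FP) instances on training folds only.
        ("db", SMOTE(random_state=seed)),
        # ELA: bagged decision trees stand in for WEKA's Bagging + J48.
        ("ela", BaggingClassifier(DecisionTreeClassifier(),
                                  n_estimators=n_trees, random_state=seed)),
    ])

# Usage (X: module-metric matrix, y: FP/NFP labels), assumed 10-fold CV:
# auc = cross_val_score(build_combined_model(), X, y, cv=10,
#                       scoring="roc_auc").mean()
```

Placing SMOTE inside a sampler-aware pipeline ensures the synthetic instances are generated only from training folds, so the evaluation folds stay untouched.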
A. Comparison: ELA Performance Combined with IG

The performance comparison of bagging and AdaBoost.M1 using IG feature selection is given in Figure 2 (a) and (b) and Table II. In terms of both indexes used in this study, AdaBoost.M1 tends to perform lower while bagging demonstrates the highest values. Thus, the results reflect the better performance of bagging over AdaBoost.M1, except that, out of the eight datasets, AdaBoost.M1 achieves better accuracy on MC1" and KC2 and a better AUC on MC1". However, considering the average performance over all datasets, bagging still outperforms AdaBoost.M1.

TABLE II. CLASSIFICATION RESULTS OF ELA COMBINED WITH IG

Dataset    IGDTBagging          IGDTAdaBoost.M1
           Accuracy    AUC      Accuracy    AUC
JM1'       81.766      0.720    80.568      0.696
MC1"       97.712      0.800    98.305      0.821
MW1'       88.799      0.678    86.373      0.669
PC3'       86.587      0.803    85.024      0.777
PC4"       88.874      0.908    87.887      0.894
ar1        90.404      0.755    87.929      0.744
ar4        84.545      0.833    80.709      0.794
KC2        81.934      0.833    82.718      0.801
Average    87.580      0.791    86.190      0.775
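For intuition about the IG step used throughout this section, the sketch below computes textbook Information Gain, IG(class; feature) = H(class) - H(class | feature), and ranks features by it. It assumes discrete feature values (numeric code metrics would first need supervised discretization, not shown), and the top_k_features helper and its cut-off are hypothetical illustrations, not taken from the paper.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy H of a class-label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(feature, labels):
    """IG(class; feature) = H(class) - H(class | feature)."""
    h_cond = 0.0
    for v in np.unique(feature):
        mask = feature == v
        # P(feature = v) times the entropy of the class within that slice.
        h_cond += mask.mean() * entropy(labels[mask])
    return entropy(labels) - h_cond

def top_k_features(X, y, k):
    """Rank columns of X by IG against y and keep the k best."""
    gains = [information_gain(X[:, j], y) for j in range(X.shape[1])]
    return np.argsort(gains)[::-1][:k]
```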
B. Comparison: ELA Performance Combined with IG and SMOTE

In Figure 3 (a) and (b) and Table III, the performance comparison of bagging and AdaBoost.M1 combined with both IG and SMOTE is given. In terms of both indexes, AdaBoost.M1 appears on average to perform lower while bagging demonstrates the highest values. However, for the MC1" (97.303, 0.992), MW1' (85.595), PC3' (83.834, 0.911), and PC4" (90.439, 0.963) datasets, AdaBoost.M1 outperforms bagging in both accuracy and AUC (accuracy only in the case of MW1'). Nevertheless, considering the average performance over all datasets, bagging still outperforms AdaBoost.M1. Thus, the results reflect the better performance of combined bagging, closely followed by combined AdaBoost.M1, on software defect datasets. On the other hand, based on these results, we can say that, after resolving the class imbalance problem, AdaBoost.M1 shows competitively good performance on some datasets, which clearly needs further investigation with more datasets built from other software metrics.

TABLE III. CLASSIFICATION RESULTS OF ELA COMBINED WITH BOTH IG AND SMOTE

Dataset    SMOTEIGDTBagging     SMOTEIGDTAdaBoost.M1
           Accuracy    AUC      Accuracy    AUC
JM1'       80.773      0.855    78.926      0.835
MC1"       96.596      0.988    97.303      0.992
MW1'       84.966      0.916    85.595      0.907
PC3'       83.266      0.905    83.834      0.911
PC4"       90.158      0.962    90.439      0.963
ar1        82.092      0.901    81.931      0.896
ar4        77.401      0.854    76.923      0.846
KC2        80.390      0.871    79.078      0.836
Average    84.460      0.907    84.250      0.898

Figure 3. Comparison of ELA Combined with both IG and SMOTE using Accuracy and AUC
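Since the gains in this subsection come from SMOTE, the following sketch shows the core of the technique for intuition: each synthetic fault-prone instance is a random interpolation between a minority instance and one of its k nearest minority neighbours [11]. The function name and default values are illustrative; in experiments one would rely on a tested library implementation such as imbalanced-learn's SMOTE.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_minority(X_min, n_synthetic, k=5, seed=0):
    """Generate n_synthetic interpolated samples from minority-class rows X_min."""
    X_min = np.asarray(X_min, dtype=float)
    rng = np.random.default_rng(seed)
    # k+1 neighbours because each point is returned as its own nearest neighbour.
    _, idx = NearestNeighbors(n_neighbors=k + 1).fit(X_min).kneighbors(X_min)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.integers(len(X_min))        # pick a random minority instance
        j = rng.choice(idx[i][1:])          # one of its k minority neighbours
        gap = rng.random()                  # interpolation factor in [0, 1)
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.asarray(synthetic)
```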
C. Comparison: IGDTBagging with SMOTEIGDTBagging

As expected, selecting useful features and resolving the class imbalance problem has proved useful and improves ELA performance. In this regard, based on our proposed framework, the experimental results in Sections V (A) and (B) show the achieved performance improvements, and the most efficient ELA on average is found to be combined bagging under both strategies (combining with IG alone, and with both IG and SMOTE). Therefore, this section points out the performance improvement achieved by the combined bagging ELA when combined with IG as well as with both IG and SMOTE, using AUC as the performance evaluation measure. Accordingly, as shown in Figure 4, the combined bagging algorithm gives better results on all datasets when combined with both IG and SMOTE than when combined with IG only. This affirms the contribution of combined preprocessing, removing irrelevant and redundant features as well as resolving the class imbalance problem, and its power to improve the performance of ELA.

Figure 4. Comparison of Bagging ELA Combined with IG and both IG and SMOTE using AUC
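The comparison in this subsection can be reproduced in spirit with the sketch below, which scores the bagging pipeline with and without the balancing step under cross-validated AUC. As before, this is a scikit-learn/imbalanced-learn analogue of the WEKA setup, with assumed parameter values rather than the paper's exact configuration.

```python
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.ensemble import BaggingClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

def auc_with_and_without_db(X, y, k=8, cv=10, seed=0):
    """Cross-validated AUC of IGDTBagging vs. SMOTEIGDTBagging analogues."""
    def bagging_pipeline(balance):
        steps = [("fs", SelectKBest(mutual_info_classif, k=k))]
        if balance:
            steps.append(("db", SMOTE(random_state=seed)))
        steps.append(("ela", BaggingClassifier(DecisionTreeClassifier(),
                                               random_state=seed)))
        return Pipeline(steps)
    ig_only = cross_val_score(bagging_pipeline(False), X, y, cv=cv,
                              scoring="roc_auc").mean()
    ig_smote = cross_val_score(bagging_pipeline(True), X, y, cv=cv,
                               scoring="roc_auc").mean()
    return ig_only, ig_smote
```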
VI. THREATS TO VALIDITY

There are threats that may affect our experimental results. The proposed prediction models were created without changing the default parameter settings, except that the DT algorithm was set as the base classifier of both ensemble techniques, which is not the default in either case. Thus, no investigation was made into how varying the default parameter settings affects model performance. In addition, as many software metrics are defined in the literature, different software metrics might be better indicators of module defectiveness; however, we used the static code software metrics that were available in the selected datasets. Conclusions were drawn based on the important features selected using IG. In terms of the total number of instances and the class distribution, the datasets may not be good representatives; however, this practice is common in the fault prediction research area.

VII. CONCLUSION AND FUTURE WORKS

This study made an empirical evaluation of the capability of ELA in predicting FP software modules and compared their performance when combined with FS, and with both FS and DB, using eight NASA software defect datasets. Our objective in using FS and DB was that, by combining these filtering techniques with ELA, we would be able to prune non-relevant features and balance the classes, and then learn an ELA that performs better than one learned on the whole feature set with imbalanced classes. Accordingly, the experimental results reveal that our combined technique assures performance improvement. Thus, by dealing with the challenges of SFP mentioned in this study, our proposed framework achieves remarkable classification performance and lays a pathway to software quality assurance.

As future work, we plan to explore more ELA, including vote and stacking, and more data preprocessing techniques, with more defect datasets consisting of different software metrics, and to realize how the proposed framework helps to identify more efficient combined ensemble techniques and improve their classification performance to accurately predict FP software modules.

ACKNOWLEDGEMENT

This work is supported by the Fundamental Research Funds for the Central Universities (No. 2682015QM02).
REFERENCES

[1] T. Menzies, R. Krishna, and D. Pryor, The Promise Repository of Empirical Software Engineering Data, 2016. Available: http://openscience.us/repo
[2] E. Arisholm, L. C. Briand, and E. B. Johannessen, "A systematic and comprehensive investigation of methods to build and evaluate fault prediction models," Journal of Systems and Software, vol. 83, pp. 2–17, 2010.
[3] K. O. Elish and M. O. Elish, "Predicting defect-prone software modules using support vector machines," Journal of Systems and Software, vol. 81, pp. 649–660, 2008.
[4] I. Gondra, "Applying machine learning to software fault-proneness prediction," Journal of Systems and Software, vol. 81, pp. 186–195, 2008.
[5] T. M. Khoshgoftaar, C. Seiffert, J. V. Hulse, A. Napolitano, and A. Folleco, "Learning with limited minority class data," in the Sixth International Conference on Machine Learning and Applications, Cincinnati, OH, 2007.
[6] J. Han, M. Kamber, and J. Pei, Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers Inc., 2011.
[7] R. Kohavi, "A study of cross-validation and bootstrap for accuracy estimation and model selection," in the International Joint Conference on Artificial Intelligence, 1995.
[8] I. H. Laradji, M. Alshayeb, and L. Ghouti, "Software defect prediction using ensemble learning on selected features," Information and Software Technology, vol. 58, pp. 388–402, 2015.
[9] R. Malhotra, "A systematic review of machine learning techniques for software fault prediction," Applied Soft Computing, vol. 27, pp. 504–518, 2015.
[10] T. Menzies, J. Greenwald, and A. Frank, "Data mining static code attributes to learn defect predictors," IEEE Transactions on Software Engineering, vol. 33, pp. 2–13, 2007.
[11] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, "SMOTE: Synthetic minority over-sampling technique," Journal of Artificial Intelligence Research, vol. 16, pp. 321–357, 2002.
[12] S. Shivaji, E. J. Whitehead, R. Akella, and S. Kim, "Reducing features to improve code change-based bug prediction," IEEE Transactions on Software Engineering, vol. 39, pp. 552–569, 2013.
[13] H. Wang, T. M. Khoshgoftaar, and A. Napolitano, "A comparative study of ensemble feature selection techniques for software defect prediction," in the Ninth International Conference on Machine Learning and Applications, IEEE, Washington, DC, 2010.
[14] E. Frank, M. A. Hall, and I. H. Witten, The WEKA Workbench. Online Appendix for "Data Mining: Practical Machine Learning Tools and Techniques," 4th ed. Morgan Kaufmann, 2016.
[15] A. Shanthini and R. M. Chandrasekaran, "Analyzing the effect of bagged ensemble approach for software fault prediction in class level and package level metrics," in the IEEE International Conference on Information Communication and Embedded Systems (ICICES), India, 2014.
[16] M. Galar, A. Fernandez, E. Barrenechea, H. Bustince, and F. Herrera, "A review on ensembles for the class imbalance problem: Bagging-, boosting-, and hybrid-based approaches," IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol. 42, pp. 463–484, 2012.
[17] S. K. Mathanker, P. R. Weckler, T. J. Bowser, N. Wang, and N. O. Maness, "AdaBoost classifiers for pecan defect classification," Computers and Electronics in Agriculture, vol. 77, pp. 60–68, 2011.
[18] T. M. Khoshgoftaar, K. Gao, and A. Napolitano, "Improving software quality estimation by combining feature selection strategies with sampled ensemble learning," in the IEEE 15th International Conference on Information Reuse and Integration (IRI), San Francisco, California, USA, 2014.
[19] D. Radjenovic, M. Hericko, R. Torkar, and A. Zivkovic, "Software fault prediction metrics: A systematic literature review," Information and Software Technology, vol. 55, pp. 1397–1418, 2013.
[20] H. Liu, H. Motoda, and L. Yu, "A selective sampling approach to active feature selection," Artificial Intelligence, vol. 159, pp. 49–74, 2004.
[21] S. Liu, X. Chen, W. Liu, J. Chen, Q. Gu, and D. Chen, "FECAR: A feature selection framework for software defect prediction," in the 38th Annual International Computers, Software and Applications Conference, Vasteras, 2014.
[22] T. J. McCabe, "A complexity measure," IEEE Transactions on Software Engineering, vol. SE-2, pp. 308–320, 1976.
[23] V. García, J. S. Sánchez, and R. A. Mollineda, "On the effectiveness of preprocessing methods when dealing with different levels of class imbalance," Knowledge-Based Systems, vol. 25, pp. 13–21, 2012.
[24] H. He and E. A. Garcia, "Learning from imbalanced data," IEEE Transactions on Knowledge and Data Engineering, vol. 21, pp. 1263–1284, 2009.
[25] P. Sarakit, T. Theeramunkong, and C. Haruechaiyasak, "Improving emotion classification in imbalanced YouTube dataset using SMOTE algorithm," in the 2nd International Conference on Advanced Informatics: Concepts, Theory and Applications, Chonburi, 2015.
[26] W. Y. Chubato and T. Li, "A combined-learning based framework for improved software fault prediction," International Journal of Computational Intelligence Systems, vol. 10, pp. 647–662, 2017.
[27] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten, "The WEKA data mining software: An update," SIGKDD Explorations, vol. 11, pp. 10–18, 2009. Retrieved 01 Sep. 2017.
[28] C. Catal, "Software fault prediction: A literature review and current trends," Expert Systems with Applications, vol. 38, pp. 4626–4636, 2011.
[29] T. Hall, S. Beecham, D. Bowes, D. Gray, and S. Counsell, "A systematic literature review on fault prediction performance in software engineering," IEEE Transactions on Software Engineering, vol. 38, pp. 1276–1304, 2012.
[30] L. Breiman, "Bagging predictors," Machine Learning, vol. 24, pp. 123–140, 1996.
[31] Y. Freund and R. E. Schapire, "Experiments with a new boosting algorithm," in the Thirteenth International Conference on Machine Learning, San Francisco, 1996, pp. 148–156.
[32] R. Polikar, "Ensemble learning," Scholarpedia, 2009.
[33] F. Provost and T. Fawcett, "Robust classification for imprecise environments," Machine Learning, vol. 42, pp. 203–231, 2001.
[34] C. Catal, "Performance evaluation metrics for software fault prediction studies," Acta Polytechnica Hungarica, vol. 9, pp. 193–206, 2012.
[35] R. Polikar, "Ensemble based systems in decision making," IEEE Circuits and Systems Magazine, vol. 6, pp. 21–45, 2006.