
Mehran University Research Journal of Engineering and Technology

Vol. 41, No. 1, pp. 33-40, January 2022

p-ISSN: 0254-7821, e-ISSN: 2413-7219
DOI: https://doi.org/10.22581/muet1982.2201.04

Prediction of Insurance Fraud Detection using Machine Learning Algorithms

Laiqa Rukhsar1a, Waqas Haider Bangyal1b, Kashif Nisar2a, Sana Nisar2b

RECEIVED ON 23.01.2021, ACCEPTED ON 05.05.2021


ABSTRACT
In the current era, people are exposed to various types of insurance, such as health insurance, automobile insurance, property insurance and travel insurance, due to the availability of extensive knowledge related to insurance. People tend to invest in such kinds of insurance, which gives scam artists the opportunity to cheat them. Insurance fraud is a prohibited act committed either by the client or the vendor of the insurance contract. Insurance fraud from the client side is encountered in the form of overestimated claims, post-dated policies, etc., whereas insurance fraud from the vendor side is experienced in the form of policies from non-existent companies, failure to submit premiums, and so on. In this paper, we perform a comparative analysis of various classification algorithms, namely Support Vector Machine (SVM), Random Forest (RF), Decision Tree (DT), Adaboost, K-Nearest Neighbor (KNN), Linear Regression (LR), Naïve Bayes (NB), and Multi-Layer Perceptron (MLP), to detect insurance fraud. The effectiveness of the algorithms is observed on the basis of the performance metrics Precision, Recall and F1-Score. The comparative results of the classification algorithms show that DT gives the highest accuracy of 79% as compared to the other techniques. In addition, Adaboost shows an accuracy of 78%, which is close to that of DT.

Keywords: Insurance, Fraud Detection, Supervised Learning, Classification Algorithm, Random Forest, SVM,
Decision Tree

1. INTRODUCTION

The major issue faced by insurance companies is fraud, which causes immense loss to insurance companies, sometimes beyond repair. The main concern is to avoid fraudulent activities at any cost, because combating fraud cases, specifically in insurance companies, is a challenging task. It is reported that 21%-36% of auto-insurance claims are suspected to be fraudulent, but only 3% of cases are prosecuted [1]. The first step to avoiding fraudulent cases is to detect them, which is quite difficult and not "cost-effective" as well, because the lengthy and cumbersome investigations may infuriate authentic customers [2]. Higher investigation costs also cause hindrance in detecting fraud cases. Therefore, companies go without carrying out appropriate investigations, which leads to several future pitfalls. Manual fraud detection, being costly and inefficient, is outdated now; we need to investigate the fraud before the claim payment. Different machine learning and data mining techniques have proven to be promising in detecting fraud.

Machine Learning (ML) is a sub-area of Artificial Intelligence with the main aim of mimicking human intelligence abilities. ML focuses on constructing models with high prediction capabilities. Its most basic feature is "learning", which is done by looking at the given data. The two basic learning techniques are supervised and unsupervised learning.
1 Department of Computer Science, University of Gujrat, Gujrat, Pakistan. Email: a [email protected], b [email protected] (Corresponding Author)
2 Faculty of Computing and Informatics, University Malaysia Sabah, K.K. Sabah, Malaysia. Email: [email protected], [email protected]
This is an open access article published by Mehran University of Engineering and Technology, Jamshoro under CC BY 4.0 International License.

In supervised learning, we are provided with fully labeled data, which means that in the training data we have the desired result against each input. It is highly useful for solving problems of classification and regression. In classification, the aim is to predict a discrete value, whereas regression deals with continuous data. On the contrary, in the unsupervised learning paradigm, we are provided with unlabeled data where the results are not known. In a fraud detection scenario, with a supervised learning method we can identify fraudulent and legal cases from the training data, but with unsupervised learning we cannot infer which case is fraudulent and which is legal.

One major task in ML is data classification, which is also considered pattern recognition. A classification problem is encountered when there is a need to classify an instance into an already defined class based on its similarity to other instances classified into that class [3]. In classification, the aim is to develop algorithms that can create models able to differentiate examples/instances based on recognized patterns [4]. Classification is important for several different applications such as voice recognition, image classification, text categorization, etc. [5]. There are many classification algorithms that have proved highly beneficial in solving real-world problems. The most famous are K-Nearest Neighbor, Support Vector Machine, Decision Trees and Neural Networks.

ML has a wide range of applications; the most prominent ones are social media services, online customer support, email spam filtering, fraud detection, product recommendation, and so on.

2. LITERATURE REVIEW

Sun et al. [6] presented a novel approach for detecting fraud, called Patient Cluster Divergence-based Healthcare Insurance Fraudster Detection (PCDHIFD), in the presence of camouflage responses. For the experimental purpose, a healthcare dataset was chosen, comprising around 40M admission records of 10000 patients over the previous five years. The proposed technique worked in three steps on three basic records: life history of patients, diagnosis records, and medical practitioners attended. The steps were in this sequence: first of all, a patient graph was constructed based on the most similar information at the patient-level hospital admission. Then a clustering-based graph algorithm was used for finding the peak and real meaning of individual clusters. Lastly, the divergence among patient clusters was found and the probability of fraud for each patient was calculated. The comparison was made with other state-of-the-art algorithms, i.e. Decision Trees, Support Vector Machines, GridLOF, BP Growth, MLP and LSTM. It was claimed that the proposed approach produced the highest accuracy.

Dhieb et al. [7] proposed a method based on the Extreme Gradient Boosting (XGBoost) algorithm for detecting fraud in insurance agreements. They presented an online learning solution for meeting online, time-to-time requirements. The proposed method combined AI-based methods with a blockchain architecture to get better security. The proposed machine learning technique worked in the following way: first, preprocessing and cleaning of the data was performed, then data visualization techniques were used to get insights into the data. The third step was to preserve data privacy by not disclosing personal information, and lastly, model building using XGBoost provided the probability of future fraud based on the information available. A very fast decision tree algorithm was designed for the online learning solution. XGBoost was compared with other machine learning classifiers, i.e. Decision Tree, Naïve Bayes, and Nearest Neighbor, and the proposed methodology was better than all of them in terms of accuracy.

Kirlidog and Asuk [8] used the Support Vector Machine (SVM) for the detection of health insurance frauds and anomalies. Different data mining approaches were also discussed in their research. The research was done on a dataset of Turkish insurance companies which contained the total claimed records and other information on clients. The system was implemented in Oracle using an SVM with a linear kernel. The training was done by classifying the records into genuine claims and anomalies. The SVM made the classification by comparing individual records with genuine and fake claims.

The system then calculated the probability for every single record, and if the probability was higher than 50%, the record was considered an anomaly. Anomalies were considered based on three conditions: how many claims were rejected, how many uncontrolled claims were found in health center types, and how many claims were identified in health centers. Data mining approaches like grouping, classification, and variance detection could be used in insurance fraud detection, and based on previous data we can predict future fraudulent claims using these techniques.

Bhowmik [9] applied different machine learning approaches for predicting and assessing fraud in automobile insurance. The machine learning approaches used in this research were Bayesian networks, decision trees, and rule-based algorithms. The work created two Bayesian networks based on the assumptions that the driver is cheating and that the driver is honest, and these two probabilities were calculated separately; the one with the higher probability was taken as the output. The decision trees were based on subtrees with labels known as classes, i.e. legal and fraud in this research. Further, Gini, minority, or entropy measures were used to obtain the impurity within a class and get the final output. The rule-based system proceeded with if-then rules, where the conditions were driver's age, driver's rating, and auto age. Results and performance were shown in the form of a confusion matrix, and the accuracy was good.

Liu et al. [10] proposed a new technique for insurance fraud detection on an imbalanced dataset. The novel technique was based on data-partitioning under-sampling, with and without replacement, on the majority class, which is then merged with the minority class. Tenfold cross-validation was used for testing purposes. The proposed methodology was based on the idea of choosing the best among the data-partition under-samplings. The models used for insurance fraud detection were Support Vector Machines, Decision Trees, and artificial neural networks. Experiments were carried out on a publicly available dataset containing the records of different automobile insurance claims. Results showed that the Decision Trees were the best among all the classifiers. It was demonstrated that the technique outperformed previous work and its accuracy was the best.

In this paper, eight popular ML algorithms are used for the detection of insurance fraud. A comparative analysis is also presented for the application.

3. TYPES OF CLASSIFICATION ALGORITHMS

Many machine learning algorithms are being used in various fields of research to help in solving real-world problems. The most used machine learning classification algorithms are discussed below.

3.1 Support Vector Machine (SVM)

The SVM is a popular machine learning classifier that is used in our research. It is applied to both linear and nonlinear problems in real-world domains [11]. In SVM, a hyperplane is used to separate instances of the classes. Because of its kernel function, which is used to convert a low-dimensional space into a high-dimensional space, SVM is well suited for nonlinear classification problems. Summarizing, we can say that SVM can be used for classifying instances in complex problems in an efficient way.
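
As a minimal sketch of the kernel idea described above (not the authors' implementation), the following Python/scikit-learn snippet fits an SVM with an RBF kernel on synthetic data; the dataset and parameters are placeholders.

```python
# Minimal SVM sketch (assumed scikit-learn API); synthetic data, not the paper's dataset.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# The RBF kernel implicitly maps inputs into a higher-dimensional space,
# where a separating hyperplane is fitted.
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X_train, y_train)
print("SVM test accuracy:", clf.score(X_test, y_test))
```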

3.2 Linear Regression (LR)

Regression helps to find the relationship between input and target variables. Linear regression is a supervised ML algorithm that, instead of classifying into different categories, predicts a quantitative response within a continuous range of values; the output has a constant slope. There are two types of linear regression: simple regression and multivariable regression. By the term linear we understand that the two variables on the x and y axes are linearly correlated. Linear regression has been widely used in price prediction, trend prediction, risk management, etc.
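
A minimal sketch of fitting such a constant-slope model, assuming scikit-learn and synthetic data rather than the paper's dataset:

```python
# Minimal linear-regression sketch (assumed scikit-learn API); synthetic data for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))              # single explanatory variable
y = 3.0 * X[:, 0] + 5.0 + rng.normal(0, 1, 100)    # linear relation plus noise

model = LinearRegression().fit(X, y)
# The fitted model has a constant slope (coef_) and an intercept, as described above.
print("slope:", model.coef_[0], "intercept:", model.intercept_)
print("prediction at x=4:", model.predict([[4.0]])[0])
```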

3.3 Naïve Bayes (NB)

NB is a very popular classifier based on Bayes' theorem. It works on the probability of instances belonging to each class. The reasons for its popularity include its simplicity, correctness, and authenticity. Though it has applications in many fields, NB has been most widely applied in natural language processing, hybrid recommender systems, text classification, and spam filtering [12]. Its name "Naïve" comes from its simple assumption that each attribute has an independent identity and does not depend on any other feature. Using past information, it computes the probability for each attribute.
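
The following sketch illustrates the idea with a Gaussian Naïve Bayes model from scikit-learn (an assumption on our part; the paper does not state which NB variant was used). The data is synthetic.

```python
# Minimal Naive Bayes sketch (assumed scikit-learn API); synthetic data, not the paper's dataset.
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=300, n_features=6, random_state=1)
nb = GaussianNB().fit(X, y)

# Under the independence assumption, per-feature likelihoods are combined with
# class priors to give a posterior probability for each class.
print("class priors:", nb.class_prior_)
print("posterior for first sample:", nb.predict_proba(X[:1])[0])
```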

3.4 Adaboost

Adaboost is also called Adaptive Boosting. This algorithm is famous for its quick boosting in machine learning. Boosting algorithms are very suitable for transforming weak learners into a strong learner [12]. The basic purpose of adaptive boosting is to enhance the predictive ability of weak learners with the help of training. To obtain a strong learner, Adaboost merges many weak and slow learners. At the start of the algorithm the weights of the training instances are identical, and as the algorithm runs further, the weights start to update.
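
A minimal sketch of boosting with scikit-learn's AdaBoostClassifier, whose default weak learners are shallow decision stumps; the data is synthetic and not the paper's dataset.

```python
# Minimal AdaBoost sketch (assumed scikit-learn API); synthetic data for illustration.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=400, n_features=8, random_state=2)

# Each boosting round re-weights the training samples so that later weak learners
# focus on the examples the earlier ones got wrong.
ada = AdaBoostClassifier(n_estimators=50, random_state=2).fit(X, y)
print("training accuracy:", ada.score(X, y))
```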

3.5 KNN

The K-Nearest Neighbor algorithm classifies an instance according to the majority vote of its neighbors. The nearest neighbor, i.e. the neighbor with the smallest distance, is found using some distance metric; the most common distance measure used is the Euclidean distance [13]. The distance is determined between the test and training instances. After determining the distances, the labels of the nearest neighbor training examples are collected, and the majority value is taken as the prediction, based on which the new test instance is categorized. KNN is highly recommended in scenarios where accurate prediction is required, due to its effectiveness and simplicity.
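
The distance-plus-majority-vote mechanism can be sketched from scratch as follows; the toy points and the value of k are hypothetical, chosen only to illustrate the idea.

```python
# From-scratch KNN sketch illustrating Euclidean distance and majority voting.
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    # Euclidean distance from the new point to every training instance.
    distances = np.linalg.norm(X_train - x_new, axis=1)
    nearest = np.argsort(distances)[:k]        # indices of the k closest neighbours
    votes = Counter(y_train[nearest])          # majority vote among their labels
    return votes.most_common(1)[0][0]

X_train = np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 7.5]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([1.2, 1.4])))  # -> 0
```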

3.6 Decision Trees

Decision trees present results in the form of a tree. In a decision tree, the inner nodes represent the attributes descriptively, whereas the leaves are labeled with classes. Decision trees are drawn upside-down: the top node is called the root. They are widely used in data mining due to their simplicity and robustness. Decision trees work by selecting the best feature, i.e. the one that yields the maximum information for the classification. The classifier stops when all the leaf nodes have become pure. A leaf node is said to be pure when all of its instances belong to the same class, or when the decision tree is complete and no further classification is required [14].
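
As an illustration of node purity, the following sketch computes the Gini impurity (one common impurity measure, also mentioned in [9]) for hypothetical label sets; a pure leaf has impurity 0.

```python
# Gini impurity sketch; the label sets below are hypothetical examples, not the paper's data.
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

print(gini(["fraud", "legal", "legal", "fraud"]))   # mixed node -> 0.5
print(gini(["legal", "legal", "legal", "legal"]))   # pure leaf  -> 0.0
```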

3.7 Random Forest (RF)

Leo Breiman and Adèle Cutler proposed the RF classifier in 2001. It works by utilizing the combined effect of two concepts, "bagging" and random "subspaces" [15]. From the training dataset, a set of decision trees is built, and the decision, i.e. the label, is predicted based on the votes collected from these decision trees [16]. RF provides high accuracy and is mainly used for classifying large datasets due to its ability to handle missing values. The application domains of RF include remote sensing, e-commerce, the stock market, fraud detection, network intrusion detection, and so on.
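
A minimal scikit-learn sketch of the bagging-plus-random-subspaces idea; the data is synthetic and the parameters are illustrative only.

```python
# Minimal random-forest sketch (assumed scikit-learn API); synthetic data for illustration.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=400, n_features=12, random_state=3)

# Bootstrap samples ("bagging") plus a random subset of features at each split;
# the final label is the majority vote of the individual trees.
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=3).fit(X, y)
print("training accuracy:", rf.score(X, y))
```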

3.8 Multi-Layer Perceptron

MLP belongs to the class of feed-forward artificial neural networks (ANN). An ANN mimics the working behavior of the human brain: the main inspiration behind the ANN is the way the brain receives input, processes it, and produces output. The basic unit of an ANN is the perceptron. Each perceptron has weight values associated with it and generates its output using an activation function. An ANN works by learning a representation from the training data and relating it to the desired output variable. ANNs have many real-world applications such as data compression, character recognition, computer vision, pattern recognition, and robotics.
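
The following sketch shows a single perceptron forward pass, i.e. a weighted sum followed by an activation function; the weights and inputs are hypothetical values used only to illustrate the mechanism.

```python
# Single-perceptron forward pass with a sigmoid activation; values are hypothetical.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def perceptron_output(x, weights, bias):
    # Weighted sum of the inputs passed through the activation function.
    return sigmoid(np.dot(weights, x) + bias)

x = np.array([0.5, 1.2, -0.3])     # one input instance
w = np.array([0.4, -0.6, 0.9])     # learned weights (hypothetical)
print(perceptron_output(x, w, bias=0.1))
```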

4. METHODOLOGY

The methodological approach can be divided into three main steps.

4.1 Data Extraction and Preparation

Before describing the various classification approaches, it is important to introduce the data to be analyzed for predicting fraud. This study is based on an auto insurance fraud dataset. The raw dataset contains more than a thousand customers with 36 attributes. Fig. 1 shows the various age groups of policyholders and Fig. 2 presents the ranges of the annual premium amount. The success of any classifier does not depend only upon the type of model used; the quality of the training data is also important for satisfactory results. To achieve better results, a data pre-processing strategy is employed.

Fig. 1: Age Group of Policyholders

Fig. 2: Range of Policy Annual Premium Amount

4.2 Data Pre-Processing

Data pre-processing plays a considerable role in data mining. Clean data is usually not available, and a dataset may contain impossible combinations, missing values, noise, inconsistencies, etc. [17]. The quality of the data is the first and foremost requirement before applying an algorithm [18]. Data pre-processing may also affect the way the final outcomes are interpreted [19].

First, the dataset is explored for categorical data, i.e. attributes comprising categorical variables. The dataset contains non-numeric attributes such as insured_sex, PoliceReportFiled, WitnessPresent, insured_hobbies, etc., which are converted using a one-hot encoder: the integer-encoded variable is removed and a new binary variable is added for each unique value. The model performs progressively better when the features are on a comparable scale and close to normally distributed.

Suppose one of the features has an outlier; then the distance will be governed by this feature. Secondly, gradient descent converges much faster with feature scaling. Here the dataset is passed through both MinMaxScaler and StandardScaler to make the data scaled and close to a normal distribution.
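
A sketch of these pre-processing steps, assuming pandas and scikit-learn; the tiny frame below is hypothetical and only the column names are borrowed from the dataset description above.

```python
# One-hot encoding of categorical columns followed by feature scaling (illustrative only).
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler

df = pd.DataFrame({
    "insured_sex": ["MALE", "FEMALE", "FEMALE"],
    "PoliceReportFiled": ["YES", "NO", "YES"],
    "policy_annual_premium": [1200.5, 980.0, 1430.2],   # hypothetical numeric feature
})

# One-hot encoding: each unique category becomes its own binary column.
encoded = pd.get_dummies(df, columns=["insured_sex", "PoliceReportFiled"])
print(encoded.columns.tolist())

# MinMaxScaler maps values to [0, 1]; StandardScaler gives zero mean and unit variance.
minmax = MinMaxScaler().fit_transform(encoded[["policy_annual_premium"]])
standard = StandardScaler().fit_transform(encoded[["policy_annual_premium"]])
print(minmax.ravel(), standard.ravel())
```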

4.3 Proposed Work

The fraud detection approach involves a number of stages. Fig. 3 shows an overview of the fraud detection system: after pre-processing, the eight classifiers (Logistic Regression, Decision Tree, KNN, SVM, Adaboost, Random Forest, Naive Bayes and MLP) are applied, followed by prediction and evaluation.

Fig. 3: Fraud Detection System
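
A minimal sketch of the comparison pipeline in Fig. 3 (pre-processing, eight classifiers, prediction, evaluation), assuming scikit-learn and synthetic data; this is not the authors' code, and a logistic-regression classifier is used for the regression-based model, following the label in Fig. 3.

```python
# Sketch of the Fig. 3 pipeline: pre-processing -> eight classifiers -> prediction -> evaluation.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# Imbalanced synthetic data stands in for the pre-processed insurance claims.
X, y = make_classification(n_samples=1000, n_features=20, weights=[0.75, 0.25], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),  # regression-based model, per Fig. 3
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(),
    "Adaboost": AdaBoostClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),
    "MLP": MLPClassifier(max_iter=500, random_state=0),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: test accuracy = {accuracy_score(y_test, model.predict(X_test)):.2f}")
```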

5. EXPERIMENTAL RESULTS

For performance evaluation, we have computed five metrics: accuracy, precision, recall, F1-score, and the confusion matrix. Precision is the proportion of relevant cases among the retrieved cases, while recall is the proportion of the total relevant cases that are retrieved. The F1-score is the harmonic mean of precision and recall, while the confusion matrix summarizes the performance of an ML algorithm; the results are reported in Table 1 and Table 2.

In this paper, we consider an auto insurance fraud detection dataset and execute a sample that contains 110 customers with corresponding attributes. The eight classification models have been validated using evaluation metrics such as precision, recall, and F1-score; Table 1 reports the corresponding macro and weighted averages, and Table 2 reports the confusion matrix, accuracy and per-class scores. The results of the experiment show that the Decision Tree outperforms the other models in all aspects, such as execution time, insensitivity to outliers, and the reduction of noise. The results obtained using the classification algorithms stand out because they use a real sample obtained from a reliable repository. For all the experiments in this section, the performance shown is based on the test dataset. Also, Adaboost gave a classification accuracy close to that of the Decision Tree: the classification accuracy of Adaboost is 78%. The precision, recall, and F1-score are computed using equations (1), (2) and (3):

Precision = TP / (TP + FP)                                    (1)

Recall = TP / (TP + FN)                                       (2)

F1-Score = (2 × Precision × Recall) / (Precision + Recall)    (3)

where TP, FP and FN denote true positives, false positives and false negatives, respectively.
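
A direct implementation of equations (1)-(3); the TP, FP and FN counts below are hypothetical and serve only to show the arithmetic.

```python
# Precision, recall and F1-score from hypothetical confusion-matrix counts.
def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def f1_score(p, r):
    return 2 * p * r / (p + r)

tp, fp, fn = 30, 10, 20          # hypothetical counts, not taken from the paper
p, r = precision(tp, fp), recall(tp, fn)
print(f"precision={p:.2f}, recall={r:.2f}, F1={f1_score(p, r):.2f}")
# precision=0.75, recall=0.60, F1=0.67
```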

Figs. 4 and 5 show the performance metrics (precision, recall and F1-score) of each classification algorithm; these metrics range from 0 to 1, and a value of 1 indicates that the system performs well.

Fig. 4: Performance metrics (precision, recall, F1-score) per classification algorithm for the "Yes" class

Fig. 5: Performance metrics (precision, recall, F1-score) per classification algorithm for the "No" class

Table 1: Macro and weighted averages of Precision, Recall and F1-Score

Algorithm            Precision           Recall              F1-Score
                     Macro / Weighted    Macro / Weighted    Macro / Weighted
Linear Regression    0.50 / 1.00         0.36 / 0.72         0.42 / 0.84
Decision Tree        0.73 / 0.78         0.73 / 0.79         0.73 / 0.78
KNN                  0.50 / 1.00         0.36 / 0.72         0.42 / 0.84
SVM                  0.50 / 1.00         0.36 / 0.72         0.42 / 0.84
Adaboost             0.66 / 0.82         0.72 / 0.78         0.42 / 0.84
Random Forest        0.70 / 0.80         0.72 / 0.78         0.68 / 0.79
Naïve Bayes          0.50 / 0.80         0.72 / 0.77         0.71 / 0.79
MLP                  0.50 / 1.00         0.36 / 0.72         0.42 / 0.84

Table 2: Confusion matrix, accuracy, and per-class (class 0 / class 1) Precision, Recall and F1-Score

Algorithm            Confusion matrix       Precision (0 / 1)   Recall (0 / 1)   F1-Score (0 / 1)   Accuracy
Linear Regression    [[145 55] [0 0]]       1.00 / 0.54         0.72 / 0.53      0.84 / 0.54        73%
Decision Tree        [[123 24] [22 31]]     0.85 / 0.62         0.85 / 0.61      0.85 / 0.61        79%
KNN                  [[145 55] [0 0]]       1.00 / 0.51         0.72 / 0.34      0.84 / 0.34        77%
SVM                  [[145 55] [0 0]]       1.00 / 0.64         0.72 / 0.36      0.84 / 0.32        73%
Adaboost             [[132 32] [13 23]]     0.91 / 0.42         0.80 / 0.64      0.85 / 0.51        78%
Random Forest        [[126 29] [19 26]]     0.88 / 0.51         0.83 / 0.62      0.85 / 0.56        76%
Naïve Bayes          [[135 51] [0 0]]       0.87 / 0.41         0.81 / 0.42      0.82 / 0.41        73%
MLP                  [[145 55] [0 0]]       1.00 / 0.40         0.72 / 0.44      0.84 / 0.34        73%

6. CONCLUSIONS AND FUTURE WORK

In our research, the classification algorithms Random Forest, Decision Tree, Support Vector Machine, K-Nearest Neighbor, Adaboost, Linear Regression, Naïve Bayes, and Multi-Layer Perceptron are employed to detect fraud. We audited various techniques and conducted experiments on an auto-insurance dataset from a reliable repository to find or adapt the best classifier for the fraud detection system. Furthermore, the system has been analyzed in terms of precision, recall, and F1-score for all the algorithms.

In the future, the fraud detection method can be extended to the Adaptive Neuro-Fuzzy Inference System (ANFIS), which combines fuzzy inference and neural networks; hence, the prediction can be made more accurate. A Hidden Markov Model (HMM) could also be used to predict fraud using internal factors.

REFERENCES

1. Nian K., Zhang H., Tayal A., Coleman T., Li Y., "Auto insurance fraud detection using unsupervised spectral ranking for anomaly", The Journal of Finance and Data Science, Vol. 2, No. 1, pp. 58–75, 2016.
2. Kirlidog M., Asuk C., "A Fraud Detection Approach with Data Mining in Health Insurance", Procedia - Social and Behavioral Sciences, Vol. 62, pp. 989–994, 2012.
3. Sathya R., Abraham A., "The Science and Information Organization Editorial Preface", International Journal of Advanced Research in Artificial Intelligence, Vol. 2, No. 2, pp. 34–38, 2013.
4. Rätsch G., "A brief introduction into machine learning", Proceedings of the 21st Chaos Communication Congress, pp. 1–6, Berlin, Germany, 27-29 December 2004.
5. Wang H., Shi Y., Zhou X., Zhou Q., Shao S., Bouguettaya A., "Web service classification using support vector machine", Proceedings of the International Conference on Tools with Artificial Intelligence, Vol. 1, pp. 3–6, Arras, France, 27-29 October 2010.
6. Sun C., Li Q., Li H., Shi Y., Zhang S., Guo W., "Patient Cluster Divergence Based Healthcare Insurance Fraudster Detection", IEEE Access, Vol. 7, pp. 14162–14170, 2019.
7. Dhieb N., Ghazzai H., Besbes H., Massoud Y., "A Secure AI-Driven Architecture for Automated Insurance Systems: Fraud Detection and Risk Measurement", IEEE Access, Vol. 8, pp. 58546–58558, 2020.
8. Kirlidog M., Asuk C., "A Fraud Detection Approach with Data Mining in Health Insurance", Procedia - Social and Behavioral Sciences, Vol. 62, pp. 989–994, 2012.
9. Bhowmik R., "Detecting Auto Insurance Fraud by Data Mining Techniques", Journal of Emerging Trends in Computing and Information Sciences, Vol. 2, No. 4, pp. 156–162, 2011.
10. Liu S., Yang B., Wang L., Abraham A., "Advances in Nature and Biologically Inspired Computing", Advances in Intelligent Systems and Computing, Vol. 419, 2016.
11. Bennett K. P., Campbell C., "Support vector machines: hype or hallelujah?", ACM SIGKDD Explorations Newsletter, Vol. 2, No. 2, pp. 1–13, 2000.
12. Noor A., Islam M., "Sentiment Analysis for Women's E-commerce Reviews Using Machine Learning Algorithms", Proceedings of the 10th International Conference on Computing, Communication and Networking Technologies (ICCCNT), pp. 1–6, Kanpur, India, 6-8 July 2019.

13. Priya B. G., "Emoji Based Sentiment Analysis Using KNN", International Journal of Scientific Research and Reviews, Vol. 7, No. 4, pp. 859–865, 2019.
14. Suresh A., Bharathi C. R., "Sentiment Classification using Decision Tree Based Feature Selection", International Journal of Control Theory and Applications, Vol. 9, No. 36, pp. 419–425, 2016.
15. Mohamed L., Kamal E. E. K., Yassine A. A., "Random Forest and Support Vector Machine based Hybrid Approach to Sentiment Analysis", Procedia Computer Science, Vol. 127, pp. 511–520, 2018.
16. Ankit, Saleena N., "An Ensemble Classification System for Twitter Sentiment Analysis", Procedia Computer Science, Vol. 132, pp. 937–946, 2018.
17. Han J., Kamber M., Pei J., "Introduction", in Han J., Kamber M., Pei J. (Eds.): Data Mining (Third Edition), pp. 1–38, Morgan Kaufmann, 2012.
18. Zhang S., Zhang C., Yang Q., "Data preparation for data mining", Applied Artificial Intelligence, Vol. 17, No. 5–6, 2003.
19. Oliveri P., Malegori C., Simonetti R., Casale M., "The impact of signal pre-processing on the final interpretation of analytical outcomes - A tutorial", Analytica Chimica Acta, Vol. 1058, pp. 9–17, 2019.
