International Journal of Recent Technology and Engineering (IJRTE)

ISSN: 2277-3878 (Online), Volume-8 Issue-6, March 2020

Sentiment Analysis on E-commerce Product using Machine Learning and Combination of TF-IDF and Backward Elimination
Tommy Willianto, Supryadi, Antoni Wibowo
Abstract: E-commerce is a website or mobile application platform that helps people buy products. Before purchasing a product, customers decide whether to buy it by reading reviews from previous buyers. The problem is that there are so many reviews that reading them all takes a long time. This research uses sentiment analysis to classify the review data. Sentiment analysis, or opinion mining, is a machine learning approach to classifying and analysing texts or documents that express human sentiments, emotions, and opinions. In this research, sentiment analysis was used to classify product reviews from e-commerce websites into positive or negative classes. The results could be processed further and used to summarize customers' opinions about a certain product without reading every single review. The goal of this research is to optimize classification performance by using a feature selection technique. Term Frequency-Inverse Document Frequency (TF-IDF) feature extraction, Backward Elimination feature selection, and five different classifiers (Naïve Bayes, Support Vector Machine, K-Nearest Neighbour, Decision Tree, Random Forest) were used in analysing the sentiment of the reviews. The dataset is in the Indonesian language and is classified into two classes (positive and negative). The best accuracy is achieved by using TF-IDF, Backward Elimination, and Support Vector Machine (SVM), with a score of 85.97%, an increase of 7.91% compared to the process without feature selection. Based on the results, Backward Elimination feature selection succeeded in improving the performance of all classifiers used in this research.

Keywords: Backward elimination, e-commerce, sentiment analysis, TF-IDF (term frequency-inverse document frequency)

I. INTRODUCTION
In this era, information technology plays a big role in making people's lives more comfortable. Even buying and selling products has become easier with online e-commerce, so people do not need to go outside to buy things. E-commerce (electronic commerce) is the selling, distributing, and marketing of products and services using electronic systems [1]. Tokopedia, Shopee, and Bukalapak are currently the three leading e-commerce platforms in Indonesia [2].
In a marketplace's web and mobile applications, customers can give reviews and ratings after purchasing a product. Many e-commerce websites encourage online users to post their evaluations of a product or service. Reviews and ratings give sellers feedback from customers, and they help other customers find out whether a store is reliable and good [3]. However, when there are many purchase transactions, there are also many reviews of the product. Too many reviews discourage users from reading them one by one.
Sentiment analysis, or opinion mining, is a machine learning approach to classifying and analysing texts or documents that express human sentiments, emotions, and opinions. It classifies texts into classes according to the labels present in the training data. Reading all of the reviews is time consuming, whereas with machine learning the reviews can be summarized by category [4]. The importance of sentiment analysis is increasing as the amount of opinion data increases, so the machine needs to be more reliable and efficient [5].
In this paper, the research focuses on combining TF-IDF and feature selection with different classification algorithms. The algorithms used in this research are Support Vector Machine (SVM), Naive Bayes, Decision Tree, K-Nearest Neighbour (K-NN), and Random Forest. Feature selection uses the Backward Elimination algorithm.

Manuscript received on February 10, 2020.
Revised Manuscript received on February 20, 2020.
Manuscript published on March 30, 2020.
* Correspondence Author
Tommy Willianto, Computer Science Department, Binus Graduate Program – Master of Computer Science, Bina Nusantara University, Jakarta, Indonesia, 11480. Email: [email protected]
Supryadi, Computer Science Department, Binus Graduate Program – Master of Computer Science, Bina Nusantara University, Jakarta, Indonesia, 11480. Email: [email protected]
Antoni Wibowo, Computer Science Department, Binus Graduate Program – Master of Computer Science, Bina Nusantara University, Jakarta, Indonesia, 11480. Email: [email protected]

© The Authors. Published by Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Retrieval Number: F7889038620/2020©BEIESP
DOI: 10.35940/ijrte.F7889.038620
Journal Website: www.ijrte.org
Published By: Blue Eyes Intelligence Engineering & Sciences Publication

II. RELATED WORKS
Kamilah researched sentiment analysis for product reviews using the Naive Bayes algorithm. The dataset was retrieved from the Tokopedia website and translated to English, with 200 training records and 20 testing records. The data were pre-processed with tokenization, filtering, stemming, and transformation, then classified into positive and negative classes using the Naive Bayes algorithm and validated with cross-validation. The resulting accuracy was 77% [6].
Different algorithms are used in Hario's research, which also compares two feature extraction methods, Term Frequency-Inverse Document Frequency (TF-IDF) and N-gram, using an SVM classifier. The retrieved datasets are in Bahasa Indonesia. The best result was obtained using unigrams and SVM, with 80.87% accuracy [7].
Billy worked on sentiment analysis using the Naive Bayes algorithm with different numbers of classes and different amounts of training data. The datasets are Bahasa Indonesia sentiments classified into 3 classes (positif, netral, negatif; that is, positive, neutral, negative) and 5 classes (sangat positif, positif, netral, negatif, sangat negatif; that is, very positive, positive, neutral, negative, very negative). The dataset was split using two different training proportions, 80% and 90% [6]. The highest accuracy achieved was 77.78%, obtained with 3 classes and 90% training data [8].
Twitter users' opinions about marketplace services are used in Muljono's work. The dataset was retrieved by crawling opinions from Twitter using the Twitter API. The collected data comprised 1200 opinions in Bahasa Indonesia [9]. After pre-processing, the data were weighted using TF-IDF and classified using the Naive Bayes algorithm. The accuracy achieved was 93.3%.
Sudheer also worked on real-time sentiment analysis of tweets about e-commerce websites. The collected tweets comprise 50000 about Amazon, 25000 about eBay, and 25000 about Alibaba. The work focuses on comparing accuracy across different classifiers, feature selection methods, and datasets. The algorithms used are Naive Bayes, Maximum Entropy, and Decision Tree. The feature selection methods used are document frequency and part-of-speech tags. The best result was obtained on the Amazon dataset, and most of the time the Naive Bayes classifier outperformed the other classifiers [10].

Table-I: Related Works

No | Title | Author | Problem | Method | Result
1 | Analisa Sentimen Pelanggan Tokopedia Menggunakan Algoritma Naive Bayes Berdasarkan Review Pelanggan | Ai Nurhayatul Kamilah | Product quality has not been conveyed properly to customers | Naive Bayes classification; Tokopedia product review dataset translated to English; 200 training records, 20 testing records | Accuracy 77%
2 | Support Vector Machine Classifier for Sentiment Analysis of Feedback Marketplace with a Comparison Features at Aspect Level | Hario Laskito Ardi, Eko Sediyono, Retno Kusumaningrum | Compare characteristics analysis to get the best classification results | TF-IDF and n-gram using Support Vector Machine | The unigram character analysis model and the Support Vector Machine classification are the best models, with an accuracy of 80.87%
3 | Sistem Analisis Sentimen pada Ulasan Produk Menggunakan Metode Naive Bayes | Billy Gunawan, Helen Sasty Pratiwi, Enda Esyudha Pratama | Difficulty reading all of the reviews and opinions because there is too much data | Classification with the Naive Bayes algorithm; 4 tests, classified into 3 classes and 5 classes with 2 different amounts of training data | Accuracy 77.78% (3 classes, 90% training data), 73.89% (3 classes, 80% training data), 59.33% (5 classes, 90% training data), 52.66% (5 classes, 80% training data)
4 | Analisa Sentimen Untuk Penilaian Pelayanan Situs Belanja Online Menggunakan Algoritma Naive Bayes | Muljono, Dian Putri Artanti, Abdul Syukur, Adi Prihandono, De Rosal I. Moses Setiadi | Consumers use social media to express their opinions about the services of online marketplaces | Naive Bayes classification algorithm; 1200 records from Twitter | Accuracy 93.3%
5 | Real Time Sentiment Analysis of E-Commerce Websites Using Machine Learning Algorithms | Prof. K. Sudheer, Dr. B. Valarmathi | The e-commerce websites only maintain positive ratings | 50000 Amazon tweets, 25000 eBay tweets, 25000 Alibaba tweets | Best accuracy 92%, obtained on the Amazon data using document frequency feature selection

III. METHODOLOGY
Sentiment analysis is a computational methodology for identifying and extracting sentiment content from text, speech, or databases. It also characterizes emotions, subjective impressions, and opinions [11]. Classifying sentiments in e-commerce product reviews with the best performance is the main goal of this paper. Sentiments are classified into either positive or negative classes. Fig. 1 shows the steps of the proposed work.

Fig. 1. Research Methodology


A. Data Collection and Labelling
The first step in this research methodology is data collection from the marketplaces' websites using web scraping. Mitchell (2018) defined web scraping as gathering data sourced from the internet. It works by accessing a web page, selecting the data elements, and extracting and storing them in a structured dataset [12].
Data is collected by scraping web pages with the help of the Data Miner Google Chrome extension. A total of 1500 reviews is acquired in equal parts from the official websites of Tokopedia, Shopee, and Bukalapak in order to obtain a variety of data. The collected data is then exported to an Excel spreadsheet. After the data is collected, the labelling process takes place. Labelling is a tagging process which, in this case, assigns a positive or negative label to classify the sentiment of a review. The result of this process is a dataset that contains two classes of sentiment and is ready to be processed. Fig. 2 shows some examples of the labelled dataset.

Fig. 2. Labeled Dataset Examples

B. Pre-processing
Data collected in the previous step must undergo several cleansing operations before it can be processed and used in machine learning. This step is known as pre-processing. Pre-processing is one of the most important steps in data mining, and a good result depends on how well the data is handled. There are many pre-processing techniques, including filtering, stemming, and tokenizing [13]. Pre-processing in this paper is done with the help of RapidMiner software. The steps applied to the data are as follows.
• Remove Duplicates and Missing Values
Large amounts of data often contain duplicate and missing values. These values are called noise and could interfere with the performance of the model. Therefore it is important to remove duplicate and missing values from the data.
• Replace Emoji and Emoticon
Emoji and emoticons are useful for expressing feelings in a sentiment and are a very strong way to represent human feelings. They can be used alone or together with words to clarify the meaning of a sentence. Since sentiment analysis uses text mining, emoji and emoticons should be converted into text so that the machine can process them [14]. Fig. 3 shows some examples of emoji and emoticons that are converted to text.

Fig. 3. Examples of Emoji and Emoticons that will be Converted

• Stemming
Sentiment analysis works better with stem words, so words need to be transformed to their original form. Stemming transforms a word into its stem form by removing prefixes, suffixes, and infixes. In this research, stemming uses the PHP library from https://github.com/sastrawi/sastrawi [15].
• Transform Cases
A word consists of letters in different cases, such as upper case and lower case. To standardize letter cases, all letters are converted to lower case. Transforming cases also improves the consistency of the data [16].
• Filter Stopwords
Stopwords are common words that are not distinctive to a document. Examples are "di" (in/at), "oleh" (by), "pada" (on/at), "sebuah" (a/an), "karena" (because), etc. Stopwords are removed to improve the performance of sentiment analysis [17].
• Tokenize and Filter Tokens by Length
Tokenizing is the process of separating text into words, producing tokens. The generated tokens are then filtered according to their length in characters [18].

C. Feature Extraction
Feature extraction is a dimensionality reduction process which transforms the original data into a dataset with a reduced number of variables. These variables are also known as features. Feature extraction is effective in reducing the amount of data that needs to be processed while still maintaining the relevant information of the original dataset. It can also reduce redundant data in a dataset and speed up the machine learning process [19].
• TF-IDF
Term frequency-inverse document frequency (TF-IDF) is a method for calculating the weight of a word (term) contained in a document. Term frequency measures how often a term appears in a document (1), while the inverse document frequency is the logarithm of the ratio of the total number of documents in the corpus to the number of documents that contain the term (2). TF-IDF is the product of the two (3) [20]:

\[ \mathrm{tf}(t, d) = f_{t,d} \qquad (1) \]
\[ \mathrm{idf}(t) = \log \frac{N}{\mathrm{df}(t)} \qquad (2) \]
\[ \mathrm{tfidf}(t, d) = \mathrm{tf}(t, d) \times \mathrm{idf}(t) \qquad (3) \]

where f_{t,d} is the number of times term t appears in document d, N is the total number of documents in the corpus, and df(t) is the number of documents that contain term t.

D. Feature Selection
Feature selection is one of the main data mining tasks. It helps in selecting the most relevant features for classification. Irrelevant and redundant features may confuse the classifier and lead to incorrect results. Feature selection reduces the dimensionality of the dataset and increases learning accuracy [21].
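As a rough illustration of the TF-IDF weighting described in Section C, the following Python sketch computes a weight for every term in every document. It is not the RapidMiner operator used in this research: the function name `tf_idf`, the raw-count term frequency, the natural logarithm, and the three tokenized sample reviews are all assumptions made for the example.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Return one {term: weight} dict per document (illustrative sketch)."""
    n_docs = len(docs)
    # df[t] = number of documents that contain term t at least once
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw count of each term in this document
        # weight = term frequency * log(total documents / document frequency)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


# Hypothetical, already pre-processed reviews (not taken from the dataset).
reviews = [
    ["barang", "bagus", "bagus"],
    ["barang", "rusak"],
    ["pengiriman", "cepat", "barang"],
]
w = tf_idf(reviews)
print(w[0])
```

In this toy corpus "barang" occurs in every review, so its inverse document frequency is log(3/3) = 0 and its weight vanishes, while "bagus" occurs twice in a single review and receives the weight 2 log 3.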
• Backward Elimination
There are many feature selection methods. In this research, backward elimination is used to improve the process by selecting the most relevant attributes and eliminating rare and unused features. Backward Elimination starts with the full set of attributes and, in each iteration, tentatively removes each remaining attribute of the given dataset in turn. For each removed attribute, the performance is estimated using the inner operators. The attribute whose removal gives the least decrease in performance is then removed from the selection, and a new iteration starts with the modified selection. This implementation avoids any additional memory consumption besides the memory used originally for storing the data and the memory which might be needed for applying the inner operators [22].

E. Data Splitting
To continue the process, the data needs to be split into two parts. The first part is for training and the other part is for testing. The dataset is split into 80% for training and 20% for testing. The training data is used to train the machine to classify the sentiments of the reviews, while the testing data is used to test the performance of the trained model.

F. Machine Learning Classification
Text classification has been studied in different communities of information technology, such as data mining, databases, machine learning, and information retrieval. The goal of text classification is to assign predefined classes to text documents. There are many applications of text classification, such as image processing, medical diagnosis, document organization, etc. [23]. There are many algorithms for classifying data, each of which produces different performance results. In this study, 5 algorithms are used in order to achieve the best results.
• Support Vector Machine (SVM)
SVM is a classifier defined by a separating hyperplane. The goal of SVM is to find the optimal separating hyperplane (OSH), which has the maximal margin to both sides [24].
• Naive Bayes
Naive Bayes is an algorithm that finds the highest probability value in order to classify the testing data into the most appropriate category. Its main characteristic is a very strong assumption of independence between conditions or events. Each document is represented as a tuple of attributes (x1, x2, ..., xn), where x1 is the first word, x2 is the second word, etc. [25].
• Decision Tree
Decision trees are considered one of the most popular data mining techniques. A decision tree recursively splits a dataset of records using a depth-first or breadth-first approach. The process continues until all data items have been classified. This algorithm is desirable for small to medium data sets [26].
• K-Nearest Neighbour
KNN is a vector space model method for classifying objects. It classifies an object on the premise that a test document d will have the same category or label as the training documents located within the neighbourhood of the k documents nearest to d. The parameter k in KNN is often chosen based on experience or knowledge of the classification problem at hand [27].
• Random Forest
Random forest is a supervised machine learning algorithm based on ensemble learning (forming a powerful prediction model by joining different algorithms, or the same algorithm multiple times). The algorithm combines multiple decision trees, resulting in a forest of trees. Random forest splits each node in a tree using the best among a subset of predictors randomly chosen at that node [28].

G. Evaluation
Confusion matrix evaluation is used after classifying the data. The confusion matrix is shown in Fig. 4. True Positive (TP) is the number of records labelled as positive and classified as positive by the classifier. False Positive (FP) is the number of records labelled as negative but classified as positive by the classifier. True Negative (TN) is the number of records labelled as negative and classified as negative by the classifier. False Negative (FN) is the number of records labelled as positive but classified as negative by the classifier.

Fig. 4. Confusion Matrix Table [29]

The performance of the classifier is measured with accuracy, precision, recall, and F-measure [30].

IV. RESULTS AND DISCUSSION
The focus of this paper is to show that feature selection can be an option for improving accuracy in sentiment analysis. The feature selection method used in this research is Backward Elimination. TF-IDF and Backward Elimination are combined and used with the following classification operators: SVM, Naive Bayes, Decision Tree, K-NN, and Random Forest.
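The Backward Elimination procedure whose effect is evaluated in this section can be sketched in a few lines of Python. This is only a minimal greedy sketch under stated assumptions, not the RapidMiner operator used in the experiments: the function `backward_elimination`, the stopping rule (stop as soon as every possible removal lowers the score), and the hand-made `toy_score` function with its four Indonesian words are hypothetical and introduced purely for illustration. In a real experiment the score would be the performance of a classifier trained on the selected features.

```python
from typing import Callable, FrozenSet, Iterable, Tuple

Score = Callable[[FrozenSet[str]], float]


def backward_elimination(features: Iterable[str],
                         score: Score) -> Tuple[FrozenSet[str], float]:
    """Greedy backward elimination over a set of feature names (sketch)."""
    selected = frozenset(features)
    best = score(selected)
    while len(selected) > 1:
        # Tentatively drop each remaining feature and re-score what is left.
        trials = [(score(selected - {f}), f) for f in sorted(selected)]
        trial_score, dropped = max(trials)
        # Assumed stopping rule: stop once every removal lowers the score.
        if trial_score < best:
            break
        selected, best = selected - {dropped}, trial_score
    return selected, best


# Hypothetical contribution of each word to a made-up performance score.
TOY_GAIN = {"bagus": 0.30, "kecewa": 0.30, "yang": -0.05, "dan": -0.02}


def toy_score(subset: FrozenSet[str]) -> float:
    # Stand-in for "train a classifier on these features and measure it".
    return 0.25 + sum(TOY_GAIN[f] for f in sorted(subset))


selected, best = backward_elimination(TOY_GAIN, toy_score)
print(sorted(selected), round(best, 2))
```

With this toy score the two uninformative words "yang" and "dan" are eliminated first, and the loop stops when only "bagus" and "kecewa" remain, because removing either would lower the score. Each iteration re-scores every remaining feature, which is consistent with the longer runtime reported for larger datasets.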

Table-II: Confusion Matrix

Table-II shows the confusion matrices used to calculate accuracy, precision, recall, and F-measure. It compares the confusion matrix of sentiment analysis using TF-IDF alone with that of sentiment analysis using the combination of TF-IDF and Backward Elimination, for five different machine learning methods.

Table-III: Results Comparison Without Feature Selection

Table-IV: Results Comparison With Feature Selection

Fig. 5. Results Comparison

Table-III and Table-IV show the comparison of accuracy, precision, recall, and F-measure between sentiment analysis using TF-IDF alone and sentiment analysis using the combination of TF-IDF and Backward Elimination, for five different machine learning methods.
The highest accuracy for classifying sentiments in this research is 85.97%. It is achieved by using SVM with the combination of TF-IDF and Backward Elimination. Although Backward Elimination feature selection increases the process runtime, it has shown better results in accuracy, precision, recall, and F-measure for all classifiers used in this paper. Therefore Backward Elimination feature selection succeeded in achieving the expectation of this research. The results show that a feature selection method can be an option for improving performance in sentiment analysis.

V. CONCLUSION
The objective of this research is to optimize sentiment analysis performance by using a feature selection strategy. Product reviews from Tokopedia, Shopee, and Bukalapak were used as the dataset, while TF-IDF feature extraction, Backward Elimination feature selection, and the SVM, Naive Bayes, Decision Tree, K-NN, and Random Forest classifiers were used in analysing the sentiment of the reviews. The best accuracy is achieved by using TF-IDF and Backward Elimination with SVM, with a score of 85.97%, an increase of 7.91% after applying feature selection. From the results, Backward Elimination succeeded in improving accuracy, precision, recall, and F-measure for all classifiers used in this research, compared to sentiment analysis that did not use any feature selection. The concern in using Backward Elimination feature selection is the longer runtime as the dataset gets bigger. Overall, it can be concluded that a feature selection technique can be used to optimize the performance of 2-class classification in sentiment analysis of e-commerce product reviews. For future work, it is highly recommended to use larger datasets and to compare against other feature selection methods.

ACKNOWLEDGMENT
The authors would like to take this opportunity to express their deepest gratitude to all those who have helped in completing this study, especially to Bina Nusantara University for supporting this research project.

REFERENCES
1. N. Kristiadi, "E-Commerce, Manfaat, dan Keuntungannya," 15 August 2017. [Online]. Available: https://www.kompasiana.com/novikristiadi/5992634e93be2508e06c5402/e-commerce-manfaat-dan-keuntungannya. [Accessed 13 November 2019].
2. Aseanup, "Top 10 e-commerce sites in Indonesia 2019," 6 November 2019. [Online]. Available: https://aseanup.com/top-e-commerce-sites-indonesia/. [Accessed 13 November 2019].
3. R. Liang and J.-q. Wang, "A Linguistic Intuitionistic Cloud Decision Support Model with Sentiment Analysis for Product Selection in E-commerce," International Journal of Fuzzy System, 2019.
4. T. U. Haque, N. N. Saber and F. M. Shah, "Sentiment Analysis on Large Scale Amazon Product Reviews," IEEE International Conference on Innovative Research and Development, 2018.
5. R. Safrin, K. Sharmila, T. Subangi and E. Vimal, "Sentiment Analysis on Online Product Review," International Research Journal of Engineering and Technology (IRJET), vol. 4, no. 4, pp. 2381-2388, 2017.
6. A. N. Kamilah, "Analisa Sentimen Pelanggan Tokopedia Menggunakan Algoritma Naive Bayes Berdasarkan Review Pelanggan," Simki-Techsain, vol. 1, no. 6, pp. 1-13, 2017.
7. H. L. Adi, E. Sediyono and R. Kusumaningrum, "Support Vector Machine Classifier for Sentiment Analysis of Feedback Marketplace with a Comparison Features at Aspect Level".
8. B. Gunawan, H. S. Pratiwi and E. E. Pratama, "Sistem Analisis Sentimen pada Ulasan Produk Menggunakan Metode Naive Bayes," Jurnal Edukasi dan Penelitian Informatika (JEPIN), vol. 4, no. 2, pp. 113-118, 2018.
9. M. D. P. Artanti, A. Syukur, A. Prihandono and D. R. I. M. Setiadi, "Analisa Sentimen untuk Penilaian Pelayanan Situs Belanja Online Menggunakan Algoritma Naïve Bayes," in Konferensi Nasional Sistem Informasi 2018, Pangkalpinang, 2018.
10. K. Sudheer and B. Valarmathi, "Real Time Sentiment Analysis of E-Commerce Websites Using Machine Learning Algorithms," International Journal of Mechanical Engineering and Technology (IJMET), vol. 9, no. 2, pp. 180-193, 2018.
11. Y. Hedge and S. Padma, "Sentiment Analysis Using Random Forest Ensemble for Mobile Product Reviews in Kannada," IEEE 7th International Advance Computing Conference, pp. 777-782, 2017.
12. M. R. Herga, "Implementasi Text Mining Sistem Klasifikasi dan Pencarian Konten Buku Perpustakaan Menggunakan Algoritma Naïve Bayes Classifier".
13. D. Virmani and S. Taneja, "A Text Preprocessing Approach for Efficacious Information Retrieval," Smart Innovations in Communication and Computational Sciences, Advances in Intelligent Systems and Computing 669, pp. 13-22, 2019.
14. M. A. Sghaier and M. Zrigui, "Sentiment Analysis for Arabic E-commerce Websites," in 2016 International Conference on Engineering & MIS (ICEMIS), Agadir, Morocco, 2016.
15. O. Somantri, "Text Mining untuk Klasifikasi Kateogori Cerita Pendek Menggunakan Naïve Bayes (NB)," Jurnal Telematika, vol. 12, no. 1, 2017.
16. V. Kalra and R. Aggarwal, "Importance of Text Data Preprocessing & Implementation in RapidMiner," in First International Conference on Information Technology and Knowledge Management (ICTKM), New Delhi, 2018.
17. Y. T. Pratama, F. A. Bachtiar and N. Y. Setiawan, "Analisis Sentimen Opini Pelanggan Terhadap Aspek Pariwisata Pantai Malang Selatan Menggunakan TF-IDF dan Support Vector Machine," Jurnal Pengembangan Teknologi Informasi dan Ilmu Komputer, vol. 2, no. 12, pp. 6244-6252, 2018.
18. S. Fatima and B. Srinivasu, "Text Document Categorization Using Support Vector Machine," International Research Journal of Engineering and Technology (IRJET), vol. 4, no. 2, pp. 141-147, 2017.
19. "Feature Extraction," DeepAI, [Online]. Available: https://deepai.org/machine-learning-glossary-and-terms/feature-extraction. [Accessed 8 February 2020].
20. B. Kurniawan, S. Effendi and O. S. Sitompul, "Klasifikasi Konten Berita dengan Metode Text Mining," Jurnal Dunia Teknologi Informasi, vol. 1, no. 1, pp. 14-19, 2012.
21. P. Kumbhar and M. Mali, "A Survey on Feature Selection Techniques and Classification Algorithms for Efficient Text Classification," International Journal of Science and Research (IJSR), vol. 5, no. 5, pp. 1267-1275, 2016.
22. "Backward Elimination (RapidMiner Studio Core)," [Online]. Available: https://docs.rapidminer.com/latest/studio/operators/modeling/optimization/feature_selection/optimize_selection_backward.html. [Accessed 8 February 2020].
23. M. Allahyari, S. Pouriyeh, M. Assefi, S. Safaei, E. D. Trippe, J. B. Gutierrez and K. Kochut, "A Brief Survey of Text Mining: Classification, Clustering, and Extraction Techniques," arXiv, 2017.
24. A. A. Lutfi, A. E. Permanasari and S. Fauziati, "Sentiment Analysis in the Sales Review of Indonesian Marketplace by Utilizing Support Vector Machine," Journal of Information Systems Engineering and Business Intelligence, vol. 4, no. 1, pp. 57-64, 2018.
25. M. N. Saadah, R. W. Atmagi, D. S. Rahayu and A. Z. Arifin, "Information Retrieval of Text Document with Weighting TF-IDF and LCS," Journal of Computer Science and Information, vol. 6, no. 1, p. 34, 2013.
26. K. M. Almunirawi and A. Y. Maghari, "A Comparative Study on Serial Decision Tree Classification Algorithms in Text Mining," International Journal of Intelligent Computing Research (IJICR), vol. 7, no. 4, pp. 754-760, 2016.
27. A. Sukma, B. Zaman and E. Purwanti, "Information Retrieval Document Classified with K-Nearest Neighbor," Record and Library Journal, vol. 1, no. 2, pp. 129-138, 2015.
28. Bahrawi, "Sentiment Analysis Using Random Forest Algorithm - Online Social Media Based," Journal of Information Technology and Its Utilization, vol. 2, no. 2, pp. 29-33, 2019.
29. S. Narkheda, "Understanding Confusion Matrix," 9 May 2018. [Online]. Available: https://towardsdatascience.com/understanding-confusion-matrix-a9ad42dcfd62. [Accessed 5 February 2020].
30. P. Dellia and A. Tjahyanto, "Tax Complaints Classification on Twitter Using Text Mining," Journal of Science, vol. 2, no. 1, pp. 11-15, 2017.

AUTHORS PROFILE
Tommy Willianto is a graduate student in the Computer Science Department, Bina Nusantara University, BINUS Graduate Program-Master in Computer Science, Jakarta, 11480, Indonesia. From September 2017 to December 2018, he worked in a software house as a software engineer. From March 2019 to August 2019, he did an internship at a technology-based startup as a backend engineer. He has been interested in research since early 2019 and afterwards decided to change his career path from engineer to researcher. He currently focuses his research on the field of data mining. Other research topics that he is interested in are big data analytics, business intelligence, internet of things and optimization.

Supryadi is a graduate student in the Computer Science Department, Bina Nusantara University, BINUS Graduate Program-Master in Computer Science, Jakarta, 11480, Indonesia. In 2019, he did a 6-month internship at a software and IT company as a system analyst. His research interests are text mining, data warehousing and machine learning.

Antoni Wibowo received his first degree in Applied Mathematics in 1995 and a master's degree in Computer Science in 2000. In 2003, he was awarded a Japanese Government Scholarship (Monbukagakusho) to attend Master and PhD programs in Systems and Information Engineering at the University of Tsukuba, Japan. He completed his second master's degree in 2006 and his PhD degree in 2009. His PhD research focused on machine learning, operations research, multivariate statistical analysis and mathematical programming, especially on developing nonlinear robust regressions using statistical learning theory. He worked from 1997 to 2010 as a researcher in the Agency for the Assessment and Application of Technology, Indonesia. From April 2010 to September 2014, he worked as a senior lecturer in the Department of Computer Science, Faculty of Computing, and as a researcher in the Operation Business Intelligence (OBI) Research Group, Universiti Teknologi Malaysia (UTM), Malaysia. From October 2014 to October 2016, he was an Associate Professor at the Department of Decision Sciences, School of Quantitative Sciences, Universiti Utara Malaysia (UUM). Dr. Eng. Wibowo currently works at the Binus Graduate Program (Master in Computer Science) at Bina Nusantara University, Indonesia, as a Specialist Lecturer and continues his research activities in machine learning, optimization, operations research, multivariate data analysis, data mining, computational intelligence and artificial intelligence.
