
2021 1st International Conference on Computer Science and Artificial Intelligence (ICCSAI)

Proceedings of
2021 1st International Conference
on Computer Science and Artificial Intelligence
(ICCSAI)

Date and Venue:

28 October 2021

Bina Nusantara University, Jakarta - Indonesia

Organized by

Supported by

ISBN: 978-1-7281-3333-1 | IEEE Part Number: CFP19H83-ART


2021 1st International Conference on Computer Science and Artificial Intelligence (ICCSAI)
Copyright © 2021 by the Institute of Electrical and Electronics Engineers, Inc. All rights reserved.

Copyright and Reprint Permission: Abstracting is permitted with credit to the source. Libraries are permitted to photocopy, beyond the limit of U.S. copyright law, for the private use of patrons those articles in this volume that carry a code at the bottom of the first page, provided the per-copy fee indicated in the code is paid through the Copyright Clearance Center. For reprint or republication permission, email the IEEE Copyrights Manager at pubs-[email protected]. All rights reserved. Copyright ©2021 by IEEE.

IEEE Catalog Number: CFP19H83-ART
ISBN: 978-1-7281-3333-1

Additional copies of this publication are available from


Curran Associates, Inc.
57 Morehouse Lane
Red Hook, NY 12571
USA
Phone: +1 (845) 758-0400
Fax: +1 (845) 758-2633
E-mail: [email protected]


Greetings!

Honorable Participants/Researchers/Delegates/Professors
Distinguished Guests
Ladies and gentlemen,

Research and development in computer science, artificial intelligence, and information systems is growing rapidly. Engineers, researchers, and scientists need a medium to share their knowledge, ideas, and research in order to expand collaboration and networking. The 2021 1st International Conference on Computer Science and Artificial Intelligence (ICCSAI) is an international forum for engineers, researchers, and scientists to present their knowledge of technological advances and research in the fields of Computer Science, Artificial Intelligence, and Information Systems.

We had great participation in this first edition of ICCSAI: about 200 papers were submitted, and with an acceptance rate of 40%, a total of 81 papers were accepted. We would like to thank all participants, keynote speakers, committees, and reviewers for contributing to the conference program and proceedings, and to express our appreciation for the reviewers' comments and suggestions. We also thank IEEE, the IEEE Indonesia Section, and the IEEE CIS Indonesia Chapter for supporting our conference.

Best regards,

Prof. Dr. Ir. Widodo Budiharto, S.Si., M.Kom., IPM., SMIEEE


General chair of 2021 1st ICCSAI


Rector Bina Nusantara University

WELCOMING REMARKS

Distinguished keynote speakers


Fellow professors and presenters,
Ladies and gentlemen

It is a great honor for me to welcome you to the 1st International Conference on Computer Science and Artificial Intelligence (ICCSAI).
This conference is part of our continuing efforts in producing, deliberating, and disseminating knowledge, as well as creating research partnerships between faculty members, distinguished scholars, entrepreneurs, industry leaders, and experts from universities, research think-tanks, and companies around the world. An international conference that focuses on creating the future through the improvement and advancement of information systems, and that aims to foster the digital transformation of society, is therefore essential to keep the university relevant to the needs of modern societies and to the betterment of people's living standards. Ladies and gentlemen, I would like to express my highest appreciation to the Higher Education Service on Region III, along with several universities as co-hosts:
o Universitas Tarumanagara
o Universitas Gunadarma
o Universitas Pancasila
o Universitas Mercubuana
o Universitas Esa Unggul
o STMIK Jakarta STI&K
o Universitas Budi Luhur
and all invited keynote speakers, invited plenary session speakers, and all presenters and participants who will make this conference meaningful. I strongly advise you to make use of this conference wisely: not only to discuss research, but also to actively build connections for new joint research, publications, faculty exchanges, and so on. Finally, I also thank all the chairpersons and committee members of the conference. I wish all of you a great conference and hope you make new acquaintances during this virtual event.

Thank you very much.


Jakarta, 24 October 2021

Prof. Dr. Ir. Harjanto Prabowo, MM


Head of Computer Science Consortium


Head Representative of LLDIKTI III Jakarta

WELCOMING REMARKS

Praise and gratitude to Almighty God for His grace and guidance. On behalf of the Agency for Higher Education Service on Region III, let me express my warm welcome. You have kindly come all the way to this great event, "The 1st International Conference on Computer Science and Artificial Intelligence (ICCSAI 2021)".
The Honorable:
Advisory Committee
o Prof. Dr. Ir. Harjanto Prabowo, MM, Bina Nusantara University, Indonesia
o Prof. Tirta N. Mursitama, S.Sos., M.M., PhD, Bina Nusantara University, Indonesia
General Chair
o Prof. Dr. Ir. Widodo Budiharto, S.Si., M. Kom., IPM, Bina Nusantara University, Indonesia
Vice Chair
o Yaya Sudarya Triana, M.Kom., Ph.D (Mercu Buana University)
Herewith I express my appreciation. It is my honor to give an opening remark in order to scale up and advance
the publications in Computer Science, Information Systems, Computer Engineering and Information Technology.

Distinguished guests, colleagues, ladies and gentlemen,


Thomas M. Siebel, a technologist and the head of a leading artificial intelligence software company, has stated that there are four technological forces that could change our lifestyles, behaviour, and activities in the 21st century: cloud computing, big data, artificial intelligence, and the internet of things. All of them have changed ways of thinking, activities, business and organization models, and government design in the new digital landscape, making what was previously impossible become reality.
We have been experiencing a 'tsunami' of digital transformation in the advertising industry, media, and e-commerce that has boosted the spectrum of investment and digital transformation globally.
Digital transformation is also taking place in various sectors, especially as we face the Society 5.0 era. There are numerous changes and challenges that we have to deal with to prepare superior human resources. This era also requires people to anticipate the wave of Industry 4.0, which is full of disruption.
The rapid growth of technology and digital transformation is closely related to the fields of Computer Science, Information Systems, Computer Engineering, and Information Technology. This progress has a tremendous impact on the progress of human civilization. Jobs that were previously done by humans are now performed by automatic machines. The discovery of new ways to use computing capacity has also provided substantial convenience and comfort in human life.


Ladies and Gentlemen,


This event is a form of synergy between the government and local and international universities to improve publications, in both quantity and quality, as well as to create a culture of scientific writing that will ultimately lead to the wide dissemination of knowledge, through both seminars and publications.
On behalf of the Agency for Higher Education Service on Region III, I would like to thank Bina Nusantara University as the host, along with several universities as co-hosts:
o Universitas Tarumanagara
o Universitas Gunadarma
o Universitas Pancasila
o Universitas Mercubuana
o Universitas Esa Unggul
o STMIK Jakarta STI&K
o Universitas Budi Luhur

In closing, this consortium provides a valuable opportunity for lecturers, research scientists, and industry specialists to share experiences. I am grateful to the many experts who are attending to share their knowledge. I am sure you will have fruitful and rewarding exchanges, and I wish you every success with this event.

Thank you.
Wassalamu ‘alaikum Wr. Wb.

Prof. Dr. Agus Setyo Budi, M.Sc


Head Representative of LLDIKTI III Jakarta


CONFERENCE COMMITTEE

Advisory Committee
o Prof. Dr. Ir. Harjanto Prabowo, MM, Bina Nusantara University, Indonesia
o Prof. Tirta N. Mursitama, S.Sos., M.M., PhD, Bina Nusantara University, Indonesia
o Dr. -Ing. Wahyudi Hasbi, S.Si, M.Kom, IEEE Indonesia Section
o Prof. Teddy Mantoro , Ph.D., SMEEE, Chairman IEEE CIS Indonesia Chapter
General Chair
o Prof. Dr. Ir. Widodo Budiharto, S.Si., M. Kom., IPM, Bina Nusantara University, Indonesia
Vice Chair
o Dr. Yaya Sudarya Triana, M.Kom., Universitas Mercubuana, Indonesia
Secretary
o Dr. Maria Susan Anggreainy, S.Kom., M.Kom, Bina Nusantara University, Indonesia
o Dr. Dina Fitria Murad, S.Kom., M.Kom, Bina Nusantara University, Indonesia
Treasurer
o Ajeng Wulandari, S.Kom, M.Kom., Bina Nusantara University, Indonesia
Publication Section
o Dr. Ir. Alexander Agung S. Gunawan, M.Sc., IPM, Bina Nusantara University, Indonesia
o Dr. Evaritus Didik Madyatmadja, ST., M.Kom, M.T, Bina Nusantara University, Indonesia
o Noerlina, S.Kom., M.MSI, Bina Nusantara University, Indonesia
Track Directors and teams
o Dr. Ir. Edy Irwansyah, IPM, Bina Nusantara University, Indonesia
o Lina, S.T., M. Kom., Ph.D, Universitas Tarumanagara, Indonesia
o Dr. Ionia Veritawati, Universitas Pancasila, Indonesia
o Prof. Dr. Eri Prasetyo Wibowo, Universitas Gunadarma, Indonesia
o Dr. Dewi A. R, S.Kom., M.Sc, Universitas Gunadarma, Indonesia
o Habibullah Akbar, S.Si, M.Sc, Ph.D, Universitas Esa Unggul, Indonesia
Event Section
o Dr. Puji Rahayu, Universitas Mercubuana, Jakarta
Registration Section
o Dr. Sunny Arief Sudiro, S.Kom., MM., STMIK Jakarta STI&K, Indonesia
o Desti Fitriati, S.Kom., M.Kom., Universitas Pancasila, Indonesia
Graphic Designer
o Dr. Bambang, SSi., M.Kom, Bina Nusantara University, Indonesia


Keynote Speaker 1

Adaptive central pattern generators to control human/robot interactions

Patrick Hénaff, LORIA UMR 7503, Université de Lorraine – CNRS, Nancy, FRANCE
[email protected]

Abstract
The presentation concerns the use of bio-inspired robot controllers based on the functioning of the specific biological sensorimotor loops that control biological systems. These loops are based on specific neural network structures, called central pattern generators (CPGs), that are involved in the genesis and learning of adaptive rhythmic movements. It is therefore interesting to better understand and model these structures in order to build humanoid robots able to learn rhythmic movements for locomotion or for interacting with humans. After a brief introduction to biological central pattern generators and rhythmic movements, we will introduce the concept of synchronization, a principle that underlies the rhythmic interaction between humans and dynamic oscillators. Different models of central pattern generators based on dynamic oscillators will be introduced. The second part of the presentation will cover several experiments in vision-based human-robot motor coordination using adaptive central pattern generators, as well as experiments in robot teleoperation for industrial rhythmic tasks. Several videos of simulations and experiments will illustrate the presentation, which will close with conclusions and perspectives.
Keywords: Humanoid robotics, Neural control, Central Pattern Generator (CPG), sensorimotor coordination, Human/robot interactions, locomotion
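The abstract describes CPG models built from dynamic oscillators whose frequency adapts through synchronization with an external rhythm. As a minimal illustrative sketch (not the speaker's actual model), the following Python code integrates an adaptive-frequency Hopf oscillator in the style often used in CPG research; all parameter values here are arbitrary choices for the demonstration.

```python
import math

def adaptive_hopf(omega0, omega_input, dt=5e-4, steps=400_000,
                  gamma=8.0, mu=1.0, eps=2.0):
    """Euler integration of an adaptive-frequency Hopf oscillator.

    The oscillator's intrinsic frequency `omega` adapts toward the
    frequency of the external rhythmic signal F(t) = sin(omega_input * t),
    illustrating the synchronization principle mentioned in the abstract.
    """
    x, y, omega = 1.0, 0.0, omega0
    trace = []
    for i in range(steps):
        t = i * dt
        F = math.sin(omega_input * t)          # external rhythmic input
        r2 = x * x + y * y
        r = math.sqrt(r2) or 1e-9              # avoid division by zero
        dx = gamma * (mu - r2) * x - omega * y + eps * F
        dy = gamma * (mu - r2) * y + omega * x
        domega = -eps * F * y / r              # frequency adaptation rule
        x += dx * dt
        y += dy * dt
        omega += domega * dt
        trace.append(omega)
    # mean learned frequency over the last 10 seconds of simulated time
    return sum(trace[-20_000:]) / 20_000

# Starting at 20 rad/s, the oscillator locks onto a 30 rad/s input rhythm.
learned = adaptive_hopf(omega0=20.0, omega_input=30.0)
```

After integration, `learned` settles close to the input frequency of 30 rad/s, which is the key property that lets such oscillators entrain to a human partner's movement rhythm.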

Dr. Patrick Hénaff is a full professor at the School of Engineers "Mines Nancy" of the University of Lorraine, France. He is the head of the Complex Systems, Artificial Intelligence and Robotics research department at LORIA, an applied computer science laboratory. His research interests lie in the bio-inspired control of humanoid robots. Dr. Hénaff earned his Master's in electronics at the University of Rennes, France, and completed his PhD in robotics at the University Paris VI. He joined "Mines Nancy" and the University of Lorraine in 2013. His passion lies in studying artificial intelligence, interactive robotics, and neural control. He has participated in several robotics projects, especially on legged locomotion and the control of rhythmic movements. He is a regular reviewer for international journals (IEEE TRO, Frontiers in Neurorobotics, IJARS, JAR, Neurocomputing) and conferences (ICRA, IROS, IJCNN, AIM).


Keynote Speaker 2

Modelling personality prediction from user’s posting on social media

Derwin Suhartono, Head of Computer Science Department Bina Nusantara University


[email protected]
Abstract
The huge amount of user postings on social media has become promising data that can be converted into new knowledge. One use is mining this information to predict a user's personality. This task can capture the real, basic characteristics of the many people who now spend much of their time on social media. Text is an appropriate type of data to use, as social media users tend to express their feelings, thoughts, and emotions through writing. The Big Five Personality Traits model, also known as OCEAN, is a concept from psychology that is popular in state-of-the-art research on personality prediction. Research on personality modelling from text involves feature extraction methods as well as deep learning architectures, both of which invite further enhancement. Promising results suggest that, in the future, it may be possible to observe a person's actual personality this way.
Keywords: real basic characteristics, Big Five Personality Traits, personality modelling, feature extraction, deep learning.
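The abstract names feature extraction from text as a first step of personality modelling. A deliberately tiny sketch of that idea follows; the word lists, feature set, and weights are all invented for illustration (real systems use validated lexicons and weights learned from labeled data), so the scores have no psychometric validity.

```python
import re
from collections import Counter

# Hypothetical mini-lexicons standing in for validated psycholinguistic
# resources; a real pipeline would learn its features and weights.
SOCIAL_WORDS = {"party", "friends", "talk", "meet", "fun"}
ANXIETY_WORDS = {"worried", "afraid", "nervous", "stress", "alone"}

def lexical_features(post: str) -> dict:
    """Count simple lexical cues in one social-media post."""
    tokens = re.findall(r"[a-z']+", post.lower())
    counts = Counter(tokens)
    n = max(len(tokens), 1)                      # guard against empty posts
    return {
        "social_rate": sum(counts[w] for w in SOCIAL_WORDS) / n,
        "anxiety_rate": sum(counts[w] for w in ANXIETY_WORDS) / n,
        "exclaim_rate": post.count("!") / n,
    }

def extraversion_score(post: str) -> float:
    """Toy linear model over the features; the weights are made up."""
    f = lexical_features(post)
    return 2.0 * f["social_rate"] + 1.0 * f["exclaim_rate"] - 1.5 * f["anxiety_rate"]

a = extraversion_score("Great party with friends tonight, so much fun!")
b = extraversion_score("Feeling worried and alone, too much stress lately.")
```

A socially oriented post scores higher than an anxious one under this toy model; deep-learning approaches discussed in the talk replace both the hand-built lexicons and the linear weights with learned representations.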

Derwin Suhartono is a faculty member of Bina Nusantara University, Jakarta, Indonesia, where he currently serves as Head of the Computer Science Department. He received his PhD in computer science from Universitas Indonesia in 2018. His research fields are natural language processing and machine learning; he is currently investigating argumentation mining, personality recognition, and hoax analysis. He is actively involved in the Indonesia Association of Computational Linguistics (INACL) and the Indonesian Computer, Electronics and Instrumentation Support Society (IndoCEISS). His professional memberships include ACM, INSTICC, IACT, and IEEE, among others. He also serves as a reviewer for many international conferences and reputable international journals, such as IEEE Access, IJCIS, and MDPI journals.


2021 1st International Conference on Computer Science and Artificial Intelligence


(ICCSAI)
ISBN: 978-1-7281-3333-1 | IEEE Part Number: CFP19H83-ART

Table of Contents
Title Page Range
Web Based Application for Ordering Food Raw Materials 1-4
Comparison of Gaussian Hidden Markov Model and Convolutional Neural Network in Sign Language Recognition System 5-10
Intelligent Computational Model for Early Heart Disease Prediction using Logistic Regression and Stochastic Gradient Descent (A Preliminary Study) 11-16
Line Follower Smart Trolley System V2 using RFID 17-21
An Efficient System to Collect Data for AI Training on Multi-Category Object Counting Task 22-26
The Influence of UI UX Design to Number of Users Between ‘Line’ and ‘Whatsapp’ 27-31
A Comparison of Artificial Intelligence-Based Methods in Traffic Prediction 32-36
Impact of Computer Vision With Deep Learning Approach in Medical Imaging Diagnosis 37-41
Finetunning IndoBERT to Understand Indonesian Stock Trader Slang Language 42-46
Development of Portable Temperature and Air Quality Detector for Preventing Covid-19 47-50
Development of Robot to Clean Garbage in River Streams with Deep Learning 51-55
Effectiveness of LMS in Online Learning by Analyzing Its Usability and Features 56-61
Health Chatbot Using Natural Language Processing for Disease Prediction and Treatment 62-67
Sentiment Analysis of E-commerce Review using Lexicon Sentiment Method 68-71
Coronary Artery Disease Prediction Model using CART and SVM: A Comparative Study 72-75
Identify High-Priority Barriers to Effective Digital Transformation in Higher Education: A Case Study at Private University in Indonesia 76-80
A Comparison of Lexicon-based and Transformer-based Sentiment Analysis on Code-mixed of Low-Resource Languages 81-85
Estimation of Technology Acceptance Model (TAM) on the Adoption of Technology in the Learning Process Using Structural Equation Modeling (SEM) with Bayesian Approach 86-91
Predicting Stock Market Prices using Time Series SARIMA 92-99
Sentiment Analysis using SVM and Naïve Bayes Classifiers on Restaurant Review Dataset 100-108
Developing An Automated Face Mask Detection Using Computer Vision and Artificial Intelligence 109-114
Blockchain Technology behind Cryptocurrency and Bitcoin for Commercial Transactions 115-119
The Effect of UI/UX Design on User Satisfaction in Online Art Gallery 120-125
Covid-19 Vaccine Tweets - Sentiment Analysis 126-129

Image Data Encryption Using DES Method 130-135
Systematic Literature Review: An Intelligent Pulmonary TB Detection from Chest X-Rays 136-141
Design of Cadets Administration System for Nusantara Cilacap Maritime Academy Based On Website 142-147
Implementation of Face Recognition Method for Attendance in Class 148-153
Comparative of Advanced Sorting Algorithms (Quick Sort, Heap Sort, Merge Sort, Intro Sort, Radix Sort) Based on Time and Memory Usage 154-160
Factors that Affect Data Gathered Using Interviews for Requirements Gathering 161-166
The Impact of E-Transport Platforms’ Gojek and Grab UI/UX Design to User Preference in Indonesia 167-177
Compare the Path Finding Algorithms that are Applied for Route Searching in Maps 178-183
A Systematic Literature Review of Fintech Investment and Relationship with Bank in Developed Countries 184-189
Enhancement Design for Smart Parking System Using IoT and A-Star Algorithm 190-195
E-Learning Service Issues and Challenges: An Exploratory Study 196-201
Smart Electricity Meter as An Advisor for Office Power Consumption 202-206
Building Natural Language Understanding System from User Manual to Execute Office Application Functions 207-212
Aspect Based Sentiment Analysis: Restaurant Online Review Platform in Indonesia with Unsupervised Scraped Corpus in Indonesian Language 213-218
A Review of Signature Recognition Using Machine Learning 219-223
Student Performance Based on Student Final Exam Prediction 224-229
Development of Smart Restaurant Application for Dine-In 230-235
Utilization Big Data and GPS to Help E-TLE System in The Cities of Indonesia 236-242
Expert System to Predict Acute Inflammation of Urinary Bladder and Nephritis Using Naïve Bayes Method 243-248
The Search for the Best Real-Time Face Recognition Method for Finding Potential COVID Patients 249-252
Waste Classification Using EfficientNet-B0 253-257

A Survey: Crowds Detection Method on Public Transportation 258-262
Performance Analysis Between Cloud Storage and NAS to Improve Company’s Performance: A Literature Review 263-268
Usability Evaluation of Learning Management System 269-272
Self-Checkout System Using RFID (Radio Frequency Identification) Technology: A Survey 273-277
Effective Methods for Fake News Detection: A Systematic Literature Review 278-283
Determining the best Delivery Service in Jakarta using Tsukamoto Fuzzy Algorithm 284-288
RTR AR PHOTO BOOTH: THE REAL-TIME RENDERING AUGMENTED REALITY PHOTO BOOTH 289-294
A Systematic Literature Review: Database Optimization Techniques 295-300
Study on Face Recognition Techniques 301-306
Big Data For Smart City: An Advance Analytical Review 307-312
Analysis of Big Data in Healthcare Using Decision Tree Algorithm 313-317
Detrimental Factors of the Development of Smart City and Digital City 318-323
Application of Internet of Things in Smart City: A Systematic Literature Review 324-328
Smart Tourism Services: A Systematic Literature Review 329-333
Indonesia China Trade Relations, Social Media and Sentiment Analysis: Insight from Text Mining Technique 334-339
Sinophobia in Indonesia and Its Impact on Indonesia-China Economic Cooperation with the SVM (Support Vector Machine) Approach 340-345
Towards Classification of Personality Prediction Model: A Combination of BERT Word Embedding and MLSMOTE 346-350
Level of Password Vulnerability 351-354
Cultural Tourism Technology Used and Themes: A Literature Review 355-360
IoT Sensors Integration for Water Quality Analysis 361-366
Street View Object Detection for Autonomous Car Steering Angle Prediction Using Convolutional Neural Network 367-372
Extract Transform Loading (ETL) Based Data Quality for Data Warehouse Development 373-378
Spread of COVID-19 Deaths in Jakarta: Cluster and Regression Analysis 379-384
Indonesian Banking Stock Price Prediction with LSTM and Random Walk Method 385-390

Exploration of React Native Framework in designing a Rule-Based Application for healthy lifestyle education 391-394
Design of Water Information Management System in Palm Oil Plantation 395-399
A Hydrodynamic Analysis of Water System in Dadahup Swamp Irrigation Area 400-406
Spatiotemporal Features Learning from Song for Emotions Recognition with Time Distributed CNN 407-412
AR-Mart: The Implementation of Augmented Reality as a Smart Self-Service Cashier in the Pandemic Era 413-417
Immersive Experience with Non-Player Characters Dynamic Dialogue 418-421
Explainable Supervised Method for Genetics Ancestry Estimation 422-426
Memorize COVID-19 Advertisement: Customer Neuroscience Data Collection Techniques by Using EEG and fMRI 427-430
Development of Stock Market Price Application to Predict Purchase and Sales Decisions Using Proximal Policy Optimization Method 431-437
Exploiting Facial Action Unit in Video for Recognizing Depression using Metaheuristic and Neural Networks 438-443
Review Literature Performance: Quality of Service from Internet of Things for Transportation System 444-450
Auto-Tracking Camera System for Remote Learning Using Face Detection and Hand Gesture Recognition Based on Convolutional Neural Network 451-457

Web Based Application for Ordering Food Raw Materials

Rita Layona, Budi Yulianto, and Yovita Tunardi
Computer Science Department, School of Computer Science
Bina Nusantara University, Jakarta, Indonesia
[email protected], [email protected], [email protected]

Abstract— The current declining economic situation in Indonesia is caused by the Covid-19 pandemic. Housewives must continue to shop for food raw materials while protecting themselves from interactions with the people around them to avoid Covid-19 transmission. This research provides a web-based application for the community, especially housewives and family members, to shop for food raw materials sold by merchants in their local area. The purchase and delivery of goods are carried out by a courier from the local government. The research method is a mixed method with a quantitative component, and the development method is Waterfall. The results of this study show that the application helps housewives shop for food raw materials safely and easily.

Keywords— food, raw materials, food delivery, food shopping

I. INTRODUCTION
The current declining economic situation in Indonesia is caused by the Covid-19 pandemic. Some modern shopping places and companies have stopped their activities, and some have closed due to losses. The places that are trying to keep operating are small roadside shops, traditional markets, and home stores. Large restaurants have therefore started selling their products on the roadside (Fig. 1) to keep their businesses running [1] [2]. Companies in economic difficulty began to lay off employees or terminate employment contracts. This unemployment causes people to save their money or hold back their consumption.

Fig. 1. Large Restaurants Start Selling Their Products on the Roadside [1]

In addition, people take care of themselves by following health protocols and not visiting physical stores. People are starting to use online courier services to help them shop. This effect is also felt by housewives: they must continue to shop for food raw materials while protecting themselves from interactions with the people around them to avoid Covid-19 transmission.

Based on these obstacles, this research provides a web-based application for the community, especially housewives and family members, to shop for food raw materials from merchants located in their village or sub-district. The application also allows the surrounding community to sell food raw materials in their villages or sub-districts. This research aims to help prevent Covid-19 transmission while maintaining economic growth among the people during this pandemic and afterwards.

II. PREVIOUS WORKS

Research conducted by Hassan states that shopping applications need to pay attention to funding factors and strategies [3]. The application needs to convince sellers about the opportunities that exist and provide a sense of security and convenience for buyers. Jielin added that for an application to work properly, good funding from investors, the government, or transactions is needed [4]. Maturity in cash flow will guarantee the viability of the application. In 2014, Rendra developed a shopping application for food raw materials [5]. The application can bridge farmers and consumers directly but has not yet accommodated a centralized goods delivery system.

Liemantara and Sofiani conducted research to develop a shopping application for vegetables, where the merchants deliver their vegetables to buyers [6] [7]. Soyusiawaty also developed an application that can educate housewives to sell online [8]. However, the applications developed by these three researchers have not been implemented optimally during the Covid-19 pandemic because they did not include a centralized delivery system to avoid Covid-19 transmission.

In 2020, Maheswara developed a mobile-based application that allows farmers to sell food raw materials

978-1-6654-4002-8/21/$31.00 ©2021 IEEE


directly to consumers [9]. Savitri developed a similar
application, but on a web platform [10]. The applications
developed by the two researchers use a manual sales method
handled outside the application (the buyer contacts the farmer
directly) and do not cover the delivery system needed in the
current Covid-19 pandemic.

III. RESEARCH METHOD

This study uses a mixed method that combines a quantitative
research method with an IS development method. For the
quantitative method, data were collected by distributing
questionnaires to 90 housewives in Cipondoh District, Banten
Province, Indonesia, to gather requirements such as how to
shop easily and safely, how to pay, and how to ship goods.
The results showed that 91.1% (82) of the housewives prefer
to shop online, 70% (63) want a very simple and easy-to-use
application, 78.8% (71) are willing to order one day in
advance, 83.3% (75) want a single courier who delivers all the
goods, 92.2% (83) want the COD (cash on delivery) payment
method, 64.4% (58) want to pay the courier with digital
money to avoid physical contact, and 86.6% (78) are happy
and willing to take part as sellers in the system (if possible) to
sell food raw materials. Based on this data analysis, the
research uses the Waterfall method (by Roger S. Pressman) to
deliver the application, and an evaluation is carried out to
check whether it meets user expectations. The Waterfall stages
are presented in the Software Development Method chapter.
At the evaluation stage, the users (in this case the housewives)
use the application and evaluate it through a questionnaire.

IV. SOFTWARE DEVELOPMENT METHOD

Waterfall is used as the development method for this
application. Waterfall is a software development method that
consists of communication, planning, modelling, construction,
and deployment.
a. Communication
   In this phase, the project is initiated and a questionnaire is
   distributed to respondents (housewives in Cipondoh
   District, Banten, Indonesia) to collect information about
   their needs.
b. Planning
   In this phase, the project schedule and all preparation
   documents are made. A tracking mechanism is set up to
   control and monitor project progress.
c. Modelling
   1. After collecting all necessary data, data analysis is
      performed to summarize all the needs for building this
      application.
   2. A comparison with similar applications (Tokopedia,
      Bukalapak, and Shopee) is performed.
   3. The application is modelled using UML (use case
      diagram and class diagram).
d. Construction
   In this phase the application is developed with the
   following features: register; displaying the list of food raw
   materials with price, stock, and weight; seller information;
   order; and search. The application is developed using PHP
   and MySQL.
e. Deployment
   The application is deployed (installed) onto a server and
   tried by respondents to provide a simulation.

V. PROPOSED SYSTEM

The text in the application uses the Indonesian language
because it is aimed at local residents with limited English
skills. Users can access the application through the Register
and Login pages (Fig. 2).

Fig. 2. Register and Login Page

On the Register page, the user enters an email address, name,
and password. After creating an account, the user can log in
and complete the profile with name, email, telephone, home
address, and home map. This profile data is used for the
delivery of goods by courier (Fig. 3).
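The construction phase above implements the catalog and search features in PHP and MySQL. As a rough illustration of the same behaviour (not the authors' actual code), here is a minimal sketch in Python with SQLite; every table and column name below is a hypothetical placeholder.

```python
import sqlite3

# Illustrative schema only: names and sample data are assumptions,
# not taken from the paper's actual PHP/MySQL implementation.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE product (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        category TEXT NOT NULL,
        price REAL NOT NULL,      -- price per unit
        stock INTEGER NOT NULL,   -- units available
        weight_kg REAL NOT NULL,  -- used to estimate shipping cost
        seller TEXT NOT NULL
    )
""")
conn.execute(
    "INSERT INTO product (name, category, price, stock, weight_kg, seller) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    ("Rice", "Staples", 12000, 50, 1.0, "Ibu Sari"),
)

def search_products(conn, keyword="", category=None):
    """Search by keyword and/or category, as on the paper's Search page."""
    query = "SELECT name, price, stock, seller FROM product WHERE name LIKE ?"
    params = ["%" + keyword + "%"]
    if category is not None:
        query += " AND category = ?"
        params.append(category)
    return conn.execute(query, params).fetchall()

print(search_products(conn, keyword="Ri"))
```

The parameterized queries (`?` placeholders) stand in for the same input-escaping discipline any PHP/MySQL implementation of the search feature would need.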


Fig. 3. Profile Page

User can access the search page for food raw materials. On
this page, the user can search for food raw materials by
entering keywords or by selecting a category. The page also
displays the list of food raw materials, the product price, the
shipping cost, and the seller's name (Fig. 4). Currently, the
user and the seller must reside in the same local area. The data
in Fig. 3 is sample data, not actual data.

Fig. 4. Search Page

If a product is selected, the application goes to the product
detail page. This page shows the seller's name, a feature to
send a message, the price of the goods, tips for the courier, the
item weight, a description, and a button to buy (Fig. 5).

Fig. 5. Detail Page

After shopping, the user is redirected to the payment
information page (Fig. 6), which shows the buyer information,
the total cost of goods, the shipping cost, the total shopping
cost, and the order number. The purchase and delivery of the
goods are carried out by a courier from the local government.
Payment and payment confirmation are still done manually
because of the limited technology adoption among local
residents. A down payment of 50% is required by the local
government to avoid fraud by the buyer. The remaining
payment to the seller is made on the same day by the courier
after receiving the payment from the buyer.

After the application was implemented and tested by 47
housewives (all respondents who filled out the initial
questionnaire), an evaluation was carried out by giving them a
questionnaire. The questionnaire results show that 95.7% (45)
of respondents agree that the application is easy and
interesting to use, 91.5% (43) agree that all the components
and feature information meet their needs, and 100% (47)
agree that the application assists in purchasing food raw
materials during the Covid-19 pandemic. In addition, 93.6%
(44) expressed no objection to the manual transfer payment
method, and 95.7% (43) said they would recommend this
application to colleagues.

Fig. 6. Detail Page

VI. CONCLUSION AND FUTURE WORK

Based on the results of this study and its evaluation, it can be
concluded that the developed system helps housewives shop
for food raw materials safely and easily in this Covid-19
pandemic situation. The system is


easy and interesting for users to use; all information and
features already meet the needs; the system can assist in the
purchase of food raw materials during the Covid-19 pandemic;
and the manual transfer payment method is the current
payment solution. Some suggestions for further development
are: (1) building a mobile version of the application, (2)
adding courier data to assist the delivery, and (3) adding
information for tracking the position of the courier.

REFERENCES

[1] L. Lee, "Bersaing Mencari Konsumen, KFC dan Pizza Hut Indonesia
Berjualan di Jalanan dengan Harga Promo," 2020. [Online]. Available:
https://www.nihaoindo.com/bersaing-mencari-konsumen-kfc-dan-pizza-hut-indonesia-berjualan-di-jalanan-dengan-harga-promo/.
[2] R. Saputra, "Ring Terbuka, Tanpa Proteksi UMKM Jangan Manja,"
2020. [Online]. Available: https://www.berdaulat.id/ring-terbuka-jangan-manja/.
[3] S. M. Hassan, "Top Online Shopping E-companies and their Strength
and Weakness (SWOT)," Research Journal of Recent Sciences, Vol. 3
(9), pp. 102-107, 2014.
[4] W. Jielin, "Advanced Logistics Performance Measures of Alibaba.com
on E-commerce," 3rd International Conference on Information,
Business and Education Technology (ICIBET 2014), pp. 123-126, 2014.
[5] R. Pranadipa, "Rancang Bangun Sistem Informasi Jual Beli Bahan
Pangan Dan Pertanian Berbasis Web," 2014. [Online]. Available:
http://repository.ub.ac.id/146019/.
[6] S. Liemantara, "Pembuatan Aplikasi E-Sayur Sebagai Media Jual Beli
Pedagang Sayur Keliling Berbasis Mobile," 2019. [Online]. Available:
http://repository.widyakartika.ac.id/874/.
[7] I. Sofiani, "Rancang Bangun Aplikasi E-Marketplace Hasil Pertanian
Berbasis Website Dengan Menggunakan Framework CodeIgniter,"
Jurnal Manajemen Informasi, Vol. 10 (1), pp. 25-32, 2019.
[8] D. Soyusiawaty, "Pemberdayaan Kaum Ibu Desa Sinduadi Melalui
Pengelolaan Bahan Makanan dan Pelatihan Aplikasi Pemasaran
Berbasis Mobile," in Seminar Nasional Hasil Pengabdian Kepada
Masyarakat, 2019.
[9] A. A. Maheswara, "E-Sayur: Platform Jual Beli Sayur," Automata,
Vol. 1 (2), 2020.
[10] A. D. Savitri, "Pengembangan Aplikasi Jual Beli Bahan Pangan
Berbasis Website," Automata, Vol. 1 (2), 2020.


Comparison of Gaussian Hidden Markov Model and
Convolutional Neural Network in Sign Language
Recognition System

Herman Gunawan, Computer Science Department, Binus Graduate Program – Master of Computer Science, Bina Nusantara University, Jakarta, Indonesia 11480, herman.gunawan@binus.ac.id

Suharjito, Computer Science Department, Binus Online Learning, Bina Nusantara University, Jakarta, Indonesia 11480, [email protected]

Devriady Pratama, Computer Science Department, Binus Online Learning, Bina Nusantara University, Jakarta, Indonesia 11480, devriady.pratama@binus.ac.id

Abstract— Sign language recognition is the study of helping
to bridge the communication of deaf-mute people. Sign
language recognition uses techniques to convert the gestures of
a sign language into words or an alphabet. In Indonesia, two
types of sign language are used: Bahasa Isyarat Indonesia
(BISINDO) and Sistem Isyarat Bahasa Indonesia (SIBI). The
purpose of this research is to compare sign language recognition
methods, the Gaussian Hidden Markov Model and the
Convolutional Neural Network, using the Indonesian sign
language SIBI as a dataset. The dataset consists of 200 videos
from 2 signers; each signer performs 10 signs with 10
repetitions. To improve recognition accuracy, modified
histogram equalization is used for image enhancement. Skin
detection is used to track the movement of the gesture as input
features in the Gaussian Hidden Markov Model, and fine-tuning
is used in the Convolutional Neural Network through transfer
learning, layer freezing, and dropout. The results of the research
are that the Gaussian Hidden Markov Model provides an
accuracy of 84.6% and the Convolutional Neural Network
provides an accuracy of 82%.

Keywords: sign language; hidden Markov model; convolutional
neural network; sign language recognition; machine learning.

I. INTRODUCTION

Sign language is the language used in communication
between deaf-mute people. While it is very common for them
to use sign language to communicate, it is very rare to see
communication between deaf-mute and normal people. The
problem lies in the difficulty normal people have in
understanding sign language. While sign language is
commonly used, its combinations are very wide, which takes a
very long time to learn.

To address this problem, sign language recognition with
technology is introduced. Sign language recognition is used as
a bridge in the communication between deaf-mute people and
normal people. With sign language recognition, normal people
who do not understand sign language can communicate with
deaf-mute people easily. There are two categories of sign
language recognition: vision-based recognition and
sensor-based recognition. Vision-based recognition uses video
files as the dataset, and sensor-based recognition uses features
collected using additional devices as the dataset [1]. Most
researchers use sensor-based recognition to get high accuracy
because of its accurate depth features, which has left
vision-based recognition not as developed as sensor-based
recognition.

Sign language recognition acts as a translator to bridge
communication in real time, so it needs to be prompt and
accurate. For that purpose, much research has contributed to
this field of study, such as a comparison of the number of
hidden states in sign language recognition using the Hidden
Markov Model, sensor-based recognition using deep learning
models by implementing the Artificial Neural Network and
the Convolutional Neural Network [2], sensor-based
recognition using modified Hidden Markov Models by
implementing the Gaussian Hidden Markov Model, low-rank
approximation, the bootstrap resampling technique, K-Means,
and the Coupled Hidden Markov Model [4] [5] [6] [7] [8],
vision-based recognition using modified Hidden Markov
Models [9] [11], and vision-based recognition using deep
learning models such as a multi-layered neural network using
back propagation and the Convolutional Neural Network [12]
[13].

Several methods have been proposed by previous research,
and most of them use the Hidden Markov Model (HMM) and
deep learning models [15]. With the purpose of implementing
the result in daily life, this research focuses on vision-based
recognition. The aim of this research is to compare the
Gaussian Hidden Markov Model and a Convolutional Neural
Network model, I3D Inception, using the Indonesian sign
language known as SIBI.

II. LITERATURE REVIEW

Much research has been done in the field of sign language
recognition in recent years. While the field is popular, there
are still many areas of improvement that could be addressed in
the future. The methodologies used in this research area are
also varied, which makes it hard for researchers to decide
which method to use. In sign language recognition, the first
thing needed is a dataset, and it must be decided whether it is
vision-based or sensor-based. For vision-based recognition,
the common characteristics are glove color and skin color,
which are used as features [9]


[10] [11]. For sensor-based recognition, there are many
devices that can be used, such as the Kinect, Myo Armband,
Leap Motion, and Flex Sensor [3] [4] [6] [7] [8]. The most
important thing for dataset collection is the consistency of the
dataset. While sign language is used worldwide, there is no
international sign language that could serve as a universal
language, which means that every country has its own sign
language. With consistency and uniqueness as the focus, each
researcher used a dataset with a single country's sign
language, such as American Sign Language [4] [6] [10] [15],
Indian Sign Language [7] [8] [16], Chinese Sign Language
[5], Arabic Sign Language [17], Italian Sign Language [2],
Japanese Sign Language [9], and Indonesian Sign Language
[20] [21].

After the dataset collection, the data must be preprocessed
and its features extracted. The extracted features will be the
characteristics of the sign. In sign language recognition, the
features that need to be acquired are the movements of the
hand. With sensor-based recognition, the features that need to
be processed are already captured, so its accuracy is very high
even if there is still room for improvement. Previous
researchers have experimented with many ways to get the
features, such as combining the results of the Kinect and Leap
Motion to get hand positions and fingertip direction [7], using
the Leap Motion to detect states of the hands from different
points of view [15], using the Myo Armband to get
electromyography data and inertial measurement unit data [4],
using skin segmentation, skin color regions, and blobs to
detect the face and hands [10], using a Flex Sensor and
accelerometer to measure movement in the x, y, z coordinates
[1], resizing to a fixed input dimension [13], applying pupil
detection, hand detection, background extraction, and
Principal Component Analysis [11], using a 3-dimension
look-up table (3-D LUT) to extract the face and hand regions
[9], and extracting skeletal data using the Kinect [6] [8] [14].

After the features are extracted, the system trains a selected
model on the extracted features and recognizes other signs
using the trained model. The models that have been used and
popular in recent years are machine learning using modified
Hidden Markov Models, such as combining Hidden Markov
Models from two different input types, which is called the
Coupled Hidden Markov Model, with a bootstrap resampling
technique to avoid loss of information [7], using multiple
sensors data fusion (MSDF) with the Hidden Markov Model
[15], combining K-Means and the Hidden Markov Model to
put all the data in the dataset into a set of operable numbers
[8], combining the Hidden Markov Model with a number of
hidden states determined by low-rank approximation, which is
called the Light Hidden Markov Model [6], combining
Gaussians in the Hidden Markov Model, which is called the
Gaussian Hidden Markov Model [4], and the Tied-Mixture
Density Hidden Markov Model [11], and deep learning
models, such as an Artificial Neural Network called Adaptive
Neighborhood based Modified Backpropagation [15],
3D-ConvNet, which is one of the Convolutional Network
models [3], two 2D Convolutional Neural Networks for hand
features and body features [2], a deep convolutional neural
network consisting of five convolutional and three
fully-connected layers [13], and a modified Inception V1
model turned into a 3D Convolutional Neural Network model
for action recognition [18].

In this research, we compare methods from previous
research by using the Gaussian Hidden Markov Model, which
was used in sensor-based recognition to get the best accuracy
on American Sign Language, and deep learning using a
Convolutional Neural Network with I3D Inception, which is
one of the best action recognition models, to recognize the
Indonesian sign language known as SIBI with vision-based
recognition.

III. METHODOLOGY

In this section, the details of this research methodology are
presented. The stages of the research are shown in Figure 1.

Fig. 1. Research Methodology

The dataset used in this research is based on the official
Indonesian sign language called SIBI. The dataset was
collected by Bina Nusantara University by recording two
people demonstrating 10 words with 10 iterations each, for a
total of 200 videos. The dataset was collected using a
Samsung Galaxy S6 camera with a 16-megapixel Sony Exmor
RS IMX240 sensor.

To apply the learning models to the video, many steps need
to be done. The Hidden Markov Model cannot take raw video
as an input; it only takes sequences of data, known as features,
as input. To change the input video into numerical values that
represent the data, several steps are needed. The first is to turn
the video into a sequence of images, then apply image
enhancement, which is used to reduce noise and increase
image quality before the feature extraction.


The image enhancement technique used is Contrast Limited
Adaptive Histogram Equalization (CLAHE), a modification of
histogram equalization that equalizes pixel values based on
neighborhood pixels to make sure the image is not destroyed
[19]. CLAHE is one of the most popular techniques used on
images with a lot of noise, such as fog, and it is also popular
in medical research that needs very good image enhancement,
such as tumor detection from images. After enhancing the
image quality, we apply skin detection to track the movement
of the hands, which provides the features. The skin detection
used in this research works in the YCbCr color space, one of
the most accurate skin detection techniques [20]. After the
detection, each image needs to be resized: even if the videos
had different sizes earlier, after resizing the extracted features
are normalized. In this research, the image is resized to
640x480 to maintain the 4:3 ratio on computers, which treat
all pixels as square. We then cut each frame into two sides at
the middle of the frame and calculate the center point of the
detection result on each side. Each side yields the (x, y)
coordinates of a center point, which are the features used in
the Hidden Markov Model technique.

For the Convolutional Neural Network model, raw images
can be used as input. The model recognizes the features
automatically from the raw image, which is one of the benefits
of using a Convolutional Neural Network rather than a Hidden
Markov Model. Even so, pre-processing is still needed to
adjust and enhance the input data so that the extracted features
are valid. The first pre-processing step is to turn the video into
a sequence of images, then pad the number of images to the
average number of frames over all the videos. In this dataset,
the average number of frames is 81, so black images are added
to every video that has fewer than 81 frames and every video
that has more than 81 frames is cut. After that, every image is
resized to 224x224 with bilinear interpolation and the pixel
values are normalized to a scale of -1 to 1. The data is then
stored in an HDF5 file using the h5py library to be used by the
Convolutional Neural Network model.

The Gaussian Hidden Markov Model and the Convolutional
Neural Network using I3D Inception were used as the models
to process the training and classification data. The Hidden
Markov Model was chosen as one of the models because it is
a state of the art for speech recognition, which is a form of
sequence recognition; with the same concept, sign language
recognition is also a form of sequence recognition. The
Hidden Markov Model itself is a stochastic finite state
automaton (FSA) consisting of a finite set of states and
transitions, where some states are hidden, and it falls in the
class of Dynamic Bayesian Networks. The Gaussian Hidden
Markov Model has already been proven as one of the best
Hidden Markov Models for sensor-based sign language
recognition. In sign language recognition, the Hidden Markov
Model recognizes the possibility at every frame, which means
the Gaussian is the solution, because the values vary
depending on the frame. The Hidden Markov Model topology
used is the left-to-right topology because, in sign language, we
expect that the state at a frame is only affected by its previous
frame.

The other method is the Convolutional Neural Network
(CNN), one of the most popular deep learning models in
image recognition. The CNN model used in this research is
I3D Inception, or inflated Inception, which is one of the best
action recognition neural network models. I3D Inception
consists of 67 convolutional layers with 9 Inception modules.
The concept of Inception is to find the optimal local sparse
structure in a convolutional neural network. The problem lies
where there is a smaller number of spatially spread-out
clusters that would be covered by convolutions with larger
kernels, and there is also a decreasing number of kernels over
larger areas. To avoid kernel alignment issues, the
architecture's kernels are restricted to sizes 1x1, 3x3, and 5x5,
as shown in Figure 3 [21]. But there is another problem: the
computation cost is too expensive for the native Inception
module, especially the 5x5 convolutional layer. The idea for
reducing the computation cost is to reduce the dimension. For
example, consider the 5x5 convolutional layer computation. If
we have an input of shape 28x28x192 and want to pass it
through a 5x5@32-filter convolutional layer, the computation
needs (5x5x192) x (28x28x32), which is approximately 120
million operations. To solve the problem, we add a 1x1
convolutional layer to reduce the dimension of the 28x28x192
input. With the same input shape, passing it through a
1x1@16-filter convolutional layer costs (1x1x192) x
(28x28x16), which is about 2.4 million operations, and the
following 5x5 layer then costs (5x5x16) x (28x28x32), which
is about 10 million. Adding both steps gives approximately
12.4 million operations.

We froze some layers; freezing makes a layer not trainable.
We froze some layers because we did not want the weights
from transfer learning to be changed, especially in the early
layers. Early layers are used for edge detection, so the edge
detection from the transfer learning will not fade out. We also
added a 0.5 dropout rate to the model, because this model
learned too many features; there is a probability the model
would learn too many things and overfit. Because of
overfitting, the model would not only classify the sign but
would also classify a different signer making the same sign as
a different class.

The evaluation and comparison focus on the result of each
process. For the evaluation, we use the confusion matrix and
accuracy rate as the main focus. The evaluation of this study
compares the results of sign language recognition using the
Gaussian Hidden Markov Model and the Convolutional
Neural Network for Indonesian Sign Language.
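The Hidden Markov Model features described in this section (skin detection in YCbCr, a vertical split, and one center point per side) can be sketched in plain NumPy. The Cb/Cr thresholds below are common values from the skin-detection literature; the paper does not state its exact ranges, so they are assumptions, as is the (0, 0) fallback for sides where no skin is found.

```python
import numpy as np

# Common YCbCr skin-colour thresholds from the literature; the paper's
# exact ranges are not given, so these bounds are assumptions.
CB_RANGE = (77, 127)
CR_RANGE = (133, 173)

def bgr_to_ycbcr(image):
    """Convert a BGR uint8 image to YCbCr (ITU-R BT.601, full range)."""
    b = image[..., 0].astype(float)
    g = image[..., 1].astype(float)
    r = image[..., 2].astype(float)
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return np.stack([y, cb, cr], axis=-1)

def side_centroids(image):
    """Split the frame down the middle and return the (x, y) centroid of
    skin-coloured pixels on each side: the per-frame feature pair that
    feeds the Gaussian Hidden Markov Model."""
    ycbcr = bgr_to_ycbcr(image)
    mask = ((ycbcr[..., 1] >= CB_RANGE[0]) & (ycbcr[..., 1] <= CB_RANGE[1]) &
            (ycbcr[..., 2] >= CR_RANGE[0]) & (ycbcr[..., 2] <= CR_RANGE[1]))
    mid = image.shape[1] // 2
    feats = []
    for half, x_offset in ((mask[:, :mid], 0), (mask[:, mid:], mid)):
        ys, xs = np.nonzero(half)
        if len(xs) == 0:              # no skin detected on this side
            feats.append((0.0, 0.0))  # assumed fallback, not from the paper
        else:
            feats.append((xs.mean() + x_offset, ys.mean()))
    return feats
```

Running this over the 640x480 frames of one video yields the sequence of center-point pairs used as the HMM observation sequence.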

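The Inception bottleneck arithmetic worked through above can be checked with a few lines that simply multiply out the kernel and output volumes, counting multiply-accumulate operations.

```python
# Cost of one 28x28x192 input through a 5x5@32 layer, directly and with
# a 1x1@16 bottleneck first, as worked through in Section III.
naive = (5 * 5 * 192) * (28 * 28 * 32)        # direct 5x5 convolution
bottleneck = (1 * 1 * 192) * (28 * 28 * 16)   # 1x1@16 dimension reduction
reduced = (5 * 5 * 16) * (28 * 28 * 32)       # 5x5@32 on the reduced volume

print(naive)                 # 120422400, roughly 120 million
print(bottleneck + reduced)  # 12443648, roughly 12.4 million
```

This confirms the roughly tenfold saving claimed for the bottlenecked module.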

IV. RESULTS AND DISCUSSION

A. Implementation Detail
For the Gaussian Hidden Markov Model, we implemented our
methodology on an Intel(R) Core(TM) i3-4030U CPU @
1.90GHz (4 CPUs) with 4GB RAM using Python 3.6, while
for the Convolutional Neural Network model we implemented
our methodology on 2x Intel(R) Xeon(R) CPU E5-2630 v4
(10 cores, 2.20 GHz, 25MB, 85W) with 80GB RAM using
Python on an NVIDIA Tesla P100 GPU with 3584 CUDA
cores. The dataset used for the Gaussian Hidden Markov
Model is split from the SIBI dataset with a ratio of 7:3 for
training and testing, while for the Convolutional Neural
Network the dataset is split with a ratio of 6:2:2 for training,
validation, and testing. Both models are trained for a total of
200 iterations with random initialization.

B. Dataset
The dataset used is the SIBI dataset, which was recorded at
Bina Nusantara University in 2018/2019. The dataset uses the
Indonesian language, with a total of 200 videos over 10
classes and 2 signers. The dataset contains RGB video without
a depth feature, for vision-based recognition purposes. In this
research, there are some requirements for dataset collection.
The first is that the signer must stand in the middle of the
video, because the features used in this research are the
movements on the left and right sides of the frames, which
means both areas must have movement to get better results.
The second requirement is that the colors of the background
and clothes must be different from the skin color, so that the
skin detection detects only the hands and face of the signer.

C. Preprocessing Results
For the preprocessing method, the method we used that other
sign language recognition studies have not used is CLAHE.
The concept of CLAHE is similar to histogram equalization,
which adjusts the contrast of each pixel to get better detail in
photographs that are over-exposed or under-exposed. That
said, plain histogram equalization performs far below Contrast
Limited Adaptive Histogram Equalization, because the chance
of histogram equalization increasing the contrast of
background noise is very high, which destroys the features of
the images. The main objective of Contrast Limited Adaptive
Histogram Equalization, an upgraded version of histogram
equalization, is to improve the contrast without producing a
brightness mean-shift and to decrease the loss of detail, by
taking the size of the neighborhood region as one of the
parameters. Then, to get the movement of the hands, we used
skin detection in the YCbCr color space. We also tried the
HSV color space, but the result with the YCbCr color space is
much better than with the HSV color space. After the
experiment, we see that the difference between using CLAHE
and not using CLAHE is not very large on our dataset. The
reason is that the dataset was collected under good lighting, so
even without CLAHE the skin is already detected.

D. Evaluation Results
In this research, the dataset of 200 videos with 10 signs is
used. For the Hidden Markov Model technique, the data is
split with a ratio of 7:3 into training and testing data, while a
ratio of 6:2:2 is used for the Convolutional Neural Network
for training, validation, and testing data. For the Convolutional
Neural Network, an accuracy of 100% is obtained on the
training data and an accuracy of 82% on the validation data.
The diagrams of the accuracy rates of the Convolutional
Neural Network can be seen in Fig. 2 and Fig. 3. For the
Gaussian Hidden Markov Model, the accuracy on the testing
data over the 10 signs is 84.6%.

In Fig. 2, the Convolutional Neural Network model is run
for 200 iterations and the accuracy on the training data
increases, but around epoch 45 the accuracy on the training set
reaches 100%. After that, the accuracy is a constant 100%,
which means there were no more features that needed to be
learned in order to classify the training data. In Fig. 3, the
accuracy on the validation data also reaches the maximum
value of 100% around epoch 45, the same as the training data,
which means that when the training reached its peak, the
validation also reached its peak on the dataset. However, the
validation accuracy starts going down around epoch 55 and
keeps going down later on. This shows that after epoch 55 the
newly learned features no longer affect the training data but
do affect the validation data, which means the trained features
are too complicated and are no longer useful.

Fig 2. Accuracy rate of training on Convolutional Neural Network

Fig 3. Accuracy Rate of Validation on Convolutional Neural Network

Table 1: Recognition Result on Gaussian Hidden Markov Model.
-  1 2 3 4 5 6 7 8 9 10
1  6 - - - 2 - - - - -
2  - 4 - - - - - - - -
3  - - 6 - - - - 1 - -
4  - - - 6 - - - 3 - -
5  - - - - 3 - - - - -
6  - 1 - - - 6 - - - -
7  - - - - - - 6 - - -
8  - - - - - - - 2 - -
9  - 1 - - - - - - 6 -
10 - - - - 1 - - - - 6

Recognition results for the Gaussian Hidden Markov Model
can be seen in Table 1, and recognition results for the
Convolutional Neural Network can be seen in Table 2.


top of the figure and left of the figure represent the number With the result of the confusion matrix of each sign using
used on the sign. Top of the figure represent the sign that is Gaussian Hidden Markov Model and Convolutional Neural
needed to be recognized while left of the figure represent the Network could be created, which in shown on Table 3 and
recognition result. The number that’s shown diagonally is the Table 4. On Table 3, the confusion matrix of each sign on
number of current prediction while the other are the number of Gaussian Hidden Markov Model is shown. With formula (2),
false predictions. (3), (4), the precision, recall, and F1 Score could be calculated
Table 2: Recognition Result on Convolutional Neural Network. for each sign, Table 3(a) represent laborer sign and got 75%
precision, 100% recall, 85% F1 Score, 3(b) represent pilot
- 1 2 3 4 5 6 7 8 9 10 sign and got 100% precision, 66% recall, 79% F1 Score, 3(c)
1 4 - - - - - - - - - represent garden sign and got 85% precision, 100%% recall,
2 - 4 - - - - - - 1 - 91% F1 Score, 3(d) represent class sign and got 66%
3 - - 2 - - - - - - - precision, 100% recall, 79% F1 Score, 3(e) represent house
sign and got 100% precision, 50% recall, 66% F1 Score, 3(f)
4    -   -   2   4   -   -   -   -   -   -
5    -   -   -   -   2   -   -   -   -   -
6    -   -   -   -   -   4   -   -   -   -
7    -   -   -   -   -   -   3   -   1   -
8    -   -   -   -   -   -   -   4   -   -
9    -   -   -   -   2   -   -   -   2   -
10   -   -   -   -   -   -   1   -   -   4

Table 3. Confusion matrix of each Gaussian Hidden Markov Model.

(a) 1          (b) 2          (c) 3
    T   F          T   F          T   F
T   6   2      T   4   0      T   6   1
F   0  52      F   2  54      F   0  53

(d) 4          (e) 5          (f) 6
    T   F          T   F          T   F
T   6   3      T   3   0      T   6   1
F   0  51      F   3  54      F   0  53

(g) 7          (h) 8          (i) 9
    T   F          T   F          T   F
T   6   0      T   2   0      T   6   1
F   0  54      F   4  54      F   0  53

(j) 10
    T   F
T   6   1
F   0  53

Table 4. Confusion matrix of each Convolutional Neural Network.

(a) 1          (b) 2          (c) 3
    T   F          T   F          T   F
T   4   0      T   4   1      T   2   0
F   0  36      F   0  35      F   2  36

(d) 4          (e) 5          (f) 6
    T   F          T   F          T   F
T   4   2      T   2   0      T   4   0
F   0  34      F   2  36      F   0  36

(g) 7          (h) 8          (i) 9
    T   F          T   F          T   F
T   3   1      T   4   0      T   2   2
F   1  35      F   0  36      F   2  34

(j) 10
    T   F
T   4   1
F   0  35

Table 3(f) represents the place sign with 85% precision, 100% recall, and a 91% F1 score; 3(g) represents the confused sign with 100% precision, 100% recall, and a 100% F1 score; 3(h) the excited sign with 100% precision, 33% recall, and a 49% F1 score; 3(i) the disappointed sign with 85% precision, 100% recall, and a 91% F1 score; and 3(j) the angry sign with 85% precision, 100% recall, and a 91% F1 score, as summarized in Table 5. For Table 4, the Convolutional Neural Network, 4(a) represents the laborer sign with 100% precision, 100% recall, and a 100% F1 score; 4(b) the pilot sign with 80% precision, 100% recall, and an 88% F1 score; 4(c) the garden sign with 100% precision, 50% recall, and a 66% F1 score; 4(d) the class sign with 66% precision, 100% recall, and a 79% F1 score; 4(e) the house sign with 100% precision, 50% recall, and a 66% F1 score; 4(f) the place sign with 100% precision, 100% recall, and a 100% F1 score; 4(g) the confused sign with 75% precision, 75% recall, and a 75% F1 score; 4(h) the excited sign with 100% precision, 100% recall, and a 100% F1 score; 4(i) the disappointed sign with 50% precision, 50% recall, and a 50% F1 score; and 4(j) the angry sign with 80% precision, 100% recall, and an 88% F1 score, as summarized in Table 6.

Table 5. Precision, recall, and F1 score for each sign on Gaussian Hidden Markov Model.

No.  Sign          Precision (%)  Recall (%)  F1 Score (%)
1    Laborer        75            100          85
2    Pilot         100             66          79
3    Garden         85            100          91
4    Class          66            100          79
5    House         100             50          66
6    Place          85            100          91
7    Confused      100            100         100
8    Excited       100             33          49
9    Disappointed   85            100          91
10   Angry          85            100          91

Table 6. Precision, recall, and F1 score for each sign on Convolutional Neural Network.

No.  Sign          Precision (%)  Recall (%)  F1 Score (%)
1    Laborer       100            100         100
2    Pilot          80            100          88
3    Garden        100             50          66
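The per-sign scores in Tables 5 and 6 follow directly from the 2x2 matrices in Tables 3 and 4, assuming each matrix lists [TP, FP] in its T row and [FN, TN] in its F row, and that the tables truncate percentages rather than round them (a minimal sketch for verification):

```python
def prf(tp, fp, fn):
    """Precision, recall, and F1 for one sign's 2x2 confusion matrix,
    as integer percentages. The tables appear to truncate (not round),
    and to compute F1 from the already-truncated percentages."""
    precision = int(100 * tp / (tp + fp))
    recall = int(100 * tp / (tp + fn))
    f1 = int(2 * precision * recall / (precision + recall))
    return precision, recall, f1

# Gaussian HMM, laborer sign (Table 3(a)): TP=6, FP=2, FN=0
print(prf(6, 2, 0))   # -> (75, 100, 85)
# Gaussian HMM, excited sign (Table 3(h)): TP=2, FP=0, FN=4
print(prf(2, 0, 4))   # -> (100, 33, 49)
# CNN, pilot sign (Table 4(b)): TP=4, FP=1, FN=0
print(prf(4, 1, 0))   # -> (80, 100, 88)
```

Under this reading, every row of Tables 5 and 6 reproduces from the corresponding matrix in Tables 3 and 4.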


4    Class          66            100          79
5    House         100             50          66
6    Place         100            100         100
7    Confused       75             75          75
8    Excited       100            100         100
9    Disappointed   50             50          50
10   Angry          80            100          88

Precision, recall, and F1 score were calculated for each model, and the mean over all ten signs represents each model's overall result. In this experiment, the Gaussian Hidden Markov Model achieved mean precision of 88.1%, mean recall of 84.9%, and mean F1 score of 82.2%, while the Convolutional Neural Network achieved 85.1% precision, 82.5% recall, and 81.2% F1 score, as shown in Table 7.

Table 7. Comparison on prediction between Gaussian Hidden Markov Model and Convolutional Neural Network.

Model                         Mean of Precision  Mean of Recall  Mean of F1 Score  Accuracy
Gaussian Hidden Markov Model  88.1%              84.9%           82.2%             84.6%
Convolutional Neural Network  85.1%              82.5%           81.2%             82%

Comparing the results of the two models shows that the Gaussian Hidden Markov Model achieved higher precision, recall, F1 score, and accuracy across the 10 different SIBI signs. However, even though its values are better, the per-sign range of precision, recall, and F1 score obtained with the Hidden Markov Model is wider than that obtained with the Convolutional Neural Network.

V. CONCLUSION

In this paper, a Gaussian Hidden Markov Model and a Convolutional Neural Network were used as models for 10 SIBI signs with a vision-based recognition approach. For video enhancement, Contrast Limited Adaptive Histogram Equalization was applied to make sure the skin is detected. Skin detection was used for the Gaussian Hidden Markov Model, and fine tuning was used for the Convolutional Neural Network. In this experiment, the Gaussian Hidden Markov Model performed better than the Convolutional Neural Network, but its range of results was wider, which means the Convolutional Neural Network is more consistent than the Gaussian Hidden Markov Model. For future work, there are many areas of improvement in this research, such as using other feature extraction methods, using more data, or using modified Hidden Markov Model or Convolutional Neural Network architectures.

REFERENCES

[1] E. Supriyati and I. Mohammad, "Recognition System of Indonesia Sign Language based on Sensor and Artificial Neural Network," Makara Journal of Technology, pp. 25-31, 2013.
[2] Suharjito, H. Gunawan, N. Thiracitta and A. Nugroho, "Sign language recognition using modified convolutional neural network model," in 2018 Indonesian Association for Pattern Recognition International Conference (INAPR), September 2018.
[3] L. Pigou, S. Dieleman, P.-J. Kindermas and B. Schrauwen, "Sign Language Recognition using Convolutional Neural Network".
[4] J. Huang, W. Zhou, H. Li and W. Li, "Sign Language Recognition using 3D Convolutional Neural Networks," 2015 IEEE International Conference, pp. 1-6, 2015.
[5] R. Fatmi, S. Rashad, R. Integlia and G. Hutchison, "American Sign Language Recognition using Hidden Markov Models and Wearable Motion Sensors," 2017.
[6] Y. Wenwen, Tao, Jinxu, Ye and Zhongfu, "Continuous sign language recognition using level building based on fast hidden Markov model," Pattern Recognition Letters, pp. 28-35, 2016.
[7] H. Wang, X. Chai, Y. Zhou and X. Chen, "Fast Sign Language Recognition Benefited From Low Rank Approximation," 2015 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG), pp. 1-6, 2015.
[8] P. Kumar, H. Gauba, P. P. Roy and D. P. Dogra, "Coupled HMM-based multi-sensor data fusion for sign language recognition," Pattern Recognition Letters, pp. 1-8, 2017.
[9] J. L. Raheja, M. Minhas, D. Prashanth, T. Shah and A. Chaudhary, "Robust gesture recognition using Kinect: A comparison between DTW and HMM," Optik-International Journal for Light and Electron Optics, pp. 1098-1104, 2015.
[10] K. Imagawa, S. Lu and S. Igi, "Color-Based Hands Tracking System for Sign Language Recognition," Automatic Face and Gesture Recognition, pp. 462-467, 1998.
[11] M. M. Zaki and S. I. Shaheen, "Sign language recognition using a combination of new vision based features," Pattern Recognition Letters, pp. 572-577, 2011.
[12] L.-G. Zhang, Y. Chen, G. Fang and X. Chen, "A Vision-Based Sign Language Recognition System Using Tied-Mixture Density HMM," Proceedings of the 6th International Conference on Multimodal Interfaces, pp. 198-204, 2004.
[13] S. Dogic and G. Karli, "Sign Language Recognition using Neural Networks," TEM Journal, pp. 296-301, 2014.
[14] A. Krizhevsky, I. Sutskever and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," Advances in Neural Information Processing Systems, pp. 1097-1105, 2012.
[15] Suharjito, R. Anderson, F. Wiryana, M. C. Ariesta and G. P. Kusuma, "Sign language recognition application systems for deaf-mute people: a review based on input-process-output," Procedia Computer Science, vol. 116, pp. 441-448, 2017.
[16] T. Handhika, R. I. M. Zen, D. P. Lestari and I. Sari, "Gesture Recognition for Indonesian Sign Language (BISINDO)," Journal of Physics: Conference Series, 2018.
[17] K.-Y. Fok, N. Ganganath, C.-T. Cheng and C. K. Tse, "A Real-Time ASL Recognition System Using Leap Motion Sensors," 2015 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery, pp. 411-414, 2015.
[18] D. Tewari and S. K. Srivastava, "A Visual Recognition of Static Hand Gestures in Indian Sign Language based on Kohonen Self-Organizing Map Algorithm," International Journal of Engineering and Advanced Technology (IJEAT), pp. 165-170, 2012.
[19] N. A. Sarhan, Y. El-Sonbaty and S. M. Youssef, "HMM-Based Arabic Sign Language Recognition Using Kinect," The Tenth International Conference on Digital Information Management (ICDIM 2015), pp. 169-174, 2015.
[20] Suharjito, N. Thiracitta and H. Gunawan, "SIBI Sign Language Recognition Using Convolutional Neural Network Combined with Transfer Learning and non-trainable Parameters," Procedia Computer Science, vol. 179, pp. 72-80, 2021.
[21] A. Aljabar and Suharjito, "BISINDO (Bahasa Isyarat Indonesia) Sign Language Recognition Using CNN and LSTM," Advances in Science, Technology and Engineering Systems Journal, vol. 5, no. 5, pp. 282-287, 2020.


Intelligent Computational
Model for Early Heart Disease Prediction using
Logistic Regression and Stochastic Gradient
Descent
(A Preliminary Study)
Eka Miranda
Information Systems Department, School of Information Systems, Bina Nusantara University, Jakarta, Indonesia 11480
[email protected]

Faqir M Bhatti
Riphah Institute of Computing and Applied Sciences, Riphah International University, Raiwind, Lahore, Pakistan
[email protected]

Mediana Aryuni
Information Systems Department, School of Information Systems, Bina Nusantara University, Jakarta, Indonesia 11480
[email protected]

Charles Bernando
Information Systems Department, School of Information Systems, Bina Nusantara University, Jakarta, Indonesia 11480
[email protected]

Abstract— Heart disease, also known as cardiovascular disease (CVD), causes major mortality worldwide. Heart disease can be diagnosed using non-invasive and invasive methods; the main distinction is that invasive tests use medical equipment that enters the human body while non-invasive tests do not. This study designed a model for non-invasive prediction with an intelligent computational and machine learning approach for early heart disease prediction. Logistic regression and stochastic gradient descent were applied for this model. A clinical dataset of 303 patients was gathered from the UCI repository, available at http://archive.ics.uci.edu/ml/datasets/Heart+Disease. The Age, Sex, Cp, Trestbps, Chol, Fbs, Exang, Thalach (maximum heart rate achieved), Old peak ST, Slope, Ca, and Thal variables were used to classify each patient into one of two prediction classes: No presence or Have heart disease. Classifier performance for logistic regression was accuracy 91.67%, precision 93.93%, F measure 92.53%, and recall 91.18%; for stochastic gradient descent it was accuracy 80.00%, precision 76.47%, F measure 81.25%, and recall 86.67%. The experiment result revealed that logistic regression gained higher accuracy, precision, F-measure, and recall than stochastic gradient descent.

Keywords—heart disease, logistic regression, stochastic gradient descent, machine learning

I. INTRODUCTION

Heart disease, also known as cardiovascular disease (CVD), causes major death worldwide: nearly 17.9 million people died from CVDs in 2016, 31% of all worldwide deaths. CVDs are illnesses of the heart and blood vessels, including coronary heart disease and other conditions [1]. WHO reported 17 million deaths under 70 years old caused by non-communicable diseases in 2015, of which 37% were caused by CVDs [2].

Indonesian Basic Health Research (RISKESDAS) in 2018 revealed that heart and blood vessel disease cases grow every year; at least 2,784,064 people in Indonesia had heart disease [3]. The Center for Data and Information of the Ministry of Health of the Republic of Indonesia reported that the occurrence of coronary heart disease in Indonesia in 2013 was 0.5%, or approximately 883,447 people [2],[4]. Individual habits and genetic tendency are among the factors that cause heart disease; physical inactivity, smoking, harmful use of alcohol, obesity, hypertension, and cholesterol are several of them. Early detection of heart disease therefore plays an important role as a pre-emptive action to prevent death [2], [5].

Heart disease can be diagnosed using non-invasive and invasive methods. The invasive intervention method uses minimally invasive surgery to identify abnormalities of the heart structure, whereas the non-invasive method identifies heart problems without surgery or other instruments entering the body [6]. The conventional invasive-based method relies on physical test results, medical record examination, and investigation of related symptoms for heart disease diagnosis. In addition, an image-based test using X-rays to see the blood vessels (angiography) is a precise technique to identify heart problems, coupled with the conventional invasive-based method. Angiography has drawbacks such as high cost, and both methods, physical testing and angiography, require expensive laboratory equipment and specialized tools and techniques. On the other hand, appropriate early identification of heart problems is urgent to avoid further heart issues [7]. To answer these issues, this study tried to develop a non-invasive approach based on an intelligent computational model and machine learning for early heart disease prediction. For several years, statistical and machine learning approaches have progressively been implemented to support medical analysis. This approach comprises predictive and descriptive data mining; predictive data mining is commonly implemented to generate a model for prediction and classification [8]. Several previous studies have already achieved good performance with data mining and machine learning used to predict heart disease. The most common choice is logistic regression. This model was designed for the risk assessment of complex

978-1-6654-4002-8/21/$31.00 ©2021 IEEE


diseases. Logistic regression is part of a supervised learning task for predicting an object. A study reported by [9] revealed the use of a logistic tree algorithm (coupling logistic regression and a decision tree) for heart disease prediction and obtained an accuracy result of 55.77%. Logistic regression is easy to implement and interpret, and very time-saving to train. Therefore, this study tried to develop an intelligent computational model using logistic regression and also the gradient descent algorithm for early heart disease prediction, which is the contribution of this study.

II. LITERATURE STUDY

A. Logistic Regression
Logistic regression operates in machine learning within the statistics domain and is used to solve binary classification problems. Additionally, the technique applies to observing a set of discrete classes. Logistic regression is a simple machine learning technique that is easy to employ and produces great training efficiency [10]. Plentiful studies have revealed the fruitful performance of logistic regression in addressing several problems. The study by [9] reported the use of a logistic model tree algorithm to predict heart disease and gained an accuracy of 55.77%. Research by [11] conveyed results for diabetes type 1 and 2 prediction with logistic regression and showed a great ROC AUC of 0.95. Other research [12] presented the use of logistic regression and machine learning for predicting patient mortality with a 95% confidence interval. The study by [13] reported logistic regression in machine learning to predict heart disease and produced an accuracy value of 87%.

B. Gradient Descent
Gradient descent is used for optimizing neural networks. In machine learning, gradient descent is applied for revising parameters [14]. Numerous studies have revealed positive results of gradient descent on several problems. The study by [15] reported the use of a Ridge-Adaline Stochastic Gradient Descent classifier (RASGD) for diabetes mellitus prediction with an accuracy of 92%. Research by [16] explained a model for cardiovascular disease prediction using gradient descent optimization with an accuracy value of 98.54%. A study by [17] reported the use of gradient descent to predict heart disease and gained an accuracy of 98.35%.

III. DATA AND METHODS

A. Dataset
A clinical dataset of 303 patients was gathered from the UCI repository: http://archive.ics.uci.edu/ml/datasets/Heart+Disease.

B. Logistic Regression
The logistic regression task is supervised learning for prediction purposes and for analyzing data. The algorithm uses independent variables (IVs) to define a result and a categorical dependent variable (DV) [13]. The logistic regression types are [13]:
1. Binary logistic regression.
2. Multinomial logistic regression.
3. Ordinal logistic regression.

The logistic function is the sigmoid function. The logistic regression functions are shown in equations 1 and 2 [13]:

    σ(z) = 1 / (1 + e^(-z))                                      (1)

where σ(z) = output or y (0 to 1), z = input to the function, and e = base of the natural logarithm. With independent variables x1, x2, x3, …, xn:

    P = 1 / (1 + e^(-(b0 + b1·x1 + b2·x2 + … + bn·xn)))          (2)

where P is the probability for heart disease prediction.

The logistic regression process begins by providing data variables x and y as classification features. Subsequently, the feature weights used in the logistic regression formula are computed while the weight values are updated. Finally, the classification result is evaluated by creating a confusion matrix for accuracy measurement. Figure 1 displays the logistic regression process.

Fig. 1. Logistic Regression Process

C. Gradient Descent
Gradient descent minimizes an objective function J(θ) with parameters θ ∈ R^d by updating the parameters in the opposite direction of the gradient of the objective function ∇θJ(θ), i.e., θ = θ − η·∇θJ(θ). The learning rate η defines the size of the steps taken to reach the minimum. Gradient descent is frequently used in supervised learning, which uses the training set to learn the relationship between input and output [14]. Gradient descent has three types [14]:
1. Batch gradient descent. This type is guaranteed to converge to the global minimum for convex error surfaces and to a local minimum for non-convex surfaces.
2. Stochastic gradient descent (SGD). This type avoids redundant computation on huge data by calculating the gradient and running one update at a time. SGD is one of the optimization algorithms frequently used in machine learning applications to discover the model parameters that correspond to the best fit between predicted and actual output.
3. Mini-batch gradient descent, commonly used when training a neural network.

The gradient descent process begins by initializing the parameters. Subsequently, the data points are used to optimize the coefficient values by updating them with the learning rate and gradient on every iteration until an optimized coefficient is obtained. Figure 2 displays the gradient descent process.
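The logistic function of equations (1) and (2) and the one-update-per-example stochastic gradient descent described in this section can be sketched together in Python. This is a minimal sketch: the learning rate, epoch count, and toy data are illustrative assumptions, not the study's actual settings.

```python
import math
import random

def sigmoid(z):
    # equation (1): maps any real z into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, w, b):
    # equation (2): P(heart disease | x) for feature vector x
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))

def sgd_train(data, lr=0.1, epochs=200, seed=0):
    """Stochastic gradient descent: one weight update per training
    example (shuffled each epoch), as in the SGD description above."""
    rng = random.Random(seed)
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        rng.shuffle(data)          # note: shuffles the caller's list in place
        for x, y in data:
            err = predict_proba(x, w, b) - y   # d(log loss)/dz
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

# toy example: one feature, label 1 when the feature is positive
data = [([v], 1 if v > 0 else 0) for v in (-2.0, -1.0, -0.5, 0.5, 1.0, 2.0)]
w, b = sgd_train(data)
print(predict_proba([2.0], w, b) > 0.5)   # -> True
```

After training on the separable toy data, the fitted weights push the predicted probability above 0.5 for positive inputs and below 0.5 for negative ones.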


Fig. 2. Gradient Descent Process

D. Evaluation of Classification Model
The evaluation of a classification model measures how correctly the model's predicted classification corresponds to the actual classification of each case. Evaluation based on the confusion matrix is most frequently used to measure classification performance. The confusion matrix works by comparing the actual values with the values predicted by the machine learning model. Measurements calculated from the confusion matrix are accuracy, precision, F measure, and recall. The formulas for accuracy, precision, F measure, and recall are shown in equations 4-7 [18][19]:

    Accuracy  = (TP + TN) / (TP + TN + FP + FN)                   (4)
    Precision = TP / (TP + FP)                                    (5)
    Recall    = TP / (TP + FN)                                    (6)
    F Measure = 2 · Precision · Recall / (Precision + Recall)     (7)

The confusion matrix can be seen in Figure 3.

Fig. 3. Confusion Matrix

E. Research Stages

Fig. 4. Research Stages

There are three steps for model construction, shown in Figure 4:
Step 1: Capture the data sources required by the models.
Step 2: From the data source, select the heart disease attributes and construct the training and testing datasets, then apply the corresponding algorithm to classify. The hold-out method divides the data into training data (80%) and testing data (20%).
Step 3: Evaluate the classifier result using confusion matrix analysis.
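Step 2's 80%/20% hold-out split can be sketched in plain Python; the shuffling seed below is an illustrative choice, not the study's setting:

```python
import random

def holdout_split(records, test_ratio=0.2, seed=42):
    # shuffle a copy, then divide into training (80%) and testing (20%)
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]

# 303 patient records, as in the dataset used by this study
train, test = holdout_split(list(range(303)))
print(len(train), len(test))   # -> 243 60
```

With 303 records this yields 243 training and 60 testing records, matching the test-set size implied by the confusion matrices reported later.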

IV. RESULT AND DISCUSSION

A. Dataset Description
The clinical dataset of 303 patients was gathered from the UCI repository: http://archive.ics.uci.edu/ml/datasets/Heart+Disease. The dataset consists of 76 attributes and 303 records. Nevertheless, only 13 attributes were employed, because most published experiments use these 13 attributes. All records had been class-labelled with the heart disease diagnosis classes: No presence and Have heart disease. The chosen attributes can be seen in Table 1.
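For illustration, the 13 commonly used attributes and the binary label mapping can be sketched as follows. The attribute names are assumed from the standard UCI Cleveland naming (restecg completes the 13 alongside the variables named in the abstract), and the num-to-label rule follows the dataset's documentation; both are assumptions, since Table 1 itself did not survive extraction:

```python
# 13 attributes most published experiments use from the UCI heart-disease
# data (names assumed from the standard Cleveland attribute list)
FEATURES = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
            "thalach", "exang", "oldpeak", "slope", "ca", "thal"]

def to_binary_label(num):
    # UCI's diagnosis field "num" is 0-4; 0 = no presence, 1-4 = disease
    return "Have heart disease" if num > 0 else "No presence"

print(len(FEATURES))        # -> 13
print(to_binary_label(0))   # -> No presence
```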


TABLE I. SELECTED HEART DISEASE ATTRIBUTES

B. Logistic Regression
Logistic regression was conducted through the Python library. Parameter tuning for optimizing classifier performance was set to the following values: C = 1.0, λ = 1.2. The C parameter is the inverse regularization parameter, a control variable that keeps the strength of regularization in check and is inversely related to the lambda regulator; its default value is 1.0, and a smaller C value specifies stronger regularization. The lambda parameter is used to control the regularization strength: the larger the lambda value, the stronger the regularization effect. The result of logistic regression for heart disease prediction is demonstrated in Figure 5.

Fig. 5. Logistic Regression classifier result

Figure 5 demonstrates the prediction outcomes as a confusion matrix for the testing dataset. A confusion matrix is commonly applied to show a classification result on testing data in which the positive values are identified; the matrix evaluates the actual target values against the predicted values [19]. Figure 5 shows 24 records that match the value of heart disease prediction between the predicted and the actual value (true positives) and 31 records that match the value of no presence of heart disease between the predicted and the actual value (true negatives). Measurement values were calculated based on the confusion matrix, namely accuracy, precision, F-measure, and recall (see Table II).

TABLE II. ACCURACY, PRECISION, F-MEASURE, RECALL: LOGISTIC REGRESSION CLASSIFIER

             Score
Accuracy     91.67%
Precision    93.93%
F Measure    92.53%
Recall       91.18%

C. Stochastic Gradient Descent
Parameter tuning for optimizing classifier performance was set through the learning rate, which is derived from the alpha and epsilon values. The learning rate manages the amount of modification for estimating error. The alpha parameter is a constant value that multiplies the regularization term: the higher the value, the stronger the regularization. Its default value is 0.0001. The epsilon parameter controls the threshold value for getting a precise prediction: any distinction between the current prediction and the correct value is ignored when it is less than the threshold value. Its default value is 0.1. The experiment in this study set an alpha value of 0.0001 and an epsilon value of 0.1. The result of stochastic gradient descent for heart disease prediction on the testing dataset is shown in Figure 6.

Fig. 6. Logistic Regression with Stochastic Gradient Descent classifier result

Figure 6 shows 22 records that match the value of heart disease prediction between the predicted and the actual value (true positives) and 26 records that match the value of no presence of heart disease between the predicted and the actual value (true negatives). Table III shows the accuracy, precision, F-measure, and recall.
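As a cross-check of Table II, the metrics of equations (4)-(7) can be computed from the test-set confusion matrix. The counts below are inferred by scoring the 31-record class as positive (TP=31, FP=2, FN=3, TN=24); this positive-class assignment is an assumption, since the figure itself is not reproduced here:

```python
def metrics(tp, fp, fn, tn):
    # equations (4)-(7) from the evaluation section, in percent
    accuracy = (tp + tn) / (tp + tn + fp + fn) * 100
    precision = tp / (tp + fp) * 100
    recall = tp / (tp + fn) * 100
    f_measure = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f_measure

# inferred counts for the logistic-regression test split (60 records)
acc, prec, rec, f1 = metrics(tp=31, fp=2, fn=3, tn=24)
print(round(acc, 2), round(prec, 2), round(rec, 2), round(f1, 2))
# -> 91.67 93.94 91.18 92.54
```

Under this reading the computed values match Table II up to the last rounded digit.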


TABLE III. ACCURACY, PRECISION, F-MEASURE, RECALL: STOCHASTIC GRADIENT DESCENT

             Score
Accuracy     80.00%
Precision    76.47%
F Measure    81.25%
Recall       86.67%

The experiment result of this study revealed that logistic regression gained higher accuracy, precision, F-measure, and recall than stochastic gradient descent (see Figure 7).

Fig. 7. Classifier result: Logistic Regression and Logistic Regression with Stochastic Gradient Descent

This study produced an accuracy improvement over the previous study by [9] when using a similar dataset for logistic regression: research by [9] applied a logistic model tree algorithm (coupling logistic regression and a decision tree) and produced 55.77% accuracy, while our study applied the logistic regression technique and produced 91.67% accuracy. The beneficial impact of this study is that logistic regression and stochastic gradient descent reached high accuracy in predicting heart disease presence, 91.67% and 80.00% respectively. The limitation of our study is the small size of the dataset. Furthermore, evaluation based on medical judgement is needed to confirm the accuracy of the results.

V. CONCLUSION AND FUTURE WORK

The authors used logistic regression and stochastic gradient descent for early heart disease prediction. The hold-out method was applied to divide the data set into a training dataset and a testing dataset (80%:20% for training and testing data, respectively). The results showed both algorithms gained high accuracy, 91.67% and 80.00% respectively. Furthermore, the experiment showed that logistic regression gained higher accuracy, precision, F-measure, and recall than stochastic gradient descent. This finding could potentially be used in the medical field as a non-invasive diagnosis to detect the presence of heart disease; the implication is that this study could be employed as an early prediction tool for heart disease. Future work could employ more datasets and more precise techniques, and explore feature extraction and classification techniques to increase accuracy.

ACKNOWLEDGMENT

This work is supported by the Research and Technology Transfer Office, Bina Nusantara University, as a part of Bina Nusantara University's International Research Grant entitled "Aplikasi Prediksi Diagnosa Awal Penyakit Jantung Koroner Berbasis Web Dengan Teknik Regresi Logistik", contract number No: 017/VR.RTT/III/2021, contract date 22 March 2021. We also appreciate the support from Prof. Dr. Faqir M Bhatti, Riphah Institute of Computing and Applied Sciences, Riphah International University, Lahore, Pakistan, who helped with the writing process.

REFERENCES

[1] World Health Organization, "Cardiovascular Disease", World Health Organization, May 2021 [Online]. Available: https://www.who.int/health-topics/cardiovascular-diseases/#tab=tab_1 [Accessed 8 May 2021].
[2] World Health Organization, "Cardiovascular Disease (CVDs)", World Health Organization, May 2017 [Online]. Available: https://www.who.int/news-room/fact-sheets/detail/cardiovascular-diseases-(cvds) [Accessed 8 May 2021].
[3] Ministry of Health of the Republic of Indonesia, "Indonesian Basic Health Research Report", 2018 [Online]. Available: https://kesmas.kemkes.go.id/assets/upload/dir_519d41d8cd98f00/files/Hasil-riskesdas-2018_1274.pdf [Accessed 8 May 2021].
[4] Center for Data and Information, The Ministry of Health of the Republic of Indonesia, "Heart Health Situation", September 2014 [Online]. Available: https://pusdatin.kemkes.go.id/resources/download/pusdatin/infodatin/infodatin-jantung.pdf [Accessed 8 May 2021].
[5] D. Shah, S. Patel, S. K. Bahrti, "Heart Disease Prediction using Machine Learning Techniques", SN Computer Science, Volume 1, No. 345, pp. 1-6, 2020.
[6] R. Alizadehsania, M. J. Hosseinib, A. Khosravia, F. Khozeimehc, M. Roshanzamird, N. Sarrafzadegane, S. Nahavandia, "Non-invasive Detection of Coronary Artery Disease in High-Risk Patients Based on the Stenosis Prediction of Separate Coronary Arteries", Computer Methods and Programs in Biomedicine, Volume 162, pp. 119-127, 2018.
[7] Y. Muhammad, M. Tahir, M. Hayat, K. To Chong, "Early and Accurate Detection and Diagnosis of Heart Disease using Intelligent Computational Model", Scientific Reports, Volume 10, pp. 1-17, 2020.
[8] L. Verma, S. Srivastava, P. C. Negi, "An Intelligent Noninvasive Model for Coronary Artery Disease Detection", Complex & Intelligent Systems, Volume 4, pp. 11-18, 2018.
[9] J. Patel, T. Upadhyay, S. Patel, "Heart Disease Prediction Using Machine Learning and Data Mining Technique", International Journal of Computer Science & Communication, Volume 7, No 1, pp. 129-137, 2015.
[10] E. Christodoulou, J. Ma, G. S. Collins, E. W. Steyerberg, J. Y. Verbakel, B. Van Calstera, "A Systematic Review Shows No Performance Benefit of Machine Learning Over Logistic Regression for Clinical Prediction Models", Journal of Clinical Epidemiology, Volume 110, pp. 12-22, 2019.
[11] A. L. Lynam, J. M. Dennis, K. R. Owen, R. A. Oram, A. G. Jones, B. M. Shields, L. A. Ferrat, "Logistic Regression Has Similar


Performance to Optimised Machine Learning Algorithms in a Clinical Setting: Application to the Discrimination Between Type 1 and Type 2 Diabetes in Young Adults", Diagnostic and Prognostic Research, Volume 4, No 6, pp. 2-10, 2020.
[12] T. E. Cowling, D. A. Cromwell, A. Bellot, L. D. Sharples, J. van der Meulen, "Logistic Regression and Machine Learning Predicted Patient Mortality from Large Sets of Diagnosis Codes Comparably", Journal of Clinical Epidemiology, Volume 133, pp. 43-52, 2021.
[13] A. S. Thanuja Nishadi, "Predicting Heart Diseases in Logistic Regression of Machine Learning Algorithms by Python Jupyterlab", International Journal of Advanced Research and Publications, Volume 3, Issue 8, pp. 1-6, 2019.
[14] S. Ruder, "An Overview of Gradient Descent Optimization Algorithms", arXiv, Cornell University, Machine Learning, pp. 1-14, 2014.
[15] N. Deepa, B. Prabadevi, P. K. Maddikunta, T. R. Gadekallu, T. Baker, M. A. Khan, U. Tariq, "An AI-based Intelligent System for Healthcare Analysis using Ridge-Adaline Stochastic Gradient Descent Classifier", The Journal of Supercomputing, May, pp. 1-20, 2020.
[16] M. S. Nawaz, B. Shoaib, M. A. Ashraf, "Intelligent Cardiovascular Disease Prediction Empowered with Gradient Descent Optimization", Heliyon, Volume 7, Issue 5, pp. 1-7, May 2021.
[17] V. S. Sakila, A. Dhiman, K. Mohapatra, P. R. Jagdishkumar, "An Automatic System for Heart Disease Prediction using Perceptron Model and Gradient Descent Algorithm", International Journal of Engineering and Advanced Technology, Volume 9, No 1, pp. 1-4, 2019.
[18] J. D. Novakovic, A. Veljovic, S. S. Ilic, Z. Papic, M. Tomovic, "Evaluation of Classification Models in Machine Learning", Theory and Applications of Mathematics & Computer Science, Volume 7, No 1, pp. 39-46, 2017.
[19] E. D. Madyatmadja, M. Aryuni, "Comparative Study of Data Mining Model for Credit Card Application Scoring in Bank", Journal of Theoretical and Applied Information Technology, Volume 59, No 2, pp. 269-274, 2014.


Line Follower Smart Trolley System V2 using RFID


Alexander Agung Santoso Gunawan
Computer Science Department, School of Computer Science, Bina Nusantara University, Jakarta, Indonesia 11480
[email protected]

Alicia Junaedi
Computer Science Department, School of Computer Science, Bina Nusantara University, Jakarta, Indonesia 11480
[email protected]

Dede Muhamad
Computer Science Department, School of Computer Science, Bina Nusantara University, Jakarta, Indonesia 11480
[email protected]

Daffa Mennawi
Computer Science Department, School of Computer Science, Bina Nusantara University, Jakarta, Indonesia 11480
[email protected]

Heri Ngarianto
Computer Science Department, School of Computer Science, Bina Nusantara University, Jakarta, Indonesia 11480
[email protected]

Widodo Budiharto
Computer Science Department, School of Computer Science, Bina Nusantara University, Jakarta, Indonesia 11480
[email protected]

Herman Tolle
Media, Game, and Mobile Technologies Research Group, Master of Computer Science, Brawijaya University, Malang, Indonesia
[email protected]

Muhammad Attamimi
Department of Electrical Engineering, Institut Teknologi Sepuluh Nopember, Surabaya, Indonesia
[email protected]

Abstract— Shopping in supermarkets has become a varied experience, as stores innovate with new technologies to support their buyers or customers. One important innovation was the shopping trolley, inspired by wire shopping baskets mounted on a folding frame, which doubled the quantity of items shoppers can carry. The design of the shopping trolley continuously changes and evolves to improve customers' shopping experience. In this paper, we focus on developing our previous research on shopping trolleys into what we call the smart trolley V2. The smart trolley V2 adopts a line-follower model and uses RFID (Radio-Frequency Identification) for localization. It is designed with four Mecanum wheels, which enable omni-directional moves without the need to rotate, allowing the trolley to move easily from one position to another as desired. The development uses an Arduino board as the microcontroller to process input from an infrared sensor and an RFID reader, which guide the robot in choosing how to move to a selected location. Finally, the results of the experiments on smart trolley navigation are presented: our smart trolley V2, based on Mecanum wheels and RFID, can follow the line and navigate to its destination easily.

Keywords—smart trolley V2, line follower robot, RFID, Mecanum wheels

I. INTRODUCTION

There are many ways people can fulfill their daily needs, and one of them is shopping. Innovation and technological advancement have made it easier for people to fulfill their needs and desires through shopping. Nowadays, shopping can be done anywhere and at any time using online commercial shopping applications; this simplicity is the main reason many people now do their grocery shopping online. Besides that, many individuals use social commerce for grocery shopping because it gives them an in-store shopping experience. Still, quite a few individuals do their shopping in physical supermarkets for many different reasons. Many people are genuinely interested in cooking and shopping: they find new ways to cook by selecting the best ingredients based on how the products look and feel, as a routine activity or even as a way to spend their time. Some of them explicitly do not trust someone else to make the decision in choosing products for their groceries, such as fruits and vegetables [1].

Supermarket shopping is a competitive environment in which each company must find ways to attract customers to come and buy products, or fulfill their needs, in its store. A supermarket business acquires a wide assortment of goods from suppliers, then organizes and distributes them to retail stores to sell to local customers [2]. To provide a better in-store shopping experience for their customers, many supermarkets initially focused on offering more services. Different supermarkets focus on different aspects of service, such as merchandise value, the internal shop environment, interaction with customer service staff, merchandise variety, the presence of and interaction with other customers, customers' in-shop emotions, and the outcomes [3].

Ideas and innovations have been introduced to satisfy and support customers when they are shopping in supermarkets. The usual way to help customers select and store their chosen products is to give them a shopping basket or shopping trolley. Many designs have been created, each offering different features and advantages. Nowadays, each supermarket has its own shopping trolley, with designs drawn from many kinds of research and inventions that improve on earlier trolleys.

The main idea of our research is to treat the shopping trolley as a mobile robot. Our previous research [4] used a common Indonesian shopping trolley as a model, with two motorized wheels and two castor wheels for locomotion. Our smart trolley V1 had a major weakness in its navigation because the robot's movement could not be controlled precisely. Furthermore, we found in our literature study [5] that the pick-and-plug model, in which the shopping trolley can be switched between manual and automatic mode, is the preferred model for future smart trolleys. Based on these findings, we developed the smart trolley V2, which solves the navigation problem and uses the pick-and-plug model. The smart trolley V2 can follow a line automatically once its movement track has been designed. Robot localization has already been researched, for example in a multitasking shopping trolley

978-1-6654-4002-8/21/$31.00 ©2021 IEEE

28 October 2021, Jakarta - Indonesia



based on RFID tags and a reader [6]. Another researcher [7] used six ultrasonic sensors to develop an automated, human-guided shopping trolley. These inventions have improved the enjoyment and ease of shopping, since the shopping trolley is the most common item found in supermarkets. These improvements motivated us to create a prototype shopping trolley that can move by following lines [8] that lead in various directions. The main purpose of our prototype is to make shopping more automated, so that customers find it more convenient and efficient: the smart trolley can move from its current location to a selected product's location, and customers do not need to find and navigate to the products by themselves, since indoor maps and navigation are also a problem for customers when they are shopping.

The first step of our research is to build a robot with mecanum wheels [9] that can carry a basket on top. The robot processes the target location received from the Android application and produces movement directions for the mecanum wheels to reach the desired destination. Our main concern is how the robot can follow the line smoothly to reach the destination [10]. The prototype uses an infrared sensor to follow the line. Besides that, we attached RFID tags below the line to indicate locations, which the robot reads using an RFID tag reader. Together, the infrared line-tracking sensor and the RFID tag reader help the robot move to the desired destination. The last known location of the robot is displayed in the Android application [11]. An Arduino Mega 2560 is used as the main microcontroller platform; the Arduino board reads and processes inputs from the sensors and modules. To receive input from the Android application, we also attached a Bluetooth module to the robot so that data can be exchanged between the application and the robot. The program uploaded to the Arduino board is written in the C programming language and compiled by the Arduino IDE from an Arduino code file. In the next section, we discuss the smart trolley system V2, followed by the hardware design in Section 3. We discuss the software design in Section 4 and the experiment results in Section 5. Finally, in Section 6, we conclude our work based on the experiments.

II. SMART TROLLEY SYSTEM V2

The general concept of our smart trolley system V2 (see Fig. 1) is a robot underneath a shopping basket that is designed to read RFID tags. Each RFID tag describes a unique position that helps the robot know its location. In addition, a line is provided so that the robot always stays on the track. The design of our smart trolley system has to be flexible to meet customers' needs, and it can be installed under many different baskets or trolleys.

The components of the smart trolley robot consist of batteries, an infrared sensor, an RFID tag reader, a motor driver controller, four 12 V DC motors, and an Arduino as the main board platform. A mecanum wheel is attached to each DC motor so that the robot has high mobility and can move omni-directionally from one position to another without needing to rotate. The smart trolley's sketch architecture is shown below.

Fig 1. Sketch architecture for smart trolley system V2

Our smart trolley must be able to follow a line created with black tape, with RFID tags placed underneath the tape. The system uses infrared sensors to detect the black-tape line on any surface color except black. The analog signals received from the infrared sensors are processed so that the robot does not get off the track. RFID tags are used for robot localization, that is, to define the current position of the robot. The RFID tag reader is attached to the robot, and it reads every RFID tag underneath the black tape. The Android application is developed to send a new desired location to the robot. The path is calculated in the Android application, and the resulting pre-determined path is sent through the robot's Bluetooth module; the robot then moves according to the pre-determined path it received.

The path the robot receives consists of the directions the robot needs to move along. The robot follows the line until the next RFID tag is read, and keeps moving until no pre-determined path is left. It then sends its current location through the Bluetooth module to the Android application, which updates the displayed location.

III. HARDWARE DESIGN

Since the new concept of the smart trolley is more advanced than the previous prototype, a mecanum wheel robot car is used as the base component that supports the other components of the smart trolley. For research purposes, the hardware currently used is a smaller version, for prototyping and testing only. It can be seen in Fig. 2.

Fig 2. Hardware design for smart trolley V2


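Section II states that the path is calculated in the Android application and handed to the trolley as a pre-determined sequence of moves; Section IV later names A* as the shortest-path algorithm used. The following is only an illustrative sketch of that idea: the junction graph, node names, and Manhattan-distance heuristic are our assumptions, and the real computation runs in the Android app, not in Python.

```python
import heapq

def a_star(graph, coords, start, goal):
    """A* shortest path over a junction graph.

    graph:  adjacency dict {node: {neighbor: edge_cost}}
    coords: {node: (x, y)} grid positions, used only by the heuristic
    """
    def h(n):  # Manhattan distance: admissible on a rectilinear tape track
        return abs(coords[n][0] - coords[goal][0]) + abs(coords[n][1] - coords[goal][1])

    # Each heap entry: (f = g + h, g, node, path so far)
    open_set = [(h(start), 0, start, [start])]
    best_g = {start: 0}
    while open_set:
        f, g, node, path = heapq.heappop(open_set)
        if node == goal:
            return path
        for nbr, cost in graph[node].items():
            ng = g + cost
            if ng < best_g.get(nbr, float("inf")):
                best_g[nbr] = ng
                heapq.heappush(open_set, (ng + h(nbr), ng, nbr, path + [nbr]))
    return None  # no route on the track

# A toy 2x2 track: junctions A-B on top, C-D below, unit-length tape segments.
graph = {"A": {"B": 1, "C": 1}, "B": {"A": 1, "D": 1},
         "C": {"A": 1, "D": 1}, "D": {"B": 1, "C": 1}}
coords = {"A": (0, 0), "B": (1, 0), "C": (0, 1), "D": (1, 1)}
print(a_star(graph, coords, "A", "D"))  # → ['A', 'B', 'D']
```

The resulting node list would then be translated into per-junction turn commands and streamed to the robot over Bluetooth, as described in Section II.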
Fig. 3. (a) Mecanum wheel robot components, (b) right rear view of the mecanum wheel robot

In Fig. 3 (a), four main components are indicated with numbers in the picture. These main components of the mecanum wheel robot car are as follows:
1. The robot platform, sized 10.24 × 9.06 × 2.56 inches, which can carry a load of 10 kg
2. 4 pcs of DC 12 V 330 rpm motors
3. 4 pcs of mecanum wheels
4. 4 pcs of speed feedback circuits with Hall encoders

Fig. 4. (a) Arduino Mega2560, (b) 4Motor9Servo, (c) MFRC522, (d) BFD-1000, (e) Limskey 4200 mAh battery, (f) Bluetooth receiver HC-05

The smart trolley V2 uses a mecanum wheel platform as its base, and it needs a microcontroller to connect all the components, such as the sensors and servos. An Arduino Mega2560 (Fig. 4 (a)) is used as the microcontroller, with the help of a 4Motor9Servo shield (Fig. 4 (b)), since two TB6612 motor driver chips are included on it. The TB6612 is a driver that helps control the motors of the mecanum wheel robot car; it can drive two motors at up to 1.2 A of constant current. Meanwhile, the smart trolley uses an MFRC522 (Fig. 4 (c)) to scan the RFID tags for location-based information and a BFD-1000 (Fig. 4 (d)) line sensor to determine which path the smart trolley needs to follow. Two BFD-1000 units are used, one for the vertical axis and one for the horizontal axis. To supply power to the smart trolley, a Limskey 4200 mAh battery (Fig. 4 (e)) is used. Finally, a Bluetooth module HC-05 (Fig. 4 (f)) is used to receive movement directions from the Android application.

IV. SOFTWARE DESIGN

The smart trolley software is divided into two separate applications: one for the user to navigate the smart trolley, and one for the smart trolley itself. The main algorithms of the smart trolley applications are explained below.

A. Controlling Mecanum Wheels

The first step in creating the smart trolley is the ability to control the mecanum wheels. Each wheel's rotation affects the direction and rotation of the trolley. Fig. 6 is used as a reference for controlling the mecanum wheels and for how they affect the smart trolley's movement. The smart trolley can then perform omni-directional moves without having to rotate the robot.
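The omni-directional behaviour described above comes from mixing a forward command, a strafe command, and a rotation command into four individual wheel speeds. The sketch below is a generic mecanum mixing formula under one common sign convention for 45-degree rollers; it is an illustration only, not the authors' Arduino C firmware, and the signs must be matched to the actual motor wiring.

```python
def mecanum_mix(vx, vy, omega):
    """Map a body-frame command to the four wheel speeds of a mecanum platform.

    vx: sideways (strafe) speed, vy: forward speed, omega: rotation rate.
    Returns (front_left, front_right, rear_left, rear_right) under an
    assumed roller orientation; real hardware may need flipped signs.
    """
    fl = vy + vx + omega
    fr = vy - vx - omega
    rl = vy - vx + omega
    rr = vy + vx - omega
    return fl, fr, rl, rr

# Pure forward motion: all four wheels turn at the same speed.
assert mecanum_mix(0, 1, 0) == (1, 1, 1, 1)
# Pure strafe: the diagonal wheel pairs oppose each other, so the
# sideways roller forces add up and the platform translates without rotating.
print(mecanum_mix(1, 0, 0))  # → (1, -1, -1, 1)
```

The slightly different per-wheel values in Table I of Section V play the same role in practice: they compensate for motor-to-motor variation so the commanded direction comes out straight.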
B. Line Follower

The smart trolley needs the ability to follow a path to reach its destination. The BFD-1000 sensor detects black and white lines. By creating a path using black line tape, the smart trolley uses the difference between the black line and the lighter surface to follow only the black line. This is also used to adjust the position of the smart trolley so that it always stays in the center of the line.

C. RFID Scanner

Location positioning is needed for the smart trolley to know where it currently is. By using an RFID scanner (MFRC522), robot localization can be achieved without difficulty. RFID tags can be placed at certain points along the robot's line so that the smart trolley V2 can know its own position. Therefore, an NFC tag is put at every junction as a unique ID to determine where the trolley currently is and to choose the next path that should be taken when it meets the junction. This is also used to make the smart trolley stop moving when it has reached its destination. Finally, the path planning of the smart trolley can be perceived as an extension of robot localization.

D. Android Application

The Android application (see Fig. 5) is used to give destinations to the smart trolley. To lighten the burden on the Arduino, the Android application is also used to determine and calculate the shortest path to the destination using the A* shortest-path algorithm. The smart trolley is thus given a pre-determined path to reach its destination.

Fig. 5. Software design of smart trolley V2

V. EXPERIMENT RESULT

The movement of the mecanum wheel robot [12] is affected by each individual wheel's direction and velocity. The combination of all these forces produces the desired direction. The robot is able to move forward, backward, rotate on the spot, etc., as shown in Fig. 6 below.

Fig. 6. Mecanum wheel movement

The experiment used a track (see Fig. 8) with RFID tags (see Fig. 7) attached on each side of the track and at the center of junctions, for testing the robot movement. Pictures of the track and the RFID tag are shown below.

Fig. 7. RFID tag

Fig. 8. Experiment track

The speed configuration of the robot is set as follows.

TABLE I. SPEED CONFIGURATION OF MECANUM WHEELS

Direction | Left Front | Right Front | Left Rear | Right Rear
Forward   | 66         | 63          | 66        | 63
Backward  | 71         | 64.5        | 71        | 64.5
Left      | 67         | 67          | 70        | 68
Right     | 63.5       | 63.5        | 62        | 70

The robot movement is evaluated by first giving it a destination from the Android application. It then moves according to the pre-determined path produced by the A* shortest-path algorithm and sent by the Android application.


The robot follows the line and, when it scans a tag, it sends its new location to the Android application and continues moving to the next position if it has not yet arrived at the destination.

Fig. 9. Robot testing

Besides testing moving the robot to a specific location, the robot was also tested on moving back to its initial position after reaching the destination. In the experiment, all designed tasks were executed well by our smart trolley V2.

VI. CONCLUSION

This research focuses on developing a line-following robot system, implemented as a smart trolley. With this line-following robot, supermarket owners only need to purchase the robot, and they can easily put a shopping trolley on top of it. On the customer side, customers can enjoy shopping without having to push the shopping trolley themselves. Customers can also look up the locations of the items they would like to buy in the supermarket using the Android application, which is connected to the smart trolley via Bluetooth; the trolley then moves automatically to the designated destination. This also guides customers, so they do not need to search manually for the location of an item they would like to buy. RFID technology is also used to track the location of the smart trolley.

There are still improvements that can be made to this smart trolley project. First, a more advanced algorithm is needed so that the smart trolley can move in crowded situations. Second, an item scanner would let customers know the current total price.

ACKNOWLEDGMENTS

This work is supported by the Directorate General of Research and Development Strengthening, Indonesian Ministry of Research, Technology, and Higher Education, as part of the Konsorsium Riset Unggulan Perguruan Tinggi (KRUPT) Research Grant to Bina Nusantara University titled "Smart Trolley in Smart Mart", with contract numbers 234/E4.1/AK.04.PT/2021 and 3582/LL3/KR/2021, contract date 12 July 2021.

VII. REFERENCES

[1] P. Tukkinen and J. Lindqvist, "Understanding Motivations for Using Grocery Shopping Applications," IEEE Pervasive Computing, vol. 14, no. 4, pp. 38-44, 2015.
[2] F. Steeneken and D. Ackley, "A Complete Model of the Supermarket Business," BPTrends, 2012.
[3] N. S. Terblanche, "Revisiting the supermarket in-store customer shopping experience," Journal of Retailing and Consumer Services, vol. 40, pp. 48-59, 2018.
[4] A. Gunawan, V. Stevanus, A. Farley, H. Ngarianto, W. Budiharto and H. Tolle, "Development of smart trolley system based on android smartphone sensors," Procedia Computer Science, vol. 157, pp. 629-637, 2019.
[5] H. Purwantono, A. Gunawan, H. Tolle, M. Attamimi and W. Budiharto, "A literature review: Feasibility study of technology to improve shopping experience," Procedia Computer Science, vol. 179, pp. 468-479, 2020.
[6] S. Kamble, S. Meshram, R. Thokal and R. Gakre, "Developing a Multitasking Shopping Trolley," International Journal of Soft Computing and Engineering, vol. 3, no. 6, pp. 179-183, 2014.
[7] Y. L. Ng, C. S. Lim, K. A. Danapalasingam, M. L. P. Tan and C. W. Tan, "Automatic Human Guided Shopping Trolley with Smart Shopping System," Jurnal Teknologi, vol. 73, no. 3, pp. 49-56, 2015.
[8] K. M. Hasan, Abdullah-Al-Nahid and A. A. Mamun, "Implementation of autonomous line follower robot," in International Conference on Informatics, Electronics & Vision, 2012.
[9] Y. Jia, X. Song and S. S. Xu, "Modeling and motion analysis of four-Mecanum wheel omni-directional mobile platform," in International Automatic Control Conference (CACS), 2013.
[10] J. Jose, R. R. Illiyas and R. S. John, "Line Follower Smart Cart using RFID Technology," International Research Journal of Engineering and Technology, vol. 7, no. 7, pp. 1906-1910, 2020.
[11] H. S. Bedi, N. Goyal, S. Kumar and A. Gupta, "Smart Trolley using Smart Phone and Arduino," Journal of Electrical & Electronic Systems, vol. 6, no. 2, 2017.
[12] F. Adascalitei and I. Doroftei, "Practical Applications for Mobile Robots Based on Mecanum Wheels - A Systematic Survey," The Romanian Review Precision Mechanics, Optics & Mechatronics, vol. 40, pp. 21-29, 2011.


An Efficient System to Collect Data for AI Training on Multi-Category Object Counting Task

Brian Haessel*, Munif Faisol Abdul Rahman*, Steven Andry*
Computer Science Department, School of Computer Science, Bina Nusantara University, Jakarta, Indonesia, 11480
[email protected], [email protected], [email protected]

Tjeng Wawan Cenggoro
Computer Science Department, School of Computer Science, and Bioinformatics and Data Science Research Center, Bina Nusantara University, Jakarta, Indonesia, 11480
[email protected]

Abstract— This study focuses on the problem of collecting data to train AI for a multi-category object counting task. The task is similar to a standard object counting task, in which we need to count the number of a particular type of object, for instance, how many people appear in an image. In multi-category object counting, we need to count more than one type of object, keeping track of the number of each type. Although in real object counting cases AI often needs to count objects of multiple categories, no dataset for this particular task is publicly available. Meanwhile, a robust AI for this task needs to be trained with a massive amount of data. Therefore, in this study, we developed a system to efficiently facilitate massive data collection for multi-category object counting. This aim was achieved by a careful design of the user experience in the system. The system has been shown to be useful via the Technology Acceptance Model (TAM).

Keywords— data collection, object counting, multi-category, annotation system

I. INTRODUCTION

In modern society, Artificial Intelligence (AI) has been beneficial for humans, whether in our daily life or at work. Current AI can be trained to achieve human-level performance on specific tasks [1]–[6], which tend to be laborious tasks for humans. This trait lets humans complete more creative tasks by delegating laborious tasks to AI. For example, in health care, AI can effectively assist physicians toward a faster diagnosis, thus providing them more time to check any unreported health issues the patients might have [7]. Beyond health care, AI is also beneficial in other sectors, such as agriculture [8].

The only caveat regarding the benefits of AI is the requirement of massive data to achieve an applicable level of performance [9]. The required data is at the scale of hundreds of millions of data points [10], which rules out collecting the data manually. To collect data at this massive scale, the common practice is to use a strategy based on crowdsourcing [9]. This strategy enables massive data collection to be completed in a reasonable timeframe [11], given that it is facilitated by a data collection system with a well-designed user experience.

In this study, we are interested in developing a data collection system to obtain training data for a multi-category object counting task. This particular task is interesting to study because the currently available datasets for counting tasks provide only a single category, for instance, people [12], [13] or vehicles [14]. In practice, however, a computer vision system needs to be able to count multi-category objects in an image. An example real case in agriculture is counting crops grouped by their age; this is necessary because farmers need to estimate the future yield within a certain time range. In summary, the contribution of this study is to develop a data collection system with a suitable user experience for collecting multi-category object counting data. The system is called Innotate for the rest of this paper.

II. LITERATURE REVIEW

Crowdsourcing for massive data collection is typically implemented with Amazon Mechanical Turk (MTurk), which allows researchers or developers to build a data collection system with minimal coding. Many renowned computer vision datasets were collected in this fashion, such as ImageNet [15], [16] and Microsoft COCO [17]. However, for reasons of confidentiality, it is sometimes more practical to develop the system from scratch. An example of a from-scratch data collection system for a unique task is Chimera [18], which was developed for a product categorization task at Walmart.

Another reason for developing a data collection system from scratch is that MTurk cannot be used in certain countries, for instance, Indonesia. Many studies that tried to collect massive data in Indonesia were initiated with the development of a data collection system for the task of interest. Example cases are the data collection for object counting [9], people counting [12], [19], and question answering in the context of hospitality [11].

III. SYSTEM DEVELOPMENT

The development of the system was performed in four steps: (1) requirement gathering, (2) creating the system design, (3) developing the system, and (4) evaluating the


system. These steps are illustrated in Figure 1. As the first step, we gathered requirements from an industry partner that intends to develop an application with AI capability for multi-category object counting.

Fig. 1. System development steps: requirement gathering, creating the system design, developing the system, and evaluating the system

To design the system in the second step, we used use case, activity, and class diagrams from the Unified Modeling Language (UML). In addition, we describe the database with an Entity Relationship Diagram (ERD). The use case diagram of the complete Innotate system is presented in Figure 2. It includes the basic Information System (IS) use cases for account management (i.e., Sign Up, Login, and Change Password), use cases for annotation management (i.e., Delete Annotation, Load Image, and Add Label), and the Annotate use case, which is the core of the Innotate system.

Fig. 2. Use case diagram

For brevity, only the Annotate use case is detailed with an activity diagram, shown in Figure 3. The novelty of this system lies in the Choose Label and Choose Annotation Tool activities. The main interest of this study, a data collection system for the multi-category object counting task, is addressed by the Choose Label activity: it allows the users to select a suitable category/label when annotating an object.

Fig. 3. Activity diagram

The class diagram in Figure 4 further details the Innotate system, specifically the structure of its classes and the relationships between them. Innotate has five classes that support the operation of the Annotate class, in which the core function of the system lies. The Home, User, and Profile classes support the interface functions, while the Settings and Result classes support the annotation functions. The Settings class administers the configuration of the annotation interface, and the Result class provides the interface for the users to evaluate their annotation quality.

Fig. 4. Class diagram


To store the annotation data, we utilize a relational database with the table structures and relationships depicted in Figure 5. The main table that stores the annotations is the coordinates table. The details of the information in this table are delegated to three other tables: canvas for the size of the annotated images, label for the description of the category, and current cursor for the details of the annotator/user.

Fig. 5. Entity relationship diagram

In the third step, we developed the system based on the provided design. Lastly, in the fourth step, we evaluated whether the system is useful for the potential users. The evaluation was done with the Technology Acceptance Model (TAM).

IV. RESULTS AND DISCUSSION

In this section, the implemented annotation interface is described in subsection IV.A. The quantitative assessment of the functionality of the Innotate system is given in subsection IV.B.

A. User Experience Qualitative Evaluation

We present the annotation interfaces for the rectangle and dot annotation tools in Figures 6 and 7, respectively. The interface of the Choose Label activity is placed in the top-left box below the menu bar. With this interface, the users can quickly switch between categories, which results in a faster annotation process. Each category is also displayed in a different color, which makes it easier for the users to monitor their annotations in the current image. If the user's mouse pointer hovers over a rectangle or dot annotation, a trash bin icon is displayed, which can be clicked to remove the hovered annotation. This provides an intuitive interface when the users want to remove a certain annotation.

Fig. 6. Annotation page with rectangle tool
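To make the stored annotations concrete, the sketch below mirrors the tables named in the ERD discussion (coordinates, canvas, label) in SQLite. The column names and sample data are our assumptions rather than Innotate's actual schema; the point is that the per-category count the dataset must provide reduces to a simple GROUP BY over the coordinates table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Minimal stand-ins for the tables named in the ERD discussion;
# column names are illustrative assumptions, not Innotate's real schema.
cur.executescript("""
CREATE TABLE canvas (id INTEGER PRIMARY KEY, image TEXT, width INT, height INT);
CREATE TABLE label  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE coordinates (
    id INTEGER PRIMARY KEY,
    canvas_id INTEGER REFERENCES canvas(id),
    label_id  INTEGER REFERENCES label(id),
    x REAL, y REAL                -- dot position (a rectangle would add w, h)
);
""")
cur.execute("INSERT INTO canvas VALUES (1, 'field.jpg', 640, 480)")
cur.executemany("INSERT INTO label VALUES (?, ?)",
                [(1, "young crop"), (2, "mature crop")])
cur.executemany(
    "INSERT INTO coordinates (canvas_id, label_id, x, y) VALUES (1, ?, ?, ?)",
    [(1, 10, 20), (1, 30, 40), (2, 50, 60)])
# The multi-category count is an aggregation over the stored annotations.
counts = cur.execute("""
    SELECT label.name, COUNT(*) FROM coordinates
    JOIN label ON label.id = coordinates.label_id
    GROUP BY label.name ORDER BY label.name
""").fetchall()
print(counts)  # → [('mature crop', 1), ('young crop', 2)]
```

In other words, once annotators place category-tagged dots or rectangles, the counting labels needed for training fall out of the database for free.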


Fig. 7. Annotation page with dot tool

B. User Experience Quantitative Evaluation with Technology Acceptance Model

To quantitatively assess the functionality of Innotate, we surveyed 30 potential users with a questionnaire that captures the factors considered in TAM: Perceived Ease of Use (PEU), Perceived Usefulness (PU), Attitude toward Using (AU), and Behavioral Intention to Use (BIU). Afterward, we ran the TAM hypothesis test, which is depicted in Figure 8, to test whether the users intend to use Innotate in the future. The beginning and the end of each arrow in Figure 8 denote the independent and the dependent variable, respectively.

From the survey, we found that the users generally have a high intention to use Innotate in the future, based on the BIU measured on a 1-to-5 scale. The average BIU rating of the users is 4, with the details presented in Table I.

Fig. 8. TAM hypothesis test

TABLE I. THE DETAIL OF THE USERS' BIU

Rating | % of Respondents
1      | 0.0%
2      | 0.0%
3      | 27.5%
4      | 43.3%
5      | 29.2%

The result of the hypothesis test is presented in Table II. We found that all hypotheses between TAM factors were accepted, which confirms the high intention to use Innotate in the future.

TABLE II. HYPOTHESIS TEST RESULT

Independent Variable | Dependent Variable | Odds Ratio | R2     | p-value
PU                   | AU                 | 0.5714     | 0.2869 | < 0.001*
PEU                  | AU                 | 0.3191     | 0.0983 | < 0.001*
PU                   | BIU                | 0.4375     | 0.0929 | < 0.001*
PEU                  | BIU                | 0.6240     | 0.2075 | < 0.001*
AU                   | BIU                | 0.6535     | 0.2359 | < 0.001*

* Significant p-value at α = 0.05 (hypothesis accepted)

V. CONCLUSION

We have presented a data collection system called Innotate, which enables efficient collection of training data for a multi-category object counting task. The system allows the users to efficiently annotate images for multi-category object counting with a switchable annotation tool and an interactive layout for selecting different categories. Ultimately, Innotate has been proven to be useful via the TAM hypothesis test. The result of this study will accelerate the adoption of AI for multi-category object counting by removing a bottleneck in data collection. It should be noted that this study only dealt with data collection, which is a single step in the whole AI development pipeline. Thus, future works should handle the other steps in a typical AI development pipeline, such as model training and evaluation.

REFERENCES

[1] K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 770–778, 2016.
[2] A. Esteva et al., "Dermatologist-level classification of skin cancer with deep neural networks," Nature, vol. 542, no. 7639, pp. 115–118, 2017.
[3] P. Rajpurkar, A. Y. Hannun, M. Haghpanahi, C. Bourn, and A. Y. Ng, "Cardiologist-Level Arrhythmia Detection with Convolutional Neural Networks," Jul. 2017.
[4] P. Rajpurkar et al., "Deep learning for chest radiograph diagnosis: A retrospective comparison of the CheXNeXt algorithm to practicing radiologists," PLoS Med., vol. 15, no. 11, p. e1002686, 2018.
[5] D. Silver et al., "A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play," Science, vol. 362, no. 6419, pp. 1140–1144, 2018.
[6] A. P. Badia et al., "Agent57: Outperforming the Atari human benchmark," in International Conference on Machine Learning, 2020, pp. 507–517.
[7] M. Nadimpalli, "Artificial intelligence risks and benefits," Int. J. Innov. Res. Sci. Eng. Technol., vol. 6, no. 6, 2017.
[8] V. Dharmaraj and C. Vijayanand, "Artificial Intelligence (AI) in Agriculture," Int. J. Curr. Microbiol. Appl. Sci., vol. 7, no. 12, pp. 2122–2128, 2018.
[9] T. W. Cenggoro, F. Tanzil, A. H. Aslamiah, E. K. Karuppiah, and B. Pardamean, "Crowdsourcing annotation system of object counting dataset for deep learning algorithm," IOP Conf. Ser. Earth Environ. Sci., vol. 195, no. 1, 2018.
[10] OECD, Artificial Intelligence in Society. Paris: OECD, 2019.
[11] R. Levannoza et al., "Enabling a massive data collection for hotel receptionist chatbot using a crowdsourcing information system," in Proceedings of 2020 International Conference on Information Management and Technology, ICIMTech 2020, 2020.
[12] B. Pardamean, H. H. Muljo, F. Abid, Herman, A. Susanto, and T. W. Cenggoro, "RHC: A Dataset for In-Room and Out-Room Human Counting," Procedia Comput. Sci., vol. 179, pp. 33–39, 2021.
[13] B. Pardamean, H. H. Muljo, T. W. Cenggoro, B. J. Chandra, and R. Rahutomo, "Using transfer learning for smart building management system," J. Big Data, vol. 6, no. 1, 2019.
[14] R. Guerrero-Gómez-Olmedo, B. Torre-Jiménez, R. López-Sastre, S. Maldonado-Bascón, and D. Oñoro-Rubio, "Extremely Overlapping Vehicle Counting," in Iberian Conference on Pattern Recognition and Image Analysis, 2015, vol. 2, pp. 423–431.
[15] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, "ImageNet: A large-scale hierarchical image database," in 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 248–255.
[16] O. Russakovsky et al., "ImageNet Large Scale Visual Recognition Challenge," Int. J. Comput. Vis., vol. 115, no. 3, pp. 211–252, 2015.
[17] T. Lin et al., "Microsoft COCO: Common Objects in Context," in European Conference on Computer Vision 2014, 2014, pp. 740–755.
[18] C. Sun, N. Rampalli, F. Yang, and A. Doan, "Chimera: Large-Scale Classification using Machine Learning, Rules, and Crowdsourcing," Proc. VLDB Endow., vol. 7, no. 13, pp. 1529–1540, Aug. 2014.
[19] B. Pardamean, T. W. Cenggoro, B. J. Chandra, and R. Rahutomo, "Data Annotation System for Intelligent Energy Conservator in Smart Building," IOP Conf. Ser. Earth Environ. Sci., 2020.

26 28 October 2021, Jakarta - Indonesia


2021 1st International Conference on Computer Science and Artificial Intelligence (ICCSAI)
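Table II reports an odds ratio and a p-value for each TAM path (e.g., PEU to BIU). As a rough illustration only, and not the study's actual analysis, such a per-path significance test can be sketched as a Wald test on a 2x2 summary in which two factors are dichotomized into high/low groups; the counts below are invented:

```python
import math

def odds_ratio_test(table):
    """Wald test for a 2x2 contingency table [[a, b], [c, d]].

    Returns (odds_ratio, two_sided_p) using the normal approximation
    for the log odds ratio.
    """
    (a, b), (c, d) = table
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    z = math.log(odds_ratio) / se
    p = math.erfc(abs(z) / math.sqrt(2))           # two-sided normal tail
    return odds_ratio, p

# Hypothetical counts: rows = high/low PEU, columns = high/low BIU.
or_peu_biu, p_peu_biu = odds_ratio_test([[30, 10], [10, 30]])
print(f"OR = {or_peu_biu:.2f}, p = {p_peu_biu:.2e}")  # significant at alpha = 0.05
```

A p-value below 0.05, as in every row of Table II, would lead to accepting the corresponding hypothesis.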

The Influence of UI UX Design to Number of Users Between 'Line' and 'Whatsapp'

Sartika Devina, Angelica Nadia, Vicky Chen, Alexander A S Gunawan
Computer Science Department, School of Computer Science
Bina Nusantara University
Jakarta, Indonesia 11480
[email protected], [email protected], [email protected], [email protected]

978-1-6654-4002-8/21/$31.00 ©2021 IEEE

Abstract — In this modern era, many applications have been used to communicate long distances, both for telephone and chat. UI/UX design in applications affects users, as most of the younger generation prefers an attractive, innovative, and many-choice design, while most of the older generation prefers a simple design appearance. In this research, we discuss the effect of UI/UX design on the number of uses between the LINE and Whatsapp applications. We conducted a survey with short questions using Google Forms, where 44 respondents participated in our research; 34 (77.3%) of the 44 respondents used Whatsapp more often. Of these 34 respondents, 23 (67.6%) chose a score of 5, stating that the WhatsApp application was easy to learn and easy to use; 20 (58.8%) were comfortable using the WhatsApp application, choosing a score of 5; 21 (61.8%) did not experience difficulties in using the features of the Whatsapp application, choosing a score of 1 (not confusing); and 14 (41.2%) liked the appearance and color selection of the Whatsapp application, with a score range of 4-5. The results of the survey show that a good UI/UX design has an effect that can increase the number of users of the application.

Keywords—User interface, User experience, Design, User, Text, Whatsapp, Line

I. INTRODUCTION

User Experience (UX) is a process carried out by the UX Designer by approaching the user, thereby creating a product design that suits the needs and desires of the user. User Interface (UI) is a process that aims to create the visual appearance of an app or website. A UI Designer focuses on the beauty of a website or app, such as the use of the right text, consistent color selection, well-placed icons, animations, buttons, and others [1].

According to a McKinsey report entitled "The Business Value of Design" at the end of 2018, companies with good design quality can have profits up to 200% higher than industry benchmarks [2]. Based on this report, it can be said that the UI/UX design of an app affects the number of users of the app. Therefore, app companies and startups compete to improve the quality of the products they produce, both in terms of appearance and in terms of functionality.

Especially in modern times, almost everyone uses smartphones, especially for communication. Therefore, instant messaging apps on mobile also continue to grow, as can be seen from the increasing number of instant messaging apps circulating in Indonesia, for example the Whatsapp and LINE applications [3]. These two apps compete in strategy, strength, and ingenuity to provide the best service to users. Whatsapp and Line certainly compete not only in making designs that are attractive to users but also in making features and providing user comfort when using an app, which is a consideration for users in choosing which app to use [4].

In this study we use quantitative methods to analyze the effect of UI/UX design on the number of users between the Line and Whatsapp apps. We conduct the analysis by collecting data from surveys, distributing questionnaires online.

II. LITERATURE REVIEW

In this modern era, almost everyone uses a mobile device in their daily life; this is also indicated by the growing number of mobile apps, for example instant messaging apps, which are increasingly on the market [4]. Text instant messaging is a type of communication medium that provides direct communication even in different places [10]. Instant text messages can contain a set of important information such as a person's tasks and schedules [11]. Instant messaging apps help users to communicate more easily and also build relationships with other people [4]. Mobile device users also have several considerations when choosing which app to use; the UI/UX design of an app will later determine whether the user will be loyal to the app or choose to use another app [5].

UI/UX design has a meaning as a visual display that includes physical and communicative aspects [7]. The ISO 9241-210 standard (ISO 9241-210, 2010) defines UX as 'a person's perception and response resulting from the anticipated use and/or use of a product, system or service'. It considers that UX 'includes all users' emotions, beliefs, preferences, perceptions, physical and psychological responses, behaviours and achievements that occur before, during, and after use' [12]. In addition, it shows that aesthetics can play an important role in product and system design [13]. User Interfaces (UI), 'physical representations and procedures provided for viewing and interacting with system functionality' [6, p. 80], are important components

of any app, as the UI directly affects user-app interaction, which then determines a concrete user experience with the app [14]. Having a bad interface is frustrating for users and will affect productivity, so that competitors may have better systems [15]. Evaluation is one way to detect usability issues of an app, so that it can produce the user experience expected by the user [1]. In addition to evaluating, designers must also keep up to date with the latest trends and have an emotional connection, where a UX designer must think about what the user needs and wants in order to improve the user experience [8] [9]. Because there are many things that must be considered in making a user interface and user experience design, it is important to conduct an analysis first so that the design can be adjusted to user needs. Feedback from users will also be very useful because it can be a guideline for the developers and designers of an app in order to make the app even better [2]. However, mobile app developers also feel that it is difficult to determine what the user needs, and it is also difficult to get feedback that can help developers make an app even better. There are three key elements that can make an app grow: a consistent design, being user friendly or easy to use, and making the user comfortable when using the app [4].

Another thing that UI/UX designers need to remember is that apps are not only used by young people, but by all ages. In 2010, Vincent and Velkoff predicted that by 2050 the older population in the United States will be more than 88 million. Currently, the elderly constitute a large proportion of the world's population, and there is no denying that the elderly population is likely to increase over the next three decades [6]. However, developers sometimes focus only on making apps that follow current trends, but forget to make an app design that adapts to older users, who then often find it difficult to use an existing app, which eventually creates a gap between old and young users. There are still very few instant messaging apps that are made suitable for older users, so they are not sufficient to eliminate the gap that has been formed [3].

III. METHOD AND SURVEY

To find out the influence of UI/UX design on the number of uses of the Line and Whatsapp apps, a survey containing short questions was conducted, and the data were collected with a quantitative method through Google Form. The survey was then shared through a study group chat, and shared personally with other friends, for 5 days, from May 21 to 25, 2021.

On the first page there are 4 questions about personal info, before going to the next page, as follows:
1. Full name?
2. Age?
3. What chat apps do you have on your Handphone?
4. What chat app do you use often?

From the results of the survey, we got Indonesian respondents from various backgrounds, such as students, office workers, and others; there are 44 respondents:
• 40 respondents have an age range of 18 - 30 years
• 1 respondent is less than 18 years old
• 3 respondents are over 30 years old.

Figure 1 Answer for Question 2.

… LINE and Whatsapp apps, and no respondents only have the Line app, as shown below:

Figure 2 Answer for Question 3.

For frequently used apps, 10 respondents often use Line, and 34 respondents often use Whatsapp, as shown below:

Figure 3 Answer for Question 4.

After answering the last question on the first page, the respondent is taken to the next page according to the respondent's answer to the 4th question, "What chat app do you use often?". Respondents who frequently use the Whatsapp app instead of Line are taken to the Whatsapp page, and vice versa.

Here are the questions on the Whatsapp page:
1. Do you think the Whatsapp app is easy to learn and easy to use?
2. Have you ever felt confused when using certain features in the Whatsapp app?
3. Do you feel helped by the features available?


4. How does the design look on the Whatsapp app?
5. What do you think about the color selection on the Whatsapp app?

The following are the questions on the LINE page:
1. Do you think the LINE app is easy to learn and easy to use?
2. Have you ever felt confused when using certain features in the LINE app?
3. Do you feel helped by the features available?
4. How does the design look on the LINE app?
5. What do you think about the color selection on the LINE app?

A. Question 1 – Do you think the LINE or Whatsapp app is easy to learn and easy to use?

From respondents who use Whatsapp more often, the results are as follows:

Figure 4.1 Answer for Question 1

From respondents who use LINE more often, the following results were obtained:

Figure 4.2 Answer for Question 1

For the above question there is a score of 1 - 5: score 1 means that the LINE or Whatsapp app is difficult to learn and difficult to use, and score 5 means the LINE or Whatsapp app is very easy to understand and learn.

From the results obtained, 23 (67.6%) respondents answered 5, meaning the WhatsApp application is very easy to learn and use. Then as many as 11 (32.4%) respondents answered that the WhatsApp application was quite easy to learn and use.

It is different with the LINE application results: 9 (90%) respondents feel that the application they use more often is easier to learn and easier to understand. However, in the LINE user data there is 1 (10%) respondent who has difficulty using the LINE application because of the many buttons in the application, but who is forced to use LINE, because most of the college groups use the LINE application.

B. Question 2 – Have you ever felt confused when using certain features in the LINE or Whatsapp apps?

From respondents who use the Whatsapp app more often, the following results are obtained:

Figure 5.1 Answer for Question 2

From respondents who use the LINE app more often, the following results are obtained:

Figure 5.2 Answer for Question 2

For the above question there is a score of 1 - 5: score 1 means that respondents never feel confused when using certain features in the LINE or Whatsapp app, while score 5 means that respondents have felt confused when using certain features in the LINE or Whatsapp apps.

In the data shown in the picture above, it can be seen that 21 (61.8%) respondents (choosing score 1) and 10 (29.4%) respondents (choosing score 2) feel that the Whatsapp application's features are easy to understand. But there are 3 (8.8%) respondents who chose score 3 for the Whatsapp application, which means that these respondents experience confusion when using certain features. Furthermore, on the LINE application, there were 5 (50%) respondents (choosing score 1) and 3 (30%) respondents (choosing score 2) who felt that the LINE application's features were easy to understand. But there is 1 (10%) respondent who chose score 3 and 1 (10%) respondent who chose score 2, which means that these respondents have difficulty with a feature found in the LINE application.

C. Question 3 – Do you feel helped by the features available?

From respondents who use the Whatsapp app more often, the following results are obtained:

Figure 6.1 Answer for Question 3


From respondents who use the LINE app more often, the following results are obtained:

Figure 6.2 Answer for Question 3

For the above question there is a score of 1 – 5: score 1 means that respondents are not helped by the features contained in the LINE or Whatsapp app, while score 5 means that the respondent feels helped by the LINE or Whatsapp app.

From the data above, there are 17 (50%) respondents (choosing score 5) and 13 (38.2%) respondents (choosing score 4), meaning respondents feel the features that exist in the Whatsapp application can help their daily life. Meanwhile, there are 3 (8.8%) respondents (choosing score 3) and 1 (2.9%) respondent (choosing score 1), which means these respondents do not feel helped by the features in the Whatsapp application.

Furthermore, there are 8 (80%) respondents who chose score 5 and 2 (20%) respondents who chose score 4, which means that respondents feel helped by the features provided by LINE.

D. Question 4 – How does the design look on the LINE or Whatsapp app?

From respondents who use the Whatsapp app more often, the following results are obtained:

Figure 7.1 Answer for Question 4

From respondents who use the LINE app more often, the following results are obtained:

Figure 7.2 Answer for Question 4

For the above question there is a score of 1 - 5: score 1 means the design of the LINE or Whatsapp app is not interesting, while score 5 means the design of the LINE or Whatsapp app is very interesting.

From the data above, respondents' answers for the Whatsapp application are very diverse: most respondents, as many as 14 (41.2%), chose a score of 4, while 3 respondents (8.8%) gave a score of 2 for the look of the Whatsapp design; they think the look of the Whatsapp design is less diverse and does not have many themes. Meanwhile, 7 (70%) respondents gave a score of 5 and 3 (30%) respondents gave a score of 4, which shows that LINE has a more attractive design appearance.

E. Question 5 – What do you think about the color selection on the LINE or Whatsapp app?

From respondents who use the Whatsapp app more often, the following results are obtained about the Whatsapp design:

Figure 8.1 Answer for Question 5

From respondents who use the LINE app more often, the following results are obtained about the LINE design:

Figure 8.2 Answer for Question 5

For the question above there is a score of 1 - 5: score 1 means the color selection in the LINE or Whatsapp application is not suitable, while score 5 means that the color selection in the LINE or Whatsapp application is very suitable.

From the data above, 14 (41.2%) respondents chose a score of 5 and 14 (41.2%) respondents chose a score of 4, which means that respondents felt the color selection in the Whatsapp application was right, while 6 (17.6%) respondents chose a score of 3, which means that they felt the color selection in the Whatsapp application was between appropriate and unsuitable. Meanwhile, 7 (70%) respondents gave a score of 5 and 3 (30%) respondents gave a score of 4, which indicates that LINE has an appropriate color selection.

Based on the data generated from the questionnaire, we get an assessment from users of the UI/UX display of the LINE and Whatsapp apps, where an attractive design is preferred by young people, while a simple design is preferred by the older generation. In addition, both the younger and older generations prefer complete features to support their daily activities and easy-to-learn features. The use of consistent colors also certainly affects the user in using an app; color is one of the most important components in the creation of a UI/UX design, and the right color selection will make an app even more interesting.

IV. ANALYSIS AND DISCUSSION

Based on the results of the survey conducted, there were 44 respondents consisting of various age ranges. Most answers came from respondents aged 18-30 years, namely as much as 90.9%. From the survey we conducted, 77.3% of respondents had both the Whatsapp and Line apps on their cellphones and 22.7% only had the Whatsapp app. After conducting a further survey, we obtained data that 34 out of 44 respondents used Whatsapp more often. Then, we asked respondents about the ease of using the Whatsapp app: 23 (67.6%) of 34 respondents stated that Whatsapp is easier to use and has features that are easy to understand. Besides that, the respondents did not feel confused when using the features of the Whatsapp app and felt helped by the features provided. In terms of appearance, most of the respondents, as many as 14 (41.2%) out of 34 people, answered that the Whatsapp app was attractive and used the right color selection. We accept all answers from respondents about usability, design, and features of the Whatsapp app, where on average the dominant respondent answers are in the range of 4-5 points. These results show that respondents use and like Whatsapp more often because its UI/UX display is simple.

V. CONCLUSION

Chat apps make digital communication easier, thus making chat apps more numerous and varied. There are many factors that can influence the user in choosing which app to use, one of which is the UI/UX design of the app. For that reason, we wanted to know the effect of UI/UX design on the number of Whatsapp and LINE users by conducting online surveys using Google Form, getting 44 respondents. The 44 respondents are divided into 3 age ranges: 1 respondent aged <18 years, 40 respondents aged 18-30 years, and 3 respondents aged >30 years. Among these 44 respondents, 34 out of 44 have both the Whatsapp and LINE apps, and as many as 22.7% only have the Whatsapp app on their cellphones. In addition, we obtained data that 77.3% of respondents use the Whatsapp app more often than LINE (22.7%).

Of the 34 respondents who used the Whatsapp app, 23 (67.6%) chose a score of 5, stating that the Whatsapp app was easy to learn and easy to use; 20 (58.8%) were comfortable using the Whatsapp app, choosing a score of 5; 21 (61.8%) did not experience difficulties in using the features of the Whatsapp app, choosing a score of 1 (not confusing); and 14 (41.2%) liked the appearance and color selection of the Whatsapp app, with a score range of 4-5. This shows that the respondent chooses and uses the app in terms of interface, usability, and app design. From the results of this study, we can conclude that a good UI/UX design has an effect that can increase the number of users of the app.

REFERENCES
[1] S. Caro-Alvaro, E. Garcia-Lopez, A. Garcia-Cabot, L. de-Marcos, and J.-J. Martinez-Herraiz, "Identifying Usability Issues in Instant Messaging Apps on iOS and Android Platforms," Mobile Information Systems, vol. 2018, pp. 1-19, 2018. doi: 10.1155/2018/2056290.
[2] M. Jain, P. Kumar, R. Kota, and S. N. Patel, "Evaluating and Informing the Design of Chatbots," in Proceedings of the 2018 Designing Interactive Systems Conference (DIS '18), New York, NY, USA: Association for Computing Machinery, 2018, pp. 895-906. doi: 10.1145/3196709.3196735.
[3] R. Anam and A. Abid, "Usability Study of Smart Phone Messaging for Elderly and Low-literate Users," International Journal of Advanced Computer Science and Applications (IJACSA), vol. 11, no. 3, pp. 108-115, 2020. doi: 10.14569/IJACSA.2020.0110313.
[4] S. Caro-Alvaro, E. Garcia-Lopez, A. Garcia-Cabot, L. de-Marcos, and J.-J. Martinez-Herraiz, "A Systematic Evaluation of Mobile Apps for Instant Messaging on iOS Devices," Mobile Information Systems, vol. 2017, pp. 1-17, 2017. doi: 10.1155/2017/1294193.
[5] M. Naeem and A. Rafiq, "Usability Analysis of Instant Messaging Platforms in a Mobile Phone Environment using Heuristics Evaluation," International Journal of Scientific & Engineering Research, vol. 7, no. 10, pp. 135-138, 2019.
[6] W. K. Bong and W. Chen, "Mobile Instant Messaging for the Elderly," Procedia Computer Science, vol. 67, pp. 28-37, 2015. doi: 10.1016/j.procs.2015.09.246.
[7] D. K. Su and V. S. Yee, "A Comparative Evaluation of User Preferences for Mobile Chat Usable Interface," in Asia-Pacific Conference on Computer Human Interaction, Seoul: Springer Science & Business Media, 2008, pp. 258-259. doi: 10.1007/978-3-540-70585-7_29.
[8] X. Hu, "Designing Emotional Connections In Instant Messaging Tools," Master's thesis, University of Cincinnati, 2018. https://etd.ohiolink.edu/apexprod/rws_etd/send_file/send?accession=ucin1522342539008247&disposition=inline
[9] A. Holzer, A. Vozniuk, S. Govaerts, H. Bloch, A. Benedetto, and D. Gillet, "Uncomfortable yet Fun Messaging with Chachachat," in CHI PLAY '15: The Annual Symposium on Computer-Human Interaction in Play, London: Association for Computing Machinery, 2015, pp. 547-552. doi: 10.1145/2793107.2810296.
[10] J. J. Youn, Y. H. Seo, and M. S. Oh, "A Study on UI Design of Social Networking Service," Journal of Information and Communication Convergence Engineering, pp. 104-111, 2017. doi: 10.6109/jicce.2017.15.2.104.
[11] F. Chen, K. Xia, K. Dhabalia, and J. I. Hong, "MessageOnTap: A Suggestive Interface to Facilitate Messaging-related Tasks," in CHI '19: CHI Conference on Human Factors in Computing Systems, Glasgow, Scotland: Association for Computing Machinery, 2019, pp. 1-14. doi: 10.1145/3290605.3300805.
[12] M. A. Pratama and A. T. Cahyadi, "Effect of User Interface and User Experience on App Sales," IOP Conference Series: Materials Science and Engineering, vol. 879, p. 012133, 2020. doi: 10.1088/1757-899x/879/1/012133.
[13] A. Sonderegger and J. Sauer, "The influence of design aesthetics in usability testing: Effects on user performance and perceived usability," Applied Ergonomics, vol. 41, no. 3, pp. 403-410, 2010. doi: 10.1016/j.apergo.2009.09.002.
[14] W. Choi and B. Tulu, "Effective Use of User Interface and User Experience in an mHealth App," in Proceedings of the 50th Hawaii International Conference on System Sciences, 2017. doi: 10.24251/hicss.2017.460.
[15] D. N. Heny, "User Interface and User Experience Analysis on the Website of Adisutjipto High School of Technology Yogyakarta," in Conference SENATIK STT Adisutjipto Yogyakarta, vol. 2, p. 183, 2016. doi: 10.28989/senatik.v2i0.77.
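Every percentage reported in this paper is a simple share-of-respondents tally over Likert scores. As a minimal sketch, seeded with the Question 1 WhatsApp counts the paper reports (23 respondents chose score 5, 11 chose score 4):

```python
from collections import Counter

# Reported Question 1 (WhatsApp, "easy to learn") responses:
# 23 respondents chose score 5 and 11 chose score 4.
responses = [5] * 23 + [4] * 11

counts = Counter(responses)
total = len(responses)

# Percentage of respondents per Likert score, rounded to one decimal.
shares = {score: round(100 * counts.get(score, 0) / total, 1) for score in range(1, 6)}
print(shares)  # score 5 works out to 23/34 = 67.6%, matching the text
```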


A Comparison of Artificial Intelligence-Based Methods in Traffic Prediction

Priscilla Diamanta, Gian Avila, M. Ilham Hudaya, Edy Irwansyah
School of Computer Science
Bina Nusantara University
Jakarta, Indonesia
[email protected]

Abstract— Traffic plays an important role in our society as its state can affect individuals and industries in various ways. Traffic congestion can bring negative impacts to the society and can lead to bigger problems if left without a solution to mitigate it. Thus, traffic prediction serves as a solution to said problem. In this systematic literature review, AI-based traffic prediction methods are compared in order to find which ones serve as the better solutions for predicting traffic, using the PRISMA flowchart methodology, which helps authors systematically analyze relevant publications and improve the quality of reports and meta-analyses. By conducting further analysis on the screened references, it is found that the methods that integrate a Convolutional Neural Network or a Recurrent Neural Network with Long Short-Term Memory, along with an error-recurrent Neural Network, prove to be good candidates for optimal traffic prediction.

Keywords— Traffic Prediction, Traffic Forecasting, Machine Learning, Deep Learning

I. INTRODUCTION

Traffic prediction in this context refers to the ability to forecast a traffic condition in specific areas based on real-time data. It provides us with information that allows us to monitor, plan, and come up with a decision with which we can try to mitigate traffic congestion. It plays an important role in our society, as transportation itself is an important part of the development of a country [3]. Fortunately, as technology grows more and more advanced, it provides us with more advanced solutions regarding traffic prediction. Artificial Intelligence, or what people would simply address as AI, is a branch of computer science which has found its way into many fields in real life. The transportation field is no exception to this; in fact, AI provides more solutions to it, including for traffic-related problems.

Traffic congestion has been an ever-ongoing problem in many cities, especially in metropolitan areas. With the increase of vehicle possession, traffic conditions certainly do not seem to be getting any better, if not worse. Traffic congestion has been a major problem that affects many industries and individuals, having put them under many disadvantages such as time loss, mental stress, and pollution [3]. These can lead to bigger problems regarding the flow of economics and global warming.

Ultimately, the issue regarding traffic congestion is not one to be left ignored. There needs to be a solution in which traffic conditions can be predicted accurately—and in which those predictions can be received by individuals and organizations alike—in order to mitigate traffic congestion. The subject of traffic prediction itself has been an ever-active research topic since the late 1970s [19]. However, traffic prediction can be difficult to handle as traffic data are constantly changing and can be affected by other factors around the area. This is where Artificial Intelligence comes into the frame. The implementation of AI in traffic prediction allows us to get accurate predictions of real-time traffic data to mitigate traffic congestion. Many researchers have been conducting experiments on traffic prediction using Artificial Intelligence-based techniques, proposing various methods and comparing state-of-the-art methods in order to find the optimal solution for traffic prediction, as can be seen for example in [6] and [19]. Most of the research on this topic revolves around the implementation of machine learning and deep learning-based methods, although deep learning-based methods are preferred by many researchers since they have a better capability for handling complex data.

The aim of this paper is to review and compare state-of-the-art Artificial Intelligence-based methods proposed by many researchers with regard to traffic condition prediction and to see which method(s) perform the best. The paper starts with an introduction to traffic prediction and Artificial Intelligence (AI) along with its implementation in traffic prediction, which are further explained in the theoretical background section. The methodology that is used to construct this paper is discussed next. The paper then discusses the results obtained from reviewing the screened, relevant references. Finally, the last part provides the final conclusions of this paper.

II. THEORETICAL BACKGROUND

A. Artificial Intelligence

In this modern era, the application of Artificial Intelligence (AI) can be found in various industries and businesses. AI has been incorporated in many fields due to its ability to help in solving critical issues found in modern-day problems. The implementation can be seen, for example, in the fields of education, economics, business, science, and transportation.

AI itself is a term that refers to technology that focuses on performing various tasks as a human would. AI makes machines simulate the capability of human intelligence, including learning and problem-solving. AI falls under the


category of data science and is an umbrella term that includes both machine learning and deep learning. AI was first introduced back in 1956 by John McCarthy, who defined its goal as creating intelligent machines or computer programs. Since then, many researchers have conducted various research to explore AI further, establishing many of the AI-based techniques we know today along the way. As of now, AI consists of many different subfields, which can be used according to the type of task at hand. For example, symbolic AI can solve logical problems that are clearly defined but is unsuitable for certain higher-level tasks, while machine learning is good at handling dynamic environments but requires large datasets, and so on.

As the main discussion of this review involves the implementation of AI-based techniques in transportation, specifically traffic prediction, the authors have found that various techniques from the machine learning and deep learning subfields are commonly applied to this task.

B. Machine Learning and Deep Learning

Both Machine Learning and Deep Learning are subfields of AI. Machine Learning focuses on learning from data and developing algorithms based on whatever data are available; the resulting algorithm can then be used on future datasets. The aim is to enable computers to learn and take actions by themselves without having to be taught everything explicitly. Four commonly used learning paradigms in Machine Learning are supervised, unsupervised, semi-supervised, and reinforcement learning. Supervised learning uses known, labeled datasets, and its tasks include classification and regression algorithms (e.g., Random Forests, Support Vector Machines), while unsupervised learning uses unknown, unlabeled datasets, and its tasks include clustering algorithms (e.g., K-means). As the name suggests, semi-supervised learning sits between supervised and unsupervised learning and is used when the dataset contains both labeled and unlabeled data. Finally, in reinforcement learning, the algorithm learns from trial and error and is trained to maximize an overall outcome.

Deep Learning itself falls under the scope of Machine Learning, but Deep Learning models consist of many layers and have a more complex architecture than classical Machine Learning models, making them suitable for handling more complex problems. This makes them ideal for handling spatio-temporal data in forecasting tasks, including traffic forecasting. Deep Learning models are also faster at producing results than Machine Learning approaches, as they are able to learn important features by themselves rather than having a data scientist select them manually. There is a challenge, though: to create an accurate model, Deep Learning depends on the quality of the training data and the complexity of the task. Even then, Deep Learning methods can achieve better performance than Machine Learning.

III. METHODOLOGY

This paper is a systematic literature review which discusses the use of different artificial intelligence methods in traffic prediction. As the methodology and report of a systematic review must be of good quality, this systematic literature review is constructed using the PRISMA Flow Diagram, which helps the authors improve the quality of reports and meta-analyses. The selection process is as follows:

1. Identification, which involves searching for potentially usable papers in online databases such as Science Direct, Semantic Scholar, ResearchGate, MDPI, and online university libraries. The keywords used in searching for relevant papers were "Traffic Prediction", "Traffic Prediction AI", "Artificial Intelligence", and "Traffic and Route Prediction".
2. Screening, in which the authors sorted out duplicates and removed papers found irrelevant to the topic.
3. Eligibility, in which irrelevant papers were sorted out after further reading of the abstract and introduction.
4. Included, in which a full-paper review is done on the eligible papers to determine which are essential.

The inclusion criteria of the papers used in this review are papers whose topics are related to the use of artificial intelligence methods in traffic prediction, written in the English language, and published within the last 5 years (with some exceptions). The exclusion criteria are papers that are not full articles, are written in a non-English language, are duplicates, or do not quite meet the topic of our review.

Fig. 1. PRISMA Flow Diagram

Data extraction is performed after reading the full-text papers which have undergone the selection process shown above. Table I shows the extracted data along with a brief description of each.

IV. RESULT

From the 37 publications screened, the authors have obtained a total of 30 relevant papers (without duplicates) that meet the inclusion criteria for this paper. These eligible papers have undergone a data extraction conducted by the authors to gain the information needed in order to


answer some research questions that will be discussed in this part.

TABLE I. DATA EXTRACTION

Extracted Data   Description
ID               Unique number assigned to each referred paper.
Reference        Includes the paper's title, authors, and year of publication.
Methodology      The methods used by the referred papers, specifically the artificial intelligence methods/algorithms.
Context          An evaluation of the artificial intelligence methods used and the factors considered in choosing the best performing algorithm.

A. What is the most common AI-based approach used for traffic prediction?

Based on the information acquired from the reviewed papers, there are various methods used for traffic prediction. The commonly used ones are Deep Learning, Machine Learning, Markov Models, and Fuzzy Logic. The following table lists these methods and the number of papers in which each was discussed.

TABLE II. PUBLICATIONS ON THE COMMONLY USED AI-BASED METHODS IN TRAFFIC PREDICTION

Approach           Number of Papers   Reference
Deep Learning      16                 [2], [8], [10], [12], [13], [15], [17], [19], [20], [21], [22], [25], [26], [29], [31], [32]
Machine Learning   8                  [3], [4], [5], [6], [8], [11], [18], [29]
Markov Model       5                  [8], [17], [23], [30]
Fuzzy Logic        4                  [1], [3], [12], [18]

Based on the extracted information shown above, Deep Learning is the most common method used for predicting traffic conditions, with a total of 16 papers using it. Deep Learning performs better than traditional machine learning models due to its more complex architecture, making it the preferred approach for traffic prediction, which requires working with a lot of complex spatio-temporal data. Some research proposed integrating deep learning techniques to optimize performance even further, as seen, for example, in the research conducted by He et al. [10].

B. What are good Deep Learning techniques that can be used for traffic prediction?

A number of techniques are mentioned in the papers, the most commonly mentioned being long short-term memory (LSTM), convolutional neural networks (CNN), and recurrent neural networks (RNN), or a combination of these methods in order to optimize performance in predicting traffic conditions.

TABLE III. PUBLICATIONS ON THE MOST COMMONLY USED DEEP LEARNING BASED TECHNIQUES IN TRAFFIC PREDICTION

Methods                              Number of Papers   Reference
LSTM                                 6                  [10], [13], [15], [19], [25], [28]
Convolutional Neural Network (CNN)   5                  [9], [13], [21], [22], [28]
Recurrent Neural Network (RNN)       3                  [9], [10], [22]

Of the various techniques mentioned, the one mentioned most is the long short-term memory (LSTM) technique. Although mentioned the most, LSTM by itself is not necessarily the best technique for traffic prediction, which is why it is usually integrated with other techniques. He et al. [10] and Yao et al. [28] both proposed methods that integrate LSTM with other algorithms to improve performance. He et al. [10] proposed STANN, a Spatio-Temporal Attentive Neural Network for traffic prediction, which combined a Recurrent Neural Network (RNN) with LSTM. While an RNN is suitable for time-series data, it is limited in learning long-term dependencies; this problem is alleviated by the gated mechanism provided by using an RNN with LSTM units. This enabled them to capture spatio-temporal dependencies from historical traffic time series. The method was proposed to address the problems found in state-of-the-art traffic prediction models: dynamic spatio-temporal dependencies among network-wide links, and long-term traffic prediction. The experimental results showed that STANN outperforms other methods, especially for long-term traffic prediction.

On the other hand, Yao et al. [28] proposed the Spatial-Temporal Dynamic Network (STDN), which used a Convolutional Neural Network (CNN) and LSTM to handle spatial and temporal information. They used a flow-gated local CNN to handle spatial dependency based on traffic flow information, and LSTM to handle sequential dependency. This method captured long-term periodic information as well as temporal shifting in traffic sequences, and it was also shown to outperform state-of-the-art methods.

Another candidate for the best deep learning technique for predicting traffic conditions is the error-recurrent CNN (eRCNN), researched by Mena-Oreja et al. [13] and Wang et al. [20]. To handle rapidly changing traffic speeds, which make traffic conditions hard to predict, Wang et al. [20] introduced feedback neurons into the recurrent layer so that the eRCNN can learn from the prediction errors caused by abrupt traffic events. In their experiment over two roads in Beijing, they found that while a normal CNN is comparable with the eRCNN, it still has a greater prediction error. The eRCNN also captured abrupt traffic speed changes successfully, proving that it can provide a solution to this challenge.
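The gated mechanism that lets LSTM units retain long-term information can be made concrete with a small, dependency-free sketch of a single LSTM cell. This is a toy scalar version with hypothetical weights and a hypothetical speed sequence, for illustration only; it is not the STANN or STDN implementation:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    """One step of a scalar LSTM cell; `w` maps gate names to weights/biases."""
    i = sigmoid(w["wi"] * x + w["ui"] * h_prev + w["bi"])    # input gate
    f = sigmoid(w["wf"] * x + w["uf"] * h_prev + w["bf"])    # forget gate
    o = sigmoid(w["wo"] * x + w["uo"] * h_prev + w["bo"])    # output gate
    g = math.tanh(w["wg"] * x + w["ug"] * h_prev + w["bg"])  # candidate value
    c = f * c_prev + i * g   # cell state: the long-term memory channel
    h = o * math.tanh(c)     # hidden state: the short-term output
    return h, c

# Hypothetical weights and a short sequence of normalized traffic speeds.
weights = {k: 0.5 for k in ("wi", "ui", "bi", "wf", "uf", "bf",
                            "wo", "uo", "bo", "wg", "ug", "bg")}
h, c = 0.0, 0.0
for speed in (0.8, 0.6, 0.3, 0.2):  # congestion gradually building up
    h, c = lstm_step(speed, h, c, weights)
print(round(h, 4))  # final hidden state; a real predictor would feed this to a regression layer
```

Because the forget gate multiplies the previous cell state instead of overwriting it, information from early time steps can survive many updates; this is exactly the long-term dependency limitation of a plain RNN that the reviewed papers address by pairing RNNs with LSTM units.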


Mena-Oreja et al. [13] also compared the eRCNN with CNN+LSTM, the technique used in STDN. In their experiment training neural networks to predict traffic, error-recurrent models, including the eRCNN, achieved the best prediction accuracy compared to other methods, including CNN+LSTM. In predicting traffic speed and flow, the eRCNN was proven to outperform CNN+LSTM, having considerably lower error rates. The same held for predicting traffic congestion: while CNN managed to outperform LSTM in traffic congestion prediction, it was still outperformed by the error-recurrent models, specifically eRCNN and eRCNN+LSTM. The overall results reported that the eRCNN, while having fewer convolutional layers and thus a smaller computational cost, outperformed CNN models in predicting traffic even under congestion conditions.

As such, the aforementioned methods proved to be good candidates for Deep Learning based techniques for predicting traffic conditions. The eRCNN, in particular, might be a better solution than methods that integrate CNN and LSTM. However, it is not possible to compare the aforementioned methods and conclude which one is best, as each experiment used different datasets and was done under different conditions. Further research involving these methods, using common datasets under common conditions, must be conducted to see which one outperforms the others.

V. CONCLUSION

In this paper, the authors conducted a systematic literature review on the use of different artificial intelligence methods in predicting traffic conditions. A total of 30 relevant papers that had undergone the screening process were used as references to construct this paper. The authors compared the various AI-based traffic prediction methods that were mentioned or proposed in those papers to find which method is the best.

From those papers, the authors discovered that the most commonly used methods for predicting traffic conditions are Deep Learning based methods, due to their ability to work with a lot of complex spatio-temporal data. Among the many Deep Learning methods, we found three techniques that are used the most for traffic prediction, namely long short-term memory (LSTM), convolutional neural networks (CNN), and recurrent neural networks (RNN), or a combination of those methods. The authors then further analyzed each technique and found that CNN or RNN combined with LSTM, as well as the eRCNN, proved to be good candidate methods for traffic prediction, and that the eRCNN in particular might be a better solution than CNN combined with LSTM. However, further experiments on those methods must be conducted with common datasets and conditions in order to reach a clear conclusion as to which method outperforms the others.

REFERENCES
[1] Abduljabbar, R., Dia, H., Liyanage, S., and Bagloee. Applications of Artificial Intelligence in Transport: An Overview. Sustainability, vol. 11, no. 1, pp. 189, 2019. doi: https://doi.org/10.3390/su11010189
[2] Aggarwal, A. K. Enhancement of GPS position accuracy using machine vision and deep learning techniques. Journal of Computer Science, vol. 16, no. 5, 2020. doi: 10.3844/jcssp.2020.651.659.
[3] Akhtar, M., and Moridpour, S. A Review of Traffic Congestion Prediction Using Artificial Intelligence. Journal of Advanced Transportation, 2021. doi: https://doi.org/10.1155/2021/8878011.
[4] Amirian, P., Basiri, A., and Morley, J. Predictive analytics for enhancing travel time estimation in navigation apps of Apple, Google, and Microsoft. In Proceedings of the 9th ACM SIGSPATIAL International Workshop on Computational Transportation Science, pp. 31-36, October 2016.
[5] Bhavsar, P., Safro, I., Bouaynaya, N., Polikar, R., and Dera, D. Machine learning in transportation data analytics. In Data Analytics for Intelligent Transportation Systems. Elsevier, pp. 283-307, 2017.
[6] Bratsas, C., Koupidis, K., Salanova, J. M., Giannakopoulos, K., Kaloudis, A., and Aifadopoulou, G. A comparison of machine learning methods for the prediction of traffic speed in urban places. Sustainability, vol. 12, no. 1, pp. 142, 2020.
[7] Choi, R. Y., Coyner, A. S., Kalpathy-Cramer, J., Chiang, M. F., and Campbell, J. P. Introduction to machine learning, neural networks, and deep learning. Translational Vision Science & Technology, vol. 9, no. 2, pp. 14, Jan 2020. doi: https://doi.org/10.1167/tvst.9.2.14
[8] Dai, Y., Ma, Y., Wang, Q., Murphey, Y. L., Qiu, S., Kristinsson, J., ... and Feldkamp, T. "Dynamic prediction of drivers' personal routes through machine learning," 2016 IEEE Symposium Series on Computational Intelligence (SSCI), 2016, pp. 1-8, doi: 10.1109/SSCI.2016.7850094.
[9] Haghighat, A. K., Ravichandra-Mouli, V., Chakraborty, P., Esfandiari, Y., Arabi, S., and Sharma, A. Applications of deep learning in intelligent transportation systems. Journal of Big Data Analytics in Transportation, vol. 2, no. 2, pp. 115-145, 2020. doi: https://doi.org/10.1007/s42421-020-00020-1
[10] Z. He, C. Chow and J. Zhang, "STANN: A Spatio-Temporal Attentive Neural Network for Traffic Prediction," in IEEE Access, vol. 7, pp. 4795-4806, 2019, doi: 10.1109/ACCESS.2018.2888561.
[11] Liebig, T., Piatkowski, N., Bockermann, C., and Morik, K. Dynamic route planning with real-time traffic predictions. Information Systems, vol. 64, pp. 258-265, March 2017. doi: https://doi.org/10.1016/j.is.2016.01.007
[12] Y. Lv, Y. Duan, W. Kang, Z. Li and F. Wang, "Traffic Flow Prediction With Big Data: A Deep Learning Approach," in IEEE Transactions on Intelligent Transportation Systems, vol. 16, no. 2, pp. 865-873, April 2015, doi: 10.1109/TITS.2014.2345663.
[13] J. Mena-Oreja and J. Gozalvez, "A Comprehensive Evaluation of Deep Learning-Based Techniques for Traffic Prediction," in IEEE Access, vol. 8, pp. 91188-91212, 2020, doi: 10.1109/ACCESS.2020.2994415.
[14] E. Osaba, P. Lopez-Garcia, E. Onieva, A. Masegosa, L. Serrano and H. Landaluce, Application of Artificial Intelligence Techniques to Traffic Prediction and Route Planning, the vision of TIMON project, Collection of open conferences in research transport, 2017. Available: https://www.scipedia.com/public/Osaba_et_al_2017a
[15] Parsa, A. B., Chauhan, R. S., Taghipour, H., Derrible, S., and Mohammadian, A. Applying deep learning to detect traffic accidents in real time using spatiotemporal sequential data. arXiv preprint arXiv:1912.06991, 2019.
[16] Petrillo, A., Travaglioni, M., De Felice, F., Cioffi, R., and Piscitelli, G. Artificial Intelligence and Machine Learning Applications in Smart Production: Progress, Trends, and Directions. Sustainability, vol. 12, pp. 492, 2019. doi: 10.3390/su12020492.
[17] Sun, S., Chen, J., and Sun, J. Traffic congestion prediction based on GPS trajectory data. International Journal of Distributed Sensor Networks, vol. 15, no. 5, 1550147719847440, 2019. doi: https://doi.org/10.1177/1550147719847440.
[18] F. Tseng, J. Hsueh, C. Tseng, Y. Yang, H. Chao and L. Chou, "Congestion Prediction With Big Data for Real-Time Highway Traffic," in IEEE Access, vol. 6, pp. 57311-57323, 2018, doi: 10.1109/ACCESS.2018.2873569.
[19] Vázquez, J. J., Arjona, J., Linares, M., and Casanovas-Garcia, J. A comparison of deep learning methods for urban traffic forecasting using floating car data. Transportation Research Procedia, vol. 47, pp. 195-202, 2020. doi: https://doi.org/10.1016/j.trpro.2020.03.079.
[20] J. Wang, Q. Gu, J. Wu, G. Liu and Z. Xiong, "Traffic Speed Prediction and Congestion Source Exploration: A Deep Learning Method," 2016 IEEE 16th International Conference on Data Mining (ICDM), 2016, pp. 499-508, doi: 10.1109/ICDM.2016.0061.
[21] P. Wang, C. Luo, F. Pan and Y. Zhu, "Analysis and Research of Artificial Intelligence Algorithms in GPS Data," in IEEE Access, 2020. doi: 10.1109/ACCESS.2020.3021426.


[22] S. Wang, J. Cao and P. Yu, "Deep Learning for Spatio-Temporal Data Mining: A Survey," in IEEE Transactions on Knowledge and Data Engineering, 2020. doi: 10.1109/TKDE.2020.3025580.
[23] Wang, X., Ma, Y., Di, J., Murphey, Y. L., Qiu, S., Kristinsson, J., ... and Feldkamp, T. Building efficient probability transition matrix using machine learning from big data for personalized route prediction. Procedia Computer Science, vol. 53, pp. 284-291, 2015. doi: https://doi.org/10.1016/j.procs.2015.07.305.
[24] Wen, W., and Hsu, S. W. A route navigation system with a new revised shortest path routing algorithm and its performance evaluation. WIT Transactions on The Built Environment, vol. 77, 2005.
[25] Wu, F., Fu, K., Wang, Y., Xiao, Z., and Fu, X. A Spatial-Temporal-Semantic Neural Network Algorithm for Location Prediction on Moving Objects. Algorithms, vol. 10, no. 2, pp. 37, 2017. doi: https://doi.org/10.3390/a10020037
[26] Xing, Y., Ban, X., Liu, X., and Shen, Q. Large-Scale Traffic Congestion Prediction Based on the Symmetric Extreme Learning Machine Cluster Fast Learning Method. Symmetry, vol. 11, no. 6, pp. 730, 2019. doi: https://doi.org/10.3390/sym11060730
[27] Yang, W., Ai, T., and Lu, W. A Method for Extracting Road Boundary Information from Crowdsourcing Vehicle GPS Trajectories. Sensors, vol. 18, no. 4, pp. 1261, 2018. doi: https://doi.org/10.3390/s18041261.
[28] H. Yao, X. Tang, H. Wei, G. Zheng, and Z. Li, "Revisiting Spatial-Temporal Similarity: A Deep Learning Framework for Traffic Prediction", AAAI, vol. 33, no. 01, pp. 5668-5675, Jul. 2019.
[29] X. Yin, G. Wu, J. Wei, Y. Shen, H. Qi and B. Yin, "Deep Learning on Traffic Prediction: Methods, Analysis and Future Directions," in IEEE Transactions on Intelligent Transportation Systems, 2021. doi: 10.1109/TITS.2021.3054840.
[30] Zaki, J. F., Ali-Eldin, A., Hussein, S. E., Saraya, S. F., and Areed, F. F. Traffic congestion prediction based on Hidden Markov Models and contrast measure. Ain Shams Engineering Journal, vol. 11, no. 3, pp. 535-551, 2020. doi: https://doi.org/10.1016/j.asej.2019.10.006
[31] Zhang, S., Yao, Y., Hu, J., Zhao, Y., Li, S., and Hu, J. Deep Autoencoder Neural Networks for Short-Term Traffic Congestion Prediction of Transportation Networks. Sensors, vol. 19, pp. 2229, 2019. doi: https://doi.org/10.3390/s19102229
[32] T. Zhou, "Deep Learning Models for Route Planning in Road Networks", Dissertation, 2018.


Impact of Computer Vision With Deep Learning Approach in Medical Imaging Diagnosis

Charleen
Computer Science
Binus University
Jakarta, Indonesia
[email protected]

Cheryl Angelica
Computer Science
Binus University
Jakarta, Indonesia
[email protected]

Hendrik Purnama
Computer Science
Binus University
Jakarta, Indonesia
[email protected]

Fredy Purnomo
Computer Science
Binus University
Jakarta, Indonesia
[email protected]

Abstract— Medical experts are usually the ones who analyze the interpretations of medical data. A medical expert's ability to interpret images is limited due to subjectivity and the complexity of the images. The purpose of this research is to find out whether the use of computer vision in medical imaging will harm the patient, along with the impact and challenges we will face while implementing computer vision in healthcare, especially medical imaging. This research will uncover how well deep learning algorithms compare to health-care professionals at classifying diseases based on medical imaging. The method used in this research is a systematic literature review about Computer Vision. A Deep Learning approach in Computer Vision for Medical Imaging is the key to aiding physicians in maximizing the accuracy of diagnoses; it is harmless and safe to use for assisting doctors in medical imaging diagnosis.

Keywords— Medical Imaging, Computer Vision, Deep Learning, Healthcare

I. INTRODUCTION

The creation of strong algorithms for automated analysis of digital medical pictures is gaining popularity. Medical imaging refers to the methods and techniques used to create pictures of the human body for therapeutic reasons, as well as operations aimed at discovering, diagnosing, or analysing illness, and to the study of normal anatomy and physiology. Fast imaging techniques like computer tomography (CT) and magnetic resonance imaging (MRI) allow for the capture of image series, slices, or actual volume data in clinical practice. From the first x-rays to the most recent MRI, medical imaging has progressed, and with that progress comes a significant increase in the use of computer vision methods in medical imaging [1].

Since MRI may offer structural and functional information about biological tissue without disturbing the organ system's physiology, it has become a powerful and vital tool for non-invasive examinations. Multiple spectral channels increase specific properties, and multi-echo collection allows for better differentiation of diverse tissue types and anatomically functioning units.

Although the human visual system is highly efficient at interpreting our natural environment, this is a very sophisticated human vision activity that requires a lot of practice. By analysing sets of slices, qualified physicists can visually inspect 2D and basic 3D phenomena. Visual analysis or pure interactive processes, on the other hand, might easily exceed practical time restrictions in complicated circumstances where both quantitative and qualitative outputs are required, necessitating automated techniques to support them. This problem looks to be well suited to computer vision, which provides ways of evaluating complicated 3D data and turning it into representations that are appropriate for our visual perception and cognitive processes [1].

The use of computer vision techniques in surgery and the treatment of various diseases has been shown to be extremely beneficial. Although basic image processing techniques can easily process digital images, effective computer vision can provide a plethora of data for diagnosis and treatment. It has been a challenge to use computer vision to process medical images because of the complexity of dealing with the images [2].

Deep learning technology, a new trend in data analysis in general that has been dubbed one of the year's top ten breakthrough technologies, is used in computer vision. Deep learning is a type of artificial neural network that has more layers and allows for higher degrees of abstraction and better data predictions. It is now the dominant machine-learning technology in the computer vision arena for assisting with medical imaging interpretation [3].

Following that, a Computer-Aided Diagnosis (CAD) system is used to improve the accuracy of the doctor's diagnosis. CAD uses a deep learning algorithm in computer vision to analyse imaging data and produce a diagnosis of the patient's disease, which can then be used to help physicians make decisions. However, is it really necessary? This research will further discuss the necessity of computer vision in medical imaging and its impact on improving healthcare [4].

II. METHODOLOGY

To conduct this systematic literature review, the authors use item screening and inclusion criteria with the PRISMA flow diagram illustrated in Fig. 1. The authors separately reviewed the records that were accessible. At the end of each round, they gathered to share and present the findings of their personal investigations. After resolving conflicts and discrepancies, the authors came to an agreement on a list of records that would be transferred to the next level.
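The round-based screening described above can be sketched as a small bookkeeping script. The records, keywords, and cut-off year below are hypothetical, and real screening of course rests on human judgment rather than keyword matching alone:

```python
# Hypothetical candidate records, mimicking a PRISMA-style funnel.
records = [
    {"title": "Deep learning for chest X-ray diagnosis", "year": 2019, "abstract": "A CNN-based ..."},
    {"title": "Deep Learning for Chest X-ray Diagnosis", "year": 2019, "abstract": "A CNN-based ..."},  # duplicate
    {"title": "A history of hospital administration", "year": 2018, "abstract": "An overview ..."},      # off topic
    {"title": "Computer vision for MRI segmentation", "year": 2015, "abstract": ""},                     # no abstract
]

def screen(records, keywords, since):
    # Step 1: drop duplicates, keeping the first occurrence of each title.
    seen, unique = set(), []
    for r in records:
        key = r["title"].lower()
        if key not in seen:
            seen.add(key)
            unique.append(r)
    # Step 2: apply exclusion criteria (no abstract, too old, off topic).
    return [r for r in unique
            if r["abstract"]
            and r["year"] >= since
            and any(k in r["title"].lower() for k in keywords)]

kept = screen(records, keywords=("deep learning", "computer vision"), since=2016)
print([r["title"] for r in kept])
```

Each filtering step discards records for a recorded reason, which is what allows the counts at every stage of the PRISMA diagram to be reported.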


Fig. 1. PRISMA flow diagram on literature review

The authors conducted a systematic search for relevant literature, reviewing the IEEE Xplore database and other resources such as Google Scholar, from January 2016 to May 2021. The keywords used in the search were computer vision, deep learning, medical imaging, and healthcare. As we screened the records, we focused on our study's aim: "Does the use of computer vision in medical imaging harm people? What are the impact and challenges of computer vision in healthcare, especially medical imaging?". The authors scoured the titles, keywords, and abstracts of the 168 papers obtained that related to our research topics.

The study of these records was guided by 4 exclusion criteria: (1) items with titles and phrases that did not relate to deep learning and computer vision in health care were deleted as "off topic"; (2) papers without an abstract were removed from further screening because they were not accessible; (3) papers without a medical imaging technique were removed; (4) records that did not discuss the rate of success or the challenges of using deep learning to detect diseases were also eliminated. There are 74 papers that became candidates after reading the abstracts, and 38 papers were judged incompatible with the study's objectives and were deleted from our database following extensive reading, based on the exclusion criteria, as shown in Table I. As a result, we have 36 records in our literature review.

TABLE I. SELECTED PAPER DETAILS

Source           Studies Found (no duplication)   Candidate Studies   Selected Studies
IEEE Xplore      114                              34                  9
Google Scholar   54                               40                  27
Total            168                              74                  36

III. RESULTS

In recent years we have experienced a rapid transition from the analog age to the digital and post-digital world. However, little is understood about the implications of Industry 4.0 for an organization's ability to raise quality standards and meet customers' expectations, particularly for sectors affected by the changing digital age, such as healthcare. The governance approach still must be designed and applied to fully realize Industry 4.0 [5].

A. Medical Imaging

Medical imaging is a series of procedures used to view a human's body for diagnostic purposes. Examples of medical imaging are CT scans, MRI (Magnetic Resonance Imaging), x-rays, ultrasounds, and endoscopy.

Medical image processing is a method for extracting information from pictures that would otherwise be inaccessible without the use of computers. The image processing techniques used focus on two specific tasks in the normal processing workflow: (1) segmentation, (2) feature extraction [6].

Segmentation is a required step in medical imaging that can be done manually by experts with high accuracy, but it is time consuming, and manual data interpretation and analysis has become a challenging task. Radiologists can misdiagnose due to inexperience or fatigue, resulting in faulty diagnoses such as false negative results, benign cancer mistaken for malignant cancer, and many more. According to the data, the percentage of human-caused misinterpretation in medical picture processing can reach 10-30% [7]. On the other hand, highly accurate and automatic techniques have yet to be invented [8].
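The two-step workflow named above, segmentation followed by feature extraction, can be illustrated with a dependency-free sketch: threshold segmentation, then 4-connected region labeling, then a simple area feature per region. The 5x5 "scan" below is synthetic, and real pipelines use far more sophisticated methods:

```python
from collections import deque

def segment(image, threshold):
    """Threshold segmentation: mark pixels brighter than `threshold` as foreground."""
    return [[1 if px > threshold else 0 for px in row] for row in image]

def label_regions(mask):
    """4-connected component labeling via breadth-first flood fill."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    current = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not labels[y][x]:
                current += 1
                queue = deque([(y, x)])
                labels[y][x] = current
                while queue:
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = current
                            queue.append((ny, nx))
    return labels, current

def region_areas(labels, n):
    """Feature extraction: area (pixel count) of each labeled region."""
    areas = [0] * (n + 1)
    for row in labels:
        for v in row:
            areas[v] += 1
    return areas[1:]

# Synthetic 5x5 "scan" with two bright blobs.
scan = [
    [0, 9, 9, 0, 0],
    [0, 9, 9, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 8, 8],
    [0, 0, 0, 8, 0],
]
mask = segment(scan, 5)
labels, n = label_regions(mask)
print(n, region_areas(labels, n))  # prints: 2 [4, 3]
```

In a CAD pipeline, per-region features such as area, shape, intensity, or texture would then feed a classifier; deep learning replaces this hand-crafted feature step by learning features directly from the images.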


From a conceptual and empirical point of view, the use of imaging should enhance an organization's ability to provide high-quality and efficient care [9]. More advanced imaging, particularly newer technologies such as CT scanning, gives higher image quality and detail in a shorter amount of time. The outcomes will be more precise and timely, allowing treatment to be started sooner with less unnecessary treatment. Medical imaging has changed the way doctors assess, diagnose, and monitor disease [10].

B. Computer Vision

Computer Vision (CV) is one of the most rapidly growing fields of artificial intelligence (AI) research, with the main objective of using computers to mimic human learning and vision, as well as having the capacity to draw inferences and execute actions based on visual information [11]. This technology is based on identification with computer senses, using algorithms so that computers can identify what they see [12]. The key to being able to understand or analyse images lies between CV and image processing [8].

The application of computer vision in the health sector has become widespread. This technology can identify minor ailments or even identify our bodies every day [13,14,15]. Examples include detecting the mobilization of adult patients in the ICU [16], MRI [17], mammography and digital breast tomosynthesis [4], detecting the presence or absence of cancer in lung and colon tissue [18], and even diagnosing disease from a facial image alone [19]. Computer vision techniques are used to analyse blood parameters by capturing and analysing images of the patient's conjunctiva; the methodology employed is the Canny edge detection approach with morphological procedures in the CIELAB colour space [20]. Another computer vision approach is applied in an attempt to find vaginal epithelial secretions; to aid in the diagnosis, that paper employs dual-process and nucleus-cytoplasm cross-verification procedures [21]. On the other hand, image processing techniques with computer vision that are used to analyse oesophageal image data include Computed Tomography (CT), endoscopy, and Positron Emission Tomography (PET) [22].

C. Deep Learning

Typically, a multi-level network design is utilized to process picture characteristics. One of the most representative deep learning techniques is the convolutional neural network (CNN), which is used in image classification. Deep learning is also used by the Region CNN (RCNN) to perform object classification and localisation in images.

The CNN and RCNN algorithms are used for distinct picture categorization and object detection systems. Image classification is a technique that uses a CNN to separate images based on their semantics. Object detection is a method that combines image classification with object localisation to determine the type of a specific object in an image [8].

The RCNN algorithm for detecting objects is more concrete and complex than the CNN. Both algorithms comply with particular application-level criteria in the realm of medical technology [11].

Diagnosing with a Computer-Aided Diagnosis (CAD) system has the potential to lower typical reading time by more than half [4]. The architecture defined in the healthcare domain is based largely on convolutional neural networks (CNN). The study states that a CNN is faster at processing the diagnosis of lung cancer than a DNN (Deep Neural Network) or an SAE (Stacked Auto Encoder); in this case, 3D images help the CNN process the diagnosis more quickly [23]. The list of methods used grows day by day to increase the precision with which diseases are diagnosed. The CAD system in medical analysis can be divided into 4 stages: (1) image pre-processing, (2) image segmentation, (3) feature extraction and selection, (4) classification [7]. Image segmentation is one of the most important steps in CAD (computer-aided diagnosis) and one of the most significant and fundamental algorithms in image processing. In addition, of course, 2D versus 3D images will affect the speed of analysis [3].

There are many tested examples of using deep learning algorithms in identifying diseases (Table II). Accuracy in diagnosing the diseases that have been tested has exceeded 80%. However, the results still depend on how we choose the deep learning model. The deep learning algorithm is still evolving
One of the most reliant approaches on CV is deep
since many new diseases come with time.
learning. It is also a technique based on neural networks.
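The four-stage CAD pipeline from [7], namely (1) image pre-processing, (2) image segmentation, (3) feature extraction and selection, and (4) classification, can be sketched as a chain of functions. The sketch below is only an illustration in plain Python: every function name, threshold, and the toy classification rule is an invented placeholder, not part of any cited system.

```python
# Illustrative sketch of the four CAD stages from [7]; all numbers and
# rules here are invented placeholders, not a real diagnostic model.

def preprocess(image):
    """Stage 1 (image pre-processing): normalise intensities to [0, 1]."""
    peak = max(max(row) for row in image) or 1
    return [[px / peak for px in row] for row in image]

def segment(image, threshold=0.5):
    """Stage 2 (image segmentation): crude binary threshold mask."""
    return [[1 if px >= threshold else 0 for px in row] for row in image]

def extract_features(mask):
    """Stage 3 (feature extraction and selection): keep only lesion area."""
    return {"area": sum(sum(row) for row in mask)}

def classify(features, area_cutoff=4):
    """Stage 4 (classification): toy rule on the extracted feature."""
    return "suspicious" if features["area"] >= area_cutoff else "normal"

def cad_pipeline(image):
    """Run the four stages in order on one 2-D greyscale image."""
    return classify(extract_features(segment(preprocess(image))))

if __name__ == "__main__":
    scan = [[10, 200, 210],
            [12, 220, 230],
            [11, 15, 240]]
    print(cad_pipeline(scan))  # suspicious
```

Real CAD systems replace each stage with learned components, such as the CNN-based segmenters and classifiers surveyed here, but the staged structure is the same.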
TABLE II. ACCURACY OF DEEP LEARNING ALGORITHMS IN DIAGNOSIS

| Authors | Year | Imaging Modality | Diagnosis | Deep Learning Model | Accuracy (in percent) |
| Akiyama, Y., Mikami, T., & Mikuni, N. [24] | 2020 | MRI | Moyamoya | VGG16 | 92.8% |
| Al-Bander, et al. [25] | 2017 | Fundus Imaging | Glaucoma | CNN | 88.2% |
| C.-F. Liu, et al. [26] | 2019 | MRI | Alzheimer | Siamese Neural Network | 92% |
| Mesrabadi, H. A., & Faez, K. [27] | 2018 | MRI | Prostate Cancer | AlexNet | 86.3% |
| Nobrega, et al. [28] | 2018 | CT | Lung Cancer | CNN-ResNet50 | 88.41% |
| Racic, et al. [29] | 2021 | X-ray | Pneumonia | CNN | 90% |
| Srivastava, et al. [30] | 2019 | Endoscopy | Environmental Enteropathy | CNN | 96.7% |
| Tummala, S. [31] | 2021 | MRI | Autism | Siamese Neural Network | 99% |
| Wang, et al. [32] | 2021 | CT-Scan | COVID-19 | Modified Inception | 85.2% |
| Wang, J., et al. [33] | 2017 | MMG | Coronary Artery Diseases | CNN | 96.24% |

39 28 October 2021, Jakarta - Indonesia

2021 1st International Conference on Computer Science and Artificial Intelligence (ICCSAI)
D. Challenges for Computer Vision in Medical Image Analysis

Even though the Computer Vision algorithms that have been implemented are efficient, they still face limitations and challenges in processing medical images [23]. The following is the list of challenges drawn from our references.

1) The Instability of Deep Learning in Image Reconstruction: Image reconstruction is an operation in the image pre-processing stage that recreates an object from its projections; a CT scan is one example. Deep learning used in image reconstruction can result in unstable methods. This instability usually appears in several forms. First, small perturbations in the image or in domain sampling that almost go undetected can alter the image far from what it should be. Second, there are many variations of failure in recovering structural changes, ranging from the complete deletion of detail to a softer distortion that obscures image features. Last, too many samples can make the quality of the image drop. This means that hundreds of networks would need to be trained to recognize every pattern of specific subsampling, subsampling ratio, and image dimension [34].

2) Shortage of Data Volume: Computer vision, especially deep learning, needs millions of data points to be trained to high accuracy. Algorithms used over the last 30 years include Bayesian inference, fuzzy systems, Monte Carlo simulation, rough classification, and Dempster–Shafer theory [34]. But in the medical field, not every human has a health problem, and patients do not always need medical imaging checks such as a CT scan or MRI. Because of that, there are not many patients whose data can be collected and used to train deep learning. Beyond that, the variability of diseases makes them hard to model [35].

3) Temporality: Diseases keep developing over time, and many new diseases appear at unpredictable moments. Deep learning cannot handle this factor of uncertainty: if a new disease is found, the machine needs to be trained from scratch again to understand it. This problem still needs solving so that we do not have to wait so long for the computer vision algorithm to detect the disease [35].

4) Radiation in Medical Imaging: If we want to use this technology every day, we need to consider how to do so and whether it is safe to be exposed to the radiation from MRI and X-ray [36], because studies show a small chance that radiation accelerates diseases such as cancer in our bodies [37]. This is widely known, yet we still use radiation to check for disease or damage in our bodies, because normally the effect of the radiation is so small that harm is unlikely. However, using it on an everyday basis is a different problem and might cause us harm.

IV. DISCUSSION

The purpose of this research is to find out whether the use of computer vision in medical imaging will harm the patient, along with the impact and challenges we will face while implementing computer vision in healthcare, especially medical imaging. We hypothesized at the outset of this study that computer vision could be used in healthcare and would be beneficial, but that it would also pose some risks. According to our findings, the use of these algorithms produces very accurate and quick results for medical imaging, and it is extremely useful for assisting medical personnel.

In order to adapt to the current progress in data processing, it is essential to supply accurate research tools that are suitable for the complex, multiple information contained in the data. Although our visual system is very good at interpreting our environment, medical image reading is a specialized human vision duty that necessitates a lot of practice.

From the results, we found that the accuracy of medical imaging is higher than usual, which makes medical imaging well suited for identifying the cause of a patient's disease. This is also shown by other studies on this topic, so we can already say that medical imaging is effective for identifying the causes of a disease. The main limitation is that we do not know the exact side effects of overly frequent use, and the machine needs to learn from a large amount of data; the data will keep growing because new diseases pop up from time to time.

V. CONCLUSION

Deep learning algorithms in computer vision are very useful in the health sector, especially for medical imaging diagnosis. Based on our research, the use of these algorithms gives very accurate and fast results for medical imaging; in fact, it is very helpful for medical personnel in diagnosing diseases. There are no signs of danger in using deep learning to assist doctors; the only thing that can cause harm is found in medical imaging devices that emit radiation, such as CT scans. However, even if the developed deep learning algorithms are efficient, they still have limitations and obstacles when it comes to analysing medical images, for instance the instability of deep learning in image reconstruction, the lack of data volume, and the appearance of new diseases from time to time. So, deep learning algorithms must be upgraded continually.

Hopefully, this study will serve as a good reference for future medical imaging computer vision, deep learning, and machine learning research.

VI. REFERENCES

[1] Gerig, G., Kuoni, W., Kikinis, R., & Kübler, O. (1989). Medical imaging and computer vision: An integrated approach for diagnosis and planning. Mustererkennung 1989, 425–432. https://doi.org/10.1007/978-3-642-75102-8_64
[2] Gao, J., Yang, Y., Lin, P., & Park, D. S. (2018). Computer vision in healthcare applications. Journal of Healthcare Engineering, 2018, 1–4. https://doi.org/10.1155/2018/5157020
[3] Greenspan, H., Van Ginneken, B., & Summers, R. M. (2016). Guest editorial deep learning in medical imaging: Overview and future promise of an exciting new technique. IEEE Transactions on Medical Imaging, 35(5), 1153-1159.
[4] Chan, H., Hadjiiski, L. M., & Samala, R. K. (2020). Computer-aided diagnosis in the era of deep learning. Medical Physics, 47(5). doi:10.1002/mp.13764
[5] Cavallone, M., & Palumbo, R. (2020). Debunking the myth of industry 4.0 in health care: Insights from a systematic literature review. The TQM Journal.
[6] Moreno, S., Bonfante, M., Zurek, E., & San Juan, H. (2019, June). Study of medical image processing techniques applied to lung cancer. In 2019 14th Iberian Conference on Information Systems and Technologies (CISTI) (pp. 1-6). IEEE.
[7] Gao, J., Jiang, Q., Zhou, B., & Chen, D. (2019). Convolutional neural networks for computer-aided detection or diagnosis in medical image analysis: An overview. Mathematical Biosciences and Engineering, 16(6), 6536-6561. doi:10.3934/mbe.2019326
[8] Mohan, G., & Subashini, M. M. (2018). MRI based medical image analysis: Survey on brain tumor grade classification. Biomedical Signal Processing and Control, 39, 139-161.
[9] Sandoval, G. A., Brown, A. D., Wodchis, W. P., & Anderson, G. M. (2019). The relationship between hospital adoption and use of high technology medical imaging and in-patient mortality and length of stay. Journal of Health Organization and Management.
[10] Vocaturo, E., Zumpano, E., & Veltri, P. (2018, December). Image pre-processing in computer vision systems for melanoma detection. In 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (pp. 2117-2124). IEEE.
[11] Yan, H. (2018, July). Computer vision applied in medical technology: The comparison of image classification and object detection on medical images. In 2018 International Symposium on Communication Engineering & Computer Science (CECS 2018). Atlantis Press.
[12] Seo, J., Han, S., Lee, S., & Kim, H. (2015). Computer vision techniques for construction safety and health monitoring. Advanced Engineering Informatics, 29(2), 239-251.
[13] Dong, C. Z., & Catbas, F. N. (2020). A review of computer vision-based structural health monitoring at local and global levels. Structural Health Monitoring, 1475921720935585.
[14] Bao, Y., Tang, Z., Li, H., & Zhang, Y. (2019). Computer vision and deep learning-based data anomaly detection method for structural health monitoring. Structural Health Monitoring, 18(2), 401-421.
[15] Khuc, T., & Catbas, F. N. (2018). Structural identification using computer vision-based bridge health monitoring. Journal of Structural Engineering, 144(2), 04017202.
[16] Yeung, S., Rinaldo, F., Jopling, J., Liu, B., Mehra, R., Downing, N. L., … Milstein, A. (2019). A computer vision system for deep learning-based detection of patient mobilization activities in the ICU. npj Digital Medicine, 2(1). doi:10.1038/s41746-019-0087-z
[17] Selvikvåg Lundervold, A., & Lundervold, A. (2018). An overview of deep learning in medical imaging focusing on MRI. Zeitschrift für Medizinische Physik. doi:10.1016/j.zemedi.2018.11.002
[18] Masud, M., Sikder, N., Nahid, A., Bairagi, A. K., & Alzain, M. A. (2021). A machine learning approach to diagnosing lung and colon cancer using a deep learning-based classification framework. Sensors, 21(3), 748. doi:10.3390/s21030748
[19] Thevenot, J., Lopez, M. B., & Hadid, A. (2018). A survey on computer vision for assistive medical diagnosis from faces. IEEE Journal of Biomedical and Health Informatics, 22(5), 1497–1511. doi:10.1109/jbhi.2017.2754861
[20] Bevilacqua, V., Dimauro, G., Marino, F., Brunetti, A., Cassano, F., Di Maio, A., ... & Guarini, A. (2016, May). A novel approach to evaluate blood parameters using computer vision techniques. In 2016 IEEE International Symposium on Medical Measurements and Applications (MeMeA) (pp. 1-6). IEEE.
[21] Guo, S., Guan, H., Li, J., Liao, Y., Zhang, W., & Chen, S. (2020). Vaginal secretions epithelial cells and bacteria recognition based on computer vision. Mathematical Problems in Engineering, 2020.
[22] Domingues, I., Sampaio, I. L., Duarte, H., Santos, J. A., & Abreu, P. H. (2019). Computer vision in esophageal cancer: A literature review. IEEE Access, 7, 103080-103094.
[23] Lin, E. C. (2010, December). Radiation risk from medical imaging. In Mayo Clinic Proceedings (Vol. 85, No. 12, pp. 1142-1146). Elsevier.
[24] Akiyama, Y., Mikami, T., & Mikuni, N. (2020). Deep learning-based approach for the diagnosis of moyamoya disease. Journal of Stroke and Cerebrovascular Diseases, 29(12), 105322.
[25] Al-Bander, B., Al-Nuaimy, W., Al-Taee, M. A., & Zheng, Y. (2017). Automated glaucoma diagnosis using a deep learning approach. 2017 14th International Multi-Conference on Systems, Signals & Devices (SSD). doi:10.1109/ssd.2017.8166974
[26] Liu, C. F., Padhy, S., Ramachandran, S., Wang, V. X., Efimov, A., Bernal, A., Shi, L., Vaillant, M., Ratnanather, J. T., Faria, A. V., Caffo, B., Albert, M., & Miller, M. I. (2019). Using deep Siamese neural networks for detection of brain asymmetries associated with Alzheimer's disease and mild cognitive impairment. Magnetic Resonance Imaging, 64, 190–199.
[27] Mesrabadi, H. A., & Faez, K. (2018). Improving early prostate cancer diagnosis by using artificial neural networks and deep learning. 2018 4th Iranian Conference on Signal Processing and Intelligent Systems (ICSPIS). doi:10.1109/icspis.2018.8700542
[28] Nobrega, R. V. M. D., Peixoto, S. A., da Silva, S. P. P., & Filho, P. P. R. (2018). Lung nodule classification via deep transfer learning in CT lung images. 2018 IEEE 31st International Symposium on Computer-Based Medical Systems (CBMS).
[29] Racic, L., Popovic, T., Cakic, S., & Sandi, S. (2021). Pneumonia detection using deep learning based on convolutional neural network. 2021 25th International Conference on Information Technology (IT).
[30] Srivastava, A., Sengupta, S., Kang, S.-J., Kant, K., Khan, M., Ali, S. A., … Brown, D. E. (2019). Deep learning for detecting diseases in gastrointestinal biopsy images. 2019 Systems and Information Engineering Design Symposium (SIEDS). doi:10.1109/sieds.2019.8735619
[31] Tummala, S. (2021). Deep learning framework using Siamese neural network for diagnosis of autism from brain magnetic resonance imaging. 2021 6th International Conference for Convergence in Technology (I2CT).
[32] Wang, S., Kang, B., Ma, J., Zeng, X., Xiao, M., Guo, J., Cai, M., Yang, J., Li, Y., Meng, X., & Xu, B. (2021). A deep learning algorithm using CT images to screen for coronavirus disease (COVID-19). European Radiology.
[33] Wang, J., Ding, H., Bidgoli, F. A., Zhou, B., Iribarren, C., Molloi, S., & Baldi, P. (2017). Detecting cardiovascular disease from mammograms with deep learning. IEEE Transactions on Medical Imaging, 36(5), 1172–1181.
[34] Antun, V., Renna, F., Poon, C., Adcock, B., & Hansen, A. C. (2020). On instabilities of deep learning in image reconstruction and the potential costs of AI. Proceedings of the National Academy of Sciences, 201907377. doi:10.1073/pnas.1907377117
[35] Alizadehsani, R., Roshanzamir, M., Hussain, S., Khosravi, A., Koohestani, A., Zangooei, M. H., ... & Acharya, U. R. (2021). Handling of uncertainty in medical data using machine learning and probability theory techniques: A review of 30 years (1991–2020). Annals of Operations Research. doi:10.1007/s10479-021-04006-2
[36] Miotto, R., Wang, F., Wang, S., Jiang, X., & Dudley, J. T. (2017). Deep learning for healthcare: Review, opportunities and challenges. Briefings in Bioinformatics. doi:10.1093/bib/bbx044
[37] Luo, Z., Hsieh, J. T., Balachandar, N., Yeung, S., Pusiol, G., Luxenberg, J., ... & Fei-Fei, L. (2018). Computer vision-based descriptive analytics of seniors' daily activities for long-term health monitoring. Machine Learning for Healthcare (MLHC), 2.
Fine-tuning IndoBERT to Understand Indonesian Stock Trader Slang Language
Anderies
Computer Science Department, School of Computer Science, Bina Nusantara University, Jakarta, Indonesia 11480
[email protected]

Reza Rahutomo
Information System Department, School of Information Systems, Bina Nusantara University, Jakarta, Indonesia 11480
[email protected]

Bens Pardamean
Computer Science Department, BINUS Graduate Program - Master of Computer Science Program, Bina Nusantara University, Jakarta, Indonesia 11480
[email protected]
Abstract — News and social media sentiment is one of the variables used to formulate decisions for stock trading activities, although in previous research Twitter was commonly used as the main data source to train models and identify stock market sentiment. In order to tackle the bias and noise produced by the variety of Twitter audience backgrounds, this research utilized comment data from a third-party trading application to train and evaluate a sentiment analysis approach for predicting stock price movement. The approach used a fine-tuned IndoBERT model to perform sentiment analysis on stock price movement and achieved 68% accuracy on 1,101 stock comment and post records; furthermore, the model is also able to identify a number of Indonesian trader slang words in the comments.

Keywords—Natural Language Processing, Sentiment Analysis, Stock Market, IndoBERT, Indonesian Stock Trader Slang

I. INTRODUCTION

As technical analysis and examining news trends are common strategies to formulate decisions in stock trading, posts and comments from stock traders in both social media and trading mobile applications play a major role [1]. Sentiment analysis in stock trading is transforming due to the growth of technology and linguistics, as more traders and investors express their opinions on social media and trading platforms.

Nowadays, comments and posts from investors in stock trading mobile applications are more influential on capital gain and loss than general social media such as comments on Twitter. On Twitter, noise and biases are contained in posts due to the variety of audience backgrounds [2]–[4]. This can affect sentiment analysis accuracy if the noise and biases remain uncontrolled.

Although most research uses Twitter as the main data source [2]–[4], this research conducted manual scraping of Smartfren Telecom (Persero) Tbk. and Bank Syariah Indonesia (Persero) Tbk comments from a third-party trading application.

In this study, sentiment analysis in the scope of the Indonesian stock exchange was performed using a robust model from the IndoNLU team, namely the IndoBERT model. As a collaboration between academia and industry practitioners, the IndoBERT model is a popular, state-of-the-art model for sentiment analysis in Bahasa Indonesia inspired by the BERT model [5]. The model gained its popularity because it has been trained with an approximately 4-billion-word corpus (Indo4B) [6].

The contribution of this research is delivering a fine-tuned IndoBERT [6] that identifies sentences containing Indonesian trader slang words such as 'ARA', 'HAKA', 'ARB', 'Merah', 'Hijau', and 'to the moon'. The result of sentiment analysis using the fine-tuned model is not limited to a negative and positive word basis but is also based on stock price movements.

II. LITERATURE REVIEW

News sentiment analysis with an unsupervised learning approach has gained popularity in the academic text mining research field [7], [8], and in the case of the stock market, news is essential and foundational for traders' decision-making in trading activities. Velay and Daniel studied basic sentiment analysis on trending news using common machine learning and deep learning models [9]. The results were produced by comparing human labeling with machine labeling without looking at the stock price movements that occurred. Despite all of the results indicating low performance, as shown in Table 1, this study opened an opportunity to enhance the performance of basic sentiment analysis.

Table 1 shows a comparison between the algorithms studied by Velay and Daniel. The highest accuracy is achieved by Logistic Regression, while the lowest is achieved by K-Nearest Neighbors; for this sentiment analysis problem, supervised learning scores higher than unsupervised learning.

TABLE 1. RESULTS FROM VELAY AND DANIEL'S STUDY [9]

| Algorithm | Accuracy |
| Logistic Regression | 57% |
| Linear Discriminant Analysis | 51% |
| K-Nearest Neighbors | 46% |
| Decision Tree Classifier | 49% |
| Support Vector Machine | 53% |
| Random Forest | 50% |
| Extreme Gradient Boosting | 52% |
| Naïve Bayes | 53% |
| LSTM | 55% |
| MLP | 53% |

978-1-6654-4002-8/21/$31.00 ©2021 IEEE
Since social media nowadays plays a major role in affecting the stock market, social media analysis draws much attention, and the general sentiment analysis trend is constantly shifting toward sentiment analysis for specific cases. This has become a popular research topic. The predictability of financial movements using online information from Twitter has been surveyed across a burgeoning literature [10]. This turning point facilitates investors in expressing their opinions and directing decision-making in trading activities.

On the other hand, it is common to use Twitter as the data source in sentiment analysis and other activities, namely data visualization, data analysis, and machine learning model training. Azar and Lo stated that the capability of Twitter as a social medium is sufficient for it to be a data source for sentiment analysis in the scope of the stock market [11]. Although the Twitter data source has advantages such as 1) a large repository, 2) various audiences on any topic, and 3) hashtag filtering, it has a weakness that lies in data noise and audiences irrelevant to the stock market. So, if data preprocessing is not performed, a machine learning model running on a Twitter data source is unable to fully represent the stock market.

Derakhsan and Beigy proposed the use of the LDA-POS method [12]. The total accuracy is 56.24% in the English language using a 15-stock-code dataset, and 55.33% in the Persian language with five stock codes. The study mentioned that the result is superior to previous works that were implemented on only one stock code and more differentiated at twenty stock codes [13], [14], and outperforms the result of the price-only method on the same dataset [15].

Tanulia and Girsang proposed two methods, NMF (Non-Negative Matrix Factorization) and SVM (Support Vector Machine), in the process of sentiment analysis and LQ45 stock movement prediction on IDX with Twitter and historical data. The study stated that the choice of research methodology is based on a literature review that covered the combination of an SVM classifier and C3E-SL [16] and NMF [17]; Zhu, Jing, & Yu stated in their study that NMF has good performance in speed and is easier to interpret. The study claimed an accuracy of 60.15% for stock movement predictions by applying the two methods, NMF-SVM: NMF is used to get the topic percentage feature in the tweet data, while the sentiment value is obtained using SVM [18].

III. RESEARCH METHODOLOGY

Figure 1 shows the four major steps in this research. To achieve reliable results in each step, technical methods are required. As the purpose of this research is to understand Indonesian stock trader slang language, the series of technical activities focused on pre-processing and preparing the model for training. The explanation of each step is delivered as follows.

[Fig 1. Research Methodology: Data Acquisition (Data Labelling) → Pre-processing (Punctuation removal, Stopword, Tokenization, Long Tensor, Tensor squeezing to 512) → Model Training (Compiling datasets, Train-test split (80/20)) → Model Validation (Parameter tuning, Model accuracy)]

A. Data Acquisition

The utilized dataset is divided into three parts: training, validation, and testing. The training and validation sets were created by combining six datasets from Instagram comments, Twitter comments [19], and smsa_doc-sentiment-prosa [20]. These datasets contained 13,200 records, and the class division of the training, validation, and testing datasets is shown in Figures 2 and 3. The Python library pandas was utilized in several tasks, namely creating the database, manipulating data, visualization, and exporting data files. On the other side, the testing part was gathered by manual scraping from a trusted third-party trading application; a testing set content sample is shown in Table 2. The testing dataset contained records of stock trader comments labeled by stock expertise into three categories, negative, positive, and neutral, based on capital gains or capital losses. The testing dataset contains comments on two selected sentimental companies on the Indonesia Stock Exchange (IDX), namely Smartfren Telecom (Persero) Tbk. and Bank Syariah Indonesia (Persero) Tbk. This data acquisition step produced a text dataset ready to be viewed, analyzed, and preprocessed.

[Fig 2. Training and validation set class division.]
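The "Train-test split (80/20)" step from Figure 1, which corresponds to the 10,560 training and 2,640 validation records reported later in the results, can be sketched as follows. This is only an illustrative sketch: the function name and the fixed seed are our own assumptions, not the authors' code.

```python
import random

def train_val_split(records, train_frac=0.8, seed=42):
    """Shuffle a copy of the records and cut it into train/validation parts."""
    shuffled = records[:]                  # keep the caller's list intact
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

if __name__ == "__main__":
    data = [f"comment_{i}" for i in range(13_200)]
    train, val = train_val_split(data)
    print(len(train), len(val))  # 10560 2640
```

An 80/20 cut of 13,200 records yields exactly the 10,560/2,640 division used for training and validation.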
[Fig 3. Test set class division.]

TABLE 2. CONTENT LABELED BY MODEL

| Content | Class |
| yuk CL saham | Negative |
| Kasihan tlkm bakalan dipanggil kejagung kerugian negara is real coy wkwkwkkw | Negative |
| Adanya insentif pajak dividen semoga bisa membantu menahan penurunan bursa. Diharapkan saham dengan yield dividen tinggi seperti TLKM UNVR HMSP INDF dkk IDX Div 20 bisa bertahan bahkan hijau | Positive |
| Gak ada kata Sangkut di Saham2 ini yg ada turun beli turun beli hold keras TLKM bbri bbca bmri ihsg | Positive |
| Jakarta CNBC Indonesia Indeks Harga Saham Gabungan IHSG berhasil melanjutkan reli penguatan selama 3 hari beruntun setelah ditutup menguat pada perdagangan Rabu 332021. Sahamsaham emiten bank big cap menjadi buruan investor asing. Data BEI mencatat IHSG yang menjadi indeks acuan di Burs | Neutral |
| Jika Daya Mitra Telekomunikasi alias Mitratel IPO apakah berpengaruh pada saham TLKM IHSG LQ45 | Neutral |

B. Data Preprocessing

Before the data are fed into the IndoBERT model [21], they must be pre-processed to be compatible with the intended standard of the model. In this research, four data pre-processing steps were implemented, as seen in Figure 4.

[Fig 4. Pre-Processing Steps: Punctuation Removal → Stopwords → Long Tensor → Squeeze 512 Tensor]

The first step is punctuation removal: the special characters of each sentence are removed to eliminate unnecessary characters and bias. The second step is stopwords: each sentence is mapped so that every word is converted into an array of words, for example "ujung2nya paling harga 82 doji hijau asegg" into ['ujung', 'nya', 'paling', 'harga', '82', 'sayang', 'hijau', 'asik']. The third step is the tokenizer and long tensor: each word is converted into a tensor. The final pre-processing step squeezes the input into 512 tensors to follow the IndoBERT model restrictions [22]. The output of these steps is a dataset ready to be fed into the model.

C. Model Training

The model used and trained in this research to perform sentiment analysis for predicting stock price movement is IndoBERT, a state-of-the-art language model for Indonesian based on the BERT model [6]. The pre-trained model is trained using a masked language modeling objective and a next-sentence prediction objective. Indobert-base-p1 has been pre-trained on the Indo4B dataset, which contains four billion Indonesian words and 250 million sentences, and the model has 124.5 million parameters. IndoBERT is state-of-the-art, comparable to other models, with a solid understanding of pre-trained Indonesian sentences and words. These resources of the pre-trained model are easy to access and reproduce [23].

The model was trained using the combined six datasets with a total of 13,200 records and tested on 1,101 comment records from a third-party trading application, to be evaluated in the final step. The training and validation dataset combination was chosen for two reasons: first, the absence of a public labeled dataset for the investor-comment case; second, since the theme of the research is slang language, we assume similarity between investors and social media users. Even though it is a different corpus, the authors address this point from previous research in the discussion of the different-corpus problem.

D. Accuracy for Model Evaluation

In the final step, the proposed fine-tuned model is evaluated with the accuracy calculation shown in Equation 1. The formula divides the number of correct predictions over all data classes by the sum of correct and incorrect predictions, and the model is able to predict three categories of sentiment based on the impact of stock price movements of capital gain or capital loss. The correctness of the predictions is analyzed and evaluated by stock expertise based on stock market data.

The purpose of this research is to achieve the highest possible testing accuracy; in addition, we complement our evaluation with the classification report generated by the sklearn.metrics library, shown in Table 3.

Accuracy = Correct Predictions / (Correct Predictions + Incorrect Predictions)    (1)
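The four pre-processing steps above can be sketched in plain Python. In the actual pipeline the last two steps use the IndoBERT tokenizer and PyTorch long tensors; here they are approximated with a toy incremental vocabulary and plain integer lists, so every identifier below is an illustrative assumption rather than the authors' code.

```python
import string

MAX_LEN = 512  # IndoBERT input-length restriction [22]
PAD_ID = 0     # toy padding id; a real tokenizer defines its own

def remove_punctuation(text):
    """Step 1: strip special characters that only add noise."""
    return text.translate(str.maketrans("", "", string.punctuation))

def to_word_array(text):
    """Step 2: map the sentence into an array of words."""
    return text.lower().split()

def to_ids(words, vocab):
    """Step 3: stand-in for tokenisation and long-tensor conversion."""
    return [vocab.setdefault(w, len(vocab) + 1) for w in words]

def squeeze_to_max_len(ids):
    """Step 4: truncate or pad to exactly MAX_LEN positions."""
    ids = ids[:MAX_LEN]
    return ids + [PAD_ID] * (MAX_LEN - len(ids))

def preprocess(text, vocab):
    """Chain the four steps on one raw comment."""
    return squeeze_to_max_len(to_ids(to_word_array(remove_punctuation(text)), vocab))

if __name__ == "__main__":
    vocab = {}
    ids = preprocess("ujung2nya paling harga 82 doji hijau asegg!", vocab)
    print(len(ids), ids[:7])  # 512 [1, 2, 3, 4, 5, 6, 7]
```

Fixing every input at exactly 512 positions mirrors the squeeze step that makes the batch shape compatible with the IndoBERT encoder.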
IV. RESULTS AND DISCUSSION

A. Model Experiment and Result
Figure 5 shows the overall result of the experiment conducted in this research, as explained in detail in the research methodology section. The dataset holds 13,260 records, divided into a training set with 10,560 records and a validation set with 2,640 records. The pre-processed training and validation sets are used to train the pre-trained indobert-base-p1 model. The pre-trained model, the main backbone for determining the sentiment analysis prediction of the dataset, has previously been trained on the Indo4B dataset, which contains 4 billion words, with 124.5 million model parameters.

After the IndoBERT model is fine-tuned and trained with the dataset to perform sentiment analysis, the model performs labeling work based on stock price movements on a testing set containing 1,101 records from a third-party trading application. The labels are then evaluated and analyzed by stock experts based on stock market data, and the evaluation results are used as a measure of model performance.

On the training set, our model performed with 99% accuracy over 10,560 records. On the validation set, the proposed model performed with 93% accuracy over 2,640 records. On the testing set, the fine-tuned model performed with 68% accuracy over 1,101 records; the test set records labelled by the model can be seen in Table 2, where the model is able to understand half of the traders' slang language. The notebook is available at the following URL: https://github.com/Anderies/stock_market_sentiment/blob/master/smsa_github.ipynb.

TABLE 3. TESTING CLASSIFICATION REPORT

              Precision  Recall  F1-score  Support
Negative      0.88       0.45    0.59      383
Neutral       0.65       0.78    0.71      409
Positive      0.62       0.84    0.72      309
Accuracy                         0.68      1101
Macro Avg     0.72       0.69    0.67      1101
Weighted Avg  0.72       0.68    0.67      1101

B. Identified Dataset Problem
From the acquired dataset and evaluation result, the problems are defined as follows:

1. Tagging Problem
Users tag several popular stock codes in comments or posts without actually mentioning or discussing the tagged stock codes. This is done so that the comments or posts get more attention and drag more users toward another stock code.

2. Price Sarcasm
Users employ collections of positive words or positive Indonesian slang words such as 'ARA ARA' and cite unreasonably high prices in comments or posts, but the meaning is negative sentiment when it is understood deeply.

3. Different Corpus
The fine-tuned model is trained with a different category of corpus. The training dataset contained Instagram, Twitter, and smsa-indonesia corpora. This composition is not equivalent to the testing dataset, which contains the Indonesian stock trader comments corpus. There is a 31% accuracy score difference between training accuracy and testing accuracy, and a 25% difference between validation accuracy and testing accuracy.

C. Discussion
Compared to LDA, LDA-POS [12], SVM-NMF [18], and the study conducted by Velay and Daniel [9], the fine-tuned IndoBERT model outperforms similar studies in the literature review for sentiment analysis of stock price movements of capital gains and capital losses. Our model is also able to identify trader slang language in Bahasa Indonesia and to categorize the data automatically into negative, positive, and neutral with 68% accuracy over 1,101 records.

Table 3 shows that our model overfits when the testing result is compared to the training and validation results; on the other hand, we achieved competitive accuracy compared to previous stock price movement research [9], [12], [18]. Popular research by Devlin et al. [5] and Kornblith et al. [24] concludes that a model trained with a different corpus can lead to good results and even outperform models with hand-engineered parameters and features.
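The macro and weighted averages reported in Table 3 follow directly from the per-class scores and supports, and can be reproduced with a short sketch using only the table's published values (macro average is the unweighted mean across classes; weighted average weights each class by its support):

```python
# Per-class (precision, recall, F1, support) values as published in Table 3.
classes = {
    "Negative": (0.88, 0.45, 0.59, 383),
    "Neutral":  (0.65, 0.78, 0.71, 409),
    "Positive": (0.62, 0.84, 0.72, 309),
}

total = sum(support for *_, support in classes.values())               # 1101 records
macro_f1 = sum(f1 for _, _, f1, _ in classes.values()) / len(classes)  # unweighted mean
weighted_f1 = sum(f1 * s for _, _, f1, s in classes.values()) / total  # support-weighted

print(total, round(macro_f1, 2), round(weighted_f1, 2))  # 1101 0.67 0.67
```

Both averages round to the 0.67 F1-score shown in the Macro Avg and Weighted Avg rows of Table 3.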
V. CONCLUSION

This research successfully fine-tunes the IndoBERT model to perform sentiment analysis on Indonesian stock trader comments. In the scope of automatic sentiment analysis, the proposed model recognized 1,101 records of stock trader comments with 68% accuracy across three categories: positive, negative, and neutral. The positive category covers confident comments on capital gains, while the negative category covers confident comments on capital losses. The neutral category covers comments that have no impact on capital gains or capital losses.

Fig. 5. Experiment Overview.


REFERENCES

[1] E. Bartov, L. Faurel, and P. S. Mohanram, "Can Twitter help predict firm-level earnings and stock returns?," Account. Rev., vol. 93, no. 3, pp. 25–57, 2018.
[2] I. Nurlaila, R. Rahutomo, K. Purwandari, and B. Pardamean, "Provoking Tweets by Indonesia Media Twitter in the Initial Month of Coronavirus Disease Hit," in 2020 International Conference on Information Management and Technology (ICIMTech), 2020, pp. 409–414.
[3] J. An, M. Cha, K. Gummadi, J. Crowcroft, and D. Quercia, "Visualizing media bias through Twitter," in Proceedings of the International AAAI Conference on Web and Social Media, 2012, vol. 6, no. 1.
[4] K. Purwandari, J. W. C. Sigalingging, T. W. Cenggoro, and B. Pardamean, "Multi-class Weather Forecasting from Twitter Using Machine Learning Approaches," Procedia Comput. Sci., vol. 179, pp. 47–54, 2021.
[5] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," arXiv preprint arXiv:1810.04805, 2018.
[6] B. Wilie et al., "IndoNLU: Benchmark and resources for evaluating Indonesian natural language understanding," arXiv preprint arXiv:2009.05387, 2020.
[7] A. Budiarto, R. Rahutomo, H. N. Putra, T. W. Cenggoro, M. F. Kacamarga, and B. Pardamean, "Unsupervised News Topic Modelling with Doc2Vec and Spherical Clustering," Procedia Comput. Sci., vol. 179, pp. 40–46, 2021.
[8] R. Rahutomo, F. Lubis, H. H. Muljo, and B. Pardamean, "Preprocessing Methods and Tools in Modelling Japanese for Text Classification," in 2019 International Conference on Information Management and Technology (ICIMTech), 2019, vol. 1, pp. 472–476.
[9] M. Velay and F. Daniel, "Using NLP on news headlines to predict index trends," arXiv preprint arXiv:1806.09533, Jun. 2018. [Online]. Available: http://arxiv.org/abs/1806.09533.
[10] M. Nardo, M. Petracco-Giudici, and M. Naltsidis, "Walking down Wall Street with a tablet: A survey of stock market predictions using the Web," J. Econ. Surv., vol. 30, no. 2, pp. 356–369, Apr. 2016, doi: 10.1111/joes.12102.
[11] P. D. Azar and A. W. Lo, "The Wisdom of Twitter Crowds: Predicting Stock Market Reactions to FOMC Meetings via Twitter Feeds," J. Portf. Manag., vol. 42, no. 5, pp. 123–134, Jul. 2016, doi: 10.3905/jpm.2016.42.5.123.
[12] A. Derakhshan and H. Beigy, "Sentiment analysis on stock social media for stock price movement prediction," Eng. Appl. Artif. Intell., vol. 85, pp. 569–578, Oct. 2019, doi: 10.1016/j.engappai.2019.07.002.
[13] J. Si, A. Mukherjee, B. Liu, Q. Li, H. Li, and X. Deng, "Exploiting topic based Twitter sentiment for stock prediction," in Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2013, pp. 24–29.
[14] R. P. Schumaker and H. Chen, "A quantitative stock prediction system based on financial news," Inf. Process. Manag., vol. 45, no. 5, pp. 571–583, 2009.
[15] T. H. Nguyen, K. Shirai, and J. Velcin, "Sentiment analysis on social media for stock movement prediction," Expert Syst. Appl., vol. 42, no. 24, pp. 9603–9611, 2015.
[16] L. F. S. Coletta, N. F. F. da Silva, E. R. Hruschka, and E. R. Hruschka, "Combining Classification and Clustering for Tweet Sentiment Analysis," in 2014 Brazilian Conference on Intelligent Systems, Oct. 2014, pp. 210–215, doi: 10.1109/BRACIS.2014.46.
[17] Y. Zhu, L. Jing, and J. Yu, "Text clustering via constrained nonnegative matrix factorization," in 2011 IEEE 11th International Conference on Data Mining, 2011, pp. 1278–1283.
[18] Y. Tanulia and A. S. Girsang, "Sentiment Analysis on Twitter for Predicting Stock Exchange Movement," Adv. Sci. Technol. Eng. Syst. J., vol. 4, no. 2, pp. 244–250, 2019.
[19] R. S. Perdana, "Dataset Sentimen Analisis Bahasa Indonesia," github.com, 2020. https://github.com/rizalespe/Dataset-Sentimen-Analisis-Bahasa-Indonesia (accessed Mar. 17, 2021).
[20] A. Purwarianti and I. A. P. A. Crisdayanti, "Improving Bi-LSTM Performance for Indonesian Sentiment Analysis Using Paragraph Vector," in 2019 International Conference of Advanced Informatics: Concepts, Theory and Applications (ICAICTA), Sep. 2019, pp. 1–5, doi: 10.1109/ICAICTA.2019.8904199.
[21] F. Koto, A. Rahimi, J. H. Lau, and T. Baldwin, "IndoLEM and IndoBERT: A Benchmark Dataset and Pre-trained Language Model for Indonesian NLP," arXiv preprint arXiv:2011.00677, 2020.
[22] T. Kudo and J. Richardson, "SentencePiece: A simple and language independent subword tokenizer and detokenizer for neural text processing," arXiv preprint arXiv:1808.06226, 2018.
[23] IndoBenchmark, "IndoNLU," github.com, 2020. https://github.com/indobenchmark/indonlu (accessed Mar. 17, 2021).
[24] S. Kornblith, J. Shlens, and Q. V. Le, "Do better ImageNet models transfer better?," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 2661–2671.


Development of Portable Temperature and


Air Quality Detector for Preventing Covid-19
Widodo Budiharto Edy Irwansyah Retno Dewanti
Computer Science Department, School Computer Science Department, School Management Department, Binus
of Computer Science, of Computer Science, Business School
Bina Nusantara University, Bina Nusantara University, Bina Nusantara University,
Jakarta, Indonesia Jakarta, Indonesia Jakarta, Indonesia
[email protected] [email protected] [email protected]

Alexander Agung Santoso Gunawan Danu Widhyatmoko Jarot Soeroso Sembodo


Computer Science Department, School Dept. of Visual Communication Master of Information System Program,
of Computer Science, Design, School of Design BINUS Graduate Program
Bina Nusantara University, Bina Nusantara University, Bina Nusantara University,
Jakarta, Indonesia Jakarta, Indonesia) Jakarta, Indonesia
[email protected] [email protected] [email protected]

Abstract—Due to Covid-19, body temperature measurement is mandatory and has become an important consideration in determining whether an individual is healthy or not. This paper presents the development of a portable temperature and air quality detector to prevent suspected cases of Covid-19, whose main symptom is a body temperature above 38° Celsius. We propose an algorithm and architecture for a temperature detector with a maximum distance of 80 cm, using CO2 and Volatile Organic Compounds (VOC) measurements as indicators of good air quality. Based on experiments, we can detect temperature accurately to within 0.3° Celsius using the digital temperature sensor MLX90614, compared with a commercial device; furthermore, the system is able to give information about the air quality, allowing or not allowing someone to enter a room, with an accuracy of 92.5%.

Keywords—infrared temperature sensor; air quality sensor; Covid-19

I. INTRODUCTION
The novel respiratory illness Coronavirus disease 2019 (Covid-19) still exists in 2021. The total of confirmed Covid-19 cases in Indonesia as of 7 June 2021 is 1,863,031 people, with 51,803 deaths [1]. In Indonesia, a person can be called a suspected COVID-19 case if they meet one or more criteria, such as experiencing symptoms of a respiratory tract infection (such as fever or a history of fever with a temperature above 38° Celsius) and any of the symptoms of a respiratory disease (such as cough, shortness of breath, sore throat, and runny nose). Monitoring body temperature as a way of screening people for COVID-19 is the most widely applied measure in many places. According to [2], body temperature measurement with an infrared thermometer is one of the existing Covid-19 detection methods currently available. This screening method is fairly effective, although it cannot catch asymptomatic patients. With the trend still increasing and cases of new variants, it seems that we must struggle with Covid-19 using additional preventive tools such as temperature detectors.

The state of the art for this research includes many approaches; for instance, [3] proposed correct mask detection using computer vision. That research tried experimental and comprehensive methods to detect the faces of people wearing masks; unfortunately, it requires a more complex system, and the implementation costs are quite high. The use of a single-stage object detection method with deep learning that implements the RetinaNet architecture has also been carried out by previous research. That study uses facial images from dual cameras, both multi-spectral RGB and thermal cameras, to generate input images and capture a person's temperature, respectively [4].

In addition to monitoring body temperature, air quality, especially in air-conditioned closed rooms, needs attention in controlling the spread of the infection. Many studies regarding air quality for preventing Covid-19 have been conducted. Ventilation and air filtration play a key role in preventing the spread of Covid-19 indoors [5]. This study aims to develop a low-cost body temperature detector combined with an air quality detector. The researchers believe that a healthy environment will increase people's immunity. Figure 1 shows our prototype of the temperature and air quality detector, named BacaSuhu V4. Part I of our paper is the introduction, part II covers the concepts of the temperature and air quality sensors, we propose a method in part III, experimental results are given in part IV, and the conclusion is in part V.

Fig. 1. Our prototype of temperature and air quality detector named BacaSuhu V4, standing on a tripod. It shows the temperature, CO2 and VOC, and indicates that the air is not healthy [5].

II. LITERATURE REVIEW

A. Temperature Sensor
The MLX90614 is a Contactless Infrared (IR) Digital Temperature Sensor that can be used to measure the

978-1-6654-4002-8/21/$31.00 ©2021 IEEE


temperature of a particular object, ranging from -70° C to 382.2° C. The sensor uses IR rays to measure the temperature of the object without any physical contact and communicates with the microcontroller using the I2C protocol. Figure 2 shows the temperature sensor used in the research.

Fig. 2. MLX90614 Contactless Infrared (IR) Digital Temperature Sensor circuit (a) and the sensor module (b) [7]

The specification of the sensor:

• Working voltage: 3 ~ 5 V (internal low-voltage regulator)
• Communication: standard I2C (IIC) communication protocol
• Maximum measuring distance: 80 cm
• Product size: 11.5 * 16.5 mm

B. Air Quality Sensor
Information about the air quality in a room or a house can be obtained using the SGP30 air quality sensor. The information is obtained by monitoring the organic compounds that evaporate in the air around the sensor. Using the SGP30 sensor, indoor air quality (IAQ) readings can be taken in just 15 seconds with valid results. Typical air quality (IAQ) sensors are very good and fast at measuring levels of CO2 and other volatile compounds dispersed in the air, although some other types of sensors take 20 minutes before being able to report air content levels, with a burn-in time of 48 hours.

The SGP30 sensor is designed to be highly resistant to the influence or contamination of other gases in the room. Such a design is essential to ensure low drift as well as long-term stability, with highly reliable results. The SGP30 gas sensor reads the total VOC in parts per billion (ppb) as well as the carbon dioxide (CO2) content in parts per million (ppm). In principle, the sensor reads two air quality signals free in the air, a total VOC signal (TVOC) and a CO2 equivalent signal (CO2eq) [8], using a dynamic baseline compensation algorithm and on-chip calibration parameters. Figure 3 shows a functional block diagram of the sensor, consisting of signal processing and baseline compensation circuits.

Fig. 3. Functional block diagram of the sensor SGP30 [8]

The communication process with the microcontroller proceeds in principle as follows: the I2C master periodically requests measurements and reads data. It sends measurement commands and waits for the measurement process to complete; the wait is implemented either by waiting for the maximum execution time or by waiting for the expected duration and then polling the data until the read header is acknowledged by the sensor. Reading out the measurement results is the final step carried out by the I2C master. The air quality sensor module, which is also capable of measuring ethanol and H2 levels, can be seen in Figure 4.

Fig. 4. Module of Air quality sensor SGP30 Qwiic [9]

III. PROPOSED METHOD

A. System Architecture
The authors have conducted prior research and experiments on face mask detection and on controlling an IR temperature sensor [9]. We use an Arduino and a 20x4 LCD for display, with an I2C (Inter-Integrated Circuit) line for both the IR temperature and air quality sensors. I2C is a serial communication protocol, so data is transferred bit by bit along a single wire (the SDA line). SCL is used to synchronize the communication between the microcontroller and the sensor; the SDA pin is used to transfer data to and from the sensor. Figure 5 shows the architecture of our system [10].

Fig. 5. The architecture of the temperature and air quality detector: the IR temperature sensor (GY-906 DCI) and the air quality sensor (SGP30) are connected to the Arduino controller via I2C communication, and results are shown on a 20x4 LCD display.
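The request, wait, and poll sequence described above can be sketched as a plain-Python simulation. MockSensor and its timings are illustrative assumptions for the sketch, not the SGP30's actual command set (see the Sensirion datasheet for the real commands and delays):

```python
class MockSensor:
    """Hypothetical stand-in that pretends a measurement completes after a few polls."""
    def __init__(self, ticks_needed=3):
        self.ticks = 0
        self.ticks_needed = ticks_needed

    def start_measurement(self):
        self.ticks = 0  # measurement command received, work begins

    def data_ready(self):
        self.ticks += 1  # each poll lets simulated time pass
        return self.ticks >= self.ticks_needed

    def read(self):
        return {"co2_ppm": 412, "tvoc_ppb": 18}  # illustrative values only


def measure(sensor, max_polls=10):
    """Send a measurement command, poll until ready, then read the result."""
    sensor.start_measurement()
    for _ in range(max_polls):
        if sensor.data_ready():
            return sensor.read()
    raise TimeoutError("sensor did not become ready")

print(measure(MockSensor()))  # {'co2_ppm': 412, 'tvoc_ppb': 18}
```

On real hardware the same pattern would run over the I2C bus, with the datasheet's maximum execution times taking the place of the simulated tick counter.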


First, the program makes a connection between the Arduino controller and the sensors using the I2C line. In the start condition, the display shows the ambient temperature, the CO2 and VOC measurements, and the condition of the air quality. If a person approaches the device closer than 80 cm, it displays the body temperature and whether access is allowed. There is a buzzer to indicate the status of the measurement. The algorithm of the temperature and air quality detector is shown in Algorithm 1:

Algorithm 1. Realtime temperature and air quality detector
begin
  declare variables
  import library
  setting LCD 20x4
  setup()
  begin
    Serial.begin(9600) // serial communication
    set I2C
    display welcome message
  end
  loop()
  begin
    reading temperature
    reading distance of person
    if (temperature < 43 degrees and distance >= 80 cm)
    begin
      display temperature, CO2 and VOC
    end
    // measure air quality
    if (CO2 > 550 and TVOC > 50)
    begin
      display "Air quality is not Good"
    end
    if (CO2 <= 550 and TVOC <= 50)
    begin
      display "Air Quality is Good"
    end
    if (temperature >= 28 and temperature < 38.3 and distance < 80)
    begin
      display temperature
      display "Please enter"
    end
    if (temperature >= 38.3 and distance < 80)
    begin
      display temperature
      display "Enter not allowed"
    end
  end
end

IV. EXPERIMENTAL RESULT
We developed a program for controlling the sensors using an Arduino UNO [10]. We also tuned the device with a switch for indoor/outdoor usage. Based on the experiment, the ability of the program to detect temperature and air quality is very good. Based on the table, the remaining problem is detecting a mask that is not worn properly, perhaps because the training data for that case is not enough.

Table 1. Experimental results for detecting mask and temperature detector.

No  Action (results from 10 simulations)                    Success  Not success
1   Temperature <38° Celsius and air quality is good        10       0
2   Temperature >38° Celsius and air quality is good        9        1
3   Temperature <38° Celsius and air quality is good        9        1
4   Temperature <38° Celsius and air quality is not good    9        1
                                                  Accuracy: 92.5%

To improve the accuracy, range compensation may be implemented in the next project [11].

V. CONCLUSION
In this paper, we propose a model of a temperature and air quality detector for preventing Covid-19. We propose an algorithm and a low-cost device that can be used in an office or on a campus. With the trend of increasing positive cases in Indonesia in the second wave, intelligent systems for handling Covid-19 are very important. Based on the experiment, we can detect masks accurately, and using the digital temperature sensor MLX90614, the system is able to give accurate measurements with an accuracy of 95%.

ACKNOWLEDGMENT
We thank Bina Nusantara University for supporting this research.

REFERENCES

[1] Indonesian Covid-19 statistics, https://www.worldometers.info/coronavirus/country/indonesia/, accessed 19 April 2021.
[2] K. Kumar, N. Kumar, and R. Shah, "Role of IoT to avoid spreading of COVID-19," International Journal of Intelligent Networks, vol. 1, pp. 32–35, 2020.
[3] B. Batagelj, P. Peer, V. Štruc, and S. Dobrišek, "How to Correctly Detect Face-Masks for COVID-19 from Visual Information?," Applied Sciences, vol. 11, no. 5, p. 2070, Feb. 2021.


[4] I. Farady, C.-Y. Lin, A. Rojanasarit, K. Prompol, and F. Akhyar, "Mask Classification and Head Temperature Detection Combined with Deep Learning Networks," in 2020 2nd International Conference on Broadband Communications, Wireless Sensors and Powering (BCWSP), Yogyakarta, 2020.
[5] Air quality for preventing Covid-19, https://www.usatoday.com/in-depth/graphics/2020/10/18/improving-indoor-air-quality-prevent-covid-19/3566978001/, accessed 12 May 2021.
[6] Demo of BacaSuhu V4: temperature and air quality detector, https://www.youtube.com/watch?v=S7TqSW7hx_w, accessed 3 June 2021.
[7] MLX90614 Contactless Infrared (IR) Digital Temperature Sensor, https://www.melexis.com/en/product/MLX90614/Digital-Plug-Play-Infrared-Thermometer-TO-Can, accessed 10 January 2021.
[8] Datasheet of SGP30 Air Quality Sensor, https://cdn.sparkfun.com/assets/c/0/a/2/e/Sensirion_Gas_Sensors_SGP30_Datasheet.pdf, accessed 10 May 2021.
[9] Face Mask and temperature detector, https://www.youtube.com/watch?v=Aj-spofftpw, accessed 20 March 2021.
[10] Introduction to Arduino, https://www.arduino.cc/, accessed 12 November 2020.
[11] N. W.-J. Goh et al., "Design and Development of a Low Cost, Non-Contact Infrared Thermometer with Range Compensation," Sensors (Basel, Switzerland), vol. 21, no. 11, p. 3817, 31 May 2021, doi: 10.3390/s21113817.


Development of Robot to Clean Garbage in River


Streams with Deep Learning
Brilyan Nathanael Rumahorbo Antonio Josef Muhammad Hafizh Ramadhansyah
Computer Science Department, School of Computer Science Department, School of Computer Science Department, School of
Computer Science, Computer Science, Computer Science,
Bina Nusantara University Bina Nusantara University Bina Nusantara University
Jakarta, Indonesia 11480 Jakarta, Indonesia 11480 Jakarta, Indonesia 11480
[email protected] [email protected] [email protected]

Handy Pratama Widodo Budiharto


Computer Science Department, School of Computer Science Department, School of
Computer Science, Computer Science,
Bina Nusantara University Bina Nusantara University
Jakarta, Indonesia 11480 Jakarta, Indonesia 11480
[email protected] [email protected]

ABSTRACT— The problem of garbage that continues to fill the oceans has become a serious concern in recent years. Garbage that continues to accumulate in the oceans is very dangerous to the survival of marine life. Several efforts have been made to address this problem, ranging from reducing the use of single-use plastics to transporting garbage out of rivers, which are one of the biggest contributors to the garbage that accumulates in the oceans. This paper presents the creation of a robot that can help transport waste on the surface of a river. By building a river cleaning robot, it is hoped that the growth of waste on earth can be controlled. This robot uses robot vision technology to detect the presence of trash around the robot so that it can be transported into a storage tank, and the language used in this robot is the Python language. In this paper, pictures of the robot components used and the working system of the robot are attached.

Keywords— Waste, River, Robot vision, Artificial intelligence, Deep learning, Object recognition

I. INTRODUCTION
Our earth has suffered a lot of damage due to human activity. One example is the volume of waste, which has swelled due to neglect of waste management [1]. Poor waste management, coupled with single-use plastic production, one of the most pressing environmental problems, rapidly increasing beyond the world's ability to tackle it, makes waste a serious global problem [1]. Single-use plastic is a serious environmental problem because it is a type of inorganic waste that is difficult to decompose. A lot of plastic waste is carried by rivers and ends up in the sea, carried by waves across islands, countries and even continents [1].

It is estimated that 1.3 billion tons of plastic waste (not including other types of waste) will fill the earth (land and oceans) by 2040 if prevention is not undertaken as soon as possible [2]. A study in Science states that 24-34 million tons of plastic waste pollute the world's oceans every year, and it is estimated that this amount will increase to 53-90 million tons by 2030 [3]. According to data from Our World in Data, at least 86% of the input of plastic waste from rivers to the sea comes from countries on the Asian continent [4]. China is the Asian country that contributes the most plastic waste, around 8.82 million tons/year and around 1.32-3.53 million tons of pollution in the sea, followed by Indonesia in second place with 3.22 million tons/year and around 0.48-1.29 million tons of pollution in the sea in 2010 [5]. This rubbish has threatened the survival of marine life [6]. Rosek stated that in several regions in Indonesia, dead marine animals were found with plastic waste in their bodies [6].

To handle the plastic pollution problem, several projects and designs have been researched [7]. Some of them are autonomous, mechanical, and human-based computation designs [7]. Several people have tried to develop prototype plastic waste cleaning robots. Andy Febry Anto and Totok Sukardiyono have developed an Autonomous Rover capable of cleaning trash on the beach, equipped with an ArduPilot for navigation, capable of mapping and monitoring the state of the shallow marine environment [8]. The Dutch company The Ocean Cleanup has developed The Interceptor, a robot capable of transporting up to 50,000 kg of plastic per day [9]. The Interceptor is equipped with a conveyor belt to move waste from the surface of the water into one of the collection basins, and a sensor that can inform the operator when the trough is almost full [9].

II. LITERATURE REVIEW

A. Artificial Intelligence
Artificial intelligence (AI) is a branch of computer science that studies how to make computers better do things that humans currently do [10]. AI has become very familiar because its applicability on computers keeps getting stronger; there are particular problems that can't be resolved properly by humans and traditional computing [10]. According to [11], there are 4 AI approaches that have been followed, namely the Turing test approach, the cognitive modeling


approach, the "laws of thought" approach, and the rational agent approach, made by different methods by different people, where humans are the center of empirical knowledge, which involves observations of and hypotheses about human behavior. AI also serves the scientific goal of constructing an information-processing theory of intelligence [12]. AI's methods and techniques have been applied to solve several different problems, as we mentioned before [12].

B. Robot Vision
According to the Oxford English Dictionary, a robot is defined as "a machine capable of carrying out a complex series of actions automatically" [13]. In order for the robot to do these things, the robot needs to understand what is around it and what it can do anywhere [13]. The things needed for the robot vision manufacturing process to run well are low cost, reliable operation, fundamental simplicity, fast image processing and ease of scene illumination [14].

C. Deep Learning
Deep learning is a part of machine learning that makes computers able to learn from their experience and to understand the world through a hierarchy of concepts [15]. Deep learning is very useful for learning models on complex and very large datasets [16]. Deep learning works by creating a learning model that continues to grow over time; with the support of computer hardware and software infrastructure, deep learning can develop so that complex problems can be solved with increasing accuracy over time [15]. The increase in accuracy is also supported by the amount of data used to train the model continuing to grow, so that over time the accuracy also increases [15].

III. PROPOSED METHOD

A. Architecture
The robot can learn and remember important locations and forms of water that it has traversed in the river where it is operated. The robot is also equipped with garbage data generated from the river where it is operated and will continue to collect garbage data from that river. The data collected will be studied and analyzed by the robot's deep learning capabilities as a measure of river conditions and of the robot's working effectiveness over time, and these data can also be used to assist the research process for those in need.

The robot is equipped with artificial intelligence that carries out deep learning on the given data/images of trash objects. From the results of this deep learning, assisted by a camera installed on the robot applying computer vision, the robot can recognize objects/trash on the surface of the water and command the motor that moves the paddle wheel to direct the robot toward the object/trash. The camera is also equipped with a program that is able to measure the distance from the robot to the closest object/trash and prevents collisions with walls or other obstacles, so as not to interfere with the robot's work process. When the camera detects an obstacle, the motor that moves the paddle wheel is ordered to move the robot away from the obstacle.

Fig. 1. The system architecture of waste recognition

To move, the robot is directed by a motor that rotates the paddle wheels. The machine gets its power supply from the battery, which is charged from a direct electricity source and from solar panels that capture energy from sunlight. The rotating paddle wheels direct the robot to where the trash is located. When the sensor detects an object/trash in front of the machine, it rotates the paddle wheels forward; likewise, for an object behind, the machine turns the paddle wheels backwards. When the sensor detects an object/trash on the left, the machine rotates the right paddle wheel forward and the left paddle wheel backward; similarly, when the camera detects an object/trash on the right, the machine turns the right paddle wheel backwards and the left paddle wheel forward.

Fig. 2. The design of autonomous river cleaning robot

The robot is equipped with an arm that directs the trash in front of the robot to the tub door, which opens when it detects an object in front of the tub door and puts it


in the container. The reservoir is equipped with sensors that can instruct the robot to return to the place where the garbage is collected once the container is fully filled with garbage; the robot then returns to work after the container has been emptied. The collected waste will later be reprocessed so that it does not pollute the river.

Fig. 3. The Illustration of Container

B. Algorithm
The program begins with a camera that identifies the surrounding environment, from objects on the surface to obstacles to be faced, such as river walls. The results of the identification of the surroundings are used to determine the direction of the robot's movement: when it finds an object on the surface of the water, the robot approaches the object, and when facing an obstacle, the robot is directed to avoid the obstacle.

Algorithm 1. Algorithm of Robot Mechanism
engine start
camera on
begin
  if camera detect an object on the left then
    turn left
  if camera detect an object on the right then
    turn right
  if camera detect an object in front then
    move forward
  if camera detect an object behind then
    turn around
  end if
  if camera detect an obstacle on the left then
    don't turn left
  if camera detect an obstacle on the right then
    don't turn right
  if camera detect an obstacle in front then
    move backward
  end if
  if an object near robot and the object are trash then
    take the object to the tank
  end if
  if tank is full then
    inform the operator
    empty the tank
  end if
  if battery low
    charge it
  else
    start working again
  end if
end

C. Flow Chart

Fig. 4. Flow of robot system

IV. EXPERIMENTAL RESULTS
The experiment was carried out using image data of objects on the surface of the water, consisting of 73% images of garbage objects in cloudy and clear water and 27% images of other objects. The training data is given to the VGG16 network model for training the neural network model; the test data is used to assess how well the model identifies garbage objects on the water surface. Experiments using a 360 camera on garbage on the surface of river water were also carried out after the model training process using image data of garbage objects. The camera is directed at the garbage object on the surface of the river water, then the model will begin to identify the object by processing the

53 28 October 2021, Jakarta - Indonesia


2021 1st International Conference on Computer Science and Artificial Intelligence (ICCSAI)

garbage object captured by the camera to a size compatible V. CONCLUSIONS


with VGG16, and the model then determines whether the Water surface object recognition using deep learning
object is garbage or not. From the results of experiments on implanted in the river waste cleaning robot with a camera as
several garbage objects on the surface of the river water, the an image of objects around the robot can be used to maximize
success of the model in identifying garbage objects on the the river cleaning process from garbage. The algorithm
surface of the river is 72%. programmed on the robot will command the paddle wheel
motor to direct the robot to the waste objects that have been
identified by the VGG16 deep learning model and avoid the
robot from obstacles in front of the robot. All the power
needed by the robot is obtained from batteries that are
charged using manual charging and solar panels as an
alternative charging method when the robot is operating.
From the experimental results of the VGG16 model in
identifying garbage objects on the water surface with an
accuracy rate of 72%, it is hoped that in the future the level
of accuracy and the speed and identification of objects on the
water surface can be developed.
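Algorithm 1 and the identification pipeline can be sketched together in executable form. This is a minimal sketch only: the sensor flags, the classify stub standing in for the VGG16 model, the priority ordering of the checks, and the action names are all our own assumptions, not the authors' actual implementation.

```python
# Minimal sketch of one iteration of the robot mechanism (Algorithm 1).
# The camera frame is reduced to simple flags; in the real system a VGG16
# classifier would decide whether a detected object is trash (Section IV).

def classify_is_trash(image):
    """Stub for the VGG16 model: a real implementation would resize the
    frame to the network input size (224x224 for VGG16) and run the CNN."""
    return image.get("label") == "trash"  # placeholder decision

def step(state):
    """Choose one action from the current sensor state."""
    cam = state["camera"]
    if cam.get("obstacle_left"):
        return "avoid: don't turn left"
    if cam.get("obstacle_right"):
        return "avoid: don't turn right"
    if cam.get("obstacle_front"):
        return "move backward"
    if state["tank_full"]:
        return "inform operator and empty tank"
    if state["battery_low"]:
        return "charge"
    if cam.get("object_left"):
        return "turn left"    # reverse the right paddle wheel
    if cam.get("object_right"):
        return "turn right"   # reverse the left paddle wheel
    if cam.get("object_front"):
        if classify_is_trash(cam.get("frame", {})):
            return "take object to tank"
        return "move forward"
    return "patrol"

# Example: trash directly ahead, no obstacles, tank and battery fine.
state = {
    "camera": {"object_front": True, "frame": {"label": "trash"}},
    "tank_full": False,
    "battery_low": False,
}
print(step(state))  # -> take object to tank
```

The obstacle checks are placed first here on the assumption that collision avoidance outranks collection; the paper's pseudocode leaves that priority implicit.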
Fig. 5. Detected garbage results
Fig. 6. Garbage not detected results

REFERENCES
[1] L. Parker, "Plastic pollution facts and information," National Geographic, 2019. https://www.nationalgeographic.com/environment/article/plastic-pollution (accessed Mar. 26, 2021).
[2] G. L. Widyaningrum, "Studi: Jumlah Sampah Di Bumi Akan Mencapai 1,3 Miliar Ton Pada 2040," National Geographic Indonesia, 2020. https://nationalgeographic.grid.id/read/132263813/studi-jumlah-sampah-di-bumi-akan-mencapai-13-miliar-ton-pada-2040?page=all (accessed Mar. 26, 2021).
[3] G. L. Widyaningrum, "Studi Terbaru: Masalah Sampah Plastik di Bumi Sudah di Luar Kendali," National Geographic Indonesia, 2020. https://nationalgeographic.grid.id/read/132346281/studi-terbaru-masalah-sampah-plastik-di-bumi-sudah-di-luar-kendali?page=all (accessed Mar. 26, 2021).
[4] H. Ritchie and M. Roser, "Plastic Pollution - Our World in Data," OurWorldInData.org, 2018. https://ourworldindata.org/plastic-pollution (accessed Mar. 26, 2021).
[5] J. R. Jambeck et al., "Plastic waste inputs from land into the ocean," Science, vol. 347, no. 6223, pp. 768–771, Feb. 2015, doi: 10.1126/science.1260352.
[6] A. Andriansyah, "Sampah Masih Jadi 'Predator' Biota Laut," Voice of America Indonesia, 2020. https://www.voaindonesia.com/a/sampah-masih-jadi-predator-biota-laut/5454013.html (accessed Mar. 26, 2021).
[7] H. Othman, M. Iskandar Petra, L. Chandratilak De Silva, and W. Caesarendra, "Automated trash collector design," in Journal of Physics: Conference Series, 2020, vol. 1444, no. 1, p. 12040, doi: 10.1088/1742-6596/1444/1/012040.
[8] "Pembersih Sampah Pantai menggunakan ArduPilot," Elinvo (Electronics, Informatics, Vocat. Educ.), vol. 4, no. 2, pp. 202–209, Dec. 2019, doi: 10.21831/elinvo.v4i2.28793.
[9] The Ocean Cleanup, "Rivers | The Ocean Cleanup," 2019. https://theoceancleanup.com/rivers/ (accessed Mar. 26, 2021).
[10] E. Rich, K. Knight, and S. B. Nair, Artificial Intelligence. Tata McGraw-Hill, 2010.
[11] S. J. Russell and P. Norvig, Artificial Intelligence: A Modern Approach. Pearson, 2010.

[12] N. J. Nilsson, Principles of Artificial Intelligence. Elsevier, 1980.
[13] U. Frese and H. Hirschmüller, "Special issue on robot vision: what is robot vision?," J. Real-Time Image Process., vol. 10, no. 4, pp. 597–598, 2015, doi: 10.1007/s11554-015-0541-3.
[14] A. Pugh, Ed., Robot Vision. Berlin, Heidelberg: Springer Berlin Heidelberg, 1983.
[15] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. The MIT Press, 2016.
[16] J. D. Kelleher, Deep Learning. The MIT Press, 2019.

Effectiveness of LMS in Online Learning by Analyzing Its Usability and Features

I Putu Gede Prama Duta
Computer Science Department, School of Computer Science
Bina Nusantara University
Jakarta, Indonesia 11530
[email protected]

Rio
Computer Science Department, School of Computer Science
Bina Nusantara University
Jakarta, Indonesia 11530
[email protected]

Mochamad Rizky Febriansyah
Computer Science Department, School of Computer Science
Bina Nusantara University
Jakarta, Indonesia 11530
[email protected]

Maria Susan Anggreainy
Computer Science Department, School of Computer Science
Bina Nusantara University
Jakarta, Indonesia 11530
[email protected]

Abstract—With the advancement of technology, education and learning processes have evolved and are accommodated by many forms of digital applications and services, one of which, prevalent in most educational institutions, is the LMS (Learning Management System). In the current digital era, the world is transitioning from offline to online activities, including schools and universities, due to certain circumstances. This transition requires the LMS to fully support online learning, as opposed to being supplementary support for an offline learning process. Therefore, we conduct this research to find out how effective currently available LMS are at supporting online learning. Our approach is to analyze LMS features by conducting workflow testing to test the effectiveness of the LMS for online learning. We align our findings with a user satisfaction survey to reach the conclusion. We will see how effective our current technology is at supporting education, especially online learning. We will know what modules are done right, what needs to be improved, which ones are important and not as important, and what additions can be made as a reference to enhance future LMS development.

Keywords—Online Learning, Learning Management Systems, Features, Students, User Satisfaction.

I. INTRODUCTION
With the advancement of learning technology, a learning management system or LMS has become essential in education [1]. Currently, the world is undergoing a transition to a digital era where there has been a shift from offline to online activities, which encourages universities and companies to change their educational programs and jobs. Currently, many learning institutions carry out online learning with learning technology [2]. With this technology, they can improve communication among students, staff, and teachers. The main technology to support online learning is the LMS, or Learning Management System.

The LMS itself is software specifically designed to create, distribute, and manage educational content delivery, which we can use through web-based or mobile-based applications [3]. An LMS supports its courses with various features, namely discussion boards, forums, online exams, shared schedules, and file sharing. This gives all students and teachers easy access to courses on the go. In turn, administrators and teachers can monitor student progress and make improvements. Thus, this makes the LMS able to support online learning [4].

As of conducting this research, the COVID-19 pandemic is still happening worldwide, where the majority of students, ranging from primary school to university students, as well as office workers, are forced to do their work from home using online facilities. Due to this very precarious condition, students' learning processes shifted to using online meeting apps such as Zoom [5] or online communication platforms such as Microsoft Teams [6]. As a result, students and lecturers need to adapt to a new online learning environment using a proprietary or third-party licensed LMS.

During this pandemic, online learning has undergone many changes in terms of learning and assessment methods [7]. This condition is also felt by teachers and students, or workers in companies, whose internet access is inadequate; it is not easy for them to fully follow the learning and work process. Studying and working online adds to laziness and difficulty concentrating on learning comprehension, and difficulty working [8].

The main question that we have in mind is: how effective is the current LMS technology adopted by institutions to support online learning, especially during an emergent situation like this? We also ask: does the current LMS feature standard suffice to support this effectively? Or are there improvements needed from a features and usability standpoint?

There are several previously published research studies that tackle similar problems and questions regarding the effectiveness of an LMS. For example, in 2008 Kakasevski, Mihajlov, Arsenovski, and Chungurski [9] conducted a usability evaluation of the LMS Moodle, in which the conclusion was generally positive, with a slight lacking problem in the communication module. As another example, in 2010 a case study was conducted at OUM [10] to figure out the effectiveness of the LMS used by the institution. The study resulted in a positive outcome for LMS effectiveness using several factors. The factors measured were ease of learning, error tolerance, LMS speed, and quality, with mean scores of 3.8193, 3.7383, 3.8925, and 4.11 respectively.

A more recent study in 2018 [11] focused on an online learning environment using Moodle and reached a similar conclusion, stating that the LMS positively affects the academic achievement of students and has an overall positive impact on educational outcomes. However, these studies may feel outdated, as the current era and condition that we are

978-1-6654-4002-8/21/$31.00 ©2021 IEEE

facing is different than it was a few years ago. With technology growing exponentially fast, conducting evaluations periodically can be helpful as a new reference point in the future.

Therefore, to answer these questions, this research was conducted to determine the appropriate LMS models and features for distance learning by observing what features the LMS has. The methodology used is analyzing its features and conducting surveys to determine user satisfaction, namely of students and teachers using the current LMS. Are they delighted with the LMS used by their institution? From this level of satisfaction, we can determine whether user satisfaction means that the LMS used by their institution is effective, and what the level of student and teacher satisfaction is. If they are not delighted with the LMS used by their institution, the LMS is less effective [12]. We can find out what affects user satisfaction and then determine what makes an LMS effective. In this study, we also tested the LMS modules and features and analyzed their effectiveness through user input.

II. RESEARCH METHODOLOGY

Figure 1- The analysis stages

A. Collecting Data Sources
The LMS analysis stages that we use begin with the LMS Observation after the Topic Discussion (Figure 1). We identify the sources of the data used in this study for measurement, analysis, and evaluation. Our main method to collect and gather data for this research is divided into two: workflow testing (observation) for LMS analysis purposes, and a questionnaire survey to gather data from LMS users. There are several quantitative and qualitative data sources that we collect through surveys, observation, and examination of documentation.

The quantitative data includes the ranking of important and effective LMS modules based on a survey of average user scores, and ratings for modules that users use the least when interacting with the LMS. Apart from that, it also includes the optional features respondents would most like to incorporate into a well-known or familiar LMS, and their preferences. This additional data will be analyzed if there are sufficient samples after the collection survey period ends (as this is an optional survey). The qualitative research that we will do includes reviewing modules from a selected LMS for technical surveys, namely Binus Maya, the LMS used by BINUS University to support learning; then a user satisfaction survey and testing the LMS workflow while performing specific tasks. If we are lucky enough, we will gather additional information by interviewing the people responsible for developing the LMS and gathering data about their approach to building the LMS and why.

B. LMS Module Review and Classification
Every app has a different approach to how it is built, and different modules are considered and implemented to satisfy the app's requirements. An LMS itself is identified as application software that facilitates learning, whether as a companion to face-to-face learning or to support online learning. Therefore, an LMS has different forms of modules that are put together to make a functional and practical system that its users can use.

In this part of the research, we conduct a technical survey to identify and classify the different modules implemented in the selected LMS by trying all the features that the LMS has to offer and testing the workflow when doing several tasks, such as (1) accessing learning materials, (2) ease of communication, (3) the assignment submission process, (4) the schedule checking sequence, and more. This test is for understanding the architectural complexity of the LMS and checking its intuitiveness. After experiencing all the features, we gather all the information, form our subjective opinion towards it, both the positives and the negatives, and create a user survey based on the module review result.

C. User Survey and Analysis
This user survey consists of users' opinions about their experience while using the LMS: first, how often they use the LMS; second, what elements of the LMS are important or frequently used, and which are not so important or rarely used; third, their impression of how each feature was implemented, and whether it has been implemented properly; fourth, respondents' ratings of features with respect to their impact on learning, and how those features can be improved; fifth, what features make the LMS effective; sixth, respondents' opinions regarding the user experience, user interface, loading speed, and accessibility of the LMS; and seventh, what input respondents have on what needs to be improved in the LMS.

All these data that we gather will be used for our analysis. For example, knowing the importance of certain modules compared to the feedback on their actual implementation may reflect the quality of main features that may or may not have been focused on during development. This result will also help maintain the development flow for those who build the LMS, to set priorities on what to fix, what to implement next, etc. Similarly, the result on module usage frequency helps developers understand which features are favored by their users and evaluate whether they are good enough, or whether to strive even more to improve them and perhaps create an opportunity out of it by making them a unique spin exclusive to their LMS (an example being Steam embracing user-generated content by enhancing Steam Workshop functionality, Steam Dev Days 2014) [13].
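The aggregation described above, frequency counts for module usage plus average ratings per module, can be sketched as follows. The response format and feature names here are illustrative assumptions, not the actual survey schema.

```python
from collections import Counter

# Hypothetical raw responses: each feature gets 1-5 ratings from
# respondents, and each respondent also names their most-used feature.
ratings = {"Forum": [4, 5, 4], "Resources": [4, 4, 3], "Message": [2, 3, 2]}
most_used = ["Forum", "Forum", "Resources", "Forum"]

def mean_rating(scores):
    """Average rating for one feature."""
    return sum(scores) / len(scores)

# Rank features by average rating (the same kind of mean scores the
# OUM study [10] reports), and count usage frequency per feature.
ranking = sorted(ratings, key=lambda f: mean_rating(ratings[f]), reverse=True)
usage = Counter(most_used)

print(ranking)               # highest-rated feature first
print(usage.most_common(1))  # most frequently used feature
```

Comparing the two outputs, the rating ranking against the usage counts, is exactly the cross-check the survey analysis below performs by hand.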

D. LMS Design Analysis
As we compare the user results, we also add our own analysis of the actual LMS itself. In this analysis, we compare the LMS workflows, emulating the process of doing certain tasks integral to the learning process. As mentioned above in point B regarding the workflow test, we analyze the navigation process needed to achieve our tasks. One example is that we count how many clicks are needed to reach a specific window, and how many different ways the LMS offers to access that window.

We also analyze how intuitive the usability and accessibility of the features in the LMS are. Accessibility correlates to how many different ways users can receive information, and to inclusivity for all kinds of users with different backgrounds. Usability correlates to effectiveness: whether or not users can easily understand the module/feature, and whether it helps with the problem it addresses and gives the desired result (usability definition based on ISO, 1998) [14]. This section observes counterproductivity, contradicting methods, user interface problems, inconsistencies, and more within these modules/features.

After our observation, we compare it to the users' answers, to see whether our result corresponds to what the majority said. In conclusion, we summarize what modules are important and frequently used in an LMS, how it currently stands, and the desired result to accommodate online learning or the learning process in general.

III. RESULT AND DISCUSSION
1. LMS Observation
The first part of our research consists of observing the LMS: using the LMS to identify features and classify them into modules that support the learning experience, and doing some workflow testing on certain tasks related to learning processes (see Figure 1). The LMS that we are observing is the Binus Maya LMS used by BINUS University students (which we already got permission to examine). The following are our results with Binus Maya:

1.1. Binus Maya LMS
We began examining the LMS by listing all of its visible and functional features, and after we were done, we classified those features into the respective modules we consider they belong to. After thorough examination, we gathered information about the modules and managed to summarize them into 8 big modules: Communication, Content, Learning, Exam, Enrichment, Evaluation, Utilities, and Extra. Of those 8 modules, 6 we categorized as main modules, while 2 of them (Exam and Enrichment) we consider submodules or extensions of the Learning module, the reason being that those 2 modules are related to Learning but also independent on their own (all their features may not be relevant to all applicable users).

1.2. Binus Maya Modules and Features Breakdown
Below is a table that helps to visualize the modules and their features from the Binus Maya LMS (Table 1):

Table 1- Modules and features of Binus Maya.

Communication (main task: to help users communicate with each other):
• News Stream
• Mail System (redirect to Outlook)
• Message
• Forum system (Post and Reply)
• Notification
• Academic Advisory

Content (features that provide information or convenience related to educational matters):
• Dashboard, containing: Current Courses, News Stream, Agenda (upcoming schedules only), Student Activity Points (SAT Community Service), GPA Score, To-do List, Course credit info
• Profile System (change photo, personal information, etc.)
• Cloud storage (OneDrive account)
• Shared Material
• People
• Academic Calendar
• Curriculum / Course Distribution
• Class Schedule
• KRS Manager
• Protest Score

Learning (helps with the main learning process):
• Courses
• Resources (filled with main material, links, and references related to the material topic)
• Video Learning
• Video Conference
• Assessment Rubric
• Assignment (Individual / Team)
• Entrepreneurial & Employability Skills
• Global Learning Systems

Exam (Learning module extension focused on exams):
• Practicum
• Exam Schedule
• Thesis

Enrichment (Learning module extension focused on enrichment programs):
• Enrichment Programs (unauthorized access)
• Internship Schedules

Evaluation (features that evaluate certain performances):
• Attendance Information
• View Score
• View Grade

• Student Activity Transcript / Development
• Community Service
• Course Credits
• Organization Experience
• Working Experience
• Achievement
• BUEPT Result

Utilities (additional features that accommodate non-educational matters):
• Feedback Center
• Letter Request
• Financial Summary & Receipt
• Graduation purposes
• Search function (Material and Keyword)

Extra (leads to other websites/services not related to the LMS purpose):
• BINUS TV (in-house application for videos related to the university)
• BINUS Square (in-house application for boarders' related services)

Doesn't do anything:
• Student Do Quiz
• Calendar

2. Survey Result and Analysis
After confirming our observations, we report the results of the descriptive analysis of the data to determine user satisfaction. The survey process was carried out with the criterion that respondents are BINUS students who know about the Binus Maya LMS. From the survey process, we collected a total of 23 complete and valid answers from our respondents. From a demographic perspective, 100% of respondents surveyed are between 18 and 25 years old.

First, we asked students how often they used Binus Maya. It seems that the majority (56,52%) of them use Binus Maya very often, and 34,78% of them use Binus Maya often (Figure 2).

Figure 2- Time to use LMS.

Besides that, the next research question is to find out which features are most often used by respondents. The majority (91,3%) of respondents say forums are the most used feature, followed by 82,5% of respondents who think resources are the most used feature (Figure 3). From here, we can see that these are important features for respondents.

Figure 3-Most used features.

Next, we want to know what respondents think about the features available in Binus Maya, and whether they have been implemented properly. The largest number of responses answered yes (86,4%), and in contrast, only 13,6% of total respondents answered that Binus Maya's features had not been properly implemented (Table 2). We can conclude that, in the users' opinion, the features available in Binus Maya have been implemented properly.

Table 2- Respondents' opinions regarding feature implementation
Have the features in Binusmaya been implemented properly?
       Percent   Frequency
Yes    86,4%     19
No     13,6%     3

The next question relates to the assessment of Binus Maya features regarding their impact on learning. This question is to find out whether the features of Binus Maya have a positive impact on student learning. The majority (47,8%) answered good, 34,8% answered fair, and 13% answered excellent; only 4,3% answered very poor (Table 3). So it can be said that the impact of the Binus Maya features on student learning is quite positive.

Table 3-Binus Maya features impact on learning.
Binusmaya features in relation to their impact on learning
Very Poor   Poor   Fair    Good    Excellent
4,3%        0%     34,8%   47,8%   13%

From the previous research questions, we next find out what features, according to respondents, make Binus Maya effective. The majority (56,5%) of respondents answered forums, and 52,2% answered resources, assignments, and the Learning tab, so it can be said that these features are influential (Figure 4). Then, when we compare this with the survey results for the most used features (Figure 3), we can see which features are most often used and most important according to respondents in making Binus Maya effective.

Figure 4-Binus Maya features impact on learning.
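Percentage breakdowns like those in Tables 2 and 3, and a single summary score per aspect, can be reproduced from raw counts as follows. The helper names are ours, but the input numbers mirror Table 2 (19 "yes", 3 "no") and the Table 3 distribution.

```python
def percentages(counts):
    """Convert raw answer counts to rounded percentage shares."""
    total = sum(counts.values())
    return {k: round(100 * v / total, 1) for k, v in counts.items()}

# Table 2: 19 respondents answered yes, 3 answered no.
impl = percentages({"Yes": 19, "No": 3})
print(impl)  # {'Yes': 86.4, 'No': 13.6}

def weighted_mean(dist):
    """Collapse a Likert distribution (share per 1-5 level) to one score."""
    return sum(level * share for level, share in dist.items()) / sum(dist.values())

# Table 3 distribution: Very Poor..Excellent mapped to levels 1..5.
impact = weighted_mean({1: 4.3, 2: 0.0, 3: 34.8, 4: 47.8, 5: 13.0})
print(round(impact, 2))
```

A weighted mean like this is the same style of summary statistic as the 3.8-4.1 mean scores reported for the OUM study [10], which makes results comparable across surveys.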

The next thing is the respondents' opinion regarding the value of the user experience, user interface, loading speed, and accessibility of Binus Maya. The user experience assessment has a positive value, as does the accessibility value; the user interface has a sufficient value, but loading speed is a drawback: 30% of total respondents answered poor, 52,5% answered fair, and 13% answered very poor (Table 4). From there, we can see that the weakness of Binus Maya is its loading speed.

Table 4-Binus Maya score value.
Score                          Very Poor  Poor   Fair    Good    Excellent
User experience                4,3%       0%     34,8%   52,2%   8,7%
User interface                 4,3%       8,7%   39,1%   34,8%   13%
Binusmaya loading speed value  13%        30%    52,5%   0%      4,3%
Binusmaya accessibility        4,3%       4,3%   30,4%   43,5%   17,4%

3. Observation and Workflow Testing Analysis
After identification, we tested the workflow of the LMS by using what it has to offer to complete certain tasks related to the learning process. The measurements we based this on are the Usability Inspection Methods (Nielsen, 1994) [15] and relevancy to the Three-Click rule [16][17] (Grundy, 2009). Below is a table that summarizes our analysis and how the LMS performs when doing certain tasks (Table 5):

Table 5-Workflow Testing and Analysis.

Task: Search learning material (unique ways: 2; most/least clicks: 8/4)
Analysis: The second half of the navigation sequence is the same, but as a whole the two ways serve different purposes (previous and current courses). Busy server times may make one method inaccessible. The Material tab immediately shows up upon clicking. The extra step is not a problem as it is intuitive.
Notes: Class selection for course material should appear more clickable, as it currently blends in with the background element.

Task: Submit assignment (unique ways: 3; most/least clicks: 8/4)
Analysis: A good amount of choices to do the task, with intuitive navigation. A busy server usually leaves students unable to upload their assignment due to slow loading time (this correlates to the Binus Maya architecture using an AJAX single-page application).
Notes: Similar note to accessing material, with the addition that the Assignment tab should be put more up front, as opposed to doing 4 extra clicks through the course navigation bar.

Task: Access discussion forum (unique ways: 4; most/least clicks: 6/2)
Analysis: The variety of ways of doing this task is balanced enough. Most ways are intuitive, convenient, and clearly explained by the UI, with the exception of 1 questionable design choice (an unnecessary and counterintuitive way) and 1 counterproductive design (fewer clicks but not full function).
Notes: Accessing the forum from courses should query the correct course and class instead of the default forum selection page. Clicking it from a forum notification (without "view all notifications") should set the user up not only as a reader, but also to reply.

Task: Communicate with lecturer (unique ways: 2; most/least clicks: 3/2)
Analysis: Both methods are easy to use, self-explanatory, and intuitive, as they are both placed on the main dropdown menu.
Notes: The Message feature seems obsolete for primary users (students and lecturers), as the Outlook service is used more often.

Task: Check score (unique ways: 4, of which 3 relevant; most/least clicks: 6/1)
Analysis: Multiple versions of showing scores, but ultimately only 1 way shows detailed information; 2 ways are summary views, and 1 other way shows a score relevant only to freshmen (English score). All of them are intuitive, easy to access, and placed within the same directory.
Notes: The English score should be merged into another tab (such as a First Year Program tab) instead of its own separate tab, as it is irrelevant to later-semester students.

Task: Check schedule (unique ways: 4; most/least clicks: 6/1)
Analysis: Intuitive and easy to use; ranges from 3-6 clicks and contains important information (1 click for today's agenda). The ways serve different purposes, but are separated and organized nicely depending on the desired information.
Notes: -

Task: Check and download assignment (unique ways: 3; most/least clicks: 7/3)
Analysis: Pretty much similar to the assignment submission process. Consistency of downloading and submitting the answer in the same place feels natural, as they are both relevant.
Notes: Additional convenience in being able to check an assignment directly from a notification. It helps that it directly puts the user in the related course.

IV. CONCLUSION
After comparing our assessment and the survey results, the conclusion we reached is that the features and modules the majority of users use in an LMS (Binus Maya) are related to the fundamentals of the learning process, such as forum discussions, resources, and assignments (with the evaluation module a close second). The result shows that users are mostly satisfied with the LMS when these important features are implemented well, but also dissatisfied with slow loading times and inconsistent stability. This is verified by our assessment in Table 5, which shows our agreement with user opinions. This means that the current Binus Maya features and usability suffice to be effective for users, but there are still a few nitpicks that can or need to be improved.

This result also shows that users mostly use an LMS as an alternate way to do the learning process (learning, evaluation, communication) digitally, no more and no less. This means that developers who want to develop, or are currently maintaining, an LMS should build theirs with the main focus on these aspects (refer to the first paragraph), as they are what users expect when they interact with an LMS. Instead of spreading resources equally, including over features that may not be utilized much by users (which may leave the main features with a merely "decent" result), developers should prioritize the main features first and make sure they are implemented well; then they can continue with other features while still maintaining server stability.

Unfortunately, from our results it seems that the unique features available in Binus Maya are neither important to nor frequently used by users, which keeps it from standing out from other LMSs. This also means that if developers were to build unique features and improvements exclusive to their LMS, they should build them around the features that users consider important (as can be seen from the results above).


Health Chatbot Using Natural Language Processing for Disease Prediction and Treatment

Philip Indra Prayitno
Computer Science Department, School of Computer Science
Bina Nusantara University
North Jakarta, Indonesia
[email protected]

Reinhart Perbowo Pujo Leksono
Computer Science Department, School of Computer Science
Bina Nusantara University
Bogor, Indonesia
[email protected]

Fernando Chai
Computer Science Department, School of Computer Science
Bina Nusantara University
Pontianak, Indonesia
[email protected]

Richard Aldy
Computer Science Department, School of Computer Science
Bina Nusantara University
West Jakarta, Indonesia
[email protected]

Widodo Budiharto
Computer Science Department, School of Computer Science
Bina Nusantara University
West Jakarta, Indonesia
[email protected]

Abstract—People who do not know about the products or services provided by a company need a system that can provide answers to the questions that are usually asked, called a Frequently Asked Questions (FAQ) system. Such a system is neither effective nor efficient in providing the information. In this paper, we propose a chatbot using Natural Language Processing to provide information about health.

Here, we use NLP to make a chatbot system that can understand and answer the questions asked by the user. Cosine similarity is used to find the similarities between the query words (the questions asked by the user) and the documents, and to return the answer of the document with the highest similarity. Based on our development, our medical chatbot successfully diagnoses the user's illness with approximately 87 percent accuracy.

Keywords—Artificial Intelligence, Chatbot, Cosine Similarity, ID3 Decision Tree, Natural Language Processing.

I. INTRODUCTION

A chatbot is a computer program that conducts a conversation via auditory or textual methods. These programs are designed to mimic how a human chats, so that the chatbot acts as a conversational partner in place of a human [1]. They provide a platform for effective and smart communication with the user on the other end.

The chatbot was developed in the 1960s. Initially, chatbots were experimental computer programs meant to trick people into believing they were chatting with humans when in fact they were talking to a machine. As time went by, chatbots continued to progress. Nowadays, chatbots have begun to spread into many different fields of study, including health.

Natural Language Processing (NLP) is a widely discussed and researched topic nowadays. As one of the oldest areas of research in machine learning, it is used in major fields such as machine translation, speech recognition, and text processing. In our chatbot, we implement the concept of cosine similarity from natural language processing so that it can return the best solution for the patient.

Naturally, when someone is in an unhealthy condition they go to a hospital or health clinic to see a doctor. The common problem we usually face is "what if we don't have time to go to the hospital?", followed by other problems. There must be a solution to those problems.

In these modern days there are online consultations, or online health checkups with a doctor, which are considered a solution to those problems. But once again there is still a problem with that solution: we have to make a reservation to make sure that the doctor is available, and we may not be able to wait that long.

Artificial intelligence provides the ability to imitate human thought and behaviour in a machine [2]. If we take the knowledge of doctors and convert it to a database, we can use that to create a chatbot that works much like a doctor in general, which will also solve the existing problems.

II. LITERATURE REVIEW

A. Chatbot

Darius and Sophie [3] have noted that the word "Chatbot" is formed from the terms "chat" and "robot". At first, the term chatbot was used for a computer program that simulates human language with the help of text-based dialog systems. A chatbot handles input and output texts that allow users to communicate with the chatbot software, such that it gives the feeling of talking to a real person.

As Gajendra and colleagues have said [4], a chatbot has to be trained first so that it can remember different kinds of data structures, such as arrays, queues, stacks, trees, etc., for it to be able to respond to the user according to its knowledge.

This statement indirectly supports Divya and colleagues' claim [5] that at the basic level there are 2 types of chatbot: unintelligent (using predefined conversations) and intelligent (using machine learning). A chatbot can be made using different algorithms; the most frequently used in general are the Support Vector Machine (SVM) classifier, Naïve Bayes, and K-Nearest Neighbor.

B. Artificial Intelligence

Haristiani stated [6] that Artificial Intelligence (AI) is the foundation of every kind of chatbot and is combined with NLP so that a chatbot responds according to data obtained from the user. Nadarzynski and colleagues also stated [7] that AI is software comprised of complex mathematical algorithms that process information to generate a predetermined output, which leads to a relevant result.
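As a concrete illustration of the NLP preprocessing steps the paper relies on (tokenization, stopword removal, and lemmatization, discussed in Section II-C), here is a toy Python sketch. The stopword list and the suffix-stripping "lemmatizer" are simplified assumptions for illustration only, not what the authors actually used:

```python
import re

# Illustrative stand-ins (assumptions, not the authors' actual lists/rules):
STOPWORDS = {"is", "a", "the", "i", "my", "and", "to", "of", "like"}

def tokenize(sentence):
    # Lowercase, strip punctuation, and split into word tokens.
    return re.findall(r"[a-z']+", sentence.lower())

def remove_stopwords(tokens):
    # Drop frequent function words so only keywords remain.
    return [t for t in tokens if t not in STOPWORDS]

def lemmatize(token):
    # Toy suffix stripping standing in for real lemmatization.
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def preprocess(sentence):
    return [lemmatize(t) for t in remove_stopwords(tokenize(sentence))]

print(preprocess("My head is pounding and I keep coughing!"))  # → ['head', 'pound', 'keep', 'cough']
```

The output of such a pipeline is the keyword list that later stages (e.g., cosine-similarity matching) operate on.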

978-1-6654-4002-8/21/$31.00 ©2021 IEEE


Martin et al. [8] have stated that chat service agents have mostly been replaced by conversational software agents, or chatbots, which are designed to communicate with users in natural language based on AI. Even though this saves budget and time, chatbot AI implementations often fail to meet customers' expectations, potentially resulting in users not following the chatbot's requests.

C. Natural Language Processing

As we know, computer systems cannot directly understand human language. That is why Mathew and colleagues stated [9] that Natural Language Processing (NLP) can help computer systems understand the language that humans use, classify it, and analyze it if the dialog needs a response.

Among the important functions of NLP are tokenization, stopword removal, and lemmatization. Lekha and colleagues describe tokenization [10] as the process of breaking down a sentence into words and removing punctuation, after which stopword removal is performed, removing unimportant words so that we can gather keywords for the next processes. Chaitrali et al. have mentioned [11] that stopword removal removes frequently used words such as 'is', 'like', etc. Lemmatization is then used to obtain the base word of each token.

D. Cosine Similarity

As stated before, the keywords obtained from stopword removal are measured for similarity against pre-existing documents using the cosine similarity formula below:

\[ \mathrm{Sim}(Q, D_i) = \frac{\sum_{j} W_{Q,j}\, W_{i,j}}{\lvert Q \rvert \, \lvert D_i \rvert} \]

Where:
|Di| = vector length of document i
Wi,j = weight of word j in document i
|Q| = vector length of the query

The vector length of each document will then be used to get the dot product value. Then, by using the dot product value, the vector length of document i, and the vector length of the query, we can get the best result for the query by looking for the highest value of this calculation.

Bo-Hao and colleagues have combined the use of cosine similarity with a Recurrent Neural Network (RNN) algorithm and TF-IDF [12] to predict the user's illness and which medical product will suit them. In their experiment, they managed to acquire 88% accuracy.

Aside from that, cosine similarity for chatbot development was also used by Mayuresh and colleagues [13] in an attempt to develop more natural sentence usage by a chatbot using a word embedding technique. In short, cosine similarity is an algorithm to measure the likeness of 2 objects.

E. ID3 Decision Tree

The Iterative Dichotomiser 3 (ID3) decision tree is an algorithm to generate a decision tree. It is a simple technique for making choices, used in many things, from animation to strategic AI.

The ID3 algorithm was proposed by Quinlan Ross in 1986. The tree is built top-down while selecting attributes at each node. In the algorithm, an Information Gain value is used to determine which attribute to use. According to Shamrat et al. [14], the algorithm stops when Information Gain reaches 0 or every case has been used. Based on the Shamrat et al. experiment [14], the ID3 decision tree has an accuracy of around 52% for prediction results.

An ID3 decision tree is used in a chatbot to enhance its performance. Mujeeb and colleagues also stated [15] that, given certain information, a decision tree takes the action that it considers the most appropriate from a collection of actions. The problem in making a decision tree is deciding which attribute to choose; the handling of this problem is called attribute selection. There are many ways to do attribute selection. According to Panhalkar and Doye [16], some popular methods are ID3, CART, and C4.5.

III. PROPOSED METHOD

A. Workflow

The bot inherits common doctors' consultation behavior. First of all, the chatbot asks whether the user already knows their disease. If they do, the user can simply input the disease they suffer from. Then, the bot looks for the solution for said disease in the database.

If the user does not know what kind of disease they have, the user can describe all of their symptoms one by one. The bot then performs stopword removal and compares the user's input with the available symptoms in the database using cosine similarity. The symptom with the highest value is then used in the ID3 decision tree algorithm, to find the disease as quickly as possible. If the bot manages to find the disease, it gives the disease treatment to the user, and of course it outputs a disclaimer.

B. Algorithm

Every chatbot has at least 3 main steps inside its algorithm. First, the user inputs some text to the chatbot. Second, the chatbot starts to analyze and process the user input. Lastly, it produces an output that contains the response according to the user input. In our chatbot, we have made some modifications to this core algorithm, as shown in the algorithm below.
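The consultation flow described here (ask whether the disease is known; otherwise match the described symptoms against the database via cosine similarity) can be sketched in Python. This is an illustrative reconstruction under assumptions, not the authors' implementation: the stopword list, the two-disease database, and the treatment strings are placeholders, and the paper's ID3 step is omitted.

```python
import math
from collections import Counter

# Illustrative stand-ins for the paper's symptom database and stopword
# list (assumptions for this sketch, not the authors' data or code).
STOPWORDS = {"i", "my", "a", "the", "have", "think", "and"}
DISEASES = {
    "cold": {"symptoms": ["sneezing", "runny", "nose"],
             "treatment": "Rest, drink fluids, and take vitamin C."},
    "sore throat": {"symptoms": ["throat", "hurts", "pain"],
                    "treatment": "Drink lots of water and gargle salt water."},
}
DISCLAIMER = "(This is not professional medical advice.)"

def keywords(text):
    # Tokenize and remove stopwords, keeping only keyword tokens.
    return [w for w in text.lower().split() if w not in STOPWORDS]

def cosine(query_tokens, doc_tokens):
    # Sim(Q, D) = sum_j W_Qj * W_Dj / (|Q| * |D|), term-frequency weights.
    q, d = Counter(query_tokens), Counter(doc_tokens)
    dot = sum(q[w] * d[w] for w in q)
    q_len = math.sqrt(sum(v * v for v in q.values()))
    d_len = math.sqrt(sum(v * v for v in d.values()))
    return dot / (q_len * d_len) if q_len and d_len else 0.0

def consult(knows_disease, text):
    # Branch 1: the user names a known disease directly.
    if knows_disease:
        for name, info in DISEASES.items():
            if name in text.lower():
                return f"The suggested treatment for {name}: {info['treatment']} {DISCLAIMER}"
        return "Sorry, I don't know that disease."
    # Branch 2: match the described symptoms with cosine similarity.
    query = keywords(text)
    best = max(DISEASES, key=lambda d: cosine(query, DISEASES[d]["symptoms"]))
    if cosine(query, DISEASES[best]["symptoms"]) == 0.0:
        return "Sorry, I don't know that disease."
    return f"You may have {best}. {DISEASES[best]['treatment']} {DISCLAIMER}"

print(consult(True, "i think i have a cold"))
print(consult(False, "my throat hurts"))
```

The `cosine` function follows the Sim(Q, Di) formula from Section II-D with raw term-frequency weights; the paper additionally feeds the best-matching symptom into an ID3 decision tree, which this sketch leaves out.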


From the pseudocode above, we can see that the program keeps looping until the user closes it. First, the program asks whether the user already knows their disease or not.

If they do, the user can simply input their disease; the program performs stopword removal and tries to find the query in the database. It then shows the disease and its treatment, including a disclaimer message.

If they don't, the user can input their symptoms one by one, and the program looks for the disease that matches the user's symptoms. Finally, it shows the disease, the treatment, and the disclaimer message.

C. Flowchart

Fig. 1. Application Flowchart

As shown in Figure 1, we can see the visual representation of the algorithm in point B, constructed as a flowchart.

D. Application Prototype

Fig. 2. Application Prototype

The user can interact with the chatbot by inputting their disease or symptoms in the textbox. As shown in Figure 2, the chatbot successfully analyzes the user input and provides the treatment for the user's disease.

IV. EXPERIMENTAL RESULTS

This experiment uses data from user inputs. The experiment is divided into the following steps: first, the bot asks the user whether they already know the disease they suffer from. Second, if the user does not know the disease, the bot asks the user about the symptoms they have; if the user does already know the disease, they can simply type it. Lastly, the bot shows what the user suffers from and the treatment for the user's disease.

A. Results

First, we collect a training data set that contains all of the disease information, including the symptoms and also the treatments, from the alodokter official website (a medical website that has been assured and approved by the Indonesian Health Minister). For testing purposes we managed to collect 113 disease entries with their treatments and 640 symptom entries for those diseases.

After the training data is collected, we proceed to develop a chatbot as an iOS mobile application. After the application development process is finished, we


conduct several tests with the following user inputs, and we managed to receive the following outputs, as shown in Table 1 and Table 2.

TABLE 1. THE DATA RESULT (USER ALREADY KNOWS THEIR DISEASE)

1. User input: "i think i suffer from cold"
   Output: The suggested treatment for cold is You should consume supplements that contain zinc and vitamin C. Consume common cough meds at your local store (make sure to read the dosage). You can also consume menthol candy to relieve the cold.

2. User input: "what should i do if i have covid"
   Output: The suggested treatment for covid is Isolate yourself in about 2 weeks, drink water, get enough rest, and consume pain reliever to relieve the cough, fever and pain. If you didn't get any better make sure to do further check up at the hospital.

3. User input: "i think i'm crazy"
   Output: Sorry, I have tried my best and didn't come up with a solution to that disease (crazy).

4. User input: "help, my uncle suffer from epilepsy"
   Output: The suggested treatment for epilepsy is Consume anti epilepsy medicine such as carbamazepine, clobazam, levetiracetam, etc., to stabilize your electrical neuron activity in your brain.

5. User input: "what should i do if my friend get asthma"
   Output: The suggested treatment for asthma is that you should always bring a relief inhaler. If the asthma is too chronic, get medical treatment at the hospital as soon as possible.

6. User input: "i got a very big boil"
   Output: The suggested treatment for bigboil is Compress the boil with warm water 3 times a day. And when the boil is broken, make sure to clean them with sterile gauze and antibacterial soap, then cover the boil with sterile gauze.

7. User input: "my hands got burned"
   Output: The suggested treatment for burned hands is To treat burns you have to extinguish the fire or remove the burning object from your body, followed by removing the clothes or cloth that is attached to your burned skin. Next, you can cool the burn with running water and take paracetamol to relieve the pain. If the burn is severe, you are advised to receive further treatment at the hospital.

8. User input: "my father keep coughing all the time"
   Output: The suggested treatment for coughing is You should rest and: try to take a sip of honey; consume common cough meds at your local store (make sure to read the dosage); if the cough still exists, it may be because of bacteria (you should visit a clinic or hospital to get antibiotics).

TABLE 2. THE DATA RESULT (USER DID NOT KNOW THEIR DISEASE)

1. User input: "i have a red spot all over my skin, i keep sneezing all the time"
   Predicted disease (output): According to my analysis, you have been potentially suffered from Allergy.
   Actual disease: Allergy

2. User input: "sometimes i get very scared over something"
   Predicted disease (output): According to my analysis, you have been potentially suffered from Phobia.
   Actual disease: Phobia

3. User input: "so these couple days, i have a very bad headache, i had it since i was bitten by my friend dog"
   Predicted disease (output): According to my analysis, you have been potentially suffered from Rabies.
   Actual disease: Rabies

4. User input: "im not sick but i see someone acting irrationally, i see that he is out of control, maybe he is crazy"
   Predicted disease (output): Sorry, I have tried my best and didn't know the disease you have. I recommend you to go to the hospital.
   Actual disease: Crazy

5. User input: "my nose hurt, i feel cold for no reason"
   Predicted disease (output): According to my analysis, you have been potentially suffered from Flu.
   Actual disease: Flu

6. User input: "my throat hurts, im pretty sure i have a sore throat, sore throat"
   Predicted disease (output): The suggested treatment for sorethroat is You should drink lots of water, consume soft food, don't talk too much, gargle with salt water, and if the sore throat is caused by bacteria, you should go to the doctor to get antibiotic medical treatment.
   Actual disease: Sore Throat

7. User input: "help, my friend is dropped down to the floor and he seizure, epilepsy"
   Predicted disease (output): The suggested treatment for epilepsy is Consume anti epilepsy medicine such as carbamazepine, clobazam, levetiracetam, etc., to stabilize your electrical neuron activity in your brain.
   Actual disease: Epilepsy

8. User input: "my mouth is dry, and also i always feel thirsty"
   Predicted disease (output): According to my analysis, you have been potentially suffered from Dehydration.
   Actual disease: Dehydration

Fig. 3. Visual Representation Of Results

As shown in Figure 3, we can see that our chatbot has a high success rate for analysing diseases, both in the case where the user already knows their disease and in the case where the user does not know what their disease is.

The success percentage for the case where the user already knows their disease is 87.5%, and for the case where the user does not know their disease it is also 87.5%, which makes the total success rate 87.5%.

V. CONCLUSIONS

Medical chatbots can be very useful and important for users who cannot afford, or do not have the time, to go to the hospital for a consultation with a doctor about their illness. Our chatbot uses the concept of cosine similarity and an ID3 decision tree to provide the diagnosis result for the disease suffered by the user.

Program accuracy is very important, but it is near impossible to reach 100% accuracy. Based on the results using the test data sets, the accuracy of our program when the user already knows their disease is 87.5%, and for the case where the user does not know their disease, the accuracy is also 87.5%. We find that the accuracy of the program highly depends on 2 variables: the result of the stopword removal process and the amount of trained data in the database.

For future work, we target to improve the chatbot accuracy and later add more features, such as buying meds within the application, recommending a clinic or hospital near the user, and cooperating with doctors around the world so that, when the chatbot cannot find a proper treatment, it can automatically suggest a recommended doctor for further consultation.

REFERENCES

[1] D. Shnavi, et al., "A Self-Diagnosis Medical Chatbot Using Artificial Intelligence," Journal of Web Development and Web Designing, vol. 3, pp. 1, MAT Journals, 2018.
[2] S. Ghare, et al., "Self-Diagnosis Medical Chat-Bot Using Artificial Intelligence," pp. 1, February 2020.
[3] D. Zumstein and S. Hundertmark, "Chatbots – An Interactive Technology for Personalized Communication, Transactions and Services," IADIS International Journal on WWW/Internet, vol. 15, pp. 98, November 2017.
[4] G. Prasad K. C. et al., "A Personalized Medical Assistant Chatbot: MediBot," International Journal of Science Technology & Engineering, vol. 5, pp. 43, January 2019.
[5] D. Madhu, et al., "A novel approach for medical assistance using trained chatbot," in International Conference on Inventive Communication and Computational Technologies, pp. 1, March 2017.
[6] N. Haristiani, "Artificial Intelligence (AI) Chatbot as Language Learning Medium: An inquiry," in International Conference on Education, Science and Technology, pp. 1-5, March 2019.
[7] T. Nadarzynski, et al., "Acceptability of artificial intelligence (AI)-led chatbot services in healthcare: A mixed-methods study," Digital Health, vol. 5, pp. 1, January 2019.
[8] M. Adam, M. Wessel, and A. Benlian, "AI-based chatbots in customer service and their effects on user compliance," in The International Journal on Networked Business, pp. 1, February 2020.
[9] R. B. Mathew, et al., "Chatbot for Disease Prediction and Treatment Recommendation using Machine Learning," in International Conference on Trends in Electronics and Informatics, pp. 853, October 2019.
[10] L. Athota, et al., "Chatbot for Healthcare System Using Artificial Intelligence," in International Conference on Reliability, Infocom Technologies and Optimization, pp. 620, June 2020.
[11] C. S. Kulkrani, et al., "BANK CHAT BOT – An Intelligent Assistant System Using NLP and Machine Learning," in International Research Journal of Engineering and Technology, vol. 4, pp. 2375, May 2017.
[12] B. Su, et al., "Health Care Spoken Dialogue System for Diagnostic Reasoning and Medical Product Recommendation," in International Conference on Orange Technologies, pp. 4, October 2018.
[13] M. Virkar, V. Honmane, and S. U. Rao, "Humanizing the Chatbot with Semantics based Natural Language Generation," in International Conference on Intelligent Computing and Control Systems, pp. 893, May 2019.
[14] F. M. J. M. Shamrat, et al., "Performance evaluation among ID3, C4.5, and CART Decision Tree Algorithms," in International Conference on Pervasive Computing and Social Networking, pp. 4, March 2021.


[15] S. Mujeeb, M. H. Javed, and T. Arshad, "Aquabot: A Diagnostic Chatbot for Achluophobia and Autism," in International Journal of Advanced Computer Science and Applications, pp. 209-216, January 2017.
[16] A. R. Panhalkar and D. D. Doye, "Optimization of decision trees using modified African buffalo algorithm," in Journal of King Saud University – Computer and Information Sciences, pp. 1, February 2021.


Sentiment Analysis of E-commerce Review using Lexicon Sentiment Method

Michael Hakkinen
Computer Science Department, School of Computer Science
Bina Nusantara University
Jakarta, Indonesia 11480
[email protected]

Ferry Agustius Wong
Computer Science Department, School of Computer Science
Bina Nusantara University
Jakarta, Indonesia 11480
[email protected]

Maria Susan Anggreainy
Computer Science Department, School of Computer Science
Bina Nusantara University
Jakarta, Indonesia 11480
[email protected]

Wahyu Raihan Hidayat
Computer Science Department, School of Computer Science
Bina Nusantara University
Jakarta, Indonesia 11480
[email protected]

Abstract—Customer satisfaction is a top priority for any company engaged in the e-commerce sector. Therefore, it is very important for any e-commerce business, especially those that serve transactions between countries, such as Amazon, eBay, and Rakuten, to see the impressions or sentiments of their customers regarding the quality of the products and services provided, in order to improve that quality. Through the rapid development of technology, this sentiment has become easier to detect, for example by utilizing comments on social media such as Twitter. By analyzing Twitter user comments related to the determinants of customer satisfaction with e-commerce using the Lexicon classification method, it is found that the most dominant factor in determining customer satisfaction is the quality of information. E-commerce businesses that want to increase customer satisfaction should refer to these factors, because they are the main focus of customers when entering an e-commerce site.

Keywords—sentiment analysis, e-commerce, lexicon, social media, review comments

I. INTRODUCTION

E-Commerce, or as we usually call it Online Shopping, is a place where someone sells or buys goods and even services [1]. It runs over the internet, so it is easy to use. When someone wants to buy goods or services on the internet, they rely on reviews from other users who have bought them before, but reading reviews is not suitable for busy people or people too lazy to read them [2]. Vice versa for sellers and stores: they need suggestions and criticism from buyers in order to develop their services and improve the reputation of the store, and of course this means dealing with an overwhelming number of customer review comments. Internet technologies, however, make it easy to produce and spread content about products, which helps customers easily learn what they need without tedious tasks.

Online shopping is the part of E-Commerce concerned with selling or buying products via the internet. Nowadays, in Indonesia there is more than one e-commerce platform [3]. Tokopedia is a website and mobile application for online shopping. Tokopedia seeks to understand its target market by analyzing products and the comments from customers. The problem is that on the internet the text or descriptions are not well structured, so it is necessary to develop a system to automatically classify the aspects and sentiments of online text or description data.

Sentiment analysis can solve this problem by analyzing the emotions and contexts of the online feedback provided [4]. Sentiment analysis is most commonly used as an approach to analyzing text data and identifying its sentiment content; another way to describe sentiment analysis is as a natural language processing technique used to determine whether data is positive, negative, or neutral. Sentiment analysis identifies positive and negative opinions, as well as evaluations from customers. Analyzing e-commerce data will help online stores understand what customers expect, providing a better shopping experience. An advantage of sentiment analysis is that it can find out what needs to be reviewed and can save time when handling a lot of data at once. Sentiment analysis can find the emotion behind every word of a customer review.

The aim of this research is to analyze and apply techniques for a Tokopedia online retail sentiment analysis system for reviews [5]. Here we present a technique for classifying online reviews based upon the most important aspects of Tokopedia. There are many classification methods used to compare the best classification prediction results. We conclude that Naïve Bayes has stable accuracy after being tested with several frequent itemset values, and machine learning techniques use Naïve Bayes to get the highest accuracy results. This research uses Naïve Bayes to get sentiment and classification aspects of the online retail business. Multinomial Naïve Bayes for text can reduce errors in document classification by an average of 27% and up to 50% compared with multivariate Bernoulli. So, for this research paper we analyzed Tokopedia sentiment using the Naïve Bayes Classifier to get a deeper overview of useful information for the various parties in need.

978-1-6654-4002-8/21/$31.00 ©2021 IEEE
Table I. Expected Lexicon Table Example
II. METHODOLOGY
A. Raw Dataset Collection Evaluation Statement Effectiveness
range
This specific study, we are taking the dataset from
the top 10 online E-Commerce of Indonesia statistically Positive the process is quick, +70 - +98
according to the website traffic including Shopee and evaluation delivery instantly, neat
Tokopedia. The data was taken with all the unnecessary and safe packing. The
punctuation marks and the details. pin is cute thank you!
B. Data processing Neutral -1.5 - +1.5
Price affordable, liquid
1. The dataset will be collected and be prepared to be evaluation
smells good, safe to
processed.
e.g., the process is quick, delivery instantly, neat and handle but mice are still
safe packing. The pin is cute thank you! coming even though
2. Each of the data will be processed into three different increased dose
steps:
• Filtering: To remove stop words and other none
Negative Bad, Quality is not as -95 - -60
required text from the data such as the, is, at, which,
evaluation expected.
and on. Misspelled words will not be included in the
final output
e.g., process quick, delivery instantly, neat safe E. Negation
packing. Pin thank you User reviews might not always be straightforward,
• Tokenization: To separate the sentence into smaller we can just assume that a whole sentence is positive based
units and each unit are called tokens. upon the other 99 words and not take that last character into
e.g., process, quick, delivery, instantly, packing, neat, consideration. The sentence “Thank God nobody is using
safe, pin this product”, the review is to be considered as a positive
• Stemming: The process to reduce a word into their comment as determined by the keywords such as “Thank”
most basic form known as lemma and “using” but the word “nobody” changes the whole
e.g., process, quick, deliver, instant, pack, neat, safe, meaning of the sentence [6][8]. Sentiment analysis using
pin negation can be seen in Table II.
C. Lexicon based approach sentiment
Table II. Sentiment Analysis Using negation
Lexicon based approach is an approach which
measures the opinion and the main point of a sentence in a Sentiment Comments
given text statement. It usually consists of two parts which Positive Nobody heard about this brand
are a negative or positive evaluation and along with it is also Negative Everybody heard about this brand
calculates the strength or the effectiveness of the data
(Ranging from -100 to +100) [8]. These sentiments are calculated by using the formula:
D. Requirement Analysis
We can assume whether a certain sentence is in a
positive or negative light sentence such as “Oh my God you if S < 0 and S > 0 respectively (3)
are so funny I’m going to die”, one part of the sentence can
be taken positively while the other one can be taken in a where is the final negation and S is the sentiment value
negative light. It might be difficult to decide and to avoid from the lexicon.
these issues we can use conditional probability and assume
the value of each word denoted by P [9]. F. Intensifier
The word that could multiply the meaning of a single
word such as “very” and “slightly” could intensify and
P (positive | w) for positive w= (1)
multiply the amount of negation that is given in a sentence
depending on the situation and whether it is placed beside a
P (positive | w) for positive w= (2) word that is positive or negative it could decrease or
increase the strength of that word[8] (can be seen in Table
III)
where #w P and #w N stands for the number of messages Table III. Intensifier validity example
from the sample that contains word w and are positive and Validity Data
negative. Expected Lexicon Table Example (can be seen in Valid Very Satisfied
Table I) Not Valid Satisfied and very going to…
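The three preprocessing steps above (filtering, tokenization, stemming) can be sketched in a few lines of Python. The stop-word set and the suffix rules below are simplified stand-ins for whatever the authors actually used (the paper later mentions NLTK), chosen so that the worked example review from Section B is reproduced.

```python
# Sketch of the filtering -> tokenization -> stemming pipeline.
# Stop words and suffix rules here are illustrative, not the paper's exact lists.
import re

STOP_WORDS = {"the", "is", "at", "which", "on", "and"}

def filter_text(text):
    """Drop punctuation and stop words from a raw review."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]

def tokenize(words):
    """Each remaining word becomes one token."""
    return list(words)

def stem(token):
    """Naive suffix stripping standing in for a real stemmer (e.g. NLTK's)."""
    for suffix in ("ing", "ly", "y"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

review = "the process is quick, delivery instantly, neat and safe packing!"
tokens = [stem(t) for t in tokenize(filter_text(review))]
print(tokens)  # → ['process', 'quick', 'deliver', 'instant', 'neat', 'safe', 'pack']
```

The output matches the stemming example in Section B for this sentence.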
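Equations (1) and (2) amount to counting the labelled messages that contain a given word. A minimal sketch, with an invented four-message sample for illustration:

```python
# Estimate P(positive|w) and P(negative|w) per Eqs. (1)-(2):
# counts of positive/negative messages that contain word w.
def word_probabilities(messages, word):
    n_pos = sum(1 for text, label in messages
                if label == "positive" and word in text.split())
    n_neg = sum(1 for text, label in messages
                if label == "negative" and word in text.split())
    total = n_pos + n_neg
    if total == 0:
        return 0.5, 0.5  # unseen word: no evidence either way
    return n_pos / total, n_neg / total

# Invented sample data for illustration.
sample = [
    ("delivery quick neat", "positive"),
    ("quick response thank you", "positive"),
    ("quick to break bad quality", "negative"),
    ("bad packing late delivery", "negative"),
]
p_pos, p_neg = word_probabilities(sample, "quick")
print(round(p_pos, 3), round(p_neg, 3))  # → 0.667 0.333
```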
G. Result Combination
When two or more positive and complex sentences are combined, the final sentiment value might deviate from the true result, as it takes the average of the two positive sentences while simpler and easier sentences are given the full mark [8].

e.g., "The product is perfect (100)"

"The product is perfect (100) and easy to use (95)"

We can see that the first sentence is given the best score possible while the second one is lacking, although we can understand that the second one reflects a better-satisfied customer than the first. We can assume that the second one is better, but the final sentiment will take it as otherwise [8].

To prevent this, we can use a coefficient that counts the number of positive words and negative words in a given sentence by using the following formulas:

(4)

(5)

III. RESULT AND ANALYSIS
Using 400 review data, we know that 230 are positive, 80 are neutral and 90 are negative, using the ratings given with 1 & 2 as negative, 3 as neutral, and 4 & 5 as positive. Table IV shows the different types of data.

Table IV. Multi-Class Classification Result

Tokenize    Sentiment
Process     Neutral
Quick       Positive
Deliver     Neutral
Instant     Positive
Neat        Positive
Safe        Positive
Pack        Neutral

We prepare the data first by processing it. Consider the example in Table V below, using the sentence "the process is quick, delivery instantly, neat and safe packing!" [10]

Table V. Preprocessed Data

Sentence                       Tokenize
"the process is quick,         Process, Quick, Deliver, Instant,
delivery instantly, neat       Neat, Safe, Pack
and safe packing!"

After running the lexicon on the data, we classify the sentiment into three categories, which are positive, neutral and negative, as shown in Table VI below [10].

Table VI. Sentiment analysis

Sentiment    Number of data    Percentage (%)
Positive     230               57.5
Neutral      80                20
Negative     90                22.5
Total        400               100

By using a lexicon of 1500 words from different dictionaries, the result we get for each review will have a range of -100 to 100, where the best review someone can give is 100 and the worst is -100. Since we can assume that ranges of around 25% to 35% fall in the same category, we divide the results into -100% to -80%, -79% to -60%, and so on. Since we also take into account the number of positive and negative words, we introduce a new variable X, calculated with a simple rule: for each positive sentiment we add 1 to X, and for each negative sentiment we subtract 1 from X. Graph I below shows the relationship between the value of X, computed against the dictionary, and the customer reviews; it also compares the reviews of the rival with the reviews of Tokopedia, since our example database is Tokopedia based. Simply put, for each value of X the graph shows the probability that the review is positive, neutral or negative [6].

Graph I. Correlation between type of responses and X (y-axis: percent of type responses, 0-100; x-axis: number of X, -5 to 5; series: Positive, Negative, Neutral)

The dataset was taken from Tokopedia and was compared to a rival company. We can compare the graphs of the two companies and see which has the better shape, since from this we can determine which e-commerce platform provides better customer experiences with the sellers. Shown below is Graph II, the result for the Tokopedia dataset.
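The negation (Section E) and intensifier (Section F) rules can be sketched as adjustments on per-token lexicon scores. The lexicon values, negator set, and multiplier values below are illustrative assumptions, since the paper does not list its exact coefficients.

```python
# Adjust per-token lexicon scores with negation and intensifier words.
# Lexicon values, negator set, and multipliers are illustrative assumptions.
LEXICON = {"satisfied": 80, "bad": -70, "quick": 60}
NEGATORS = {"not", "nobody", "never"}
INTENSIFIERS = {"very": 1.5, "slightly": 0.5}

def score_tokens(tokens):
    total, modifier, negate = 0.0, 1.0, False
    for tok in tokens:
        if tok in NEGATORS:
            negate = True
        elif tok in INTENSIFIERS:
            modifier = INTENSIFIERS[tok]
        elif tok in LEXICON:
            value = LEXICON[tok] * modifier
            total += -value if negate else value
            modifier, negate = 1.0, False  # modifiers apply to the next word only
    # Clamp to the -100..+100 effectiveness range from Section C.
    return max(-100.0, min(100.0, total))

print(score_tokens(["very", "satisfied"]))   # → 100.0 (80 * 1.5, clamped)
print(score_tokens(["not", "satisfied"]))    # → -80.0
print(score_tokens(["slightly", "bad"]))     # → -35.0
```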
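The X variable described above (+1 for each positive token, -1 for each negative token, neutral tokens ignored) is straightforward to compute; the labels below reuse the Table IV example:

```python
# Compute the X statistic: +1 per positive token, -1 per negative token,
# following the counting rule in the results section.
def compute_x(token_sentiments):
    x = 0
    for sentiment in token_sentiments:
        if sentiment == "positive":
            x += 1
        elif sentiment == "negative":
            x -= 1
        # neutral tokens leave X unchanged
    return x

# Token sentiments from Table IV for the example review.
labels = ["neutral", "positive", "neutral", "positive",
          "positive", "positive", "neutral"]
print(compute_x(labels))  # → 4
```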

Graph II. Tokopedia result (y-axis: percent of type responses, 0-100; x-axis: number of X, -5 to 5; series: Positive, Negative, Neutral)

While Graph III shows the data from the Shopee dataset.

Graph III. Shopee result (same axes and series as Graph II)

We can conclude that Tokopedia has a consistent rate according to the number of X, while Shopee is shown to have a steep slope, which means that when people hate the product they really hate the product, but when they like it they have a consistent feeling according to the number of X [6].

IV. CONCLUSION
Our goal is to show the data processing needed to determine effectiveness and efficiency. Our paper presented an existing method for analyzing the sentiment of suggestions and criticisms, and the method chosen is also an efficient one.

Instead of using a pure supervised or unsupervised learning algorithm developed with a heuristic approach to improve accuracy, it would be better to use a semi-supervised approach because, for example, meaningful words or opinions can be identified with WordNet. Therefore, sentiment analysis will be carried out using NLTK, whose probabilistic model is naive Bayes. Results will be displayed with graphics and statistics.

References
[1] Li Yang, Li Jin, Jin Wang and R. Simon Sherratt, "Sentiment Analysis for E-Commerce Product Reviews in Chinese Based on Sentiment Lexicon and Deep Learning", 2020.
[2] Digvijay Mali, M. Abhyankar, Paras Bhavarthi, K. Gaidhar and Manoj Bangare, "Sentiment Analysis of Product Reviews for E-Commerce Recommendation", International Journal of Management and Applied Science, ISSN: 2394-7926, Volume 2, Issue 1, Jan. 2016.
[3] Kevin Stewart, "Data Analysis in Tokopedia", Digital, Nov. 2018.
[4] Shashank Gupta, "Sentiment Analysis: Concept, Analysis and Application", Towards Data Science, Jan. 7, 2018.
[5] Gurpreet Kaur and Malik Kamal, "A Sentiment Analysis of Airline System using Machine Learning Algorithms", International Journal of Advanced Research in Engineering, 12(1), pp. 731-742. DOI: 10.34218/IJARET.12.1.2021.066.
[6] Maite Taboada, Julian Brooke, Milan Tofiloski, Kimberly Voll and Manfred Stede, "Lexicon-Based Methods for Sentiment Analysis", September 2010.
[7] Mohammad Darwich, Sharul Azman Mohd Noah, Nazlia Omar and Nurul Aida Osman, "Corpus-Based Techniques for Sentiment Lexicon Generation: A Review", Journal of Digital Information Management, 2019.
[8] Anna Jurek, Maurice D. Mulvenna and Yaxin Bi, "Improved lexicon-based sentiment analysis for social media analytics", 2015.
[9] Muhammad Marong, Nowshath K. Batcha and Mafas Raheem, "Sentiment Analysis in E-Commerce: A Review on the Techniques and Algorithms", Journal of Applied Technology and Innovation (e-ISSN: 2600-7304), Vol. 4, No. 1, 2020.
[10] Boldenthusiast, "Sentiment Analysis – The Lexicon Based Approach", February 2019.
[11] R. Kumaran, L. Monisha, T. Yamuna and P. Maheswari, "Sentiment Analysis in E-Commerce using Recommendation System", IJERT (e-ISSN: 2278-0181), 2020.
[12] Shahriar Akter and Samuel Fosso Wamba, "Big data analytics in E-Commerce: a systematic review and agenda for future research", 2016.
[13] S. K. Lakshmanaprabu, K. Shankar, Deepak Gupta, Ashish Khanna, Joel J. P. C. Rodrigues, Plácido R. Pinheiro, and Victor Hugo C. de Albuquerque, "Ranking Analysis for Online Customer Reviews of Products Using Opinion Mining with Clustering", 2018.
[14] Sapna Negi and Paul Buitelaar, "Towards the Extraction of Customer-to-Customer Suggestions from Reviews", 2015.
[15] Shanshan Yi and Xiaofang Liu, "Machine Learning Based Customer Sentiment Analysis for Recommending Shoppers, Shops Based on Customer's Reviews", 2020.
Coronary Artery Disease Prediction Model using CART and SVM: A Comparative Study

Mediana Aryuni, Eka Miranda, Charles Bernando, Andrian Hartanto
School of Information Systems, Bina Nusantara University, Jakarta, Indonesia 11480
[email protected], [email protected], [email protected], [email protected]

Abstract— Heart disease is the major cause of mortality worldwide. Clinical Decision Support Systems are developed to measure the risk level of heart disease and detect heart disease using machine learning methods. Many cases showed that heart disease may not be detected until the person encounters indications of heart disease. Hence, the research goal is to construct and compare coronary artery disease prediction models using CART and SVM. The model identifies whether the patient has coronary artery disease or not. The result shows that CART and SVM have the same accuracy of 88.33%. For sensitivity, CART has slightly better performance than SVM, while for specificity, SVM has better performance than CART.

Keywords— heart disease, CART, SVM, machine learning, prediction

I. INTRODUCTION
The major cause of mortality worldwide is heart disease, which contributes 16% of total mortalities [1]. Starting from 2000, the greatest growth in mortalities comes from heart disease, growing up to 3.1 million [1]. Around 25% of mortalities in the United States are a consequence of heart disease [2]. Heart disease is the major cause of mortality for all ages in Indonesia, at about 12.9% [3].

Heart disease relates to several forms of heart conditions. The most general form of heart disease is called coronary artery disease (CAD). CAD affects the blood flow to the heart. A heart attack is triggered by very low blood flow [4].

Based on age group in Indonesia, CAD occurs most often at 65-74 years old, followed by 75 years old and above, 55-64 years old, and 35-44 years old [3].

Decision Support Systems (DSS) have been used widely to assist decision makers [5, 6, 7]. Clinical DSS have been developed to measure the risk level of heart disease [6] and detect heart disease [7] using machine learning methods.

One of the machine learning methods that can be utilized for heart disease prediction is classification [6, 7, 8, 9]. Classification is a supervised learning method which utilizes historical data with predefined labels (training set) to generate a model that is able to infer which class a new record belongs to [10].

Some research has developed classifier models in the medical field using Classification and Regression Tree (CART) [11, 12, 13] and Support Vector Machines (SVM) [14, 15, 16].

People with heart disease may not be detected until they encounter indications like a heart attack, heart failure, or an arrhythmia [4]. Hence, the research goal is to construct coronary artery disease prediction models using CART and SVM and compare the performance of the models. The models identify whether the patient has coronary artery disease (CAD) or not.

There are two Research Questions in this research:
(RQ-1) How to develop prediction models for CAD using CART and SVM?
(RQ-2) How is the performance comparison of CART and SVM prediction models?

II. LITERATURE STUDY

A. CART
A Decision Tree is utilized to acquire a classification function [17]. CART was created by Leo Breiman in the beginning of the 1980s. A classification and regression tree model is utilized to build decision trees using historical data with pre-defined labels. Entropy is employed to select the attribute that has the largest purity and becomes the root or split [12].

Figure 1 shows an example of a simple CART decision tree.

Fig. 1. Example of Simple CART Decision Tree [12]

B. SVM
The SVM goal is to search for the finest classifier model to separate different classes. SVM finds a linear separating hyperplane with maximal margin in a higher-dimensional space [16].
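The entropy-based attribute selection described for CART can be illustrated with a small information-gain computation. The toy records and attribute names below are invented for illustration; a real implementation would run over the Cleveland dataset attributes.

```python
# Entropy-based attribute selection for a CART-style tree:
# choose the attribute whose split yields the largest information gain.
# The toy records below are invented for illustration.
from math import log2

def entropy(labels):
    total = len(labels)
    probs = [labels.count(c) / total for c in set(labels)]
    return -sum(p * log2(p) for p in probs)

def information_gain(records, attr, target="cad"):
    base = entropy([r[target] for r in records])
    remainder = 0.0
    for v in {r[attr] for r in records}:
        subset = [r[target] for r in records if r[attr] == v]
        remainder += len(subset) / len(records) * entropy(subset)
    return base - remainder

records = [
    {"chest_pain": "typical", "sex": "m", "cad": 1},
    {"chest_pain": "typical", "sex": "f", "cad": 1},
    {"chest_pain": "none", "sex": "m", "cad": 0},
    {"chest_pain": "none", "sex": "f", "cad": 0},
]
best = max(["chest_pain", "sex"], key=lambda a: information_gain(records, a))
print(best)  # → chest_pain
```

Here "chest_pain" perfectly separates the toy labels (gain 1.0), so it would become the root split.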
The SVM algorithm can be geometrically characterized and is easy to comprehend [16]. An example of an SVM separating hyperplane is shown in Figure 2.

Fig. 2. Example of SVM Separating Hyperplane [16]
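The separating-hyperplane idea can be illustrated with a fixed linear boundary; the weights and sample points below are invented rather than learned by an actual SVM, which would choose the hyperplane maximizing the margin.

```python
# Classify points with a linear hyperplane w.x + b = 0 and measure each
# point's distance to it (the quantity whose minimum an SVM maximizes).
# Weights and sample points are invented for illustration.
from math import sqrt

w = (1.0, 1.0)   # hyperplane normal
b = -3.0         # offset: the boundary is the line x + y = 3

def classify(point):
    return 1 if w[0] * point[0] + w[1] * point[1] + b >= 0 else -1

def distance(point):
    return abs(w[0] * point[0] + w[1] * point[1] + b) / sqrt(w[0] ** 2 + w[1] ** 2)

print(classify((2.0, 2.0)))            # x + y = 4 > 3 → 1
print(round(distance((2.0, 2.0)), 3))  # |4 - 3| / sqrt(2) → 0.707
```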

III. DATA AND METHODS

A. Dataset
This research utilized the dataset from [18], which comprises 303 records. There are 13 attributes that are used to predict the presence (value 1) or absence (value 0) of Coronary Artery Disease (CAD).

The target class is the presence (CAD+) or absence (CAD-) of CAD in the patient. A value of 0 indicates absence of CAD and values of 1-4 indicate presence of CAD [18]. This research utilized the [18] dataset to differentiate presence (values 1, 2, 3, and 4 replaced with value 1) from absence (value 0).

B. Methods
Figure 3 shows the research outline, which includes:
(1) feature extraction and feature selection: to select attributes for the CAD prediction model
(2) model training: to build the classifier model for CAD prediction
(3) model test: to measure the classifier model performance for CAD prediction

Fig. 3. Research Outline

Data splitting is utilized to split the dataset into 80% for the training set and 20% for the test set (60 records). The CART and SVM CAD prediction models were built and measured using Python code.

IV. DISCUSSION

A. RQ-1: How to develop prediction models for CAD using CART and SVM?
The CART result for the CAD prediction model is shown in Figure 4.

Fig. 4. CART Result for CAD Prediction Model

The attributes in CART are:
(1) chest pain type
(2) number of major vessels
(3) nuclear stress test
(4) maximum heart rate
(5) exercise induced angina
(6) age
(7) ST depression induced by exercise
(8) resting electrocardiographic results
(9) cholesterol
(10) resting blood pressure
(11) sex
(12) fasting blood sugar

The slope of the peak exercise ST segment does not occur in the CART tree.

The SVM classifier was developed using the kernel function associated with the support vectors [16].

B. RQ-2: How is the performance comparison of CART and SVM prediction models?
The performances of the CAD prediction models were measured using sensitivity, specificity, and accuracy. Tables 1 and 2 show the confusion matrices for CART and SVM respectively.

TABLE I. CONFUSION MATRIX FOR CART

                        Actual
                        CAD +    CAD -
Predicted    CAD +      22       4
             CAD -      3        31
Total                   25       35

TABLE II. CONFUSION MATRIX FOR SVM

                        Actual
                        CAD +    CAD -
Predicted    CAD +      21       3
             CAD -      4        32
Total                   25       35

TABLE III. THE PERFORMANCE COMPARISON OF CART AND SVM

               CART      SVM
Sensitivity    88%       84%
Specificity    88.57%    91.43%
Accuracy       88.33%    88.33%

The performance comparison of CART and SVM is shown in Table III. For sensitivity, CART has slightly better performance than SVM, while for specificity, SVM has better performance than CART. For accuracy, CART and SVM have the same value.

V. CONCLUSION AND FUTURE WORK
This research built classifier models using machine learning methods for coronary artery disease prediction with CART and SVM. Hopefully, this study will support CAD detection.

The result shows that CART and SVM have the same accuracy of 88.33%. For sensitivity, CART has slightly better performance than SVM, while for specificity, SVM has better performance than CART.

CART and SVM in this research have slightly better performance than [19] (83.49%) and [15] (85%).

For future research, this paper suggests some scenarios: (1) use a real dataset from an Indonesian hospital, (2) apply feature selection to identify the important attributes in building the prediction model, and (3) use ensemble methods to enhance the performance.

ACKNOWLEDGMENT
Binus International Research Grant entitled "Aplikasi Prediksi Diagnosa Awal Penyakit Jantung Koroner Berbasis Web Dengan Teknik Regresi Logistik", Contract Number: 017/VR.RTT/III/2021, dated 22 March 2021.

REFERENCES
[1] WHO, "The top 10 causes of death", World Health Organization, 9 December 2020 [Online]. Available: https://www.who.int/news-room/fact-sheets/detail/the-top-10-causes-of-death [Accessed 8 May 2021].
[2] CDC, "Heart Disease", Centers for Disease Control and Prevention, 19 January 2021 [Online]. Available: https://www.cdc.gov/heartdisease/index.htm [Accessed 8 May 2021].
[3] Kemkes, "PENYAKIT JANTUNG PENYEBAB KEMATIAN TERTINGGI, KEMENKES INGATKAN CERDIK", Kementerian Kesehatan Republik Indonesia, 29 July 2017 [Online]. Available: https://www.kemkes.go.id/article/print/17073100005/penyakit-jantung-penyebab-kematian-tertinggi-kemenkes-ingatkan-cerdik-.html [Accessed 8 May 2021].
[4] CDC, "About Heart Disease", Centers for Disease Control and Prevention, 13 January 2021 [Online]. Available: https://www.cdc.gov/heartdisease/about.htm [Accessed 8 May 2021].
[5] E.D. Madyatmadja, "Decision support system model to assist management consultant in determining the physical infrastructure fund", Journal of Theoretical and Applied Information Technology, 62(1), pp. 269-274, 2014.
[6] P.K. Anooj, "Clinical decision support system: Risk level prediction of heart disease using weighted fuzzy rules", Journal of King Saud University - Computer and Information Sciences, Volume 24, Issue 1, pp. 27-40, 2012.
[7] P. Rani, R. Kumar, N.M.O.S. Ahmed, et al., "A decision support system for heart disease prediction based upon machine learning", J Reliable Intell Environ, 2021.
[8] E. Miranda, E. Irwansyah, A. Y. Amelga, M. M. Maribondang, M. Salim, "Detection of cardiovascular disease risk's level for adults using naive Bayes classifier", Healthcare Informatics Research Journal, Vol. 22, Issue 3, pp. 196-205, 2016.
[9] Y. Heryadi, E. Miranda, H. L. H. S. Warnars, "Learning decision rules from incomplete biochemical risk factor indicators to predict cardiovascular risk level for adult patients", in Proceedings of 2017 IEEE International Conference on Cybernetics and Computational Intelligence (CyberneticsCom), IEEE, pp. 185-190, 2017.
[10] M. Aryuni and E.D. Madyatmadja, "Feature selection in credit scoring model for credit card applicants in XYZ bank: A comparative study", International Journal of Multimedia and Ubiquitous Engineering, 10(5), pp. 17-24, 2015.
[11] S. Bayat, M. Cuggia, D. Rossille, M. Kessler, "Comparison of Bayesian Network and Decision Tree Methods for Predicting Access to the Renal Transplant Waiting List", in Studies in Health Technology and Informatics, February 2009.
[12] S. Harale, A.S. Dhillon, J. Nirmal, N. Kunte, "Detection of Heart Disease using Classification Algorithm", International Journal of Engineering Research & Technology (IJERT), Special Issue, 2017.
[13] C. Schilling, D. Mortimer, K. Dalziel, E. Heeley, J. Chalmers, P. Clarke, "Using Classification and Regression Tree (CART) to Identify Prescribing Thresholds for Cardiovascular Disease", Pharmacoeconomics, 34(2), pp. 195-205, 2016.
[14] D. Khanna, R. Sahu, V. Baths, B. Deshpande, "Comparative Study of Classification Techniques (SVM, Logistic Regression and Neural Networks) to Predict the Prevalence of Heart Disease", International Journal of Machine Learning and Computing, Vol. 5, No. 5, pp. 414-419, 2015.
[15] R. Perumal, Kaladevi AC, "Early Prediction of Coronary Heart Disease from Cleveland Dataset using Machine Learning Techniques", International Journal of Advanced Science and Technology, 29(06), pp. 4225-4234, 2020.
[16] R.R. Ade, D.S. Medhekar, M.P. Bote, "Heart Disease Prediction System Using SVM and Naive Bayes", International Journal of Engineering Sciences & Research Technology, 2(5), pp. 1343-1348, 2013.
[17] J. Patel, T. Upadhyay, S. Patel, "Heart Disease Prediction Using Machine Learning and Data Mining Technique", International Journal of Computer Science & Communication, Volume 7, No 1, pp. 129-137, 2015.
[18] Kaggle, "Cleveland Clinic Heart Disease Dataset" [Online]. Available: https://www.kaggle.com/aavigan/cleveland-clinic-heart-disease-dataset [Accessed 8 May 2021].
[19] V. Chaurasia and S. Pal, "Early Prediction of Heart Diseases Using Data Mining Techniques", Carib.j.SciTech, Vol. 1, pp. 208-217, 2013.
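The sensitivity, specificity, and accuracy values reported in Table III can be re-derived from the confusion matrices in Tables I and II; a quick cross-check:

```python
# Recompute sensitivity, specificity, and accuracy from the paper's
# confusion matrices (Tables I and II): TP, FN, FP, TN counts.
def metrics(tp, fn, fp, tn):
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return sensitivity, specificity, accuracy

# CART (Table I): predicted CAD+ row (22, 4), predicted CAD- row (3, 31)
cart = metrics(tp=22, fn=3, fp=4, tn=31)
# SVM (Table II): predicted CAD+ row (21, 3), predicted CAD- row (4, 32)
svm = metrics(tp=21, fn=4, fp=3, tn=32)

print([round(100 * m, 2) for m in cart])  # → [88.0, 88.57, 88.33]
print([round(100 * m, 2) for m in svm])   # → [84.0, 91.43, 88.33]
```

Both models classify 53 of the 60 test records correctly, hence the identical accuracy.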
Identify High-Priority Barriers to Effective Digital Transformation in Higher Education: A Case Study at Private University in Indonesia

1st Bayu Rima Aditya
Department of Electrical and Information Engineering, Universitas Gadjah Mada, Indonesia
School of Applied Science, Telkom University, Indonesia
[email protected]

2nd Dina Fitria Murad
Information Systems Department, BINUS Online Learning, Bina Nusantara University, Indonesia
[email protected]

3rd Ridi Ferdiana
Department of Electrical and Information Engineering, Universitas Gadjah Mada, Indonesia
[email protected]

4th Sri Suning Kusumawardani
Department of Electrical and Information Engineering, Universitas Gadjah Mada, Indonesia
[email protected]

5th Bambang Dwi Wijanarko
Computer Science Department, BINUS Online Learning, Bina Nusantara University, Indonesia
[email protected]

Abstract—Some barriers negatively affect the implementation of digital transformation in higher education institutions. This research aims to investigate these barriers in a particular context: a private university in Indonesia. The barriers diagnostic framework (BDF) has been applied to identify and prioritize the barriers. It is determined that 'Actionable plans based on strategy translation', 'The ability to embed ICT into the education system', and 'Limitations of institutional policies' are high-priority barriers and therefore merit critical concern in the implementation of digital transformation at the case study. The main contribution of this study is providing empirical evidence on barriers to digital transformation in the higher education sector. More understanding of the high-priority barriers will help the management of higher education to find effective and efficient strategies to manage the resources.

Keywords— digital transformation, distance education, online learning, higher education.

I. INTRODUCTION
Digital Transformation (DT) has gained important attention in the higher education sector in recent years [1]. However, not all DT implementations in universities provide the expected results [2-4], because several barriers affect the success of DT implementation in universities. Therefore, it is important to know and understand which barriers are critical and which barriers are the most prioritized. After that, higher education management can give appropriate attention to these barriers for the successful implementation of DT in higher education [5].

XYZ University is one of the best private universities in Indonesia and has been engaging in digital transformation (DT). The DT at XYZ University starts with the Distance Education Program (DEP). The DEP offers a distance learning program solution for the employee class, intended for high school/vocational equivalent and diploma (D3) graduates so that they can achieve their dream of getting a bachelor's degree (S1), with very flexible lecture times carried out with the online class method, allowing students to do other work activities without having to be afraid of disturbing lecture time. The implementation of DEP provides valuable benefits such as flexible lecture time and financing, and multi-channel learning with the Learning Management System (LMS) as an online learning medium for individuals who want to learn and develop themselves without being tied to schedules and places, with recognition of prior learning. In addition, programs such as modular systems and minor programs, and all processes, are carried out fully online. However, in actual practice, DEP development faces many barriers. To drive the institution toward the success of the DEP implementation, the critical barriers need to be identified. Therefore, an understanding of the barriers would be beneficial for the management implementing the DEP.

This study aims to determine the critical barriers to the implementation of the Distance Education Program (DEP) and the high-priority barriers. This study uses a mixed-methods approach with the barrier diagnostic framework [6]. Based on the determination of the high-priority barriers, this study discusses the implications for the implementation of DT at XYZ University. This study can offer insight for designing appropriate measures for successful digital transformation at XYZ University.

II. LITERATURE REVIEW
The digital transformation (DT) implementation has provided tremendous benefits in the higher education sector [7][8]; however, several major problems have emerged. In
recent years, previous studies have discussed barriers to DT implementation in the context of higher education, and several have observed barriers that affect DT implementation based on empirical studies.

A study conducted by [9] discussed barriers to digital transformation in higher education institutions based on a brief discussion. From an organizational point of view, there are 6 main barriers, including a strategic vision for digital transformation, the digital literacy of all stakeholders, financial, IT infrastructure, cybersecurity risks, and digital strategies. Likewise, a study by [10] reported that technological infrastructure, financial, institutional leadership, institutional policies, and online security and data privacy are barriers to implementing digital transformation in higher education.

Findings by [11], based on a sample of teachers and students in Russia, identified 6 barriers, including attitudes towards the use of digital teaching technology, limited time (workload), difficulty adapting, IT infrastructure, digital competence, and technical support. Next is a study by [12] in a UAE higher education institution. This study identified several barriers, such as a holistic view, competence of personnel and their IT skills, data used, overlapping systems, third-party reporting systems, manual entry processes, customer adoption, applicable regulations and business environment, social and economic impacts on business processes, security and privacy issues, and budget constraints.

In addition, a study conducted by [13] provided comprehensive barriers to digital transformation in higher education based on a literature review. The study identified 22 barriers for DT implementation in higher education as follows:

• Actionable plans based on strategy translation
• Lack of expertise in digital transformation or human resources
• Limitations in shared vision
• Limitations of institutional policies
• Limitations on strategic planning in implementing digital transformation
• Limitations on government plans, visions and policies
• Insufficient fund availability
• Limitations of time in incorporating digital technology
• Limitations on clarity of vision regarding digital transformation
• Uncertainty in the economic environment in promoting the integration of ICT in core business processes
• Insufficient digital technology skills
• Limitations of skills in organizational leadership in generating ideas, planning ideas, and leading the implementation of ideas
• Limitations of leadership behavior
• The ability to embed ICT into the education system
• Limitations of weak IT infrastructure
• Limitations of services on IT support
• Identification of any IT risk
• Lack of commitment
• Difficulty getting out of comfort zone
• Difficulties and drawbacks to keep up with technological changes
• Attitudes and beliefs about digital technology
• Lack of interest in technology and innovation

III. METHODS
To achieve the objectives of this research, we used a framework developed by [6]. Fig. 1 shows the steps of the methodology adopted in this study.

Fig. 1. The three steps of research methodology

A. Barriers Selection
For the identification of barriers, we used the list of 22 potential barriers based on findings by [13]. Afterward, we selected the critical barriers that could influence DT implementation at XYZ University through a questionnaire survey with Distance Education Program (DEP) practitioners. The details of the DEP practitioners are shown in Table 1.

TABLE I. INFORMATION OF DEP PRACTITIONERS

Field                          Engineering    Non-Engineering
Number                         9              8

Job Title                      Head of     Secretary of    Course         Lecture
                               Program     Academic        Coordinator    Coordinator
Number                         5           3               7              2

Year of experience in DEP      <1 year    2-3 years    4-6 years    7-9 years    >9 years
Number                         1          3            8            4            1

As presented in Table 1, a total of 17 key informants were obtained. The participants come from different field programs at XYZ University, and most participants have over four years of experience in conducting DEP.
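The rating and priority steps below classify each barrier's importance from its driving power and dependence power, following the framework in [6]. A sketch of that mapping (category names as defined in the results section):

```python
# Map driving/dependence power to the four importance levels used in the
# barrier rating step, per the framework in [6].
def importance_level(driving, dependence):
    if driving == "strong" and dependence == "strong":
        return "Very High"
    if driving == "strong":
        return "High"      # strong driving, weak dependence
    if dependence == "strong":
        return "Moderate"  # weak driving, strong dependence
    return "Low"           # weak driving, weak dependence

# The first three barriers of Table IV.
barriers = [
    ("Limitations in shared vision", "strong", "weak"),
    ("Limitations on strategic planning in implementing digital transformation",
     "strong", "strong"),
    ("Limitations of institutional policies", "strong", "weak"),
]
for name, drv, dep in barriers:
    print(name, "->", importance_level(drv, dep))
```

Applied to Table IV, this reproduces its importance column: High, Very High, High.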

B. Rating of Barriers

After the critical barriers were identified, we identified the contextual relationships among the critical barriers and the degree of difficulty to fix the barriers based on DEP executive management's input. A semi-structured interview with executive managers at XYZ University was conducted to analyze the high-priority barriers (see Table 2). In this step, we focused on determining the level of importance of the barriers and the degree of difficulty to fix them, based on [6].

TABLE II. INFORMATION OF INTERVIEWED DEP MANAGERS

No  Level              Years of Work Experience  Years of Experience in DEP
1   Dean               > 12 years                > 9 years
2   Senior Manager     4-6 years                 2-3 years
3   Senior Manager     > 12 years                < 1 year
4   Manager            > 12 years                2-3 years
5   Manager            > 12 years                < 1 year
6   Manager            > 12 years                4-6 years
7   Assistant Manager  > 12 years                2-3 years
8   Assistant Manager  4-6 years                 4-6 years

C. Barriers Priority

From the contextual relationships identified in the previous step, we examined the driving forces and the dependence forces. Based on the driving forces and dependencies, the importance of the critical barriers was determined. Finally, a priority matrix of barriers was developed by mapping the importance and the difficulty of correcting each barrier [6].

IV. RESULT

A. Determination of Critical Barriers

Of the various existing barriers, 22 barriers were selected to determine the critical barriers related to the implementation of DT at XYZ University [6]. We used a questionnaire survey to capture the beliefs of DEP practitioners at XYZ University about the seriousness of each barrier on a scale of 1-5 (very mild, mild, moderate, serious, and very serious). Based on the valid responses from the survey, the level of seriousness of each barrier was identified from its average score. Eight barriers with a mean score of more than 3.41 were designated as critical barriers [14][15]. The critical barriers in the implementation of DT at XYZ University are shown in Table 3.

TABLE III. THE CRITICAL BARRIERS TO DT IMPLEMENTATION IN THE CASE STUDY

Barrier Number  Barrier to Implement DEP
1               Limitations in shared vision
2               Limitations on strategic planning in implementing digital transformation
3               Limitations of institutional policies
4               Lack of expertise in digital transformation or human resources
5               Limitations of time in incorporating digital technology
6               Insufficient fund availability
7               Identification of any IT risk
8               The ability to embed ICT into the education system

B. Determination of the High Priority Barriers

The importance level was categorized into four categories [6], named Low (weak driving power and weak dependence power), Moderate (weak driving power and strong dependence power), High (strong driving power and weak dependence power), and Very High (strong driving power and strong dependence power). The degree of difficulty to fix was categorized into four categories [6], named No Action (no effort needed), Easy (few efforts needed), Moderate (effort needed), and Difficult (many efforts needed). A summary of the importance level and the degree of difficulty to fix the barriers to implementing DEP at XYZ University is given in Table 4.

TABLE IV. A SUMMARY RESULT OF THE BARRIERS OBSERVED IN THE CASE STUDY

No  Barrier to Implement DEP                                                   Driving Power  Dependence Power  Importance Level  Difficulty to Fix
1   Limitations in shared vision                                               Strong         Weak              High              Difficult
2   Limitations on strategic planning in implementing digital transformation   Strong         Strong            Very High         Difficult
3   Limitations of institutional policies                                      Strong         Weak              High              Moderate
4   Lack of expertise in digital transformation or human resources             Weak           Strong            Moderate          Moderate
5   Limitations of time in incorporating digital technology                    Weak           Weak              Low               Moderate
6   Insufficient fund availability                                             Weak           Weak              Low               Moderate
7   Identification of any IT risk                                              Weak           Weak              Low               Difficult
8   The ability to embed ICT into the education system                         Strong         Strong            Very High         Difficult
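The four importance categories described above can be written as a small helper. This is an illustrative sketch only; the labels and the driving/dependence combinations follow the categorization quoted from [6] in the text:

```python
def importance_level(driving: str, dependence: str) -> str:
    """Map driving/dependence power ('Strong' or 'Weak') to an importance level."""
    if driving == "Strong" and dependence == "Strong":
        return "Very High"   # strong driving power and strong dependence power
    if driving == "Strong":
        return "High"        # strong driving power, weak dependence power
    if dependence == "Strong":
        return "Moderate"    # weak driving power, strong dependence power
    return "Low"             # weak driving power and weak dependence power
```

For example, barrier 2 in Table 4 (Strong driving power, Strong dependence power) maps to "Very High", matching its listed importance level.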


Based on Table 4, this study confirmed that three critical barriers lie in the low-importance category (Limitations of time in incorporating digital technology, Insufficient fund availability, and Identification of any IT risk), one critical barrier lies in the moderate-importance category (Lack of expertise in digital transformation or human resources), two critical barriers lie in the high-importance category (Limitations in shared vision, and Limitations of institutional policies), and two critical barriers lie in the very high-importance category (Limitations on strategic planning in implementing digital transformation, and The ability to embed ICT into the education system).

Table 4 also shows the degree of difficulty to fix the barriers. In our findings, four critical barriers, namely 'Limitations of institutional policies', 'Lack of expertise in digital transformation or human resources', 'Limitations of time in incorporating digital technology', and 'Insufficient fund availability', lie at a moderate level of difficulty to fix, and four critical barriers, namely 'Limitations in shared vision', 'Limitations on strategic planning in implementing digital transformation', 'Identification of any IT risk', and 'The ability to embed ICT into the education system', lie at a difficult level to fix.

Once the two important aspects were determined (the importance level and the degree of difficulty to fix), the barriers priority matrix was developed as shown in Fig. 2. The matrix is classified into four priority groups, named Priority 1, Priority 2, Priority 3, and Priority 4.

Fig. 2. Barriers priority matrix to implement DT in the case study

As presented in Fig. 2, we identified the top four priorities of barriers to implementing DEP at XYZ University. 'Actionable plans based on strategy translation', 'Embedding ICT into educational systems', 'Limitations in shared vision', and 'Limitations of institutional policies' were identified as the first (highest) priority cluster, while 'Limitations on strategic planning in implementing digital transformation', 'Identification of any IT risk', 'Limitations of time in incorporating digital technology', and 'Insufficient fund availability' were identified as the third-highest priority cluster.

V. DISCUSSION

A. Study Findings

Higher education institutions all around the world are facing digital technology pressure in the Industrial Revolution 4.0 era [4]. Higher education institutions are undertaking digital transformation to stay competitive [16]. The successful implementation of digital transformation in higher education will warrant the sustainability of a higher education institution in today's digital era [17]. Digital transformation has many benefits in higher education; however, it also faces barriers related to organization, society, technology, and culture that emerge during the implementation of DT in higher education institutions. A barriers diagnostic framework approach can be used to prioritize the barriers in the implementation of DT at XYZ University.

Eight barriers, namely 'Limitations in shared vision', 'Actionable plans based on strategy translation', 'Limitations of institutional policies', 'Lack of expertise in digital transformation or human resources', 'Limitations of time in incorporating digital technology', 'Insufficient fund availability', 'Limitations of weak IT infrastructure', and 'The ability to embed ICT into the education system', are the major barriers in the implementation of DEP at XYZ University. This indicates that contextual constraints and technical constraints are the most important barriers in the implementation of DEP at XYZ University. Moreover, the matrix indicates that 'Actionable plans based on strategy translation', 'The ability to embed ICT into the education system', 'Limitations in shared vision', and 'Limitations of institutional policies' are the highest-priority barriers; they lie in the first cluster. The study findings of the prioritized-barriers matrix provide information on the characteristics of the barriers. The matrix of barriers offers a model to analyze the high-priority barriers so that they can be overcome in order of significance.

The main finding that can be pointed out is that, of the overall critical barriers to DEP implementation at XYZ University, the significant issues fall into only two dimensions. The first is the contextual barriers (organizational aspect): 1) Limitations in shared vision, 2) Actionable plans based on strategy translation, 3) Limitations on government plans, visions and policies, 4) Lack of expertise in digital transformation or human resources, 5) Limitations of time in incorporating digital technology, and 6) Insufficient fund availability. The second is the technical barriers, including 1) Limitations of weak IT infrastructure, and 2) The ability to embed ICT into the education system. Therefore, the findings of this study clarify that there are still gaps between the belief in the ability and the actual performance of DEP implementation at XYZ University. This state will certainly affect the effectiveness of the implementation [5][18]. Leaders are responsible for designing strategies for a more successful implementation of DEP [19].
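The grouping step behind the priority matrix can be sketched by bucketing the Table 4 barriers into cells keyed by (importance level, difficulty to fix). This is only an illustration of the grouping; the assignment of cells to Priority 1-4 follows Fig. 2, which is not reproduced here, so no priority labels are assigned:

```python
from collections import defaultdict

# (barrier, importance level, difficulty to fix) as listed in Table 4;
# the driving/dependence power columns are omitted for brevity.
barriers = [
    ("Limitations in shared vision", "High", "Difficult"),
    ("Limitations on strategic planning in implementing digital transformation",
     "Very High", "Difficult"),
    ("Limitations of institutional policies", "High", "Moderate"),
    ("Lack of expertise in digital transformation or human resources",
     "Moderate", "Moderate"),
    ("Limitations of time in incorporating digital technology", "Low", "Moderate"),
    ("Insufficient fund availability", "Low", "Moderate"),
    ("Identification of any IT risk", "Low", "Difficult"),
    ("The ability to embed ICT into the education system", "Very High", "Difficult"),
]

# One matrix cell per (importance, difficulty) pair.
matrix = defaultdict(list)
for name, importance, difficulty in barriers:
    matrix[(importance, difficulty)].append(name)
```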

B. Implications for practice


The contribution of these findings is to give empirical evidence on barriers to digital transformation in the higher education sector. The findings also give the executive management of XYZ University a clear understanding of the barriers that hamper the success of implementing DEP. A better understanding of the high-priority barriers will help in finding an effective and efficient strategy. The following strategies are proposed to overcome the critical barriers to implementing DEP at XYZ University.

1) Strategy 1: The management of XYZ University must define clear strategic objectives and operational plans tied to the organization's vision statement.

2) Strategy 2: The management of XYZ University must communicate the clear strategic objectives and operational plans.

3) Strategy 3: The management of XYZ University must hire professionals or digital transformation specialists to align business strategy with digital transformation.

4) Strategy 4: The management of XYZ University must provide more training and awareness about ICT adoption and integration into teaching and learning systems.

C. Limitations of the study

The main limitation of this study is that it was confined to a specific context (the Distance Education Program at XYZ University). Thus, the findings are not intended for generalization.

VI. CONCLUSION

Digital transformation is a key approach for improving the quality and sustainability of higher education institutions in the Industrial Revolution 4.0 era. Eight critical barriers to implementing the Distance Education Program (DEP) at XYZ University have been determined (six barriers related to the contextual domain and two barriers related to the technical domain). In addition, our study developed a barrier priority matrix to show the priority of the barriers. 'Actionable plans based on strategy translation', 'The ability to embed ICT into the education system', 'Limitations in shared vision', and 'Limitations of institutional policies' were identified as the top four priority barriers to implementing DEP at XYZ University. An understanding of these prioritized barriers can help the executive management of XYZ University to implement DEP successfully and to overcome the barriers effectively.

REFERENCES
[1] K. Matthews, C. Garratt, and D. Macdonald, "The Higher Education landscape: trends and implications", Discussion Paper, Brisbane: The University of Queensland, 2018.
[2] Y. Limani, E. Hajrizi, L. Stapleton, and M. Retkoceri, "Digital transformation readiness in Higher Education Institutions (HEI): the case of Kosovo", IFAC Papers Online, vol. 52, no. 25, pp. 52-57, 2019.
[3] K. L. Wilms, C. Meske, S. Stieglitz, H. Decker, L. Froehlich, N. Jendrosch, S. Schaulies, R. Vogl, and D. Rudolph, "Digital transformation in Higher Education – new cohorts, new requirements?", Proceedings of 23rd Americas Conference on Information Systems, 2017.
[4] NavitasVentures, "Digital transformation in Higher Education", 2017. [Online]. Available: https://www.navitasventures.com/wp-content/uploads/2017/08/HE-Digital-Transformation-_Navitas_Ventures_-EN.pdf (accessed January 2021).
[5] P. Reid, "Categories for barriers to adoption of instructional technologies", Education and Information Technology, vol. 19, pp. 383-407, 2014.
[6] B. R. Aditya, R. Ferdiana, and S. S. Kusumawardani, "Identifying and prioritizing barriers to digital transformation in higher education: a case study in Indonesia", International Journal of Innovation Science, 2021.
[7] K. Sandkuhl and H. Lehmann, "Digital transformation in Higher Education – the role of enterprise architectures and portals", Digital Enterprise Computing, Gesellschaft für Informatik, Bonn, pp. 49-60, 2017.
[8] A. Thoring, D. Rudolph, and R. Vogl, "Digitalization of Higher Education from a student's point of view", European Journal of Higher Education IT, 2017.
[9] L. S. Rodrigues, "Challenges of Digital Transformation in Higher Education Institutions: A brief discussion", Proceedings of 30th IBIMA Conference, 2017.
[10] V. J. García-Morales, A. Garrido-Moreno, and R. Martín-Rojas, "The Transformation of Higher Education After the COVID Disruption: Emerging Challenges in an Online Learning Scenario", Front. Psychol., vol. 12, 2021.
[11] O. V. Yureva, L. A. Burganova, O. Y. Kukushkina, G. P. Myagkov, and D. V. Syradoev, "Digital Transformation and Its Risks in Higher Education: Students' and Teachers' Attitude", Universal Journal of Educational Research, vol. 8, no. 11B, pp. 5965-5971, 2020.
[12] A. Marks, M. AL-Ali, R. Attasi, A. A. Elkishk, and Y. Rezgui, "Digital Transformation in Higher Education: Maturity and Challenges Post COVID-19", Advances in Intelligent Systems and Computing, vol. 1330, pp. 53-70, 2021.
[13] B. R. Aditya, R. Ferdiana, and S. S. Kusumawardani, "Digital Transformation in Higher Education: A Barrier Framework", Proceedings of 3rd International Conference on Modern Educational Technology, 2021.
[14] T. Mohammad and C. P. Shafeeq, "Barriers to CALL practices in an EFL context: a case study of preparatory year English courses", International Journal of Distributed and Parallel Systems, vol. 7, pp. 1-11, 2016.
[15] J. R. Warmbrod, "Reporting and Interpreting Scores Derived from Likert-type Scales", Journal of Agricultural Education, vol. 55, 2014.
[16] C. Carolan, C. L. Davies, P. Crookes, S. McGhee, and M. Roxburgh, "COVID 19: disruptive impacts and transformative opportunities in undergraduate nurse education", Nurse Educ. Pract., vol. 46, 2020.
[17] Z. G. Shatri, "Advantages and Disadvantages of Using Information Technology in Learning Process of Students", Journal of Turkish Science Education, vol. 17, no. 3, pp. 420-428, 2020.
[18] F. Brunetti, D. T. Matt, A. Bonfanti, A. D. Longhi, G. Pedrini, and G. Orzes, "Digital transformation challenges: strategies emerging from a multi-stakeholder approach", International Journal of Innovation Science, vol. 32, no. 04, 2020.
[19] V. Maltese, "Digital transformation challenges for universities: ensuring information consistency across digital services", J. Cataloging Classification Quart., vol. 56, no. 7, 2018.


A Comparison of Lexicon-based and Transformer-based Sentiment Analysis on Code-mixed of Low-Resource Languages

Cuk Tho, Yaya Heryadi, Iman Herwidiana Kartowisastro, and Widodo Budiharto
Computer Science Department, BINUS Graduates Program – Doctor of Computer Science
Bina Nusantara University, Jakarta, Indonesia, 11480
[email protected]

Abstract—Sentiment analysis from code-mixed texts has been gaining wide attention in the past decade from researchers and practitioners in various communities, motivated, among other things, by the increasing popularity of social media, which has resulted in a huge volume of code-mixed texts. Sentiment analysis is an interesting problem in Natural Language Processing with wide potential applications, among others to understand public concerns or aspirations toward some issues. This paper presents experimentation results aiming to compare the performance of the lexicon-based approach and Sentence-BERT as sentiment analysis models with code-mixed low-resource texts as input. In this study, code-mixed texts of Bahasa Indonesia and Javanese are used as a sample of low-resource code-mixed languages. The input dataset is first translated to English using Google Machine Translation. Sentiwordnet and VADER are the two English lexicon label datasets used in this study as the basis for predicting the sentiment category with the lexicon-based sentiment analysis method. In addition, a pretrained Sentence-BERT model is used as the classification model for the input text translated to English. In this study, the dataset is categorized into positive and negative categories. The model performance was measured using accuracy, precision, recall, and F1 score. The experimentation found that the combined Google machine translator and Sentence-BERT model achieved 83% average accuracy, 90% average precision, 76% average recall, and 83% average F1 score.

Keywords—machine translation, sentiment analysis, lexicon-based approach, transformer model.

I. INTRODUCTION

Indonesia, the fourth most populous country, has a population of more than 270 million with 160 million users of social media. With such numbers of social media users, huge amounts of information can be extracted and used for many purposes. Extracting opinions or subjective information from text, a process known as sentiment analysis [1], will benefit many companies, institutions, and individuals. The result is a subjective opinion in the form of positive, negative, or neutral. Some studies in sentiment analysis were used to predict the presidential election result in Indonesia [2] and to analyze disaster data [3], [4]. Sentiment analysis can also be applied to learn customer behavior, such as opinions on internet provider service [5], and sentiment on bitcoin prices during the pandemic era [6]. However, those studies were only applied to monolingual languages.

The high popularity of social media among bilingual users has resulted in a huge volume of mixed-language (code-mixed) data. Studies on code-mixed sentiment analysis face difficulties including language identification, unstructured language, and the context of the sentence, especially with identical words. In Indonesia, with more than 600 spoken languages, mixing two or more languages has become common practice, including in social media communication. The mixing of one language with another language is known as code-mixing [7]. Many similar words can be found between Indonesian and local languages, and between local languages. One example is the word "kesel", which in Indonesian and Sundanese means upset, whereas in Javanese it means tired. The level of politeness of a word is also one of the issues in sentiment analysis.

Code-mixed sentiment analysis is still a challenging research topic in the Natural Language Processing field, as it is a difficult task to address using monolingual sentiment analysis methods. Research reported by [8] concludes that analyzing code-mixed text sentiment cannot be done with monolingual analysis methods. Some of the main causes include inconsistency in the writing of lexicons and the use of grammatical structures [9]; changes in the spelling of a word, the use of transliterated words from other languages, or the use of sentence structures that do not comply with the language grammar [10]; or the use of non-standard words or phrases, abbreviated words or phrases, misspellings, emojis or emoticons, or grammar constructs [11].

The three main approaches to sentiment analysis are the lexicon-based approach, the machine learning approach, and the combination of the two, called the hybrid approach. The machine learning approach offers better results, especially for large datasets [12]; however, this is also its weakness: to perform well, the machine learning approach needs a large dataset for training. On the other hand, the lexicon-based approach can be used for small datasets, but it depends on lexicon resources, which are rare for low-resource languages.

978-1-6654-4002-8/21/$31.00 ©2021 IEEE


Responding to the rarity of lexicon resources, following [13], the dataset was translated into English using Google Translator as machine translation. Although many sentiment analyses have been reported using code-mixed text, this paper aims to present the performance of the Google Translation API combined with two sentiment analysis methods: the lexicon-based method and a pretrained Sentence-BERT model. Both methods are tested using code-mixed texts of the Indonesian and Javanese languages.

This paper consists of five sections. The first section is the introduction of the study; related works are presented in Section 2. Section 3 describes the method, and the results are displayed in Section 4. Results and discussion are closed by the conclusion in Section 5.

II. RELATED WORKS

The sentiment analysis task aims to categorize text into several sentiment categories based on its content. Given a set of labelled texts {(x_1, y_1), (x_2, y_2), …, (x_n, y_n)}, where x_i is a text and y_i is its sentiment category (data label), the objective of a sentiment analysis method is to estimate a model f that maps X to Y, where X = {x_1, x_2, …, x_n} and Y = {y_1, y_2, …, y_n}. Hence, sentiment analysis is a supervised learning (classification) problem.

The performance of sentiment analysis depends on many factors, including the quality of the input dataset and the text classification model. In particular, lexicon-based sentiment analysis depends on how the lexical resources categorize the polarity of each sentence. Some of the available lexical resources are in English, such as Sentiwordnet, WordNet-Affect, MPQA, SenticNet, VADER, and Liu. Usually, the polarity of lexicon resources is divided into three categories: positive, negative, and neutral. However, for simplicity, in this study only the positive and negative categories are used.

The lexicon-based approach is used in this study because it is easier to understand compared to the machine learning and hybrid approaches [14]. The basis of the lexicon-based approach is that it uses lexicon resources that contain a score for each word and sums the scores to determine the sentiment polarity.

Since most of the available lexical resources are in English, the non-English dataset was translated into English. Text translation into English using manual translation by an expert would increase the accuracy of the sentiment analysis, but this comes at the cost of the expert's salary, not to mention the time needed to translate. On the other hand, machine translation offers fast translation without any expense. One of the well-known machine translation systems is the Google Translation API, which is considered a statistical machine translation system.

The implementation of machine translation for Arabic sentiment analysis has been reported by [15]. The study presents the results of combining machine translation with deep learning, compared with the lexicon-based approach and the machine learning approach. Google translation was used in that study, with average precisions of 68.58 and 74.10. This is lower than using only the lexicon-based approach, which reached 76.79. The accuracy of the lexicon-based approach was also the highest among the preceding combinations of machine translation and deep learning. In [15], five experiments were conducted, with two of the experiments applying a combination of machine translation and a deep learning approach for Arabic sentiment analysis. It showed that the lexicon-based approach outperformed the two other experiments that combined machine translation and deep learning.

The advent of transformer-based models has motivated many researchers to use these models as text classification models. The transformer model, shown in Figure 1, is a deep-structure neural-network-based model proposed by [16]. Many research reports have firmly established the transformer model as a state-of-the-art approach in sequence modelling tasks such as text classification and machine translation.

Fig 1. The Transformer Model Architecture [16]

The advantage of the transformer model for sentiment analysis, among others, is a training process that has been carried out on a large volume of data. However, its high number of model parameters requires high computing resources to adopt such a pretrained model.

Our approach in this study is to compare the performance of machine translation combined with a lexicon-based approach and with a transformer-based model for sentiment analysis with code-mixed text as input. As machine translation, in this study we used the Google Translation API to translate code-mixed text to English text. The translated texts are used as input for classification using the lexicon-based approach and the transformer-based model.

III. METHOD

The method in this study consists of three major steps, namely data collection, pre-processing, and sentiment classification and evaluation, as can be seen in Fig. 2. At the data collection step, tweets were gathered by using the Twitter API on accounts that provide code-mixed tweets. The algorithms for data collection are as follows.


The first step of this study is gathering tweets by using the Twitter API, as described in Algorithm 1. The targeted Twitter accounts are accounts whose tweets are code-mixed Javanese and Indonesian. Accessing Twitter requires a consumer_key, consumer_secret, access_token, and access_secret, which can be requested through the Twitter website.

Algorithm 1. Algorithm for gathering tweets
Begin
  // access twitter by using consumer_key, consumer_secret, access_token, and access_secret
  setup_twitter_oauth(consumer_key, consumer_secret, access_key, access_secret)
  open csv file
  tweet <- follow twitterId(3571513339)
  tweet <- follow twitterId(4848632718)
  tweet <- follow twitterId(2187988976)
  tweet <- follow twitterId(914578934)
  tweet <- follow twitterId(2925324966)
  tweet <- follow twitterId(1132104912685244416)
  save csv file
End

Algorithm 2. Algorithm for sentiment analysis
Begin
  for all tweets from dataset.csv do
    retrieve tweet
    preprocessing()
    extract_sentiment()
    counting_polarity()
    save to csv file (tweet, score, scoring_string, negativity, positivity, uncovered_tokens, total_tokens, polarity)
  end for
End

function preprocessing()  // function for preprocessing tweets
begin
  transform cases to lower case
  filter stopwords (English)
  filter tokens (; ? # . = + - _)
end function

function extract_sentiment()
begin
  if lexicon_resources equal to VADER then
    set model to VADER
  else
    set model to Sentiwordnet
  end if
  set attributes to tweets
  count score for negativity and positivity
end function

function counting_polarity()
begin
  if (Negativity == 0) and (Positivity == 0) then
    set polarity to "Positive"
  else if (Negativity > Positivity) then
    set polarity to "Negative"
  else
    set polarity to "Positive"
  end if
end function

The second step is pre-processing, described in Algorithm 2. Algorithm 2 consists of three functions, namely preprocessing, extract_sentiment, and counting_polarity. The preprocessing function is intended to prepare the dataset for the next step: transform cases lowers all characters in the tweets, filter stopwords removes all the stopwords, and the last operation removes unneeded tokens such as semicolons or question marks. The extract_sentiment function is applied to all tweets to compute the score, negativity score, and positivity score of each tweet.

Pre-processing also includes correcting the writing of lexicons, grammatical structures, non-standard spellings of words, transliterated words from other languages, and abbreviated words or phrases, and removing emojis or emoticons. The cleaned dataset, already translated into Indonesian, was then translated to English by using the Google machine translator. The Python programming language was used to conduct the translation; Google Translation was implemented by calling the Googletrans library in Python.

The last step is developing the sentiment analysis models. The first model is the lexicon-based method using the VADER and Sentiwordnet lexicon datasets. These lexicon resources have a scoring method that calculates the polarity score of each sentence. In sentiment identification, both VADER and Sentiwordnet calculate the negativity and positivity score for each word, which are then summarized as the negativity and positivity score for the sentence.

Fig 2. Research Workflow

The sentiment polarity is divided into negative and positive based on the rules summarized in Table 1. Two lexicon resources are provided for the lexicon-based method, namely Sentiwordnet and VADER.

TABLE I. SENTIMENT POLARITY RULES FOR LEXICON-BASED

No  Requirement                            Final Polarity
1   Negativity > Positivity                Negative
2   Negativity == 0 and Positivity == 0    Positive
3   Positivity >= Negativity               Positive

The second model explored in this study is the pretrained Sentence-BERT model proposed by Reimers and Gurevych [16], used to categorize the input text into positive and negative categories. Finally, the results from the sentiment analysis are compared with the manual labelling by native speakers. The indicators of performance are accuracy, recall, precision, and F1 score.
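The word-level scoring and the decision rules of Table 1 can be sketched as follows. The word scores here are a hypothetical toy lexicon used only for illustration, not actual VADER or Sentiwordnet entries:

```python
# Hypothetical toy lexicon (word -> polarity score); real resources such as
# VADER or Sentiwordnet cover far more words and use different scales.
TOY_LEXICON = {"good": 0.6, "happy": 0.8, "bad": -0.7, "tired": -0.4}

def sentence_scores(tokens):
    """Sum per-word scores into (positivity, negativity) for the sentence."""
    scores = [TOY_LEXICON.get(t, 0.0) for t in tokens]  # unknown words score 0
    positivity = sum(s for s in scores if s > 0)
    negativity = -sum(s for s in scores if s < 0)
    return positivity, negativity

def final_polarity(positivity, negativity):
    """Table 1 rules: negative only when negativity strictly exceeds
    positivity; an all-zero score and ties both default to positive."""
    if negativity > positivity:
        return "negative"
    return "positive"
```

For the tokens ["good", "tired"], positivity 0.6 exceeds negativity 0.4, so rule 3 yields a positive polarity.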


IV. RESULT AND DISCUSSION

The dataset used as input in this study consists of 1,161 tweets. The sentiment is categorized into positive and negative categories. Table 2 shows the sentiment polarity that has been manually labelled, with a total of 1,161 tweets after pre-processing.

TABLE II. DATASET CATEGORY

Sentiment Category  Sample Numbers
Negative            552
Positive            609
Total               1161

Table 3 shows the classification results for the dataset that was translated using Google Translation. Two lexicon resources were applied to the dataset.

TABLE III. CONFUSION MATRIX OF TESTING RESULT

                 Lexicon - Sentiwordnet    Lexicon - VADER           Sentence-BERT
Polarity         Pred Neg    Pred Pos      Pred Neg    Pred Pos      Pred Neg    Pred Pos
Actual Negative  236         316           243         309           501         51
Actual Positive  141         468           72          537           146         463
Total            377         784           315         846           647         514

The performance of the sentiment analysis results is measured by accuracy, precision, recall, and F1 score, where P is precision, R is recall, A is accuracy, and F is the F score; TP is the number of true positives for the related sentiment, FP is the number of false positives, and FN is the number of false negatives.

P = TP / (TP + FP)                  (1)
R = TP / (TP + FN)                  (2)
A = ΣTP / n                         (3)
F_score = (2 × P × R) / (P + R)     (4)

Table 4 shows that the best precision is achieved by the Sentence-BERT model. The table also shows that the lexicon-based models do not achieve high performance for both negative and positive sentiments.

TABLE IV. SENTIMENT ANALYSIS MODEL PRECISION

Experiment              Precision
Lexicon - Sentiwordnet  60%
Lexicon - VADER         63%
Sentence-BERT           90%

As can be seen from Table 5, the best recall is achieved by the combination of Google translation, the lexicon-based method, and the VADER dataset.

TABLE V. SENTIMENT ANALYSIS MODEL RECALL

Experiment              Recall
Lexicon - Sentiwordnet  77%
Lexicon - VADER         88%
Sentence-BERT           76%
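As a sanity check, Eqs. (1)-(4) can be evaluated on the Sentence-BERT column of the confusion matrix in Table 3, taking the positive class as the related sentiment; the results match the reported 90% precision, 76% recall, and 83% accuracy within rounding:

```python
# Sentence-BERT counts from Table 3 (positive class as the related class).
tp = 463  # actual positive, predicted positive
fp = 51   # actual negative, predicted positive
fn = 146  # actual positive, predicted negative
tn = 501  # actual negative, predicted negative
n = tp + fp + fn + tn  # 1161 tweets in total

precision = tp / (tp + fp)                           # Eq. (1): 463/514
recall = tp / (tp + fn)                              # Eq. (2): 463/609
accuracy = (tp + tn) / n                             # Eq. (3): sum of per-class TP over n
f1 = 2 * precision * recall / (precision + recall)   # Eq. (4)
```

The computed F1 is about 0.825, within rounding of the 83% reported for the Sentence-BERT model.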

TABLE VI. SENTIMENT ANALYSIS MODEL ACCURACY


Fig 3. Sentiment Polarity distribution of Google Translation dataset using
Sentiwordnet Lexicon Resources Experiment Recall
Lexicon - Sentiwordnet 61%
Distribution of sentiment polarity using Sentiwordnet Lexicon – VADER 67%
lexicon resources is described in Figure 3. However different Sentence-BERT 83%
results occurred when using the VADER lexicon resources as
shown in Figure 4. Score of the sentiment is more scattered
TABLE VII. SENTIMENT ANALYSIS MODEL FSCORE
when using the VADER lexicon resources.
Experiment Fscore
Lexicon - Sentiwordnet 67%
Lexicon – VADER 74%
Sentence-BERT 83%

In terms of accuracy and F1 Score, Table 6 and Table 7


shows that the best accuracy and F1 Score achieved by the
Sentence-BERT model. Low accuracy and F1 Score of
lexicon-based model is partially caused by the lack
performance of Google machine translation to translate daily
words such as “ambyar”, “mas”, “kak” and “wkwk”.
Fortunately, those words were be ignored by both
Sentiwordnet and VADER. In addition, low accuracy of low
Fig 4. Sentiment Polarity distribution Graph using VADER Lexicon lexical-based model is cultural factors. Most of the tweets
Resources were classified manually by expert as positives. This due to

84 28 October 2021, Jakarta - Indonesia


2021 1st International Conference on Computer Science and Artificial Intelligence (ICCSAI)

the nature of both the Javanese and the Indonesian language. Words such as "ya", "Tuhan", "kayanya" or "seperti" can be found in many discussions in both Javanese and Indonesian. They are translated into yes, God, rich and like, which Sentiwordnet and VADER perceive as positive. According to the word context, however, the overall sentence may represent negative sentiment.

V. CONCLUSION
This study showed that, among the lexicon-based approaches, the VADER lexicon gave the best performance on the dataset translated with Google machine translation. The Sentence-BERT model outperformed the lexicon-based methods in precision, accuracy and F1 score, while the VADER lexicon achieved the best recall.

The experiment results suggest that the Javanese language tends to be less expressive than English in representing either positive or negative sentiment polarity. This cultural language factor tends to hamper sentiment analysis performance when a machine translation approach is used. Considering that many words in local languages that carry neutral sentiment are translated as positive or negative, further study is needed to combine machine translation with a machine learning approach for low-resource languages.
REFERENCES
[1] De Leon F A L, Guéniat F and Madabushi H T 2020 CS-Embed-
francesita at SemEval-2020 Task 9: The effectiveness of code-
switched word embeddings for sentiment analysis
[2] Budiharto W and Meiliana M 2018 Prediction and analysis of
Indonesia Presidential election from Twitter using sentiment analysis
J. Big Data 5 1–10
[3] Fuadvy M J and Ibrahim R 2019 Multilingual Sentiment Analysis on
Social Media Disaster Data ICEEIE 2019 - Int. Conf. Electr. Electron.
Inf. Eng. Emerg. Innov. Technol. Sustain. Futur. 269–72
[4] Shalunts G, Backfried G and Prinz K 2014 Sentiment analysis of
German social media data for natural disasters ISCRAM 2014 Conf.
Proc. - 11th Int. Conf. Inf. Syst. Cris. Response Manag. 752–6
[5] Napitu F, Bijaksana M A, Trisetyarso A and Heryadi Y 2018 Twitter
opinion mining predicts broadband internet’s customer churn rate
2017 IEEE Int. Conf. Cybern. Comput. Intell. Cybern. 2017 - Proc.
2017-Novem 141–5
[6] Pano T and Kashef R 2020 A Complete VADER-Based Sentiment
Analysis of Bitcoin ( BTC ) Tweets during the Era of COVID-19
[7] Chandu K, Loginova E, Gupta V, Genabith J van, Neumann G,
Chinnakotla M, Nyberg E and Black A W 2019 Code-Mixed Question
Answering Challenge: Crowd-sourcing Data and Techniques 29–38
[8] Vilares D, Alonso MA, and Gómez-Rodriguez C 2015 Sentiment
analysis on monolingual, multilingual and code-switching twitter
corpora Proceedings of the 6th Workshop on Computational
Approaches to Subjectivity, Sentiment and Social Media Analysis,
2015 2–8.
[9] Pravalika A, Oza V, Meghana N P, and Kamath S S 2017 Domain-
specific sentiment analysis approaches for code-mixed social network
data 2017 8th international conference on computing, communication
and networking technologies (ICCCNT), 2017 1–6.
[10] Choudhury M, Saraf R, Jain V, Mukherjee A, Sarkar S, and Basu A
2007 Investigation and modeling of the structure of texting language
Int. J. Doc. Anal. Recognit., vol. 10, no. 3–4, 2007 157–174.
[11] Singh P and Lefever E 2020 LT3 at SemEval-2020 Task 9: Cross-
lingual Embeddings for Sentiment Analysis of Hinglish Social Media
Text arXiv Prepr. arXiv2010.11019.
[12] Wang Z, Joo V, Tong C and Chan D 2015 Issues of social data
analytics with a new method for sentiment analysis of social media
data Proc. Int. Conf. Cloud Comput. Technol. Sci. CloudCom 2015-
Febru 899–904
[13] Søgaard A, Vulić I, Ruder S and Faruqui M 2019 Cross-Lingual Word
Embeddings Synth. Lect. Hum. Lang. Technol. 12 1–132
[14] Jurek A, Mulvenna M D and Bi Y 2015 Improved lexicon-based sentiment analysis for social media analytics Secur. Inform. 4
[15] Refaee E and Rieser V 2015 Benchmarking Machine Translated Sentiment Analysis for Arabic Tweets 71–8
[16] Reimers N and Gurevych I 2019 Sentence-BERT: Sentence embeddings using Siamese BERT-networks arXiv preprint arXiv:1908.10084


Estimation of Technology Acceptance Model


(TAM) on the Adoption of Technology in the
Learning Process Using Structural Equation
Modeling (SEM) with Bayesian Approach
1st Elok Fitriani Rafikasari
Department of Islamic Economic, Faculty of Islamic Economic and Business
UIN Sayyid Ali Rahmatullah Tulungagung
Tulungagung, Indonesia
[email protected]

2nd Nur Iriawan
Department of Statistics, Faculty of Science and Data Analytics
Institut Teknologi Sepuluh Nopember
Surabaya, Indonesia
[email protected]

Abstract—Employing computers in learning technology has become very important in every classroom learning activity. In practice, however, the use of computer technology in the classroom is often ignored and very rare. It is therefore necessary to study teachers' perception of acceptance of computer technology in their teaching and learning process inside the classroom. The most appropriate method to measure the level of acceptance of technology adoption is TAM. This method is structured hierarchically, and its analysis requires an appropriate statistical tool, namely SEM. Several assumptions must be fulfilled in SEM analysis, including a large sample size and multivariate normality of all observed values. These requirements frequently cannot be met in real-world conditions, and SEM would then not be applicable. This research was conducted on only 30 teachers of SMP BSS Malang by employing Bayesian SEM, which is proposed to overcome the restriction of fulfilling the SEM requirements. The results show that technology acceptance during the learning process in this school is influenced by Perceived Ease of Use and Perceived Usefulness, which are dominated significantly by Subjective Norm, Innovativeness, Training, Experience and Facilitating Conditions.

Keywords—bayesian, learning process technology, SEM, TAM

I. INTRODUCTION
Growing global competition in the application of technology makes technology important in every side of people's endeavours, including education. The use of educational technology in the education process is one of the key factors in an educator's success. Computers also bring advantages to learning activities in the classroom, but in practice they are often neglected and minimally used (Lim and Khine, 2006 on [1]). This is unfortunate because technology plays an important role in students' learning activities in the classroom, allowing students to understand visual lessons through technology integrated with the curriculum (American Psychological Association, 1997 on [2]). Research on teachers' perception of using computers in classroom learning activities is therefore needed, as far as the teachers' acceptance of computer technology is concerned.

Several models have been applied to analyze and understand the many factors affecting the acceptance of information technology: the Theory of Reasoned Action (TRA), the Theory of Planned Behavior (TPB) and the Technology Acceptance Model (TAM). Davis on [3] said that the most appropriate method to measure the level of acceptance of this technology adoption is the Technology Acceptance Model (TAM). The effectiveness of TAM in assessing the acceptance of technology adoption has been acknowledged by researchers [4].

An earlier application of TAM in education concerned the acceptance of computer technology by teachers and found that teachers have authority over their learning activities, including the use of technology (Ma, Anderson and Streith, 2005 on [1]). The development of teachers' positive behavior towards the computer is important as an indicator of the effectiveness of using computers in the classroom (Lawton and Gershner, 1982 on [1]). TAM has two variables that influence the level of acceptance and use of technology: the first is Perceived Ease of Use (PE) and the second is Perceived Usefulness (PU) (Davis, 1989 on [1]). The factors in TAM correlate with each other as in ordinary graphical causal modelling. This method is structured hierarchically, and its analysis requires an appropriate statistical tool, namely Structural Equation Modeling (SEM).

SEM analysis, which includes factor analysis models, has been used widely in many fields of knowledge. SEM is a multivariate technique that combines aspects of multiple regression (examining dependence relationships) and factor analysis (describing an unmeasured concept with multiple variables) to estimate interrelated dependence relationships simultaneously [5]. According to [6], SEM assumes that the latent variables have linear relationships and that all observed values are multivariate normally distributed. SEM produces a valid equation only when its assumptions are fulfilled.

Several assumptions must be fulfilled in SEM analysis: a large sample size under normal theory, or an even larger sample when the ADF (Asymptotically Distribution Free) approach is used to handle non-normal data [7]; linear relationships between latent variables; and multivariate normality of all observed values. These requirements frequently cannot be met in real-world conditions and, therefore, SEM would not be applicable [6]. SEM with the Bayesian approach is proposed to overcome the restriction of fulfilling the SEM requirements, as it does not have to satisfy the assumptions of standard SEM. Bayesian SEM relies on Markov Chain Monte Carlo (MCMC) and is reliable for small sample sizes, since it does not depend on asymptotic theory. On the other

978-1-6654-4002-8/21/$31.00 ©2021 IEEE


hand, estimates of the latent variable scores can be obtained through posterior simulation with MCMC using the Gibbs sampler algorithm [6]. MCMC with the Gibbs sampler makes the posterior analysis simpler than classical methods [8].

The main purpose of this research is to model the acceptance of technology in the learning process by the teachers of SMP Brawijaya Smart School Malang. It is assessed through the teachers' perception of acceptance in employing computer technology in the classroom learning process, according to the structure of the Technology Acceptance Model (TAM), using Structural Equation Modeling (SEM) with the Bayesian approach.

II. LITERATURE REVIEW
A. Technology Acceptance Model (TAM)
In 1989, Davis described the Technology Acceptance Model (TAM), which is applied to understand the behavioural and motivational factors that influence the adoption and use of Information Technology (IT). The basis of TAM is the theory of reasoned action, which states that a person reacts from one premise: someone's point of view determines attitude and behavior. Accordingly, the user of IT will be affected in his or her attitude towards accepting it. The factor that influences this attitude is the perceived advantage as a rational reason for using the technology. Finally, someone's reasoning about the advantage of IT becomes the basis for accepting the technology. In conclusion, usefulness and ease of use are believed to determine the attitude of the user and the receiver's adoption of the information technology [4].

B. Structural Equation Modeling (SEM)
SEM is a method capable of showing, simultaneously, the relationships between indicator variables, which are directly observed, and latent variables, which are not directly observed. Raykov and Marcoulides define a latent variable as a theoretical or hypothetical construct that cannot be observed directly in any sample or population. The main characteristics of SEM according to [9] are the following: (i) the SEM model concerns constructs that cannot be measured directly and must be well defined, (ii) the SEM model accounts for potential measurement problems in every observed variable, especially the independent variables, and (iii) the SEM model is naturally formulated through matrices that connect the variables, such as covariance and correlation matrices.

SEM is an integrated approach of Confirmatory Factor Analysis (CFA) and Path Analysis. According to [9], CFA and Path Analysis are parts of the SEM model. Path Analysis can be used to observe the relationships among observed variables. Some researchers argue that it is not part of the SEM model; nevertheless, they acknowledge its importance. The CFA model is used to examine the relationships between several latent constructs, each measured through observed indicators. Bollen defines a latent variable as an unobserved factor and distinguishes two kinds: exogenous and endogenous variables. An exogenous variable is a latent variable that is not influenced by any other latent variable, while an endogenous variable is a latent variable that can be affected by others [10]. According to Bollen, the structural equation model for these variables is:

η = Bη + Γξ + ζ   (1)

with:
B = m×m coefficient matrix of the endogenous latent variables
Γ = m×n coefficient matrix of the exogenous latent variables
ξ = q×1 exogenous latent variable vector
η = p×1 endogenous latent variable vector
ζ = p×1 error vector of the equation
q = number of exogenous variables (q = n)
p = number of endogenous variables (p = m)

with the assumptions E(η) = 0, E(ξ) = 0, E(ζ) = 0, ζ not correlated with ξ, and (I − B) a nonsingular matrix.

Besides latent variables, SEM also involves observed variables, also called manifest variables, measurement indicators or proxies. They are divided into two kinds: observed variables of the exogenous latent variables and of the endogenous latent variables. The measurement equations of the observed variables are:

y = Λy η + ε   (2)
x = Λx ξ + δ   (3)

with:
y = p×1 indicator vector of η,
x = q×1 indicator vector of ξ,
ε = p×1 measurement error vector for y,
δ = q×1 measurement error vector for x.

In CFA, δ is distributed with mean 0 and covariance matrix Θδ, and the covariance matrix Φ of ξ is positive definite, so that the variance–covariance matrix of x is formulated as:

Σx = Λx Φ Λx' + Θδ   (4)

The SEM model discussed above is called standard SEM, or the LISREL model. In standard SEM, valid results require several assumptions to be fulfilled, including multivariate normality of the latent variables, a large sample size, and linear relationships between the indicator variables and the latent variables [7]. Just like standard SEM, SEM with the Bayesian approach also consists of measurement equations and structural equations. Analogous to equations (2) and (3), the measurement equation used next is:

y = Λω + ε   (5)

with ω = (η', ξ')' partitioned into the q1×1 endogenous latent variable vector η and the q2×1 exogenous latent variable vector ξ. The structural equation that explains the relationship between the endogenous and exogenous latent variables is then:

η = Πη + Γξ + δ   (6)

with Π (q1×q1) and Γ (q1×q2) the parameter matrices of regression coefficients, and δ the q1×1 error vector.

C. Adoption of Learning Technology
Learning technology is viewed as the technology that deals with the use of tools and media to reach educational goals, or to teach with audio-visual support. It is one of three components that correlate with each other: educational media, learning psychology and the systems approach in education [11].
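As a numerical illustration of the CFA measurement model in equations (3) and (4) above (not the authors' code; the loadings and variances below are made-up values for one hypothetical exogenous latent variable with three indicators), the implied covariance Σx = Λx Φ Λx' + Θδ can be checked against simulated data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical CFA setup: one exogenous latent variable xi measured by
# three indicators, x = Lambda_x * xi + delta  -- equation (3).
Lambda_x = np.array([[1.0], [0.8], [0.6]])   # made-up factor loadings
Phi = np.array([[1.0]])                      # Var(xi)
Theta_delta = np.diag([0.3, 0.4, 0.5])       # measurement error variances

# Implied covariance of x: Sigma_x = Lambda Phi Lambda' + Theta_delta -- eq (4).
Sigma_x = Lambda_x @ Phi @ Lambda_x.T + Theta_delta

# Monte Carlo check: the sample covariance of simulated x approaches Sigma_x.
n = 200_000
xi = rng.multivariate_normal(np.zeros(1), Phi, size=n)
delta = rng.multivariate_normal(np.zeros(3), Theta_delta, size=n)
x = xi @ Lambda_x.T + delta
print(np.cov(x, rowvar=False).round(2))
print(Sigma_x.round(2))
```

With a large simulated sample the two printed matrices agree to two decimals, which is exactly the identity that standard SEM estimation exploits when fitting Λ, Φ and Θδ to an observed covariance matrix.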


The Learning Technology paradigm of 1994 defines learning technology as the theory and practice of designing, developing, utilizing, managing and evaluating the processes and resources of learning (Seels and Richey, 1994 on [12]).

Information Technology (IT) in education has the potential to improve education substantially. Unfortunately, many argue that using educational technology in learning and teaching activities is uneconomical and risky. Yet there is also the argument that educational technology can deliver good education. The effectiveness of educational technology can be judged by whether it gives definite results and is adopted for continuous use (Davis, 1989 on [13]). Learning technology can improve teaching; it has made it simple to experiment with changing teaching methods in schools and universities (Moser, 2007 on [14]). At the core of learning technology are communication media, which develop quickly and can be put to good use in education.

III. ANALYSIS AND DISCUSSION
A. Data Sources and Analysis Methods
This study uses primary data collected from a survey of 30 teachers of SMP Brawijaya Smart School Malang. The questionnaire in this study consisted of 29 questions related to the teachers' perceptions of the acceptance of technology in the learning process at SMP Brawijaya Smart School Malang. The questions are indicators for the latent variables and the main external latent variables in TAM. In filling in the perception questionnaire, the respondents' answers take the form of a Likert scale bounded in 5 (five) categories ranging from "Strongly Disagree" to "Strongly Agree".

The main variables in this study are Perceived Usefulness (PU), Perceived Ease of Use (PE), Behavioral Intention to Use (BI) and Actual System Use (AU). PU and PE in TAM are influenced by external variables. In this study, the external variables were identified from previous studies using TAM and from the real conditions of the object of research. The external variables used in this study are Subjective Norm (SN), Innovativeness (I), Training (T), Experience (E) and Facilitating Conditions (FC).

The analysis methods of this research are as follows:
1) Determine the basic TAM model design appropriate for educational technology to model acceptance by teachers of SMP Brawijaya Smart School Malang
2) Determine the variables thought to influence the acceptance of educational technology by teachers of SMP Brawijaya Smart School Malang
   a. Determine the observation variables for each latent variable
   b. Construct a questionnaire survey
   c. Conduct the survey
   d. Perform data entry of the survey
3) Perform the TAM model estimation for educational technology acceptance by teachers of SMP Brawijaya Smart School Malang using SEM with the Bayesian approach
   a. Determine the measurement model and the structural model
   b. Determine the matrices of parameters to be estimated
   c. Calculate the threshold (α) for each research variable to change the categorical data into continuous data (Y) with N(0,1) distribution
   d. Determine the prior distribution for each parameter to be estimated
   e. Apply MCMC with the Gibbs sampler algorithm to the full conditional posterior distributions of the model to acquire the parameter estimates
   f. Validate the model

B. Descriptive Data Analysis
The questionnaire results show that 91% of respondents chose the high categories: 35% the third category (agree), 20% the fourth category (more agree), and 36% the fifth category (strongly agree). The most chosen categories are the third (agree) and the fifth (strongly agree). The mean score, standard deviation and skewness coefficient of each indicator are given as descriptive analysis in Table 1.

TABLE I. MEAN VALUE, STANDARD DEVIATION, AND SKEWNESS COEFFICIENT FOR EACH INDICATOR VARIABLE

VAR  MEAN   STDEV    SKEW     |  VAR  MEAN   STDEV    SKEW
SN1  3.367  1.15917  0.20537  |  FC3  3.233  1.0063   0.582
SN2  3.5    1.25258  -0.169   |  PU1  4.433  0.77385  -0.958
I1   4.167  0.87428  -0.344   |  PU2  4.4    0.77013  -0.854
I2   3.467  1.07425  0.182    |  PU3  4.167  0.91287  -0.351
I3   4.333  0.8023   -0.699   |  PU4  4.533  0.68145  -1.179
T1   3.867  0.81931  -0.144   |  PE1  3.733  0.94443  0.582
T2   3.833  0.94989  0.096    |  PE2  3.9    1.02889  0.008
T3   3.767  1.07265  -0.217   |  PE3  3.6    1.06997  0.174
T4   3.3    1.05536  0.098    |  PE4  3.7    0.83666  0.636
T5   3.233  0.93526  0.039    |  BI1  3.867  0.9732   0.042
E1   4.2    1.15669  -1.421   |  BI2  4.033  0.92786  -0.069
E2   4.1    0.95953  -0.462   |  BI3  3.867  0.9371   0.012
E3   4.267  0.86834  -0.568   |  AU1  3.533  1.30604  -0.725
FC1  3.1    1.18467  -0.205   |  AU2  4.333  0.88409  -1.057
FC2  3.067  1.11211  0.183    |

From the questionnaire results and the skewness coefficients in Table 1, it can be concluded that the data distributions are skewed to the left relative to a normal distribution. This is caused by the high frequency of high-score categories in the respondents' answers.

C. Validity and Reliability of Research Instrument
Validity was assessed using the Pearson Product Moment correlation. It showed that all variables can be declared valid, with validity coefficients r-count > r-table at the 5% significance level, the smallest coefficient being 0.62 and the highest 0.946. The reliability test showed that all variables are reliable, with the smallest alpha being 0.59, acceptable according to [15], and the highest alpha 0.914.

D. Bayesian SEM Estimation
Before estimation with the Bayesian approach, the data are first transformed into continuous data with an N(0,1) distribution by determining threshold values, because the questionnaire results showed that the respondents' answers tend towards the high categories [6]. The thresholds are determined from the frequency of each category of an indicator variable, by computing the proportions and their cumulative values under the normal distribution with zero mean and unit variance, taking the minimum threshold value as −100 and the maximum threshold value as 100.
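The thresholding described above can be sketched with the standard-normal inverse CDF (a hypothetical illustration, not the authors' code; the category counts below are invented for one imaginary 5-point Likert indicator):

```python
from statistics import NormalDist

# Invented category frequencies for one 5-category Likert indicator (n = 30).
counts = [2, 3, 9, 6, 10]
n = sum(counts)

# Thresholds are the standard-normal quantiles of the cumulative category
# proportions, with the extremes fixed at -100 and 100 as in the paper.
cum = 0
thresholds = [-100.0]
for c in counts[:-1]:
    cum += c
    thresholds.append(NormalDist().inv_cdf(cum / n))
thresholds.append(100.0)
print([round(t, 3) for t in thresholds])
```

A response in category k is then represented by a continuous N(0,1) value lying between thresholds[k-1] and thresholds[k], which is the form of data the Gibbs sampler works on.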


Fig. 1. Prior Distribution Structure for (a) Measurement and (b) Structural Equation

The next step is determining the prior distributions to be used. These are conjugate priors following Lee (2007); Fig. 1 shows the prior distribution structure used for the measurement and structural equations, and the complete parameters are listed in Table 2.

TABLE II. PRIOR DISTRIBUTION

No  Parameter Model
1   Θδ ~ Inverse Gamma(10, 8)
2   Θε ~ Inverse Gamma(10, 8)
3   [Λx | θδ] ~ Normal[0.6; 4θδ]
4   [Λy | θε] ~ Normal[0.6; 4θε]
5   ξ ~ Multivariate Normal(0, Φ)
6   Φ ~ Inverse Wishart(8·I5, 30), with I5 the 5×5 identity matrix
7   Θη ~ Inverse Gamma(10, 8)
8   β ~ Normal(1.1; 10.0θ)
9   γ ~ Normal(1.5; 9.0θ)
10  θ ~ Inverse Gamma(10, 8)

The estimation results obtained with WinBUGS give loading estimates in the TAM model that are all significant, ranging between 0.5069 and 1.049. This shows that all indicators are able to explain the latent variables they measure. The relationship between PE and PU is not significant. In matrix form, the structural equation of the linear model, η = Πη + Γξ + δ with η = (η1, η2, η3, η4)' and ξ = (ξ1, ξ2, ξ3, ξ4, ξ5)', is:

Π = [ 0       0.3832  0       0
      0       0       0       0
      0.3048  0.6412  0       0
      0       0       0.8537  0 ]

Γ = [ 1.147  1.172   1.238   1.728  1.202
      1.205  0.9868  0.9813  1.688  0.9967
      0      0       0       0      0
      0      0       0       0      0 ]

Before interpreting the estimation results, an analysis of the latent variables must first be carried out to assess the effectiveness of the model. The latent variable analysis is done by regressing the factor scores of each exogenous latent variable on the endogenous latent variables according to the model. To examine the accuracy of the relationships, a Lack of Fit (LOF) test is performed to assess the quality of the resulting model. The LOF test showed a significant LOF score for the relationship between ω1 and η1, so it is possible that this relationship has a nonlinear pattern; the nonlinear pattern is thought to occur due to a quadratic effect of the latent variable ω1. The matrix form of the structural equation for the nonlinear model, with the quadratic term ω1² appended as a sixth regressor, is:

Π = [ 0       -0.8185  0       0
      0       0        0       0
      0.2246  0.5917   0       0
      0       0        0.8476  0 ]

Γ = [ 1.118  0.8646  1.045   1.051   1.044   1.242
      1.119  0.2586  0.4872  0.2298  0.6851  0
      0      0       0       0       0       0
      0      0       0       0       0       0 ]

acting on (ξ1, ξ2, ξ3, ξ4, ξ5, ω1²)'.

Parameter estimation of the nonlinear TAM model is conducted by adding the prior distributions. The results show six nonsignificant relationships: 1) Perceived Ease of Use (PE) with Perceived Usefulness (PU), 2) Innovativeness (I) with PE, 3) Innovativeness (I) with PU, 4) Training (T) with PE, 5) Experience (E) with PE and 6) Facilitating Conditions (FC) with PE.

Selection of the best model is conducted by comparing the structural equation errors of the linear SEM and the nonlinear SEM; the model with the smallest structural equation errors is the better one. Table 3 shows the errors of each model. From the table it can be seen that the errors of the linear model are smaller than those of the nonlinear model.

TABLE III. ERROR OF STRUCTURAL EQUATION TAM MODEL ON THE ADOPTION OF TECHNOLOGY IN THE LEARNING PROCESS

Error of Structural Equation   Linear Model   Nonlinear Model
δ1                             -0.11349       -0.18064
δ2                             -0.09501       -0.14294
δ3                             -0.0174        -0.03473
δ4                             0.078416       0.073727

Selection of the best model is also done through the calculation of the Bayesian Information Criterion (BIC); the model with the smaller BIC value is the better one [16]. With BIC = −2 log L + d log n, where L is the model likelihood, d the number of free parameters and n the sample size, the BIC value for the nonlinear SEM TAM model is:

BIC_nonlinear = 268.51 + 121 log 30 = 446.783

and the BIC value for the linear SEM TAM model is:

BIC_linear = 240.9 + 120 log 30 = 418.154

The comparison of BIC_linear with BIC_nonlinear for the TAM on the adoption of technology in the learning process is:

BIC_nonlinear − BIC_linear = 446.783 − 418.154 = 28.628

This difference of 28.628 means there is a strong enough reason to choose the linear SEM TAM model as the better model.

Fig. 2. Structure of SEM TAM Estimation Result on the Adoption of Technology in the Learning Process

After the best model has been obtained, the next step is interpreting the results of that model, i.e. the linear model. Fig. 2 shows the structure of the SEM TAM estimation result on the adoption of technology in the learning process, clearly and completely with its relationship structure. From it, the result is that all external variables significantly affect Perceived Usefulness (PU) and Perceived Ease of Use (PE). Subjective Norm (SN) influences PE and PU significantly because the adoption of technology in the learning process at SMP Brawijaya Smart School Malang is mandatory, although not yet strongly enforced. As a consequence, teachers still apply their subjective perception when using the technology. This would be different with stronger enforcement: teachers would have to use computers in all teaching and learning without prioritizing their subjective perception.

Innovativeness (I) in using computers has a positive and significant effect on PU and PE. This can be seen from the teachers' statements that they are used to employing computers to make learning material creatively, for example educational animation games that raise students' interest in the learning process. Training (T) also has a positive and significant effect on PU and PE. This can be seen from the teachers' opinion that the training given helps them understand computers and make ICT-based learning media; the teachers' level of computer understanding and their ability to make learning media improved after the training. Experience (E) in using computers is measured directly to capture the teachers' perception through their experience. It has a positive and significant effect on PE and PU because learning activities using computers are conducted almost every day, which directly brings ease of use and many advantages. Regarding Facilitating Conditions (FC), the school provides facilities such as guidance to help teachers who encounter difficulties in using computers. This facility has a positive and significant effect on PU and PE because it helps the teachers improve their perception of using the technology.

The direct relationship between PE and PU is not significant because the Operating System (OS) used on the teachers' computers at SMP Brawijaya Smart School Malang is Windows, which is already equipped with many easy-to-operate applications such as Microsoft Office Word, Microsoft Office Excel, and Microsoft Office PowerPoint. The facilities supplied by Microsoft help the teachers operate the computer, so that they can develop learning activities from preparing the learning process and learning material up to learning evaluation. This positive effect gives them many advantages. PE and PU themselves each show a positive and significant effect, meaning that they significantly affect someone's tendency to use the computer and someone's habit of using it towards a better learning process.

IV. CONCLUSIONS
The TAM model that fits the adoption of technology in the learning process by the teachers of SMP Brawijaya Smart School Malang is a TAM with four major variables and five external variables. The TAM model has 29 indicator variables that serve as measurement variables of the major and external variables. The acceptance by the teachers of SMP Brawijaya Smart School Malang of technology adoption in the learning process can be known from Perceived Ease of Use (PE) and Perceived Usefulness (PU). The variables that significantly affect PE and PU are Subjective Norm (SN), Innovativeness (I), Experience (E), Training (T) and Facilitating Conditions (FC). They affect someone's tendency to use computers and someone's habit of using computers every day, so that learning can progress well.

REFERENCES
[1] Teo, T., Lee, C.B. and Chai, C.S., "Understanding Pre-Service Teachers' Computer Attitudes: Applying and Extending the Technology Acceptance Model", Journal of Computer Assisted Learning 24, pp. 128-143, 2007.
[2] Wozney, L., Venkatesh, V. and Abrami, P.C., "Implementing Computer Technologies: Teachers' Perceptions and Practices", Journal of Technology and Teacher Education 14(1), pp. 173-207, 2006.
[3] Khosrow-Pour, M., "Case on Information Technology and Business Process Reengineering", Idea Group Publishing, United States of America, 2006.
[4] Lee, Y.C., Li, M.L., Yen, T.M. and Huang, T.H., "Analysis of Adopting an Integrated Decision Making Trial and Evaluation Laboratory on a Technology Acceptance Model", Journal of Expert Systems with Applications, Chung-Hua University, Taiwan, 2010.
[5] Hair, J.F., Anderson, R.E. and Tatham, R.L., "Multivariate Analysis", 5th Edition, Prentice Hall International, Inc., 1998.
[6] Lee, S.Y., "Structural Equation Modeling: A Bayesian Approach", John Wiley & Sons, Ltd., 2007.
[7] Lee, S.Y. and Song, X.Y., "Evaluation of the Bayesian and Maximum Likelihood Approaches in Analysing Structural Equation Models with Small Sample Sizes", Multivariate Behavioral Research Vol. 39 No. 4, pp. 653-686, 2004.
[8] Anggorowati, M.A., Iriawan, N., Suhartono and Gautama, H., "Restructuring and Expanding Technology Acceptance Model: Structural Equation Model and Bayesian Approach", American Journal of Applied Sciences 9(4), pp. 496-504, 2012.
[9] Raykov, T. and Marcoulides, G.A., "A First Course in Structural Equation Modeling", Lawrence Erlbaum Associates, USA, 2006.
[10] Bollen, K.A., "Structural Equations with Latent Variables", Dept. of Sociology, The University of North Carolina, Chapel Hill, North Carolina, 1989.
[11] Sudrajat, A., "Teknologi Pembelajaran", akhmadsudrajat.wordpress.com/2008/04/20/teknologi-pembelajaran/, 2008.
[12] Setyosari, P., "Isu-isu Terkini dalam Pengembangan Penelitian TEP", tep.ac.id/berita-isuisu-terkini-dalam-pengembangan-penelitian-tep-.html, 2008.
[13] Teo, T., "Technology Acceptance in Education", Rotterdam/Boston/Taipei: Sense Publishers, 2011.
[14] Zhu, C., "Teacher Roles and Adoption of Educational Technology in the Chinese Context", Journal for Educational Research Online, Volume 2 (2010), No. 2, pp. 72-86, 2010.
90 28 October 2021, Jakarta - Indonesia


2021 1st International Conference on Computer Science and Artificial Intelligence (ICCSAI)



Predicting Stock Market Prices using Time Series SARIMA

Daryl Aditya Winata, Sena Kumara, Derwin Suhartono
Computer Science Department, School of Computer Science, Bina Nusantara University, Jakarta, Indonesia
[email protected], [email protected], [email protected], [email protected]

Abstract - Companies nowadays are not owned by a single person or group who works in said company. They are owned by multiple people who each hold a portion of the shares belonging to the company; these shares are usually called stocks. Stocks are commonly traded in the modern age, as they have the potential to yield high profits. The ability to use time series analysis to predict future stock prices is desired by many. We therefore conducted research to predict Apple’s stock prices using the “SARIMA” model, a conventional statistical model that is often used to forecast the stock market, because stock market prices are not static and often vary over time in ways “SARIMA” is able to capture. We created 3 “SARIMA” stock-prediction models with AIC scores of 580.165, 451.591, and 114.612 respectively, and found that the best model had a MAPE score of 36.05%. We concluded that although the algorithm works as intended, it is ultimately unable to accurately predict the real-time stock market value of the Apple company.

Keywords - Apple stock market, SARIMA, sentiment analysis, time series analysis

I. INTRODUCTION

The stock market is a volatile market. This is one of the reasons why investors are dedicated to this field, and it is always a hot topic for researchers from both the financial and technical domains, as the ability to forecast capital market prices is critical for both investors and analysts [1]. Predicting the stock market is challenging because stock prices are affected by many macro-economic factors, such as political events, firms’ policies, general economic conditions, commodity price indexes, interest and exchange rates, and investors’ expectations and psychological factors. Because of this, stock market values can change drastically over a period of time, making them extremely unpredictable.

Fig. 1 Stock Price of Prospect Capital Corporation [2]

Fig. 1 shows how, over just 6 hours of a single day, there were multiple fluctuations in the stock price of PCC. To deal with the unpredictability of the stock market, most researchers use models that can be classified into 2 categories: statistical models and artificial intelligence models [3]. The statistical models most commonly used include exponential smoothing, the autoregressive integrated moving average model, and the generalized autoregressive conditional heteroskedasticity volatility model [4]. These statistical models are generally used when a linear correlation exists among the time series values. On the other hand, the most commonly used artificial intelligence models include the artificial neural network (ANN) and the genetic algorithm. Both of these algorithms have been used to improve forecasts of stock prices, particularly for non-linear time series. With all said and done, we as the authors of this paper aim to find out how reliable the

978-1-6654-4002-8/21/$31.00 ©2021 IEEE


“SARIMA” model is when predicting future stock prices.

II. LITERATURE REVIEW

Time series analysis is statistical data analysis based on time: a time series implies that the data are grouped by a time frame. Time series analysis is valuable for analyzing trends. Time-series graphs are commonly used in climate analysis, earthquake prediction, signal processing, and astronomy. Considering the nature of the technique, it is theoretically possible to use it to predict future stock markets.

There are several popular terms in time series analysis, such as autocorrelation, seasonality, and stationarity. Autocorrelation occurs when a value similar to the starting value is identified somewhere within the graph; there can be multiple instances where the value is similar to the starting point. The time differences between similar values can form a constant time interval, in which a similar value starts to appear after that constant time. The other popular term is seasonality. Seasonality occurs due to periodic fluctuations within the graph; it may or may not indicate the time interval with the highest activity. Stationarity is the term used when the mean and variance of the data do not change over time; the time series won’t be stationary if these properties are ever-changing. To check whether a series is stationary or not, the Dickey-Fuller test is used: if the result of the test is 0, the graph is stationary; otherwise, the graph is still changing.

A. Previous Works on Analyzing the Stock Market

The stock market or stock exchange is defined by [5] as a “collection of markets and exchanges where regular activities of buying, selling, and issuance of shares of publicly-held companies take place”. The price of a stock or share is always changing, as it correlates directly with the company the share belongs to. This causes situations where the price of a share can either rise or drop dramatically. This volatility can be dangerous for the whole market: [6] mentions that a shock in a major marketplace (the US, in their paper) can and will affect other markets around the globe. This uncertainty or volatility stirs investors’ desire to try to predict prices so they can spend their money safely.

There have been multiple previous studies that delve into predicting the stock market, each approaching the problem differently, with methods ranging from time series analysis, as in our research, to more unique approaches such as deep learning. [7] created multiple predictive models, namely a linear model, a generalized linear model, and a recurrent neural network, and concluded that the GLM and RNN models failed to improve forecasting precision. [8] tried a deep learning approach to predicting daily stock closing prices, using EWT-based decomposition, a dropout strategy, PSO-based LSTM deep network optimization, and ORELM-based error correction; they found that their model has better prediction accuracy than the other deep learning methods and single models they used. [9], on the other hand, used SVM regression to predict future stocks and concluded that the error rate for predictions up to 22 days ahead is within an acceptable range. [10] used both sentiment analysis and data mining: they first check the polarity of news articles about a certain company, then combine that with the company’s history to predict future stock prices, which improves accuracy up to 89%.

B. Previous Works on Time Series Analysis

There are several previous studies utilizing time series analysis in which different time series models are used to acquire results. One of these models is the Holt-Winters Multiplicative method [11]; the research conducted by Ryan Miller, Harrison Schwarz, and Ismael S. concluded that out of the three different models they used, Holt-Winters Multiplicative had the highest accuracy.

However, that isn’t to say that the other methods are not worth investigating, as [12] proved in their research with an “ARIMA” model comparison. Through several tests, they concluded that “ARIMA” models across a variety of sectors all have a minimum of 85% accuracy, with the fast-moving consumer goods sector having the highest score and automobile and banking having the lowest. Although


the change in sectors does affect their accuracy, the results do not differ too much from one another. [13] successfully used “SARIMA” to predict temperatures in Nanjing; they concluded that the “SARIMA” model they used predicts future temperatures with acceptable accuracy. Since the temperature in Nanjing correlates with the seasons, it is a perfect fit for “SARIMA”. On the other hand, [14] shows that “ARIMA” does not always result in high-accuracy models, since it uses many uncertain parameters; Holt-Winters is easy to use, generally performs well in many circumstances, and it is recommended that the prediction horizon not exceed the seasonal cycle. Finally, the last model to be considered is the exponential smoothing model. In [15], exponential smoothing is used to minimize errors made within forecasting. However, there are many types of exponential smoothing, which each perform differently from one another. [16] concluded in their research that, based on MAPE, HWT outperforms the others.

C. Moving Average

Fig. 2 Moving Average Equation

The moving average equation is based on [22]. The moving average model is the naivest model, as it only states the mean of all observations. It is the simplest model, and it works well as a starting point. Not only that, it can also act as a way to identify trends in the data. Fig. 2 shows the equation used for the moving average model, where An is the data point in the nth period.

D. Simple Exponential Smoothing

Fig. 3 Exponential Smoothing Equation

The simple exponential smoothing equation was taken from [23]. Exponential smoothing is similar to the moving average, the difference being the added weight, as a different weight is assigned to each observation.

Fig. 3 shows the general mathematical expression of exponential smoothing, with alpha being the smoothing factor that determines the weight changes (its value lies between 0 and 1), xt the observed value in period t, and yt-1 the previous period’s forecast.

E. Double Exponential Smoothing

Fig. 4 Double Exponential Smoothing Equation

The double exponential smoothing equation is based on [24]. Double exponential smoothing is a technique used when there is a trend in the time-series graph. As the name states, this technique applies exponential smoothing recursively twice. In this equation, alpha is the data smoothing factor, xt is the observed value in period t, beta is the trend smoothing factor, bt is the trend estimate at time t, and yt represents the smoothed value for time t.

F. “SARIMA” Model

The seasonal autoregressive integrated moving average model (“SARIMA”) is a combination of complex models that can handle time series with non-stationary properties and seasonality. It consists of an autoregression model, a moving average model, an order of integration, and a seasonal component. We have to apply changes to our model to remove seasonality and non-stationary behaviors.

III. METHODOLOGY

A. Workflow


Fig. 5 Workflow

Our group’s workflow is divided into four major sections, each focusing on one of the four major steps required to reach the final results. The first step, preprocessing, deals with preparing and modifying the dataset before it is used for the model. The second step, exploratory data analysis, is focused on exploring the different aspects of the dataset; our group analyzes it with three methods, namely the moving average, exponential smoothing, and the Dickey-Fuller test. The third step is pre-modeling, which is used to prepare the data and identify key requirements, such as seasonality, before fitting the dataset to the model. Finally, the last step is used for identifying the most effective parameters for the model, as well as for the fitting and, finally, the evaluation.

B. Dataset

Table 1 Raw Dataset Sample

Our dataset consists of 2518 days of stock market prices taken from the Kaggle website [17]. Since we are trying to predict the closing stock price of each day, we will use the Close column as our main value.

C. Preprocessing

Table 2 Dataset Sample

The dataset has to undergo preprocessing. Data columns deemed unnecessary to the process are dropped, and some columns undergo data type changes. Afterwards, the data is grouped by month, the result being the mean value of each month. The resulting table is similar to the one shown in Table 2.

Fig. 6 Plotting the Data

Fig. 6 shows an overview of the new dataset, which has been set to the mean of each month.

D. Exploratory Data Analysis: Moving Average

Fig. 7 Moving Average

The graph above shows our data with the moving average equations applied. We concluded that 5-month smoothing would be sufficient. After the smoothing process using the moving average model, the graph shows trends, which means that the dataset we use is suited to being applied to a time series model.
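The moving-average smoothing used here, together with the simple and double exponential smoothing examined in the following subsections, follows standard textbook definitions (simple smoothing y_t = αx_t + (1−α)y_{t−1}; Holt's double smoothing with a trend factor β). The sketch below is ours, in plain Python, and is not taken from the paper's code:

```python
def moving_average(series, window):
    """Mean of each sliding window; the first window-1 points are skipped."""
    return [sum(series[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(series))]

def simple_exp_smoothing(series, alpha):
    """y_t = alpha * x_t + (1 - alpha) * y_{t-1}, seeded with the first value."""
    result = [series[0]]
    for x in series[1:]:
        result.append(alpha * x + (1 - alpha) * result[-1])
    return result

def double_exp_smoothing(series, alpha, beta):
    """Holt's linear method: a level and a trend term, for trending, non-seasonal data."""
    level, trend = series[0], series[1] - series[0]
    result = [series[0]]
    for x in series[1:]:
        prev_level = level
        level = alpha * x + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        result.append(level + trend)
    return result

prices = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
print(moving_average(prices, 3))  # [11.0, 12.0, 13.0, 14.0]
print(simple_exp_smoothing(prices, 0.5))
print(double_exp_smoothing(prices, 0.5, 0.5))
```

In practice a larger window (the paper uses 5 months) trades responsiveness for smoothness.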

E. Exploratory Data Analysis: Simple Exponential Smoothing

Fig. 8 Simple Exponential Smoothing

Fig. 8 shows the exponential smoothing of the data (Apple stocks) over the years. The graph is used to ease viewing of the dataset.

F. Exploratory Data Analysis: Double Exponential Smoothing

Fig. 9 Double Exponential Smoothing

Double exponential smoothing is done to get better-looking curves and as a general smoothing method in which the smoothing provides short-term forecasts when the data have a trend and do not have a seasonal component.

G. Dickey-Fuller Plot

Fig. 10 Dickey-Fuller Plot

As seen in Fig. 10, the Dickey-Fuller value is close to 0. This means that the graph is stationary, which is exactly what we were looking for. Also seen in the autocorrelation and partial autocorrelation graphs is a significant peak at lag 12. This means that the 12th month is ideal for use in making predictions, since it has the most significant effect.

H. Data Separation

The current data is divided into testing data and training data. The testing data is composed of the last 12 months of the dataset. This will later be used for predicting and then comparing the predicted year against the actual data.

I. Seasonal Decompose

Fig. 11 Seasonal Decompose

Fig. 11 shows that the data clearly has seasonality from 2010 to 2019. Not only that, the trend graph also shows that the value continuously increases over the years. Because of the presence of a seasonal factor, the “SARIMA” model will be used instead of its “ARIMA” counterpart.
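The two diagnostics used above — the lag-12 autocorrelation peak and the trend/seasonal split — can be illustrated with a minimal pure-Python sketch. This is only a simplified stand-in for the library routines (such as statsmodels' adfuller and seasonal_decompose) that produce plots like Figs. 10 and 11; the helper names and the toy series are ours:

```python
import math

def autocorrelation(series, lag):
    """Sample autocorrelation of the series at a given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    cov = sum((series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag))
    return cov / var

def additive_decompose(series, period):
    """Tiny additive decomposition: the overall mean stands in for the trend,
    the seasonal term is the mean of each position in the cycle, and the
    remainder is whatever is left over."""
    overall = sum(series) / len(series)
    seasonal = []
    for pos in range(period):
        vals = series[pos::period]
        seasonal.append(sum(vals) / len(vals) - overall)
    remainder = [x - overall - seasonal[i % period] for i, x in enumerate(series)]
    return overall, seasonal, remainder

# A toy monthly series with period-12 seasonality shows a clear peak at lag 12.
series = [math.sin(2 * math.pi * t / 12) for t in range(48)]
print(autocorrelation(series, 12) > autocorrelation(series, 5))  # True
```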

J. “SARIMA” Model

Firstly, since the “SARIMA” model requires 7 parameters, 3 for the “ARIMA” part and 4 for the seasonality, we are required to find the best combination of these parameters. Because of that, we created a loop in which the combination of parameters used in the “SARIMA” model is constantly changed and the AIC value is recorded. However, the seasonal period is fixed at 12, since it represents the 12 months of a year. We conducted 3 different range tests, namely (0,2), (0,3), and lastly (0,5). From the three range tests, we took the model with the lowest AIC value in each range. These models are then used for predicting the next year of the Apple stock market. Afterward, the 3 models are fitted one by one and tested for prediction with the testing data. The MAPE and RMSE values are also taken at this part of the process.

Fig. 12 SARIMAX(1,1,1)(1,1,1,12) Prediction Graph

Fig. 13 SARIMAX(1,0,1)(2,2,2,12) Prediction Graph

IV. RESULTS AND DISCUSSION

Table 3 “SARIMA” Model Evaluation

3 models are created from the different ranges based on the lowest AIC score, and these are then used for predicting the next 1 year of the Apple stock market. Out of the three, the SARIMAX(0,0,0)(4,4,0,12) model, despite having the lowest AIC, BIC, and HQIC scores, has the highest MAPE percentage and RMSE. Instead, the SARIMAX(1,0,1)(2,2,2,12) model performs the best among the three models used. However, despite that, all three models have similar MAPE and RMSE scores and are all considered mediocre at best.

Fig. 14 SARIMAX(0,0,0)(4,4,0,12) Prediction Graph

Fig. 12, Fig. 13, and Fig. 14 respectively show the prediction graph of each “SARIMA” model. The blue line represents the actual stock closing prices in each month, whereas the orange line represents each model’s predictions. All three graphs show that none of the three models is able to predict the Apple stock with high accuracy. Each somewhat follows the actual graph closely in the first few months but then deviates afterward.
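The two metrics reported in Table 3 follow their standard definitions; the small sketch below uses our own helper names:

```python
import math

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error, in the units of the series."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

actual = [100.0, 200.0, 400.0]     # e.g. monthly closing prices
predicted = [110.0, 180.0, 400.0]  # a model's forecasts for the same months
print(round(mape(actual, predicted), 2))  # 6.67
print(round(rmse(actual, predicted), 2))  # 12.91
```

MAPE is scale-free, which is why the paper can quote a single percentage across the whole test year, while RMSE stays in price units.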

V. CONCLUSION
In this study, we have attempted to predict the
Apple stock market by using the “SARIMA” time


series model. The “SARIMA” models used have mediocre performance, with a MAPE value of around 36.05% and an RMSE of 112.21. The model performs as intended; however, during the multiple tests we did, it could never predict the stock value accurately.

We conclude that predicting stock (in this case the Apple stock) through “SARIMA” is ineffective. This is not because the “SARIMA” model is defective, but due to the volatile nature of the Apple stock itself, as it can change drastically at unpredictable times. Looking at the earlier graphs, we can see that the “SARIMA” model successfully predicted the Apple stock market prices in the first few months, but the prediction went off in the last few months due to the number of metrics involved in determining the value of the Apple stock itself.

Further proof that the “SARIMA” model is not defective is the research done in [13], where the authors were able to successfully predict the future temperature of Nanjing. The nature of a district’s temperature makes it easier to predict, as it is heavily influenced by the current season, whereas stock markets are far more volatile in nature and unpredictable at times.

In the end, the “SARIMA” model is a great statistical prediction model for data that has a distinct seasonality, as opposed to something as volatile as the Apple stock market in this case.

VI. REFERENCES

[1] Wang, J. Wang, Z. Zhang and S. Guo, "Stock index forecasting based on a hybrid model", Omega, vol. 40, no. 6, pp. 758-766, 2012. doi: 10.1016/j.omega.2011.07.008.

[2] "Prospect Capital Corporation (PSEC)", Finance.yahoo.com, 2021. [Online]. Available: https://finance.yahoo.com/quote/PSEC/.

[3] P. Pai and C. Lin, "A hybrid ARIMA and support vector machines model in stock price forecasting", Omega, vol. 33, no. 6, pp. 497-505, 2005. doi: 10.1016/j.omega.2004.07.024.

[4] K. Chng, "Stock Prediction Using ARIMA", MATLAB Central File Exchange.

[5] "Stock Market | Investopedia", Investopedia, 2021. [Online]. Available: https://www.investopedia.com/terms/s/stockmarket.asp.

[6] Z. Su, T. Fang and L. Yin, "Understanding stock market volatility: What is the role of U.S. uncertainty?", The North American Journal of Economics and Finance, vol. 48, pp. 582-590, 2019. doi: 10.1016/j.najef.2018.07.014.

[7] A. Elliot and C. Hua Hsu, "Time Series Prediction: Predicting Stock Price", 2017. arXiv:1710.05751.

[8] H. Liu and Z. Long, "An improved deep learning model for predicting stock market price time series", Digital Signal Processing, vol. 102, p. 102741, 2020. doi: 10.1016/j.dsp.2020.102741.

[9] P. Meesad and R. Rasel, "Predicting stock market price using support vector regression", 2013 International Conference on Informatics, Electronics and Vision (ICIEV), 2013. doi: 10.1109/iciev.2013.6572570.

[10] A. Khedr, S. E. Salama and N. Yaseen, "Predicting Stock Market Behavior using Data Mining Technique and News Sentiment Analysis", International Journal of Intelligent Systems and Applications, vol. 9, no. 7, pp. 22-30, 2017. doi: 10.5815/ijisa.2017.07.03.

[11] R. Miller, H. Schwarz and I. Talke, "Forecasting Sports Popularity: Application of Time Series Analysis", Academic Journal of Interdisciplinary Studies, vol. 6, no. 2, pp. 75-82, 2017. doi: 10.1515/ajis-2017-0009.

[12] P. Mondal, L. Shit and S. Goswami, "Study of Effectiveness of Time Series Modeling (Arima) in Forecasting Stock Prices", International Journal of Computer Science, Engineering and Applications, vol. 4, no. 2, pp. 13-29, 2014. doi: 10.5121/ijcsea.2014.4202.

[13] P. Chen, A. Niu, D. Liu, W. Jiang and B. Ma, "Time Series Forecasting of Temperatures using SARIMA: An Example from Nanjing", IOP Conference Series: Materials Science and Engineering, vol. 394, p. 052024, 2018. doi: 10.1088/1757-899x/394/5/052024.

[14] C. P. D. Veiga, C. R. P. D. Veiga, A. Catapan, U. Tortato and W. V. D. Silva, "Demand forecasting in food retail: a comparison between the Holt-Winters and ARIMA models", Wseas.us, 2021. [Online]. Available: http://www.wseas.us/journal/pdf/economics/2014/a085707-276.pdf.

[15] N. Nurhamidah, N. Nusyirwan and A. Faisol, "Forecasting seasonal time series data using the holt-winters exponential smoothing method of additive models", Jurnal Matematika Integratif, vol. 16, no. 2, p. 151, 2020. doi: 10.24198/jmi.v16.n2.29293.151-157.

[16] N. A. A. Jalil, M. Ahmad and N. Mohamed, "Electricity load demand and forecasting using exponential smoothing methods", World Applied Sciences Journal, 22(11), pp. 1540-1543, 2013. doi: 10.5829/idosi.wasj.2013.22.11.2891.

[17] "Apple (AAPL) Historical Stock Data", Kaggle.com, 2021. [Online]. Available: https://www.kaggle.com/tarunpaparaju/apple-aapl-historical-stock-data.


[18] M. Peixeiro, "The Complete Guide to Time Series Analysis and Forecasting", Medium, 2021. [Online]. Available: https://towardsdatascience.com/the-complete-guide-to-time-series-analysis-and-forecasting-70d476bfe775.

[19] B. Billah, M. King, R. Snyder and A. Koehler, "Exponential smoothing model selection for forecasting", International Journal of Forecasting, vol. 22, no. 2, pp. 239-247, 2006. doi: 10.1016/j.ijforecast.2005.08.002.

[20] Vose Software, Vosesoftware.com, 2021. [Online]. Available: https://www.vosesoftware.com/riskwiki/ComparingfittedmodelsusingtheSICHQICorAICinformationcritereon.php.

[21] D. Montgomery, M. Kulahci and C. Jennings, Introduction to Time Series Analysis and Forecasting.

[22] M. Swari, M. Qusyairi, E. Mandyartha and H. Wahanani, "Business Intelligence System using Simple Moving Average Method (Case Study: Sales Medical Equipment at PT. Semangat Sejahtera Bersama)", Journal of Physics: Conference Series, vol. 1899, no. 1, p. 012121, 2021. doi: 10.1088/1742-6596/1899/1/012121.

[23] E. Ostertagová and O. Ostertag, "Forecasting using simple exponential smoothing method", Acta Electrotechnica et Informatica, vol. 12, no. 3, 2012. doi: 10.2478/v10198-012-0034-2.

[24] F. Sidqi and I. Sumitra, "Forecasting Product Selling Using Single Exponential Smoothing and Double Exponential Smoothing Methods", IOP Conference Series: Materials Science and Engineering, vol. 662, p. 032031, 2019. doi: 10.1088/1757-899x/662/3/032031.


Sentiment Analysis using SVM and Naïve Bayes Classifiers on Restaurant Review Dataset

Jason Cornelius Sugitomo, Nathaniel Kevin, Nayra Jannatri, Derwin Suhartono
Computer Science Department, School of Computer Science, Bina Nusantara University, Jakarta, Indonesia
[email protected], [email protected], [email protected], [email protected]

Abstract—Consumer reviews of the food and services of a restaurant are a significant thing for restaurant businesses to monitor. Sentiment analysis, also known as opinion mining, is a technique used to identify people's opinions and attitudes towards certain subjects, and the most widely used application of sentiment analysis is analyzing consumer reviews of products and services. This paper will assess sentiment analysis' performance with SVM and Naïve Bayes classifiers on a dataset of restaurant reviews. A grid search over different hyperparameters of the classifiers and feature selection methods is done to compare their effects on performance. Each model will be evaluated based on accuracy, F1 score, and confusion matrix. The trained models can be further finetuned to aid restaurant businesses in tracking their business performance and reputation.

Keywords—Sentiment Analysis, Restaurant reviews, Sentiment Classification, ML approach, Naïve Bayes, Support Vector Machines

I. INTRODUCTION

In this digital age, it has become easier for people to post reviews or read other people's reviews of restaurants. Many have developed the habit of reading a restaurant's reviews before visiting the site. According to a 2019 industry report by Toast, Inc., 35% of guests and 49% of restaurateurs choose their restaurants based on online reviews compared to other criteria of decision-making [1], as shown in Fig. 1. Therefore, consumer reviews are of significant interest to restaurant businesses, as reviews can affect a restaurant's reputation and, potentially, its profits. It is important for restaurants to monitor their consumers' sentiments about their food and services.

TABLE I. METHODS FOR CHOOSING RESTAURANTS

How to choose restaurants        Restaurateurs   Guests
Online reviews                   49%             35%
Restaurant social media          33%             10%
Facebook                         38%             28%
Restaurant website               30%             19%
Instagram                        22%             8%
Online articles                  12%             21%
Consumer ordering platforms      17%             12%

Fig. 1. A part of the results of a 2019 industry report by Toast, Inc. that shows some ways in which diners choose which restaurants to dine in.

Sentiment analysis, also known as opinion mining, is a technique used to identify people's opinions and attitudes towards certain subjects. These subjects, or entities, may refer to topics or individuals. A person's opinions, or sentiments, about a subject may be positive, neutral, or negative. Sentiment analysis technology is very beneficial for organizations and businesses, as it allows them to understand customer needs and monitor the reputation of their products. In businesses, the most widely used application of sentiment analysis is analyzing consumer reviews of their services and goods.

Sentiment analysis is rooted in a classification process. There are five main types of sentiment classification problems in sentiment analysis: document-level, comparative, aspect-based, sentiment lexicon acquisition, and sentence-level [2]. Document-level sentiment analysis is the most basic kind of sentiment analysis, which makes it the most suitable for analyzing business reviews. It expects that the document as a whole contains one opinion on the subject, which is the case for reviews. The techniques of sentiment classification can be differentiated into three: lexicon-based, machine learning, and hybrid, which combines the previous two [3]. A study on sentiment analysis of polarizing movie reviews has shown that classification using ML techniques is the most successful, with the SVM algorithm being the most superior [4].

This paper aspires to assess sentiment analysis performance using machine learning approaches on restaurant review data. The trained model can be useful for restaurants to monitor their consumers' sentiments, or even be used by other types of businesses to monitor product reputation. We believe that sentiment analysis can measure a person's true sentiments in their review more accurately than any existing rating system (e.g., five-star rating, binary rating, etc.).

Our paper will explain the process of performing sentiment analysis with a dataset of 1000 restaurant reviews in English retrieved from Kaggle. The chosen method of sentiment analysis is the ML approach, more specifically using the SVM and Naïve Bayes classifiers. Furthermore, the performance of the classifiers combined with various feature

978-1-6654-4002-8/21/$31.00 ©2021 IEEE


selection methods will be compared in order to obtain the optimal models.

Our reason for using SVM and Naïve Bayes is the popular usage of those algorithms in sentiment analysis research. However, it is still quite uncertain which of them performs better. An example can be seen in one study [5], which used both Naïve Bayes and SVM to figure out people's opinions on Twitter before doing polarity analysis, but which remained undecided in its conclusion as to which of the algorithms is best. Therefore, we use both SVM and Naïve Bayes to find out people's opinions regarding the restaurants from the reviews in the Kaggle dataset.

Previous studies of sentiment analysis on product reviews have been successful, so we believe that sentiment analysis is a suitable tool to aid in classifying restaurant review sentiments.

II. LITERATURE REVIEW

A. Sentiment Analysis System Architecture

Fig. 2 shows a general sentiment analysis architecture with the following steps: data collection, data preprocessing, feature selection, sentiment classification, and evaluation.

Fig. 2. The general architecture of sentiment analysis showing its workflow.

There are various methods for data collection. One of the most common methods is the use of APIs (application programming interfaces). A study on sentiment analysis of Twitter posts uses the Twitter Streaming API to retrieve the data

object, adverb, compound noun). Some other important preprocessing steps are stop words removal, stemming, or lemmatization.

In feature selection, representative features are selected from the text to improve the sentiment classification step. By removing irrelevant features, sentiment classification can become more accurate, and reduced running times of the learning algorithms can be achieved [11]. Some commonly used feature selection methods are Document Frequency, the Relief-F algorithm, the CHI statistic, Gain Ratio, and Information Gain.

In sentiment classification, the sentiment classifier model can be developed with machine learning algorithms. Commonly used machine learning algorithms for training sentiment classifiers are Naïve Bayes and SVMs, with the latter known to have the better performance, for example in analyzing textual reviews such as movie reviews [12]. While this is usually true, some cases show NB to have the better performance [13]. This means that each may have an accuracy advantage in some aspect. Thus, both are usually used to find out which has the better accuracy when doing sentiment analysis to make predictions of sentiment.

Lastly, evaluation of the sentiment classification results is done. During training, results must be measured for accuracy. This is done to ensure that the training of the model is properly done, with accurate or near-accurate results.

B. Related Works

Various research efforts on sentiment analysis over the years have used data sources such as chats, tweets, newspapers, and photos [14]. More recent sentiment analysis research has focused on the online domain. Online text has evolved drastically from formal written text, so sentiment analysis methods must always adapt to the nature of online text. For example, capitalization or punctuation in online text can be a sign of seriousness or strong opinions, so methods could be adapted to include those as features in classification [15].

A study [16] found that sentiment analysis could be used in businesses such as restaurants. The research showed that sentiment analysis allows its users to understand what they need to improve according to the analysis' outcome.
[6]. Another sentiment analysis study on restaurant reviews use Sentiment analysis also proved to be very flexible, allowing
the AYLIEN Text Analysis API [7]. people to use it in businesses of all sizes due to its flexibility.
The results of another research [17] found out that people are
The data preprocessing step significantly impacts the influenced by five attributes of restaurants, namely food,
accuracy and performance of NLP systems [8]. The process of service, ambience, price, and context. Further research finds
tokenization breaks up a stream of text to pieces known as that food, service, and context affect the reviews made by
tokens, which could contain either words, terms, or even customers when compared to ambience and price.
symbols, and a typical tokenizer splits tokens based on
whitespace characters or marks between words [9]. The A research [18] found that sentiment analysis achieved
normalization process helps to replace abbreviations, or outstanding results with the SVM classifier. The research found
microtext (e.g., OTW, WDYM, w8), into their actual meanings out that sentiment analysis that was done using the SVM
(e.g., on the way, what do you mean, wait) [10]. In POS tagging, classifier was able to reach a high accuracy when applied to the
each token is given their part-of-speech tags (e.g., subject, dataset that they have used, reaching 94.56%. Another research


[19] supplements this, as they found that SVM produces good results on the dataset they used. The results of the experiment showed that the classifier reached an accuracy of 88.906% with a lambda setting of 0.0003. The data also showed that the results might help customers choose their favorite cuisine, as well as showing restaurants their advantages and disadvantages. This was supplemented by another research [20], which found that SVM's performance can be tuned even further to make the sentiment analysis better and more accurate; the best way to tune the classifier was the grid search technique, as it was capable of increasing the performance of the classifier, albeit with uncertain accuracy.

Another paper shows the strong results of using the Naïve Bayes classifier [21]. The research showed that sentiment analysis could be done using Naïve Bayes, finding that high user evaluation relates to a larger average score of constructive evaluation, and low user evaluation to a larger average score of negative evaluation. The research also found that reviews can be influenced by factors such as the location of the person making the review. In one study on movie review datasets, both Multinomial Naïve Bayes and Bernoulli Naïve Bayes score very high in accuracy, with Multinomial Naïve Bayes reaching an accuracy of 88.5% [22]. Another study has shown similarly high accuracy (above 80%) for Naïve Bayes on movie reviews, but lower accuracy for hotel reviews [23]. These two studies, along with another research [24], show Naïve Bayes having the highest accuracy compared to the other classifiers in the respective studies. A research has also shown that Naïve Bayes performs with varying accuracies on different datasets, while SVM has lower variation and higher accuracy [25]. This shows that sentiment analysis accuracies vary based on the context of the datasets, and the efficacy of using Naïve Bayes for sentiment analysis should continue to be investigated for other types of reviews, in this case restaurant reviews.

Different feature selection and extraction methods have been compared and evaluated. Using Chi Square in feature selection resulted in faster computation but decreased system performance in one study [26]. A research [27] shows that different feature extraction methods suit different ML classifiers: POS is best suited for SVM and Naïve Bayes, while Hass tagging is best with Random Forest and linear regression. Another study proposed combining information gain and DF thresholding for feature selection, which results in high testing accuracy [28].

III. METHODOLOGY

A. Data Collection
The data used is a dataset from Kaggle of 1,000 restaurant reviews in English with equal amounts of positive and negative reviews. The dataset consists of 2 columns: the first column is 'Review', which contains the review string, and the second column is 'Liked', which contains a Boolean value as the sentiment label (0 = negative, 1 = positive).

TABLE II. FIRST FIVE ROWS OF DATASET

Review | Liked
Wow... Loved this place. | 1
Crust is not good. | 0
Not tasty and the texture was just nasty. | 0
Stopped by during the late May bank holiday off Rick Steve recommendation and loved it. | 1
The selection on the menu was great and so were the prices. | 1
Now I am getting angry and I want my damn pho. | 0
Honeslty it didn't taste THAT fresh.) | 0

Fig. 3. A sample of the first five reviews in the dataset.

Fig. 4. The workflow showing the sentiment analysis process.
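The workflow in Fig. 4 (preprocess the reviews, vectorize them with TF-IDF, then train a classifier) can be sketched with scikit-learn. This is a hypothetical minimal sketch rather than the authors' code; the toy reviews below stand in for the Kaggle dataset, and the vectorizer settings mirror the grid described later in the Methodology.

```python
# Illustrative sketch of the sentiment-analysis workflow (not the authors' code):
# TF-IDF vectorization (with lowercasing and English stop-word removal),
# followed by a Multinomial Naive Bayes classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Toy stand-ins for the 'Review' / 'Liked' columns of the Kaggle dataset.
reviews = [
    "Wow... Loved this place.",                    # 1 = positive
    "Crust is not good.",                          # 0 = negative
    "The selection on the menu was great.",        # 1
    "Not tasty and the texture was just nasty.",   # 0
]
labels = [1, 0, 1, 0]

model = Pipeline([
    # min_df/max_df are omitted because this toy corpus is tiny;
    # the paper tunes them (5/10 and 0.6/0.8/1.0) by grid search.
    ("tfidf", TfidfVectorizer(lowercase=True, stop_words="english",
                              ngram_range=(1, 3))),
    ("clf", MultinomialNB()),
])
model.fit(reviews, labels)
print(model.predict(["Loved the great menu!"]))  # prints [1]
```

The same pipeline object can later be fed into a grid search over the Table III parameters (e.g. `tfidf__min_df`, `tfidf__ngram_range`).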


B. Data Preprocessing
For data preprocessing, we use four steps: lowercase conversion, tokenization, stop words removal, and stemming.

Tokenization refers to the step in which text is turned into "tokens" before being changed into vectors. Tokens are meaningful parts of the text, such as words or phrases. The text is broken into tokens by keeping only alphanumeric characters and discarding the non-alphanumeric characters.

Stop words are terms that occur frequently in the text but are not important or related to the data that we want. As such, stop words are generally expected to be irrelevant to text classification, so they are not included. Some stop words that will be removed are shown in Fig. 5.

Stemming is the process in which the root of a derived word is taken. This process takes the language into account, since its algorithms are specific to particular languages.

Lowercase conversion is the process of converting uppercase letters into lowercase letters. Uppercase and lowercase letters are presumed to be identical, so all uppercase letters in the text are converted into lowercase before the classification process.

Fig. 5. Example of stop words that are removed in data preprocessing.

C. Feature Selection
1) TF-IDF
For feature selection, we use the TF-IDF (Term Frequency - Inverse Document Frequency) vectorizer. TF-IDF is an unsupervised feature extraction algorithm that operates on the level of the words or vocabulary of a language [29]. The algorithm is rooted in the number of times a term occurs inside a document, as shown by the TF part of the name, which stands for Term Frequency. It can be computed with this formula:

TF(t, d) = (number of occurrences of term t in document d) / (total number of terms in document d)

IDF, or Inverse Document Frequency, prioritizes the rarely occurring words in the text and is calculated using the following formula, where N is the total number of documents and df(t) is the number of documents containing term t:

IDF(t) = log(N / df(t))

The TF-IDF score is computed using the following formula:

TF-IDF(t, d) = TF(t, d) × IDF(t)

2) Hyperparameter Grid Search
In order to seek the optimal combination of hyperparameters for our classifier, we perform a grid search. The parameters to be tuned are the gram range, minimum document frequency, maximum document frequency, and machine learning model.

TABLE III. GRID SEARCH PARAMETERS

Parameter | Values
Gram range | (1,1), (1,2), (1,3)
Min DF | 5, 10
Max DF | 0.6, 0.8, 1.0
ML model | SVM, MNB

Fig. 6. The parameters for our grid search.

The N-gram refers to the number of consecutive words or tokens that can be considered a single feature. Unigram only selects single tokens as features, while bigram and trigram can select up to pairs of adjacent tokens and three adjacent tokens, respectively. In one study [30], it is observed that classification accuracy decreases as the gram range increases. However, another study [31] shows that bigram and trigram can perform better under different settings. The use of unigram, bigram, and trigram in feature selection for restaurant reviews will be tested to compare their effects on the performance of the classifiers.

D. Sentiment Classification
From a research [16], it was found that there are advantages and disadvantages in the usage of both SVM and Naive Bayes in sentiment classification. In the research, it is


stated that SVM is better to use with large datasets, while Naive Bayes is more suited to small datasets. This statement contradicts another research [32], which used 2,090 Twitter messages as its dataset and found that Naive Bayes performs better than SVM. Because of that, we will use both algorithms in our experiments to see which one performs better on our dataset. The Naïve Bayes algorithm we will use is Multinomial Naïve Bayes, which is the most suitable for sentiment analysis as it can work with term frequencies.

E. Evaluation methods
Because of the compact size of the available dataset, the resulting models are evaluated by cross validation. For each model, 700 reviews are selected at random for training and the remaining 300 reviews are used for testing. Each training and testing dataset is balanced between positive and negative reviews. Each model will be evaluated based on accuracy, F1 score, and confusion matrix.

IV. RESULTS AND DISCUSSION
During the processing of the data, word clouds of the dataset are generated as an initial observation to see the most common words in restaurant reviews.

Fig. 7. Word Cloud for combined reviews.

In Fig. 7, the word cloud combines the words used in the negative and positive sentiment reviews. This word cloud is used to find the most common words present in both positive and negative sentiment reviews in our dataset. As we can see from the word cloud, the most common words for restaurant reviews are time, great, place, good, food, service, and back. There are also words like amazing, experience, friendly, and many more. Based on this result, there seem to be many positive words even in the negative sentiment reviews. This is perhaps because the negative sentiment reviews are not written as directly as the positive ones.

Fig. 8. Word Cloud for positive reviews.

In Fig. 8, the word cloud shows the words that appear in the positive sentiment reviews. As we can see, the words used most by reviewers are good, great, and food place. There are also other positive words such as amazing, friendly, best, and delicious. The positive reviews use words with strong positive sentiments.

Fig. 9. Word Cloud for negative reviews.

In Fig. 9, the word cloud shows the words that appear in the negative sentiment reviews. As we can see, the words used most by reviewers in the word cloud are not strong negative sentiment words but mostly neutral sentiment words such as service, food, and place. This is because the negative reviews in our dataset are not as straightforward as the positive ones, which directly praise the food and the restaurant experience. While this is the case, there are still negative sentiment words that can be identified in the word cloud, such as bad, never, worst, and bland. However, some positive sentiment words are also found, which can be explained by 'vague' negative reviews. An example of a negative review which contains positive sentiment words is "seems like a good quick place to grab a bite of some familiar pub food, but do yourself a favor and look elsewhere.". This


contains positive sentiment words such as "good" and "quick", but the overall sentiment of the review is negative.

After visualization with word clouds, the data is preprocessed and fitted into a total of 36 models based on the grid search. Each model has a unique combination of hyperparameters. Fig. 10 shows a table of the results of all 36 models, sorted by accuracy and then F1 score. The best model uses a Naïve Bayes classifier, which achieved an accuracy of 77.33% and an F1 score of 0.7792, with settings N-Gram = (1,3), Min DF = 5, and Max DF = 1.0. Another observation from the table of results is that Min DF = 5 is clustered at the top while Min DF = 10 is clustered at the bottom of the results.

To gain a better understanding of the performance of each parameter, density distributions are plotted against accuracy for better visualization.
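The 36 fitted models are simply the full cross-product of the Table III grid: 3 gram ranges × 2 minimum DFs × 3 maximum DFs × 2 classifiers. A small stdlib sketch of the enumeration (variable names here are illustrative, not the authors'):

```python
# Enumerate every hyperparameter combination from the grid in Table III.
from itertools import product

gram_ranges = [(1, 1), (1, 2), (1, 3)]
min_dfs = [5, 10]
max_dfs = [0.6, 0.8, 1.0]
ml_models = ["svm", "mnb"]

combos = list(product(gram_ranges, min_dfs, max_dfs, ml_models))
print(len(combos))  # prints 36, one entry per fitted model in Fig. 10
```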

TABLE IV. MODEL RESULTS


N-Gram Min DF Max DF Model Accuracy F1 TN FN TP FP
(1, 3) 5 1.0 mnb 0.7733 0.7792 112 30 120 38
(1, 3) 5 0.6 mnb 0.7733 0.7733 116 34 116 34
(1, 2) 5 1.0 svm 0.7733 0.7671 120 38 112 30
(1, 3) 5 0.6 svm 0.7700 0.7596 122 41 109 28
(1, 2) 5 0.8 svm 0.7667 0.7619 118 38 112 32
(1, 3) 5 0.8 svm 0.7667 0.7482 126 46 104 24
(1, 2) 5 0.6 mnb 0.7633 0.7657 113 34 116 37
(1, 3) 5 1.0 svm 0.7633 0.7509 122 43 107 28
(1, 1) 5 0.6 mnb 0.7567 0.7653 108 31 119 42
(1, 2) 10 0.6 svm 0.7567 0.7245 131 54 96 19
(1, 3) 5 0.8 mnb 0.7500 0.7387 119 44 106 31
(1, 2) 5 0.6 svm 0.7500 0.7350 121 46 104 29
(1, 2) 5 1.0 mnb 0.7433 0.7556 104 31 119 46
(1, 1) 5 0.8 mnb 0.7433 0.7524 106 33 117 44
(1, 1) 10 1.0 svm 0.7433 0.7159 126 53 97 24
(1, 3) 10 0.8 svm 0.7433 0.7138 127 54 96 23
(1, 1) 5 1.0 mnb 0.7400 0.7383 112 40 110 38
(1, 2) 5 0.8 mnb 0.7400 0.7365 113 41 109 37
(1, 1) 5 0.6 svm 0.7400 0.7310 116 44 106 34
(1, 1) 10 0.8 svm 0.7400 0.7068 128 56 94 22
(1, 3) 10 1.0 mnb 0.7367 0.7393 109 38 112 41
(1, 3) 10 0.6 mnb 0.7333 0.7122 121 51 99 29
(1, 2) 10 1.0 svm 0.7300 0.7055 122 53 97 28
(1, 2) 10 0.8 svm 0.7233 0.7067 117 50 100 33
(1, 2) 10 1.0 mnb 0.7200 0.6934 121 55 95 29
(1, 1) 5 1.0 svm 0.7133 0.7152 106 42 108 44
(1, 1) 10 0.6 svm 0.7133 0.6861 120 56 94 30
(1, 3) 10 0.6 svm 0.7100 0.6692 125 62 88 25
(1, 3) 10 0.8 mnb 0.7067 0.6944 112 50 100 38
(1, 2) 10 0.6 mnb 0.7067 0.6923 113 51 99 37
(1, 1) 10 0.6 mnb 0.7067 0.6901 114 52 98 36
(1, 2) 10 0.8 mnb 0.7067 0.6879 115 53 97 35
(1, 1) 5 0.8 svm 0.7033 0.6983 108 47 103 42
(1, 3) 10 1.0 svm 0.6933 0.6849 108 50 100 42
(1, 1) 10 1.0 mnb 0.6900 0.6782 109 52 98 41
(1, 1) 10 0.8 mnb 0.6767 0.6255 122 69 81 28
Fig. 10. Table of results for 36 models with unique combination of hyperparameters.
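The Accuracy and F1 columns in Fig. 10 follow directly from the confusion-matrix counts, since each model is tested on 300 reviews: accuracy = (TP + TN) / 300 and F1 = 2TP / (2TP + FP + FN). A quick check for the top row:

```python
# Recompute the headline metrics of the best model in Fig. 10 from its
# confusion-matrix counts (300 test reviews: TN=112, FN=30, TP=120, FP=38).
TN, FN, TP, FP = 112, 30, 120, 38

accuracy = (TP + TN) / (TP + TN + FP + FN)          # 232 / 300
precision = TP / (TP + FP)
recall = TP / (TP + FN)
f1 = 2 * precision * recall / (precision + recall)  # = 2*TP / (2*TP + FP + FN)

print(round(accuracy, 4), round(f1, 4))  # prints 0.7733 0.7792
```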


Fig. 11. Accuracy density distribution for SVM and MNB models.
Fig. 12. Accuracy density distribution for different gram ranges.
Fig. 13. Accuracy density distribution for different Min DF.
Fig. 14. Accuracy density distribution for different Max DF.

Fig. 11 compares the accuracy of the two classifier models used, SVM and Naïve Bayes. The line for SVM is located further right compared to the line for Naïve Bayes, meaning that models with the SVM classifier achieve an overall greater accuracy. The accuracy of the different gram ranges is compared in Fig. 12, in which bigram and trigram seem to perform better than unigram. This may be because features selected with bigram and trigram can be much more specific and less 'vague', as was an issue in the single-word word cloud. Fig. 13 shows that a minimum DF of 5 has the better performance. This may be because a higher minimum DF can omit features that are actually important for classification, which decreases accuracy. Finally, Fig. 14 compares the accuracy of using different values for maximum DF. The plotted density distribution does not show a clear pattern, and it may be influenced by other variables. Overall, these are the settings that result in the highest performances for sentiment analysis of restaurant reviews.

Fig. 15 shows all the features selected by the best model (the model on the top row of Fig. 10). Most of the selected features are strong sentiment terms, such as "recommend thi place" and "veri disappoint". Some of the strong sentiment terms are also specific to the topic of food and restaurants, such as "delici" (delicious) and "tasteless". This shows that the fitted models are effective specifically for restaurant reviews, because they can successfully select the sentiment terms that are strong for restaurant reviews.
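The comparisons plotted in Figs. 11-14 can also be read numerically by averaging accuracy over each parameter value. A stdlib sketch over a few rows copied from Fig. 10 (aggregating the full table works the same way):

```python
# Average accuracy per Min DF value, using rows taken from Fig. 10.
from collections import defaultdict
from statistics import mean

# (classifier, min_df, accuracy) triples copied from the results table.
rows = [
    ("mnb", 5, 0.7733),
    ("svm", 5, 0.7733),
    ("svm", 10, 0.7567),
    ("mnb", 10, 0.7367),
]

by_min_df = defaultdict(list)
for clf, min_df, acc in rows:
    by_min_df[min_df].append(acc)

means = {k: round(mean(v), 4) for k, v in by_min_df.items()}
print(means)  # Min DF = 5 averages higher than Min DF = 10, as in Fig. 13
```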


['-', '1', '10', '2', '3', '30', '5', 'absolut', 'also', 'alway', 'amaz', 'ambianc', 'ani', 'anoth', 'anytim', 'anytim soon', 'area', 'around', 'arriv', 'ask', 'atmospher', 'attent', 'authent', 'avoid', 'away', 'awesom', 'back', 'bacon', 'bad', 'bar', 'bare', 'bathroom', 'beauti', 'becaus', 'beef', 'beer', 'befor', 'best', 'better', 'bit', 'bland', 'bread', 'breakfast', 'bring', 'buffet', 'burger', 'busi', 'came', 'check', 'chef', 'chicken', 'chip', 'clean', 'close', 'cold', 'come', 'come back', 'consid', 'cook', 'could', 'custom', 'custom servic', 'day', 'deal', 'decor', 'definit', 'delici', 'delight', 'dessert', 'dine', 'dinner', 'disappoint', 'dish', 'done', 'dri', 'drink', 'drive', 'dure', 'eat', 'eaten', 'egg', 'either', 'enjoy', 'enough', 'even', 'ever', 'everi', 'everyth', 'everyth wa', 'excel', 'expect', 'experi', 'extrem', 'famili', 'fantast', 'far', 'fast', 'feel', 'feel like', 'felt', 'first', 'first time', 'fish', 'flavor', 'food', 'food servic', 'food wa', 'found', 'fresh', 'fri', 'friend', 'friendli', 'full', 'get', 'give', 'go', 'go back', 'good', 'good food', 'got', 'great', 'great food', 'great place', 'great servic', 'ha', 'hand', 'happi', 'hard', 'heart', 'help', 'hi', 'hit', 'home', 'hope', 'horribl', 'hot', 'hour', 'huge', 'ice', 'impress', 'incred', 'insid', 'kept', 'know', 'lack', 'larg', 'last', 'leav', 'left', 'like', 'like thi', 'littl', 'live', 'locat', 'look', 'lot', 'love', 'love thi', 'lunch', 'made', 'make', 'manag', 'mani', 'may', 'meal', 'meat', 'mediocr', 'menu', 'minut', 'money', 'mouth', 'much', 'must', 'need', 'never', 'new', 'next', 'nice', 'night', 'noth', 'old', 'onc', 'one', 'onli', 'order', 'outsid', 'overal', 'overpr', 'owner', 'pay', 'peopl', 'perfect', 'pho', 'pizza', 'place', 'pleas', 'poor', 'portion', 'potato', 'pretti', 'price', 'probabl', 'qualiti', 'quick', 'quit', 'rare', 'real', 'realli', 'realli good', 'reason', 'recommend', 'recommend thi', 'recommend thi place', 'restaur', 'return', 'review', 'right', 'roll', 'rude', 'said', 'salad', 'sandwich', 'sat', 'sauc', 'say', 'seafood', 'seat', 'see', 'select', 'serious', 'serv', 'server', 'server wa', 'servic', 'servic wa', 'set', 'shrimp', 'sick', 'side', 'sinc', 'slow', 'small', 'soon', 'special', 'spici', 'spot', 'staff', 'star', 'stay', 'steak', 'still', 'suck', 'super', 'sure', 'sushi', 'sweet', 'tabl', 'taco', 'take', 'talk', 'tast', 'tasteless', 'tasti', 'tell', 'tender', 'terribl', 'thai', 'thi', 'thi one', 'thi place', 'thi wa', 'thing', 'think', 'thought', 'time', 'took', 'total', 'town', 'treat', 'tri', 'trip', 'twice', 'two', 'us', 'use', 'vega', 'veri', 'veri disappoint', 'veri good', 'visit', 'wa', 'wa amaz', 'wa delici', 'wa first', 'wa good', 'wa great', 'wa pretti', 'wa terribl', 'wa veri', 'wait', 'waiter', 'waitress', 'waitress wa', 'want', 'warm', 'wast', 'watch', 'way', 'well', 'went', 'wine', 'wonder', 'worst', 'worth', 'would', 'wrong', 'year']

Fig. 15. Features selected by the best model.

V. CONCLUSION
This study was conducted to compare the capabilities of two different machine learning classifiers for sentiment analysis of restaurant reviews. The two classifiers compared are the SVM and Naïve Bayes classifiers. Using grid search to assess different hyperparameter combinations for each classifier, a total of 36 models were fitted. Evaluation of these models shows that Naïve Bayes produced the single best model, with the highest accuracy of 77.33% and an F1 score of 0.7792, but for overall performance, SVM slightly outperformed Naïve Bayes, also reaching accuracies of up to 77%. Results also show that bigram and trigram result in better accuracy compared to unigram. Omitting fewer features can also result in better performance. Further research to compare these classifiers can be done by adding more parameters to the grid search, such as the number of features selected and the length of features. Overall, the results from this study show promising accuracy for sentiment analysis of restaurant reviews, and the trained models can be further fine-tuned to aid restaurant businesses in tracking their business performance and reputation.

VI. REFERENCES
[1] Toast Inc., "Restaurant Success in 2019 Industry Report," 2019.
[2] R. Feldman, "Techniques and applications for sentiment analysis," Commun. ACM, vol. 56, no. 4, pp. 82–89, Apr. 2013, doi: 10.1145/2436256.2436274.
[3] D. Maynard and A. Funk, "Automatic Detection of Political Opinions in Tweets," in 8th International Conference on the Semantic Web, 2012, pp. 88–99, doi: 10.1007/978-3-642-25953-1_8.
[4] M. Annett and G. Kondrak, "A Comparison of Sentiment Analysis Techniques: Polarizing Movie Blogs," in Canadian AI 2008: Advances in Artificial Intelligence, 2008, pp. 25–35.
[5] A. L. Firmino Alves, C. de S. Baptista, A. A. Firmino, M. G. de Oliveira, and A. C. de Paiva, "A Comparison of SVM Versus Naive-Bayes Techniques for Sentiment Analysis in Tweets," in Proceedings of the 20th Brazilian Symposium on Multimedia and the Web - WebMedia '14, 2014, pp. 123–130, doi: 10.1145/2664551.2664561.
[6] M. S. Omar, A. Njeru, S. Paracha, M. Wannous, and S. Yi, "Mining tweets for education reforms," in 2017 International Conference on Applied System Innovation (ICASI), May 2017, pp. 416–419, doi: 10.1109/ICASI.2017.7988441.
[7] M. R. D. Ching and R. de Dios Bulos, "Improving Restaurants' Business Performance Using Yelp Data Sets through Sentiment Analysis," in Proceedings of the 2019 3rd International Conference on E-commerce, E-Business and E-Government - ICEEG 2019, 2019, pp. 62–67, doi: 10.1145/3340017.3340018.
[8] J. Camacho-Collados and M. T. Pilehvar, "On the Role of Text Preprocessing in Neural Network Architectures: An Evaluation Study on Text Categorization and Sentiment Analysis," in Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, 2018, pp. 40–46, doi: 10.18653/v1/W18-5406.
[9] V. S and J. R, "Text Mining: open Source Tokenization Tools – An Analysis," Adv. Comput. Intell. An Int. J., vol. 3, no. 1, pp. 37–47, Jan. 2016, doi: 10.5121/acii.2016.3104.
[10] R. Satapathy, C. Guerreiro, I. Chaturvedi, and E. Cambria, "Phonetic-Based Microtext Normalization for Twitter Sentiment Analysis," in 2017 IEEE International Conference on Data Mining Workshops (ICDMW), Nov. 2017, pp. 407–413, doi: 10.1109/ICDMW.2017.59.
[11] A. Sharma and S. Dey, "A comparative study of feature selection and machine learning techniques for sentiment analysis," 2012, doi: 10.1145/2401603.2401605.
[12] N. Banik and M. Hasan Hafizur Rahman, "Evaluation of Naïve Bayes and Support Vector Machines on Bangla Textual Movie Reviews," in 2018 International Conference on Bangla Speech and Language Processing (ICBSLP), Sep. 2018, pp. 1–6, doi: 10.1109/ICBSLP.2018.8554497.
[13] D. A. Kristiyanti, A. H. Umam, M. Wahyudi, R. Amin, and L. Marlinda, "Comparison of SVM & Naïve Bayes Algorithm for Sentiment Analysis Toward West Java Governor Candidate Period 2018-2023 Based on Public Opinion on Twitter," in 2018 6th International Conference on Cyber and IT Service Management (CITSM), Aug. 2018, pp. 1–6, doi: 10.1109/CITSM.2018.8674352.
[14] M. V. Mäntylä, D. Graziotin, and M. Kuutila, "The evolution of sentiment analysis—A review of research topics, venues, and top cited papers," Comput. Sci. Rev., vol. 27, 2018, doi: 10.1016/j.cosrev.2017.10.002.


[15] M. Taboada, "Sentiment Analysis: An Overview from Linguistics," Annu. Rev. Linguist., vol. 2, no. 1, 2016, doi: 10.1146/annurev-linguistics-011415-040518.
[16] K. Kaviya, C. Roshini, V. Vaidhehi, and J. D. Sweetlin, "Sentiment analysis for restaurant rating," in 2017 IEEE International Conference on Smart Technologies and Management for Computing, Communication, Controls, Energy and Materials (ICSTM), Aug. 2017, pp. 140–145, doi: 10.1109/ICSTM.2017.8089140.
[17] Q. Gan, B. H. Ferns, Y. Yu, and L. Jin, "A Text Mining and Multidimensional Sentiment Analysis of Online Restaurant Reviews," J. Qual. Assur. Hosp. Tour., vol. 18, no. 4, pp. 465–492, Oct. 2017, doi: 10.1080/1528008X.2016.1250243.
[18] A. Krishna, V. Akhilesh, A. Aich, and C. Hegde, "Sentiment Analysis of Restaurant Reviews Using Machine Learning Techniques," 2019, pp. 687–696.
[19] B. Yu, J. Zhou, Y. Zhang, and Y. Cao, "Identifying restaurant features via sentiment analysis on yelp reviews," 2017.
[20] M. Ahmad, S. Aftab, M. S. Bashir, N. Hameed, I. Ali, and Z. Nawaz, "SVM optimization for sentiment analysis," Int. J. Adv. Comput. Sci. Appl., vol. 9, no. 4, 2018, doi: 10.14569/IJACSA.2018.090455.
[21] A. Micu, A.-E. Micu, M. Geru, and L. Radu, "Analyzing user sentiment in social media: Implications for online marketing strategy," Psychol. Mark., vol. 34, no. 12, 2017, doi: 10.1002/mar.21049.
[22] A. Rahman and M. S. Hossen, "Sentiment Analysis on Movie Review Data Using Machine Learning Approach," Sep. 2019, doi: 10.1109/ICBSLP47725.2019.201470.
[23] L. Dey, S. Chakraborty, A. Biswas, B. Bose, and S. Tiwari, "Sentiment Analysis of Review Datasets Using Naïve Bayes' and K-NN Classifier," Int. J. Inf. Eng. Electron. Bus., vol. 8, no. 4, 2016, doi: 10.5815/ijieeb.2016.04.07.
[24] K. L. S. Kumar, J. Desai, and J. Majumdar, "Opinion mining and sentiment analysis on online customer review," 2016, doi: 10.1109/ICCIC.2016.7919584.
[25] T. K. Shivaprasad and J. Shetty, "Sentiment analysis of product reviews: A review," Mar. 2017, doi: 10.1109/ICICCT.2017.7975207.
[26] M. S. Mubarok, Adiwijaya, and M. D. Aldhi, "Aspect-based sentiment analysis to review products using Naïve Bayes," 2017, doi: 10.1063/1.4994463.
[27] N. K. Singh, D. S. Tomar, and A. K. Sangaiah, "Sentiment analysis: a review and comparative analysis over social media," J. Ambient Intell. Humaniz. Comput., vol. 11, no. 1, 2020, doi: 10.1007/s12652-018-0862-8.
[28] A. I. Pratiwi and Adiwijaya, "On the Feature Selection and Classification Based on Information Gain for Document Sentiment Analysis," Appl. Comput. Intell. Soft Comput., vol. 2018, 2018, doi: 10.1155/2018/1407817.
[29] A. Madasu and S. Elango, "Efficient feature selection techniques for sentiment analysis," Multimed. Tools Appl., vol. 79, pp. 6313–6335, 2020, doi: 10.1007/s11042-019-08409-z.
[30] A. Tripathy, A. Agrawal, and S. K. Rath, "Classification of sentiment reviews using n-gram machine learning approach," Expert Syst. Appl., vol. 57, pp. 117–126, 2016, doi: 10.1016/j.eswa.2016.03.028.
[31] K. Dave, S. Lawrence, and D. M. Pennock, "Mining the peanut gallery: opinion extraction and semantic classification of product reviews," in WWW '03: Proceedings of the 12th International Conference on World Wide Web, 2003, pp. 519–528, doi: 10.1145/775152.775226.
[32] A. Hasan, S. Moin, A. Karim, and S. Shamshirband, "Machine Learning-Based Sentiment Analysis for Twitter Accounts," Math. Comput. Appl., vol. 23, no. 1, p. 11, Feb. 2018, doi: 10.3390/mca23010011.


Developing An Automated Face Mask Detection


Using Computer Vision and Artificial Intelligence
1st Samuel Mahatmaputra Tedjojuwono
Business Information Systems Program, Information Systems 2nd Sheryl Livia Sulaiman
Department Business Information Systems Program, Information Systems
Faculty of Computing and Media, Bina Nusantara University Department
Jakarta, Indonesia 11480 Faculty of Computing and Media, Bina Nusantara University
[email protected] Jakarta, Indonesia 11480
[email protected]

Abstract—As the number of people affected by COVID-19 keeps rising, wearing masks and washing hands have become the most important protocols for preventing the spread of COVID-19. With the pandemic now almost a year old, people have started going out to public places again, whether to eat out, work, or shop for groceries. Many people, however, do not wear masks properly, leaving them below the nose or pulled down to the chin. Hence, in this project a mask detection system is built to detect in real time whether people are wearing a mask or not, and to generate a business intelligence report so that a shop owner can be aware of the number of people not wearing a mask per day. The system also reports a percentage of how properly the mask is worn: the more properly it is worn (fully up to the nose), the higher the percentage. Such a system is useful in a pandemic because it is hard to keep track of the number of people not wearing masks, especially in a big crowd or a large space, where one unmasked person can greatly affect others.

Keywords—Augmented Reality, User Experience, Furniture Shop.

I. INTRODUCTION

A. Background
COVID-19 has been ongoing from late 2019 until now, with no definite end. With the increasing number of people affected by COVID-19, health has become the most important thing to be valued. The most effective prevention of COVID-19 is to wear a mask and wash our hands. In a study led by Japanese researchers [1], wearing a mask reduces by 60% the amount of virus that escapes from an infected individual.

With Indonesia's large population, new cases are increasing rapidly every day. Even under large-scale social restrictions, people are still going out. The city with the highest number of new daily cases is Jakarta. Figure 1 shows the statistics of new daily cases as of 8 March 2021.

Figure 1: Indonesia COVID-19 Statistics [2].

The increase of COVID-19 in Indonesia has put everyone at risk. Indonesia is also facing an economic recession due to people losing their jobs and changing their lifestyles. According to The Jakarta Post [3], hospitals are also facing a crisis, as they do not have enough room to accommodate people while patients' conditions worsen and the ICUs are full. Face mask detection systems have started to be implemented in Jakarta; as written in [4], Jakarta Smart City has begun involving face mask detection in public.

Since the rise of the COVID-19 pandemic, face mask detection has been one of the most effective solutions for monitoring a large crowd, keeping track of the number of people not wearing masks, and promoting personal health protection. Distinctive facial recognition under mandatory face mask conditions offers an opportunity for automated identification. The components used in this project are computer vision and deep learning using a CNN, which has demonstrated excellent performance in areas such as object detection, image processing, and image segmentation.

B. Scope
The project scope focuses on creating a face mask detection system whose output, presented through business intelligence, shows the number of people who are wearing masks and who are not in a day. The following scope is covered in this project:
1. Create an automated face mask detection system using Python scripts.
2. Connect Python with MySQL and store the number of people wearing or not wearing a mask in a database with the corresponding columns (Timestamp, Subject, Percentage).
3. Connect the database to business intelligence and produce the output there for an easier view.

C. Aims and Benefits
This project aims to develop an automated mask detection system that lets shop owners and security track and see the number of people wearing masks or not. Although security staff can spot someone not wearing a mask by eye, it is hard to detect a person or a crowd without masks in a large space, or when no security is present on the spot.
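The database-logging and aggregation steps named in the scope (store Timestamp, Subject, and Percentage, then feed a BI view) can be sketched as follows. This is a minimal illustration, not the authors' scripts: sqlite3 stands in for the MySQL database the paper names, and the table name, the extra Label column, and the sample rows are assumptions added so wearing/not wearing can be counted per day.

```python
import sqlite3
from datetime import datetime

# Table mirroring the columns listed in the scope (Timestamp, Subject,
# Percentage); the table name and the Label column are assumptions added
# so that wearing / not wearing can be aggregated for the BI view.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE detections (
                    Timestamp TEXT, Subject INTEGER,
                    Percentage REAL, Label TEXT)""")

def log_detection(conn, subject, percentage, label):
    """Insert one detection record, as the Python scripts would per frame."""
    conn.execute("INSERT INTO detections VALUES (?, ?, ?, ?)",
                 (datetime.now().isoformat(timespec="seconds"),
                  subject, percentage, label))

# Simulated detections for one day (illustrative values).
log_detection(conn, 1, 98.95, "Mask")
log_detection(conn, 2, 94.49, "No Mask")
log_detection(conn, 3, 100.00, "No Mask")

# Daily aggregate a BI dashboard could chart: people per label.
rows = conn.execute("""SELECT Label, COUNT(*) FROM detections
                       GROUP BY Label ORDER BY Label""").fetchall()
print(rows)  # [('Mask', 1), ('No Mask', 2)]
```

With MySQL, only the connection call and placeholder style change; the insert-per-record and GROUP BY aggregation remain the same.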

978-1-6654-4002-8/21/$31.00 ©2021 IEEE


Moreover, if a person is infected with COVID-19, it becomes easier to track where he/she has been for the last 2 weeks. With this automated face mask detection, the business intelligence view shows the number of people wearing a mask or not during each day at the place where the system is implemented, so we can see how many people may have been infected with COVID-19 because of this person.

This face mask detection system uses deep learning to detect a person with or without a mask. The system will be connected to a camera.

This automated face mask detection is beneficial for supermarkets, shopping malls, restaurants, or any other public places, which can obtain the following benefits:
1. Keep track of the number of people not wearing masks per day.
2. Provide an easier view of the statistics by implementing business intelligence.
3. Maintain health protocols by reducing the risk to people inside the place.

D. Hypotheses
A set of hypotheses formulated before commencing this project is listed below:
1. The implementation of automated face mask detection and deep learning can solve the problem of monitoring and keeping track of a crowd of people not wearing masks.
2. By connecting the automated face mask detection to a database, the system will count and segregate the number of people wearing a mask and not, and store it in the database.
3. Implementing automated face mask detection helps organizations track, on a day-to-day scale, and narrow down the search for potentially newly infected people, by using the statistics this system produces of how many people are wearing a mask or not and by projecting a rough estimate of how many people could be affected by COVID-19 in a particular place.

II. THEORETICAL FOUNDATION

A. Deep Learning
Deep learning is an element of machine learning that imitates human action and ways of thinking. Deep learning aims to produce interpretations similar to those a human would make by analyzing data continuously. To accomplish this, deep learning utilizes a multi-layered structure for calculations known as a neural network.

As written in [5], the computer collects information by learning. A hierarchy of concepts lets the computer grasp complex information by building it out of smaller pieces.

B. Face Detection
Face detection is a computer technology used in various applications to detect and distinguish human faces in images [6]. Face detection is used in biometrics, frequently as a part of (or along with) a facial recognition framework. It is additionally used in video surveillance, human-computer interfaces, and image database management. Nowadays, face recognition algorithms are used in CCTVs, digital cameras, and video conference systems for motion and face detection.

C. Convolutional Neural Network
A Convolutional Neural Network (CNN) is a subset of deep learning that takes an image as input, processes it, and classifies it under different categories [7]. A CNN works by recognizing and classifying content inside images from learned geometric structures; its multi-layered structure is effective in acquiring and estimating the necessary features of structures in an image [8]. In the field of computer vision, CNN is a widely and predominantly used algorithm model for its great competency in extracting facial features. After a CNN is trained, the model can identify and recognize images of the object it has been trained to classify; for this thesis, the object is a face [9].

III. PROBLEM ANALYSIS
From the survey conducted, a total of 34 respondents responded to a total of 16 questions. The survey is titled "Survey for Developing an Automated Face Mask Detection during COVID-19 Pandemic", as it was made to identify the existing problems surrounding the importance of wearing masks during COVID-19 and how frequently the respondents have encountered people not wearing masks in public. This survey can be used to find out whether automated face mask detection is found useful by the respondents. We can also find out from this survey how often the respondents go out to public places, or whether they have taken off their masks when there is no security around, thus increasing the need to develop an automated face mask detection.

A. The respondents

Fig. 2. Respondents' age, gender, occupation, and area.

The findings for the demographic questions show that most of the respondents are students, in the age range of 19-29 years old, and most of the respondents are female. The area where most respondents live is Indones Utara.

B. Respondents Experience and Behavior
With the need to collect answers and to find out the existing problems, several questions were created that highlight the respondents' behavior during the COVID-19 pandemic and the existing problem. Some of the responses are as follows:
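The convolution operation at the core of the CNN described in Section II-C can be illustrated without any framework: a small kernel slides over the image and yields a feature map of local responses. The 3x5 toy image and the vertical-edge kernel below are illustrative assumptions, not the paper's trained network.

```python
def conv2d(image, kernel):
    """Valid 2D convolution (stride 1, no padding) of a single channel."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(ow)]
            for i in range(oh)]

# Toy 3x5 grayscale patch with a vertical edge between columns 2 and 3.
image = [[0, 0, 0, 1, 1],
         [0, 0, 0, 1, 1],
         [0, 0, 0, 1, 1]]

# A vertical-edge kernel: it responds where intensity jumps left-to-right,
# the kind of low-level feature early CNN layers learn before deeper
# layers combine them into parts of a face.
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]

feature_map = conv2d(image, kernel)
print(feature_map)  # [[0, 3, 3]] -- strongest response at the edge
```

A trained CNN learns many such kernels per layer instead of hand-writing them, and stacks layers so that edge responses combine into higher-level features such as a nose or a mask boundary.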


Fig. 3. Respondents' Experience and Behavior.

Regarding whether the respondents have seen someone not wearing a mask in public places, 32.4% of the respondents occasionally saw someone in public not wearing a mask with no security stopping him/her; second and third place are a tie, with 26.5% of the respondents each having very frequently and frequently seen people with no mask on; and in last place, 14.7% of the respondents rarely saw people with no mask on. This shows that people occasionally and frequently saw someone without a mask whom no security stopped. This is dangerous and could spread the virus rapidly in a public place. It also shows that no one selected "never", which means every respondent has seen people not wearing a mask in a public place at least once.

Respondents were also asked if "they have taken their mask off in the event there are no other individuals around" or "have taken their mask off when security isn't around". A percentage of 70.6% answered yes and 29.4% of our respondents answered no. Most of our respondents may not be following the health protocols properly, as the majority have taken off their masks when no security or no one is around. This is not a good habit, as the virus may still be present in the air even when no one is around.

When asked whether the respondents have experienced "not being permitted into a place without a mask" or "being told to wear a mask by another person", a large percentage of our respondents, 67.6%, answered yes. The responses show that most of the respondents have experienced this, which means they are not following the health protocols as told.

C. Respondents Opinion
To find out whether developing this system would be useful, questions were asked to figure out whether the respondents would find developing an automated face mask detection with an intelligent dashboard important and helpful.

Fig. 4. Respondents' Opinion.

From the pie chart above, we can see that most of the respondents find an automated face mask detection very important to detect and keep track of the number of people in a day not wearing masks in public places, and they also find this system helpful for maintaining health protocols in public places. From these two questions, it is noted that developing and implementing this system is beneficial during the COVID-19 outbreak.

D. Related Works
The focus of this literature review is to discuss the need for automated face mask detection throughout the COVID-19 outbreak and the problems of people not wearing masks in public places. While there has been much research on wearing masks throughout the COVID-19 outbreak, there is only a little research on automated face mask detection. This topic is important due to the daily rise in the number of COVID-19 cases, which makes developing an automated face mask detection useful during this period. The timeline of the sources gathered in this literature review is the years 2020-2021, as COVID-19 started in 2020. From this literature review, we can gather information on how people not wearing masks can greatly affect others, especially in public places, thus raising the need for and importance of developing an automated face mask detection.

Bosheng Qin and Dongxiao Li agreed that although wearing a face mask may help to prevent the spread of the COVID-19 virus, the efficiency of face masks in preventing airborne spread of the virus in public places has decreased, mostly due to improper wear. Thus, it has become a necessity to develop an automated face detection solution for public places that mandate face-mask-wearing conditions, which helps both an individual's health defense and other people's health [9].

Another study focusing on developing a face mask detector, by Mingjie Jiang and Xingqi Fan, theorizes that numerous public places require customers to enter the building only if they wear masks. Because research relating to face mask detection is limited, it has become very important for computer vision to help society [10].

The study conducted by Sammy V. Militante also confirmed that wearing face masks will efficiently block COVID-19 viruses so that they cannot spread and reach human lungs via air, and that face detection is a low-cost method to lessen the death and infection rates. Although the effectiveness of face masks in stopping the spread of COVID-19 in public places is reduced by improper use, it is still important to develop an automated face mask detection [8].

All three pieces of literature cited demonstrate similarity with this project of developing a face mask detection, as shown by Sammy V. Militante, who stated that the contributions of their project are as follows: create a system for categorizing face images using CNN and implement deep learning methods to identify people in face mask-wearing conditions [8]. Although the implementations use different architectures and methods, all three works of literature use CNN to develop their face mask detection, are similar to this project, and share the same goal, which is to promote proper wear of masks during COVID-19.


IV. PROPOSED SOLUTION

A. System Design and User Interactions
An application prototype is designed for this automated face mask detection system according to the user requirements gathered after interviewing 3 people with the occupations of security, shop manager, and shop owner. Use case and sequence diagrams are also developed to illustrate the interactions between the external actors (security/shop owner/shop manager) and the internal actor (system analyst) and the system. Below is the current development of the system design:

Fig. 5. Application prototype design for automated face mask detection system.

Fig. 6. Use Case Diagram for Automated Face Mask Detection System - Security/ Shop Owner/ Shop Manager.

Fig. 7. Use Case Diagram for Automated Face Mask Detection System - System Analyst.

Fig. 8. Sequence Diagram for Automated Face Mask Detection System - Security/ Shop Owner/ Shop Manager.

Fig. 9. Sequence Diagram for Automated Face Mask Detection System - System Analyst.

B. Solution Implementation
The aim of testing the solution is to verify that the system works properly before implementing it in public places. After this project is fully developed, the system will be implemented in Jakarta, Indonesia.

Fig. 10. Detecting real-time video of one person and one printed image.


In Figure 10, the face mask detector detects two people and identifies whether each of them is wearing a mask or not. The accuracy percentage of how properly the mask is worn is shown, together with the subject number (the number of people present). For this experiment, we tried with one person and a printed image to recreate the illusion of a second person present in front of the camera.

Fig. 11. Close-up picture of records inserted in the database from Anaconda Prompt.

Figure 11 shows that the records are inserted into the database with the mask percentage and subject as a new row via the Anaconda Prompt. New rows keep being added every second.

Fig. 12. Database view for real-time video with two people.

In Figure 12, the database stores the timestamp, subject number, the percentage of mask-wearing accuracy, and whether a person is wearing a mask or not.

Fig. 13. Face Mask Detector for webcam/video stream detecting one person wearing a mask not covering the mouth.

Figure 13 shows the algorithm detecting a person wearing a mask below the mouth as not wearing a mask, with a percentage of 94.49%. If the person is not wearing a mask at all, the algorithm shows No Mask: 100%.

Fig. 14. Database view for detecting real-time video with 1 person wearing a mask below the mouth.

Figure 14 shows that the percentage ranges from 94.49% to 98.95%. This is due to the movement of the person; note that movement affects the percentage.

Fig. 15. Face Mask Detector for webcam/video stream detecting one person wearing a mask covering the mouth only.

In Figure 15, the algorithm detects the person as wearing a mask even though the person covers only the mouth and not the nose. This may be because the algorithm is not trained with images of people wearing a mask improperly, such as below the mouth or covering only the mouth. Due to this, the algorithm fails to detect improper mask wear accurately.

Fig. 16. Database view for detecting real-time video with 1 person wearing a mask covering the mouth only.

Figure 16 shows that the percentage ranges from 98.84% to 98.98%. This is again due to the movement of the person; note that movement affects the percentage.
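The labels and percentages observed in Figures 13-16 are consistent with a two-class classifier whose displayed percentage is the winning class's confidence. The sketch below shows that decision step under stated assumptions: the function name, the example probabilities, and the implicit 0.5 threshold are illustrative, not taken from the authors' scripts.

```python
def classify(mask_prob):
    """Map a two-class model output to the on-screen label.

    mask_prob is the model's probability that a mask is worn; the
    complement is the 'No Mask' probability. The displayed percentage
    is the winning class's confidence, which is why a bare face shows
    'No Mask: 100%' while a mask worn below the mouth can still be
    flagged 'No Mask' with ~94% confidence.
    """
    no_mask_prob = 1.0 - mask_prob
    if mask_prob >= no_mask_prob:
        return "Mask", round(mask_prob * 100, 2)
    return "No Mask", round(no_mask_prob * 100, 2)

print(classify(0.9895))  # ('Mask', 98.95)
print(classify(0.0551))  # ('No Mask', 94.49)
print(classify(0.0))     # ('No Mask', 100.0)
```

Because the percentage is a class confidence rather than a "properness" score, small head movements shift the model's probability slightly, matching the 94-99% fluctuations reported in the figures.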


Fig. 17. Face Mask Detector for Detecting people in an image.

Figure 17 shows the system can detect only 4 people, although there are more people in the image. This may be because the people at the back are quite blurry, which makes them hard to detect.

Fig.17. Database view for detecting real-time video with 6 people.

C. Result and Discussion
The experiment showed positive results when detecting a smaller number of people rather than a larger number. By testing the automated face mask detection system with real-time video and images, we learned that the distance between the camera and the person also affects the accuracy. The position of the person is another reason the system could not detect a person's face; for example, if one person's face is covered by another person, the system will not be able to detect him/her. Based on the current development, this automated face mask detection system can detect fewer than 6 people from an eye-level camera angle.

The percentage can decrease and increase within a range of 0.01%-5% when detecting real-time video, due to the movement of the people, which affects the accuracy percentage. However, more testing needs to be done to be surer of the percentage range changes due to movement.

To improve the algorithm in detecting a larger number of people, a large number of images of crowds must be added to the dataset folder and trained. However, due to time limitations and the difficulty of obtaining dataset images of crowds, training for crowds was not done.

Improper wear of a mask is not within the training data, where the classification is wearing or not wearing a mask. If the research were to be extended, a third classification for improper mask wear would need to be included.

V. CONCLUSION
This paper implements an Automated Face Mask Detection for research purposes during the COVID-19 pandemic. To conclude, the algorithm works well and is able to detect people wearing a mask and not wearing a mask, and the results were positive.

Developing the Automated Face Mask Detection has also fully achieved the aim of improving customers' experience in public places by designing a system according to the existing problems reported by the respondents.

The classification of this work is mainly divided into two classes that have been successfully implemented: wearing a mask and not wearing a mask. A third classification, such as improper wear of a mask, would be considered as our next research topic.

ACKNOWLEDGMENT
This work is supported by the Research and Technology Transfer Office, Bina Nusantara University, as a part of Bina Nusantara University's International Research Grant entitled "Computational Intelligence and Advanced Predictive Data Analytics of Covid-19 Transmission Dynamics and Automated Detection of Health Protocol Compliance" with contract number No.017/VR.RTT/III/2021 and contract date 22 March 2021.

VI. REFERENCES
[1] AP, "Masks, explained: What percentage of the virus can be blocked, and what it means for the infected and uninfected?," Economic Times, 18 November 2020. [Online]. [Accessed 9 March 2021].
[2] "Coronavirus update (live): 118,204,121 cases and 2,623,297 deaths from COVID-19 virus pandemic - worldometer," Worldometers.info. [Online].
[3] T. J. Post, "In 'critical' condition, hospitals struggle to decide who gets into ICU," Thejakartapost.com, 30 December 2020. [Online]. [Accessed 9 March 2021].
[4] N. Andarningtyas, "Jakarta Smart City gandeng startup kembangkan fitur COVID-19," ANTARA, 6 February 2021. [Online]. [Accessed 9 March 2021].
[5] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, London, England: MIT Press, 2016.
[6] C. Bernstein, "What is face detection and how does it work?," Techtarget.com, February 2020. [Online]. [Accessed 9 March 2021].
[7] Prabhu, "Understanding of Convolutional Neural Network (CNN) — Deep Learning," Medium, 4 March 2018. [Online]. [Accessed 28 March 2021].
[8] S. V. Militante and N. V. Dionisio, "Real-Time Facemask Recognition with Alarm System using Deep Learning," 2020.
[9] B. Qin and D. Li, "Identifying Facemask-Wearing Condition Using Image Super-Resolution with Classification Network to Prevent COVID-19," Sensors (Basel), vol. 20, no. 18, p. 5236, 2020.
[10] M. Jiang, X. Fan and H. Yan, "RetinaMask: A Face Mask detector," 2020. [Online]. [Accessed 17 March 2020].


Blockchain Technology behind Cryptocurrency and Bitcoin for Commercial Transactions
Frederik Arnold Cahyadi
Computer Science Department, School of Computer Science
Bina Nusantara University, Jakarta, Indonesia 11480
[email protected]

Albert Ivando
Computer Science Department, School of Computer Science
Bina Nusantara University, Jakarta, Indonesia 11480
[email protected]

Owen Franseda Ricardo Alexander
Computer Science Department, School of Computer Science
Bina Nusantara University, Jakarta, Indonesia 11480
[email protected]

A S Gunawan
Computer Science Department, School of Computer Science
Bina Nusantara University, Jakarta, Indonesia 11480
[email protected]

Abstract— Blockchain is a technology used as a digital data record system connected through cryptography. Meanwhile, cryptocurrency is a digital asset understood as a digital currency, mainly based on blockchain technology. This research aims to understand how blockchain works inside cryptocurrency by conducting a systematic literature review (SLR). In the future, cryptocurrency may replace paper currency with digital currency. Blockchain serves as a security system to prevent loss or duplication of data, and this new technology improves security systems, especially in finance. Based on our study, it can be concluded that blockchain is the right technology for cryptocurrency in commercial transactions because it allows cryptocurrency to work without a central authority. This can reduce risk as well as transaction costs.

Keywords—Blockchain, Bitcoin, Commercial Transaction, Cryptocurrency, Cyber Security, Cryptography

I. INTRODUCTION
Blockchain is an innovation strongly related to cryptocurrency. Blockchain is a record of digital transactions used to record exchanges made with cryptocurrencies such as bitcoin. Blockchain is a distributed ledger over a decentralized peer-to-peer network, with unlimited digital transactions across the network and no third parties required [1]. This innovation can be used in different areas such as cryptocurrency because it has a number of valuable features such as reliability and traceability. Blockchain has been described as a technology for storing and transmitting information without a controlling body [2]. In fact, it is a distributed database whose data is sent by users; entries added to the database are checked and assembled at time intervals into blocks, all of which are secured by cryptography, forming a chain [2]. Blockchain is a piece of dynamic information technology set to transform computing and disrupt several ventures with more innovative settings [3]. In recent years, blockchain has grown rapidly, starting from bitcoin, the first cryptocurrency created, which stemmed from a proposal to bypass the financial system and make payments on a peer-to-peer basis. The initiative emerged after the 2007-2008 financial crisis and was described in a white paper posted anonymously [4]. Due to the transition to blockchain, blockchain-based applications have become involved in our daily lives. As the number of users of blockchain systems increases extensively, major public-chain scalability issues have arisen and greatly affected blockchain development.

Cryptocurrency is digital money mainly based on blockchain technology. It is the coming revolution of currencies and technology that has the potential to change the world [5]. Cryptocurrency is a digital asset designed as a medium of exchange in which ownership of individual coins is recorded in a digital ledger or computerized database. The creation of monetary units and the confirmation of fund transfers are secured using encryption techniques and distributed over numerous devices (nodes) on peer-to-peer networks [6]. This whole transaction history can be freely verified at every node, since everyone holds a copy of the shared ledger. This shared ledger, generally in the form of a blockchain, consists of transaction sub-chains and is continually confirmed by a process called "mining". Through this process, new cryptocurrency units are created. Anybody is free to join and leave the cryptocurrency system at any time, without users' identities being attached.

These technology-enhanced and privacy-preserving features make cryptocurrency potentially distinct from other existing financial instruments and have caught the attention of many investors and researchers. The initial financial literature on cryptocurrencies mainly focused on assessing the efficiency of bitcoin prices using autocorrelation ratio tests and found that, although bitcoin showed signs of inefficiency in the early period 2010-2013, it gradually advanced towards a more efficient market later on [6]. The bitcoin market can be considered informationally efficient, yet it looks highly volatile [7]. Bitcoin has become the most popular digital money in the market. However, Bitcoin's popularity has attracted adversaries who exploit the Bitcoin network for selfish gain and profit. Nowadays, roughly more than one thousand cryptocurrencies are in operation, many of which have only recently entered the market. Among all these legitimate currencies, Bitcoin's great popularity and high market capitalization make it attractive for adversaries to launch various security attacks. Against these attacks, there are few effective and compelling security solutions that can guarantee the normal operation of Bitcoin in the future. Alongside security, the distributed nature of the Bitcoin blockchain leads to gaps in user security and anonymity requirements.

Cryptocurrency and bitcoin have gone through different developments as currencies and as digital assets. Most cryptocurrencies are clones of bitcoin or other crypto coins. The cryptocurrency industry itself is both worldwide and localized. Cryptocurrencies can be traded without limitations; trades can be done by anybody and at any time. Consequently, many individuals are mining crypto coins like bitcoin [8]. In terms of crypto security, all transactions are recorded on the blockchain and can be retrieved by the client while guaranteeing the user's anonymity. In this way, clients with a public key can perform crypto coin transactions tied to the corresponding private key [2]. Cryptocurrency, particularly bitcoin, greatly influences the economies of several nations. Bitcoin has been counted as crypto money alongside fiat cash. As the number of cryptocurrencies involved has grown,
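The mining process described in the introduction, by which transactions are confirmed and new cryptocurrency units are created, can be sketched as a proof-of-work search: find a nonce whose block hash meets a difficulty target. This is a toy illustration with Python's hashlib; the transaction string, the string-based block format, and the tiny difficulty are assumptions, and Bitcoin's real target arithmetic is far more involved.

```python
import hashlib

def mine(block_data: str, difficulty: int = 2):
    """Find a nonce whose SHA-256 block hash starts with `difficulty`
    zero hex digits -- a toy stand-in for Bitcoin's difficulty target."""
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}:{nonce}".encode()).hexdigest()
        if digest.startswith("0" * difficulty):
            return nonce, digest
        nonce += 1

nonce, digest = mine("alice->bob:1BTC")
# Any node can verify the expensive search with a single cheap hash,
# which is what lets every peer confirm the shared ledger independently.
assert hashlib.sha256(f"alice->bob:1BTC:{nonce}".encode()).hexdigest() == digest
print(nonce, digest[:8])
```

The asymmetry shown here (many hashes to find a nonce, one hash to check it) is what makes the ledger cheap to verify at every node but expensive to rewrite.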


they have entered a more profound financial environment, • RQ 2 : How do bitcoin transactions work and affect
they will end up more related and related to fiat monetary various aspects, especially finance?
forms and may indeed be gotten to be successors [9]. Since of
this, increasingly offenders are beginning to utilize The main filter criteria in our research are research articles
cryptocurrency. By utilizing blockchain, we will store that discussing cryptocurrencies and blockchain articles. And
information securely. In spite of the fact that there's still a the other criteria are research articles at least 5 years before
plausibility that somebody can hack it, but by persistently this year (2016-2021), and research articles that discussing
overhauling the framework blockchain, the chances of cryptocurrencies transactions.
somebody hacking it'll be indeed littler. III. RESULTS
In Indonesia itself, cryptocurrency regulations have been
regulated by Commodity Futures Trading Regulatory Agency A. RQ 1 : What’s the difference between cryptocurrencies
or BAPPEBTI. Through regulation of BAPPEBTI No. 5 year transactions and traditional transactions?
of 2019, cryptocurrency assets were confirmed as
commodities that can be traded on the futures exchange. This Table I. Difference Table
regulation provides a clear legal umbrella for the
cryptocurrency industry and is growing in Indonesia. We Transactions
describe the methodology of the study in chapter 2, followed by the results of the systematic literature review (SLR) in chapter 3. Discussion of the results that the authors gathered is given in chapter 4, and the paper closes with a conclusion in chapter 5.

II. METHODOLOGY

The methodology used in this paper is SLR (Systematic Literature Review). It begins with some inquiries related to this study. The next step is collecting information from articles and the web, and then reviewing, evaluating, and identifying all the research sources that have been obtained; all of these resources discuss blockchain technology and cryptocurrencies. The last step is to report the results of the work, especially the results of the evaluation and identification in this research.

Figure 1. PRISMA Flowchart

The research questions that we have designed are as follows:

• RQ 1: What is the difference between cryptocurrency transactions and traditional transactions?
• RQ 2: How do bitcoin transactions work and affect various aspects, especially finance?

A. RQ 1: What is the difference between cryptocurrency transactions and traditional transactions?

Table I. Differences between cryptocurrency and traditional transactions

  Difference    Cryptocurrencies                    Traditional
  Security      Using peer-to-peer system.          Using third party (banking).
  Control       Decentralized.                      Centralized.
  Exchange      Digital medium of exchange.         Physical medium of exchange.
  Supply        Limited.                            Unlimited.
  Value         Determined by supply and demand.    Determined by market and regulation in country.
  Production    Produced by computers.              Produced by government.

Blockchain is a security technology commonly used in cryptocurrencies to prevent data loss or duplication. These days more than 60% of business trades appear instantly on the web, so this environment requires a high level of security for trades to complete safely [3]. In this way, cybersecurity has become a problem: network security is not yet optimized to verify data, not only in information technology businesses but also across the broader web. Blockchain is a distributed ledger that is fully open to anybody within the network [10]. One of the most important properties of blockchain is that once a block is added to the blockchain, it is extremely difficult to alter the data in it. A block comprises a number of valid transactions that are hashed and encoded into a Merkle tree and included in the block. Cryptocurrency may still be unfamiliar to us, and we may not know how to use it; the technology behind cryptocurrency can seem alien to people who do not understand technology and finance [11]. To understand clearly how cryptocurrency works, we must first know the basics of money. For the market, cryptocurrency has pushed technology toward monetary and financial matters. The execution cost efficiency and volatility of cryptocurrencies have been examined by many researchers; in an efficient market, prices continuously reflect the information available to the market [12]. The cryptocurrency we hear about most frequently is bitcoin. In a recent paper, Foley, Karlsen, and Putniņš highlighted that cryptocurrencies have grown quickly in price, popularity, and mainstream adoption. As of December 2019, there were more than 4,900 cryptocurrencies on the market with a market capitalization of over 197 billion USD [13]. Bitcoin was the first cryptocurrency created, with approximately five thousand more cryptocurrencies now following in its footsteps.
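The block structure described above, in which valid transactions are hashed pairwise into a Merkle tree whose root is stored in the block, can be sketched in a few lines of Python. This is an illustrative toy rather than any real client's implementation; the transaction strings are made up, and real Bitcoin applies the same double SHA-256 to binary-serialized transactions rather than text labels.

```python
import hashlib

def sha256d(data: bytes) -> bytes:
    """Bitcoin-style double SHA-256."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def merkle_root(tx_hashes: list) -> bytes:
    """Hash leaves pairwise, level by level, down to a single root.
    An odd leaf at any level is paired with itself, as in Bitcoin."""
    level = list(tx_hashes)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last hash
        level = [sha256d(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]

# Toy transactions: tampering with any one of them changes the root,
# which is why altering a block already in the chain is so difficult.
txs = [b"alice->bob:1BTC", b"bob->carol:2BTC", b"carol->dave:3BTC"]
print(merkle_root([sha256d(tx) for tx in txs]).hex())
```

Changing a single byte of any transaction produces a completely different root, so a block's contents cannot be edited without invalidating its hash and, in turn, the hash of every later block.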

116 28 October 2021, Jakarta - Indonesia
Bitcoin and other cryptocurrencies spark far-reaching debate in terms of their legal status and role as money. The popularity of cryptocurrencies as speculative assets is confirmed by the actions of market participants. Bitcoin performs the functions of money (medium of exchange, store of value, and unit of account) only to a limited extent [14]. Monthly data on the bitcoin cryptocurrency up to February 2020 compare it with the US dollar: red and green columns along the date axis indicate monthly sell and buy orders in BTC, a corresponding bar chart on the right-hand side shows the same amounts in US dollars, and red and blue lines trace the exchange rate over time from before 2015 to February 2020 [15]. Different streams of literature consider the relationships across cryptocurrencies, or between cryptocurrencies and conventional assets [16]. There is evidence of a relative decoupling of bitcoin, Ripple, and Litecoin from stock, government bond, and gold indices, thus offering some diversification benefits for investors in the short term [17]. A positive but time-varying conditional correlation has been found between cryptocurrencies (Bitcoin, Ripple, Dash, Monero), confirming their negligible correlation with conventional assets [18]. In addition, evidence was found that herding behavior declined during market downturns and that the smallest coins followed the larger ones (not just bitcoin) [19]. It is reported that the average correlation between cryptocurrency returns is trending upward, showing that market interconnectedness increases over time [20]. Although bitcoin appears to be isolated from other financial assets throughout the period, market interrelationships emerge when sub-periods are carefully examined.

B. RQ 2: How do bitcoin transactions work and affect various aspects, especially finance?

Table II. Bank and Bitcoin Differences

  Aspects    Bank                          Bitcoin
  Assets     Round, Deposit, Gold          Round, Deposit, Gold, digital assets such as bitcoin and others
  Exchange   Using money to exchange       Almost 30% of people already use bitcoin
  Saving     Bank safe                     Using a private device
  Transfer   Bank as third party           Peer to peer
  Security   Using bank manual security    Using blockchain to secure transactions

The current payment system and financial technology markets are the main reference points for the development of crypto assets, as the number of non-cash payments and bank cards continues to grow [21]. All national and international financial structures, including banks and other financial intermediaries as well as central banks, have evolved to assist in the creation and administration of sovereign fiat currencies. This monetary status quo was abruptly shaken by the arrival of the first cryptocurrency, bitcoin, in 2008, which introduced a peer-to-peer digital currency without the need for a central banking system, through a trustless, fungible, and tamper-resistant distributed ledger system known as blockchain [22]. The use of cryptocurrency and blockchain technology is an alternative to cash transactions. To transfer bitcoins, the sender creates a transaction message with the number of bitcoins or bits to be transferred [23]. After the sender confirms the intended recipient, the former sends the bitcoins electronically after signing the transaction with the sender's cryptographic signature [23]. The transaction is also supported by additional technology that streamlines the cryptocurrency exchange, and the use of this technology eliminates the need for trusted third parties in fund transfer operations. With the increasing penetration of mobile phone technology, even those who may be "unbanked" are likely to have access to mobile phones. In such cases, cryptocurrency/blockchain technology can be used to transfer funds to their mobile devices [24]. With the growth of blockchain technology in general, and of its best-known practical application, cryptocurrency, in particular, regulatory issues have become increasingly relevant. Academics, policymakers, investors, and blockchain enthusiasts alike are engaged in a heated debate about what regulation of cryptocurrencies is socially desirable. Some of the most outspoken opponents, notably Nobel laureate Joseph Stiglitz, have called for a complete cryptocurrency ban, while others favor a more or less flexible regulatory framework [25]. Researchers analyze the macroeconomic view of the cryptocurrency-based financial system, particularly across different blockchains, and its effect on global finance [26]. This section covers the effect of the following factors on cryptocurrency transactions: energy quantities and costs, local or global economies, and socioeconomic factors. The Aberdeen Group shows that in key supply chain transactions, the strategy of shifting the financial burden onto the company with the lowest weighted average cost of capital lets supply chains operating at different costs of capital generate higher returns [27].

IV. DISCUSSION

Digital currencies were created as a result of an advance in cryptography, the hashing method, and its development into blockchain technology. It was this technology that became the catalyst for the development of cryptocurrency. Bitcoin can be considered the pioneer of the active cryptocurrency sector, and it drives many innovations in new cryptocurrency frameworks, such as Ethereum. Cryptocurrencies as digital currencies can be classified based on purpose, degree of openness of the system, issuer, design of generation, properties, characteristics, and yield volumes of the platform. As a result, there are many issues related to cryptocurrency, for example their security, the cross-border nature of their circulation, exchanges, and mining. In this way, cryptocurrency as a digital exchange medium has created new challenges for governments and world institutions in developing strategies for its future use.

Blockchain can be regarded as the appropriate technology for cryptocurrency in commercial transactions. It allows cryptocurrency to work without a central authority. This reduces risk as well as transaction costs.


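The transfer flow described in RQ 2, where the sender builds a transaction message, signs it, and sends it on, can be illustrated with a minimal sketch. The field names and JSON serialization here are illustrative assumptions, and the ECDSA signing step is deliberately omitted; only the identifier computation (double SHA-256 over the serialized message) mirrors how Bitcoin derives a transaction ID.

```python
import hashlib
import json

def sha256d(data: bytes) -> bytes:
    # Double SHA-256, the hash Bitcoin applies to serialized transactions.
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def make_transaction(sender: str, recipient: str, amount_btc: float) -> dict:
    """Build a simplified transaction message (illustrative fields only)."""
    tx = {"from": sender, "to": recipient, "amount_btc": amount_btc}
    # In real Bitcoin the sender would now sign the serialized transaction
    # with the private key behind their address (ECDSA); omitted here.
    serialized = json.dumps(tx, sort_keys=True).encode()
    tx["txid"] = sha256d(serialized).hex()  # identifier bound to the contents
    return tx

tx = make_transaction("alice-address", "bob-address", 0.5)
print(tx["txid"])  # changing any field would yield a different txid
```

Because the identifier is derived from the message contents, any tampering with sender, recipient, or amount is immediately detectable, which is what lets the network operate without a trusted third party.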
One of the most important properties of blockchain is that once a block is added to the blockchain, it is extremely difficult to alter the data in it. This makes commercial transactions impossible for hackers to control. However, cryptocurrency raises various concerns in society and in legislation. There are many cases of fraud in the name of bitcoin in buying and selling transactions. This fraud occurs due to fluctuations in the price of bitcoin, which changes constantly and causes confusion in setting the price of one bitcoin. Transactions are also not accompanied or supervised by an authorized institution, so people are free to conduct transactions without any special regulations. Nevertheless, this is not because there is no oversight agency; the government itself also states that transactions using crypto coins are unpredictable.

The high level of volatility in the free market makes it difficult to regulate transactions using crypto coins. In Indonesian state laws and regulations, there is no prohibition on transacting with crypto coins; cryptocurrency is not considered illegal by the country itself. The Commodity Futures Trading Regulatory Agency (BAPPEBTI) and the Financial Services Authority (OJK) only supervise crypto coin transactions in the money market and are not responsible for other transactions outside of that. At the very least, the bright side here is that cryptocurrencies are not illegal like narcotics and gambling. In Islamic law itself, there is no mention that cryptocurrencies are haram. Once again, the high risk of fluctuations that change every second is something that must be considered.

Finally, a person's lack of insight and awareness of the risks causes these cryptocurrencies to have many negative impacts. Everyone who wants to dive into the field of cryptocurrency must pay close attention to the disadvantages and risks. The technologies behind it, such as blockchain and cryptography, may not be understood by many people. This lack of understanding allows people to sway the opinions of others about a phenomenon and to make decisions based only on gimmicks or assumptions.

V. CONCLUSION

The execution cost efficiency and volatility of cryptocurrencies have been examined by many researchers; in an efficient market, prices continuously reflect the information available to the market. The current payment system and financial technology markets are the main reference points for the development of crypto assets, as the number of non-cash payments and bank cards continues to grow. All national and international financial structures, including banks and other financial intermediaries as well as central banks, have evolved to assist in the creation and administration of sovereign fiat currencies. This monetary status quo was abruptly shaken by the arrival of the first cryptocurrency, bitcoin, in 2008, which introduced a peer-to-peer digital currency without the need for a central banking system, through a trustless, fungible, and tamper-resistant distributed ledger system known as a blockchain. The use of cryptocurrency and blockchain technology is an alternative to cash transactions. To transfer bitcoins, the sender creates a transaction message with the number of bitcoins or bits to be transferred. After the sender verifies the intended recipient, the former sends the bitcoins electronically after signing the transaction with the sender's personal cryptographic signature. The transaction is also supported by additional technology that simplifies the cryptocurrency exchange, and the use of this technology eliminates the need for trusted third parties in fund transfer operations. With the increasing penetration of mobile phone technology, even those who may be "unbanked" are likely to have access to mobile phones.

REFERENCES

[1] Mohammed, S. T., & Hussien, J. A. (2020). A Traceable and Reliable Electronic Supply Chain System Based on Blockchain Technology. UHD Journal of Science and Technology, 4(2), 132. https://doi.org/10.21928/uhdjst.v4n2y2020.pp132-140
[2] Derbali, A. (2019). Block Chain the New Energy Revolution What Is the Block chain? 2(6).
[3] Aithal, P. S., & D. K. S. M. (2021). Cyber Security and Privacy Internal Attacks Measurements Through Block Chain. Information Technology in Industry, 9(1), 1033–1044. https://doi.org/10.17762/itii.v9i1.236
[4] Aslanidis, N., Bariviera, A. F., & Perez-Laborda, A. (2020). Are cryptocurrencies becoming more interconnected? ArXiv, December. https://doi.org/10.2139/ssrn.3702278
[5] Vigna, P., & Casey, M. J. (2016). Cryptocurrency: The Future of Money? 386. https://books.google.com.au/books?id=niCQrgEACAAJ
[6] Lin, M. Bin, Khowaja, K., Chen, C. Y. H., & Härdle, W. K. (2020). Blockchain mechanism and distributional characteristics of cryptos. ArXiv. https://doi.org/10.2139/ssrn.3784776
[7] Hudson, R., & Urquhart, A. (2021). Technical trading and cryptocurrencies. Annals of Operations Research, 297(1–2), 191–220. https://doi.org/10.1007/s10479-019-03357-1
[8] Aslanidis, N., Bariviera, A. F., & Martínez-Ibañez, O. (2018). An analysis of cryptocurrencies conditional cross correlations. ArXiv, March 2019. https://doi.org/10.2139/ssrn.3287697
[9] Ayedh, A., Echchabi, A., Battour, M., & Omar, M. (2020). Malaysian Muslim investors' behaviour towards the blockchain-based Bitcoin cryptocurrency market. Journal of Islamic Marketing, February. https://doi.org/10.1108/JIMA-04-2019-0081
[10] Bharimalla, P. K., Praharaj, S., & Dash, S. R. (2019). ANN based block chain security threat mechanism. International Journal of Innovative Technology and Exploring Engineering, 8(10), 2672–2679. https://doi.org/10.35940/ijitee.J9442.0881019
[11] Jumel. (2021). The Possibilities of Cryptocurrency in The Global Economy Through the Eyes of a Developing Country.
[12] Kumar, S. S. (2021). Cryptocurrencies and Market Efficiency.
[13] Grobys, K. (2021). When the blockchain does not block: on hackings and uncertainty in the cryptocurrency market. Quantitative Finance, May. https://doi.org/10.1080/14697688.2020.1849779
[14] Kołodziejczyk, H., & Jarno, K. (2020). Stablecoin – the stable cryptocurrency. Studia BAS, 3(63), 155–170. https://doi.org/10.31268/studiabas.2020.26
[15] Bradh, G. (2020). Cryptocurrency: Bitcoins. ResearchGate, October 2020.
[16] Jani, S. (2018). The Growth of Cryptocurrency in India: Its Challenges & Potential Impacts on Legislation. ResearchGate Publication, April. https://doi.org/10.13140/RG.2.2.14220.36486
[17] De Pace, P., & Rao, J. (2020). Comovement and Instability in Cryptocurrency Markets. SSRN Electronic Journal, January. https://doi.org/10.2139/ssrn.3523993
[18] Peter, E. G., & Akadiri, S. Saint. (2020). Cryptocurrency and the Nigerian Economy. Journal of Economics & Management Research, 1(3), 1–2. https://doi.org/10.47363/jesmr/2020(1)113
[19] Keilbar, G., & Zhang, Y. (2021). On cointegration and cryptocurrency dynamics. Digital Finance, 0123456789. https://doi.org/10.1007/s42521-021-00027-5
[20] Zhou, Q., Huang, H., Zheng, Z., & Bian, J. (2020). Solutions to Scalability of Blockchain: a Survey. IEEE Access, 8(January), 16440–16455. https://doi.org/10.1109/ACCESS.2020.2967218
[21] Titov, V., Uandykova, M., Litvishko, O., Kalmykova, T., Prosekov, S., & Senjyu, T. (2021). Cryptocurrency Open Innovation Payment System: Comparative Analysis of Existing Cryptocurrencies. Journal


of Open Innovation: Technology, Market, and Complexity, 7(1), 102. https://doi.org/10.3390/joitmc7010102
[22] Georgiou, G. C. (2020). Cryptocurrency Challenges Sovereign Currency. World Economics, 21(1), 117–141. http://search.ebscohost.com/login.aspx?direct=true&db=buh&AN=142664284&site=ehost-live
[23] Moosa, F. (2019). Cryptocurrencies: Do they qualify as "gross income"? Journal for Juridical Science, 44(1), 10–34. https://doi.org/10.18820/24150517/jjs44.i1.1
[24] Kulkarni, R., Schintler, L., Koizumi, N., & Stough, R. R. (2020). Cryptocurrency, Stablecoins and Blockchain: Exploring Digital Money Solutions for Remittances and Inclusive Economies. SSRN Electronic Journal, January. https://doi.org/10.2139/ssrn.3511139
[25] Shanaev, S., Sharma, S., Shuraeva, A., & Ghimire, B. (2019). Taming the Blockchain Beast? Regulatory Implications for the Cryptocurrency Market. SSRN Electronic Journal, December. https://doi.org/10.2139/ssrn.3397939
[26] Irlane Maia de Oliveira. (2017). No 主観的健康感を中心とした在宅高齢者における健康関連指標に関する共分散構造分析 Title. 1–14.
[27] Chaoyong, Z., & Aiqiang, D. (2018). The coordination mechanism of supply chain finance based on block chain. IOP Conference Series: Earth and Environmental Science, 189(6), 0–4. https://doi.org/10.1088/1755-1315/189/6/062019


The Effect of UI/UX Design on User Satisfaction in Online Art Gallery

Alvin Wijaya, Computer Science Department, School of Computer Science, Bina Nusantara University, Jakarta, Indonesia 11480, [email protected]
Kefry, Computer Science Department, School of Computer Science, Bina Nusantara University, Jakarta, Indonesia 11480, [email protected]
Wendy Wihalim, Computer Science Department, School of Computer Science, Bina Nusantara University, Jakarta, Indonesia 11480, [email protected]
Alexander Agung Santoso Gunawan, Computer Science Department, School of Computer Science, Bina Nusantara University, Jakarta, Indonesia 11480, [email protected]

Abstract—UI and UX design in an online art gallery has a big impact on user satisfaction. The purpose of this research is to find out how UI and UX design affects user satisfaction and how to design good UI and UX in an online art gallery. We evaluated 27 research papers related to UI and UX design in an online art gallery using a literature review approach. Furthermore, we used a survey in the form of a questionnaire to measure the System Usability Scale (SUS) of two renowned art galleries. The results showed that the UI and UX designs in an online art gallery do have a big impact on user satisfaction. Nevertheless, user satisfaction is a relative factor depending on the user; for example, some users do not like websites that tend to be dark. Based on our research, we concluded four important factors which need to be considered in designing UI and UX for an online art gallery: the web system must be simple, consistent, work properly, and fulfill the user requirements. We encourage further research to conduct larger studies to correlate with our findings.

Keywords—UI/UX, User Satisfaction, Online Art Gallery, DeviantArt, ArtStation

I. INTRODUCTION

The website is not something uncommon anymore for most people, and websites have been widely used in various fields, like business, education, industry, and many more. Nevertheless, many websites pay little attention to the design of the web. In fact, design is not only about appearance: web design also has a large impact on the interactions that provide solutions for users. In the long run, this can affect user satisfaction in using a website. By having a good design, especially a good User Interface (UI) and User Experience (UX), a web can attract more visitors, which indirectly also maintains the level of stability and transformation between user and business [1].

The UI and UX design of a website is considered based on the requirements and desires of the user, because without any consideration for the requirements and desires of the user, user satisfaction, which is the goal of UI and UX design, cannot be achieved. For example, when a user gives a comment in the comment column of a web and the user-generated comment is not displayed, it turns out that the comment must be edited or approved by the web author before showing up. As a result, users think that their network is not good enough [1]. Therefore, the objective of this research is to know what makes a good UI and UX design for a website, especially for an online art gallery, because in this information era many businesses move online via the web browser [2]. By doing this research, we hope readers can find out how to design UI and UX for an online art gallery to increase user satisfaction, and to fulfill the requirements of users in other fields, like using an online art gallery to support art learning in increasing the achievement of learning objectives [3], and to collect works of art as records in a portfolio [4].

Furthermore, we want to learn from best practices by comparing two well-known art gallery platforms: ArtStation and DeviantArt. ArtStation was relatively new to the online artist community around 2014. DeviantArt was launched in 2000, which means ArtStation came into existence 14 years later. But the new platform seems to be surging, and that is one of the reasons why we wanted to compare DeviantArt to ArtStation. According to the ArtStation website, it focuses on helping artists become self-reliant by providing them with job opportunities; ArtStation covers games, movies, media, and entertainment. DeviantArt, meanwhile, has not always been a platform where artists share their work or a vibrant community of young artists: it used to be a platform where computer experts modified applications according to their taste.

We define the methodology of the study in chapter 2, followed by the results of the literature review and the survey in chapter 3. Furthermore, chapter 4 discusses the results of our research. Finally, the conclusion is given in chapter 5.

II. METHODOLOGY

The methodology that we use for this study is the literature review, where we start by doing research based on the literature review method. We use this method to compare research papers by collecting and reviewing journal publications that are related to our topic. The starting point of this method is finding related papers using our keywords, and then collecting some of the related data. Through this process, we can find many possibilities in those papers to profoundly improve our study.

From the collected papers, we filter out those older than the last 5 years to get the newest information. And because not all papers that we got from the keywords are related to our paper, we select the papers based on 3 research questions: (i) What is an online art gallery, (ii) What is UI and UX, (iii) What factors from UI and UX can impact user satisfaction. After applying those research questions, we get a total of 27 papers that relate to our paper, on which we then do further examination. During the examination, we try to find out what research was done and draw a conclusion from those papers by comparing them with other papers. Other than that, we also use the System Usability Scale (SUS) in the form of a questionnaire to obtain a usability evaluation of the currently

978-1-6654-4002-8/21/$31.00 ©2021 IEEE

existing online art gallery from 12 random respondents who are interested in art. So, with the papers and responses that we collect, we can produce categories of similar papers in UI and UX design, especially for an online art gallery.

III. RESULT

To make the findings more valid, we use a comparison table to compare the findings from each paper that we collected before, and we also use questionnaires to obtain a usability evaluation of the existing online art galleries.

TABLE I. TABLE OF COMPARISON

1. "Online Art Gallery" [2] (2018): An online art gallery is a place to exhibit and sell artworks in the future, and this information age opens opportunities for the development of online art galleries.
2. "Virtual gallery as a media to simulate painting appreciation in art learning" [3] (2019): There is an increase in the learning outcomes of painting appreciation achieved by students from the appreciation and art critics class after using virtual gallery media.
3. "Web-Based Gallery as Portfolio for Art and Design Academia" [4] (2017): Through the process of collecting tasks and student works, the student will have many work records in his portfolio.
4. "Snowsylvania: A Modern Platform for the Sharing of Creative Work" [5] (2019): Snowsylvania has the potential to be the next generation of sites like DeviantArt.
5. "The Recommendation Algorithm for an Online Art Gallery" [6] (2018): Future improvements can be made by remembering the tags that users encounter frequently, to improve the accuracy of recommendations and to hint at possible tags when users fill out their preferences.
6. "How does art appreciation promote artistic inspiration?" [7] (2019): People who have a high self-evaluation of art-making experiences are inspired to create more intensely when they appreciate paintings.
7. "ARTIQUE: An Art Criticism Session through Online Gallery for Independent Artists" [8] (2020): The online gallery is a good platform for an independent artist to improve and gain knowledge.
8. "Art of designing an e-art gallery" [9] (2017): Artists can easily showcase their artwork without involving any art gallery or curator.
9. "The marketing strategy to stimulate customer's interest in art-gallery business plan" [10] (2020): Like all businesses, art galleries must adapt to survive by innovating several components of the marketing mix to create a competitive advantage. This also requires a strategy that combines targeting and segmenting with the incorporation of the marketing mix to achieve competitiveness.
10. "The user interface and user experience of Web Design" [11] (2018): A UI with well-implemented design principles has a big impact on UX.
11. "Exploring and Comparing the Performance of Design Methods Used for Information Intensive Websites" [12] (2020): The IDM-based website performed better than the card sorting technique from the usability perspective.
12. "Developing a website with user experience" [1] (2020): Building a successful website depends on many factors, such as UI/UX design and human behavior.
13. "Model-based adaptive user interface based on context and user experience evaluation" [13] (2018): Using MM helps in improving the information accessibility, usability, and user experience of the system.
14. "The development of digital library user interface by using responsive web design and user experience" [14] (2016): Implementing user experience and responsive web design in the digital library interface design proposals can improve the estimated percentage of success and estimated task time, and can optimize content search and viewing on mobile devices with limited screen size.
15. "A Study on the User Opinion of User Interface Design" [15] (2018): An enhanced web learning system with the design considerations is more effective in terms of learning outcome and visual attraction.
16. "Affording choice: how website designs create and constrain 'choice'" [16] (2018): A successful website is one that shapes choice by drawing together the objectives of the site owners (such as providing a public good, selling a product, or collecting personalized data) and the web users.
17. "Beautiful interfaces. From user experience to user interface design" [17] (2017): Standards, usability paradigms, and patterns have rewritten the priority of design, excluding or at least limiting formal aspects and perceived pleasantness as negative components of people's experience.
18. "Service design based on IoT and technology comparison for fine art gallery" [18] (2017): IoT showed the possibility of a new UI/UX by going beyond browser applications.
19. "Spot the Artwork: Visualizing Gaze Data for the Manchester Art Gallery" [19] (2016): This project successfully achieved its goal of creating games that are considered entertaining by the majority and that can be used as a tool for further research on visualization.
20. "Applying user experience (UX) design in interior space for art, science museums, and learning environments" [20] (2017): This study describes and investigates research to improve the learning environment, aiming to improve presentation display collection methods and techniques. In addition, it investigates how different information display systems are implemented and can be improved, observing user behavior and needs in such an environment.
21. "Algorithms for art gallery illumination" [21] (2017): The authors present two efficient algorithms for this case, with fixed guard positions derived from the infinite LP formulation. Some are faster in practice and generally provide good solutions, while others place guaranteed bounds on solution quality.
22. "Relationships among Beliefs, Attitudes, Time Resources, Subjective Norms, and Intentions to Use Wearable Augmented Reality in Art Galleries" [22] (2020): Respondents' time resources did not affect the intention to use wearable AR devices; this finding indicates that individuals who have more free time will be able to visit art galleries using AR, because this can be considered recreation.
23. "Analysis and Design of UI and UX Web-Based Application in Maiproyek Startup Using User Centered Design Method in Information System Program of Telkom University" [23] (2021): The User Centered Design method was chosen because it focuses more on the users targeted by the application; the method proved very effective in the Maiproyek application, and from the results it obtained scores of 71 and 72.
24. "Enhancing Art Gallery Visitors' Learning Experience using Wearable Augmented Reality: Generic Learning Outcomes Perspective" [24] (2018): The augmented reality app enhances the learning experience and achieves outcomes such as increased knowledge and understanding, skills, changed attitudes and values, increased enjoyment, inspiration, and creativity, and increased activity, behavior, and development of visitors to art galleries compared with visitors who experience galleries without access to this technology.
25. "Analysis and Design of User Interface and User Experience (UI/UX) E-Commerce Website PT Pentasada Andalan Kelola Using Task System Centered Design (TCSD) Method" [25] (2019): The author succeeded in analyzing the UI/UX design for the e-commerce website, managing, analyzing, and designing based on data from the community.
26. "User Interface Design of Mobile-based Commerce" [26] (2019): Creates guidelines for evaluating a proper user interface by conducting surveys among customers of e-commerce applications.
27. "Design of gallery web-based space booking system as media communication service" [27] (2021): This design can be implemented because it meets the criteria of communication efficiency, display effectiveness, and user satisfaction, so that laboratory services will be more optimal. Furthermore, this research will be used as a guide in building the B9 Gallery booking website to make it easier for students to book a room in the gallery.

A. Results Based on Previous Study

After studying several previous studies related to our research topic regarding good UI and UX design for an online art gallery, it can be concluded that most of these papers can be used as a consideration to increase user satisfaction. In

122 28 October 2021, Jakarta - Indonesia


2021 1st International Conference on Computer Science and Artificial Intelligence (ICCSAI)

general, these papers discuss the current development of online art galleries, examples of UI and UX designs on the web, comparisons of designs from various websites as references for UI and UX design, things that need to be considered in designing UI and UX, and features that are needed in an online art gallery. However, not all of these papers explain in detail how they collected the required data. Most of them only provide an overview of how the research was conducted and present the results.

Based on the studies, this research can be divided into three main discussions. The first is how the online art gallery has developed to this day. This aims to find out what already exists to be maintained, repaired, or developed. The second is to compare several existing web designs. It aims to use good design examples as references in web design. The third is how to design good UI and UX for an online art gallery. This aims to find out what needs to be considered in designing the UI and UX of an online art gallery to increase user satisfaction. An online art gallery is a place to sell and exhibit works of art [2]. Until now, online art galleries have also been used for various purposes, such as supporting art learning in increasing the achievement of learning objectives [3] and collecting works of art as notes in a portfolio [4].

To find out what design is good and what users expect, we must collect data that can support us in determining the UI design that best meets the user's UX. In this data collection stage, it is a good idea to pay attention to web design developments from year to year [17], [18]. By knowing these developments, we can find out about changes and additions to features that occur on websites, and we can predict what features or appearance a user expects from a website. The next stage is to make comparisons of several example websites [12], [23], [20], [25], [26], [21]; our goal in this comparison is to identify the advantages and disadvantages of the websites being compared. That way, we can get data that can be an important factor in website development. Another way to get this data is by using Mining Minds (MM); with this technique we can get data that will be used later in developing the delivery of information, its function or use, and the user experience system [13].

In creating a website, we must pay attention to the things that affect the UI and UX, such as good design principles for the UI, which clearly have a good effect on UX [11], [16], personality and behavior that can affect UX [1], responsive web design, which is an important factor in UX [14], [15], and other techniques in UX design [12]. By paying attention to these matters, we can provide services that match user expectations: for example, a nice and neat appearance, recommendations for artwork according to user interests [6], comments to appreciate the artwork [7], support for the development of independent artists [8], facilities for the sale of artwork [9], booking of places to create [27], and other common features expected by users [5]. However, online art galleries still need to continue to innovate and develop to have an advantage in competing with other online art galleries [10], for example by developing visualized gaze data [19] and AR [22], [24], because through these an online art gallery can continue to increase user satisfaction.

B. Results Based on Questionnaire

We conducted a survey to evaluate two art galleries, namely DeviantArt (https://www.deviantart.com) and ArtStation (https://www.artstation.com); see Figure 1. For measuring art gallery usability, we used the System Usability Scale (SUS), which consists of a 10-item questionnaire with a 5-point Likert scale.

Figure 1. (a) ArtStation and (b) DeviantArt

We compared the two online art galleries and found several factors that made ArtStation better than DeviantArt. First, DeviantArt has a web appearance that tends to be darker than ArtStation, which makes ArtStation more attractive than DeviantArt. Second, ArtStation has a more responsive interface than DeviantArt, because the interface on ArtStation can adjust to the size of the device, while DeviantArt is less responsive. Third, ArtStation has more features than DeviantArt, and these features can support users in their work, such as the challenges feature, so that they can provide a better experience for users.

Figure 2. SUS Score

Based on the SUS survey results from 12 random respondents, we found that DeviantArt got a score of 58, while ArtStation got a score of 66. From this, we can conclude that ArtStation is superior to DeviantArt. However, both SUS scores are still below the average SUS score, which is 68. This indicates that the UI and UX of the two online art galleries still need further development to increase user satisfaction.
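The per-site SUS scores above follow the standard SUS computation: odd-numbered items contribute (response - 1), even-numbered items contribute (5 - response), and the summed contributions are multiplied by 2.5 to yield a 0-100 score, which is then averaged over respondents. As a sketch (the function names here are ours, and the individual questionnaire items are not reproduced in the paper), the scoring can be written as:

```python
def sus_score(responses):
    """Score one respondent: `responses` is a list of 10 Likert answers (1-5), item 1 first."""
    if len(responses) != 10:
        raise ValueError("SUS needs exactly 10 item responses")
    total = 0
    for i, r in enumerate(responses):
        if i % 2 == 0:       # odd-numbered SUS item (1st, 3rd, ...): positively worded
            total += r - 1
        else:                # even-numbered SUS item: negatively worded
            total += 5 - r
    return total * 2.5       # scale the 0-40 sum to 0-100

def mean_sus(all_responses):
    """Average SUS score over all respondents, as reported per site."""
    return sum(sus_score(r) for r in all_responses) / len(all_responses)
```

Averaging `sus_score` over the 12 respondents per site gives the reported 58 (DeviantArt) and 66 (ArtStation), both below the commonly cited SUS benchmark of 68.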


IV. DISCUSSION

A. Development of the online art gallery

Based on the results we got, an online art gallery is not only used as a place to sell and exhibit art. Currently, an online art gallery is also used for various things, such as supporting art learning in improving the achievement of learning objectives [3], and collecting artwork as notes in a portfolio [4]. In addition, there are already several website-based online art galleries on the internet today, such as DeviantArt and ArtStation, which we used in the questionnaire. Based on our analysis of these two online art galleries, ArtStation has been developed with more features that support users in their work, such as the challenges feature, to provide a better experience for its users. So, it is very important to keep up to date with current developments and continue to develop an online art gallery to provide a better experience and increase user satisfaction.

B. Comparing several existing web designs

Based on the results we got, we can compare some of the designs from existing websites to fulfill our goal of finding the advantages and disadvantages of the websites being compared. To get this data, we can use the Mining Minds (MM) technique to develop the delivery of information, function, and the user experience system [13]. In this case, we used the SUS in the form of a questionnaire to compare DeviantArt with ArtStation. Based on our analysis of the questionnaire results, we get several factors that influence user satisfaction. First, we found that people tend to dislike dark websites, because dark websites make the website less attractive. Second, we found that a responsive website makes it easier for users to use the website. Third, we found that a website that is equipped with features that can support users in their work can provide a better experience and increase user satisfaction.

C. Designing UI and UX for an online art gallery

Based on the results that we found, UI and UX have a big impact on the web; depending on how the web is delivered, they can greatly influence user satisfaction. From the data we got, we can draw out an idea of what we should focus on. In designing the UI and UX of an online art gallery, we need to consider some factors of user satisfaction. We found that: (i) the web system must be simple, (ii) the web system must be consistent, (iii) the web system must work properly, and (iv) the web system must fulfill the user requirements. By using what we found, we can keep in mind what must be done when designing an online art gallery website.

V. CONCLUSION

From our study, we pinpoint the discussion to 3 parts: (a) development of the online art gallery, (b) comparing several existing web designs, and (c) designing UI and UX for an online art gallery. From the discussion, we find that UI and UX play a big factor in impacting the level of user satisfaction, and those factors are: (i) the web system must be simple, (ii) the web system must be consistent, (iii) the web system must work properly, and (iv) the web system must fulfill the user requirements. With those factors, we can take the UI and UX into consideration when designing an online art gallery.

The factors we get are from researching the development of the online art gallery and comparing several existing web designs, getting the data by using questionnaires. In short, we found from the questionnaires using SUS that: (i) people tend to dislike dark websites, since a dark appearance makes the web less attractive, (ii) the responsiveness of a website makes it easier for users to use the website, and (iii) features of a web that support users in their work can provide better user satisfaction. In conclusion, the UI and UX design of a web can impact the level of user satisfaction. By considering those factors, we can build a good online art gallery website that can fulfill user satisfaction.

REFERENCES
[1] D. Dang, "Developing a website with user experience," 2020.
[2] A. Jaiswal, "Online Art Gallery," International Journal of Research - GRANTHAALAYAH, pp. 403-406, 2018.
[3] E. Sugiarto, J. Julia, R. A. Pratiwinindya, N. S. Prameswari, R. Nugrahani, W. Wibawanto and M. Febriani, "Virtual gallery as a media to simulate painting appreciation in art learning," Journal of Physics: Conference Series, 2019.
[4] E. R. Augury, M. Deden and Z. Dhimas, "Web-Based Gallery As Portofolio For Art And Design Academia," Asia International Conference of Art & Design (AiCAD) 2017 Emerging Identity & Diversity of Art & Design in Southeast Asia, pp. 278-283, 2017.
[5] E. Hoolsema and J. R. Engelsma, "Snowsylvania: A Modern Platform For the Sharing of Creative Work," 2019.
[6] W. Karwowski, J. Sosnowska and M. Rusek, "The Recommendation Algorithm for an Online Art Gallery," Information System in Management, pp. 108-119, 2018.
[7] C. Ishiguro and T. Okada, "How does art appreciation promote artistic inspiration?," 2019.
[8] S. Z. Maaruf, N. S. M. Mon and K. Supramaniam, "ARTIQUE: An Art Criticism Session through Online Gallery for Independent Artists," International Journal of Academic Research in Business and Social Sciences, pp. 102-113, 2020.
[9] A. Chandra and P. Uchil, "Art of designing an e-art gallery," International Conference on Research into Design, pp. 537-546, 2017.
[10] S. Han and E. Kang, "The marketing strategy to stimulate customer's interest in art-gallery business plan," Journal of Distribution Science, pp. 47-54, 2020.
[11] Z. Madsen, "The user interface and user experience of Web Design," pp. 1-34.
[12] T. Zaki, Z. Sultana, S. M. A. Rahman and M. N. Islam, "Exploring and Comparing the Performance of Design," MIJST, pp. 49-60, 2020.
[13] J. Hussain, A. Ul Hassan, H. Muhammad Bilal, Syed, R. Ali, M. Afzal, S. Hussain, J. Bang, O. Banos and S. Lee, "Model-based adaptive user interface based on context and user experience evaluation," Journal on Multimodal User Interfaces, 2018.
[14] D. Sasongko, R. Ferdiana and R. Hartanto, "The development of digital library user interface by using responsive web design and user experience," Indonesian Journal of Electrical Engineering and Computer Science, pp. 195-202, 2016.
[15] V. U. Devi and Z. S. Kamaludeen, "A Study on the User Opinion of User Interface Design," pp. 342-346, 2018.
[16] T. Graham and P. Henman, "Affording choice: how website designs create and constrain 'choice'," Information Communication and Society, pp. 2007-2023, 2019.
[17] L. Bollini, "Beautiful interfaces. From user experience to user interface design," Design Journal, pp. S89-S101, 2017.
[18] A. Park and K. J. Lee, "Service design based on IoT and technology comparison for fine art gallery," ICETE 2017 - Proceedings of the 14th International Joint Conference on e-Business and Telecommunications, pp. 138-143, 2017.
[19] S. Alfares, "Spot the Artwork: Visualising Gaze Data for the Manchester Art Gallery," 2016.
[20] Z. Al-Hajji, "Applying user experience (UX) design in interior space for art, science museums, and learning environments," Eastern Michigan University, p. 77, 2017.
[21] M. Ernestus, S. Friedrichs, M. Hemmer, J. Kokemüller, A. Kröller, M. Moeini and C. Schmidt, "Algorithms for art gallery illumination," Journal of Global Optimization, pp. 23-45, 2017.
[22] T. Jung, M. Claudia tom Dieck, H. Lee and N. Chung, "Relationships among beliefs, attitudes, time resources, subjective norms, and intentions to use wearable augmented reality in art galleries," Sustainability (Switzerland), pp. 1-17, 2020.
[23] C. Adhitya, R. Andreswari and P. F. Alam, "Analysis and Design of UI and UX Web-Based Application in Maiproyek Startup Using User Centered Design Method in Information System Program of Telkom University," IOP Conference Series: Materials Science and Engineering, 2021.
[24] M. Claudia tom Dieck, T. Jung and D. tom Dieck, "Enhancing Art Gallery Visitors' Learning Experience using Wearable Augmented Reality: Generic Learning Outcomes Perspective," 2016.
[25] Z. I. Paramarini Hardianto and Karmilasari, "Analysis and Design of User Interface and User Experience (UI/UX) E-Commerce Website PT Pentasada Andalan Kelola Using Task System Centered Design (TCSD) Method," Proceedings of 2019 4th International Conference on Informatics and Computing, ICIC 2019, 2019.
[26] O. A. Supriadi, "User Interface Design of Mobile-based Commerce," IOP Conference Series: Materials Science and Engineering, 2019.
[27] O. Murtiyoso, M. R. Athian, M. Mujiyono, I. Ichsan and S. Adi, "Design of gallery web-based space booking system as media service," IOP Conference Series: Materials Science and Engineering, 2021.


Covid-19 Vaccine Tweets - Sentiment Analysis

Naufal Rifki Fauzan
Computer Science Department, School of Computer Science
Bina Nusantara University, Jakarta, Indonesia 11480
[email protected]

Daniel Alexander
Computer Science Department, School of Computer Science
Bina Nusantara University, Jakarta, Indonesia 11480
[email protected]

Muhammad Siraz Hafizh
Computer Science Department, School of Computer Science
Bina Nusantara University, Jakarta, Indonesia 11480
[email protected]

Maria Susan Anggreainy
Computer Science Department, School of Computer Science
Bina Nusantara University, Jakarta, Indonesia 11480
[email protected]

Abstract—The primary goal of this study is to identify the various kinds of tweets shared on Twitter in order to learn how people really feel about Covid-19 vaccination and to analyze its impact on society. Social media is a key source where we can learn about people's feelings and reactions about Covid-19. The secondary goal is to identify the main types of vaccine-hesitant tweeters, why they are against the vaccination program, and whether this is bad or not. This may give additional ideas on the types of Twitter users that refuse vaccination information. One of social media's capabilities is to track the condition of public health, but the purpose here is not primarily to identify vaccine concerns, since these have been revealed in previous surveys, but to describe the information shared on Twitter, precisely from tweets posted by the masses, because of the risk that it is spreading hesitancy. Similarly, although this study is not about politics, vaccine hesitancy has political dimensions that will be explored when needed.

Keywords—Classified, Sentiment Analysis, Covid-19 Vaccine, Collected, Tweets

I. INTRODUCTION

Covid-19 has had a significant impact on everyone's life; it is very important to know how other people react to public health interventions and to understand their worries [1]. The COVID-19 pandemic had killed two million people and infected 93 million around the world as of mid-January 2021. It is estimated that around 60-70% of the population will need to be vaccinated against COVID-19 to achieve herd immunity so that virus spread can be effectively suppressed (Aguas et al. 2020) [2]. However, recent surveys have found that only 40-60% of American adults reported that they would take a COVID-19 vaccine (Funk and Tyson 2020; Hamel, Kirzinger, and Brodie 2020). With these currently predicted levels of vaccine hesitancy, it is unlikely that herd immunity will be reached; COVID-19 will remain endemic in the population.

Before there was a vaccine, governments around the world applied regulations on keeping distance, face masks, hand washing or hand hygiene, isolation, and quarantine. This situation is very similar to the Ebola virus outbreak: many people use social media to spread news, information, opinions, and emotions. So social media is very useful for knowing someone's reactions and opinions [3].

The Covid-19 pandemic has already handed us plenty of hard times and affected life in many ways, educationally, financially, emotionally, and more. Many people have already lost their jobs, businesses, and maybe even their lives and loved ones because of this pandemic. Despite many efforts, there are still people suffering because of the virus, so it has become clear that this pandemic is extraordinary and people should not treat it lightly [4]. In this Covid-19 pandemic, people all around the world are forced to take serious actions to adapt, such as social distancing and working from home, and there are also Covid-19 protocols that people need to comply with [5]. Most people are now staying at home because they need to follow Covid-19 protocols, and thus have much more time to be on the internet, because most people do not need to go to their offices to work now.

In early 2021, news about Covid-19 vaccination programs quickly spread across the internet, and Twitter played a big part in it. Interestingly, the news received a mixed response from the masses: there are some people who are positive about the programs, and there are also some people who are against them. The vaccination programs are also being targeted by some people who are reluctant to be vaccinated; these people usually have their own reasons not to believe in vaccination [6]. This phenomenon, where many people are getting cautious and growing in disbelief toward vaccination, without a doubt causes a big hit to vaccination progress, which is not a good situation to be in; even the WHO states that this is one of the ten biggest risks in global health.

Professor of Pharmacy at Oregon State University in the United States, Prof. Taifo Mahmud, explained that vaccine manufacturing technology platforms have been developed for a long time. Despite using different platforms, the development of the vaccines still has the same goal, namely to fight the Covid-19 virus.

There are two main platforms of vaccine development technology, namely classic and latest. The classical platform has produced a variety of vaccines and is widely used, including vaccines developed from intact viruses. There are many vaccines based on this technology, such as the polio, rabies, and hepatitis A vaccines. While in the

978-1-6654-4002-8/21/$31.00 ©2021 IEEE


Covid-19 vaccine, this technology is used for the development of the Sinovac and Sinopharm vaccines.

Then there are vaccines from a weakened virus, and vaccines from a recombinant or protein subunit of the virus, one of which is the Covid-19 Anhui vaccine product. There are also vaccines that use Virus-Like Particle (VLP) technology, substances with a structure similar to viruses but without the viral genome.

Prof. Taifo continued that the development of Covid-19 vaccines includes vaccines with adenovirus, utilizing other viruses, such as those developed by AstraZeneca, Janssen, and Gamaleya. There are also Covid-19 vaccines based on mRNA technology, such as the vaccines developed by Moderna, Pfizer, and CureVac. For now, Indonesia is developing Antigen-Presenting Cells (APC) [7].

However, people still need to be connected to each other in this pandemic, so they also need a platform to do that, which Twitter provides. With a large proportion of the population currently hesitant to take the COVID-19 vaccine, it is important that people have access to accurate information. However, there is a large amount of low-credibility information about vaccines spreading on social media, including Twitter, that has successfully grown disbelief in many people about taking the vaccine. Twitter is one of the biggest and most frequently used networking services, where users can interact with each other via messages known as "tweets". With Twitter, people can express their opinions, feelings, and concerns about the pandemic; Twitter is also used by many people to stay up-to-date on whatever is happening around the globe while being quarantined [8].

The reasons why some people hesitate to be vaccinated are mainly known: some believe that the vaccine research was rushed and needs more progress, or maybe they just do not believe in vaccines in general [9]. Another reason why people do not want to get vaccinated is that they are convinced by fake news about the vaccination program roaming the internet. For instance, some information states that Covid-19 is not as severe as people might think, and there is also fake information stating that the vaccination programs are part of an experimental project run by the government, claiming that some kind of microchip is mixed into the vaccines. Although this kind of information has never been proven true, some people still hang onto it and are resistant to scientific explanations and arguments. This disbelief in Covid-19 in general is not limited to refusing vaccines; some people actually refuse to use face masks because they are not that concerned about Covid-19 and refuse to believe facts about the pandemic that are mostly generated by scientists and experts [10].

One solution for this matter is that, as a modern citizen and good internet user, it is a must to always filter all kinds of information that come from the internet. In this day and age, there is so much fake news spread around the internet, so people need to ensure that all information they get is from valid sources and can be fully trusted.

II. RESEARCH METHODOLOGY

The data used in this work for analysis is from Twitter over 1 month, between 11 April and 2 May. The method is called descriptive analysis. This method can be used for processing quantitative data and is performed to track the data's performance in the past so we can draw conclusions from it (see Fig. 1).

Fig. 1. Data gathering steps

This method emphasizes description, which makes it possible for us to learn from the past. Descriptive analysis has two different processes, description and interpretation. This kind of method is usually applied to data with a very big volume, such as census data. For instance, by observing Covid-19 vaccine tweets on Twitter, we can collect a big amount of data that can be used to analyze Covid-19 vaccine progression, whether a vaccine is ready to be disseminated or still in the development stage, so we can always be up-to-date on Covid-19 vaccine progress [11].

III. VACCINE INJECTION STAGE

Vaccination is the process of injecting a vaccine into the body so that a person becomes more immune and protected from certain diseases, especially if they are not exposed to


the Covid-19 virus. By receiving the vaccine, this person will have milder symptoms than people who have not received the vaccine. In addition to carrying out health protocols by following the 3M protocol and avoiding crowds, taking the Covid-19 vaccine that has been administered by the government is a way for us not to be exposed to the virus. In order for the Covid-19 vaccine to be more optimal, the injection is carried out in 2 stages. Of course, it is necessary to carry out the injection of the vaccine according to the recommended dose. The first injection works for 14 days, and the vaccine will work at around 60%; after 28 days from the first injection, the second injection is carried out so that the vaccine given can work optimally. That is why two injection doses are taken, to optimize the benefits of the vaccine. There are several requirements for giving the stage 2 vaccine; if someone is sick and the requirements cannot be met, the injection cannot be given to that person [12]. Through vaccines, the body will be protected by giving a disease antigen in the form of a dead or weakened virus; this aims to help the body's immune system detect and fight viruses [13].

IV. WHAT HAPPENS IF WE HAVE A VACCINE?

There will be side effects after injecting the Covid-19 vaccine, and everyone will experience a different reaction after their body receives it, because the reaction indicates the immune system is working to develop the ability to fight the Covid-19 virus. If you do not experience side effects after the injection, that is okay, because that is simply how each body receives the vaccine [14]. The following are some of the side effects of vaccination:
- The arm hurts: it can hurt after injection of the vaccine through the arm, and where the needle is inserted the arm will also experience redness and swelling. The solution is to apply something cold, such as compressing the injected area with ice cubes.
- Fever: this effect appears after injection of the second dose, because the immune system is more ready than when it first received the vaccine, so it reacts more strongly.
- Fatigue: systemic effects such as headaches and muscle aches, which are more common in women and in people aged 55 years and under. The solution is to get plenty of rest.
- Headache: occurs when the second dose is received. The solution is simply taking pain medication.
- Nausea: occurs when the second dose is received. The solution is to stay well hydrated and take medication for nausea.
- Muscle pain: the solution is to take the same pain reliever medication as for headaches if it gets worse.
- Swollen lymph nodes: this differs from the previous effects because the swelling takes longer to disappear, possibly up to a week. Experts recommend delaying swelling checks until 4-6 weeks after the Covid-19 vaccine is injected. The solution is to take pain relievers.

Although the Covid-19 vaccine has these negative side effects, getting vaccinated also has plenty of positive impacts; for example, we can decrease the number of confirmed cases of Covid-19, promote herd immunity, and dampen the economic impact [15].

V. RESULT

This analysis uses Python. With the recent changes to Twitter's API, we created the simplest developer account to extract the data rather than using twint or twitter-scraper. There are 5 steps we need to take:
- First, log in to the Twitter API.
- Second, create a query to extract tweets from the Twitter API. During this process we are limited by the API, because free developer accounts can only extract tweets every 3 hours; to work around this and extract a larger sample size, we ran the same query for 3 days. After the tweets are extracted, the output is converted into a data frame and saved as a CSV file.
- Third, clean and manipulate the data. During this process, we found information such as date, user id, and other metadata that is irrelevant. Therefore, when loading the CSV file into Python, we chose the text of the user's tweet. After the data is manipulated to select only user tweets, the data needs to be cleaned and tokenized, as it contains some unnecessary characters. Functions are created and implemented to remove all characters like @, #, and links from the data frame.
- Fourth, determine sentiment. Once the data was cleaned, a function was created to determine the subjectivity and polarity. Polarity is very important because it evaluates the emotion expressed in tweets and provides a numerical value. A polarity greater than 0 indicates positive sentiment, a polarity of exactly 0 equals neutral sentiment, and a polarity less than 0 indicates negative sentiment. After polarity and subjectivity are calculated, a new column is created and applied to categorize each tweet into the 3 categories.
- Fifth, summarize and visualize the data. Finally, the data is visualized using a bar chart, and calculations are performed to determine the exact percentage of positive, negative, and neutral tweets.

VI. COVID-19 CASES IN INDONESIA

In Indonesia, confirmed Covid-19 cases only increased from May 2020 and peaked in January 2021; since then the numbers have started to decrease [16]. Some factors that can affect the spike in Covid-19 cases are the ignorance of many people about the virus itself, lack of knowledge of Covid-19 protocols, and vaccine hesitancy.
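The cleaning, categorization, and summarization steps described in Section V can be sketched as follows. The paper does not name its client or sentiment library (a TextBlob-style polarity score is implied), so the function names, the Twitter API v2 query syntax, and the exact cleaning rules below are assumptions; the polarity values are supplied by whichever scorer is used.

```python
import re
from collections import Counter

def build_query(keywords, lang="en"):
    """Step 2 (sketch): combine keywords into one search query, excluding retweets."""
    return "(" + " OR ".join(keywords) + f") -is:retweet lang:{lang}"

def clean_tweet(text):
    """Step 3 (sketch): strip links, @mentions, and the '#' character from a raw tweet."""
    text = re.sub(r"https?://\S+", "", text)  # remove links
    text = re.sub(r"@\w+", "", text)          # remove @mentions
    text = text.replace("#", "")              # keep hashtag words, drop the '#'
    return re.sub(r"\s+", " ", text).strip()  # collapse leftover whitespace

def categorize(polarity):
    """Step 4: map a numeric polarity to one of the three sentiment classes."""
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"

def summarize(polarities):
    """Step 5: percentage of positive / neutral / negative tweets."""
    counts = Counter(categorize(p) for p in polarities)
    total = len(polarities)
    return {label: 100.0 * counts[label] / total
            for label in ("positive", "neutral", "negative")}
```

The summary percentages returned by `summarize` correspond to the values plotted in the paper's bar chart.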


Fig. 2. Recorded Covid-19 cases in Indonesia

From the chart above (Fig. 2), we can see how rapidly the cases grow.

VII. CONCLUSION

The Covid-19 pandemic is classified as dangerous to humans: more than 100 million people have been infected with Covid-19, and more than 10 million have died from it. Until now there is no real "cure" for the Covid-19 virus, but it can be resisted using the Covid-19 vaccine. However, on social media such as Twitter there are many pros and cons, and quite a few people are hesitant about the Covid-19 vaccine. Studies in the United States showed that only around 40-60% of adults wanted to receive a Covid-19 vaccine, even though reaching herd immunity requires 60-70%. In this situation it will be difficult to achieve herd immunity, and Covid-19 is expected to become endemic. To this day, many people still reject the Covid-19 vaccine.

The Covid-19 vaccine can protect a person's body from Coronavirus infection. Based on the phase-three clinical trial, its efficacy (the protective effect against Covid-19) was 79.34%, which exceeds the minimal efficacy standard of 50% set by the WHO. Moreover, if you do contract the Coronavirus, the vaccine can help prevent potentially more serious illness, because the antibodies of vaccinated people are stronger than those of people who are not vaccinated. Through vaccination, you not only protect yourself but also help protect others from exposure to the Coronavirus.


Image Data Encryption Using DES Method


Artha Bastanta
Computer Science Department
School of Computer Science, Bina Nusantara University
Jakarta, Indonesia 11480
[email protected]

Ramadhany Nuryansyah
Computer Science Department
School of Computer Science, Bina Nusantara University
Jakarta, Indonesia 11480
[email protected]

Christian Aditya Nugroho
Computer Science Department
School of Computer Science, Bina Nusantara University
Jakarta, Indonesia 11480
[email protected]

Widodo Budiharto
Computer Science Department
School of Computer Science, Bina
Nusantara University
Jakarta, Indonesia 11480
[email protected]

Abstract—Data encryption has been considered as a way to secure data that is stored either on personal computers or on the internet, such as in cloud storage and cloud computing. Because of advances in data storage technology, the main problem that is often faced is data security, so we apply the concept of data encryption using the DES method. Our analysis of the DES method shows that it is a safe data encryption process for safeguarding data stored on personal computers and on the internet. Based on the application that has been designed using the DES algorithm with input in the form of an image, it can successfully encrypt the image.

Keywords—Encryption, DES Algorithm, Data Security, Cloud Storage, Cloud Computing

I. INTRODUCTION

The rapid growth of today's network technologies has taken the general practice of exchanging data to extremes. Data is therefore less protected, and there can be data duplication and data hacking by hackers.

Therefore, when important information is sent, it must be transmitted securely, and confidential information such as ATM cards, credit cards, banking transactions, personal information, company customer information, and digital rights management data is required to be protected. Encryption techniques are used to secure information from unauthorized users and to avoid hacking. Encryption also plays an important role in data security for wireless communication, because wireless communication happens online and must be secure.

Different encryption techniques are used to protect confidential data from unauthorized use, and most of them are an effective means of achieving secure data. The evolution of encryption is moving towards a future with limitless possibilities, and new encryption techniques are discovered every day. One of them is the Data Encryption Standard, which uses cryptographic algorithms to protect electronic data. There are three standard encryption methods: symmetric cryptography, asymmetric cryptography, and hash functions. The DES algorithm makes use of symmetric cryptography. The block cipher algorithm is used for encryption and decryption, and the message is divided into blocks of bits. DES processes input data (the original message) in 64-bit blocks with a 64-bit secret key to produce a 64-bit ciphertext.

II. LITERATURE REVIEW

Technology improvements make people more dependent on the internet; nowadays everything is on the internet, from general things to our personal data. Without realizing it, we put a lot of important data onto the internet by storing and sending sensitive information over it.

Of course we benefit from the internet, but these technological improvements also have a disadvantage: alongside them, the number of cybercrimes is increasing, which can endanger the data we store on the internet.

Cryptography is the science and technique of securing data from third parties. In cryptography there are the terms encryption and decryption. Encryption is changing the initial text (plaintext) into encrypted text (ciphertext) using a key, while decryption is changing the encrypted text (ciphertext) back into the initial text (plaintext) using the same key; in this case, symmetric-key cryptography is used.

There are three types of cryptography:

• Symmetric-key cryptography. Cryptography that uses the same key for encryption and decryption. Symmetric cryptographic algorithms are divided into two categories, namely stream ciphers and block ciphers. In a stream cipher, the encoding process operates on one bit or one byte of data at a time, while in a block cipher the encoding process operates on a set of bits or bytes (per block). Examples of symmetric-key algorithms are DES (Data Encryption Standard), Blowfish, Twofish, MARS, IDEA, 3DES (DES applied three times), and AES (Advanced Encryption Standard), whose real name is Rijndael.

978-1-6654-4002-8/21/$31.00 ©2021 IEEE


• Asymmetric cryptography. Cryptography that uses different keys for encryption and decryption. The encryption key can be shared publicly and is called the public key, while the decryption key is kept for private use and is called the private key. Therefore, this cryptography is also known as public-key cryptography. Examples of well-known algorithms that use asymmetric keys are RSA (Rivest-Shamir-Adleman) and ECC (Elliptic Curve Cryptography).

• Hybrid cryptography. Cryptography that utilizes two levels of keys, namely a symmetric secret key (session key) and asymmetric encryption with a pair of keys (public/private key), so that the network bandwidth used is relatively small.

By using DES (Data Encryption Standard) we can increase data security. DES is a standard encryption method for data, which can make our data more secure while it is on the internet. The many experiments on and developments of data encryption standards all aim to strengthen and improve the existing algorithm system, so that the number of cybercrimes can be reduced.

III. PROPOSED METHODS

Encryption is used to protect data on electronic goods, so that the security of the data on an item can be ensured. In this research, we propose to use the Data Encryption Standard method, which is the encryption standard for data that you want to secure. The Data Encryption Standard uses a symmetric block cipher algorithm to encrypt and decrypt data. Encrypting data converts it into ciphertext, while decrypting the ciphertext gives back the original data, namely the plaintext.

Fig. 1. Representation of Encryption and Decryption Data

From the example picture above (Fig. 1), it can be seen that DES takes input and output of the same size, namely 64 bits, and both processes require a secret key in order to encrypt and decrypt the secured data.

By encrypting plaintext based on the developed standard, there is an important piece of data used for encryption and decryption, which is known as the key. When encrypting data, the algorithm used is the standard algorithm.

The following is a table of the blocks of the DES algorithm:

Table 1. Description of Blocks of the DES Algorithm

Block | Description
IP    | Initial Permutation
IP-1  | Inverse Initial Permutation
PC1   | Permuted Choice 1
PC2   | Permuted Choice 2
E     | Expansion Permutation
P     | Permutation

By following the standard, the 64-bit input first goes through the initial permutation, then through 16 rounds, and finally the result of the last round is given to the inverse initial permutation. There are four types of operations for each of the 16 rounds:

1. Shift
2. Substitution box (S-box)
3. Instantiation
4. Permutation

The following is a block diagram of the DES algorithm:

Fig. 2. Top-Level Block Diagram of the DES Algorithm
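To make the pipeline in Fig. 2 concrete (initial permutation, 16 keyed Feistel rounds, inverse initial permutation, and decryption running the same network with the round keys reversed), here is a self-contained toy sketch. The permutation table, round function, and round keys below are illustrative placeholders, not the real DES constants:

```python
# Toy DES-style block cipher: IP -> 16 Feistel rounds -> IP^-1.
# All tables and keys here are placeholders, NOT the real DES tables.

MASK32 = 0xFFFFFFFF

# Toy 8-entry table applied byte-wise (real DES permutes all 64 bits).
TOY_IP = [2, 6, 3, 1, 4, 7, 5, 0]                  # output byte i <- input byte TOY_IP[i]
TOY_IP_INV = [TOY_IP.index(i) for i in range(8)]   # inverse table undoes TOY_IP

def permute_bytes(block64, table):
    # Reorder the 8 bytes of a 64-bit block according to an index table.
    bs = [(block64 >> (8 * (7 - i))) & 0xFF for i in range(8)]
    out = 0
    for pos in table:
        out = (out << 8) | bs[pos]
    return out

def f(right, key):
    # Placeholder round function (real DES applies E, S-boxes, and P here).
    return ((right * 0x9E3779B1) ^ key) & MASK32

def des_like(block64, round_keys):
    block64 = permute_bytes(block64, TOY_IP)        # initial permutation
    left, right = block64 >> 32, block64 & MASK32
    for k in round_keys:                            # 16 rounds:
        left, right = right, left ^ f(right, k)     # L_n = R_(n-1); R_n = L_(n-1) XOR f(R_(n-1), K_n)
    swapped = (right << 32) | left                  # final swap, as in DES
    return permute_bytes(swapped, TOY_IP_INV)       # inverse initial permutation

def encrypt(block64, keys): return des_like(block64, keys)
def decrypt(block64, keys): return des_like(block64, list(reversed(keys)))

keys = [(0x0F1571C9 + 0x01010101 * n) & MASK32 for n in range(16)]
plain = 0x0123456789ABCDEF
cipher = encrypt(plain, keys)
assert decrypt(cipher, keys) == plain   # same keys, reversed order, recover the block
```

Real DES derives the 16 round keys from the secret key via PC-1, shift schedules, and PC-2; the sketch only preserves the overall Feistel structure, including the property that decryption is the same network run with the key order reversed.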


IV. EXPERIMENTAL RESULTS

Application Design

This experiment encrypts an image using the DES encryption method. The encryption application with the DES algorithm is written in the Java programming language. The key entered must be 8 bytes (8 characters) long.

User Interface

The following is a display of the application, created using the NetBeans IDE.

1. Start Menu

Fig. 3. First menu of the application: encryption and decryption

2. Main Menu

Fig. 4. Choosing the Encryption Menu or the Decryption Menu

3. Encrypt Menu

Fig. 5. Encryption Menu showing file selection and encryption results

4. Decrypt Menu

Fig. 6. Decryption Menu showing file selection and decryption results

5. Choose File

Fig. 7. Choosing the file to decrypt

6. Application Flowchart

Fig. 8. Flow of testing file encryption and file decryption
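The flow tested in Fig. 8 (read a file, transform its bytes with the key, and reverse the process with the same key) can be sketched end to end. The XOR "cipher" below is a stand-in for DES so the sketch stays self-contained, and the file contents are invented for illustration:

```python
import itertools
import pathlib
import tempfile

def xor_transform(data: bytes, key: bytes) -> bytes:
    # Stand-in for the DES step: XOR with a repeating 8-byte key.
    # Applying it twice with the same key restores the original bytes.
    return bytes(b ^ k for b, k in zip(data, itertools.cycle(key)))

key = b"8bytekey"                              # the app requires an 8-byte key

# Write a fake "image" file, then round-trip it through the transform.
src = tempfile.NamedTemporaryFile(delete=False)
src.write(b"\x89PNG...fake image bytes")
src.close()

original = pathlib.Path(src.name).read_bytes()
encrypted = xor_transform(original, key)       # unreadable without the key
decrypted = xor_transform(encrypted, key)      # same key restores the file
assert decrypted == original and encrypted != original
```

As in the application, the same key must be supplied for both directions; a different key would produce garbage instead of the original image.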


7. Pseudocode

Load_Encryption_Decryption
Start_Application
Choose Menu Encryption or Menu Decryption
IF "Menu Encryption" THEN
    Read_Image_File
    FOR i = 0 TO bytearray.length DO
        Set Flag to 0
        IF bytearray[i] < 0 THEN
            Pos = -bytearray[i]
            Flag = 1
        ENDIF
        IF Flag = 1 THEN
            Cipher[i] = -Cipher[i]
        ENDIF
        PRINT Cipher Text from Image
    ENDFOR
ELSE IF "Menu Decryption" THEN
    Read_Cipher_Text
    Set 64bitimglst to Initial_permutation(ll)
    FOR ll in 64bitimglst DO
        Cipherimg64bits = Decrypt(64bitimglst, i)
        Cipherimg64bits = Encrypt(64bitimglst, j)
        Cipherimg64bits = Decrypt(64bitimglst, k)
    ENDFOR
    Img64bits = Inverse(Finalpermutation(img64bits))
    FOR img in img64bits DO
        Imgr = Imgr + img
    ENDFOR
    PRINT Imgr from Cipher_Text
ENDIF
END

8. Math Formula

For encryption and decryption, each of the 16 rounds transforms the two 32-bit halves of the block as:

L_n = R_(n-1)
R_n = L_(n-1) ⊕ f(R_(n-1), K_n), for round key K_n, n = 1, ..., 16

Table 2. Testing the Application with Black-box Testing

No | Scenario Testing | Expected Result | Conclusion
1 | Clear the image and key selection fields, then click the "Encrypt" button | The system rejects the Encrypt process and issues a "File not found" pop-up dialogue | Valid
2 | Empty the key after selecting the image to be encrypted, then click the "Encrypt" button | The system refuses the encryption process and an "Empty Key" pop-up dialogue appears | Valid
3 | The image field is filled and the key length is not equal to 8 bytes, then click the "Encrypt" button | The system refuses the encryption process and a "Wrong Key Size" pop-up dialogue appears | Valid
4 | The image field is filled and the key length is equal to 8, then click the "Encrypt" button | The system successfully encrypts the image and a "File encrypted successfully" pop-up dialogue appears | Valid
5 | Clear the image and key selection fields, then click the "Decrypt" button | The system rejects the Decrypt process and a "File not found" pop-up dialogue appears | Valid
6 | The image field is filled, but the key is wrong | The system rejects the Decrypt process and will not display the decrypted image | Valid


7 | The image field is filled and the key is correct | The system successfully decrypts the image, brings up the pop-up dialogue "The Image Was Decrypted Successfully", and displays the original image | Valid
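The first three black-box scenarios above all exercise key validation before any encryption happens. A minimal sketch of such a check (the function name and messages are illustrative, not taken from the paper's Java code):

```python
# Sketch of the key checks exercised by black-box scenarios 1-3 above.
# DES requires exactly an 8-byte key; names and messages are assumptions.

def validate_des_key(key: str) -> None:
    if key == "":
        raise ValueError("Empty Key")        # scenario 2: no key given
    if len(key.encode("utf-8")) != 8:
        raise ValueError("Wrong Key Size")   # scenario 3: key is not 8 bytes
    # scenario 4: an 8-byte key passes and encryption may proceed

validate_des_key("8bytekey")                 # accepted, returns None
```

Each raised error corresponds to one of the pop-up dialogues listed in the table.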
V. CONCLUSIONS

The Data Encryption Standard (DES) method is very useful when we want to secure important data. By performing 16 rounds of operations, the plaintext is turned into ciphertext. In the DES implementation, the resulting output matches the specified algorithm. The Data Encryption Standard therefore has a decent level of security: because it performs 16 rounds of operations, it is difficult for malicious or unauthorized parties to attack and crack the encryption code.

Based on the application that has been designed using the DES algorithm with input in the form of an image, it can successfully encrypt the image. The key length for the DES algorithm must be 8 bytes (8 characters), because DES is a 64-bit block cipher cryptographic algorithm. The key used to decrypt is the same key that was used to encrypt.

The weakness of the DES algorithm itself is that when an encrypted image is opened it will read "Image not supported", which will raise suspicion in third parties. In addition, because the keys used are the same, the keys can leak.


Systematic Literature Review: An Intelligent Pulmonary TB
Detection from Chest X-Rays

Jimmy Tjeng
Computer Science Department, BINUS Graduate Program – Master of Computer Science Program
Bina Nusantara University
Jakarta, Indonesia 11480
[email protected]

Wawan Cenggoro
Computer Science Department, School of Computer Science
Bina Nusantara University
Jakarta, Indonesia 11480
[email protected]

Bens Pardamean
Bioinformatics and Data Science Research Center
Bina Nusantara University
Jakarta, Indonesia 11480
[email protected]

Abstract—Tuberculosis (TB) is one of the top ten causes of death from an infectious agent. Although TB is curable and preventable, delays in diagnosis and treatment can lead the patient to death. Advancements in computer-aided diagnosis (CAD), particularly in medical image classification, contribute significantly to early TB detection. The current state-of-the-art CAD applications for medical image classification use methods based on deep learning techniques. The problem with these deep learning techniques is that, in general, they use only a single modality for the model. In contrast, in medical practice, the data used for TB analysis focuses not only on images but also includes clinical data such as demographics, patient assessments, and lab test results. This systematic literature review describes different deep learning methods using single-modal or multimodal techniques that combine images with other clinical data. We conducted a systematic search on Springer, PubMed, ResearchGate, and Google Scholar for original research leveraging deep learning for pulmonary TB detection.

Keywords—tuberculosis, deep learning, transfer learning, CNN, CAD, single modal, multimodal

I. INTRODUCTION

Tuberculosis (TB) is a global health problem. According to the WHO Global Tuberculosis Report 2020, TB is a significant cause of illness and death and one of the leading top 10 causes of death worldwide. Indonesia itself is the second-largest endemic country, with 8.5% of the global total. TB was the leading cause of death in some developing countries. Therefore, the need to improve TB diagnosis and detection is clear.

One of the best ways to diagnose TB is through a sputum culture test. Still, this kind of test can take 1 to 8 weeks to provide results. Therefore, an early diagnosis is needed to increase treatment success. Through medical imaging and deep learning methods, a radiologist can examine patient lung images taken from an X-ray machine to detect TB with high accuracy and time efficiency.

Compared to a sputum examination, CAD has a short detection time, and classification can be performed once the image is inserted into the application. There are several algorithms applied to CAD for image classification, especially for TB detection: for instance, conventional machine learning algorithms such as simple linear regression, k-Nearest Neighbours (kNN), sequential minimal optimization, and Support Vector Machines (SVM), to name but a few, or more advanced techniques called deep learning. However, conventional machine learning has a limitation when extracting differentiating features from the training set of data. This limitation has been covered by advances in deep learning, especially in the Convolutional Neural Network (CNN) [1]. Therefore, to date, conventional machine learning approaches are no longer used for image classification. The CNN's superior performance compared to other traditional recognition algorithms and its ability to extract features from images make CNN the first choice for solving complex medical image classification problems. In pulmonary TB detection from X-ray images, CNN methods have proven very effective and achieved a range of high-quality diagnostic solutions. However, in modern medical practice, especially in TB detection, using images as the only input source is not common. There are other clinical data used, such as lab results, patient demographics, and patient assessments. Fortunately, the CNN model can be combined with other models that process clinical data inputs other than images. Several researchers have successfully applied this model with promising performance improvements. Therefore, this systematic review presents current CNN methods with various additional techniques to increase model performance, such as augmentation, segmentation, transfer learning, and a multimodal approach that uses CNN along with other clinical data.

II. REVIEW METHODOLOGY

A search strategy was carried out by identifying recent related published articles from Google Scholar, Springer Link, and PubMed as the data sources. There are many approaches to a literature review; one of them is the systematic literature review (SLR). This approach is divided into four stages, starting from determining the database source and conducting queries based on research topics, as used by Maniah et al. [1].

The first stage is defining the research question. The main research question in this study is "what are the methods for pulmonary TB detection?". Based on the research question, queries were determined as input to the multiple data sources used in this study.

The second stage is identifying titles by applying inclusion and exclusion criteria. The inclusion criteria are papers using an artificial intelligence (AI) method or chest X-ray images, and the exclusion criteria are any papers not within the inclusion criteria. Several words are used for a query, such as: "Pulmonary Tuberculosis," "Deep Learning," "Convolutional neural network," "Detection," "Diagnosis," "Ensemble," and "Multimodal," combined with "AND" and "OR" in the search string. We included all published studies that use deep learning techniques to analyze and classify TB from chest X-ray images. We also included some research on biomedical photos using deep learning techniques.

The third stage is a review by reading abstract content and keywords. Again, only content related to pulmonary TB using deep learning or convolutional neural network methods and


several other techniques such as ensemble, transfer learning, using multimodal techniques consider images and
and multimodal were selected. We further select studies based demographics variable as an input. Another study by
on the methods were used and accuracy performance as well. Yahiaoui et al. [3] using SVM with 38 properties extracted
After content filtering, we conducted an assessment of the from patient discharge summary, this method obtains an
paper by reading and rereading the whole study. Diagnostic Accuracy of 96.68%.
accuracy measures including sensitivity, specificity, and AUC Unlike previous researchers who used the conventional
were reported when available. We also reported the type of machine learning method, in this research,
dataset used and how the algorithm of the proposed study Sathitratanacheewin et al. [4] developed a TB detection
method was executed. The following is done by determining
model that uses a deep learning method. Shenzhen and
a task following the research topic and then extracting specific
ChestX-Ray8 [5] datasets were used in this study. The
details such as author and year published, input data type,
approach, datasets, and model performance, then summarized Shenzhen dataset contains 662 X-ray images with 336
in one research report. The whole process of this literature patients confirmed positive, and 326 patients proved negative
review is shown in Figure 1. for TB. Unlike the Shenzhen data set, ChestX-Ray8 datasets
only consist of lung abnormalities such as atelectasis,
III. RESULTS AND DISCUSSION cardiomegaly, effusion, infiltration, mass, nodule,
Through this systematic review, a total of 1310 studies pneumonia, and pneumothorax without TB positive or
were identified. After reviewing the complete text, a total of negative information. Referring to the WHO guideline, the
15 studies were extracted as a final review. The majority of abnormality in the chest with infiltration, pneumonia,
the studies used convolutional neural networks with both atelectasis, and effusion can be categorized as TB positive.
Shenzhen and Montgomery datasets [2]. 14 out of 15 studies With no pre-trained model involved, this model achieves a
consider only image data as an input, and we categorized decent Area under the curve (AUC) score of 0.705-0.9845.
these approaches as a single modality. One of their approach

Figure 1. Review Methodology

It has been known that deep learning algorithms, such as CNNs, typically achieve the best performance on large datasets. This fact has been shown by Oloko-Oba et al. [6] for the case of chest X-rays, who tried to use only the Montgomery dataset. They realized that this small dataset would lead to overfitting; therefore, they performed data augmentation to increase the number of images from 136 to 5000. Using 4 convolutional layers and one fully connected layer, this model reached 87.1% accuracy.

Despite the requirement of a large dataset, deep learning can still achieve competitive performance with proper hyperparameter tuning. For instance, Lumbanraja et al. [7] showed that deep learning could perform well on a limited dataset for phosphorylation site prediction. To overcome the limitations of the dataset, they tuned hyperparameters for the learning rate, dropout rate, and L2 regularization to improve model performance. As a result, their method achieved a competitive AUC score of 0.92 on the tiny P.ELM dataset.

Another common technique to overcome the small-dataset challenge is transfer learning, which uses a CNN pre-trained on a large dataset to learn from the small dataset. For example, transfer learning was utilized by Haloi et al. [8] by fine-tuning a model pre-trained on Chest-Xray14 [5] for tuberculosis and pneumonia classification. With five different fine-tuned architectures, this study reached an AUC of 0.949 for tuberculosis classification. Meanwhile, Filho et al. [9] used transfer learning on three datasets: Shenzhen, Montgomery, and PadChest [10], with pre-trained AlexNet [11], GoogleNet [12], and ResNet [13]. As a result, this study


reached AUC from 0.78 to 0.84, sensitivity from 0.76 to 0.86, and specificity from 0.58 to 0.74.

Meraj et al. [14] also used transfer learning: four pre-trained CNN models (GoogleNet, ResNet50, VGG16, and VGG19) were applied to the Shenzhen and Montgomery datasets. The highest performance obtained was 86.74% accuracy and an AUC of 0.92.

Transfer learning can also produce decent performance when used on different types of images, as Pardamean et al. [15] did in their study on learning mammogram X-ray images. They carried out transfer learning from CheXNet [16], a model trained on 112,120 chest X-ray images. This approach produced an accuracy of 90.38% on the DDSM dataset [17][18].

Beyond X-ray images, transfer learning also works well when applied to other types of medical images. Dominic et al. [19] classified autism spectrum disorders using a tiny dataset from NYU. Recognizing the limitation of the dataset, they applied transfer learning from InceptionResNetV2 [20] with ImageNet [21] pre-trained weights. With only 172 images, this method obtained 57.6% accuracy, only 2.4% lower than other studies that used images from 1,992 patients.

Transfer learning is not the only approach to improving the performance of chest X-ray models. For example, Sahlol et al. [22] combined Artificial Ecosystem-Based Optimisation [23] with a pre-trained MobileNet [24] model to achieve an accuracy of 90.23% on the Shenzhen dataset. On the other hand, Norval et al. [25] focused on image preprocessing, such as histogram equalization, contrast enhancement, color-channel reduction, sharpening, and ROI cropping, to achieve an accuracy of 92.54%.

Another technique to improve performance is ensemble learning, which combines several base models to produce one optimal predictive model. This method was used by Guo et al. [26]. Six pre-trained models were involved in this study: VGG16 [27], VGG19 [27], InceptionV3 [20], ResNet34, ResNet50, and ResNet101. Along with the artificial bee colony (ABC) algorithm for hyperparameter tuning, this study reached an AUC of 0.99 for chest-abnormality detection.

Ensemble deep learning was also used by Hwa et al. [28], combined with the Canny edge detector for image preprocessing. According to the researchers, Canny edge detection would increase the model's performance; with an accuracy of 92.05%, they showed that their method worked.

The ensemble method used by several previous researchers has been proven to improve performance. Research by Lakhani and Sundaram [29] also used this technique: by simply ensembling AlexNet and GoogleNet, they obtained increased performance. Even though the AUC rose only slightly, the technique was proven to produce a better model.

Images were used as input in all the previously discussed methods. Heo et al. [30] used multimodal techniques, combining image data with demographic variables consisting of age, gender, height, and weight to make classifications. Image segmentation was performed in the preprocessing step using the U-Net [31] algorithm. The study evaluated pre-trained models such as VGG19, InceptionV3, ResNet50, DenseNet121, and InceptionResNetV2. There was an increase in AUC of 0.0138 when demographic variables were added to the CNN model using concatenation. The results indicate that multimodal techniques show promising performance.

Table 1 displays a summary of the included studies and techniques.

Table 1. Summary of studies

Authors and Year Published | Input | Approach | Dataset | Performance
Yahiaoui et al., 2017 | Patient clinical history features | Binary classifier using Support Vector Machine (SVM) | Diyarbakir Hospital, Turkey database | Accuracy 96.68%
Sathitratanacheewin et al., 2020 | CXR images | CNN with augmentation techniques | Shenzhen and ChestX-Ray8 | AUC 0.8502 (Shenzhen); AUC 0.7054 (ChestX-Ray8)
Oloko-Oba & Viriri, 2020 | CXR images | CNN with augmentation to replicate images from 136 to 5,000 to avoid overfitting | Augmented Montgomery dataset | Accuracy 87.1%
Haloi et al., 2018 | CXR images | CNN with modified Residual Inception Module | ChestXray-14 [5], Mendeley [32], Shenzhen, Montgomery, and Belarus | Sensitivity 0.925; specificity 0.910; AUC 0.949
Colombo Filho et al., 2020 | CXR images | AlexNet, GoogleNet, and ResNet; zooming, rotating, and flipping for image augmentation | Shenzhen, Montgomery, and PadChest | ResNet: accuracy 67%, sensitivity 0.76, specificity 0.58; GoogleNet: accuracy 75%, sensitivity 0.76, specificity 0.74; AlexNet: accuracy 73%, sensitivity 0.86, specificity 0.60
Meraj et al., 2019 | CXR images | VGG16, VGG19, ResNet50, and GoogleNet | Shenzhen and Montgomery | AUC 0.92 (Shenzhen); AUC 0.90 (Montgomery)
Sahlol et al., 2020 | CXR images | Pre-trained MobileNet with ImageNet weights and AEO feature selection | Shenzhen and own collected Dataset2 | Accuracy 90.23% (Shenzhen); accuracy 94.1% (Dataset2)
Norval et al., 2019 | CXR images | CNN with image preprocessing such as histogram equalization, contrast enhancement, color-channel reduction, sharpening, and cropped ROI | Shenzhen and Montgomery | Highest accuracy 92.54%
Guo et al., 2020 | CXR images | Pre-trained InceptionV3, VGG16, VGG19, ResNet34, ResNet50, and ResNet101 with the artificial bee colony (ABC) algorithm for hyperparameter tuning; linear-average-based ensembling for the final output | Shenzhen and NIH | AUC 0.99 (Shenzhen); AUC 0.976 (NIH); this study only predicts chest abnormalities
Hwa et al., 2019 | CXR images | Ensemble of InceptionV3 and VGG16 with Canny edge detection for image preprocessing | Shenzhen and Montgomery | Accuracy 92.05%; specificity 95.45%; sensitivity 88.64%
Lakhani & Sundaram, 2017 | CXR images | Ensemble of AlexNet and GoogleNet | Shenzhen, Montgomery, Belarus, and Thomas Jefferson | Highest AUC 0.99, obtained on the ensemble model
Heo et al., 2019 | CXR images and demographic variables | Multimodal with images and demographic variables as input; pre-trained VGG19, InceptionV3, ResNet50, DenseNet121, and InceptionResNetV2 | Korea annual workers' health examination data | Highest AUC 0.9213, a 0.0138 increase over image-only models; the largest increase (0.0288) was on DenseNet121
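To make the two recurring techniques in the summary above concrete, the sketch below is a minimal, framework-free illustration in plain Python. The toy logistic "models", their weights, and the feature values are invented stand-ins for trained CNNs, not any reviewed study's actual implementation. It shows (1) multimodal fusion by concatenating image features with age/gender/height/weight demographics, and (2) linear-average ensembling of base-model probabilities.

```python
# Sketch only: toy stand-ins for trained CNNs illustrating
# (1) multimodal fusion by feature concatenation and
# (2) linear-average ensembling of predicted probabilities.
import math

def concat_features(image_features, demographics):
    """Multimodal fusion: build one joint feature vector."""
    return list(image_features) + list(demographics)

def logistic_model(weights, bias):
    """A toy 'base model': logistic regression over the fused vector."""
    def predict(x):
        z = sum(w * xi for w, xi in zip(weights, x)) + bias
        return 1.0 / (1.0 + math.exp(-z))  # probability of TB-positive
    return predict

def ensemble_average(models, x):
    """Linear-average ensembling: mean of the base models' probabilities."""
    return sum(m(x) for m in models) / len(models)

# Hypothetical inputs: pooled image features plus normalized demographics
image_features = [0.8, 0.1, 0.5]
demographics = [0.44, 1.0, 0.62, 0.55]          # age, gender, height, weight
x = concat_features(image_features, demographics)  # 7-dimensional fused vector

# Two invented base models (weights are arbitrary for illustration)
m1 = logistic_model([0.9, -0.2, 0.4, 0.3, 0.1, -0.5, 0.2], bias=-0.3)
m2 = logistic_model([0.7, 0.1, 0.6, -0.1, 0.2, 0.3, -0.4], bias=0.1)

p = ensemble_average([m1, m2], x)
print(f"ensembled TB probability: {p:.3f}")  # one fused, averaged score
```

In the reviewed studies the base models are deep CNNs and the fused vector feeds a trained classification head; the concatenation and averaging steps themselves, however, are exactly this simple.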


IV. CONCLUSION

Through this systematic literature review, we have summarized several techniques and approaches. In the image preprocessing phase, segmentation and augmentation techniques are proven to obtain higher final prediction performance. CNNs with ensembling achieved higher accuracy compared to non-ensemble methods. We also found that the multimodal approach, which uses images and clinical data, improves performance over single-modality models. Modern medical practice relies heavily on multiple sources of data to make treatment decisions, not least on the interpretation of medical images, where substantial clinical context is often essential for making diagnostic decisions [33].

Multimodal methods have also been successful in improving models outside of medical imaging [34][35]. In modern clinical practice, images or clinical data alone are not sufficient for diagnosis. For medical images, leveraging the multimodal method has proven effective in image recognition and classification, especially when combined with the ensemble technique, which consistently showed performance improvements. Future work should therefore consider multimodal and ensemble techniques to solve the pulmonary TB detection problem.

REFERENCES
[1] Maniah, B. Soewito, F. Lumban Gaol, and E. Abdurachman, "A systematic literature review: Risk analysis in cloud migration," J. King Saud Univ. - Comput. Inf. Sci., 2021, doi: 10.1016/j.jksuci.2021.01.008.
[2] S. Jaeger, S. Candemir, S. Antani, Y.-X. J. Wáng, P.-X. Lu, and G. Thoma, "Two public chest X-ray datasets for computer-aided screening of pulmonary diseases," Quant. Imaging Med. Surg., vol. 4, no. 6, pp. 475–477, 2014, doi: 10.3978/j.issn.2223-4292.2014.11.20.
[3] A. Yahiaoui, O. Er, and N. Yumusak, "A new method of automatic recognition for tuberculosis disease diagnosis using support vector machines," Biomed. Res., vol. 28, no. 9, pp. 4208–4212, 2017.
[4] S. Sathitratanacheewin, P. Sunanta, and K. Pongpirul, "Deep learning for automated classification of tuberculosis-related chest X-ray: dataset distribution shift limits diagnostic performance generalizability," Heliyon, vol. 6, no. 8, p. e04614, 2020, doi: 10.1016/j.heliyon.2020.e04614.
[5] X. Wang, Y. Peng, L. Lu, Z. Lu, M. Bagheri, and R. M. Summers, "ChestX-ray8: Hospital-scale chest X-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases," in Proc. 30th IEEE Conf. Comput. Vis. Pattern Recognition (CVPR 2017), pp. 3462–3471, 2017, doi: 10.1109/CVPR.2017.369.
[6] M. Oloko-Oba and S. Viriri, "Diagnosing tuberculosis using deep convolutional neural network," in Image and Signal Processing, 2020, pp. 151–161.
[7] F. R. Lumbanraja, B. Mahesworo, T. W. Cenggoro, A. Budiarto, and B. Pardamean, "An evaluation of deep neural network performance on limited protein phosphorylation site prediction data," Procedia Comput. Sci., vol. 157, pp. 25–30, 2019, doi: 10.1016/j.procs.2019.08.137.
[8] M. Haloi, R. K. Rajalakshmi, and P. Walia, "Towards radiologist-level accurate deep learning system for pulmonary screening," arXiv, 2018.
[9] M. E. Colombo Filho et al., "Preliminary results on pulmonary tuberculosis detection in chest X-ray using convolutional neural networks," Lect. Notes Comput. Sci., vol. 12140 LNCS, pp. 563–576, 2020, doi: 10.1007/978-3-030-50423-6_42.
[10] A. Bustos, A. Pertusa, J. M. Salinas, and M. de la Iglesia-Vayá, "PadChest: A large chest x-ray image dataset with multi-label annotated reports," Med. Image Anal., vol. 66, pp. 1–35, 2020, doi: 10.1016/j.media.2020.101797.
[11] A. Krizhevsky, I. Sutskever, and G. Hinton, "ImageNet classification with deep convolutional neural networks," Neural Inf. Process. Syst., vol. 25, 2012, doi: 10.1145/3065386.
[12] C. Szegedy et al., "Going deeper with convolutions," in Proc. IEEE Conf. Comput. Vis. Pattern Recognition, 2015, pp. 1–9, doi: 10.1109/CVPR.2015.7298594.
[13] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognition, Jun. 2016, pp. 770–778.
[14] S. S. Meraj, R. Yaakob, A. Azman, S. N. M. Rum, A. A. Nazri, and N. Fadhlina Zakaria, "Detection of pulmonary tuberculosis manifestation in chest X-rays using different convolutional neural network (CNN) models," Int. J. Eng. Adv. Technol., vol. 9, no. 1, pp. 2270–2275, 2019, doi: 10.35940/ijeat.A2632.109119.
[15] B. Pardamean, T. W. Cenggoro, R. Rahutomo, A. Budiarto, and E. K. Karuppiah, "Transfer learning from chest X-ray pre-trained convolutional neural network for learning mammogram data," Procedia Comput. Sci., vol. 135, pp. 400–407, 2018, doi: 10.1016/j.procs.2018.08.190.
[16] P. Rajpurkar et al., "CheXNet: Radiologist-level pneumonia detection on chest X-rays with deep learning," 2017. [Online]. Available: http://arxiv.org/abs/1711.05225.
[17] M. Heath et al., "Current status of the Digital Database for Screening Mammography," pp. 457–460, 1998, doi: 10.1007/978-94-011-5318-8_75.
[18] M. Heath, K. Bowyer, D. Kopans, R. Moore, and P. Kegelmeyer, "The Digital Database for Screening Mammography," in Proc. Fourth Int. Workshop Digit. Mammography, 2000.
[19] N. Dominic, D. Daniel, T. W. Cenggoro, A. Budiarto, and B. Pardamean, "Transfer learning using Inception-ResNet-v2 model to the augmented neuroimages data for autism spectrum disorder classification," Commun. Math. Biol. Neurosci., 2021, doi: 10.28919/cmbn/5565.
[20] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi, "Inception-v4, Inception-ResNet and the impact of residual connections on learning," in Proc. 31st AAAI Conf. Artif. Intell. (AAAI 2017), pp. 4278–4284, 2017.
[21] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, "ImageNet: A large-scale hierarchical image database," in Proc. 2009 IEEE Conf. Comput. Vis. Pattern Recognition, Jun. 2009, pp. 248–255, doi: 10.1109/CVPR.2009.5206848.
[22] A. T. Sahlol, M. A. Elaziz, A. T. Jamal, R. Damaševičius, and O. F. Hassan, "A novel method for detection of tuberculosis in chest radiographs using artificial ecosystem-based optimisation of deep neural network features," Symmetry (Basel), vol. 12, no. 7, 2020, doi: 10.3390/sym12071146.
[23] W. Zhao, L. Wang, and Z. Zhang, "Artificial ecosystem-based optimization: a novel nature-inspired meta-heuristic algorithm," Neural Comput. Appl., vol. 32, no. 13, pp. 9383–9425, 2020, doi: 10.1007/s00521-019-04452-x.
[24] A. G. Howard et al., "MobileNets: Efficient convolutional neural networks for mobile vision applications," arXiv, 2017.
[25] M. Norval, Z. Wang, and Y. Sun, "Pulmonary tuberculosis detection using deep learning convolutional neural networks," in ACM Int. Conf. Proceeding Ser., 2019, pp. 47–51, doi: 10.1145/3376067.3376068.
[26] R. Guo, K. Passi, and C. K. Jain, "Tuberculosis diagnostics and localization in chest X-rays via deep learning models," Front. Artif. Intell., vol. 3, pp. 1–17, 2020, doi: 10.3389/frai.2020.583427.
[27] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," in Proc. 3rd Int. Conf. Learn. Represent. (ICLR 2015), pp. 1–14, 2015.
[28] S. K. T. Hwa, M. H. A. Hijazi, A. Bade, R. Yaakob, and M. S. Jeffree, "Ensemble deep learning for tuberculosis detection using chest X-ray and canny edge detected images," IAES Int. J. Artif. Intell., vol. 8, no. 4, pp. 429–435, 2019, doi: 10.11591/ijai.v8.i4.pp429-435.
[29] P. Lakhani and B. Sundaram, "Deep learning at chest radiography: Automated classification of pulmonary tuberculosis by using convolutional neural networks," Radiology, vol. 284, no. 2, pp. 574–582, 2017, doi: 10.1148/radiol.2017162326.
[30] S. J. Heo et al., "Deep learning algorithms with demographic information help to detect tuberculosis in chest radiographs in annual workers' health examination data," Int. J. Environ. Res. Public Health, vol. 16, no. 2, 2019, doi: 10.3390/ijerph16020250.
[31] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," in Medical Image Computing and Computer-Assisted Intervention (MICCAI 2015), 2015, pp. 234–241.
[32] D. Kermany, K. Zhang, and M. Goldbaum, "Large dataset of labeled optical coherence tomography (OCT) and chest X-ray images," Mendeley Data, v3, 2018, doi: 10.17632/rscbjbr9sj.3.
[33] S. C. Huang, A. Pareek, S. Seyyedi, I. Banerjee, and M. P. Lungren, "Fusion of medical imaging and electronic health records using deep learning: a systematic review and implementation guidelines," npj Digit. Med., vol. 3, no. 1, 2020, doi: 10.1038/s41746-020-00341-z.
[34] Y. R. Pandeya and J. Lee, "Deep learning-based late fusion of multimodal information for emotion classification of music video," Multimed. Tools Appl., vol. 80, no. 2, pp. 2887–2905, 2021, doi: 10.1007/s11042-020-08836-3.
[35] M. Person, M. Jensen, A. O. Smith, and H. Gutierrez, "Multimodal fusion object detection system for autonomous vehicles," J. Dyn. Syst. Meas. Control, vol. 141, no. 7, pp. 1–9, 2019, doi: 10.1115/1.4043222.


Design of Cadets Administration System for Nusantara Cilacap Maritime Academy Based On Website
Ana Umul Fadilah, Tisnanto Adisatyo Widcaksono, Eduard Pangestu Wonohardjo, Emny Harna Yossy*
Computer Science Department, BINUS Online Learning
Bina Nusantara University
Jakarta, Indonesia, 11480
[email protected]; [email protected]; [email protected]; [email protected]

Abstract— Submitting permits and registering for scholarships are activities often conducted by cadets at the Cilacap Nusantara Maritime Academy. All of these activities are still conventional, with cadets having to come in person to submit permits and register for scholarships. To overcome this, a website-based management system was set up for the Nusantara Cilacap Maritime Academy. The system development method used is the waterfall model. The programming language used is PHP with a MySQL database and the CodeIgniter framework. The evaluation methods are black-box testing for the system and the eight golden rules for the user interface. The result of this research is a website-based management system for the Nusantara Cilacap Maritime Academy. This research concludes that the Cilacap Nusantara Maritime Academy's administrative system can increase the time efficiency of filing permits and writing letters.

Keywords— Information system, Waterfall Model, PHP, MySQL, CodeIgniter

I. INTRODUCTION

In today's world of work, technology is the main principle in carrying out all work activities using existing resources, namely computers and internet networks. Universities that use information technology to manage archives on the web are still few. In this case, an information system is needed to manage information accurately and precisely. Educational institutions, especially tertiary institutions, are likewise inseparable from the role of information technology, which is indispensable as they develop. With the growth of a university, the number of students, and expanding knowledge, the university must improve its services and the quality of its existing human resources [1].

Nusantara Maritime Academy is an educational institution operating in the maritime sector that organizes Diploma III (D-III) higher education consisting of three (3) study programs: the Engineering Study Program (TMK), the Port Management Study Program (PP), and the Nautical Study Program (NTK) [2]. The Cadets Department of the Nusantara Maritime Academy is one of the primary work units providing services to cadets. All activities related to the cadets' correspondence services, such as a certificate of good behavior, a certificate for passport creation, a bank account creation certificate, a license to wear a sword, an official passport, and a ceremony permit, are served by the security sector. Currently, the letter submission and letter writing service is still performed traditionally, meaning cadets must come directly to campus to meet the required letter writing requirements. As a result, files accumulate in the archiving area, which in turn leads to archiving errors.

Several studies have previously been carried out on issues similar to those encountered at the Nusantara Maritime Academy. One study used in-depth research with data obtained in the form of text; its results show that the designed application still had errors in generating reports, took a long time, and produced reports that were not well structured [3]. Research by Yulianto et al. (2019) shows system test results from 20 respondents aged 20 to 30 years old: 71.67% agreed, 19.58% were less in agreement, and 8.75% disagreed [4].

Based on the above data and problems, what is needed is a system that can solve these problems. Therefore, the researchers intend to build a web-based administrative information system and make it a study called Design of Cadets Administration System for Nusantara Cilacap Maritime Academy Based On Website.

II. LITERATURE REVIEW

A. Application

An application is the implementation of a system design to process data using the rules of a particular programming language. From this, it can be concluded that an application is a computer program or software created based on the user's request to perform a specific task [5].

Based on type, computer applications are divided into two areas: desktop applications and web applications. A web application is a client/server application that uses a web browser as the client program to display an attractive appearance over an internet connection, so that the resulting display is dynamic depending on the requested parameters, user habits, and security aspects. The client represents the computer used by a user who wants to use the application, while the server represents the computer that provides application services. In this context, the client and server communicate via the internet and the intranet.

*Corresponding author.

978-1-6654-4002-8/21/$31.00 ©2021 IEEE


Interestingly, the client/server model using web applications can span multiple platforms via browsers.

The characteristic of a web application is that the user uses software called a web browser (e.g., Netscape Communicator, Internet Explorer, Chrome, and Mozilla) to access web applications. The computer acting as the server generally provides, in addition to the web server, a database server that handles requests from users who want to access web applications. The database server is a server that enables access to the database; Oracle and MySQL are examples of database servers. Examples of web servers are Apache (well known in the Linux environment) and IIS (Internet Information Server), a mainstay of Microsoft [6].

B. Administrative Systems

Administration is the process of organizing organizational activities to achieve predetermined goals with the help of human resources [7], whereas a system is a group of interconnected components that work together to achieve the desired result [8]. As technology supports the management system, its utility value increases. An administrative application system created for an agency forms a forum in which the agency's business processes can be carried out more efficiently, in a user-friendly and comprehensive way. Every process can be computerized so that all operational data and supporting agency data are properly recorded and stored in an integrated database [9]. One of the most commonly used administrative mechanisms is the letter. Letters are a means of communicating messages or information in writing to other parties and reflect the image or authority of the sender [3].

C. Software Engineering

Software engineering is development using engineering principles or concepts to create economically valuable software that is trustworthy and works efficiently on machines. Software is widespread yet often goes unused because it does not meet customer needs, or due to non-technical issues such as users' reluctance to replace manual work with computerized work or users' inability to use computers. Therefore, software engineering is required so that the software created does not end up unused [10].

Software engineering has a concept for developing a software development methodology, namely the System Development Life Cycle (SDLC). The SDLC is the entire system-wide process covering the stages of building, deploying, using, and updating an information system [11]. The SDLC has several types of development methods, one of which, used in this research, is the waterfall. The waterfall method applies the SDLC concept sequentially, starting from determining user specification requirements (communication) and project development plans (planning), followed by modeling, construction, and product launch (deployment) [12].

D. Unified Modeling Language (UML)

UML (Unified Modeling Language) is a standard set of models and notations from the Object Management Group (OMG). OMG itself is a standards organization for system development. Some examples of UML diagrams are use case diagrams, class diagrams, activity diagrams, and sequence diagrams. UML enables analysts and users to understand the variations of the specific diagrams used in project development [11].

E. Database

A database is a collection of logically related data and descriptions that meets the information needs of an organization. Entities are objects (e.g., people, places, objects, concepts, or events) that are represented in the database. Attributes are properties that belong to an entity. An entity has relationships with other entities when they have associations with one another. To show entities and how they relate to each other, a diagram called an Entity Relationship Diagram (ERD) is used. Entities take the form of tables that contain data. An ERD is used when a description of the part of the company being modeled is needed [13].

F. Website-Based Programming Language

The programming languages used to create website-based applications are HTML, CSS, PHP, and JavaScript. HTML (HyperText Markup Language) is a language for creating web pages and is used by browser applications. HTML documents are simple text files that contain markup, text, and additional data that affect the text. The latest version of HTML is HTML5. CSS (Cascading Style Sheets) is the language in which the formatting of web pages is specified. CSS offers many options for changing all aspects of a web page, such as specifying fonts (size, color, etc.), colors and background colors, borders that frame HTML elements, and the positioning of elements on the page [14].

JavaScript can be broken down into three parts, namely the core, the client side, and the server side. The core is the heart of the programming language, including operators, expressions, statements, and subroutines. Client-side JavaScript is a collection of objects that support browser control and user interaction. Server-side JavaScript is a collection of objects used by the web server, for example to support communication with a database management system. Server-side JavaScript is rarely used compared to client-side JavaScript. Client-side JavaScript can also access and change the appearance and content of elements in HTML documents via the Document Object Model (DOM).

PHP is a scripting language for creating dynamic web pages. Scripts can be created using a plain text editor such as Notepad, Notepad++, and others. Although known as a language for building web pages, PHP can also be used to build command-line applications as well as GUIs.

Websites created with PHP require software called a web server that processes the PHP code. A web server with PHP parser software processes input in the form of PHP code and generates output in the form of a website. PHP is open and cross-platform, so it can run on many brands of web server (such as Apache and IIS). There are a great many PHP users today; it is claimed that more than 40 million websites use PHP, running on more than 2 million servers [15].

G. CodeIgniter Framework

A framework can be interpreted as a collection of pieces of code arranged or organized in such a way that a complete application can be built without having to write all of the code from scratch. CodeIgniter is a PHP (Hypertext Preprocessor) framework that allows developers to speed up the development of PHP-based web applications instead of writing all of the code from scratch. It is based on the MVC rules (Model-View-
framework that is based on the MVC rules (Model-View-


Controller) and enables the separation between application logic and the presentation level [16].

H. Evaluation

Evaluation is carried out on the system using black-box testing and on the user interface using the eight golden rules. Black-box testing, also called behavioral testing, focuses on the functional requirements of the software. Black-box testing can derive a set of input conditions that fully exercise the functional requirements of the program. Black-box testing tries to find errors in the following categories: (1) incorrect or missing functions, (2) display errors, (3) errors in data structures or external database access, (4) performance errors, and (5) initialization and termination errors. Black-box testing is usually done on the software interface; it examines basic aspects of the system regardless of the internal logic structure of the software [12]. To evaluate the user interface, the eight golden rules are used. There are eight principles known as "golden rules" that can be applied to most interactive systems when designing interfaces, namely: strive for consistency, cater to universal usability, offer informative feedback, design dialogs to yield closure, prevent errors, permit easy reversal of actions, support internal locus of control, and reduce short-term memory load [17]. The analysis of those eight principles can be seen in Table I below.

TABLE I. EIGHT GOLDEN RULES EVALUATION

The Principles | Descriptions
Strive for consistency | The layout of the buttons, the fonts used, and the navigation bar are the same on every page.
Cater to universal usability | The functions provided in this application will assist both novice and experienced users. In addition, the interface was adapted to the needs of each user.
Offer informative feedback | When the user takes an action that requires feedback, a specific feedback message is displayed, for example "Registration Successful" or "Please fill in these fields".
Design dialogs to yield closure | The sequence of actions is organized according to the structure used in the database. For example, when cadets have completed an application for a travel letter and clicked the Save button, a message appears stating that the data has been saved successfully.
Prevent errors | The system carries out error prevention so that it can continue to run properly and smoothly. For example, if a user submits a letter while a field containing data the system needs for the next operation is still empty, the system notifies the user that the operation will not proceed until the field is completed.
Permit easy reversal of actions | Detectable user errors are displayed to the user so that the user can correct them.
Support internal locus of control | Cadets who are used to accessing this application will find it easy to use and feel in control of their actions, as they can easily see the data for submitted letters.
Reduce short-term memory load | The pages of this application support the filing process carried out by the cadets. The information displayed on a page is limited to a certain amount, making it easier for the cadets to memorize and understand the information on a page.

III. RELATED WORKS

Research on the design and construction of a correspondence management information system for the Banyuwangi Education Office, Malang, used the spiral model method. The test results showed that the designed application ran well. Suggestions for application development include, among other things, providing notifications in the form of SMS via registered numbers, displaying letter attachments, and running on the Android system [18].

Other research on the development of SISMAKA, a web-based information system for incoming and outgoing mail at Sukorejo, Semarang, used the waterfall method in designing the application model. Of the system test results from 20 respondents aged 20 to 30, 71.67% agreed, 19.58% were less in agreement, and 8.75% disagreed. Suggestions generated from this research include the need for user training and regular data updates [4].

The study titled "Design of web-based information systems for letter archiving in the XYZ district" used the waterfall method and produced a web-based application design for a letter-archiving information system that had not yet been implemented. Therefore, that study proposed implementing the designed system to discover its shortcomings and to be able to develop it further [19].

The research titled "Web-based Digital Outgoing Mail Filing System" used the SDLC (systems development life cycle) method in the system development procedure, while the system design used the UML (Unified Modeling Language) method. The results of this study produced a system that suits user needs and runs well. The suggestion from this research is that it should be evaluated periodically to find out whether new features need to be added according to user needs [20].

IV. METHODOLOGY

To be able to create applications that meet the needs of Nusantara Maritime Academy's users, the authors needed to develop a mindset that maintains a clear workflow in the research process. The research framework can be seen in Fig. 1.

Fig 1. Research framework

144 28 October 2021, Jakarta - Indonesia


2021 1st International Conference on Computer Science and Artificial Intelligence (ICCSAI)

With this in mind, the authors developed the software using the waterfall method; the phases carried out in this study can be seen in Fig. 2.

Fig 2. Research methodology

To find out the problems at the Cilacap Nusantara Maritime Academy, it is first necessary to collect data for problem analysis. The data collection methods used are interviews, observations, literature studies, and questionnaires. The problem data obtained can be seen in the fishbone diagram in Fig 3.

Fig 3. Fishbone diagram

Based on the fishbone diagram above, the problems at the Cilacap Nusantara Maritime Academy can be divided into those faced by the administrative officer and those faced by the cadets. The problems facing the administrative officer are: the system currently used to manage incoming and outgoing letters has no grouping or classification, as letters are still processed manually, making documents difficult to find and document processing inefficient; limited office storage results in stacks of documents, which causes errors; and there is a lack of human resources who understand management systems, which is why a web-based management system has not yet been created. The problems the cadets face are that a cadet must come to the administration office in person to apply for a certificate, has to wait for the letter validation to be completed, and cannot see the status of a submitted application, so the cadet must wait for information from the administrative officer or come to the office to ask the officer directly.

V. PROPOSED SOLUTION

A. Software Design

From the results of the problem analysis, the authors developed the software design using a use case diagram (Fig. 4). The use cases that can be performed by the administrator are as follows:
1. Log in, to access the application.
2. Log off; this use case is used by actors to terminate the application.
3. Update profile, to manage personal information.
4. Letter data, used to manage system-generated certificates and to update the status of a certificate.
5. Submission of letters, to view applications that have just been sent by a cadet and to accept or reject applications from cadets.
6. Scholarship submission, used to manage scholarship information, manage cadet scholarship applications, and view data on submitted scholarships.
7. Manage user, used to manage the cadet accounts that can access the application and to update cadet account information.
8. Website settings, used to manage information in the Cadet Administration System application at the Nusantara Cilacap Maritime Academy.

The use cases available to a cadet are as follows:
1. Log in, to access the application.
2. Log off; this use case is used by actors to terminate the application.
3. Registration, used to register a new account in the system if it is not already registered in the database.
4. Forgot password, used when the actor forgets the registered password.
5. Update profile, used to manage personal information and to view the progress of submitted applications.
6. Submission of letters, used to submit requests for the creation of a new certificate.
7. Submission of scholarships, used to apply for scholarships.
8. Scholarship information, used to display information about scholarships.

A use case diagram is provided below to help identify user requirements for the proposed system.

Fig 4. Use Case Diagram.

The sequence diagrams in the design of the Nusantara Maritime Academy Administration System describe the scenarios in the use case diagram. The scenarios are divided by user, namely Admin and Cadet. The sequence diagrams for Admin users consist of logging in, logging out, updating profiles, managing mailing data, managing scholarship applications, and website settings. The sequence diagram for cadet users


includes logging in, logging out, forgetting passwords, updating profiles, registering, submitting applications for letters, applying for scholarships, and viewing scholarship information.

The class diagram in the design of the administration system illustrates the object classes in the system. The Cadet Administration System includes 15 objects, two of which are superclasses, namely the letter and scholarship objects. The letter object is the superclass for the cadet travel document, sword-carrying letter, certificate of good behavior, passport-making certificate, and account-opening certificate objects, while the scholarship object is the superclass for the pengajuan_bidikmisi and pengajuan_ppa objects.

Next is the entity-relationship model, which contains the entity sets and relationship sets, each equipped with attributes representing the facts of the real world under review; this can be described more systematically using an Entity Relationship Diagram (E-R Diagram). Finally, the table structure is the design of the database that will be used in the system to be developed.

B. User Interface Design

The user interface is divided into two based on authorization: cadets and admins. The initial appearance of this application is as follows:

Fig 5. Home Page.

This page is the initial view when the application is first opened. On this page, users can access the scholarship information feature without having to log in first. To log in, users can use the "Please Login" button at the top right of the page. When the button is pressed, the option to log in as a cadet or admin appears. The cadet dashboard display is as follows:

Fig. 6. Cadets Dashboard

On this page, cadets who have not registered in the Nusantara Maritime Academy Cadet Administration System application will register. The cadets enter the required data such as NIK, NIT, full name, email, password, and password confirmation. If the cadet does not fill in one of the fields, an error message will appear, such as "please fill out this field." After registering, the cadet is automatically logged in and taken to the home page. After that, the cadets can access the following features: apply for a letter, apply for a letter of good conduct, apply for a cadet travel certificate, apply for an account opening letter, apply for a sword-carrying letter, and view the cadet's personal data.

The admin dashboard display is as follows:

Fig 7. Admin Dashboard.

After the admin has successfully logged in, the admin lands on the dashboard page, which is the initial view of the application for admins. On this page, the admin can select the mail data menu, which includes incoming letters, outgoing letters, letter submissions, and scholarship applications. In the website settings menu, the administrator can set the scholarship registration schedule, change the profile, and upload the procedure for submitting a letter, which will be displayed on the web page of the Nusantara Maritime Academy Cadet Administration System. The admin can access the scholarship application, outgoing mail, outgoing mail data management, and incoming mail data management features.

C. Evaluation

The researchers evaluated the system and the user interface. The evaluation of the system using the black-box testing method shows that the application functions run according to the design, for both the admin and cadet displays. The evaluation was done by testing the features contained in the application in two stages. In the first test of the website-based Nusantara Maritime Academy Cadet Administration System, 13 of the 17 features tried by the cadets at the Nusantara Maritime Academy Cilacap were successful. After the failed features were improved, all features could be accessed as expected in the second test.

After that, the researchers evaluated the user interface using the eight golden rules and found that the user interface is consistent on each page; can be used by experienced and inexperienced users; has a feedback feature; organizes the sequence of actions according to the structure used in the database; prevents errors so that the system can continue to run well and smoothly; displays detectable user errors to the user so that the user can correct them; and lets cadets who are accustomed to accessing the application use it easily and feel in control, because they can quickly see the data for submitted letters. The pages of the application are supported by the submission procedure that will be carried out by cadets, so cadets can read how to submit letters, making
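The required-field check described for the registration form can be sketched as follows. This is a hypothetical reconstruction, not the system's published code: the field names (NIK, NIT, etc.) come from the text above, but the validator itself is an assumption about how the described behavior could be implemented.

```python
# Hypothetical sketch of the registration validation described in the text.
# Field names follow the paper; the logic is an illustrative assumption.
REQUIRED_FIELDS = ["nik", "nit", "full_name", "email",
                   "password", "password_confirmation"]

def validate_registration(form: dict) -> list:
    """Return one error message per missing field, mirroring the UI message."""
    errors = []
    for field in REQUIRED_FIELDS:
        if not form.get(field, "").strip():
            errors.append(f"{field}: please fill out this field")
    # Extra consistency check implied by having a confirmation field.
    if (form.get("password") and form.get("password_confirmation")
            and form["password"] != form["password_confirmation"]):
        errors.append("password_confirmation: passwords do not match")
    return errors

print(validate_registration({"nik": "123", "email": "cadet@example.com"}))
```

A complete form returns an empty error list, after which the described flow would log the cadet in and redirect to the home page.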


it easier for cadets to submit letters. The information displayed on a page is limited to a certain amount, making it easier for cadets to memorize and capture the information contained on a page.

VI. CONCLUSION

The conclusions of the paper entitled "Design of Cadets Administration System for Nusantara Cilacap Maritime Academy Based On Website" are as follows:
1. This web-based administration system application can be used by cadets who wish to apply for permits and scholarship registration.
2. This web-based administration system application can facilitate the recap of letters handled by the administration.
3. This web-based administration system application has data management and features that are easy to understand for cadets and the administration department.

In creating the Design of Cadets Administration System for Nusantara Cilacap Maritime Academy Based On Website, there are still many things that can be developed, such as:
1. A mobile-app-based system needs to be developed for the Nusantara Maritime Academy Cilacap.
2. Internet technology makes it easier for everyone to access information from anywhere. Therefore, security issues must always be considered so that the system can be protected from unauthorized parties.
3. The adoption of electronic approval technology can make it easier for the approver to approve letters, making the administrative process faster and reducing paper consumption so that no paper is used at all.

ACKNOWLEDGEMENT

We thank the Online Learning Computer Science Study Program, Bina Nusantara University, for funding this research publication, and the lecturers who have guided us from the start to the completion of the research.

REFERENCES
[1] D. Darmawan, Metode Penelitian Kuantitatif. Bandung: PT Remaja Rosdakarya, 2013.
[2] A. M. Nusantara, "Sejarah Akademi Maritim Nusantara," 2021. https://amn.ac.id/page/detail/sejarah-akademi-maritim-nusantara.
[3] D. L. Rahmah, "Perancangan Aplikasi Sistem Persuratan Berbasis Web Pada PT. Dwi Pilar Pratama," Faktor Exacta, vol. 7, no. 3, 2014.
[4] R. W. Yulianto, "Pengembangan (Sismaka) Sistem Informasi Surat Masuk Dan Surat Keluar Berbasis Web Pada Kantor Kelurahan Sukorejo Semarang," J. Ilm. Cendekia Eksakta, vol. 2, no. 1, pp. 101–111, 2019.
[5] Badan Pengembangan dan Pembinaan Bahasa, Kementerian Pendidikan dan Kebudayaan RI, Kamus Besar Bahasa Indonesia. 2016.
[6] L. Shklar and R. Rosen, Web Application Architecture: Principles, Protocols and Practices, 2nd ed. New York: John Wiley & Sons, 2009.
[7] M. Andhini and R. P. S. C. A. Daulika, "'CallMe' Aplikasi Alat Bantu Komunikasi Penyandang Tuna Rungu dan Tuna Netra," Telkom University, 2015.
[8] J. L. Whitten and L. D. Bentley, Systems Analysis and Design Methods, 7th ed. 2007.
[9] H. Agassi, F. I. Fajri, Y. A. Ari, and A. Kurniawan, "Pengembangan Sistem Admistrasi Pada Deltamusik School Berbasis Web," 2015. https://socs.binus.ac.id/2015/09/16/pengembangan-sistem-admistrasi-pada-deltamusik-school-berbasis-web/.
[10] R. A.S. and M. Shalahuddin, Rekayasa Perangkat Lunak Terstruktur dan Berorientasi Objek. Bandung: Informatika, 2013.
[11] J. W. Satzinger, R. B. Jackson, and S. D. Burd, Systems Analysis and Design in a Changing World, 6th ed. 2011.
[12] R. S. Pressman, Software Engineering: A Practitioner's Approach, 6th ed. McGraw-Hill, 2005.
[13] T. Connolly, C. Begg, and A. Strachan, Database Systems: A Practical Approach to Design, Implementation and Management, 3rd ed. Addison Wesley, 2003.
[14] I. Spaanjaars, Beginning ASP.NET 4.5.1 in C# and VB. John Wiley & Sons, 2014.
[15] E. Winarno, A. Zaki, and SmitDev Community, Pemrograman PHP. 2013.
[16] Y. Kustiyaningsih, Pemrograman Basis Data Berbasis Web Menggunakan PHP dan MySQL. Yogyakarta: Graha Ilmu, 2011.
[17] "Shneiderman's 'Eight Golden Rules of Interface Design.'" https://faculty.washington.edu/jtenenbg/courses/360/f04/sessions/schneidermanGoldenRules.html.
[18] G. P. Putra, N. Santoso, and E. M. A. Jonemaro, "Rancang Bangun Sistem Informasi Manajemen Persuratan Dinas Pendidikan Banyuwangi," J. Pengemb. Teknol. Inf. dan Ilmu Komput., vol. 3, no. 5, 2019.
[19] E. K. Putra, W. Witanti, I. V. Saputri, and S. Y. Pinasty, "Perancangan Sistem Informasi Pengarsipan Surat Berbasis Web Di Kecamatan XYZ," Ikraith-Informatika, vol. 4, no. 2, 2019.
[20] E. Hartono and N. W. Wardani, "Sistem Pengarsipan Surat Masuk Surat Keluar Digital Berbasis Web," J. Teknol. Inf. dan Komput., vol. 5, no. 2, pp. 204–211, 2019. https://doi.org/10.36002/jutik.v5i2.787.


Implementation of Face Recognition Method for Attendance in Class

Bryan Gavriell, Felix Fauzan, Nelsen Ardian, Kristien Margi Suryaningrum
Department of Computer Science, Bina Nusantara University, Jakarta, Indonesia
[email protected], [email protected], [email protected], [email protected]

Abstract— Face recognition has become one of the key aspects of computer vision. In this paper, an automatic face recognition system is proposed based on appearance-based features that focus on the entire face image rather than on local facial features. The first step in a face recognition system is face detection. The Viola-Jones face detection method, which is capable of processing images extremely rapidly while achieving high detection rates, is used. It is a computer application for automatically identifying a person from a still image or video frame. In this paper, we propose an automated attendance management system; our effort is to develop a system that allows easy attendance marking using real-time face recognition. Based on the Viola-Jones algorithm, this research provides more efficient and secure attendance for students at Binus University.

Keywords—Face Recognition; Viola-Jones; Feature Extraction; Distance Measurement; Machine Learning

I. INTRODUCTION

Technology has a crucial role in the development of human lives, especially through its effect on the human point of view. Information systems can be divided into two types based on human interaction: manual information systems and automatic information systems. Manual information systems tend to be slower, need a lot of human intervention, and produce inaccurate information. Authentication is one of the significant issues in the era of information systems. Among other techniques, human face recognition (HFR) is a well-known technique that can be used for user authentication [1].

This research is about how students at Bina Nusantara University record their attendance on site. Today there are many cases of attendance problems caused by manual attendance recording, in which students use their student cards to record their attendance in each given class. Many of Binus's students have forgotten or lost their student cards, or the cards have been accidentally broken. Therefore, research will be conducted on the many students who may have experienced the effects of this inefficient way of recording attendance. Data from questionnaires acquired from students at Bina Nusantara will be used to help show the faults and errors. Automatic attendance recording will help record attendance more accurately and help the students who suffer from these problems.

From the description given beforehand, data analysis is required for the research to be more effective and efficient. This requires deep study and understanding, so the researchers were motivated to do research with the title "Implementation of Face Recognition Method for Attendance in Class".

The key problems formulated for this research came from problems regarding attendance fraud and human error. This research aims to help Binus University students deal with existing attendance problems. It acts as a consideration, as well as a comparison with the attendance system being used today, and aims at better accuracy and efficiency.

II. LITERATURE REVIEW

Facial recognition came from the idea of Computer Vision, which is a field of Artificial Intelligence that trains computers to interpret and understand visual input [2][3]. Computer Vision uses digital images from cameras to react to what it sees using deep learning models [2][3][4]. Computer Vision was first experimented with in the 1950s; back then it was only used to understand typewritten and handwritten texts [2][4].

With the emergence of machine learning, a new approach to solving Computer Vision problems was invented: machine learning extracts common patterns between data samples and transforms them into an equation that will help classify future information [3]. Computer Vision works in three basic steps. First, it acquires an image that is captured in real time through cameras for analysis [4]. Then, it processes the image using deep

978-1-6654-4002-8/21/$31.00 ©2021 IEEE



learning models that will automate much of the process; the models are usually trained by first being fed large amounts of pre-identified images [2][3][4]. Finally, it identifies or classifies the recognized object model from the captured image [2][4].

Computer Vision has many types, such as object detection, facial recognition, pattern detection, image classification, and feature matching [2][4]. Facial recognition is an advanced type of object detection that not only identifies facial features but also identifies an individual [4]. Many companies today use Computer Vision in their applications; for example, "Bixby Vision" by Samsung uses deep learning to find an object and then find matching or similar items on the internet [3][5].

Facial recognition is a method of computer vision that identifies an individual using their face [4]. Face recognition systems use a computer algorithm to pick out distinctive details of a person's face, including eye distance, chin shape, nose shape, and others [6]. A mathematical representation is made and converted from the obtained details, then compared to data that have been trained into the system [6]. Using a deep neural network, computers learn to understand the position of faces: each time an image is given, the algorithm estimates where the face is [7]. Neural networks are first fed with data samples to be used for comparison with one another [7].

The first experiments on facial recognition were begun in the 1960s by Woodrow Wilson Bledsoe. Bledsoe proposed a system that could classify photos of faces by using a RAND tablet, which functioned as a coordinate collector of several facial features [8][9]. In the 1990s, researchers Turk and Pentland proposed Eigenfaces, and other face recognition researchers began to pay attention to the field [10]. In the 2010s, face recognition started to be used in public facilities and military operations [8]. In 2011, face recognition helped to confirm the identity of Osama bin Laden when he was killed in a US raid [8].

Real-time face recognition became a reality in 2001 with the algorithm proposed by Paul Viola and Michael Jones [7][9][10]. This algorithm works by first computing an integral image from the original image; an AdaBoost classifier then performs feature selection using Haar-like features, and a cascade structure rejects object models that do not meet the face recognition requirements [7][11][12][13]. This algorithm is implemented in OpenCV as cvHaarDetectObjects() [13]. The Viola-Jones algorithm remains one of the best-known and easiest methods with which to implement facial recognition [11].

The integral image proposed by Viola-Jones is used to calculate a feature's value by subtracting the pixels in the white area from the pixels in the black area, which helps calculate the value of a whole feature of the representative image [7][12][13]. The second contribution of this algorithm is the AdaBoost classifier, a set of filters that can be used to segment an image to determine features and thresholds [7][12]. AdaBoost accumulates several weak classifiers to make a strong classifier [12][13].

During the testing phase, all stored features are applied to the input image, which is classified as a face or not [13][14]. If the image passes all the conditions, it is deduced to be a face; conversely, if it does not meet the conditions, it fails to be classified as a face [13][14].

III. MATERIAL AND METHODOLOGY

3.1 Flowchart

Fig. 1. Flow Chart of Face Recognition Process
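The pipeline in Fig. 1 (grayscale conversion, face detection, comparison with database samples, attendance logging) can be sketched in a few lines. This is an illustrative sketch only, not the MMM-developer app's actual code; the detector and matcher are stand-in parameters, and the student id reused below is the example id shown later in the testing section.

```python
# Hypothetical sketch of the Fig. 1 attendance pipeline (assumed code).

def rgb_to_gray(pixel):
    """Standard luminance weighting: one (R, G, B) pixel -> one gray value."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b

def run_pipeline(frame, detect, match, log):
    """Convert to grayscale, detect, match against samples, record entry."""
    gray = [[rgb_to_gray(p) for p in row] for row in frame]
    face = detect(gray)            # e.g. a Viola-Jones detector
    if face is None:
        return None                # no face detected -> no attendance entry
    student_id = match(face)       # compare with database samples
    if student_id is not None:
        log.append(student_id)     # new attendance entry for the database
    return student_id

# Toy stand-ins for the detector and matcher, just to exercise the flow.
log = []
frame = [[(255, 255, 255), (0, 0, 0)]]
detected = run_pipeline(frame,
                        detect=lambda g: g,            # pretend a face was found
                        match=lambda f: "2301889376",  # pretend it matched
                        log=log)
print(detected, log)  # prints: 2301889376 ['2301889376']
```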


The flowchart describes the process by which the system works. The system converts an image from RGB into grayscale so that features are easier to identify and differentiate [13][15]. If no face is detected, the system does not record any entry. If a face is detected, a boundary box saves the face and compares it with existing samples from the database. If the samples match, the system records a new attendance entry in the database.

3.2 Viola-Jones Algorithm

To detect face features, the system requires the Viola-Jones algorithm to turn the image into a grayscale image. As shown in Fig. 2, Haar features are composed of two or three rectangles. These features are applied to an image to find out whether a face is present or not [12][13][14]. Each Haar feature has a value calculated by taking the area of each rectangle and adding the results [11].

Fig. 2. Examples of Haar features of an image [14]

An integral image is an algorithm for the cost-effective generation of the sum of pixel intensities in a specified rectangle. The integral image is used for the rapid computation of Haar features: calculating the sum of a rectangle inside the image is extremely efficient, requiring only four additions for any arbitrary rectangle size [14]. AdaBoost is used for the construction of strong classifiers as a linear combination of weak classifiers [12][14]. This is illustrated in Fig. 3.

Fig. 3. Integral image Generated from Input Image [16]

The AdaBoost classifier is a set of filters that can be utilized to segment the image. The key characteristic of the Viola-Jones face detection method is the cascade structure of its classifier, which consists of 3 stages [12][13][17].

A cascade classifier processes many features organized into classification levels [16]. There are three types of classification for face detection. In the first classifier, each sub-image is classified using one feature. If the feature value resulting from the filter does not meet the desired criteria, the sub-window is rejected. The algorithm then moves to the next sub-window and calculates the feature value again. If the result meets the desired threshold, the algorithm proceeds to the next filter stage [13][14][16], until the number of sub-windows that pass the classification is reduced to nearly the detected image [18][16].

As shown in Fig. 4, the cascaded classifier is composed of stages, each containing a strong classifier from AdaBoost [16]. The job of each stage is to determine whether a given sub-window is definitely not a face or maybe a face. When a sub-window is classified as a non-face by a given stage, it is immediately discarded. Conversely, a sub-window classified as a maybe-face is passed on to the next stage in the cascade [13]. It follows that the more stages a given sub-window passes, the higher the chance that the sub-window contains a face [16]. The cascade structure dramatically increases the speed of the detector by focusing attention on promising regions of the image. It is possible to eliminate false candidates quickly using stage cascading: the cascade eliminates a candidate if it does not pass the first stage; if it passes, the candidate is sent to the next stage, which is more complicated than the previous one. If a candidate passes all the stages, a face is detected [13][14].

Fig. 4. Cascade Structure of Viola-Jones Algorithm [19]

Several advantages of this approach are its sophisticated feature selection and a scale-invariant detector. However, the algorithm does not work well with turned or tilted faces [20].
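The three mechanisms described in this section (the integral image with its four-lookup rectangle sums, a two-rectangle Haar feature computed as white-area minus black-area, and the attentional cascade that discards a sub-window at the first failed stage) can be sketched in plain Python. This is an illustrative sketch with toy data, not the trained detector the paper uses:

```python
# Illustrative sketch (assumed code) of the integral image, a Haar feature,
# and an early-rejecting cascade, as described in Section 3.2.

def integral_image(img):
    """ii[y][x] = sum of img[0..y-1][0..x-1]; padded with a zero row/column."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum over any rectangle using just four table lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

def two_rect_feature(ii, x, y, w, h):
    """Haar feature value: left (white) half minus right (black) half."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)

def cascade(window_ii, stages):
    """stages: list of (feature_fn, threshold); reject on first failure."""
    for feature_fn, threshold in stages:
        if feature_fn(window_ii) < threshold:
            return False          # 'definitely not a face', discarded early
    return True                   # passed every stage: 'maybe a face'

img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12]]
ii = integral_image(img)
print(rect_sum(ii, 1, 1, 2, 2))            # 6 + 7 + 10 + 11 = 34
print(two_rect_feature(ii, 0, 0, 4, 3))    # 33 - 45 = -12
print(cascade(ii, [(lambda t: rect_sum(t, 0, 0, 4, 3), 50)]))  # 78 >= 50 -> True
```

In the real detector each stage threshold comes from AdaBoost training, and each stage's score is a weighted sum of many such weak features rather than a single one.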


3.3 Face Recognition App by MMM Developer

The Face Recognition application allows users to train faces, so that the application can recognize the user's face input through the camera with computer vision, using the Viola-Jones algorithm to detect the faces that have been input [19].

The application detects and visualizes the user using three main modules. The first allows the user to train a face with the algorithm, monitor faces, and save a username for each face; these are used as the database with which the application recognizes input faces.

The second module recognizes the user's face, which is the training result from the input image, and displays the name of the person who matches the detected face.

The third module is a gallery of the results of face recognition training: it contains the data of all faces trained with the Viola-Jones algorithm, so that users can see the results of the faces that have gone through face recognition training.

3.4 Entity Relationship Diagram

Fig. 5. Entity Relationship Diagram of The Proposed System

As shown in Fig. 5, the student entity has a one-to-many relationship: the id of the student will be recorded multiple times in the Attendance entity. The lecturer also has a one-to-many relationship; this scenario works similarly to the student-attendance relationship. The Class entity has a one-to-many relationship with the Attendance entity, as it uses the Class Id as the basis of attendance. The Course entity has a one-to-many relationship with the Class entity: every course conducted will be held in different classes based on the schedule listed.

Figure 5 also shows that when students record attendance, the Attendance entity acquires data from the student entity, which contains StudentId, StudentName, and other attributes that will be used as a reference for face recognition. The attendance entity needs a relationship with the lecturer entity to accurately distinguish the course taken. The attendance records will also be mapped according to ClassId, ClassLocation, CourseId, and CourseName, respectively allocated in the Class and Course entities. The StudentImage attribute is the key to correctly identifying and recognizing a person in a class; it is used as the comparison for the captured image in the implementation.

3.5 Use Case Diagram

Fig. 6. Use Case Diagram of Face Recognition

Fig. 6 illustrates the Viola-Jones algorithm from the design standpoint; using use case diagrams and class diagrams, the methods of this research are simplified. From the use case standpoint, the camera detects a user's face, extracts distinct information from each participant, and then compares that image to a sample provided by a database. For the comparison to apply, an admin must provide and store samples in a database so that attendance for each student can be recorded. Samples differentiate color, motion, features, and shape.
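The relationships described for Fig. 5 can be rendered as a relational schema. This SQL is a hypothetical rendering, not taken from the paper: the table and column names follow the attributes named in the text, but the exact types and keys are assumptions.

```python
# Hypothetical sqlite3 rendering of the ERD relationships described above.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student   (StudentId TEXT PRIMARY KEY, StudentName TEXT, StudentImage BLOB);
CREATE TABLE Lecturer  (LecturerId TEXT PRIMARY KEY, LecturerName TEXT);
CREATE TABLE Course    (CourseId TEXT PRIMARY KEY, CourseName TEXT);
CREATE TABLE Class     (ClassId TEXT PRIMARY KEY, ClassLocation TEXT,
                        CourseId TEXT REFERENCES Course(CourseId));  -- Course 1-to-many Class
CREATE TABLE Attendance(AttendanceId INTEGER PRIMARY KEY,
                        StudentId  TEXT REFERENCES Student(StudentId),    -- Student 1-to-many
                        LecturerId TEXT REFERENCES Lecturer(LecturerId),  -- Lecturer 1-to-many
                        ClassId    TEXT REFERENCES Class(ClassId),        -- Class 1-to-many
                        RecordedAt TEXT);
""")
conn.execute("INSERT INTO Student VALUES ('2301889376', 'Example Student', NULL)")
conn.execute("INSERT INTO Attendance (StudentId) VALUES ('2301889376')")
print(conn.execute("SELECT COUNT(*) FROM Attendance").fetchone()[0])
```

The one-to-many relationships appear as foreign keys on the Attendance and Class tables; one student row can be referenced by many attendance rows, exactly as the text describes.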


IV. IMPLEMENTATION AND TESTING

This research follows specific requirements analyzed using use cases. The system mainly incorporates two main actors: the Student/User and the Admin. At this point, user requirements were given more attention than system requirements; functionalities such as attendance records, user accounts, and face detection received particular attention. When a User/Student enters a class, their face will automatically be scanned and detected when the class starts. A new record will be allocated in the system, which allows the admin to validate the attendance.

A User clocks in a new record and clocks out to confirm that attendance record. Each record is counted as a sign-in and a confirmation; once the class is over, a confirmation and sign-out occur. The system only detects images captured in real time according to the start of the class; this function allows the differentiation of new and old input images [21].

4.1 Tests and Results

Graphical testing of the prototype of the research at hand gives a detailed look at how the research shall work. Tests of the proposed system will be conducted using the Face Recognition app by MMM developer.

Fig. 7. App detects the face

Figure 7 shows that a face image will be trained and the system gathers the Haar features of a face image by converting the image into grayscale. The rectangle shows the boundaries within which the features are taken. The boundaries set in green are a calculation of the classifier that will be fed information regarding the features and their minimum threshold to be considered a feature. An AdaBoost classifier is used to combine weak classifiers into a strong one. Cascading helps to boost the speed and accuracy of the model.

Fig. 8. Saving face and name

In Figure 8, after training starts, a name or ID needs to be associated with the trained image in order for the said image to be saved. A text will show once the face image has successfully been trained.

Fig. 9. Database, including 30 images with names

Figure 9 displays the gallery window, which shows the datasets that have been used as trained images. These data will be used as a comparison.

Fig. 10. Face detection shows the name connected with the face

Figure 10 shows a new input image put into the face recognition window. The window contains a screen gathering a live view of the camera and has the option to search the database for faces similar to those detected in the square of the screen, and it shows the people detected in the input image. It shows "2301889376", as seen before in the gallery window, indicating that a face is detected.
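The Haar-feature computation described above relies on the integral image, the core data structure of the Viola-Jones detector. The following is an illustrative sketch (not the authors' code) of how an integral image makes any rectangular pixel sum, and hence a two-rectangle Haar-like feature, a constant-time lookup:

```python
def integral_image(img):
    """Build a summed-area table: ii[y][x] = sum of img[0..y][0..x]."""
    h, w = len(img), len(img[0])
    ii = [[0] * w for _ in range(h)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y][x] = row_sum + (ii[y - 1][x] if y > 0 else 0)
    return ii

def rect_sum(ii, top, left, bottom, right):
    """Sum of pixels in the inclusive rectangle, using at most 4 lookups."""
    total = ii[bottom][right]
    if top > 0:
        total -= ii[top - 1][right]
    if left > 0:
        total -= ii[bottom][left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1][left - 1]
    return total

# A two-rectangle Haar-like feature is then just a difference of rect sums.
img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12]]
ii = integral_image(img)
left_half = rect_sum(ii, 0, 0, 2, 1)   # columns 0-1
right_half = rect_sum(ii, 0, 2, 2, 3)  # columns 2-3
feature = right_half - left_half
```

In the full detector, AdaBoost selects the most discriminative of many such features, and the resulting weak classifiers are cascaded as described above.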


Fig. 11. Dataset Test Result of Application

Figure 11 shows that the program struggles to detect faces with expressions and off-angled pictures. Images 1 and 2 show frontal detection of an image, Image 3 shows an image with a facial expression, and Image 4 shows an off-angled image. Images 1 and 2 result in an average accuracy of 96%, while Images 3 and 4 scored 64% and 18%, respectively.

V. CONCLUSION

Based on the research and analysis of the attendance system, a facial recognition system for attendance at Binus University would be very useful to minimize human errors and increase efficiency in the classroom. The researched system works on the concept of the Viola-Jones algorithm. First, it takes an image to be trained as a comparison dataset; it takes an integral image and saves it under the student's ID. Once the datasets have been implemented, each session starts by capturing real-time images from the class cameras. Captured images are compared with the datasets that have been trained, and once the system recognizes the student or lecturer, an attendance record is noted. In this research, the Viola-Jones algorithm has the advantage of a frontal face detection accuracy of 95% and above, but the proposed algorithm struggles to detect turned faces.

REFERENCES

[1] S. Lukas, A. R. Mitra, R. I. Desanti, and D. Krisnadi, "Student attendance system in classroom using face recognition technique", IEEE, 2021. https://ieeexplore.ieee.org/abstract/document/7763360
[2] N. Babich, "What is computer vision & how does it work? An introduction", 2021. https://xd.adobe.com/ideas/principles/emerging-technology/what-is-computer-vision-how-does-it-work/
[3] I. Mihajlovic, "Everything you ever wanted to know about computer vision", 2019. https://towardsdatascience.com/everything-you-ever-wanted-to-know-about-computer-vision-heres-a-look-why-it-s-so-awesome-e8a58dfb641e
[4] SAS, "Computer vision: what it is and why it matters", 2021. https://www.sas.com/en_id/insights/analytics/computer-vision.html
[5] B. Dickson, "What is computer vision", 2020. https://www.pcmag.com/news/what-is-computer-vision
[6] EFF, "Face recognition", October 2017. https://eff.org/pages/face-recognition
[7] I. Sample, "What is facial recognition and how sinister is it?", 2019. https://www.theguardian.com/technology/2019/jul/29/what-is-facial-recognition-and-how-sinister-is-it
[8] D. Dharaiya, "History of facial recognition technology and its bright future", 2020. https://readwrite.com/2020/03/12/history-of-facial-recognition-technology-and-its-bright-future/
[9] M. K. Kundu, S. Mitra, D. Mazumdar, and S. K. Pal, "Perception and machine intelligence", First Indo-Japan Conference, Kolkata, India, 2012, p. 29.
[10] K. Gates, "Our biometric future: facial recognition technology and the culture of surveillance", NYU Press, 2011, pp. 48-49.
[11] N. A. Andryani, "Study of Viola-Jones face detection on color image", May 2015. https://media.neliti.com/media/publications/176657-EN-study-of-viola-jones-face-detection-on-c.pdf
[12] M. Zervos, "Multi-camera face detection and recognition applied to people tracking", Master Thesis, School of Computer and Communication Science, École Polytechnique, Lausanne, Switzerland, 2013.
[13] J. Kaur and A. Sharma, "Performance analysis of face detection by using Viola-Jones algorithm", International Journal of Computational Intelligence Research, India, 2017.
[14] M. K. Dabhi and B. K. Pancholi, "Face detection system based on Viola-Jones algorithm", International Journal of Science and Research, 2016.
[15] T. D. Narayan and S. Ravishankar, "Face detection and recognition using Viola-Jones", Advances in Computational Sciences and Technology, 2017. http://ripublication.com/acst17acstv10n5_47.pdf
[16] K. Cen, "Study of Viola-Jones real time face detector", 2016. https://web.stanford.edu/class/cs231a/prev_projects_2016/cs231a_final_report.pdf
[17] F. Comaschi, S. Stuijk, T. Basten, and H. Corporaal, "RASW: a run-time adaptive sliding window to improve Viola-Jones object detection", ICDSC 2013: 7th ACM/IEEE International Conference on Distributed Smart Cameras, 2013. http://www.wikicfp.com/cfp/servlet/event.showcfp?eventid=28572copyownerid=26710
[18] S. P. Sulur, "Perancangan aplikasi deteksi wajah menggunakan algoritma Viola-Jones" [Design of a face detection application using the Viola-Jones algorithm], 2015. http://repository.unpas.ac.id/26827/
[19] N. Thiruchchelvan and T. Kanagasabai, "The real time face detection and recognition system", International Journal of Advanced Research in Computer Science & Technology, Sri Lanka, 2017.
[20] K. Aashish and A. Vijayalakshmi, "Comparison of Viola-Jones and Kanade-Lucas-Tomasi face detection algorithms", Oriental Journal of Computer Science and Technology, India, 2017.


Comparative of Advanced Sorting Algorithms (Quick Sort, Heap Sort, Merge Sort, Intro Sort, Radix Sort) Based on Time and Memory Usage
Marcellino Marcellino, Davin William Pratama, Steven Santoso Suntiarko, Kristien Margi Suryaningrum
Department of Computer Science, Bina Nusantara University, Jakarta, Indonesia
[email protected], [email protected], [email protected], [email protected]

Abstract—Every algorithm has its own best-case as well as its worst-case scenario, so it is difficult to determine the best sorting algorithm just by its Big-O. Not only that, the amount of memory required also affects an algorithm's efficiency. This research provides an overview of the advanced sorting algorithms, namely Radix Sort, Heap Sort, Quick Sort, Merge Sort, and Introspective Sort, that are used directly in real-life work to sort 11K GoodReads data, and compares the algorithms in terms of the time required and the memory used to complete the sort. The test application is written in Visual Studio Code and implemented using the Python programming language. The program runs each algorithm up to 5 times in a row and records the results. This research shows that Introspective Sort is the best in time and Heap Sort is the best in memory usage.

Keywords—Heap, Introspective, Merge, Radix, Sorting Algorithm, Big-O, Memory Required, Efficiency of Timing

I. INTRODUCTION

Over time, many new sorting algorithms have shown up, each aiming to become the algorithm with the best efficiency. The algorithms vary widely, from those with easy implementations to those with complex implementations. The best-known sorting algorithm is the Quick Sort algorithm, because it has good average sorting time efficiency, but it will also be the slowest algorithm when its worst case occurs. The same applies to all sorting algorithms, where each has a best case and a worst case. It is not uncommon to find other sorting algorithms that sometimes have better average cases. Choosing the most efficient sorting algorithm is not easy, because it must be seen from several aspects: other than efficiency in matters of time, the use of resources in the form of memory and the stability of the running algorithm are also important to consider in choosing the right algorithm.

This study will provide an overview of the efficiency of 5 advanced sorting algorithms, namely Radix Sort, Quick Sort, Merge Sort, Introspective Sort, and Heap Sort. To find out the efficiency of each algorithm, the researcher performs algorithm analysis using Big-O notation on the sorting algorithms to make sure the comparison is valid, and then measures the results of each sorting algorithm in terms of the time and memory needed to complete sorting the data, before comparing these measurements with the analysis. Each algorithm is implemented using the Python programming language, version 3.8.5. Each algorithm sorts the same 11,000 data records, and the average time and memory required is taken from 5 experiments. The research is carried out on a machine with an Intel Core i7-10710U @ 1.10 GHz processor (6 cores) and 16 GB @ 2666 MHz of memory, in plugged-in state.

II. ALGORITHM ANALYSIS

Based on previous studies, the researcher analyzed each sorting algorithm to see its efficiency. The analysis is as follows.

A. Quick Sort
Quick Sort is one of the fastest sorting algorithms and is part of many sorting libraries. The running time of Quick Sort is highly dependent on the selection of the pivot element. The time complexity of this algorithm is O(n log n) for the best and average cases; for the worst case, it has a complexity of O(n2).

B. Heap Sort
The Heap Sort algorithm has a better worst-case time complexity notation than Quick Sort. This algorithm has a time complexity of O(n log n) for all cases (best, average, worst).

C. Merge Sort
Merge Sort is a sorting algorithm that uses the "Divide and Conquer" method. This algorithm has a time complexity of O(n log n). When compared with the other algorithms with O(n log n) time complexity, namely Quick Sort, Heap Sort, Tim Sort, Library Sort, Smooth Sort, Cube Sort, Block Sort, Tree Sort, Intro Sort, Tournament Sort, and Comb Sort, and when utilizing multicore CPUs, Merge Sort has performance close to Quick Sort.

D. Introspective Sort
Introspective Sort, also known as Introsort, is a comparison sorting algorithm invented by David Musser in 1997. It starts with Quicksort but switches to Heapsort if the recursion depth becomes too deep, to eliminate worst cases, and uses Insertion Sort for small partitions due to its good locality of reference. Introsort has O(n log n) worst-case and average-case runtime and practical performance comparable to Quicksort on typical data sets.

E. Radix Sort
Radix Sort is a form of integer sorting that repeatedly sorts on a constant number of key bits. Radix Sort also has a

978-1-6654-4002-8/21/$31.00 ©2021 IEEE


fairly good time complexity of O(n) for all cases (best, average, worst).

From the analysis of all five algorithms used in this research (Radix Sort, Quick Sort, Merge Sort, Introspective Sort, and Heap Sort), when considering only the average case, the researcher can divide the algorithms into 2 classes of efficiency: O(n) for Radix Sort and O(n log n) for the four other algorithms (Quick Sort, Merge Sort, Introspective Sort, and Heap Sort). Based on previous studies, O(n) efficiency is more efficient than O(n log n) in most cases. So, after doing the analysis, it can be concluded that the most efficient algorithm to sort the data is Radix Sort.
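The gap between the two classes can be made concrete for the dataset sizes used later in this research. The sketch below counts abstract "steps" with constant factors ignored; as the experiments in Section IV show, in practice those constant factors can reverse the ranking:

```python
import math

def est_ops(n):
    """Rough asymptotic operation counts, constant factors ignored."""
    return {"O(n)": n, "O(n log n)": n * math.log2(n)}

# For the dataset sizes in this research, log2(n) is roughly 11-13,
# so the O(n log n) class does about an order of magnitude more steps.
for n in (2500, 5000, 7500, 11000):
    ops = est_ops(n)
    print(n, ops["O(n)"], round(ops["O(n log n)"]))
```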
III. METHODS

A. Research Methods
In this research, the researcher uses one of the software development methods, namely the waterfall model. The waterfall method consists of several stages, from Analysis, Design, Implementation, and Testing to Deployment & Maintenance.

At the analysis stage, the researcher determines the data that will be used as material for sorting. This dataset consists of several columns that could be used for sorting, namely the book ISBN with a string data type, the average rating with a float data type, and the number of pages with an integer data type. However, the data used for this sorting research is the book title column, which has a string data type.

Figure 1 is the design of the flow of the application in this study. The first thing the researcher does is import and convert the data from its original form (.CSV), with help from a library called Pandas, so it can be read by programs written in Python. After that, the data is sorted five times each, using the five different sorting algorithms in this research: the Radix, Quick, Merge, Introspective, and Heap sorting algorithms. After collecting the time (in ms) and memory (in bytes) used by each algorithm in each iteration, the average value of these five iterations represents the efficiency of each algorithm. After obtaining all the required data, the analysis of these data is the last step, carried out to answer the problem statement of this research.

At the implementation stage, the researcher uses the Python programming language and Visual Studio Code as a code editor to implement the advanced sorting algorithms that will be used. The resulting program is likewise based on the Python programming language.

Fig. 1. Flowchart of the Application

B. Design
Figure 2 is an activity diagram of the quicksort algorithm. The diagram shows the activities that occur in the quicksort algorithm. Starting from the starting node represented by the black circle, the variable N stores the length of the data list. If N has a value of more than one, which means that the dataset has more than one element, a pivot is selected; in this research, the pivot is the last element of the list. After that, the list is divided into two parts: data with values greater than the pivot and data with values less than the pivot. The next step is to check whether the list has been sorted or not. If the list is already sorted, the algorithm is complete and prints all the sorted data. Meanwhile, if the list has not been sorted, the process goes back to the pivot selection and continues like that until the list has been sorted.

Fig. 2. Activity Diagram of Quick Sort Algorithm

From the design of the activity diagram and the explanation of the quick sort algorithm above, the researcher then implements it into the program, and it can be seen that the efficiency of the implemented algorithm is O(n log n).
The dataset, which contains 9 sample data, is successfully sorted with a completion time of 1.0297298431396484 ms and takes up 55 Bytes of memory.
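A minimal sketch of this design, assuming a last-element pivot and Python's time.perf_counter/tracemalloc for instrumentation (the paper does not give the authors' exact code or measurement method), could look like:

```python
import time
import tracemalloc

def quick_sort(data):
    """Quick sort with the last element as pivot, as in the activity diagram."""
    if len(data) <= 1:
        return data
    pivot = data[-1]
    smaller = [x for x in data[:-1] if x <= pivot]
    greater = [x for x in data[:-1] if x > pivot]
    return quick_sort(smaller) + [pivot] + quick_sort(greater)

# Timing and memory harness in the spirit of the paper's measurements
# (sample book titles are hypothetical).
books = ["Emma", "Dune", "Ivanhoe", "Beloved", "Atonement"]
tracemalloc.start()
start = time.perf_counter()
result = quick_sort(books)
elapsed_ms = (time.perf_counter() - start) * 1000
_, peak_bytes = tracemalloc.get_traced_memory()
tracemalloc.stop()
```

The authors' measured figures (1.03 ms, 55 Bytes for 9 items) will of course differ from what this sketch reports on other machines.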


Using the completion time and memory usage data as a reference and using the Population Mean equation (1), the researcher can estimate the time and memory required for each sorting algorithm to complete.

μ = (Σ · X) / N (1)

Where:
Σ denotes the sum of the measured sample results
X is the number of items in the dataset
N is the number of items in the sample

Table I shows the estimated time required for the Quick Sort algorithm to complete each dataset.
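In other words, the result measured on the 9-item sample is scaled linearly to each dataset size. As a quick check, the Quick Sort sample time of 1.0297298431396484 ms reproduces the Table I figures:

```python
def estimate(sample_value, dataset_size, sample_size=9):
    """Scale a measured sample result linearly to a dataset size, per equation (1)."""
    return sample_value * dataset_size / sample_size

quick_sample_ms = 1.0297298431396484  # measured time to sort the 9-item sample
for n in (2500, 5000, 7500, 11000):
    # e.g. 2500 items -> 286.03606754 ms, matching Table I
    print(n, round(estimate(quick_sample_ms, n), 8))
```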

TABLE I. ESTIMATED TIME REQUIRED FOR EACH DATASET

Total Data    Time (ms)
2500          286.03606754
5000          572.07213508
7500          858.10820262
11000         1,258.55869717

Table II shows the estimated memory required for the Quick Sort algorithm to complete each dataset.

TABLE II. ESTIMATED MEMORY REQUIRED FOR EACH DATASET

Total Data    Memory (Bytes)
2500          15,277.78
5000          30,555.56
7500          45,833.33
11000         67,222.22

Figure 3 is a Heap Sort activity diagram. This diagram describes the process of running the Heap Sort algorithm on the dataset used. Heap Sort is similar to Selection Sort: it initially specifies the value i, which is the length of the list used, and gives the value 0 to the root; starting from the starting node represented by the black circle, it then iterates continuously as long as i is not equal to or less than the root. In each iteration, positions are exchanged when the root value is greater than the value below it.

Fig. 3. Activity Diagram of Heap Sort Algorithm

From the design of the activity diagram and the explanation of the heap sort algorithm above, the researcher then implements it into the program, and it can be seen that the efficiency of the implemented algorithm is O(n log n).
The dataset, which contains 9 sample data, is successfully sorted with a completion time of 0.997304916381836 ms and takes up 448 Bytes of memory.

TABLE III. ESTIMATED TIME REQUIRED FOR EACH DATASET

Total Data    Time (ms)
2500          277.02914344
5000          554.05828688
7500          831.08743032
11000         1,218.92823113

Table III shows the estimated time required for the Heap Sort algorithm to complete each dataset.

TABLE IV. ESTIMATED MEMORY REQUIRED FOR EACH DATASET

Total Data    Memory (Bytes)
2500          124,444.44
5000          248,888.89
7500          373,333.33
11000         547,555.56

Table IV shows the estimated memory required for the Heap Sort algorithm to complete each dataset.
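A compact sift-down implementation of the Heap Sort described above (an illustrative sketch, not the authors' code):

```python
def heap_sort(data):
    """Heap sort: build a max-heap, then repeatedly move the root to the end."""
    a = list(data)
    n = len(a)

    def sift_down(root, end):
        # Push a[root] down while one of its children is larger.
        while (child := 2 * root + 1) < end:
            if child + 1 < end and a[child + 1] > a[child]:
                child += 1
            if a[root] >= a[child]:
                return
            a[root], a[child] = a[child], a[root]
            root = child

    for i in range(n // 2 - 1, -1, -1):   # heapify the whole list
        sift_down(i, n)
    for i in range(n - 1, 0, -1):         # extract the max repeatedly
        a[0], a[i] = a[i], a[0]
        sift_down(0, i)
    return a
```

Sorting in place over a single list is what keeps this algorithm's extra memory so low in the measurements of Section IV.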


Fig. 4. Activity Diagram of Merge Sort Algorithm

Figure 4 is a Merge Sort activity diagram. This diagram describes the process of running the Merge Sort algorithm on the dataset used. Merge Sort divides the array to be sorted into two parts, and so on, until each part contains only a single element; before the elements are recombined, they are sorted first, so that when recombined they form an array that is already in order.

From the design of the activity diagram and the explanation of the merge sort algorithm above, the researcher then implements it into the program, and it can be seen that the efficiency of the implemented algorithm is O(n log n).
The dataset, which contains 9 sample data, is successfully sorted with a completion time of 0.9605884552001953 ms and takes up 52 Bytes of memory.

TABLE V. ESTIMATED TIME REQUIRED FOR EACH DATASET

Total Data    Time (ms)
2500          266.83012645
5000          533.66025288
7500          800.49037933
11000         1,174.05255636

Table V shows the estimated time required for the Merge Sort algorithm to complete each dataset.

TABLE VI. ESTIMATED MEMORY REQUIRED FOR EACH DATASET

Total Data    Memory (Bytes)
2500          14,444.44
5000          28,888.89
7500          43,333.33
11000         63,555.56

Table VI shows the estimated memory required for the Merge Sort algorithm to complete each dataset.

Fig. 5. Activity Diagram of Introspective Sort Algorithm

Figure 5 is an Introspective Sort activity diagram. This diagram describes the process of running the Introspective Sort algorithm on the dataset used. Introspective Sort is a sorting algorithm that has phases and changes according to the conditions. The diagram shows that this algorithm starts by computing maxDepth. After that, the length of the list is stored in the variable n. If n is less than or equal to one, the process is complete. Otherwise, the value of the variable maxDepth is checked: if maxDepth is 0, heapsort is performed; if maxDepth is not 0, a partition of the list is stored in the variable p and the algorithm recurses into the IntroSort function.

From the design of the activity diagram and the explanation of the introspective sort algorithm above, the researcher then implements it into the program, and it can be seen that the efficiency of the implemented algorithm is O(n log n).
The dataset, which contains 9 sample data, is successfully sorted with a completion time of 0.9975433349609375 ms and takes up 52 Bytes of memory.

TABLE VII. ESTIMATED TIME REQUIRED FOR EACH DATASET

Total Data    Time (ms)
2500          277.09537083
5000          554.19074165
7500          831.28611247
11000         1,219.21963162

Table VII shows the estimated time required for the Intro Sort algorithm to complete each dataset using equation (1).
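The maxDepth switching logic described for Introspective Sort can be sketched as follows. This is illustrative rather than the authors' code: the common choice maxDepth = 2·floor(log2 n) is an assumption, and a library sort stands in for the heapsort fallback to keep the sketch short:

```python
import math

def intro_sort(a):
    """Introsort sketch: quicksort that falls back at recursion depth 0."""
    max_depth = 2 * math.floor(math.log2(len(a))) if a else 0
    _intro(a, 0, len(a) - 1, max_depth)
    return a

def _intro(a, lo, hi, depth):
    if hi - lo <= 0:
        return                                # n <= 1: the process is complete
    if depth == 0:
        a[lo:hi + 1] = sorted(a[lo:hi + 1])   # stand-in for the heapsort phase
        return
    p = _partition(a, lo, hi)                 # p stored, recurse as in IntroSort
    _intro(a, lo, p - 1, depth - 1)
    _intro(a, p + 1, hi, depth - 1)

def _partition(a, lo, hi):
    pivot = a[hi]                             # last element as pivot
    i = lo
    for j in range(lo, hi):
        if a[j] <= pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]
    return i
```

Bounding the recursion depth is what removes quicksort's O(n2) worst case while keeping its typical speed.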


TABLE VIII. ESTIMATED MEMORY REQUIRED FOR EACH DATASET

Total Data    Memory (Bytes)
2500          14,444.44
5000          28,888.89
7500          43,333.33
11000         63,555.56

Table VIII shows the estimated memory required for the Intro Sort algorithm to complete each dataset using equation (1).

Fig. 6. Activity Diagram of Radix Sort Algorithm

Figure 6 is a Radix Sort activity diagram. This diagram explains the process of running the Radix Sort algorithm on the dataset used. Radix Sort is a sorting algorithm that focuses on the digits of the keys, so that it can distinguish the most significant digits from the less significant ones, and it uses counting sort to complete the sorting. The index i is used as a counter: if i is smaller than or equal to the digit length of the array, then TempList is filled from InitList so that the list sorts itself.

From the design of the activity diagram and the explanation of the radix sort algorithm above, the researcher then implements it into the program, and it can be seen that the efficiency of the implemented algorithm is O(n).
The dataset, which contains 9 sample data, is successfully sorted with a completion time of 5.982637405395508 ms and takes up 424 Bytes of memory.

TABLE IX. ESTIMATED TIME REQUIRED FOR EACH DATASET

Total Data    Time (ms)
2500          16,588.50246005
5000          33,177.004920112
7500          49,765.50738017
11000         72,989.41082425

Table IX shows the estimated time required for the Radix Sort algorithm to complete each dataset using equation (1).

TABLE X. ESTIMATED MEMORY REQUIRED FOR EACH DATASET

Total Data    Memory (Bytes)
2500          117,777.78
5000          235,555.56
7500          353,333.33
11000         518,222.22

Table X shows the estimated memory required for the Radix Sort algorithm to complete each dataset using equation (1).

IV. RESULT AND DISCUSSION

Each predefined sorting algorithm has been implemented in the same program, written in the Python programming language based on the design stated above, where the user can find out the efficiency of each advanced sorting algorithm. There are two types of output generated by the program: the time required to run each algorithm, in milliseconds, and the amount of memory used to run each algorithm, in Bytes. The output of the program is taken as the average of 5 runs of each algorithm on datasets with 2500, 5000, 7500, and 11000 records, to ensure an accurate result.

TABLE XI. RESULT OF TESTING THE TIME IT TAKES TO RUN EACH ALGORITHM ON 2500 DATA (MS)

From Table XI above, the fastest algorithm to sort 2500 data is Intro Sort, with the lowest time of 42.66610146 ms.

TABLE XII. RESULT OF TESTING THE TIME IT TAKES TO RUN EACH ALGORITHM ON 5000 DATA (MS)
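The digit-by-digit counting-sort procedure described for Figure 6 can be sketched as follows. This is an LSD variant for non-negative integers, written for illustration (the authors' version, which handles the dataset's string keys, is not shown in the paper):

```python
def radix_sort(nums):
    """LSD radix sort: a counting-sort pass (in bucket form) per decimal digit."""
    a = list(nums)
    if not a:
        return a
    exp = 1
    while max(a) // exp > 0:
        buckets = [[] for _ in range(10)]   # one bucket per digit value 0-9
        for x in a:
            buckets[(x // exp) % 10].append(x)
        a = [x for bucket in buckets for x in bucket]
        exp *= 10                           # move to the next, more significant digit
    return a
```

Each pass is O(n), and the number of passes depends only on the key length, which is why the analysis in Section II classes Radix Sort as O(n); the per-pass bucket allocations are also why its measured memory usage is comparatively high.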


From Table XII above, the fastest algorithm to sort 5000 data is Quick Sort, with the lowest time of 84.82875824 ms.

TABLE XIII. RESULT OF TESTING THE TIME IT TAKES TO RUN EACH ALGORITHM ON 7500 DATA (MS)

From Table XIII above, the fastest algorithm to sort 7500 data is Quick Sort, with the lowest time of 104.8197269 ms.

TABLE XIV. RESULT OF TESTING THE TIME IT TAKES TO RUN EACH ALGORITHM ON 11000 DATA (MS)

From Table XIV above, the fastest algorithm to sort 11000 data is Intro Sort, with the lowest time of 172.685051 ms.

TABLE XV. THE RESULT OF THE TEST OF THE AMOUNT OF MEMORY USED TO RUN EACH ALGORITHM ON 2500 DATA (BYTE)

From Table XV above, the algorithm with the lowest memory usage to sort 2500 data is Heap Sort, with the lowest usage of 2.1362 Bytes.

TABLE XVI. THE RESULT OF THE TEST OF THE AMOUNT OF MEMORY USED TO RUN EACH ALGORITHM ON 5000 DATA (BYTE)

From Table XVI above, the algorithm with the lowest memory usage to sort 5000 data is Heap Sort, with the lowest usage of 2.347 Bytes.

TABLE XVII. THE RESULT OF THE TEST OF THE AMOUNT OF MEMORY USED TO RUN EACH ALGORITHM ON 7500 DATA (BYTE)

From Table XVII above, the algorithm with the lowest memory usage to sort 7500 data is Heap Sort, with the lowest usage of 1.758 Bytes.

TABLE XVIII. THE RESULT OF THE TEST OF THE AMOUNT OF MEMORY USED TO RUN EACH ALGORITHM ON 11000 DATA (BYTE)

From Table XVIII above, the algorithm with the lowest memory usage to sort 11000 data is Heap Sort, with the lowest usage of 2.1396 Bytes.

V. CONCLUSION

This study uses a self-developed program that can run the five advanced sorting algorithms. The program generates output that shows the time and memory efficiency of the Quick Sort, Heap Sort, Merge Sort, Intro Sort, and Radix Sort algorithms. The dataset used contains information on 11,000 books, such as title, author, ISBN, and others.

After doing the research with the five sorting algorithms, the fastest algorithms are Quick Sort and Intro Sort; although their sorting time is fast, both algorithms have a weakness in terms of high memory usage. Of the five algorithms used in this study, the algorithm that uses the least amount of memory is the Heap Sort algorithm. However, this algorithm does not have the fastest time to sort the data.

From the results, it can be concluded that the efficiency of an algorithm based on its Big-O notation cannot be the deciding factor for which algorithm is best, because the behavior of the algorithms changes according to the elements to be sorted. This paper demonstrates that change of behavior: the algorithm analysis concluded that the most efficient algorithm is Radix Sort, but after doing the research, the fastest algorithms turned out to be Quick Sort and Introspective Sort.

REFERENCES

[1] Akhter, N., Idrees, M., & Rehman, F. u. (2016). Sorting Algorithms - A Comparative Study. International Journal of Computer Science and Information Security (IJCSIS), 14(12), 930-936.
[2] Ali, M., Nazim, Z., Ali, W., Hussain, A., Kanwal, N., & Paracha, M. K. (2020). Experimental Analysis of O(n log n) Class Parallel Sorting Algorithms. IJCSNS, 20(1), 139-148.


[3] Asaju, B. C., Ekuma, J. N., & Abiola, F. F. (2018). A Comparative Analysis of Sorting Algorithm. The International Journal of Science & Technoledge, 124-133.
[4] Aung, H. H. (2019). Analysis and comparative of sorting algorithms. International Journal of Trend in Scientific Research and Development (IJTSRD), 3(5), 1049-1053.
[5] Bramas, B. (2017). A novel hybrid quicksort algorithm vectorized using AVX-512 on Intel Skylake. arXiv preprint arXiv:1704.08579.
[6] Buradagunta, S., Bodapati, J. D., Mundukur, N. B., & Salma, S. (2020). Performance Comparison of Sorting Algorithms with Random Numbers as Inputs. Ingénierie des Systèmes d'Information, 25(1), 113-117.
[7] Chauhan, Y., & Duggal, A. (2020). Different sorting algorithms comparison based upon the time complexity. Int. J. Res. Anal. Rev., 7(3), 114-121.
[8] Faujdar, N., & Ghrera, S. P. (2016). Performance Analysis of Parallel Sorting Algorithms using GPU Computing. IJCA Proceedings on Recent Innovations in Computer Science and Information Technology, RICSIT 2016(2), 5-11, September 2016.
[9] Hossain, M. S., Mondal, S., Ali, R. S., & Hasan, M. (2020, March). Optimizing complexity of quick sort. In International Conference on Computing Science, Communication and Security (pp. 329-339). Springer, Singapore.
[10] Kumar, P., Gangal, A., Kumari, S., & Tiwari, S. (2020). Recombinant Sort: N-Dimensional Cartesian Spaced Algorithm Designed from Synergetic Combination of Hashing, Bucket, Counting and Radix Sort. Ingénierie des Systèmes d'Information, 25(5), 655-668.
[11] Lammich, P. (2020, July). Efficient verified implementation of Introsort and Pdqsort. In International Joint Conference on Automated Reasoning (pp. 307-323). Springer, Cham.
[12] Li, S., Li, H., Liang, X., Chen, J., Giem, E., Ouyang, K., ... & Chen, Z. (2019, November). Ft-isort: Efficient fault tolerance for introsort. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (pp. 1-17).
[13] Majumdar, S., Jain, I., & Gawade, A. (2016). Parallel quick sort using thread pool pattern. International Journal of Computer Applications, 136(7), 36-41.
[14] Marszałek, Z. (2017). Parallelization of modified merge sort algorithm. Symmetry, 9(9), 176.
[15] Mavrevski, R., Traykov, M., & Trenchev, I. (2019). Interactive Approach to Learning of Sorting Algorithms. International Journal of Online & Biomedical Engineering, 15(8).
[16] Moghaddam, S. S., & Moghaddam, K. S. (2021). On the Performance of Mean-Based Sort for Large Data Sets. IEEE Access, 9, 37418-37430.
[17] Obeya, O., Kahssay, E., Fan, E., & Shun, J. (2019, June). Theoretically-efficient and practical parallel in-place radix sorting. In The 31st ACM Symposium on Parallelism in Algorithms and Architectures (pp. 213-224).
[18] Paira, S., Chandra, S., & Alam, S. S. (2016). Enhanced Merge Sort - a new approach to the merging process. Procedia Computer Science, 93, 982-987.
[19] Priyadarshani, S. C. (2018). Parameterized Complexity of Quick Sort, Heap Sort and K-sort Algorithms with Quasi Binomial Input. International Journal on Future Revolution in Computer Science & Communication Engineering, 4(1), 117-123.
[20] Putri, A. N., & Asmiatun, S. (2018). Augmented Reality as A Display Information Using Quick Sort. IJAIT (International Journal of Applied Information Technology), 2(02), 52-57.
[21] Reshma, P., & Srikanth, P. (2017). An Analogy Between Different Sorting Algorithms with Their Performances. International Journal of Advanced Technology in Engineering and Science, 820.
[22] Shakeel, E., & Pansota, M. (2017). Review on Sorting Algorithms - A Comparative Study. International Journal of Innovative Science and Modern Engineering (IJISME), 5(1), 17-20.
[23] Shastri, S., Mansotra, V., Bhadwal, A. S., Kumari, M., Khajuria, A., & Jasrotia, D. S. (2017). A GUI Based Run-Time Analysis of Sorting Algorithms and their Comparative Study.
[24] Singh, T., & Srivastava, D. K. (2016). Threshold Analysis and Comparison of Sequential and Parallel Divide and Conquer Sorting Algorithms. International Journal of Computer Applications (0975-8887), 20-33.
[25] Stehle, E., & Jacobsen, H. A. (2017, May). A memory bandwidth-efficient hybrid radix sort on GPUs. In Proceedings of the 2017 ACM International Conference on Management of Data (pp. 417-432).
[26] Syed, A., Rehman, S., & Asif, M. (2021). A Class of Estimator for Population Mean Under SRS. Proceedings of the Institution of Mechanical Engineers, Part M: Journal of Engineering for the Maritime Environment, 1, 9-26. doi: 10.52700/scir.v2i1.7
[27] Taiwo, O. E., Christianah, A. O., Oluwatobi, A. N., & Aderonke, K. A. (2020). Comparative study of two divide and conquer sorting algorithms: Quicksort and Mergesort. Procedia Computer Science, 171, 2532-2540.
[28] Turzo, N. A., Sarker, P., Kumar, B., Ghose, J., & Chakraborty, A. Defining a Modified Cycle Sort Algorithm and Parallel Critique with other Sorting Algorithms.
[29] Zhu, Z. G. (2020, May). Analysis and Research of Sorting Algorithm in Data Structure Based on C Language. In Journal of Physics: Conference Series (Vol. 1544, No. 1, p. 012002). IOP Publishing.

160 28 October 2021, Jakarta - Indonesia


2021 1st International Conference on Computer Science and Artificial Intelligence (ICCSAI)

Factors that Affect Data Gathered Using Interviews for Requirements Gathering

Edward Rezzky Hendra Russell Otniel Tjakra Alexander A S Gunawan
Computer Science Department, School of Computer Science
Bina Nusantara University, Jakarta, Indonesia 11480
[email protected] [email protected] [email protected] [email protected]

Abstract— Building software does not rely heavily on the technical side, i.e., coding. There are initial steps that need to be done before constructing the software, namely requirements gathering. Requirements gathering is frequently called requirements elicitation because good requirements cannot simply be collected from the stakeholders but must be discovered. Therefore, gathering requirements should be done using interviews. While it seems that interviews can be done easily, the interview process can massively impact the result of the software by discovering hidden requirements. This study would like to find out the factors that can affect the data gathered using the interview method by conducting a systematic literature review (SLR). Based on our study, we can conclude that the main factor in the success of an interview relies on the interviewer or the analyst. Since interviewing is a social act, their communication skills play a big role. Beyond that, the technical standpoint of interviewing is also considered: choosing the right structure for the interview and having basic knowledge of the problem at hand.

Keywords— systematic literature review, interview, requirements elicitation, effective communication

I. INTRODUCTION

Building software does not rely heavily on the technical side, i.e. coding. There are initial steps that need to be done before constructing the software itself. Some might say that these initial steps are crucial to the result of the software in the end. One of the important steps is the requirements gathering phase. In hindsight, requirement collection and analysis involve gathering data and analyzing it. Ejaz et al. stated that requirements gathering is the first step in a software development life cycle and that it plays a vital role in the success of the software later on [1]. Requirements gathering usually involves meetings with clients, staff, and the people who are going to interact with and need the software. This stage involves asking people what kinds of problems they have experienced, resulting in the need to build software. There are many techniques for gathering information, and they are called fact-finding techniques. Fact-finding techniques include research, questionnaires, interviews, etc. [2]. These techniques later become the basis for the quality and description of the study results. It is usually the duty of the analyst to gather data from the stakeholder [3]. Since gathering requirements is important, the use of appropriate techniques and/or tools is required [4].

Requirements gathering may seem simple: ask the stakeholders what the objectives for the software are. However, the reality is much more complicated than it seems. Requirements gathering is also called requirements elicitation because good requirements cannot simply be collected from the stakeholders but must be discovered. There are at least three main challenges [5] in requirements gathering: (i) the problem of knowledge: the stakeholders do not really know what is needed and do not understand the capabilities and limitations of the software; (ii) the problem of scope: the boundary of the software is not well defined because the stakeholders do not clarify the software objectives and their technical detail; (iii) the problem of change: the stakeholders change the requirements because of several factors, such as a change in user needs or a conflict of interest.

This study would like to discuss how requirement gathering methods, especially interviews, can critically capture the important facts needed to build software. The interview method is generally used to gather facts about the data and can discover hidden requirements from the stakeholders. Developers can learn about the terminology, problems, benefits, constraints, requirements, and priorities of the desired organization and system. In this case, the focus that needs to be considered is the factors that can affect the result of interviewing the stakeholders, who are the people who will work with the analyst to make the software. Moving forward, this paper is divided into four further chapters: the methodology of our study in chapter 2, followed by the results of the systematic literature review (SLR) in chapter 3. Discussions about the results that the authors gathered are in chapter 4, and this paper closes with a conclusion in chapter 5.

II. METHODOLOGY

The methodology used for this paper is a systematic literature review (SLR). This involves formulating research questions that will be answered in the following chapters and reviewing collected data found in journals, theses, research papers, articles, etc. The study was made based on references to several other research papers on the same concept. The results of the referenced papers affect the answers to each research question. Therefore, the quality of this study is highly dependent on the referenced papers, which ultimately has an impact on the quality of the results of this paper. The authors completed the following steps to find the list of papers, searching for relevant papers using the Google

978-1-6654-4002-8/21/$31.00 ©2021 IEEE


Scholar search engine with the search strings “Requirement Elicitation AND Interview”, “Requirement Elicitation AND Effective Communication”, “Requirement Gathering AND Interview”, and “Requirement Gathering AND Effective Communication”. From these searches, the authors gathered a total of 29 papers, with duplicates removed. Table I below lists the number of papers used from each available electronic database.

TABLE I. LIST OF PUBLICATIONS THAT ARE USED

Electronic Database | Number of papers used
Academia.edu | 1
IEEE | 7
Science Direct | 3
Springer | 5
Research Gate | 1
SSRN | 1
PLOS | 1
Taylor & Francis | 1
Wiley Online Library | 5
Emerald Insight | 1
Good Fellow Publishers | 1
IJCSE | 1
IET | 1
Total | 29

The main reason this systematic literature review was conducted is to find out the factors that can affect the data gathered using the interview technique. To find the answers, we formulated a series of research questions:

• RQ1: What are the factors that affect the requirement gathering stage?
• RQ2: Is the interview method the most commonly used method to gather requirements?
• RQ3: What are the factors that affect the interview method during requirement gathering?

The main filter criteria in our study are research papers written in English that are included in conference proceedings, journals, and literature reviews. The selected papers were published between 2012 and 2019. The findings of our study are discussed in the results and discussions chapters.

III. RESULTS

This chapter presents the results of the systematic literature review (SLR) that the authors conducted to answer the research questions.

A. RQ1: What are the factors that affect the requirement gathering stage?

The requirements process is very important in the stage of collecting necessary data. This is useful for the development of the system that will be used. Requirement gathering also helps to analyze and turn raw data into data that forms the basic information of a system. Before knowing which factors influence collecting data, the purpose of data gathering should be known first. Knowing the purposes of the data collected and preparing questions using qualitative and quantitative methods are important; this is an indicator to determine the initial purpose of data gathering [6]. When selecting techniques and methods, the effectiveness of those techniques is very important and can affect the requirements for creating a system or project. Many people choose the wrong technique, resulting in data that is collected inefficiently and is not relevant [7].

The technique that will be discussed is the interview technique. Firstly, some factors that need to be known when using this technique are how the interviewer can give attention and explicit body gestures to the stakeholder. Secondly, interaction between the interviewer or analyst and the stakeholder is important to get relevant answers; the information given depends on the quality of the questions asked by the interviewer. Thirdly, the interviewer must master exactly what will be asked, since the quality of the answers from the stakeholders greatly affects the data gathered; attitudes, beliefs, and opinions can then be expressed well [8]. As stated above, one of the influencing factors is the question: the interviewer must know the exact question and the reason why it was formulated. The length of time for collecting data must also be determined, and the sources of answers when collecting data must be feasible and follow the data needs [9].

Other factors in data gathering can be identified as five techniques, which are delegated in some cases. The first is to filter the observational data into suitable variables to be asked about. The second is to collect data records and turn them into more structured and analyzable data. The third is to make the dependent variable observable. The fourth is to visualize structured data so that it can be manipulated into data variants. The fifth is being able to control the variance of the required data and go through the data selection process until the data is used to fulfill the data collection [10].

This is not an easy process. There must be a dependency between the two parties, namely the analyst and the stakeholders. Analysts do not only ask for the stakeholders’ requirements; there is further involvement to collect good data to solve the problems [11]. According to Kiran and Ali, during this process the analyst should focus on understanding the stakeholders’ requirements, the vision, and the constraints of the system that is going to be developed [12]. To collect data, an analyst is needed who can measure the characteristics of the project being undertaken. Analysts must be able to match the methods and techniques used to obtain the right data. Not all methods and techniques can be used to collect data; each method has its advantages and disadvantages. Some think that combining several methods makes collecting data easier [13].
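The paper-selection procedure described in the methodology above (combine the hits from the four search strings, remove duplicates, then keep only English-language papers published between 2012 and 2019) can be made concrete with a short script. This is an illustrative sketch only, not the authors' actual tooling; the record fields and the sample entries are assumptions made for the example.

```python
# Illustrative sketch of the SLR selection step: deduplicate the combined
# hits from all search strings, then keep only English-language papers
# published between 2012 and 2019. The sample records are hypothetical.
records = [
    {"title": "Interview-based elicitation study", "year": 2017, "language": "English"},
    {"title": "Interview-based elicitation study", "year": 2017, "language": "English"},  # duplicate hit
    {"title": "Early elicitation survey", "year": 2008, "language": "English"},  # outside year range
    {"title": "Estudio de elicitacion", "year": 2016, "language": "Spanish"},    # not in English
]

def select_papers(results, start=2012, end=2019):
    seen, selected = set(), []
    for paper in results:
        key = paper["title"].lower()
        if key in seen:  # same paper returned by more than one search string
            continue
        seen.add(key)
        if paper["language"] == "English" and start <= paper["year"] <= end:
            selected.append(paper)
    return selected

print(len(select_papers(records)))  # prints 1
```

In the study itself this filtering was done by hand over the 29 collected papers; the sketch only spells out the stated inclusion criteria.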


B. RQ2: Is the interview method the most commonly used method to gather requirements?

The interview method is one of the fact-finding techniques and uses a unique approach to get information from interviewees. There are four stages that need to be fulfilled by the interviewer to conduct a good interview: identifying sources and collecting data, identifying the needs of the company or organization owners, identifying the required information, and identifying the project relationships and required product results. The data gathered with interviews can immensely improve the project quality [14].

The unique method that interviews use is face-to-face interaction at a synchronous time and place, which makes it superior and more effective for gathering data than other fact-finding techniques [15]. Even though a novice analyst once said that an interview is the easiest fact-finding technique, interviewers need to ask meaningful questions that are relevant to the data needed, without ambiguity or vagueness [16]. Another thing that is needed for a successful interview is the interviewer itself.

Spoletini et al. performed a study using two groups of students from KSU and UTS. The study consisted of two phases: participants performed role-play requirement elicitation interviews and, after performing the interviews, they were required to review the results. The first group consisted of 30 third-year and fourth-year undergraduate students from KSU who belonged to the User-Centered Design course. These students were given a two-hour lecture by the authors on requirements elicitation. The second group consisted of 12 Master of Information Technology students from UTS. Most of these students were in their first year and belonged to the Enterprise Business Requirements course. The authors prepared the students with an introductory lecture on requirements elicitation that included how to run interviews. From the experiment, they concluded that interviewers are crucial to the success of the interview: a lack of communication creates many elicitation problems, which factors into the failure of the data gathered [17]. Therefore, interviewers must learn how to prepare what is needed to lead a conversation. This includes time management, since poor time management is a common mistake by interviewers that causes a lack of sufficient, relevant data [18]. The interviewer also needs to know the mood of the interviewee in order to gather correct and relevant answers.

C. RQ3: What are the factors that affect the interview method during requirement gathering?

The objective of this research question is to find out the factors that are crucial to the success of using the interview method to gather requirements. To answer this research question, the authors reviewed the related literature to determine the main factors that system analysts can look at if they decide to interview their clients to gather information. Table II below lists the papers that the authors discovered which imply the factors that are in play when interviewing stakeholders.

TABLE II. FACTORS THAT CAN AFFECT THE INTERVIEW METHOD RESULTS

Title: The role of domain knowledge in requirements elicitation via interviews: an exploratory study
Authors: I. Hadar, P. Soffer, K. Kenzi
Factor: The elicitation process is a communicative and social activity, thus requiring the ability to communicate effectively.

Title: Information Gathering Methods and Tools: A Comparative Study
Authors: O. Emoghene and O. F. Nonyelum
Factor: The way the analyst structures the interview can massively impact the data gathered.

The papers in the table above are the basis of the answer to this research question. These factors can be categorized into two types: social factors and technical factors. The first factor concerns effective communication, which is hard to obtain and can significantly impact the outcome of the results gathered [19][20]. The second factor is the interview itself and how it is structured; there are several structures for an interview, and the analyst should be able to pick the correct one [4].

IV. DISCUSSIONS

A. Strategies for Collecting Data

Collecting data requires a good strategy for the benefit of analyzing the data. Four strategies can be used. The first is to build a search strategy and conduct a search for relevant studies: before collecting data, stakeholders should have developed a strategy that will be used to search for data, and they can also read case studies, papers, or articles that are relevant to the project to be completed. The second is to carry out the study selection process; a related case study about the project is valuable knowledge that facilitates data retrieval. The third is to assess the quality of the studies: stakeholders can sort out which studies are of guaranteed, good quality. The fourth is data extraction and analysis: after the data has been collected, the last step is to analyze the data into a mature form that can be used for the project's purposes [3].

Requirements engineering is divided into five steps: requirements elicitation, requirements analysis, requirements documentation, requirements validation, and requirements management. These five steps must be carried out correctly, following the chosen technique and method. Requirements elicitation means using elicitation techniques through conversation to obtain the information being explored. The next step is requirements analysis, to determine which data is needed and which data has no important effect; this process is also often called data sorting. Then there is documentation from several articles, papers, journals, and the like, followed by re-validation of the extracted data and data management to obtain structured and neat data [5].


B. Effective Method to Gather Data

The most significant technique in data gathering is the structured interview. Many techniques out there are taken from several journals and papers but are less effective. Experience in collecting data is also part of a person's identity as a relevant qualification. So, it can be concluded that the selection of interview techniques is the most effective because it can obtain results of analysis and a lot of information [21]. From the customer side, who is the target for collecting data, it must be rational to provide good answers with an understanding of stakeholder perceptions. Therefore, the decision to collect the right data is a feasible solution. Of course, there are various obstacles in making decisions. These can be overcome with an important knowledge base and, of course, the soft skills possessed by stakeholders and customers [22].

Donati et al. conducted a study on a set of 38 students from a university during their third and fourth years. The students were assigned to be analysts and customers. The analysts would ask customers about a novel computer-intensive system that they are interested in developing, and they were given a week to think about this topic. Before conducting the experiment, the authors asked the students who were assigned to be the analysts questions regarding their experiences and skills as interviewers. Students responded with various answers, from no experience to highly experienced in interviewing. The analysts were also lectured about requirements elicitation interviews. The type of interview used was the unstructured method. The result of the experiment concludes that the interview is in fact the most effective method to gather data from interviewees because the technique of face-to-face interaction can get relevant answers. However, interviews can be hard if the interviewer is a novice or has no communication skills [23]. Therefore, the interviewer needs to learn basic communication skills, time management, and pacing, and make good and meaningful questions. With these skills, interviews can be conducted to retrieve data that is helpful to the project while gaining trust from the stakeholders.

C. Communication Skills

Neetu Kumari S., as cited in Ali et al., stated that the lack of communication skills can lead to elicitation problems which result in failing to achieve the requirements that are stated by the stakeholders [19]. Bano et al. stated that interviews are a communicative activity which could be categorized as intensive, since the analyst has to be involved in face-to-face interaction with people they are working with outside the analyst’s team [24]. The communication skill of the analyst plays an important role in being able to connect with the stakeholders both technically and socially. On the social side, analysts should be able to express their thoughts very clearly to the stakeholders, making them understand the analyst much better and connecting with them socially so they feel comfortable telling the analyst all the information they need for the project that they are trying to build. Having good communication skills can benefit a lot in the requirements gathering phase. One of those benefits is being able to create trust between themselves and the stakeholders. Niazi et al. conducted a systematic literature review and found that researchers agreed that trust is crucial in the result of the requirements elicitation [3]. It is of course the job of the analyst to establish a good relationship with the stakeholders so that they can get information easily and comfortably [19][25]. Coughlan and Macredie, as cited in Hadar et al., stated that requirements elicitation must employ intense communication with the stakeholders to have a good relationship and overcome any differences that may exist [26].

Communication skills cannot be considered only in the act of speaking; the listening role is also a part of it. Bano et al. conducted experiments with university students who role-played as analysts and were given the task of gathering requirements from the stakeholders. Students were assigned into groups of 3-4 members. Each team had to conduct three interviews over 3 weeks with stakeholders. After each interview, each team was required to submit the minutes of their meeting with the stakeholders within 2 days, to see if they had understood the requirements that the stakeholders had given them. Interviews were conducted a week apart so that students could prepare for the interview and learn from the mistakes that they had made in the previous interviews. Feedback was given to the groups by the tutor, who also acted as the stakeholder. The result of the experiment shows that poor listening skills are part of the communicative act as well, hence making listening one of the communication skills. Good listening skills enable an analyst to quickly grasp an understanding of what the stakeholder is talking about. This can lead to an easier time mapping out a concept of what is going to be built. In contrast, having poor listening skills can lead to misinterpretations. The role plays that Bano et al. made gave an insight into how this can happen. The ego of the analyst is considered to be one of the main reasons why analysts have poor listening skills, because they feel overconfident that they have a clear understanding of the problem, while in reality they misinterpret what the stakeholder is sharing with them [24]. Another way to describe misinterpretations during interviews is ambiguities.

According to Ferrari et al., an ambiguity in requirements elicitation means that a customer articulates a unit of information but it is misinterpreted by the analyst [27]. Analysts are not able to master all areas of expertise, and during their careers they might stumble upon working with people from other areas that are out of their knowledge. That will result in ambiguities, which are likely to happen during requirements gathering. Ali and Lai expressed a problem that arises during communication, which is when different terminologies are used to represent similar concepts [20], which leads to ambiguities. Ambiguities can be avoided by asking for further clarification of the terms that the analyst does not understand [27]. Doing this simple thing can avoid ambiguities and thus avoid a high percentage of project failure. Being able to share an understanding between the analyst and the stakeholder is important in avoiding ambiguities as well. This is also one of the features of having communication skills, as mentioned above.


Analysts should be trained to acquire the communication skills that are required for requirements gathering. Garcia et al. made a research paper where they break through the common barriers, arguing that requirement elicitation should be taught in a new way to produce quality analysts [28]. Bano et al. agreed with what Garcia et al. had researched, saying that computer science-related degrees do not take requirement elicitation seriously but instead overlook this course [24]. In Garcia's research paper, they proposed a new way of training university students to elicit requirements, which is by simulating requirements gathering using a game that they built. The aims of the game are to learn and interpret the library's functions and operations, and then to propose better ways to perform the functions and operations of the library. The students played the game individually, and they had to interact with the in-game characters to elicit requirements. Characters present inside the game are students, teachers, library staff, book suppliers, security personnel, cleaning staff, etc. Students must be able to identify stakeholders to gather valid requirements. The authors stated that different characters give out different responses. This idea is to minimize the number of fresh graduates who are unable to gather information effectively once they begin to make projects for stakeholders. Because theories do not always coincide with practice, requirement elicitation training is crucial. The game teaches the players to ask the right questions, listen carefully to the stakeholders, and collect important information [28]. Garcia et al. conducted an experiment where they took several university students to play the game and learn how to gather requirements, and several others without playing the game. The students who played the game declared that the exercises inside the game helped them correctly understand the requirements that the stakeholder wants during the interview session. On the other hand, the students who did not play the game, who took the traditional in-class learning approach, stated that they could not gather requirements as well as the students who played the game, as they felt that they had not correctly learned how to conduct an interview [28].

D. The Interview Structure

An interview can be either a formal or an informal act, whether over lunch or in a meeting room inside a big office. Both of these can be viewed as an interview as long as there is an interviewer and an interviewee. From the systematic literature review that was conducted, the authors have found that the structure of an interview, and how the interview is laid out, is an important aspect to look at. An interview can be structured in three ways: unstructured, semi-structured, and structured [4]. A study that Emoghene and Nonyelum conducted shows that interviewing in a semi-structured or unstructured way proved to be a fun and engaging activity [4]. This is because the interview process is considered informal and ‘free’, not interrogating the stakeholders, which could make them feel unsafe or uneasy [24]. Engaging with the stakeholder is also an advantage and can benefit the success of the requirements gathering process highly, since the stakeholder can trust the analyst and they can develop a good relationship throughout the project. From the authors' point of view, this can be one of the main reasons why having the good communication skills described above can benefit requirements gathering highly. Niazi et al. confirmed this statement in the research that they published: their research involves reading through several articles and finding that trust between the client and the vendor is important [3].

Even though Emoghene and Nonyelum pointed out that using semi-structured and unstructured interviews is better than structured interviews, since the latter bores the interviewee [4], other researchers argued otherwise. Davis et al., as cited in Carrizo et al., stated in their research paper that structured interviews appear to be one of the most effective requirements gathering techniques [29]. They supported this statement by stating that careful preparation of interviews can bring much more impact to the result of the data gathered. The tests that they gathered show that open or unstructured interviews produce poorer results than structured interviews. Since structured interviews can result in better data quality, there is a possibility that novice analysts can elicit requirements better than expert analysts. Kato et al., as cited in Mishra et al., expressed that this is possible with the help of modeling several kinds of knowledge, such as project-specific knowledge or how to ask questions in a certain situation during interviews [13]. It is also important to prepare and plan the interview as a team, instead of relying on an individual, since an interview is commonly done face-to-face between the analyst and the stakeholder [24].

Unstructured and semi-structured interviews can be carried out successfully with one factor in mind: experience. The experience of the analyst should be good enough to interview without a script, or with a script as a tool to guide the interview; otherwise, they might elicit requirements incorrectly [3]. But this has to go together with communication skills as well, because Bano et al. conducted an experiment where a student relied on their experience rather than planning the interview, and the results show that the student made more mistakes purely because of their overconfidence.

Jacob and Furgerson, as cited in Bano et al., gave tips on how to carry out an interview, starting with having a script from the beginning to the end of the activity [24]. If analysts believe in their natural communication skills and experience, then they can opt out of this option. If an analyst wants to approach the semi-structured interview, then this script will be useful to guide the interview and can act as a reminder for the analyst to ask important questions. The script can also act as a note where the analyst can write the answers that the stakeholders gave. Jacob and Furgerson stated further that the analyst does not always have to stick to the script, even mentioning that they should improvise once in a while, because an answer that the stakeholder gives can trigger the analyst to ask another relevant question that is not part of the script.

V. CONCLUSION

In conclusion, the main factor relies on the interviewer and their abilities to carry out the interview. The abilities vary from a technical and a social standpoint. Since interviewing is a


communicative act, communication skill plays a big part in ensuring that the data gathered is relevant to the project. Creating a great relationship with the stakeholder is also part of having good communication skills, which can contribute to the project as well.

In future studies, the authors plan to conduct experiments with participants to validate the data that is gathered and possibly learn more important factors along the way.

REFERENCES

[1] A. Ejaz, A. Khalid, S. Ahmed and M. D. A. Cheema, "Effectiveness of Requirements Elicitation Techniques in Software Engineering Process: A Comparative Study Based on Time, Cost, Performance, Usability and Scalability of Various Techniques," BEST: International Journal of Management, Information Technology and Engineering, vol. 4, no. 5, pp. 23-28, 2016.
[2] U. Rafiq, S. S. Bajwa, X. Wang and I. Lunesu, "Requirements Elicitation Techniques Applied in Software Startups," in 2017 43rd Euromicro Conference on Software Engineering and Advanced Applications (SEAA), Vienna, 2017.
[3] M. Niazi, S. Mahmood, M. Alshayeb, M. R. Riaz, K. Faisal, N. Cerpa, S. U. Khan and I. Richardson, "Challenges of project management in global software development: A client-vendor analysis," Information and Software Technology, vol. 80, pp. 1-19, 2016.
[4] O. Emoghene and O. F. Nonyelum, "Information Gathering Methods and Tools: A Comparative Study," The IUP Journal of Information Technology, vol. 13, no. 4, pp. 51-62, 2017.
[5] H. Dar, M. I. Lali, H. Ashraf, M. Ramzan, T. Amjad and B. Shahzad, "A Systematic Study on Software Requirements Elicitation Techniques and its Challenges in Mobile Application Development," IEEE Access, vol. 6, pp. 63859-63867, 2018.
[6] G. A. Stevens, L. Alkema, R. E. Black, J. T. Boerma, G. S. Collins, M. Ezzati, J. T. Grove, D. R. Hogan, M. C. Hogan, R. Horton, J. E. Lawn, A. Marušić, C. D. Mathers and C. J. L. Murray, "Guidelines for Accurate and Transparent Health Estimates Reporting: the GATHER statement," PLOS Medicine, vol. 13, no. 8, 2016.
[7] P. O'Raghallaigh and D. Sammon, "Requirements gathering: the journey," Journal of Decision Systems, vol. 25, no. S1, pp. 302-312, 2016.
[8] I. McLafferty, "Focus group interviews as a data collecting strategy," Journal of Advanced Nursing, vol. 48, no. 2, pp. 187-194, 2014.
[9] J. Rowley, "Conducting research interviews," Management Research Review, vol. 35, no. 3/4, pp. 260-271, 2012.
[10] R. Bloomfield, M. W. Nelson and E. Soltes, "Gathering Data for Archival, Field, Survey, and Experimental Accounting Research," Journal of Accounting Research, vol. 54, no. 2, pp. 341-395, 2016.
[11] S. Tiwari, S. S. Rathore and A. Gupta, "Selecting requirement elicitation techniques for software projects," in 2012 CSI Sixth International Conference on Software Engineering (CONSEG), Indore, 2012.
[12] H. M. Kiran and Z. Ali, "Requirement Elicitation Techniques for Open Source Systems: A Review," International Journal of Advanced Computer Science and Applications, vol. 9, no. 1, pp. 330-334, 2018.
[13] D. Mishra, S. Aydin, A. Mishra and S. Ostrovska, "Knowledge management in requirement elicitation: Situational methods view," Computer Standards & Interfaces, vol. 56, pp. 1-13, 2018.
[14] N. L. Atukorala, C. K. Chang and K. Oyama, "Situation-Oriented Requirements Elicitation," in 2016 IEEE 40th Annual Computer Software and Applications Conference (COMPSAC), Atlanta, 2016.
[15] P. Jakkaew and T. Hongthong, "Requirements elicitation to develop mobile application for elderly," in 2017 International Conference on Digital Arts, Media and Technology (ICDAMT), Chiang Mai, 2017.
[16] E. Adhabi and C. B. Anozie, "Literature review for the type of interview in qualitative research," International Journal of Education, vol. 9, no. 3, pp. 86-97, 2017.
[17] P. Spoletini, A. Ferrari, M. Bano, D. Zowghi and S. Gnesi, "Interview Review: An Empirical Study on Detecting Ambiguities in Requirements Elicitation Interviews," in International Working Conference on Requirements Engineering: Foundation for Software Quality, 2018.
[18] S. Lochrie, R. Curran and K. O'Gorman, Research Methods for Business and Management, Oxford: Goodfellow Publishers, 2015.
[19] Z. Ali, M. Yaseen and S. Ahmed, "Effective communication as critical success factor during requirement elicitation in global software development," International Journal of Computer Science Engineering (IJCSE), vol. 8, no. 3, pp. 108-115, 2019.
[20] N. Ali and R. Lai, "A method of requirements elicitation and analysis for Global Software Development," Software: Evolution and Process, vol. 29, no. 4, pp. 1-27, 2017.
[21] C. Pacheco, I. Garcia and M. Reyes, "Requirements elicitation techniques: a systematic literature review based on the maturity of the techniques," IET Software, vol. 12, no. 4, pp. 365-378, 2018.
[22] N. Mukherjee, A. Zabala, J. Huge, T. O. Nyumba, B. A. Esmail and W. J. Sutherland, "Comparison of techniques for eliciting views and judgements in decision-making," Methods in Ecology and Evolution, vol. 9, no. 1, pp. 54-63, 2018.
[23] B. Donati, A. Ferrari, P. Spoletini and S. Gnesi, "Common Mistakes of Student Analysts in Requirements Elicitation Interviews," Requirements Engineering: Foundation for Software Quality, vol. 10153, 2017.
[24] M. Bano, D. Zowghi, A. Ferrari, P. Spoletini and B. Donati, "Teaching requirements elicitation interviews: an empirical study of learning from mistakes," Requirements Engineering, vol. 24, no. 3, pp. 259-289, 2019.
[25] A. Ferrari, P. Spoletini, B. Donati, D. Zowghi and S. Gnesi, "Interview Review: Detecting Latent Ambiguities to Improve the Requirements Elicitation Process," in 2017 IEEE 25th International Requirements Engineering Conference (RE), Lisbon, 2017.
[26] I. Hadar, P. Soffer and K. Kenzi, "The role of domain knowledge in requirements elicitation via interviews: an exploratory study," Requirements Engineering, vol. 19, no. 2, pp. 143-159, 2014.
[27] A. Ferrari, P. Spoletini and S. Gnesi, "Ambiguity and tacit knowledge in requirements elicitation interviews," Requirements Engineering, vol. 21, no. 3, pp. 333-355, 2016.
[28] I. Garcia, C. Pacheco, A. León and J. A. Calvo-Manzano, "Experiences of using a game for improving learning in software requirements elicitation," Computer Applications in Engineering Education, vol. 27, no. 1, pp. 249-265, 2019.
[29] D. Carrizo, O. Dieste and N. Juristo, "Contextual Attributes Impacting the Effectiveness of Requirements Elicitation Techniques: Mapping Theoretical and Empirical Research," Information and Software Technology, vol. 92, pp. 194-221, 2017.


The Impact of E-Transport Platforms’ Gojek and Grab UI/UX Design to User Preference in Indonesia

Henry Hamilton Prasetya, Bima Bagaskarta Ridwanto, Muhammad Ashraf Rahman, Alexander Agung Santoso Gunawan
Computer Science Department, School of Computer Science, Bina Nusantara University, Jakarta, Indonesia 11480
[email protected]; [email protected]; [email protected]; [email protected]

Abstract—The UI/UX elements of an app are experienced first-hand by the user and are a factor in the user’s engagement with the app. In Indonesia, Grab and Gojek are the two main competitors in the E-hailing application market. The purpose of this paper is to determine whether UI/UX is the main factor in user preference between the two apps and to identify the UI/UX elements that are preferred or avoided by users in both apps, utilizing Shneiderman’s rules for UI elements as a baseline. The paper conducts a comparison to determine usability against this baseline. A survey is conducted by presenting a comparison and asking users about their preference, followed up by a questionnaire using the System Usability Scale (SUS) method, which identifies and scores 10 subjective factors from the overall user experience of each app.

Keywords—UI/UX, E-hailing, Gojek, Grab, User Preference, Mobile, Design.

I. INTRODUCTION

In this era, mobile applications have generated new market opportunities that result in a disruptive impact across a broad range of sectors. One of these is the proliferation of E-hailing applications as the most prominent solution to the problems present in the transportation sector in Indonesia. Online transportation applications are a form of sharing economy platform that enables individuals to share goods and services [1]. In this case, E-hailing is the sharing of vehicles and passengers to reduce vehicle trips, traffic congestion, costs, and emissions [2]. In Indonesia, E-hailing apps also provide an answer to the populace’s concerns regarding the comfort, safety, and ease of use of public transportation.

Gojek and Grab are the two largest E-hailing service companies serving Indonesia. Gojek launched as an application in 2015, while Grab entered Indonesia in 2013 as GrabTaxi, before rebranding as Grab in 2016. Between the two giant players in the industry, there exists a rivalry that creates competition to offer the best service and attract customer interest. As both companies utilize mobile applications as their main method of interaction with the user base, the User Interface (UI) and User Experience (UX) of the application itself represent a major factor in the companies’ ability to retain user trust, attention, and satisfaction [3]. UI is a subset of UX that focuses on the visual elements, while UX encompasses the overall experience of the user while interacting with the application, which itself is an extension of the company’s products and services [4]. The UI/UX design of a mobile application directly influences the branding of the company, and the related aesthetics affect how the app is perceived and recognized by users [5].

Because of this, both Grab and Gojek strive to create a UI/UX design that is distinct, yet at the same time remains functional and practical for the average user. This, combined with their similar evolution to include other services in their applications, has led to each application having a similar UI/UX design with only minor differences. Based on this background, we are interested in researching the impact of the UI/UX that every business gives to users of online transportation or E-hailing, in a study titled “The Impact of E-Transport Platforms’ Gojek and Grab UI/UX Design to User Preference in Indonesia.” This study is based on customers who use online transportation or E-hailing services from Gojek and Grab in Indonesia.

978-1-6654-4002-8/21/$31.00 ©2021 IEEE


II. LITERATURE REVIEW

The overwhelming growth of today’s technology can be seen in how often technology plays a part in our daily lives. One of the fields where technology has assisted us is transportation, where various technologies support our mobility. An application is one of the many technologies used by companies to fulfill people’s needs. The growth of these applications in the transportation sector is driven by people’s past concerns when using public transportation in terms of safety, ease of use, comfort, and price [6]. With that, E-hailing applications appeared to tackle the obstacles of the digital era [7]. In Indonesia, there are currently two popular E-hailing apps, Gojek and Grab. Both applications are based on the mobile platform (iOS and Android), which both users and drivers can access. Recently, Gojek and Grab have transitioned to a more collaborative model, but they remain the two largest competitors in the E-hailing market in Indonesia [8].

The User Interface (UI) and User Experience (UX) are both essential because the user needs them to communicate with and navigate the app [9]. Every company will try to create a UI/UX that can be accepted by a wide user base. A user interface on the mobile platform faces challenges, with limited screen real estate, bandwidth, and the use of touchscreens as an input device. Because of this, every detail in the UI is essential so that people who have never used the app before can understand it intuitively [10]. The combination of good UI and UX design can lead to a happy customer engaging with the app daily, with no hindrance while using it [11].

Every customer that interacts with the application has their own comfort level, and every component in the application can be a decisive factor for customers’ comfort [12]. A customer’s comfort level in using the application is a decisive factor in the application’s usability. As customers can have varying degrees of comfort depending on various factors, UI/UX development must take into account both customers who are comfortable with the application – who will be called ‘expert’ users – and customers who are still learning to navigate the app.

Adding new features is also a way companies can gain user interest in their app. One of the recent features of Gojek and Grab is that they ensure the driver’s health condition and make sure drivers follow health protocols, adjusting to the current situation (the Covid-19 pandemic, as of 25 April 2021). Both apps have an E-Wallet feature that can make transactions easier, with the difference that Gojek uses a proprietary system while Grab uses a third-party system in the form of OVO. This E-Wallet payment system functions the same as cash payment, but the difference is that the money has been digitalized in the E-Wallet system and can be accessed online [13]. There are many other features offered by the two companies that let users fulfill their needs and interests within the app.

As such, the UI/UX of both applications has been optimized and continually updated, both to accommodate new features and to keep users interested. A good UI works to satisfy the user’s sensory and functional needs when operating the application [14] and needs to be designed so that inexperienced users can begin to use it with no problems, while also accommodating ‘expert’ users who want to navigate the app as efficiently as possible [15]. Meanwhile, User Experience is a comprehensive concept describing the subjective experience resulting from interaction with technology; it depends on contextual factors and cultural aspects in the situation of use, and is a dynamic and constantly evolving process over instances of use [16]. This means that these apps reflect the company, as the user interacts with all the services provided by the company through the app. From that, the UX of an app can be called the backbone of the company, while the UI of the app is the first thing users encounter when opening the app and every time they navigate it. An interesting and easy-to-use UI can gain users’ trust and keep engagement high [17].

Usability is the measure of how well a user in a specific context can use a product to achieve a defined goal. A product can be said to have good usability if it can minimize or eliminate factors that cause failure within its use [18]. Therefore, usability in the context of UI/UX is the


overall parameter used to quantify the quality of a user’s interaction with the system. Usability testing is the method used to find the failure points of a product [19]. One method for measuring the usability of UI/UX systems is the System Usability Scale (SUS).

III. METHODOLOGIES

The methodology used for this paper is separated into two parts. First, the UI/UX designs of both apps are evaluated qualitatively by analyzing their similarities and differences. This comparison focuses on the home screens and the interfaces used for ordering the E-hailing service in both applications. The comparison reviews the differences and details present in the UI, and the differences in interactions between both applications when performing the same task, using Shneiderman’s golden rules [20] as the baseline.

Shneiderman’s golden rules are a guideline for mobile interface elements to determine the consistency of elements, ensure the dimensions fit within a mobile device, and assess how the interface accommodates both new users and ‘expert’ users. Gong and Tarasewich also proposed a development of Shneiderman’s rules for UI elements on mobile devices [21]. The indicators of UI principles used are as follows (Table 1).

Table 1. UI Evaluation Indicators

1. Minimize horizontal scrolling: To fit the mobile device width and eliminate confusion caused by two-directional scrolling.
2. Design consistency: Brand aesthetics and design in general consistent across the entire app and with designs on other platforms.
3. User customization: Allowing personalization of the UI, either functionally or aesthetically, to cater to user preference.
4. Explanatory elements: UI elements that consist of easy-to-learn, intuitive, and familiar icons to accelerate the user’s ability to learn.
5. Designing for smaller devices: Auto-input, word selection, and other features that make it easier for the user to provide input on a smaller device.
6. Providing a history list: Allows the user to access the history list.

The second part is usability testing, conducted as an online survey using the System Usability Scale (SUS) tool to provide quantitative supporting data for the research. The System Usability Scale is a measuring method developed by John Brooke in 1986 as a practical and fast method [22]. SUS is a tool that can quickly and easily collect a user’s subjective rating of a product’s usability, and the data in this study indicate that SUS fulfills that need [23]. The SUS consists of 10 statements that identify the overall user experience of the app [24].

Every statement is answered on a five-choice scale that ranges from “Strongly Disagree” with a score of 1 to “Strongly Agree” with a score of 5. In general, it is impossible to specify the usability of a system (i.e., its fitness for purpose) without first defining who the intended users of the system are, the tasks those users will perform with it, and the characteristics of the physical, organizational, and social environment in which it will be used. As such, for optimal results,


participants must already have some experience using both applications. The SUS method is chosen for several benefits: a) results can be analyzed quickly; b) the test can be easily understood; c) it does not take much time from respondents; d) the results fall in a range of 0-100, making them easily understood; and e) it is affordable and does not require additional funds. However, it must be noted that a SUS result from 1-100 is not a percentage, nor is it proportional [25]. SUS consists of two factors, Usable (8 items) and Learnable (2 items, namely items 4 and 10), which means that responses to those two items may depend on the user’s level of experience with the apps [26].

The survey is conducted online using Google Forms as the medium. The 32 respondents take the test using the SUS statement list below; 11 participants are female and 21 are male. 31 participants are within the 19-25 years age group, and 1 participant is below 18 years old.

Table 2. SUS Statements

1. I think that I would like to use this application often.
2. I found this application to be too complex.
3. I thought this application was very easy to use.
4. I think that I would need help from technical support to use this application.
5. I found the various features in this application to be helpful for my daily needs.
6. I thought some of the features were inconsistent.
7. I would imagine that new users would adapt to using this application very quickly.
8. I think new users would find this application to be very complicated.
9. I feel comfortable when using this application.
10. I needed to learn a few things from this application before using it.

Table 3. SUS point

Strongly Disagree = 1, Disagree = 2, Neutral = 3, Agree = 4, Strongly Agree = 5

After getting results from the respondents, we analyze them using SUS scoring, calculated as follows: 1) for every odd-numbered question, the final score is the user score minus one (x - 1 = final score); 2) for every even-numbered question, the final score is 5 minus the user score (5 - x = final score); 3) the total of the final scores is multiplied by 2.5, yielding the SUS score of one respondent. The average over all respondents is obtained by summing all respondents’ SUS scores and dividing by the number of respondents:

x̄ = (Σx) / n

where x̄ is the average final score, Σx is the total of the final scores, and n is the number of respondents.

The SUS scale ranges from 0 to 100 and can be interpreted in several forms: 1) an adjective rating from worst to best; 2) a letter grade from F to A; and 3) acceptability ranges from not acceptable to acceptable. This score becomes the weighting factor in a product’s evaluation process to determine the success or failure of its usability.

Figure 1. SUS score
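The scoring steps described above can be sketched in code. The following is a minimal Python illustration; the respondent answers shown are hypothetical and not drawn from the survey:

```python
def sus_score(answers):
    """Compute the SUS score for one respondent.

    answers: list of 10 Likert ratings (1-5), in statement order.
    Odd-numbered statements contribute (x - 1); even-numbered
    statements contribute (5 - x); the total is scaled by 2.5.
    """
    if len(answers) != 10 or not all(1 <= a <= 5 for a in answers):
        raise ValueError("expected 10 ratings between 1 and 5")
    total = sum((x - 1) if i % 2 == 1 else (5 - x)
                for i, x in enumerate(answers, start=1))
    return total * 2.5

def average_sus(all_answers):
    """Average SUS score over all respondents (x-bar = sum / n)."""
    scores = [sus_score(a) for a in all_answers]
    return sum(scores) / len(scores)

# Hypothetical respondents:
r1 = [5, 1, 5, 1, 5, 1, 5, 1, 5, 1]  # best possible answers
r2 = [4, 2, 4, 2, 4, 2, 4, 2, 4, 2]
print(sus_score(r1), sus_score(r2), average_sus([r1, r2]))  # 100.0 75.0 87.5
```

Note that each respondent’s raw total falls between 0 and 40, so multiplying by 2.5 maps it onto the 0-100 SUS range.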


IV. ANALYSIS AND RESULT

A. FIRST SECTION

The results of comparing Gojek’s and Grab’s UI/UX design are as follows:

Figure 2. Gojek Home Screen (Left), Gojek Other Services (Right)

The home screen of Gojek consists of (from top to bottom) your GoPay balance, order history, Gojek promotions, and Gojek services. Initially, only 4 services are available on the home screen, but if the user does a swipe-up gesture, it brings up the other available Gojek services, as seen in Figure 2. The 4 services shown on the home screen are the user’s favorite services, which can be edited by the users themselves.

Figure 3. Grab Home Screen (Left), Grab Other Services (Right)

The home screen of Grab consists of (from top to bottom) your OVO balance and points, Grab services, Grab recommendations, and menus. Initially, 7 services show up on the home screen of Grab; tapping “More” reveals the other services available in Grab. These 7 core services always appear on the home screen, meaning that users cannot customize them.

Figure 4. GoRide/GoCar Menu

The first thing the app asks you to do when you choose either GoRide or GoCar is to designate your pickup location and destination. For the pickup location, you can either leave it as “Your Current Location” or specify your location to make it easy for the driver to find you. For the destination, you can input the address of your destination. You can also choose your payment method, booking time, and type of service with the relevant indicators. The cost of the ride is displayed both in the ride options themselves and at the bottom of the screen.


Figure 5. GrabBike/GrabCar Menu

Grab’s UI for ordering rides is very similar to Gojek’s, with only minor differences in coloration, icon placement, and shapes. The most major difference is that the price for a chosen ride type is shown only once rather than twice, while the booking option is replaced by Grab’s ‘GrabNow’ service.

Using these UI elements of both applications, we can evaluate several indicators from Shneiderman’s rules:

1. Minimize horizontal scrolling

Gojek and Grab differ greatly in this aspect: Grab applies only one-directional scrolling on the home screen, while Gojek can be scrolled vertically to explore the home page and horizontally to open other pages.

Due to the small screen of a mobile device, two-directional scrolling may cause confusion or put too much information on the screen. However, Gojek largely avoids this problem by applying clear indicators of page changes when scrolling horizontally, with the presence of a transition and UI indicators. The sensitivity of the horizontal scroll is also much lower than that of the vertical scroll, meaning that the user needs to deliberately scroll sideways with their finger. This can still cause problems, however, due to the presence of horizontally scrolling banners within the home page itself, meaning that there are two instances of side-scrolling elements within the home page.

In contrast, Grab opted to simply use buttons to transition between pages. Horizontal scrolling is limited to banners.

2. Design Consistency

Both Gojek and Grab use green as their primary color; however, there is a major difference in the use of other colors to represent the apps’ features.

As seen in Figure 2, Gojek utilizes green to represent the core services, which are its transport and logistics services (GoRide, GoCar, GoSend, etc.). Red is used for its food delivery, medical, and shopping services (GoFood, GoShop, GoMed, etc.). Blue represents Gojek’s payment services, with GoPay, Gojek’s proprietary E-wallet service, using it as its main color. Finally, pink represents Gojek’s news and entertainment services (GoTix, GoPlay, GoNews, and GoGames).

In conclusion, Gojek relies on segregation between different windows in the application to prevent any confusion regarding the different colors. This is aided by clear and visible transition states, links to the different services hinted at by icon coloring and shape, and smooth transitions.

Figure 6. GoPay’s interface utilizing blue as its primary color


Figure 7. GoPay’s and Gojek’s color schemes

Compared to Gojek, Grab utilizes only green as the app’s primary color for all its services. This extends to its E-wallet service, OVO. OVO’s primary color is purple in its own application; however, it is not used in Grab to a noticeable degree, visible only in small icons or fonts to direct the user’s attention to their E-wallet balance.

Figure 8. Grab’s interface when accessing the OVO E-wallet within the app

Figure 9. Grab’s and OVO’s color schemes

3. User Customization

Gojek provides in-app customization for its users in the form of customizing the 4 primary services visible when first entering the application, as seen in Figure 2. For user-based customization, Gojek allows the user to change their profile picture that is shown on the profile page for identification.

Grab does not offer as wide user customizability, as the seven services that are first visible are set by Grab. Grab also does not allow the user to set a profile picture. Account-related customization, such as notifications, user blocking, and privacy, is similar in both applications.

4. Explanatory Elements

Grab and Gojek differ in the types of icons used. Gojek utilizes an entirely 2D flat design, while Grab uses an isometric design for its icons. Both apps use self-explanatory elements for their core services; however, some services can be ambiguous to the first-time user. For this, Gojek and Grab both utilize brief explanation texts to inform the user what each service does when they open the menu to select other services.

5. Designing for smaller devices

To accommodate smaller devices that may face difficulty providing precise inputs, both applications utilize large buttons with a clear layout for their selections, as seen in Figures 2 and 3. Both applications also provide an auto-input feature in their search bars, offering preset inputs based on trending search terms and the user’s search history. This addresses the issue faced by smaller devices when doing manual text input, allowing the user to tap a button to browse frequent search terms without typing them.


Figure 10. Gojek’s Auto input feature (Left) and Grab’s Auto input feature (Right)

6. Providing a history list

Both applications provide a history feature to let users see previous transactions, but it is presented differently in each. Gojek uses a separate history list for each of its services, which can also be accessed by changing the filters of the history list. Grab presents its history as one large list split between the many services, without the ability to filter the list.

Figure 11. Gojek’s history list

Figure 12. Grab’s history list

B. SECOND SECTION

In this section we explore the data taken from the survey. Of the 32 respondents, all between the ages of 18 and 25, 21 are male and 11 are female. Every respondent has experience with both applications.

The gap between the respondents’ survey results is tight, but it favors Gojek over Grab, with Gojek scoring 71.09 and Grab 69.60 on the SUS.

Table 4. Gojek SUS Score

Respondent   Total Score   Total SUS Score
1            32            80
2            28            70
3            36            90
4            25            62.5
5            26            65
6            32            80
7            20            50
8            25            62.5
9            30            75
10           29            72.5
11           29            72.5
12           25            62.5
13           36            90
14           33            82.5


15           31            77.5
16           26            65
17           32            80
18           27            67.5
19           29            72.5
20           30            75
21           37            92.5
22           26            65
23           28            70
24           28            70
25           31            77.5
26           26            65
27           22            55
28           21            52.5
29           24            60
30           26            65
31           26            65
32           34            85
Total Average               71.09375

Table 5. Grab SUS Score

Respondent   Total Score   Total SUS Score
1            27            67.5
2            33            82.5
3            24            60
4            32            80
5            28            70
6            26            65
7            33            82.5
8            27            67.5
9            32            80
10           30            75
11           29            72.5
12           25            62.5
13           31            77.5
14           28            70
15           26            65
16           32            80
17           21            52.5
18           24            60
19           35            87.5
20           29            72.5
21           31            77.5
22           28            70
23           29            72.5
24           28            70
25           31            77.5
26           20            50
27           27            67.5
28           20            50
29           20            50
30           27            67.5
31           24            60
32           34            85
Total Average               69.609375

Comparing the results side by side, we can see that the gap between Gojek and Grab is relatively close.

Figure 13. SUS score comparison between Gojek (left) and Grab (right).

Based on the survey results for both applications, Gojek received a score of 71.0 with a “Good” rating, or a “B” letter grade. Grab achieved an average score of 69.6, also a “Good” rating, or a “B” letter grade.

This means that both applications are likely to be used by a lot of people, and there is not much difference in preference in terms of score; one application merely has a slightly higher SUS score than the other.

A more detailed look yields respondents who give a larger score to Gojek than Grab, and vice versa. However, some give nearly the same values for both applications. From the results, we can conclude that there are three different types of users: users who are experts with Grab’s UI, experts with Gojek’s UI, and experts in using both applications.

Based on statement number 1 (I think that I would like to use this application often), Gojek managed a greater score than Grab, indicating that if respondents were forced to pick one or the other, Gojek would have the edge. However, due to other

175 28 October 2021, Jakarta - Indonesia


2021 1st International Conference on Computer Science and Artificial Intelligence (ICCSAI)

factors, such as promotions, the difference in e-money services used by the two apps, and service preference (such as when dealing with drivers in different areas), the difference is negligible, as the two apps' SUS scores differ by only 1.49 points.

Based on statement number 2 (I found this application to be too complex.), Gojek had a greater score than Grab. This suggests that Gojek as an app is complex, and most users might not be able to fully utilize all its features. However, some users still prefer Gojek. This is because Gojek offers more features that most users do not use on a regular basis; the UI elements that go unused create an impression of complexity and clutter. This is supported by statement number 3 (I thought this application was very easy to use.), where Gojek and Grab differ by only 1 point, suggesting that expert users can understand both applications in a reasonable amount of time, and that users who consider the entire application too complex can still access the desired features quickly and efficiently.

Based on statements 3 to 10, both applications received similar scores on the SUS rating. This indicates that Gojek and Grab have developed a robust UI/UX. This is shown by statements 3 and 4, where participants for both apps responded that they do not experience any notable difficulties when using the app in daily activities, and that they do not require any assistance from other sources when using the app. These results indicate that respondents within the age groups of below 18 years old and 19-25 years old can understand the intention of the UI/UX used in both applications. The features offered by both applications are also consistent and are a boon in users' day-to-day activities by providing the services they need, as shown by the questions about the apps' features in statements 5 and 6, where both apps received good scores. However, it should be noted that on statement 6, Grab scored slightly below average. This indicates that Grab's UI/UX design contains some inconsistencies that users experience after a certain period.

Statements 7-10 are dedicated to the user's ability and willingness to learn the applications' UI/UX, how quickly they can do it, and whether they feel comfortable doing it. This indicates the overall intuitiveness of the apps' interface to a new user. Both apps scored well, which suggests that the UI/UX design of both apps is easily understood by new users.

Of note, statement 10 also stands out as the question that scores the lowest out of all statements on both apps. This means that the UI/UX design for both apps is intuitive, and the average user within the age groups of below 18 years old and 19-25 years old can learn how to use the app quickly.

V. CONCLUSION

Gojek and Grab are the most used e-transport applications in Indonesia, and both have a robust UI/UX design, resulting in support from their users.

From our comparison utilizing Shneiderman's golden rules as a baseline, Gojek can be considered a more complex application for a first-time user. This is primarily due to the usage of several different colors and the presence of both horizontal and vertical scrolling. However, it utilizes techniques such as clear transition states, separation of pages, and highlights indicating the application's current state to inform a new user, while also accelerating the user's ability to learn the application and become accustomed to it, hence becoming an 'expert' user.

Grab, on the other hand, has a simpler UI/UX experience that is easier to understand for a first-time user, or for a user with accessibility issues such as colorblindness. Both applications employ design principles that satisfy the indicators in Shneiderman's golden rules to create an effective UI/UX design.

From our survey, all 32 respondents like using both apps. The minor difference in score reflects the unique UI design of each app, resulting in some users who are experts in one app or the other, and some users who are experts in using both apps. The overall score still favors Gojek slightly, but the difference is negligible when considering other factors, such as promotions, the e-money service used by each app, and service and driver preference.

We can conclude that the UI/UX designs of the two applications are only a minor factor when considering which of the two is more preferred by the Indonesian user base.


Based on our analysis, we suggest future research to determine why Gojek is preferred by Indonesians despite being more complex, by looking into minor UI elements such as transition states, tap and scrolling sensitivity, and the segregation of Gojek's features through different color schemes, icons, and descriptions. Future research may also take accessibility factors such as colorblindness into account for either or both applications.

REFERENCES
[1] Zervas, G., Proserpio, D. and Byers, J., 2017. The Rise of the Sharing Economy: Estimating the Impact of Airbnb on the Hotel Industry. Journal of Marketing Research, 54(5), pp. 687-705.
[2] Teo, Boon-Chui, Mustaffa, M. Azimulfadli, Rozi, A. I. Mohd, 2018. To Grab or Not to Grab?: Passenger Ride Intention Towards E-Hailing Services. Malaysian Journal of Consumer and Family Economics, Vol. 21.
[3] Fruhling, A. and Lee, S., 2006. The influence of user interface usability on rural consumers' trust of e-health services. International Journal of Electronic Healthcare, 2(4), p. 305.
[4] Fruhling, A. and Lee, S., 2006. The influence of user interface usability on rural consumers' trust of e-health services. International Journal of Electronic Healthcare, 2(4), p. 305.
[5] Et.al, J., 2021. A Study on Cognitive Affordance Analysis and BX Design of Flight Reservation Application UI. Turkish Journal of Computer and Mathematics Education (TURCOMAT), 12(6), pp. 773-779.
[6] "Factors Influencing Passengers' use of E-Hailing Services in Malaysia", International Journal of Engineering and Advanced Technology, vol. 9, no. 3, pp. 2711-2714, 2020. Available: 10.35940/ijeat.c6040.029320.
[7] L. Wulantika and S. Zein, "E-Wallet Effects on Community Behavior", IOP Conference Series: Materials Science and Engineering, vol. 879, p. 012121, 2020. Available: 10.1088/1757-899x/879/1/012121.
[8] D. E. Kurniawati and R. Z. Khoirina, "Online-Based Transportation Business Competition Model of Gojek and Grab," Proceedings of the 1st Borobudur International Symposium on Humanities, Economics and Social Sciences (BIS-HESS 2019), 2020.
[9] H. Joo, "A Study on the development of experts according to UI / UX understanding", KOREA SCIENCE & ART FORUM, vol. 31, pp. 401-411, 2017. Available: 10.17548/ksaf.2017.12.30.401.
[10] O. Supriadi, "User Interface Design of Mobile-based Commerce", IOP Conference Series: Materials Science and Engineering, vol. 662, p. 022047, 2019. Available: 10.1088/1757-899x/662/2/022047.
[11] T. Tullis and W. Albert, Measuring the User Experience, 1st ed. Burlington: Kaufmann, 2008.
[12] N. McNamara and J. Kirakowski, "Functionality, usability, and user experience: Three areas of concern," Interactions, vol. 13, no. 6, pp. 26-28, 2006, doi: 10.1145/1167948.1167972.
[13] N. Anwar, R. Rasjidin, D. Najoan, C. Rolando, Tamimmanar and H. Warnars, "E-payment for Jakarta Smart Public Transportation, Using the Point System for E-Commerce", Journal of Physics: Conference Series, vol. 1477, p. 022035, 2020. Available: 10.1088/1742-6596/1477/2/022035.
[14] V. Venkatesh, V. Ramesh, and A. P. Massey, Understanding usability in mobile commerce. Communications of the ACM, 2003.
[15] M. Padilla, Strike a balance: Users' expertise on interface design. The Rational Edge, 2003.
[16] K. Rodden, H. Hilary, and Xin Fu, "Measuring the User Experience on a Large Scale: User-Centered Metrics for Web Applications", in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 2010.
[17] N. Yashmi et al., "The Effect of Interface on User Trust; User Behavior in E-Commerce Products", Proceedings of the Design Society: DESIGN Conference, vol. 1, pp. 1589-1596, 2020. Available: 10.1017/dsd.2020.103.
[18] M. Matera, F. Rizzo, and G. T. Carughi, "Web usability: Principles and evaluation methods," Web Engineering, pp. 143-180.
[19] J. Nielsen, Usability inspection methods. Geneva: IFIP, Internat. Federation for Information Processing, 1995.
[20] B. Shneiderman, C. Plaisant, M. Cohen, S. Jacobs, and N. Elmqvist, Designing the User Interface: Strategies for Effective Human-Computer Interaction. Boston: Pearson, 2018.
[21] J. Gong and P. Tarasewich, "Guidelines for Handheld Mobile Device Interface Design", Proceedings of the 2004 Decision Sciences Institute Annual Meeting, Boston, Massachusetts, USA, pp. 3751-3756.
[22] J. Brooke, "SUS: A 'quick and dirty' usability scale," Usability Evaluation in Industry, vol. 189, no. 194, pp. 4-7, Jun. 1996.
[23] A. Bangor, P. T. Kortum, and J. T. Miller, "An empirical evaluation of the System Usability Scale," International Journal of Human-Computer Interaction, vol. 24, no. 6, pp. 574-594, 2008.
[24] J. Brooke, "SUS: A 'Quick and Dirty' Usability Scale," Usability Evaluation in Industry, November 1995, pp. 207-212, doi: 10.1201/9781498710411-35.
[25] A. Bangor, T. Staff, P. Kortum, J. Miller, and T. Staff, "Determining what individual SUS scores mean: adding an adjective rating scale," Journal of Usability Studies, vol. 4, no. 3, pp. 114-123, 2009.
[26] J. R. Lewis and J. Sauro, "The Factor Structure of the System Usability Scale," Human Centered Design, vol. 5619, pp. 94-103, 2009.


Compare the Path Finding Algorithms that are Applied for Route Searching in Maps
1st Wendy Susanto
dept. of Computer Science, Binus University, Tangerang, Indonesia
[email protected]

2nd Samuel Dennis
dept. of Computer Science, Binus University, Jakarta, Indonesia
[email protected]

3rd M Brian Aqacha Handoko
dept. of Computer Science, Binus University, Jakarta, Indonesia
[email protected]

4th Kristien Margi Suryaningrum
dept. of Computer Science, Binus University, Jakarta, Indonesia
[email protected]

Abstract— The purpose of this research is to find the most optimal algorithm for finding the shortest path in route searching on maps. This research compares pathfinding algorithms, namely Dijkstra, BFS, and A*, in terms of time, path, and distance found, by considering real-life variables such as distance, weather, average speed, obstructions, and road width that are encountered by every road user, to increase accuracy. By following the characteristics and procedure of each algorithm, we examine thoroughly how each algorithm works by performing calculations and analyzing the advantages and drawbacks of each. The comparison is made by a system designed using the Python programming language. This study concludes that Dijkstra's algorithm is 13.78% faster than the A* algorithm and 32.3% faster than Breadth First Search in terms of time, but the paths found are not always optimal.

Keywords—Time efficiency, Heuristic, Graph, A*, Dijkstra, Breadth First Search

I. INTRODUCTION

Maps are used to determine the shortest route from point A to point B. Within this process, maps use algorithms that present different routes for the user to reach their destination. There are a significant number of pathfinding and search algorithms that are known and studied for determining the shortest path [1].

In this research, the authors focus on three algorithms, Breadth First Search, A* (pronounced A Star), and Dijkstra, while considering real-life traffic variables such as distance, weather, average speed, obstruction, and road width that are encountered by every road user, to increase accuracy. Distance is used to determine the weight of an edge, while the remaining variables are assigned to a heuristic value to determine the cost of a node.

This research is done in order to deepen knowledge of pathfinding algorithms in the process of determining the shortest route to a destination in maps, and to compare the differences between the three algorithms.

This paper discusses the algorithms that maps usually apply to search for the route to a destination. After identifying the algorithm that is implemented in maps, we investigate why that algorithm was applied. Additionally, we compare this algorithm with other existing pathfinding and shortest-path algorithms by examining variables such as distance, weather, average speed, obstruction, and road width.

II. LITERATURE REVIEW

A. Breadth First Search

Breadth First Search is a pathfinding algorithm that visits all adjacent nodes first, each exactly once. Breadth First Search will yield several solutions, from the most optimal to the least optimal [2]. The solution given by Breadth First Search is simpler, and trapping will not occur [3]. Moreover, the solution given by BFS is also optimal and complete [4]. Nevertheless, Breadth First Search needs a large amount of memory, because BFS records every expanded node at each level in order to expand to the next node [5]. Breadth First Search is usually applied when the solution being searched for only concerns the steps from the start to the goal [1].

B. A* Algorithm

A* search is a best-first-search algorithm with a slight modification, and it will yield the best solution if it has an admissible heuristic function [5][6]. The A* algorithm estimates the heuristic h(n) from the start node to the end node, as well as the exact cost g(n) of each path. The cost of each node is calculated by the following formula:

f(n) = g(n) + h(n) [7]

where
f(n) = evaluation cost,
g(n) = cost already incurred from the initial state to state n,
h(n) = estimated cost to arrive at the destination from state n.

A* search selects the unvisited and most promising node. The selection is based on the minimum cost f(n) to expand to that node [8]. The A* algorithm has a faster running time compared to Dijkstra's algorithm [9] and Breadth First Search. This is because Dijkstra's algorithm is a greedy best-first search that conducts a blind search of every node in the graph until it finds the destination node, which is more time-consuming in terms of time complexity compared to the A* algorithm, while Breadth First Search also visits all nodes without information (blind search) [10].

A* search has flexibility [11][12] and adaptability that give A* wide usage, by simply modifying its function

978-1-6654-4002-8/21/$31.00 ©2021 IEEE


or even adding functions according to the goals and situation [5][7][13]. Apart from its flexibility, A* has several advantages:
• The A* algorithm works well in road networks because the cost formula in A* can be modified [14].
• In cases where resources are important and monitored, the wavefront algorithm can be used instead, at the cost of a longer path [15].
• A* can be modified to include relevant variables such as collision risk, water navigation regulations, water flow, and ship maneuverability [16].
• The A* algorithm can also be modified to use fewer resources; by implementing direction-based heuristic values and node-reliability data structures, A* can be made to minimize the number of nodes evaluated, for less resource usage and time [17][18].
• In a large grid, A* can be modified to obtain a smaller time and path [19].

However, A* performs a lot of computation before determining the next node; therefore the use of the A* algorithm should be avoided when applied to robots that receive many sequential tasks, because it requires a lot of time [20].

C. Dijkstra's Algorithm

Dijkstra's algorithm is a greedy algorithm developed by Dutch computer scientist Edsger Dijkstra in 1959 and is used to solve the shortest path problem; it computes the single-source shortest path for a graph with nonnegative edge path costs [21][22]. Dijkstra's algorithm is suitable for searching the shortest path on simple, not complex, nodes [23]. The concepts of Dijkstra's algorithm and A* can both be used to find the same shortest route generated via Google Maps [24]. Both the A* and Dijkstra's algorithms play a big role in determining the distance and route selection implemented in Google Maps [25][26].

In determining the priority node, each algorithm compares the stored weight of each node with the weight of the next node, until the node is finally found [23].

In its implementation, when calculating the shortest path, Dijkstra's algorithm determines the shortest path based on the smallest weight from one point to another, so a graph that represents intersections as nodes and road branchings as vertices is needed [27]. Apart from Dijkstra, there are also the Bellman-Ford and Floyd-Warshall algorithms, which have good performance in terms of time complexity and are able to provide one solution (single-source), and there is also the Genetic Algorithm, which can produce optimal route solutions with different options [28].

III. RESEARCH METHODS

A. System Development Method

The development of the system to compare the BFS, A*, and Dijkstra algorithms is done with the waterfall method, with the following stages:
• Analysis - In this phase, all the data required for the comparison must be identified and analyzed. In this case, that is the map with its variables as the specification to generate the data we want to compare, such as the time, distance, and route, to differentiate the results between algorithms and properly assess the performance of each. Time is calculated to find the maximum and average time of each algorithm execution. Distance is measured as additional information according to the route. The route is generated as a result of how each algorithm traverses to find the shortest path.
• Design - Next, in accordance with all the required data, a system must be designed that can run all the pathfinding algorithms and obtain the desired data.
• Implementation - Break the system apart into smaller units to be done separately. In this stage, the time allocation for each unit must also be estimated to ensure a better workflow.
• Integration and testing - In this stage, the units broken down in the previous stage are integrated together; after integrating the units into a system, the system is tested to make sure it works reliably and all the desired data is successfully obtained. If any issues are found, they are fixed in this stage.
• Delivery - The system is delivered and packaged, ready for use.

In summary, the system is made to initially load a map on which the user can choose the start and end nodes. The user then selects a pathfinding algorithm, be it A*, Dijkstra's, or BFS. The system then runs the algorithm according to the start and end nodes to get the route, distance, and time.

After the system has been built, experiments and testing are done, and data such as the route, distance, and time are obtained to be compared by the system and manually.

B. Algorithm Analysis

To compare pathfinding algorithms, we need a graph on which each algorithm will be tested to find a route from one particular node to another. Then the results of each algorithm are compared. The graph used is a route image from maps which is converted into a graph.

Breadth First Search works by visiting all nodes in a graph. By visiting all existing nodes, it finds all possible paths to a node from a certain point. Because Breadth First Search does not base its traversal on particular conditions, it does not pay attention to the variables that can distinguish the optimization of a route.

For the A* algorithm, the search is carried out based on the exact cost from one node to another, represented by the distance (100 meters = 1 cost). The exact cost is then added to the heuristic value of the target node. The heuristic value represents four variables, namely average speed, road width, weather, and obstruction.

For the A* algorithm, due to the presence of a heuristic value as a requirement for the algorithm's calculation, we


determined and assigned the heuristic values for each variable: the average speed heuristic shown in Table 3.1, the obstruction heuristic shown in Table 3.2, the weather heuristic shown in Table 3.3, and the road width heuristic shown in Table 3.4.

TABLE 3.1 AVERAGE SPEED HEURISTIC

Average Speed
For each route that has been determined, the average speed is assumed to be the same so that the results are compared more consistently, so the heuristic is += 0.

In Table 3.1, we assume the average speed heuristic value is 0 because every kind of vehicle most likely has the same speed in traffic.

TABLE 3.2 OBSTRUCTION HEURISTIC

Obstruction
Flood       Heuristic += ~
Roadblock   Heuristic += ~

In Table 3.2, we assign the heuristic value of every kind of obstruction to ~ (undefined) because a road that has an obstruction is impassable.

TABLE 3.3 WEATHER HEURISTIC

Weather
Rain        Heuristic += 2
Drizzle     Heuristic += 1
Sunny       Heuristic += 0

In Table 3.3, we assign the default heuristic value for weather to be 0; it is incremented as the weather gets worse.

TABLE 3.4 ROAD WIDTH HEURISTIC

Road Width
Alley       Heuristic += 2
Highway     Heuristic += 1
Toll        Heuristic += 0

In Table 3.4, we assign the default heuristic value for road width to be 0; it is incremented as the road gets narrower.

Dijkstra's algorithm works by finding the smallest cost of each node and then calculating the shortest path from one node to a particular node. This is because Dijkstra's algorithm is based only on the cost of the weight from node to node, so the route determination carried out by Dijkstra's algorithm is based only on distance.

C. System Design

In designing the system, the authors use flowcharts, use case diagrams, and sequence diagrams that represent how the system works. It can be simplified as a series of processes or procedures to facilitate the user's understanding of the system being designed.

• System Comparing Flow Chart

Figure 3.1 System Comparing Flow Chart

In Figure 3.1, the designed system first starts by determining the graph, which is a route that has been converted into a graph. It then specifies the starting point and the destination of the route. Next, we determine the search method, between informed search and uninformed search. There are three algorithms to be compared, namely Dijkstra's algorithm, the A* algorithm, and Breadth First Search. Dijkstra's algorithm and A* are included in the informed search category, while Breadth First Search is included in the uninformed search category.

IV. IMPLEMENTATION AND TESTING

In implementing the algorithms the researchers want to test, sample graphs from maps are necessary in order to test and run the system that compares the algorithms; the sample graphs can be seen below in Table 4.1. This chapter describes how the system the researchers have built determines the result, with approximate calculations following each algorithm's own formula, in terms of generating the route using a shortest-path and route-finding algorithm.

TABLE 4.1 SAMPLE GRAPH FOR TESTING

Sample      Region
Sample 1    BINUS Alam Sutera and Living World
Sample 2    Sangiang Regensi and Prabu Kian Santang
Sample 3    HAKA Restaurant and Gajah Mada Jakarta
Sample 4    Ciputra Mall and Central Park
Sample 5    Sangiang Regensi and Kutabumi
Etc.
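The incremental rules in Tables 3.1-3.4 can be sketched as a small lookup, assuming a per-node accumulator that starts at 0 and treats an obstruction as infinite cost (the function and table names below are illustrative, not taken from the authors' code):

```python
import math

# Heuristic increments from Tables 3.1-3.4 (Table 3.1: average
# speed always contributes 0; Table 3.2: flood/roadblock make a
# node impassable, modelled here as infinity).
WEATHER = {"sunny": 0, "drizzle": 1, "rain": 2}      # Table 3.3
ROAD_WIDTH = {"toll": 0, "highway": 1, "alley": 2}   # Table 3.4

def node_heuristic(weather, road_width, obstructed=False):
    """Accumulate a node's heuristic value from its conditions."""
    if obstructed:
        return math.inf          # Table 3.2: impassable road
    h = 0                        # Table 3.1: average speed += 0
    h += WEATHER[weather]        # Table 3.3
    h += ROAD_WIDTH[road_width]  # Table 3.4
    return h
```

A rainy alley thus accumulates a heuristic of 4, while a sunny toll road stays at 0.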


Figure 4.1 Graph with Infinite Heuristic Value

In Figure 4.1, there is a node with flooding/roadblock, which means its heuristic is modified to be impassable. Because BFS and Dijkstra do not take heuristics into account while A* does, the A* algorithm will look for another route, while the other algorithms will go through that flooded/blocked node.

A. System Testing

1) Breadth First Search Algorithm

The system the writers made to compare pathfinding algorithms needs the same graph applied to every algorithm, so the first thing to do is generate a graph.

Figure 4.2 Code Snippet Generate Graph Breadth First Search

In Figure 4.2, while searching the shortest path, Breadth First Search visits all nodes without considering direction (an undirected graph), weight, or heuristic value. So the search that Breadth First Search performs in this system only looks for a way to reach a destination with the fewest nodes visited.

Figure 4.3 Code Snippet Search Algorithm on Breadth First Search

Figure 4.3 shows a code snippet of how the Breadth First Search algorithm searches for a path by visiting all nodes.

2) A* Algorithm

As explained in the previous segment, the map data will first be converted into a graph with nodes and vertices; afterwards, the appropriate weights (distances) and heuristics can be applied. This is coded in Python.

Figure 4.4 Code Snippet of A* Algorithm Node Creation

Figure 4.4 shows a snippet creating the connections between nodes with the format "graph.connect(node1, node2, weight)". As seen in that code, the graph is made identical to the graph in Table 4.1, with the weight here directly converted to metres.

Figure 4.5 Code Snippet of Initial Heuristics

Figure 4.5 shows the assignment of heuristic values to each node. For now, all are set to 1, as no modifiers such as weather, traffic, etc. have been added yet.

Figure 4.6 Code Snippet of New Heuristics Value

Figure 4.6 shows the assignment of the new heuristic values to each node. Now all the variables have been concluded and assigned to the heuristics.

Figure 4.7 Code Snippet that Return Result

Figure 4.7 shows the part where the code returns the node and distance travelled.

Figure 4.8 Code Snippet of Time Declaration

Figure 4.9 Code Snippet of Time Calculation
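Taken together, the steps in Figures 4.4-4.7 amount to building a weighted graph, attaching per-node heuristics, and returning the route and distance travelled. The sketch below mirrors the `graph.connect(node1, node2, weight)` format described above, but the Graph class, node labels, and A* routine are our own illustration, not the authors' code:

```python
import heapq

class Graph:
    def __init__(self):
        self.edges = {}       # node -> [(neighbour, weight in metres)]
        self.heuristics = {}  # node -> heuristic value

    def connect(self, node1, node2, weight):
        # Undirected weighted edge, in the Figure 4.4 format
        self.edges.setdefault(node1, []).append((node2, weight))
        self.edges.setdefault(node2, []).append((node1, weight))

    def a_star(self, start, goal):
        # Expand by minimum f(n) = g(n) + h(n) [7]
        frontier = [(self.heuristics.get(start, 0), 0, start, [start])]
        best_g = {start: 0}
        while frontier:
            f, g, node, path = heapq.heappop(frontier)
            if node == goal:
                return path, g  # route and distance travelled
            for nxt, w in self.edges.get(node, []):
                g2 = g + w
                if g2 < best_g.get(nxt, float("inf")):
                    best_g[nxt] = g2
                    f2 = g2 + self.heuristics.get(nxt, 0)
                    heapq.heappush(frontier, (f2, g2, nxt, path + [nxt]))
        return None, float("inf")
```

On a toy graph with connect("A", "B", 200), connect("B", "C", 300), and connect("A", "C", 600) and zero heuristics, a_star("A", "C") returns the route A-B-C with a distance of 500 metres.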


Figures 4.8 and 4.9 show how the time to run the algorithm is calculated. "start_time" is initialized at the beginning of the code, "end_time" is taken at the end, and "execTime" is the difference between the two, multiplied by 1000 to show it in milliseconds.

3) Dijkstra Algorithm

Similar to how the A* code works, the Dijkstra program gets a graph representation of the map data with the nodes and weights, and then proceeds to get the path, distance, and program runtime. This algorithm is also coded in Python.

Figure 4.10 Code Snippet of Dijkstra Algorithm Node Creation

Figure 4.10 shows the input of the nodes and distances as weights for the algorithm to run through. Similar to the A* code, the distance here is already converted to metres.

Figure 4.11 Code Snippet that Return Result

Figure 4.11 shows how the Dijkstra algorithm is implemented in code.

B. Testing Result

In this section, the results of all algorithm calculations from the researchers' system in Chapter IV (A) are shown. The samples and data for testing the system are taken from the different map samples in Table 4.1; the reserved maps are executed to generate the total sample size at the end of the testing experiment.

Figure 4.12 Main Function

Figure 4.12 shows the main function; as can be seen, in option 2 "Analyze results" the code calls another Python program that takes the raw result data and processes it to get the max and mean of the execution time and distance.

Figure 4.13 Final Analysed Data

In Figure 4.13, all the result data is finally processed into the max and mean of the time and distance of each algorithm, as shown. The total number of samples taken in our experiment is 173.

TABLE 4.2 RUN TIME RESULT

Execution   BFS Time    A* Time     Dijkstra Time
MAX         7.7995ms    3.1152ms    1.9079ms
MEAN        0.8572ms    0.6726ms    0.5799ms

Table 4.2 shows that the most optimal algorithm for the sample cases the researchers took is the Dijkstra algorithm, which generates nodes in the least time compared to the other two algorithms.

TABLE 4.3 DISTANCE RESULT

Execution   BFS Distance   A* Distance   Dijkstra Distance
MAX         6 nodes        3540m         3540m
MEAN        4 nodes        2080m         1704m

Table 4.3 shows that the most optimal algorithm for the sample cases the researchers took is the Dijkstra algorithm, which generates the route with the least path distance compared to the A* algorithm. Breadth First Search distance data cannot be retrieved, since the BFS algorithm itself is an uninformed search algorithm, which only traverses the graph and does not have additional information.

Therefore, the most optimal algorithm in terms of route search and runtime, based on our experimental testing, is Dijkstra's algorithm, which is 13.78% faster than A* and 32.3% faster than Breadth First Search; however, in some cases the results found by the A* algorithm are more optimal than those of the other algorithms.
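The 13.78% and 32.3% figures can be reproduced from the MEAN runtimes in Table 4.2, taking "X% faster" as the relative reduction in mean runtime (our reading of the authors' arithmetic):

```python
# Mean runtimes from Table 4.2, in milliseconds
bfs_ms, a_star_ms, dijkstra_ms = 0.8572, 0.6726, 0.5799

# Relative reduction in mean runtime achieved by Dijkstra
faster_than_a_star = (1 - dijkstra_ms / a_star_ms) * 100
faster_than_bfs = (1 - dijkstra_ms / bfs_ms) * 100

print(f"Dijkstra vs A*:  {faster_than_a_star:.2f}% faster")  # 13.78
print(f"Dijkstra vs BFS: {faster_than_bfs:.2f}% faster")     # 32.35, reported as 32.3
```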


V. CONCLUSION IJISCS (International Journal of Information System and Computer


Science), 3(3), 98-106.
[9] Sharma, S. K., & Pal, B. L. (2015). Shortest path searching for road
Breadth First Search Algorithm works by traversing all network using a* algorithm. International Journal of Computer Science
node one time each, meanwhile A* Algorithm traverse to and Mobile Computing, 4(7), 513-522.
almost all nodes in the map, and Dijkstra Algorithm traverse [10] Goyal, A., Mogha, P., Luthra, R., & Sangwan, N. (2014). PATH
FINDING: A* OR DIJKSTRA’S? International Journal in IT &
all nodes in the map. The most optimal algorithm in terms of time efficiency is Dijkstra's algorithm. We chose Dijkstra's algorithm because it generates nodes in the least time compared with the other two algorithms. Although Dijkstra visits every node, whereas A* may visit the same node many times, Dijkstra finds the optimal path in the least time. BFS visits the same total number of nodes as Dijkstra's algorithm but needs more time to obtain the path, because BFS explores every path to the destination and then chooses the shortest one based on the fewest nodes visited.

The A* algorithm still has advantages in cases where some paths have undefined heuristics (flooded, blocked) or high heuristic values (rain, obstruction, etc.). A* will automatically find another path to the goal with less traffic. Dijkstra's algorithm cannot do this, because it searches only by shortest distance or smallest weight. Thus A* produces a more optimal route when there are obstacles, even though the route may be longer. Therefore, the most optimal algorithm in terms of route search and runtime is Dijkstra's algorithm, which was 13.78% faster than A* and 32.3% faster than Breadth-First Search, although in some cases the routes found by the A* algorithm are more optimal than those of the other algorithms.
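The comparison above can be made concrete with a minimal sketch (not the implementation used in the paper; the grid, the unit edge costs, and all names are illustrative). The same priority-queue search behaves as Dijkstra's algorithm when the heuristic is zero and as A* with a Manhattan-distance heuristic; both return an optimal path cost, but A* typically expands fewer nodes:

```python
import heapq

def neighbors(grid, node):
    """4-connected free neighbors on a grid of 0 = free, 1 = blocked."""
    r, c = node
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] == 0:
            yield (nr, nc)

def search(grid, start, goal, heuristic):
    """Best-first search over (f, g, node) heap entries.
    With heuristic == 0 this is Dijkstra; with Manhattan distance it is A*.
    Returns (optimal path cost, number of nodes expanded)."""
    dist = {start: 0}
    frontier = [(heuristic(start, goal), 0, start)]
    expanded = 0
    while frontier:
        _, g, node = heapq.heappop(frontier)
        if g > dist.get(node, float("inf")):
            continue  # stale queue entry, already improved
        expanded += 1
        if node == goal:
            return g, expanded
        for nxt in neighbors(grid, node):
            ng = g + 1  # unit edge cost
            if ng < dist.get(nxt, float("inf")):
                dist[nxt] = ng
                heapq.heappush(frontier, (ng + heuristic(nxt, goal), ng, nxt))
    return None, expanded

manhattan = lambda a, b: abs(a[0] - b[0]) + abs(a[1] - b[1])
zero = lambda a, b: 0

grid = [[0] * 5 for _ in range(5)]  # open 5x5 grid
start, goal = (0, 0), (0, 4)

cost_dijkstra, expanded_dijkstra = search(grid, start, goal, zero)      # Dijkstra
cost_astar, expanded_astar = search(grid, start, goal, manhattan)       # A*
```

On this open grid both searches return the same optimal path cost of 4, but A* expands fewer nodes because the heuristic steers it toward the goal, while Dijkstra expands in all directions; with obstacles or non-uniform weights the same code illustrates the rerouting behaviour discussed above.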
183 28 October 2021, Jakarta - Indonesia


2021 1st International Conference on Computer Science and Artificial Intelligence (ICCSAI)

A Systematic Literature Review of Fintech Investment and Relationship with Bank in Developed Countries

Almira Rahma Saphyra
Accounting & Information Systems, School of Information Systems
Bina Nusantara University, Jakarta, Indonesia, 11480
[email protected]

Raesita Zahra
Accounting & Information Systems, School of Information Systems
Bina Nusantara University, Jakarta, Indonesia, 11480
[email protected]

Noerlina
Information Systems Department, School of Information Systems
Bina Nusantara University, Jakarta, Indonesia, 11480
[email protected]

Abstract— The financial industry has been evolving for decades, especially in developed countries, where people can adapt to technological change more quickly. As a result, the potential for the growth of fintech companies is increasing, owing to a rise in the number of investors investing in fintech companies. The introduction of new fintech product developments demonstrates this. At the customer interface as well as in back-office operations, this evolution is marked by improved connectivity and information-processing speed, and by collaborating with banks this latest technology can benefit all parties. A large number of innovative financial goods, business finance, finance-related tools, and new forms of communication are all part of digital finance. This article therefore explores the state of fintech capital and investment in developed countries by identifying the research, insights, trends, relationships, impacts, and challenges most relevant to developed countries in terms of fintech investment. Fintech companies and banks, as financial services, also play a significant role in the global financial market, so through collaboration between fintech and banks the financial services business model can become more effective and profitable. However, as this study shows, this partnership brings its own implications and challenges for fintech investment.

Keywords— FinTech; Fintech and Banks; Fintech Investment; Developed Countries.

I. INTRODUCTION

FinTech (financial technology) is a financial innovation that connects the financial system and technology [1]. Although the subject of fintech has received a lot of attention from financial institutions all over the world, there have been few studies on how banks can create a system for fintech integration. In the fintech integration phase, the bank provides guidance to other financial institutions. This analysis covers the entire fintech integration process, from revealing internal department needs to completing the integration [2]. Investment in fintech refers to the use of information or network technology to improve the quality or protection of financial services or of operational processes in investing [3].

The bank is no longer the only hub for all financial services, thanks to FinTech's adoption of new technologies. The TFSB was alerted, and all banks in Taiwan were given official orders to devise strategic plans to change their human resource structures through training. Disruptive innovation, on the one hand, is a blend of "Finance" and "Technology" that can create new companies and job opportunities. As a result, Japan's regional banks have been discussing the disparities in customer reaction between industries. IT investment in the banking industry is critical because it has the potential to impact bank operations and the growth of sophisticated new financial products [4]. Firms may implement new goods, services, and operational processes as a result of technological advancements, gaining a competitive edge and market share. Market reactions differ by industry. IT investments in the banking industry are especially important because they can have a significant impact on bank operations and profitability by enabling the creation of new sophisticated financial products, alternative distribution mechanisms to the conventional branch network, or new, improved technology that can minimize bank costs over time [1].

Fintech such as P2P lending can help a country's economy and finances by providing solutions to financing problems for small and medium enterprises [4]. This has triggered competition between P2P lending and bank lending, as both offer financial services to the public [5]. Bank lending has weaknesses in an operational system that still uses traditional methods, while P2P lending offers convenience through the technology used in its operations. However, P2P lending also has drawbacks: because it operates entirely online and without providing a verified identity, the risk of illegal activities on the stock market such as money laundering and credit fraud is also higher. This contributes to the stock market's negative response to fintech companies, because investors are reluctant to invest in fintech companies that are too risky [1]. Therefore, if fintech companies and banks collaborate, it will benefit both of them: banks can increase the market value of fintech, and banks can also overcome their weakness in adjusting to technological change in order to survive [5], [6].

Today there is a lot of research literature discussing fintech companies because of their very rapid development. This literature discusses fintech companies either in general terms or specifically for one type of fintech, for example Peer-to-Peer (P2P) lending. This literature finds that investment in fintech companies can have both positive and negative impacts on a country's financial activities. Furthermore, there is also research literature that discusses the relationship between fintech companies and

978-1-6654-4002-8/21/$31.00 ©2021 IEEE


banks as financial service providers for a country. The problem discussed by these researchers is driven by the fact that the two organizations can affect each other, as discussed previously. However, to the best of our knowledge, there is still limited research literature discussing the impacts of investment-related cooperation between fintech and banks. Therefore, this literature review aims to determine how the cooperative relationship between fintech companies and banks, especially in developed countries, can affect investment activities in fintech companies.

This literature review is structured as follows. The second section, Methodology, presents the selected method for interpreting the problem, explaining how problems are identified and how studies are collected and selected based on the relevant data contained in the previous research literature. The last section, Research Results, is devoted to answering the research questions posed in the previous section based on the data analysis method used. The paper ends with conclusions drawn from the research results in order to complement and add insight into the research problem.

II. METHODOLOGY

A. Review Method

This research addresses the question of how the relationship between fintech companies and banks affects fintech investment, based on data from previous studies. It therefore uses the Systematic Literature Review method, following the Kitchenham guidelines for database processing. Systematic Literature Review is a literature review methodology designed to identify, evaluate, and interpret all research relevant to a specific subject and research question. This method is used to minimize bias in data extraction by finding the gaps in the research topics and summarizing them into new research [7].

This Systematic Literature Review (SLR) relates to the area of financial technology (fintech) investment activity, and aims to find gaps among the investment activities of financial technology companies in some developed countries [8]. This is implemented by mapping the literature related to fintech capital and investment in developed countries, then connecting it with the main research questions that have been investigated to date in studies contained in indexed databases.

The SLR is carried out in three steps: planning, implementing, and reporting the literature review [9]. Step 1, planning, covers identification of the need for the SLR and preparation of a review protocol. Step 2, implementation, covers identification of the research, the strategies used in selecting the literature, the results of that selection, and the results of data extraction. Step 3, reporting, contains the results of the data analysis, described in detail and represented through tables and graphs.

Fig 1. Systematic Literature Review Procedures [10].

B. Research Question

The research questions and motivations addressed by this literature review are shown in Table I.

TABLE I. Research Questions on Literature Review

ID  | Research Question                                                                              | Motivation
RQ1 | What are the most related studies in the field of fintech investment in developed countries?   | Identify the most related studies about fintech investment in developed countries
RQ2 | What is the basic perspective of developed countries on fintech investment?                    | Identify developed countries' basic perspective on fintech investment
RQ3 | How has fintech investment in developed countries evolved over time?                           | Identify how fintech investment in developed countries has evolved over time
RQ4 | What is the relationship between fintech investment and banking in developed countries?        | Identify the relationship between fintech investment and banking in developed countries
RQ5 | What are the impacts of cooperation between fintech and banks in developed countries on fintech investment?    | Identify the impacts of fintech-bank cooperation in developed countries on fintech investment
RQ6 | What are the challenges of cooperation between fintech and banks in developed countries on fintech investment? | Identify the challenges of fintech-bank cooperation in developed countries on fintech investment

From the primary studies, the evolution, the impacts, and the challenges needed to answer RQ3 to RQ6 are extracted. RQ3 to RQ6 are the main research questions and can be answered by the literature review above. The remaining questions, RQ1 and RQ2, help us evaluate the content of the primary studies: they give a summary and synopsis of the particular research area of fintech investment in developed countries. It is important to look at the track record of fintech companies with respect to their capital and investment value, especially in developed countries involved in investment activities, and at how fintech activities affect the value of fintech itself.


C. Search Strategy

The research process is carried out in several activities: first selecting journal sources through a digital library, then searching for relevant topics, sorting by year of publication and type of journal, and finally sorting journals according to the main study and search strings through the digital library [9]. Finding data for the research literature begins with determining keywords that are relevant to the research problem. Next, world-renowned digital libraries are chosen as journal database sources in order to obtain research that is more competent and recognized in the global scope. Furthermore, a well-known digital library covers literature from around the world and can thus provide varied insights and information [9]. The following digital libraries are used as sources in this research:

● ScienceDirect (sciencedirect.com)
● Springer (springerlink.com)
● IEEE Xplore (ieeexplore.ieee.org)
● Google Scholar (scholar.google.com)

This research uses the following search-string steps:

1. Identification of search strings based on the problems discussed, in the form of research questions.
2. Identification of search strings based on research-related topics.
3. Identification of search strings based on titles and abstracts from related journals.
4. Identification of search strings based on keywords in related journals.

The search string eventually used in this literature review was: (Fintech Investment OR Capital Investment Fintech) AND (Relationship OR Collaborative OR Cooperative) AND (Fintech and Banks) AND (Developed Countr*).

After adjusting the search string, the main study journals used in this research were still screened against the specific needs of the main problem. This research is limited to journals or proceedings published by digital libraries in English, and to the last 5 years.

D. Study Selection

Inclusion and exclusion criteria were used for selecting the primary studies. These criteria are shown in Table II.

TABLE II. Inclusion and Exclusion Criteria on Literature Review

Inclusion Criteria:
● Papers relating to systematic reviews of capital and fintech investment in developed countries
● Studies that discuss the impacts and challenges of capital and fintech investment in developed countries
● For duplicate publications of the same study, only the most complete and newest one is included

Exclusion Criteria:
● Studies without strong validation or experimental results on capital and fintech investment in developed countries
● Studies published before 2016
● Studies that discuss fintech investment in developing countries
● Studies not written in English

To store and manage search results, the researchers use the Mendeley software package (http://mendeley.com). The detailed search process and the number of studies identified at each phase are shown in Figure 2. As shown there, the study selection process was divided into two stages: exclusion of primary studies based on title and abstract, and exclusion of primary studies based on full text. Literature reviews and other research that do not contain experimental findings are excluded. The inclusion of studies is also a measure of each study's relevance to the research topic [9].

Fig 2. Selection of Primary Studies

E. Data Extraction

After selecting and collecting the primary studies through the study selection step in the previous section, the final list of primary studies is used to design the data extraction, so that the researchers can produce accurate and up-to-date information from the primary studies. This data extraction is needed to collect all the information that can answer the previously determined research questions. Hence, this research can produce new information, or complement existing research information, that is useful for further research. The properties in the data extraction table are mapped to the research questions.

TABLE III. Data Extraction Table

Property                                                         | Research Questions
Trends of Related Topic Research                                 | RQ1
Fintech Investment in Developed Countries                        | RQ2
Development of Fintech Investment                                | RQ3
Relation between Fintech Investment and Banking                  | RQ4
Impacts of Fintech Cooperation with Banks on Fintech Investment  | RQ5
Challenges of Fintech Cooperation with Banks on Fintech Investment | RQ6
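As a rough sketch of how the selection protocol above could be automated (the records, field names, and helper functions here are hypothetical and not part of the review's actual tooling), the search string and the year and language criteria from Table II translate directly into a filter over exported bibliographic records:

```python
import re

# Hypothetical exported records; a real screening would run over exports from
# ScienceDirect, Springer, IEEE Xplore, and Google Scholar.
records = [
    {"title": "Capital investment fintech: the relationship of fintech and banks in developed countries",
     "year": 2019, "language": "English"},
    {"title": "Fintech investment and the cooperative role of fintech and banks in developed countries",
     "year": 2015, "language": "English"},  # fails the pre-2016 exclusion
    {"title": "Mobile payment adoption in developing countries",
     "year": 2020, "language": "English"},  # fails the search string
]

def matches_search_string(text: str) -> bool:
    """(Fintech Investment OR Capital Investment Fintech)
       AND (Relationship OR Collaborative OR Cooperative)
       AND (Fintech and Banks) AND (Developed Countr*)."""
    t = text.lower()
    return (("fintech investment" in t or "capital investment fintech" in t)
            and any(k in t for k in ("relationship", "collaborative", "cooperative"))
            and "fintech and banks" in t
            and re.search(r"developed countr\w*", t) is not None)

def passes_criteria(rec: dict) -> bool:
    # Exclusion criteria from Table II: published before 2016, not in English.
    return rec["year"] >= 2016 and rec["language"] == "English"

selected = [r for r in records
            if matches_search_string(r["title"]) and passes_criteria(r)]
```

Here only the first record survives both the search string and the criteria; in the actual review the remaining records would then go through the title/abstract and full-text screening stages described above.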


III. RESEARCH RESULTS

This section discusses the research results. It aims to collect relevant information from several different studies as evidence for our study, organized by research question. Based on this evidence, the information can be used as a guide in preparing this research literature, which is expected to be useful in the future.

A. Trends of Related Topic Research

This literature review produced a list of primary studies selected using the study criteria in accordance with the discussion of this literature. This study found 16 main studies related to fintech investment in developed countries. Based on these 16 main studies, it can be seen how research on fintech investment in developed countries has trended over a certain period of time. Research trends on this topic are presented for the 2016-2021 period to gauge the interest of researchers in this field; these trends are shown in Figure 3. Based on our literature review, 2018 was the year with the most published research discussing related topics. Meanwhile, the figure also shows that in 2018-2020 interest in research related to fintech investment in developed countries decreased. At the same time, our review shows that fintech companies continue to develop today, so research related to fintech investment in developed countries remains relevant.

Fig 3. Graphic representing trends of related topic research

B. Fintech Investment in Developed Countries

The "fintech revolution" is supported by a significant segment of the population familiar with the technology of millennial consumers, as well as by the additional need for financial services that banks must win and maintain in order to survive [5]. It was driven by the development of the internet and mathematical algorithms, as well as the arrival of low-cost stock trading. This made it easier for companies like Boursorama and Cortal in France, Banco BIC in Portugal, and Binckbank in the Netherlands, Belgium, and France to enter the market. The payment system's main breakthroughs ranged from manual debit/credit entries in a consolidated ledger to computer check readers, which then facilitated electronic payments between investors and borrowers [11].

Investment in FinTech companies is increasing in all major European developed countries, although there is much cross-country heterogeneity. Furthermore, as the FinTech business expands, it will become increasingly vital and will have an impact on the financial sector's overall stability [12]. Recent developments have prompted banks to enhance their FinTech investments, rethink service distribution channels, and standardize back-office processes and services even more. Banks can gain comparative advantages in a rising market by integrating FinTech into their operations in a timely manner [13].

Fig 4. Investment in FinTech in Developed Countries [12].

C. Development of Fintech Investment

Before the words "Digital Finance" and "FinTech," the word "e-Finance" was coined in the 2000s to describe the use of information and communication technology in the financial sector [14]. The development of fintech companies is also influenced by geographic potential, with developed countries showing more significant potential; current research on the United States highlights fintech developments that focus on venture capital investment and start-ups [15].

IT investment has become an essential component of every bank's business operation, regardless of its scale. The survey shows that banks' responses to the growth of FinTech companies around the world vary: some banks start incubation projects, others set up investment funds for FinTech companies, while others cooperate as partners by acquiring FinTech companies or launching their own FinTech subsidiaries, transforming the business operational model into a new digital system, or entering into collaborations and partnerships with fintech to derive new product innovations [13], [16]. Banks are evolving dramatically, with a growing interest in investing in digitization and fintech innovation that can help them boost the quality and cost-effectiveness of their business processes, implement new services, improve customer support, and enhance the client experience. Free APIs, registers and ledgers, cybersecurity, and regtech are the fintech initiatives that banks are most interested in [14].

Fig 5. Survey on Fintech Investment growth

D. Relation between Fintech Investment and Banking

Fintech is an interpretation of innovative technology related to financial services that offers automated financial information processing through systems and the


internet [17]. Meanwhile, banks also act as providers of financial services in a country, collecting public funds and facilitating public investment activities. The two financial institutions both have an important role in today's financial sector worldwide. In addition, because the business model adopted by fintech is still relatively new, offering convenience through its technology, banks as competitors are unable to adopt that business model as a whole [17]. However, the presence of banks cannot be fully replaced by fintech, because banks also play a role in investing in fintech companies that are still young [6], [18].

In addition, the digital processes and services carried out by fintech can create more risk for borrowers, and there may be a lack of regulation compared with banks [18]. Based on these explanations, it can be concluded that fintech requires banks to channel funds, while banks need to adapt to the technology developed by fintech. Because the two institutions need each other, cooperation between fintech and banks is mutually beneficial. Through this collaboration, banks can increase fintech's market entry opportunities. If fintech then succeeds in entering the stock market, it can increase the revenue of the fintech company, and through this income fintech can invest in creating new products that may also be implemented in banks [6].

Fig 6. Relationship between fintech and banks

E. Impacts of Fintech Cooperation with Banks on Fintech Investment

The collaboration between fintech and banks is a solution for both parties, who both still have weaknesses in providing financial services [2]. This collaboration is aimed at obtaining the most effective and efficient financial services business model [5]. Thus, it can have an impact by reaching a wider market, obtaining greater profits, and increasing competitive advantages for both institutions [5], [13]. The relationship created by the collaboration between fintech and banks provides strength for both of them, and gives an advantage over banks that are only transaction-oriented and fintech companies that are still young.

Banks that adopt technology developed by fintech can make their transactions more scalable and cost-effective [5]. Fintech companies that have support from banks can create new products that are more optimal and stable, thereby reducing the risk of financial services and increasing the confidence of investors and borrowers to invest in fintech companies [6]. Through this collaboration, banks can assist fintech companies in determining their regulatory infrastructure, so that the transaction process will not suffer from a lack of bank regulation. This also affects fintech's market entry opportunities, increasing the investment value of the fintech company itself [6].

Fig 7. Impacts of fintech cooperation with banks

F. Challenges of Fintech Cooperation with Banks on Fintech Investment

Finance often depends on and evolves along with technological advances; in the process, the financial sector has become the most adept user of information and communication technology. However, more than a decade ago, something unusual occurred in the relationship between finance and technology [2]. The main challenge appears to be establishing a distinct ecosystem for banks and non-bank providers that is controlled and supervised properly. With the advent of digital technologies, borrowers and investors can now connect instantly. FinTech is a way to lower marginal costs and increase productivity. FinTech firms may hold a large stockpile of intangible assets that are difficult to value in capital markets, blurring industry lines and posing major privacy, policy, and compliance issues [11]. According to estimates, the field of the financial sector that attracted the largest amount of investment in FinTech companies since the 2008 financial crisis was payments, accounting for 70% of total investment in financial technology, which poses a challenge to conventional banking [13].

IV. CONCLUSION AND FUTURE WORKS

This research reviews literature related to fintech investment and the relationship between fintech companies and banks in developed countries over the last 5 years. The literature review analysis shows that researchers' interest in related topics has declined over the last three years, based on several digital library sources such as IEEE, ScienceDirect, Springer, and Google Scholar. However, the investment ecosystem of fintech companies in developed countries is showing rapid development, as indicated by a significant increase in the distribution of investment funds to fintech companies in several developed countries such as the UK, Spain, and France [12].

The development of fintech investment in developed countries, based on previous research, shows that investor interest in digitalization and fintech innovation is growing, increasing the distribution of investment funds to fintech companies. This also increases the availability of credit, because many investors are interested in investing in fintech innovation [12], and through this innovation it is possible to improve the quality, effectiveness, and efficiency of the business model for financial services.


According to the findings of this study, both fintech companies and banks have weaknesses. Fintech, as a new industry, still needs fund distributors to support its growth, so it will be unable to replace the bank's presence in the financial industry. Meanwhile, the bank's business model requires an update in order to maintain its existence and expand its market opportunities. On the one hand, the rise of fintech presents a new challenge for banks; on the other hand, these challenges can be transformed into opportunities that will help banks expand even more. As a result, financial market regulators must pay particular attention to non-bank financial service providers when it comes to managing customer information, reporting, supervision, adequate financial resources, and so on [14]. Therefore, collaboration between fintech and banks will also help increase the value of both institutions.

The developed-country cases in this study show how fintech companies can have both positive and negative effects on the financial industry through investment growth. According to Fumiko Takeda's research [1], greater investment in fintech companies poses a threat to the financial industry, particularly banking, because of the wider fintech market opportunity. Nonetheless, in other developed countries, particularly in Europe, investment growth in fintech companies is strongly encouraged. This also drives the development of banking-related product innovations, where collaboration between the two institutions can be established.

This collaboration can be realized in several ways: banks can implement a system developed by a fintech, banks can acquire a fintech to increase the market value of fintech companies, or banks can invest in fintech companies to support the development of fintech products. Financial services is one of the most important sectors of the global economy. According to experts, traditional banks should prioritize digitization in the long run; this makes it possible to turn today's challenges into a way of ensuring the bank's future growth. As a result, FinTech companies and conventional banks can be rivals and partners at the same time; collaboration is necessary for banks and can be mutually beneficial [13].

From an institutional standpoint, financial institutions in developed countries are moving faster into the FinTech sector, which pushes conventional financial institutions to adapt their business and service models continuously. FinTech is becoming increasingly technology-driven, with emerging innovations such as big data, AI, and blockchain being adopted more widely in the financial sector of developing countries. The financial services sector will look very different in the future than it does now: the landscape is expected to become more competitive and effective, customers will have more options, and new value propositions, goods, services, and markets will arise [11].

Through these findings, this research is expected to provide insight and information regarding the development of the financial industry sector, especially the growth of investment in fintech companies. Furthermore, we presume this study can provide an understanding of the relationship that occurs between fintech companies and banks.

REFERENCES
[1] F. Takeda, K. Takeda, T. Takemura, and R. Ueda, "The impact of information technology investment announcements on the market value of the Japanese regional banks," Financ. Res. Lett., p. 101811, 2020, doi: 10.1016/j.frl.2020.101811.
[2] O. Acar and Y. E. Çitak, "Fintech Integration Process Suggestion for Banks," Procedia Comput. Sci., vol. 158, pp. 971–978, 2019, doi: 10.1016/j.procs.2019.09.138.
[3] J. L. Hung and B. Luo, "FinTech in Taiwan: a case study of a bank's strategic planning for an investment in a FinTech company," Financ. Innov., vol. 2, no. 1, 2016, doi: 10.1186/s40854-016-0037-6.
[4] S. Chishti, "How peer to peer lending and crowdfunding drive the fintech revolution in the UK," New Econ. Wind., pp. 55–68, 2016, doi: 10.1007/978-3-319-42448-4_4.
[5] M. Jakšič and M. Marinč, "Relationship banking and information technology: the role of artificial intelligence and FinTech," Risk Manag., vol. 21, no. 1, 2019, doi: 10.1057/s41283-018-0039-y.
[6] M. Bömer and H. Maxin, "Why fintechs cooperate with banks—evidence from Germany," Zeitschrift für die gesamte Versicherungswiss., vol. 107, no. 4, pp. 359–386, 2018, doi: 10.1007/s12297-018-0421-6.
[7] P. H. Prastyo, A. S. Sumi, and S. S. Kusumawardani, "A Systematic Literature Review of Application Development to Realize Paperless Application in Indonesia: Sectors, Platforms, Impacts, and Challenges," Indones. J. Inf. Syst., vol. 2, no. 2, p. 32, 2020, doi: 10.24002/ijis.v2i2.3168.
[8] E. Z. Milian, M. de M. Spinola, and M. M. de Carvalho, "Fintechs: A literature review and research agenda," Electron. Commer. Res. Appl., vol. 34, 2019, doi: 10.1016/j.elerap.2019.100833.
[9] R. S. Wahono, "A Systematic Literature Review of Software Defect Prediction: Research Trends, Datasets, Methods and Frameworks," J. Softw. Eng., vol. 1, no. 1, pp. 1–16, 2007, doi: 10.3923/jse.2007.1.12.
[10] B. Kitchenham, "Procedures for Performing Systematic Literature Reviews," Jt. Tech. Report, Keele Univ. TR/SE-0401, NICTA TR-0400011T.1, vol. 33, p. 33, 2004. [Online]. Available: http://www.inf.ufsc.br/~aldo.vw/kitchenham.pdf
[11] G. B. Navaretti et al., "FinTech and Banks: Friends or Foes?," Eur. Econ.: Banks, Regul. Real Sect., 2018. [Online]. Available: https://www.econstor.eu/handle/10419/200276; https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3099337
[12] W. S. Frame, L. Wall, and L. J. White, "Technological Change and Financial Innovation in Banking," Oxford Handb. Bank., pp. 261–284, 2019, doi: 10.1093/oxfordhb/9780198824633.013.10.
[13] I. Romanova and M. Kudinska, "Banking and fintech: A challenge or opportunity?," Contemp. Stud. Econ. Financ. Anal., vol. 98, pp. 21–35, 2016, doi: 10.1108/S1569-375920160000098002.
[14] V. Soloviev, "Fintech Ecosystem in Russia," Proc. 2018 11th Int. Conf. "Management of Large-Scale Systems Development" (MLSD), 2018, doi: 10.1109/MLSD.2018.8551808.
[15] E. Knight and D. Wójcik, "FinTech, economy and space: Introduction to the special issue," Environ. Plan. A, vol. 52, no. 8, pp. 1490–1497, 2020, doi: 10.1177/0308518X20946334.
[16] T. J. Kudryavtseva, A. E. Skhvediani, and A. A. Bondarev, "Digitalization of banking in Russia: Overview," Int. Conf. Inf. Netw. (ICOIN), pp. 636–639, 2018, doi: 10.1109/ICOIN.2018.8343196.
[17] P. Gomber, J. A. Koch, and M. Siering, "Digital Finance and FinTech: current research and future research directions," J. Bus. Econ., vol. 87, no. 5, pp. 537–580, 2017, doi: 10.1007/s11573-017-0852-x.
[18] A. V. Thakor, "Fintech and banking: What do we know?," J. Financ. Intermediation, vol. 41, 2020, doi: 10.1016/j.jfi.2019.100833.


Enhancement Design for Smart Parking System Using IoT and A-Star Algorithm

Briant Stevanus
Computer Science Department, Binus Graduate Program – Master of Computer Science, Bina Nusantara University, Jakarta, Indonesia 11480
[email protected]

Suharjito
Computer Science Department, Binus Online Learning, Bina Nusantara University, Jakarta, Indonesia 11480
[email protected]

Arief Agus Sukmandhani
Computer Science Department, Binus Online Learning, Bina Nusantara University, Jakarta, Indonesia 11480
[email protected]

Abstract— Finding a parking slot inside a packed parking area can be frustrating. The multiple-cars-chasing-single-space phenomenon often happens, and many cars waste travel distance without knowing where to go, hoping to find an empty parking slot or a car about to leave an occupied one. This study focuses on increasing IoT functionality in a parking area with the help of the A* path-finding algorithm, to accurately point the driver to a vacant parking slot chosen by proximity to the nearest building entrance, and to reserve that slot remotely when the vehicle receives its parking ticket. In this study we also discuss a few possible scenarios, the sensors, and the system architecture.

Keywords: Smart Parking System, IoT, sensors, smart building.

I. INTRODUCTION
Parking service companies in big cities generally have a similar business model. The challenge they face today is managing their parking slots and all their customers' cars during peak hours, such as weekends and holidays. During peak hours, the parking area is crowded with vehicles looking for available parking slots. The application of technology and the internet is expected to solve this problem. Smart parking systems are not a new development. There are three scenarios available for parking service companies to implement in their business processes [1]. Blind search: the parking service company does not equip its parking area with anything that helps drivers find an available parking slot. Parking Information Sharing (PIS): the parking service company helps the driver find an available slot, using a lamp or another indicator to show the availability of each parking slot. However, because the availability information is openly shared, it triggers the multiple-cars-chasing-single-space phenomenon when the parking area is crowded. Buffed PIS (BPIS): an upgraded version of PIS that adds an information display about the availability of certain areas inside the parking facility. BPIS still leaves room for improvement because of its dependency on a buffer number: in practice, to lessen the probability of multiple cars chasing a single space, the system has to understate the actual number of available slots.

Since BPIS was developed from its predecessor PIS, the goal of both BPIS and PIS is to share as much information as possible with drivers so that they can decide where to park their vehicles. In this study, we design and implement a smart parking system model using IoT technology, focusing on improving the smart parking system by implementing the A* (a-star) algorithm and studying appropriate sensor placement, with the aim of reducing parking search time for motorists. This research will also modify the business processes of the current parking service company. Our research identifies each parking slot by its score: the value of the distance from the parking entrance gate plus the distance to the nearest building entrance. Every vehicle that passes through the entrance gate gets a parking ticket with a predetermined parking slot. With this idea, we can retain the buffer value as in BPIS, because we have mapped each vehicle to a particular parking slot based on its location, and the connected sensors also provide the actual updated data to the server. This is useful information for drivers, so they will not enter occupied parking areas, as can happen in the BPIS system.

Fig. 1. Sitemap of the simulation parking area

Our research uses the sitemap in figure 1 to represent the simulation area. This sitemap contains 12 sections with a capacity of 177 parking slots [2]. For our research purposes we re-arrange the traffic flow stated in previous research.

978-1-6654-4002-8/21/$31.00 ©2021 IEEE


II. LITERATURE REVIEW

A. Smart Parking System
A smart parking system is the application of technology to parking management so that users of parking services can use them more comfortably. The smart parking system lets parking service users find out how many parking spaces are available and where they are, in order to anticipate congestion that may occur in the parking area. Congestion, or density, is an anomaly that occurs when the available parking space is smaller than the number of cars entering the parking area to find available spaces [1]. This phenomenon not only causes congestion in the parking area but also wastes vehicle fuel uneconomically [3].

B. Path Finding Algorithm
The A* (A-star) pathfinding algorithm is an approach to finding the distance needed to reach a destination. The A* algorithm is one of the best when applied to a static network or environment [2].

TABLE I. RELATED RESEARCH

Reservation-based parking system [1] — Advantages: parking slot reservation is available and possible. Disadvantages: inequality in distributing reservations to the target leaves the multiple-cars-chasing-single-space phenomenon unsolved, causing ethical problems.
A new "Smart Parking" System Infrastructure and Implementation [4] — Advantages: can detect and allocate available parking slots; utilizes a WiFi network. Disadvantages: no data utilization and no path allocation.
Image recognition for parking system [5] — Advantages: able to detect the license plate number and to detect vehicles entering and leaving. Disadvantages: no spot allocation.
Car Park Management with Networked Wireless Sensors and Active RFID [5] — Advantages: good network architecture; uses RFID to detect every vehicle. Disadvantages: every vehicle needs an extra device; higher RFID operational cost; no slot allocation; no parking spot status reported to the system.
IoT Based Smart Parking System [6] — Advantages: uses IoT integrated with cloud computing. Disadvantages: no further business process enhancement; still uses slot booking; no slot and path allocation.

Creating a system whose features complement each other is the main goal of SPARKS. It covers everything from the business process down to the type of sensor we use in this case.

III. METHODOLOGY
The business flow is one of the crucial parts of this research, as it determines whether our smart parking architecture complements the business flow or not. After we establish a suitable business flow, we create a system design on top of it. The system design describes how the IoT prototype works together with the business processes to achieve our research purpose. There are also other components, such as sensors and algorithms, that connect the business process with the parking system.

Our research measures the distance needed by vehicles from the entry gate until they finally reach their designated parking slot. After we get the result for each scenario, we test data normality: if the data is normally distributed, we run a paired-sample t-test, and if it is not, we run a Mann-Whitney test. This research is planned to follow these research steps:

Fig. 2. Research Design

IV. BUSINESS FLOW
With this research, the IoT implementation must fit the existing business flow. The two must co-exist, since the implemented IoT devices have to know when a car is being directed to the designated parking slot represented by the IoT sensors:

Fig. 3. Business flow process

a. Open Entry Gate: this is the first phase; every vehicle gets a parking pass. The parking pass is issued when the entry button is pressed, and the system automatically reserves the nearest parking slot to the entry door. The system also updates the parking slot status from vacant to reserved.
b. Guide to nearest parking slot: every parking pass carries guidance for the vehicle to reach the designated parking slot, including the parking slot number. This also helps drivers remember where they parked their cars.

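The entry-gate reservation in steps a and b can be sketched as a small program. This is a hypothetical illustration, not the paper's implementation (its server side is described later as PHP with a MySQL database): the `ParkingLot` class, slot IDs, and distances are invented, while the slot score follows the paper's definition of distance from the entrance gate plus distance to the nearest building entrance.

```python
# Hypothetical sketch (not the paper's code): on ticket issue the
# system reserves the vacant slot with the lowest score, where
# score = distance from the entry gate + distance to the nearest
# building entrance, then marks it "reserved".

VACANT, RESERVED = "vacant", "reserved"

class ParkingLot:
    def __init__(self, slots):
        # slots: {slot_id: (metres_from_entry_gate, metres_to_nearest_entrance)}
        self.score = {sid: gate + door for sid, (gate, door) in slots.items()}
        self.status = {sid: VACANT for sid in slots}

    def issue_ticket(self):
        """Reserve the best vacant slot and return it for the parking pass."""
        vacant = [sid for sid, st in self.status.items() if st == VACANT]
        if not vacant:
            return None  # parking lot is full
        best = min(vacant, key=lambda sid: self.score[sid])
        self.status[best] = RESERVED
        return best

lot = ParkingLot({"A1": (10, 8), "A2": (12, 3), "B1": (30, 20)})
print(lot.issue_ticket())  # A2: lowest combined score (15)
```

In a real deployment the `status` map would live in the database that the slot sensors keep up to date, as described in the system architecture below.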

c. Car parked: when the vehicle reaches and parks on the designated parking slot, the system reads the vehicle's presence and updates the parking slot status to occupied.
d. Update Parking Slot Status: a parking slot already marked occupied or reserved will not be given to another vehicle until that vehicle has left the slot.
e. Car leaves parking slot: when the vehicle leaves the designated parking slot, the sensor reads that the vehicle has left. The sensors do not update the status immediately; they keep reading the vehicle's presence for approximately 60 (sixty) seconds, and if the sensor keeps returning a value indicating there is no vehicle on site, the system updates the parking slot status to vacant.
f. Car heading to exit gate: in this phase the vehicle heads to the exit gate; the system plays no part and has no control.
g. Scan Parking Ticket: this phase determines how much time the vehicle spent inside the parking building.
h. Payment: the payment process is not included in this research and will not be discussed further.

V. SMART PARKING SYSTEM WITH A-STAR ALGORITHM MODEL

A. System Design
After we determined the business process to use, we pictured the connection between the business process, the IoT prototype, and the server side in Figure 4 (Process Flow Diagram).

Fig. 4. Process Flow Diagram

The system can also give reports to parking management about parking duration, just like an ordinary parking system, except that our system also allows drivers to get information about where they will park their vehicle rather than searching blindly through a crowded parking area. Our research does not include a payment service within its boundary, and we do not accommodate a driver who parks in a parking slot other than the reserved one.

Figure 5 shows that in this research the parking sensor is level with the floor of the parking slot; this setting is intended so that the parking sensor can read the vehicle's presence. The indicator lamp turns red when a vehicle is present, yellow for reserved, and green for vacant.

Fig. 5. Slot parking with sensors and indicator lamp

Fig. 6. SPARKS (Smart Parking System) Architecture

In the SPARKS implementation, every time a sensor reads a vehicle's presence, the IoT device sends data using the HTTP request protocol through the connected router to the application server via the internet. Next, the application server updates the parking slot status in the database.

B. IoT Prototype

Fig. 7. System Design

We divide the smart parking system into three (3) main components:
1) Prototype: In building the IoT prototype we use a framework that applies the bottom-up approach of product development, starting from determining all the main components up to the software used to build the prototype of this IoT device.

Fig. 8. Smart Parking Prototype

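Steps c–e above amount to a per-slot state machine: a light reading above the threshold means a vehicle is covering the sensor, and a slot only returns to vacant after 60 seconds of uninterrupted absence. Below is a minimal Python simulation of that logic; the `SlotSensor` class and the numeric readings are hypothetical assumptions (the real device is an Arduino that reports status changes to the application server over HTTP).

```python
# Hypothetical simulation of steps c-e. The photoresistor's returned
# value rises as light falls, so a reading above the threshold means
# a vehicle is covering the sensor; the slot is freed only after 60 s
# of uninterrupted absence. A real device would then push the status
# change to the application server over HTTP.

VACANCY_DELAY = 60  # seconds of continuous absence before freeing a slot

class SlotSensor:
    def __init__(self, threshold):
        self.threshold = threshold  # tuned per location and lighting
        self.status = "vacant"
        self.absent_since = None    # time at which absence began

    def update(self, reading, now):
        """Feed one analog reading taken at time `now` (seconds)."""
        if reading > self.threshold:      # light blocked -> vehicle present
            self.status = "occupied"
            self.absent_since = None
        elif self.absent_since is None:   # absence just started
            self.absent_since = now
        elif now - self.absent_since >= VACANCY_DELAY:
            self.status = "vacant"
        return self.status

slot = SlotSensor(threshold=700)
slot.update(900, now=0)          # car covers the sensor -> occupied
slot.update(300, now=10)         # gone for only 10 s -> still occupied
print(slot.update(300, now=70))  # 60 s of absence -> vacant
```

The debounce window is what keeps a briefly interrupted reading (a person walking over the sensor, a car repositioning) from prematurely releasing a reserved or occupied slot.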

In the prototype that we use, each sensor is connected to an Arduino device with a wi-fi shield, so that it can communicate directly with the application server.

Fig. 9. Prototype of sensor implementation for parking area

a) Prototype Infrastructure: In this phase of our research, we discuss the hardware and software that we will use. For hardware, we first choose Arduino boards as the microcontroller. For software, we use the Arduino IDE as the Arduino compiler; on the server side, we use PHP with MySQL as the database management system.

Fig. 10. Arduino Uno

We also use the Arduino wi-fi shield to accommodate wireless data transmission; it is known for its simplicity of implementation. We choose the Arduino development environment because of its vast support community and its reliability as a microcontroller brand.

Fig. 11. Arduino wi-fi shield

b) Sensors: After we determined the hardware and software infrastructure, the next step was choosing the right sensor to detect a vehicle on a parking slot. We chose the photoresistor to act as the vehicle-presence sensor; photoresistors have several advantages, such as easy availability, low operational cost, and a cheaper price compared to other types of sensors [7]. The photoresistor returns an analog value in integer form; conceptually, a photoresistor works as an electric current resistor, and its returned value decreases as light intensity rises.

Fig. 12. Photoresistor

c) Connectivity: For connectivity, we use the Arduino Uno board and the wi-fi shield, as we only need to fit its header pins into the available female header pins. We use jumper wires to connect the photoresistor setup on the breadboard to the female header pins available on the protoboard shield or on the Arduino board.

Fig. 13. Photoresistor with protoboard shield

We use a protoboard because it is simple to use and to place; there is no difference between using a protoboard and an ordinary breadboard. We use a wireless communication protocol from our prototype to the router, which relays the data to the server via the internet.

d) Analytics: Embedded in our prototype is simple logic that decides when to send information based on triggered events. This is needed to keep the server from receiving unnecessary information from the prototype. Each prototype can be configured with its own threshold value depending on its location and lighting conditions.

e) Smart Apps: On the presentation layer, we provide an information display that shows drivers the available and occupied parking areas so they can reach their designated parking slot.

2) Algorithm: We use the A* (A-star) path planning algorithm to direct drivers to the designated parking slot through the minimum available distance. A* is a great option to implement in a closed and controlled area with consistent routes. Our algorithm only needs to scan and count distances at the beginning of the implementation and whenever route changes happen. The algorithm directs drivers through the shortest available path based on node calculation, as seen in figure 1, from which we derived the node mapping.
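The A* planning step just described can be sketched as follows. This is a generic, minimal A* over an invented node graph: the node names (`gate`, `n1`–`n3`, `slot`), the edge distances, and the straight-line heuristic values are illustrative assumptions, not the data of the paper's node mapping.

```python
# A minimal A* sketch over a hypothetical node graph: nodes are
# junctions in the parking area, edge weights are distances in metres,
# and the heuristic is a straight-line estimate to the goal slot.
import heapq

def a_star(graph, heuristic, start, goal):
    """Return (total_distance, path) of the shortest route."""
    open_set = [(heuristic[start], 0, start, [start])]
    best_g = {start: 0}  # cheapest known cost to each node
    while open_set:
        f, g, node, path = heapq.heappop(open_set)
        if node == goal:
            return g, path
        for neighbour, w in graph[node]:
            ng = g + w
            if ng < best_g.get(neighbour, float("inf")):
                best_g[neighbour] = ng
                heapq.heappush(
                    open_set,
                    (ng + heuristic[neighbour], ng, neighbour, path + [neighbour]),
                )
    return float("inf"), []

# Hypothetical distances between the entry gate, junction nodes and a slot.
graph = {
    "gate": [("n1", 20), ("n2", 35)],
    "n1":   [("n3", 30)],
    "n2":   [("n3", 10)],
    "n3":   [("slot", 15)],
    "slot": [],
}
heuristic = {"gate": 50, "n1": 40, "n2": 25, "n3": 15, "slot": 0}
print(a_star(graph, heuristic, "gate", "slot"))  # (60, ['gate', 'n2', 'n3', 'slot'])
```

Because the routes inside a parking building are fixed, the computed distance and optimum path per slot can be cached, which matches the paper's note that the scan is only repeated when routes change.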


Fig. 14. Node Mapping

After we get the node mapping, we use it as a guideline to determine which node is nearest to each available parking slot. In this case every parking slot has to know its nearest node; this helps the system create a path. In the simulation we ran, we use this distance as the distance benchmark.

3) Sensors: The object detection sensor we use is a light sensor. Our SPARKS device is embedded in the floor of the parking area, so it can detect whether there is an object above the sensor.

Fig. 15. Sensors

The photoresistor was chosen for our product because of its low operational cost [7]. Although a photoresistor needs adjustment depending on location and lighting conditions, in an indoor controlled environment like a parking building we only need an extra adjustment of the sensor threshold to read and detect vehicles precisely. We use a "threshold", set on the SPARKS device, to determine the existence of an object.

Fig. 16. Prototype placement

We can display the information on any screen that has a browser, so mobile devices connected to the internet can access the information easily.

Fig. 17. SPARKS recommendation

The following are the LDR light detection sensor (left) and the BH1750 (right). They are two different sensor modules, but the principle is the same.

Fig. 18. SPARKS Prototype Sends Data to Server

The application of a "threshold" sensor value range solves the problem of varying sensor values, so each device does not require special settings.

C. Data Collection
For data collection, we use eight (8) parking slot samples for each simulation run, and every parking slot is tagged with a unique ID.

D. Data Processing
After the data is stored in a table that represents our simulation, the first thing we do is test the normality of the data using the Shapiro test. We find that our data is non-parametric, so we continue with the Mann-Whitney test, which is explained in the evaluation section.

E. Evaluation
We take the data gathered from our simulation and conduct a statistical test. In our case we have a non-parametric dataset, so we run the Mann-Whitney test on it.

Fig. 19. Mann-Whitney test

We got p-value = 6.062e-10 for the following hypotheses:
tA = distance using the A* algorithm
tA' = distance without using the algorithm
H0: tA ≥ tA'
H1: tA < tA'
So, by the statistical test, we accept H1 as our research result. On deeper review, we store every parking slot with its distance and optimum path, so we can display the information faster the next time it is needed.
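The evaluation can be reproduced in outline with a pure-Python Mann-Whitney U computation. The distance samples below are illustrative only (the paper's raw measurements are not given); the hypotheses follow the paper, H0: tA ≥ tA' versus H1: tA < tA'.

```python
# Pure-Python sketch of the one-sided Mann-Whitney U comparison.
# The distance samples (metres) are illustrative, not the paper's data.
from math import comb

def mann_whitney_u(x, y):
    """Count (xi, yj) pairs with xi < yj; ties count one half."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi < yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u

with_astar    = [120, 95, 140, 150, 110, 130, 160, 145]   # tA
without_astar = [210, 180, 260, 300, 240, 220, 310, 250]  # tA'

u = mann_whitney_u(with_astar, without_astar)
print(u)  # 64.0 -- every pair favours A*, the most extreme value for 8 vs 8

# Under H0, the chance that all eight A* distances rank below all eight
# conventional ones is 1 / C(16, 8), the exact one-sided p-value here.
p = 1 / comb(16, 8)
print(p < 0.05)  # True -> reject H0, accept H1 (tA < tA')
```

In practice one would use `scipy.stats.shapiro` for the normality check and `scipy.stats.mannwhitneyu(..., alternative="less")` for the comparison, matching the procedure described above.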


Fig. 20. System path recommendation to selected parking slot

VI. RESULTS & ANALYSIS
After we collected all the data we needed and ran the analysis, we obtained a distance gap of one hundred and seventy meters (170 m) from node six (6): the minimum-path distance recommendation is that much shorter than the path needed without using the A* algorithm.

Fig. 21. System path recommendation to selected parking slot

From the data snapshot in figure 21, we can see the reduced distance needed when we use SPARKS compared to not using it for the same parking slot.

Fig. 22. SPARKS and Conventional data histogram

Over all the data collected, the overall SPARKS distance does not surpass 200 meters, while with conventional parking some slots surpass 300 meters in the distance needed.

VII. CONCLUSION
Even when enhanced with a path planning algorithm, SPARKS is only as good as the business process itself: if SPARKS and the business process are not carefully planned, it will not deliver its best benefit. SPARKS will outshine other methods when implemented in a large parking area that has more than six (6) nodes. A path planning algorithm can surely reduce the distance needed by the driver and can point directly to the dedicated parking slot.

REFERENCES
[1] H. Wang and W. He, "A Reservation-based Smart Parking System," pp. 690–695, 2011.
[2] L. Cheng, C. Liu, and B. Yan, "Improved hierarchical A-star algorithm for optimal parking path planning of the large parking lot," IEEE Int. Conf. Inf. Autom. (ICIA), pp. 695–698, 2014.
[3] P. G. Höglund, "Parking, energy consumption and air pollution," Sci. Total Environ., vol. 334–335, pp. 39–45, 2004.
[4] Y. Geng and C. G. Cassandras, "A new 'Smart Parking' System Infrastructure and Implementation," Procedia - Soc. Behav. Sci., vol. 54, pp. 1278–1287, 2012.
[5] F. Shaikh, B. S. Nikhilkumar, O. Kulkarni, P. Jadhav, and S. Bandarkar, "A Survey on 'Smart Parking' System," J. Clean. Prod., vol. 11, no. 2, pp. 9933–9939, 2015.
[6] A. Khanna and R. Anand, "IoT based smart parking system," Int. Conf. Internet Things Appl. (IOTA), pp. 266–270, 2016.
[7] M. Bachani, U. M. Qureshi, and F. K. Shaikh, "Performance Analysis of Proximity and Light Sensors for Smart Parking," Procedia Comput. Sci., vol. 83, pp. 385–392, 2016.
[8] M. Paasivaara, C. Lassenius, and V. T. Heikkilä, "Inter-team coordination in large-scale globally distributed scrum: Do scrum-of-scrums really work?," Int. Symp. Empir. Softw. Eng. Meas., pp. 235–238, 2012.
[9] Y. I. Alzoubi, A. Q. Gill, and A. Al-Ani, "Empirical studies of geographically distributed agile development communication challenges: A systematic review," Inf. Manag., vol. 53, no. 1, pp. 22–37, 2016.
[10] C. Mann and F. Maurer, "A case study on the impact of scrum on overtime and customer satisfaction," Proc. Agile Conference 2005, pp. 70–79, 2005.
[11] A. M. AlMutairi and M. R. J. Qureshi, "The Proposal of Scaling the Roles in Scrum of Scrums for Distributed Large Projects," Int. J. Inf. Technol. Comput. Sci., vol. 7, no. 8, pp. 68–74, 2015.
[12] J. Sutherland, A. Viktorov, J. Blount, and N. Puntikov, "Distributed scrum: Agile project management with outsourced development teams," Proc. Annu. Hawaii Int. Conf. Syst. Sci., 2007.
[13] L. D. Sienkiewicz and L. A. Maciaszek, "Adapting scrum for third party services and network organizations," Fed. Conf. Comput. Sci. Inf. Syst. (FedCSIS), pp. 329–336, 2011.
[14] M. L. Drury-Grogan, "Performance on agile teams: Relating iteration objectives and critical decisions to project management success factors," Inf. Softw. Technol., vol. 56, no. 5, pp. 506–515, 2014.
[15] S. V. Shrivastava and U. Rathod, "Risks in Distributed Agile Development: A Review," Procedia - Soc. Behav. Sci., vol. 133, pp. 417–424, 2014.
[16] P. Rola and D. Kuchta, "Implementing Scrum Method in International Teams—A Case Study," Open J. Soc. Sci., vol. 03, no. 07, pp. 300–305, 2015.
[17] Y. Lu and L. Da Xu, "Internet of things (IoT) cybersecurity research: A review of current research topics," IEEE Internet Things J., vol. 6, no. 2, pp. 2103–2115, 2019.
[18] D. Bandyopadhyay and J. Sen, "Internet of things: Applications and challenges in technology and standardization," Wirel. Pers. Commun., vol. 58, no. 1, pp. 49–69, 2011.


E-Learning Service Issues and Challenges: An Exploratory Study

Indriani Noor Hapsari
Information System Program, Universitas Esa Unggul, Jakarta, Indonesia
[email protected]

Armando Rilentuah Parhusip
Information System Program, Universitas Esa Unggul, Jakarta, Indonesia
[email protected]

Sawali Wahyu
Informatic Program, Universitas Esa Unggul, Jakarta, Indonesia
[email protected]

Imam Sutanto
Informatic Program, Universitas Esa Unggul, Jakarta, Indonesia
[email protected]

Gerry Firmansyah
Magister of Computer Science, Universitas Esa Unggul, Jakarta, Indonesia
[email protected]

Ainur Rosyid
Elementary Education Program, Universitas Esa Unggul, Jakarta, Indonesia
[email protected]

Abstract—The use of e-learning in higher education has been known since the late 90's, albeit its development and adoption remained a slow process in higher education institutions. The adoption of e-learning gained traction as the Covid-19 pandemic hit the world in 2019 and became a necessity for all educational institutions. Although today institutions have opted for e-learning as an alternative way to carry out the learning process, many are still not ready and face difficulties in implementing, managing, and using it. This research aimed to investigate the issues and challenges of e-learning implementation during the Covid-19 pandemic at Universitas Esa Unggul. The research used an exploratory method comprising a literature study, observation, interviews, and a survey to examine the lecturers' and students' experiences and perspectives of the e-learning system. The questionnaire was adapted from an Information Technology Service Management perspective to gather data on e-learning satisfaction, availability of e-learning facilities, e-learning ease of use, availability of guidelines, availability of system support, and feedback from students and lecturers. Based on the study, we identify the following seven e-learning issues: 1) e-learning infrastructure, 2) e-learning system integration, 3) e-learning policy, 4) e-learning support, 5) individual workload, 6) timeliness, and 7) interactivity. E-learning infrastructure is the dominant challenge for Universitas Esa Unggul and requires special attention to improve its support and regulation. This finding implies that improving the e-learning service is essential to providing a better learning experience. The e-learning service includes the infrastructure, the system integration, the policy and regulation, IT support, the lecturer, and the interactivity. This research could contribute to the design of an effective model of e-learning service by giving more structured guidance on how to do a readiness self-assessment and which factors to focus on.

Keywords—e-learning, issues, challenges, IT service, higher education

I. INTRODUCTION
90's, but the entire adoption of e-learning commenced 20 years later, after the Covid-19 pandemic hit the world. Due to the unanticipated situation, many are still not ready and face difficulties in implementing, managing, and using e-learning.

The Ministry of Education and Culture of the Republic of Indonesia has been promoting the "Kampus Merdeka" program since January 2020 to enable network-based education among universities and industries. This program requires universities to make continuous learning improvements, since the open e-learning service has become one of the university's key performance indicators. Therefore, today e-learning has become a necessity for all educational institutions.

This paper aims to gain further understanding of the issues and challenges of e-learning in general. A literature review was conducted to identify e-learning problems and challenges to date. Furthermore, an empirical inquiry was conducted at Universitas Esa Unggul to explore the perceptions of students and lecturers about their learning experience during the Covid-19 pandemic from March to October 2020. This study addresses the following research questions: (RQ1) What are the issues and challenges of e-learning from past research to date? (RQ2) What are the issues and challenges of e-learning during the Covid-19 pandemic at Universitas Esa Unggul?

II. LITERATURE REVIEW
There has been a fair amount of research on e-learning problems and challenges. However, the study of e-learning is still growing, as the challenges of each institution continue to change along with the increasing scale and complexity of various technologies and pedagogical models [3]. E-learning problems and challenges come from numerous aspects: the individual learner [4]–[6], the faculty [7][8][9], the collaboration between the student and the teacher [10], the infrastructure [9][11], and its policy and regulation [3], as
Information technology has been widely adopted in shown in Table 1.
universities through the use of e-learning. E-learning is The problem of individual learners is related to student’s
considered as a way to deliver a sustainable and high quality ability to self-regulate during the learning process. Distance
education to as many students as possible [1]. E-learning learning shifts the control of learning that were previously
offers flexibility for the students to learn anytime and carried out by educators or peers into individual learner [4].
anywhere with their own learning pace. However, despite its Self-regulated learning appears to be important for learners
benefits, e-learning adoption in higher education remained in e-learning environments due to the high degree of learning
slow [2]. E-learning initiatives had been known since the late autonomy and physical absence of the teacher [5]. However,

978-1-6654-4002-8/21/$31.00 ©2021 IEEE

196 28 October 2021, Jakarta - Indonesia


2021 1st International Conference on Computer Science and Artificial Intelligence (ICCSAI)

not all learners have the same ability to self-regulate with minimal guidance. Learner drop-out is caused by a variety of reasons, including having no one to ask for help, lack of time due to other more important priorities, lack of incentive, insufficient prior knowledge, and inability to understand course content [6].

Table 1 E-Learning Issues and Challenges from Literature

No  E-Learning Issues and Challenges    Reference
1   Individual Learner                  [4][5][6]
2   Lecturer                            [7][8][9]
3   Interaction and Collaboration       [5][10]
4   Infrastructure                      [9][11]
5   E-Learning Policy and Regulation    [3][12]

The second e-learning challenge relates to the lecturer. Faculty load in online learning appears higher than in traditional learning [7][8][9]. Teaching an online course, including course preparation, requires six times more effort than a face-to-face course [8]. Lecturers require more time to deal with final exams, grade computations, and communicating with students before grades are posted to transcripts [7]. Furthermore, the amount of time needed to teach an online class increases directly with the number of enrolled students [8][9]. According to a study in Kenya, lecturers ranked heavy workloads as the most serious challenge affecting the adoption of e-learning [9].

The third e-learning challenge is the minimal interaction between students and teachers [5][10]. This causes students to feel isolated and disconnected from their learning communities. Interaction increases social presence and appears important for maintaining students' motivation; thus, collaborative activities should be incorporated into learning instruction. Difficulty in establishing social presence was apparently a serious barrier for teachers in promoting collaboration at a distance [10].

The next e-learning challenge is the infrastructure [9][11]. Infrastructure is one of the most notorious challenges commonly found in developing countries [9]. The lack of infrastructure, e-learning technology, and internet access, together with the poor quality of internet services, impacts both learners and faculty members [11].

The fifth e-learning challenge relates to policy and regulation [3][12]. The problem of technology is not the technology itself but rather its implementation [12]. According to Marshall, e-learning implementation should be based on an explicit e-learning plan for the deployment, maintenance, and retirement of the technologies [3]. Westera identifies several significant strategic points for planning, including: develop and communicate a change strategy; clarify changes to roles and responsibilities; establish a coherent implementation plan that addresses all relevant issues; set explicit targets; ensure adequate support; involve all stakeholders; institute pilot projects; promote early successes; implement evaluation procedures and be responsive to user feedback; and address ongoing maintenance and upgrading [12].

III. METHODS

This research used an exploratory method to gain an understanding of the issues and challenges of e-learning during the Covid-19 pandemic at Universitas Esa Unggul. The study began with observation of the e-learning delivery process at Universitas Esa Unggul and interviews with the university's IT support staff from March to October 2020. The observation aimed to examine the delivery process, the availability of the guidelines, and the system support. For a more in-depth analysis, a survey was deployed to examine the lecturers' and students' experiences and perspectives of the e-learning system. The questionnaire was adapted from an Information Technology Service Management perspective, consisting of six Likert-scale questions and one open-ended question. The survey aimed to gather data on e-learning satisfaction, availability of e-learning facilities, ease of use, availability of guidelines, availability of system support, and feedback from students and lecturers about the implementation of e-learning during the Covid-19 pandemic at Universitas Esa Unggul.

The survey was conducted in October 2020 using convenience sampling and involved a total of 510 lecturers and students from 10 different faculties at Universitas Esa Unggul. The questionnaire was set up in a Google Form and distributed through the lecturers' WhatsApp group; with the help of the lecturers, the survey was then distributed to their respective students. The data collected were statistically described and analyzed using percentages to identify the main e-learning issues at Universitas Esa Unggul. The results were then compared to previous research to assess their consistency and relevance.

IV. RESULTS AND DISCUSSION

A. Results

A total of 510 respondents participated in this survey, consisting of 86 lecturers and 424 students from 10 faculties at Universitas Esa Unggul. As shown in Table 2, 61.6% of students are new to e-learning, having used it for less than one year, and the most widely used device is the laptop.
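The percentage-based descriptive analysis used in the Methods section can be sketched as follows. This is only an illustration; the paper does not state which tool was used, and the helper function name is ours. The example counts are the lecturers' overall-satisfaction responses reported in Table 3 (N = 86).

```python
# Minimal sketch of the percentage-based descriptive analysis: convert raw
# response counts per answer category into percentage shares of the total.
# Counts below are the lecturers' "overall e-learning satisfaction" figures
# from Table 3 (N = 86); the function name is a hypothetical helper.

def percentage_distribution(counts):
    """Return each category's share of the total, as a percentage (1 d.p.)."""
    total = sum(counts.values())
    return {category: round(100 * n / total, 1) for category, n in counts.items()}

lecturer_satisfaction = {"Satisfied": 29, "Neutral": 27, "Unsatisfied": 30}
print(percentage_distribution(lecturer_satisfaction))
# {'Satisfied': 33.7, 'Neutral': 31.4, 'Unsatisfied': 34.9}
```

Applying the same computation to every questionnaire item yields the percentage columns reported in Tables 2–4.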

Table 2 Participant characteristics

                                                 Lecturer (N=86)   Student (N=424)
                                                 n (%)             n (%)
Faculty
  Faculty of Computer Science                    33 (38.4%)        107 (25.2%)
  Faculty of Education                           5 (5.8%)          31 (7.3%)
  Faculty of Psychology                          2 (2.3%)          1 (0.2%)
  Faculty of Communication Science               5 (5.8%)          125 (29.5%)
  Faculty of Economics and Business              11 (12.8%)        118 (27.8%)
  Faculty of Law                                 8 (9.3%)          2 (0.5%)
  Faculty of Physiotherapy                       3 (3.5%)          19 (4.5%)
  Faculty of Health Sciences                     10 (11.6%)        3 (0.7%)
  Faculty of Design and Creative Industries      3 (3.5%)          12 (2.8%)
  Faculty of Technology                          3 (3.5%)          4 (0.9%)
  Faculty of Postgraduate Studies                1 (1.2%)          0 (0.0%)
  Others                                         2 (2.3%)          2 (0.5%)
Length of using e-learning
  < 1 year                                       24 (27.9%)        261 (61.6%)
  1-2 years                                      20 (23.3%)        135 (31.8%)
  3-5 years                                      28 (32.6%)        26 (6.1%)
  > 5 years                                      14 (16.3%)        2 (0.5%)
E-learning access location
  Home                                           79 (91.9%)        390 (92.0%)
  On campus                                      3 (3.5%)          2 (0.5%)
  Cafe/public space                              0 (0.0%)          4 (0.9%)
  Others (i.e. office, coworking space, etc.)    4 (4.7%)          28 (6.6%)
Device
  Smartphone                                     4 (4.7%)          77 (18.2%)
  Laptop                                         67 (77.9%)        293 (69.1%)
  Personal Computer                              15 (17.4%)        54 (12.7%)

To make the level of respondent satisfaction easier to understand, the 5-point Likert-scale responses were simplified into three categories, namely satisfied, neutral, and unsatisfied, as shown in Table 3.

Table 3 E-learning service management satisfaction at Universitas Esa Unggul

                                          Lecturer (N=86)   Student (N=424)
                                          n (%)             n (%)
Overall e-learning        Satisfied       29 (33.7)         148 (34.9)
satisfaction              Neutral         27 (31.4)         177 (41.7)
                          Unsatisfied     30 (34.9)         99 (23.3)
Availability of           Sufficient      34 (39.5)         173 (40.8)
e-learning facilities     Neutral         23 (26.7)         132 (31.1)
and infrastructure        Insufficient    29 (33.7)         119 (28.1)
Ease of use of online     Easy            52 (60.5)         224 (52.8)
synchronous discussion    Neutral         17 (19.8)         143 (33.7)
feature                   Not easy        17 (19.8)         57 (13.4)
Ease of access of         Easy            22 (25.6)         196 (46.2)
e-learning guidelines     Neutral         30 (34.9)         158 (37.3)
and documentation         Not easy        34 (39.5)         70 (16.5)
E-learning skills         Sufficient      23 (26.7)         206 (48.6)
support and training      Neutral         25 (29.1)         120 (28.3)
                          Insufficient    38 (44.2)         98 (23.1)
Availability of           Sufficient      19 (22.1)         111 (26.2)
e-learning support/       Neutral         24 (27.9)         127 (30.0)
helpdesk                  Insufficient    43 (50.0)         186 (43.9)

Based on the survey results, the overall e-learning satisfaction was 34.7%: only 33.7% of lecturers and 34.9% of students were satisfied. Students who are new to e-learning (61.6% of students) have a higher level of satisfaction, at 71.6%. Lecturer satisfaction, however, is not influenced by the length of e-learning use.

More than 50% of participants prefer using the online synchronous discussion feature. However, this feature is not affordable for the nearly 30% of participants who had little access to e-learning facilities and infrastructure.

Overall user perception of e-learning support is lower among lecturers than among students: 39.5% of lecturers found it not easy to access the e-learning guidelines and documentation, and 44.2% of lecturers felt there was not enough support for e-learning skills and training. Lecturers found it difficult to set the configuration of e-learning activities.

The availability of the e-learning helpdesk was perceived as insufficient by 50.0% of lecturers and 43.9% of students.

Infrastructure and e-learning problems occurred frequently due to a newly added auto-synchronization feature between e-learning and SIAKAD (UEU's academic information system). This feature enables synchronization from SIAKAD to the e-learning system to minimize lecturers' repetitive work in uploading e-learning materials. Unfortunately, the synchronization did not set the right configuration, requiring lecturers to reconfigure it manually.

Participants' feedback about e-learning issues was collected and classified into the following seven categories: 1) e-learning infrastructure, 2) system integration, 3) e-learning policy, 4) e-learning support, 5) individual workload, 6) timeliness, and 7) interactivity, as shown in Table 4.
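The five-point-to-three-point collapse described above can be sketched as follows. The paper does not state which tool performed this step, and the five-point response labels used here are assumptions; only the three target categories (satisfied, neutral, unsatisfied) come from the paper.

```python
# Illustrative sketch of collapsing 5-point Likert responses into the three
# categories reported in Table 3. The 5-point labels below are assumed; the
# paper only states that the scale was simplified to three categories.

FIVE_TO_THREE = {
    "strongly agree": "satisfied",
    "agree": "satisfied",
    "neutral": "neutral",
    "disagree": "unsatisfied",
    "strongly disagree": "unsatisfied",
}

def collapse(responses):
    """Map each 5-point response to its 3-point category and tally counts."""
    tally = {"satisfied": 0, "neutral": 0, "unsatisfied": 0}
    for r in responses:
        tally[FIVE_TO_THREE[r]] += 1
    return tally

sample = ["agree", "strongly agree", "neutral", "disagree", "agree"]
print(collapse(sample))  # {'satisfied': 3, 'neutral': 1, 'unsatisfied': 1}
```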




Table 4 E-learning issues at Universitas Esa Unggul as perceived by respondents

                      Lecturer (N=86)   Student (N=424)
                      n (%)             n (%)
Infrastructure        59 (68.6%)        122 (28.8%)
System Integration    42 (48.8%)        38 (9.0%)
Policy                29 (33.7%)        0 (0.0%)
Support               24 (27.9%)        6 (1.4%)
Timeliness            0 (0.0%)          31 (7.3%)
Workload              3 (3.5%)          13 (3.1%)
Interactivity         0 (0.0%)          23 (26.7%)

The results show that only four categories are common issues for both lecturers and students, while the other three vary. The issues faced by both students and lecturers are the infrastructure, the system integration, e-learning support, and e-learning workload. Lecturers are more concerned about e-learning policies, while students are more concerned about e-learning timeliness and interactivity.

B. Discussion

This research indicates that dissatisfaction with e-learning was widely spread, from the e-learning infrastructure to the interactivity. Infrastructure problems were the most common issue faced by both students and lecturers at UEU and caused delays in the courses, such as late submission and late grading. System integration problems added to the faculty workload through re-uploading and reconfiguring e-learning activities, making e-learning material unavailable as scheduled. In addition, sudden changes caused by new regulations affected the faculty workload significantly, delaying e-learning material and reducing interactivity. The infrastructure timeouts, the system misconfiguration, the lack of e-learning training, and the sudden changes in regulation provoked an abundance of inquiries to the helpdesk, which led to insufficient assistance for lecturers and students. Each issue is described in detail as follows.

1. E-learning Infrastructure
The biggest e-learning issue at UEU during its first year of full online learning adoption was the lack of infrastructure capability. At that time, the UEU server could not handle requests from thousands of users simultaneously at peak times. In addition, the same weekly learning cycle for all classes appears to be the principal cause of the undistributed load. All courses started on Monday and ended on Sunday, causing all students and teachers to access e-learning at the same time. This reduced the productivity of lecturers and students, since submitting or grading an assignment required too much time and effort. As many as 68.6% of lecturers and 28.8% of students complained about e-learning timeouts that happened almost every week.

2. E-learning System Integration
The second biggest e-learning issue at UEU was e-learning system integration. In mid-2020, UEU implemented a new version of its e-learning platform with a new auto-synchronization feature between e-learning and SIAKAD (UEU's academic information system). It aims to automate the process of uploading e-learning material from SIAKAD to e-learning, which was previously done by lecturers manually. However, it did not set the right configuration after synchronization, so lecturers had to reconfigure every synchronized learning activity. In addition, the limited documentation provided by the third-party developer meant that synchronization problems could not be identified immediately, so lecturers had to re-upload the teaching material that was not synchronized by the system. This delayed the availability of course material, causing the students' load to pile up at the end of the week.

3. E-learning Policy
Regulations related to e-learning changed without an explicit plan and were therefore not well socialized to all stakeholders. In addition, the sudden changes increased the lecturers' workload in course preparation and in the learning process.

4. E-learning Support
The high number of problems experienced by lecturers and students was mainly caused by infrastructure and system integration problems. Furthermore, only a few IT staff were available for system support. Due to this limitation, not all user complaints received an immediate response from the staff. Although there is a ticketing system to accommodate users' complaints, it is not widely used and not well socialized to the users. Most complaints were delivered through a Telegram group or direct messages; therefore, the number of complaints and the problems resolved were not well documented. In addition, new system deployments were carried out without any notification. Users had to wait for an uncertain time because they did not know when or for how long maintenance was under way.

5. Faculty's/Student's Workload
Online learning increased both the faculty's and the students' workload at UEU. Complying with UEU regulations, lecturers had to provide more varied learning activities for students, such as video/text modules, quizzes, and assignments. Furthermore, they had to participate in students' discussions and conduct a weekly assessment for asynchronous learning, or hold virtual synchronous learning. Meanwhile, students had to complete at least four activities each week, namely 1) read/watch the course module, 2) complete the quiz, 3) participate in a discussion, and 4) submit the assignment for asynchronous learning activities, or attend virtual synchronous learning.

Teacher workload increases because the number of enrolled students is 50% higher than in face-to-face learning, and the time needed to provide feedback and answer students' inquiries extends much longer because asynchronous activities spread 3 hours of in-class activities over 7 days of learning activities. Additionally, teachers have to assess students' submissions on a weekly basis and prepare more learning object materials.

6. Timeliness
Timeliness refers to the availability of learning materials as scheduled. The low punctuality at UEU was caused by synchronization problems and a lack of lecturer preparation due to the high lecturer workload.
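The undistributed-load problem described under issue 1 can be illustrated with a back-of-the-envelope calculation (the numbers are hypothetical; the paper reports no actual request rates): when every course shares the same Monday-to-Sunday cycle, deadline traffic concentrates on a single day, whereas staggering course end days across the week divides that peak.

```python
# Back-of-the-envelope illustration of the shared weekly learning cycle
# described in issue 1. USERS is an assumed figure, not measured UEU traffic.

USERS = 10_000  # assumed number of active students and lecturers

def peak_share(distinct_end_days):
    """Fraction of weekly deadline traffic landing on the busiest day when
    course cycles are spread evenly over `distinct_end_days` end days."""
    return 1.0 / distinct_end_days

print(int(USERS * peak_share(1)))    # all courses end on Sunday -> 10000
print(round(USERS * peak_share(7)))  # cycles staggered over 7 days -> 1429
```

Even this crude model suggests that staggering course cycles, rather than only adding server capacity, would reduce the weekly timeout peaks the respondents complained about.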




7. Interactivity
Students have difficulty understanding the learning materials; they expect more qualitative feedback on their work than a mere grade, and they expect more synchronous learning activities. Although only a few participants mentioned individual learning and interactivity problems, this does not mean these problems were small: the participants' focus on infrastructure problems appears to have caused them to overlook these issues.

The issues and challenges at Universitas Esa Unggul are still consistent with previous studies, with two additional problems unique to the e-learning implementation at UEU: the system integration and the timeliness. The relevance of these issues to previous work is summarized in Table 5.
Table 5 E-learning issues at Universitas Esa Unggul and their relevance to previous research

No. | E-learning issue at UEU | Description | Relevance to previous research
1 | Infrastructure | Lack of infrastructure capabilities | The lack of infrastructure, e-learning technology, and internet access, and the poor quality of internet services, impact both learners and faculty members [11].
2 | System integration | Problems of system integration between e-learning and SIAKAD | -
3 | Policy and regulation | System update and maintenance schedules are not well communicated; changes in regulations are not well socialized | The rationale for e-learning should be placed within an explicit plan and should be communicated to the stakeholders [1].
4 | Support | Complaints have not been formally managed | Formal documentation of all student enquiries, questions, and complaints needs to be mandatory in e-learning institutional policy [1].
  |         | Inadequate skills | Success in the implementation of e-learning will not be achieved without identifying the different skill, technical, and cultural challenges [14]. Inadequate training and technology would obstruct the effectiveness of e-learning in education [13].
5 | Timeliness | The availability of learning materials as scheduled | -
6 | Workloads | The number of weeks of online classes equals the number of weeks of face-to-face classes, even though online activities appear to be heavier than face-to-face activities. In addition, the number of enrolled students per class is 50% higher than in face-to-face learning, with a total of 60 students per class. | Teaching an online course, including course preparation, requires six times more effort than a face-to-face course [8].
  |           | The time needed to provide feedback and answer students' inquiries extends much longer because asynchronous activities spread 3 hours of in-class activities over 7 days of learning activities. | Student counsel and advisement hours took the form of face-to-face interaction either before or after class or during scheduled office hours [7].
7 | Interactivity | Students have difficulty understanding the learning materials and get little feedback from the lecturer. Students expect more qualitative feedback on their work than a mere grade and expect more synchronous learning activities. | Learner drop-out is caused by a variety of reasons, including having no one to ask for help, lack of time due to other more important priorities, lack of incentive, insufficient prior knowledge, and inability to understand course content [6]. Difficulty in establishing social presence was apparently a serious barrier for teachers in promoting collaboration at a distance [10].

C. Implications

The study implies that improving the e-learning service is essential to a successful e-learning implementation in higher education. The service includes the infrastructure, the system integration, the policy and regulation, IT support, the lecturer, and the interactivity.

Implication 1 - Infrastructure. Infrastructure plays a main role in providing a smooth e-learning experience. Therefore, it is important to understand the usage and the load that the UEU service can concurrently support, in order to cope with user requests at peak times.

Implication 2 - System Integration. Applying careful change management practice may help smooth the integration process by ensuring adequate support before the system fully runs in production. Involving all stakeholders during the pre-implementation stage is also important for uncovering problems in the early phase. Besides, communicating the maintenance process to all stakeholders demonstrates polite computing practice. Polite software does not act arbitrarily and does not use information without the permission of the owner [16]. Therefore, any changes in the maintenance process must be communicated to users in advance, and it must be ensured that information is not altered without the owner's permission.

Implication 3 - Policy and Regulation. The rationale for e-learning should be placed within an explicit plan and should be communicated to the stakeholders [1]. Therefore, new policies and regulations should be incorporated into UEU's e-learning strategy blueprint and communicated to all participants.

Implication 4 - IT Support. Adequate support ensures the success of the e-learning implementation. Providing training and guidelines will help users learn how to use the system.




Furthermore, formal documentation of all student enquiries, questions, and complaints should be provided. This will help UEU understand user problems and measure whether the e-learning service meets user requirements.

Implication 5 - The Lecturer. The availability of learning materials as scheduled is affected by the lecturer workload. Therefore, an incremental plan of e-learning material preparation should be created, taking the faculty's workload into account. In addition, course credits need to be adjusted to the online learning context. In a face-to-face course, credit hours are based on the hours per week the students spend in the classroom or lab, the "contact hours" with the students. A course that meets for three 50-minute periods per week during a full 16-week semester is considered 3 credit hours. However, these "contact hours" are not applicable in an online environment. Therefore, the credits need to be reconsidered as time on task rather than contact time. The number of students enrolled in e-learning also needs to be adjusted. According to Tomei (2019), the ideal class size for undergraduate courses is 18 students in the traditional format and 12 students when teaching online [7].

Implication 7 - Interactivity. According to this research, students expect more qualitative feedback on their work than a mere grade and expect more synchronous learning activities. Therefore, the regulation needs to be adjusted to promote collaborative learning. Lecturers should be empowered and given more autonomy to design learning activities that enable collaboration, as long as the learning outcomes are met.

V. CONCLUSION

Advances in technology and pedagogical methods make the complexity of distance learning continue to increase, turning it into a cross-disciplinary research domain that is always interesting to study. The Covid-19 pandemic has contributed to accelerating the adoption of e-learning in universities and has clearly exposed various issues and challenges faced by universities. Based on an empirical study at UEU, the issues and challenges at Universitas Esa Unggul are still consistent with previous studies, with two additional problems unique to the e-learning implementation at UEU: the system integration and the timeliness. The e-learning issues include 1) e-learning infrastructure, 2) e-learning system integration, 3) e-learning policy, 4) e-learning support, 5) individual workload, 6) timeliness, and 7) interactivity. These problems affect the overall e-learning satisfaction at UEU. E-learning infrastructure was found to be a major challenge for UEU that requires special attention to improve its support and regulation. The success of e-learning implementation requires commitment from management to provide e-learning guidelines that are well documented and communicated to the participants.

This research implies that improving the e-learning service is essential to provide a better learning experience. The e-learning service includes the infrastructure, the system integration, the policy and regulation, IT support, the lecturer, and the interactivity. This research could contribute to the design of an effective model of e-learning service by giving more structured guidance on how to perform a readiness self-assessment and which factors to focus on.

ACKNOWLEDGMENT

We would like to thank all lecturers, students, and IT support teams at Universitas Esa Unggul for their abundant support and help.

REFERENCES

[1] S. Marshall, "eLearning Maturity Model Process Description," 2007.
[2] D. Birch and B. Burnett, "Bringing academics on board: Encouraging institution-wide diffusion of e-learning environments," Australas. J. Educ. Technol., vol. 25, no. 1, 2009.
[3] S. Marshall, "Using the e-learning maturity model to identify good practice in e-learning," in Proc. 30th Annu. Conf. Aust. Soc. Comput. Learn. Tert. Educ. (ASCILITE 2013), pp. 546-556, 2013.
[4] J. Wong, M. Baars, D. Davis, T. Van Der Zee, G.-J. Houben, and F. Paas, "Supporting self-regulated learning in online learning environments and MOOCs: A systematic review," Int. J. Human-Computer Interact., vol. 35, no. 4-5, pp. 356-373, 2019.
[5] T. Lehmann, I. Hähnlein, and D. Ifenthaler, "Cognitive, metacognitive and motivational perspectives on preflection in self-regulated online learning," Comput. Human Behav., vol. 32, pp. 313-323, 2014.
[6] K. F. Hew and W. S. Cheung, "Students' and instructors' use of massive open online courses (MOOCs): Motivations and challenges," Educ. Res. Rev., vol. 12, pp. 45-58, 2014.
[7] L. A. Tomei and N. Douglas, "The impact of online teaching on faculty load," 2019.
[8] J. Cavanaugh, "Teaching online: A time comparison," J. Distance Learn. Adm., vol. 8, no. 1, 2005.
[9] D. N. Mutisya and G. L. Makokha, "Challenges affecting adoption of e-learning in public universities in Kenya," E-Learning Digit. Media, vol. 13, no. 3-4, pp. 140-157, 2016.
[10] M. Rannastu-Avalos and L. A. Siiman, "Challenges for distance learning and online collaboration in the time of COVID-19: Interviews with science teachers," in CollabTech 2020: Collaboration Technologies and Social Computing, pp. 128-142, 2020.
[11] M. Al-Balas et al., "Distance learning in clinical medical education amid COVID-19 pandemic in Jordan: current situation, challenges, and perspectives," BMC Med. Educ., vol. 20, no. 341, pp. 1-7, 2020.
[12] W. Westera, "Implementing integrated e-learning: Lessons learned from the OUNL case," in W. Jochems, J. van Merriënboer, and R. Koper (Eds.), Integrated E-learning: Implications for Pedagogy, Technology and Organization. London: RoutledgeFalmer, pp. 176-186, 2004.
[13] T. FitzPatrick, "Key success factors of eLearning in education: A professional development model to evaluate and support eLearning," Online Submission, vol. 9, pp. 789-795, 2012.
[14] L. Shahmoradi, V. Changizi, E. Mehraeen, A. Bashiri, B. Jannat, and M. Hosseini, "The challenges of E-learning system: Higher educational institutions perspective," J. Educ. Health Promot., vol. 7, no. 116, 2018.
[15] J. Moody, "Distance education: Why are the attrition rates so high?" Distance Educ., vol. 5, no. 3, pp. 205-210, 2004.
[16] B. Whitworth, "Polite computing," Behav. Inf. Technol., vol. 24, no. 5, pp. 353-363, 2005.




Smart Electricity Meter as An Advisor for Office


Power Consumption
Muhamad Firman M. Muhammad Nooryoku R. Ferdinand Nathaniel E.
Computer Science Department, Computer Science Department, Computer Science Department,
School of Computer Science, School of Computer Science, School of Computer Science,
Bina Nusantara University Bina Nusantara University Bina Nusantara University
Jakarta, Indonesia Jakarta, Indonesia Jakarta, Indonesia
[email protected] [email protected] [email protected]

Anthony Steven T. Jeffrey Clay S. Boby Siswanto


Computer Science Department, Computer Science Department, Computer Science Department,
School of Computer Science, School of Computer Science, School of Computer Science,
Bina Nusantara University Bina Nusantara University Bina Nusantara University
Jakarta, Indonesia Jakarta, Indonesia Jakarta, Indonesia
[email protected] [email protected] [email protected]

Abstract—Technological advances are growing fast, with the presence of new technology that helps everyday work. Big companies often have a lot of fluctuation in power usage; where no monitoring is done, there is no way of knowing the cause. There is a need for a system that can monitor power consumption. The Internet of Things is a product of technological growth that is focused on helping humans manage their everyday activities, especially in controlling electronics automatically. This research aims to create a system able to monitor power consumption in offices that can be accessed from anywhere. We found that this system can predict electrical consumption every month.

Keywords—Electrical monitoring, IoT, power consumption, monthly billing

I. INTRODUCTION
Technology is growing in a lot of sectors right now. This growth is making the knowledge of technology even greater and more advanced. Everyone needs electricity to do many things, from using electronic devices to devices that do not belong to electronics. Electronic technology is also growing rapidly, which contributes to the number of devices that use this technology, but the main power source of this technology is electricity.

Electricity is one of the things that has become a necessity in this era, where it has become the main source of human activities, from lamps to computers to mobile phones. The need for electricity is growing by the day; this happens because many companies produce devices with a lot of different functions to help and fulfill humans' daily necessities. Humans also keep buying devices that need electricity to run properly. Especially in offices, there are a lot of devices such as computers, printers, lamps, and so on. Offices that use these devices need to monitor the electricity flow in each sector.

The overuse and lack of monitoring of electricity has a huge impact on the economy of an office, which leads to the need for a device that monitors the electricity without requiring human resources to watch it every single moment, and with the freedom to use it anywhere, so that the user can review the consumption that has already happened without worrying about the trouble of accessing the information itself.

II. LITERATURE REVIEW
A. Internet of Things
The Internet of Things is a concept where an object is able to transmit data over a network independently, without the help of a computer device or human interaction. The concept of the Internet of Things (IoT) was first proposed by Kevin Ashton in 1999, along with the invention of Radio Frequency Identification (RFID) technology. Its development continued into 2000, when the LG company presented the idea of making a smart refrigerator. Then in 2008, the Internet Protocol (IP) began to be embedded in applications of the IoT concept. Until now, almost all the equipment that we use can be controlled and monitored with this IoT concept. [1] Simply put, IoT works by utilizing programmed algorithm code; this code allows interaction between devices so that a device can work automatically with the help of the Internet network. [2]

B. Monitoring System
A monitoring system is a collection of certain elements that are connected to each other to achieve a main goal. [3] Monitoring is an activity of observing a condition regarding certain behaviors or activities, with the aim of obtaining information or data from these activities. [4] So, a monitoring system is an observation activity that uses a tool against a particular system.

In designing the electricity usage monitoring system in this office, we need a microcontroller that can receive input signals from sensors. In our case, we use an Arduino Uno, which is connected to the ESP8266 so that it can be connected to a wi-fi network.

C. Microcontroller
A microcontroller is a chip in the form of an IC (Integrated Circuit) that can receive input signals, process them, and provide output signals according to the program that is loaded into it. [5] The microcontroller's input signal comes from a sensor, which provides information from the environment, while the output signal is addressed to an actuator, which can have an effect on the environment. So, in simple terms, the microcontroller can be considered the brain of a device/product that is able to interact with the surrounding environment. The Arduino Uno board microcontroller is shown in figure 1.

978-1-6654-4002-8/21/$31.00 ©2021 IEEE

202 28 October 2021, Jakarta - Indonesia



Fig. 1. Arduino Uno Board

Arduino Uno specifications:

• Microcontroller: Microchip ATmega328P
• Operating Voltage: 5 Volts
• Input Voltage: 7 to 20 Volts
• Digital I/O Pins: 14 (of which 6 can provide PWM output)
• UART: 1
• I2C: 1
• SPI: 1
• Analog Input Pins: 6
• DC Current per I/O Pin: 20 mA
• DC Current for 3.3V Pin: 50 mA
• Flash Memory: 32 KB, of which 0.5 KB is used by the bootloader
• SRAM: 2 KB
• EEPROM: 1 KB
• Clock Speed: 16 MHz

Arduino is a microcontroller board that is actually intended for designers and artists with little experience in the field of engineering. Arduino can produce sophisticated prototype designs and interactive artwork. [6] Arduino is an input/output-based open-source platform. The programming language used on Arduino is the C language, which has been simplified and is equipped with a built-in library. [6]

D. ESP8266 Wi-fi Module
ESP8266 is a wifi module that connects an Arduino to wifi directly and makes TCP/IP connections. This module requires around 3.3 V of power and has three wifi modes, namely Station, Access Point, and Both. The module is also equipped with a processor, memory, and GPIO, where the number of pins depends on the type of ESP8266 in use. Thus, this module can stand alone without any microcontroller, because it already has microcontroller-like components [7].

Fig. 2. ESP8266 Wi-fi module

The specifications of the ESP8266 include:

• 2.4 GHz Wi-Fi (802.11 b/g/n, supporting WPA/WPA2)
• General-purpose input/output (16 GPIO)
• Inter-Integrated Circuit (I²C) serial communication protocol
• Analog-to-digital conversion (10-bit ADC)
• Serial Peripheral Interface (SPI) serial communication protocol
• I²S (Inter-IC Sound) interfaces with DMA (Direct Memory Access), sharing pins with GPIO
• UART on dedicated pins, plus a transmit-only UART that can be enabled on GPIO2
• Pulse-width modulation (PWM)

E. ACS712 Current Sensor
ACS712, shown in figure 3, is a current sensor that works based on field effects. This current sensor can be used to measure AC or DC current. [8] The sensor module is equipped with an operational amplifier circuit, so the current measurement sensitivity is increased and small current changes can be measured. [9]

Fig. 3. ACS712 Current Sensor

ACS712 has variants according to the maximum current, namely 5A, 20A, and 30A. The ACS712 uses a 5V VCC. The device consists of a linear, low-offset, accurate Hall effect sensor circuit. When current flows through the copper conductor between pins 1-4, the Hall effect sensor circuit detects it and converts it into a proportional voltage.

The characteristics of the ACS712 current sensor are:

• Analog output signal with low noise
• 80 kHz bandwidth
• Internal conductor resistance of 1.2 mΩ
• Single-supply operating voltage of 5.0V
• Output sensitivity: 66 to 185 mV/A
• Output voltage proportional to AC or DC current
• Factory calibration
• Very stable output offset voltage
• Magnetic hysteresis close to zero
• Ratiometric output according to the source voltage

III. RESEARCH METHOD
The research method used in this research is shown in figure 4. There are 5 stages in this research.
1) Research Planning: In the early stages, a plan is carried out by searching for various theories that can be used


as references related to research on the design of an IoT-based electrical monitoring system for offices.

[Flowchart of the research stages: Research Planning, Data Collecting, System Design, Prototyping, Testing]

Fig. 4. Research Planning

2) Data Collecting: Collecting data related to the application of the Internet of Things to electrical energy monitoring tools, both in terms of the tools used and data related to IoT itself.
3) System Design: The design of the Arduino-, ESP-, and Android-based monitoring system for electrical energy consumption includes several circuit planning processes: the design of a household electricity consumption monitoring system, the schematic design of the monitoring system, and the design of the ACS sensor module, or 220V AC current sensor, for data reading in the Android application.
4) Prototyping: The prototype will be made based on the previous system design. This device will be made using Arduino Uno, ESP8266, and ACS712 to monitor electricity.
5) Testing: At this final stage, a test of the prototype that has been made will be carried out, to check whether the system is in accordance with what is desired and is able to measure electrical energy consumption.

B. System Design

Fig. 5. Block Diagram

The system design aims to describe an electrical power consumption monitoring tool based on Arduino Uno, ACS712, ESP8266 and Android [11-13]. The design shown in figure 5 is a simple block diagram of the system that describes the layout of the components of the monitoring tool for electrical power consumption. The block diagram uses the ACS712 current sensor to detect electrical power and the ESP8266 as the connector between the prototype device and the internet network; the data is then displayed on an Android smartphone. In making this prototype, we use the cloud system provided by Blynk, which serves as the platform connecting the prototype device of the electricity consumption monitoring system to an Android smartphone.

The circuit diagram design of the Arduino Uno, ACS712, ESP8266 and Android-based electrical power consumption monitoring system [14-15] can be seen in figure 6. The Arduino circuit diagram is used as a blueprint for making the prototype of the electrical power consumption monitoring tool. From the circuit diagram we can see that the current sensor is connected to the power source and the electronics (which may vary depending on the electronic devices being monitored). The Arduino Uno is the microcontroller used to process the data obtained from the current sensor. After the data has been processed, the ESP8266 sends it through the Blynk server to the Blynk app on our smartphone [10]. There are 2 resistors used to prevent the wifi module from being over-voltaged.

Fig. 6. Circuit Diagram Arduino
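The arithmetic for turning a raw reading of the ACS712 output on the Arduino's 10-bit ADC into a current value can be sketched as follows. This is an illustrative sketch, not code from the paper: the 5 V supply and the 185 mV/A sensitivity (the 5 A variant) are assumptions based on the sensor characteristics listed earlier, and the ACS712's zero-current output is taken as VCC/2.

```python
# Sketch: converting a 10-bit ADC reading of the ACS712 output into current.
# Assumed (for illustration): 5 V supply, 5 A variant with 185 mV/A
# sensitivity, and a zero-current output at VCC/2 = 2.5 V.

VCC = 5.0                  # sensor supply voltage (volts)
ADC_MAX = 1023             # 10-bit ADC full scale
SENSITIVITY = 0.185        # volts per ampere (5 A variant)
ZERO_CURRENT_V = VCC / 2   # ACS712 output at 0 A

def adc_to_current(adc_value: int) -> float:
    """Convert a raw ADC reading to current in amperes."""
    v_out = adc_value * VCC / ADC_MAX              # ADC counts -> volts
    return (v_out - ZERO_CURRENT_V) / SENSITIVITY  # volts -> amperes

# With no current flowing, the reading sits near mid-scale (~512):
print(round(adc_to_current(512), 3))  # prints 0.013
```

On the Arduino itself the same arithmetic would run in C on the value returned by `analogRead()`; the 20 A and 30 A variants only change the sensitivity constant (100 mV/A and 66 mV/A respectively).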


C. Prototyping
The monitoring system for electricity consumption based on Arduino Uno, ACS712, ESP8266 and Android was made based on the system design from the previous stage, and was then implemented into the prototype shown in figure 7. We combined several components by assembling them into a complete system. The red wire connects components such as the ESP8266 and the current sensor to the power pins on the Arduino Uno. The black wire connects the components to ground. For the ESP8266, we use 2 resistors with 1k ohm and 2.2k ohm resistance on the RX pin, to limit the incoming voltage to a maximum of 3.6 volts. Lastly, we use 220-volt power from the wall plug to power the light.

Fig. 7. Prototype System

D. Testing
In this testing section, we use Blynk to display the electricity usage data read through the ACS712 sensor. Our device is connected via the ESP8266 wifi module to the Blynk server, and the data is then displayed in the Blynk application on the smartphone, as shown in figure 8.

Fig. 8. Electricity Monitoring in Blynk Application

As seen in figure 8, there are 3 parameters in the Blynk app. On the upper left, the parameter shows the output current after being affected by the resistance. Because the test results are between 0.04 and 0.16 ampere, there are no changes in this parameter: values under 0.5 ampere round down to 0. On the upper right, the parameter shows the power usage of the lamp. The bottom part shows the total price of electricity per month, obtained from PLN's middle-category electricity cost calculation (over 200 kVA), with a price of Rp 1,114.74 per kWh.

IV. RESULT AND DISCUSSION
From the tests that we have done on the research prototype, using a 19-watt LED lamp with a voltage of 220 volts for 5 minutes, some data are obtained, which can be seen in figure 9.

Fig. 9. Data result

As can be seen in figure 9, there are 2 parameters used as a reference to display the graphic data: the vertical axis shows the wattage, and the horizontal axis shows the calculation results in rupiah per month.

EnergyCostPerMonth = ecp + (24.0 * 30.0 * (WH / 1000.0) * (tarifPLN / 10000.0))

The electricity usage cost per month is obtained from the wattage calculation multiplied by PLN's electricity cost per kWh, over 24 hours for 30 days.

TABLE 1. TEST RESULTS: USAGE COSTS, PROGRAM CALCULATION COMPARED WITH MANUAL CALCULATION

  Program Calculation | Manual Calculation | Difference (-/+) | Error (%)
  Rp2,54              | Rp2,54             | 0                | 0
  Rp3,20              | Rp3,17             | 0,03 (+)         | 0,9
  Rp3,87              | Rp3,81             | 0,06 (+)         | 1,5
  Rp4,53              | Rp4,44             | 0,09 (+)         | 2
  Rp5,20              | Rp5,08             | 0,12 (+)         | 2,3
  Rp5,94              | Rp5,87             | 0,07 (+)         | 1,1

Based on table 1, comparing the program calculation test results and the manual calculation for the 19-watt LED lamp over 5 minutes, there are some errors: the smallest error is 0 percent and the largest is 2.3%, so the tool still has


accuracy in its calculations, as shown in Table 1.
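The monthly-cost formula and the error column of Table 1 can be reproduced with a short calculation. The sketch below is illustrative: the variable names (ecp, WH, tarifPLN) mirror those in the paper's formula, the cost pairs come from Table 1, and the truncation to one decimal place is our reading of how the table's error column was produced.

```python
import math

# Sketch: evaluating the paper's monthly-cost formula and re-deriving the
# Error (%) column of Table 1 from its program/manual cost pairs.

def energy_cost_per_month(ecp, WH, tarifPLN):
    """Running monthly-cost estimate: previous cost (ecp) plus 24 h x 30 days
    of the measured wattage (WH) at the PLN tariff, as in the paper's formula."""
    return ecp + (24.0 * 30.0 * (WH / 1000.0) * (tarifPLN / 10000.0))

def error_percent(program, manual):
    """Relative difference between program and manual cost, truncated to one
    decimal place, which reproduces the Error (%) column of Table 1."""
    return math.floor(abs(program - manual) / manual * 1000) / 10

# (program, manual) cost pairs in rupiah, taken from Table 1:
pairs = [(2.54, 2.54), (3.20, 3.17), (3.87, 3.81),
         (4.53, 4.44), (5.20, 5.08), (5.94, 5.87)]
print([error_percent(p, m) for p, m in pairs])  # [0.0, 0.9, 1.5, 2.0, 2.3, 1.1]
```

Applied once from a zero running total with the 19 W lamp and the Rp 1,114.74/kWh tariff mentioned in the text, `energy_cost_per_month(0.0, 19.0, 1114.74)` evaluates to about Rp 1.52; the table values grow from this as the running total `ecp` accumulates.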


V. CONCLUSION
After going through several processes of planning, prototyping, and testing, the data obtained support the success of the IoT-based electrical monitoring system for offices developed in this research. From the development of this prototype, it was found that the system runs according to plan: it is able to monitor monthly electricity usage in the office, and the results can be sent to Android devices via the Blynk platform. From the comparison between the program's calculations and manual calculations for the 19-watt LED lamp, the error obtained is fairly low, below 10%, with the highest error rate at 2.3%.
REFERENCES
[1] S. Madakam, R. Ramaswamy, and S. Tripathi, "Internet of things (IoT): A literature review," J. Comput. Commun., vol. 03, no. 05, pp. 164–173, 2015.
[2] M. R. Adani, "Internet of Things: Pengertian, Cara Kerja, Contoh dan Manfaat," Sekawanmedia.co.id, 23-Nov-2020. [Online]. Available: https://www.sekawanmedia.co.id/pengertian-internet-of-things. [Accessed: 31-May-2021].
[3] Sutarman, Buku Pengantar Teknologi Informasi. Jakarta: Bumi Aksara, 2012.
[4] M. L. Mustofa, Monitoring dan Evaluasi - Konsep dan Penerapannya bagi Pembinaan Kemahasiswaan. Malang: UIN-MALIKI Press, 2012.
[5] H. A. Dharmawan, Mikrokontroler: Konsep Dasar dan Praktis. Universitas Brawijaya Press, 2017.
[6] Y. M. Dinata, Arduino Itu Pintar. Jakarta: Elex Media Komputindo, 2016.
[7] T.-G. Oh, C.-H. Yim, and G.-S. Kim, "ESP8266 wi-fi module for monitoring system application," Global Journal of Engineering Science and Researches, vol. 4, no. 1, p. 1, 2017.
[8] Fransiscus, Harianto, and S. T. Rasmana, "Rancang bangun alat pembatas arus listrik dan monitoring pemakaian daya pada rumah sewa berbasis mikrokontroler Arduino Uno," Journal of Control and Network Systems, vol. 5, no. 1, pp. 136–143, 2016.
[9] Tanto and Darmuji, "Penerapan Internet of Things (IoT) pada alat monitoring energi listrik," Jurnal Elektronika, Listrik dan Teknologi Informasi Terapan, vol. 1, no. 1, pp. 45–51, Jul. 2019.
[10] J. Galih, "Current measurement using Arduino Uno + ACS712 + Blynk without error," 15-Oct-2018. [Online]. Available: https://www.youtube.com/watch?v=ah0KezJSge0.
[11] T. Tukadi, W. Widodo, M. Ruswiensari, and A. Qomar, "Monitoring pemakaian daya listrik secara realtime berbasis Internet of Things," Prosiding Seminar Nasional Sains dan Teknologi Terapan, vol. 1, no. 1, pp. 581–586, 2019.
[12] I. S. Hudan and T. Rijanto, "Rancang bangun sistem monitoring daya listrik pada kamar kos berbasis Internet of Things (IoT)," Jurnal Teknik Elektro, vol. 8, no. 1, 2019.
[13] B. Prayitno, "Prototipe sistem monitoring penggunaan daya listrik peralatan elektronik rumah tangga berbasis Internet of Things," PETIR, vol. 12, no. 1, pp. 72–80, 2019.
[14] F. Istighfar, R. Kurniawan, and M. Yonggi Puriza, "Rancang bangun alat pengendali dan monitoring konsumsi pemakaian listrik berbasis Arduino dan aplikasi Blynk," Proceedings of National Colloquium Research and Community Service, vol. 3, pp. 109–112, 2019.
[15] A. R. Mutmainah and M. Hayaty, "Sistem kendali dan pemantauan penggunaan listrik berbasis IoT menggunakan Wemos dan aplikasi Blynk," Jurnal Teknologi dan Sistem Komputer, vol. 7, no. 4, pp. 161–165, Oct. 2019.


Building Natural Language Understanding System from User Manual to Execute Office Application Functions

Anis Cherid, Faculty of Computer Science, Universitas Mercu Buana, Jakarta, Indonesia, [email protected]
Edi Winarko, Department of Computer Science and Electronics, Universitas Gadjah Mada, Yogyakarta, Indonesia, [email protected]
Mujiono Sadikin, Faculty of Computer Science, Universitas Mercu Buana, Jakarta, Indonesia, [email protected]
Afiyati Reno, Faculty of Computer Science, Universitas Mercu Buana, Jakarta, Indonesia, [email protected]

Abstract—To improve the flexibility of using office applications, a natural language interface is needed that can execute the office application's functions. To achieve this, the office application must have an inference engine that can automatically detect the intents and the entities contained in an instruction, turn them into an algorithm, and execute the related functions in the application on those entities. Creating a natural language understanding system by manually listing various rules in a knowledge base is very inefficient and makes the effort of using natural language to execute office application functions too expensive. The authors propose conducting research to build a natural language understanding system more efficiently, by analyzing the text contained in the office application user manual by means of natural language processing technology. We propose to build a variety of simple rules automatically, using the text in a user manual that is specifically crafted to facilitate and support natural language processing technology. In future works, research will be conducted to automatically build various complicated rules from analysis of the text of common, commercially available user manuals. In this preliminary study, the necessary steps to execute the application functions are executed on an office application prototype specifically built for this study.

Keywords—natural language understanding, natural language interface, inference engine, natural language processing, office application

I. INTRODUCTION
One type of computer application is an action-based application. This is an application whose interface can be expressed in the form of actions that modify the status of the application, and in the form of predicates that query the status of the application at a certain time [1]. Since the birth of Microsoft Windows's graphical user interface (GUI) in the 1990s, most computer users have become accustomed to operating personal computers and action-based applications using a mouse and a GUI consisting of windows and menus. But as computer software gets more and more complex, it becomes more and more difficult for users to complete complex tasks. The main reason for this is that GUIs consisting of menus and windows are also becoming increasingly complex, making it more difficult to find the right menu or the right components (buttons, checkboxes, text fields, etc.) to complete tasks.

An approach that can be used as a way out of this problem is to use constrained natural language as an interface for action-based applications [2]. Instead of getting results by using a series of clicks on the mouse and taps on the keyboard, for complex applications it would be more efficient if the user simply asked the application to execute the given instructions using a command sentence in the constrained natural language.

Various attempts to create computer systems that can accept instructions in natural language have been made, but most of these systems require the provision of specialized hardware and software, such as Smartkom [3] and the Siridus project [4]. As a result, whenever a new system is needed, the process of creating hardware and software must be done from scratch. There is software that utilizes a natural language interface and can be easily duplicated for use on various PC systems or smartphone devices, but it is limited to dictation systems or systems that can receive short instructions in natural language that are identical to a click on the mouse or a tap on the keyboard, such as Dragon Anywhere [5] or applications that can be used to generate patient medical reports [6].

In this study, we will focus on executing office application functions, which require the user to master the algorithm for dividing a bigger goal to be achieved into a series of simple, sequential steps. For example, to change the font color of the word "xyz" to one of the standard colors, such as red, the user must understand that he or she needs to run the following steps to achieve the goal:

1. Highlight the word by double-clicking the mouse
2. Search the menu components to change font color
3. Click on the down arrow symbol on the "font color" menu component to display a pop-up menu that displays a variety of standard colors
4. Click on the red color found in the pop-up menu.

The more complicated the goal to be achieved, the more steps the user has to master; in other words, the more complicated the algorithm that must be mastered by the user. For


example, to achieve the goal of "setting up a mail merge template consisting of names and addresses", it takes far more steps than the previous example goal, making it increasingly difficult for application users to learn and master the achievement of such goals. It is even harder for users with disabilities to perform and execute these steps. In addition, it would be more efficient if the steps to implement a goal could be reduced.

To facilitate the process of learning how to use office applications, to improve the efficiency of their use, and also to make it possible for users with disabilities to use office applications, it takes a natural language interface that can be used to provide instructions on the operation of office applications, as a replacement for a series of click actions on the mouse button or taps on the keyboard keys. The office applications will be more efficient to use and more flexible (especially for users with disabilities) if the user can carry out the intention to change the font color of a word simply by saying the instruction "change the color of the word xyz to red", or carry out the intention to prepare a mail merge document template consisting of names and addresses simply by saying "mail merge name and address".

To realize the goal of giving instructions directly in natural language, the application must have an inference engine that can automatically translate an instruction into an algorithm that achieves the intents or objectives contained in the instruction, and then execute that algorithm. One way to create an inference engine that can determine the steps to achieve the goal is to manually list various rules in a knowledge base, which will be used by the engine in finding the required set of steps. The process of creating an inference engine by manually creating various rules is time-consuming, so the utilization of an inference engine in common cases, such as the operation of this office application, becomes too expensive to implement.

We propose conducting research to build an inference engine more efficiently, by analyzing the text contained in the office application user manual through the use of natural language processing technology. We plan to develop an inference engine that can translate the operating instructions of the office application, given in the Indonesian language, into the action steps that must be taken in order for the instruction to be completed. As a first step, we will build various simple rules automatically, using the text of an office application user manual that is specifically created to facilitate the use of natural language processing technology to build these simple rules. This specifically crafted user manual will be written in the English language. In the future, research will be conducted to automatically establish various complicated rules from analysis of the text in commercially available user manuals.

In this initial study, the natural language understanding system will execute simple instructions within a specially built office application prototype with some limited functionalities. In the future, research will be conducted so that the instructions can be complex instructions, executed by a common full-blown office application, through calls to the various functions contained in the application programming interface (API) of commercially available office applications.

II. RESEARCH METHODOLOGY
To facilitate the research process, we used a reverse problem solving method, starting from the final result to be achieved and then moving backwards to the initial stage needed to achieve that result. This means that the research process starts from creating an application that will be operated using a natural language interface and continues until the final stage of creating the natural language interface that can be used to operate the application. In more detail, the research steps are as follows:

1. Detail the features of a simple office application prototype and then implement the various features of the application, without providing a natural language understanding system.
2. Define the natural language components and state components in the application needed to execute various functions in the application prototype, so that they become a knowledge base that can accept queries.
3. Pair the various natural language components/state components that have been compiled in the knowledge base with function calls to be executed on the application prototype.
4. Build a user manual for the application prototype, consisting of the various natural language components identified in step 2, for each feature that the application has.
5. Create an algorithm to parse the natural language instructions given by the application user, to be used in an inference process (with an information retrieval approach) against the knowledge base generated in step 2.
6. Provide an interface for application users to give instructions in natural language, in order to execute various features in the office application prototype.

If all of the previous steps have been successfully implemented, then it is expected that the algorithms and data structures needed for the knowledge base creation process in step 2 can be produced automatically, by performing natural language analysis and processing of the text contained in the prototype application user manual. This algorithm and data structure is expected to eliminate, or at least minimize, the manual process that must be done by a knowledge engineer in building the knowledge base. Thus, the final objective of this study, associated with the efficiency of knowledge base creation, is as follows:

7. Change the manual process performed by the knowledge engineer to build the knowledge base into an automated process performed by the inference engine. The inference engine is expected to automatically generate a knowledge base containing various intent-object-state components that can be matched with the instructions or functions that the application prototype has. The knowledge base is created automatically by conducting analysis and processing of the natural language text in the user manual of the application prototype.

The process of pairing each natural language component in the knowledge base with the calling of functions in the application prototype is not discussed in this study and is reserved for future works.
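A minimal illustration of the kind of simple rule described above, matching an instruction against intent and entity patterns, can be sketched as follows. This is not the paper's inference engine: the patterns, intent names, and entity slots are hypothetical, chosen only to show how an instruction such as "change the color of the word xyz to red" could be mapped to an intent and its entities.

```python
import re

# Hypothetical rule set: each intent is a regular expression whose named
# groups act as entity slots. In the approach the paper proposes, such a
# knowledge base would be generated from the user manual, not hand-written.
RULES = {
    "change_font_color": re.compile(
        r"change the color of the word (?P<word>\w+) to (?P<color>\w+)"
    ),
    "delete_word": re.compile(r"delete the word (?P<word>\w+)"),
}

def parse_instruction(text: str):
    """Return (intent, entities) for the first matching rule, else None."""
    for intent, pattern in RULES.items():
        match = pattern.search(text.lower())
        if match:
            return intent, match.groupdict()
    return None

print(parse_instruction("Change the color of the word xyz to red"))
# ('change_font_color', {'word': 'xyz', 'color': 'red'})
```

In the proposed system, the detected intent would then be paired with a function call on the application prototype (step 3 above) instead of simply being returned.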


III. RESULT AND DISCUSSION 5. Insert a period or a semicolon after a specific word in the
To implement the various steps in this study, we choose to document, by double-tapping the word in question,
re-implement some of the functionalities of a kind office followed by touching the "add punctuation" button,
application specifically built to edit English-to-Indonesian followed by touching the "period" button or the "comma"
translated documents, as the result of free online translation button.
services such as Google Translate or Microsoft Bing 6. Insert a quotation mark or opening/closing parentheses,
Translate. Furthermore, the application can be used to before or after a certain word in the document, by double-
improve the translation results with minimum interactions tapping the word in question, followed by touching the
with the keyboard or mouse. This is possible because the "add punctuation" button, then continued by touching the
application is built to run on Android tablet devices, so it can "double quotation" button, the "single quotation" button or
make use of the touch screen to perform the document editing the "parentheses" button, then continued by touching the
process [7]. "before" button or the "after" button.
For this research, an application prototype was built based 7. Remove punctuation, before or after a specific word in the
on the document editing application discussed previously. Some features of the translated document editing application are rebuilt using an integrated development environment of the Microsoft Excel spreadsheet software. The selection of Microsoft Excel spreadsheet software is based on several reasons, which are elaborated in the following:

1. Spreadsheet software provides an interface that allows for flexible experimentation to build the interfaces, algorithms and data structures of application prototypes. In addition to the easily manipulated nature of the worksheet interface in a spreadsheet application, the worksheet can serve as an interface as well as a place to store the data (data structure) needed by the prototype application.
2. Microsoft Excel spreadsheet software can be integrated with external programming environments based on the Python programming language, so that natural language processing algorithms, fully supported by various Python-language programming libraries, can be easily integrated with the application prototypes being built in this study.
3. The author has done several studies whose end results are prototype applications built using Microsoft Excel spreadsheet software. Among the prototypes that have been produced are a prototype of an English sentence-making learning application on the topic of countable and uncountable nouns [8] and a prototype of an academic report-generating application [9].

Some of the functionalities implemented in the application prototype that will be built in this study are:

1. Move one word to another position in the document, by touching the word to be moved and continuing by touching another word in the document that will be the destination location of the word to be moved.
2. Move more than one word to another position in the document, by double-tapping the first word of the entire set of words to be moved, followed by touching the last word in the series of words to be moved, then ending with touching another word in the document that will be the destination location of the series of words.
3. Delete a single word in the document, by double-tapping the word to be deleted, followed by touching the "delete word" button on the application prototype interface.
4. Delete more than one word in the document, by double-tapping the first word of the entire set of words to be deleted, followed by touching the last word in the series of words to be deleted, then ending with the touch of the "delete words" button on the application prototype interface. The order in which the first and last words appear has no effect.
5. Remove a punctuation at the beginning of a word in the document, by double-tapping the word in question, followed by tapping the "remove punctuation" button.

In Fig. 1 and Fig. 2, one of the features of the prototype application that has been implemented is depicted, namely the feature to delete more than one word in the document. In Fig. 1, the prototype highlights the word after the user double-taps the first word of the set of words to be deleted. At the same time, the prototype displays a pop-up menu.

In Fig. 2, the prototype highlights the entire word set after the user taps the last word in the word set to be deleted. The entire series of highlighted words is then removed from the document when the user taps the "delete word" button on the pop-up menu. In the application prototype built with Microsoft Excel, the action of tapping a word or button is implemented by clicking the mouse button, while the double-tap action is implemented by double-clicking the mouse button.

Fig. 1. Application Prototype User Interface after the User Double-Taps the First Word in a Series of Words to be Deleted
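The move and delete functionalities listed above operate on the document as an ordered sequence of words. As a library-free illustration of the idea (the function names, signatures and list-based representation are assumptions of this sketch, not the prototype's actual code), they might look like:

```python
# Illustrative sketch: the document is modeled as a plain list of words,
# mirroring the worksheet-backed data structure described in the text.
# Function names and signatures are hypothetical, not the prototype's API.

def delete_words(words, first, last):
    """Delete the span between two tapped positions; order does not matter."""
    first, last = min(first, last), max(first, last)
    return words[:first] + words[last + 1:]

def move_words(words, first, last, dest):
    """Move words[first..last] so the span starts before words[dest]."""
    span = words[first:last + 1]
    rest = words[:first] + words[last + 1:]
    # Recompute where the destination word sits after removing the span
    # (assumes the destination word is unique in this toy example).
    new_dest = rest.index(words[dest])
    return rest[:new_dest] + span + rest[new_dest:]

doc = ["the", "quick", "brown", "fox", "jumps"]
print(delete_words(doc, 3, 1))   # double-tap "quick", tap "fox", tap "delete words"
print(move_words(doc, 1, 2, 4))  # move "quick brown" before "jumps"
```

A single-word move or delete is then simply the special case where first == last.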

209 28 October 2021, Jakarta - Indonesia


2021 1st International Conference on Computer Science and Artificial Intelligence (ICCSAI)

Fig. 2. Application Prototype User Interface after the User Taps the Last Word in a Series of Words to Be Deleted

Based on the various features that will be implemented in the application, we formulate the various natural language components and application state components needed to execute those features. As mentioned earlier, the purpose of this study is to build the knowledge base automatically by formulating these components, based on natural language processing of the text contained in the application user manual. To realize this, however, the process of identifying the components that will be used as parts of the knowledge base is done manually by us in the first stage of this study. Thus, we act as knowledge engineers who build the knowledge base for the inference engine discussed in this study. In the final stages of the study, the process of automatically building the knowledge base for any new feature of the application prototype, by performing an analysis of the text in the user manual, will be outlined.

Table I presents some examples of the natural language and state components needed to build the knowledge base. Natural language components are divided into two large groups, namely intent and object, while the state component consists of only one type, the precondition. An intent is a natural language component that summarizes the feature that the user wants to execute, while an object is the part of the data structure in the application that is affected by the implementation of the application feature. What is meant by "affected" is that the state of the object/application changes after the feature is implemented. A precondition is the state of the application or of the object which must be fulfilled in order for the application feature (intent) to be executed successfully.

Some features will be transformed into a single intent-object-state set, while other features will be transformed into multiple intent-object-state sets. This is needed to ensure that each object in an intent-object-state set is a simple object that no longer requires a selection or branching process. Thus, selection or branching is done implicitly when a natural language instruction provided by the user is transformed into one of the various possible intent-object-state sets. The various intent-object-state sets will be stored as parts of the knowledge base, and each user instruction in natural language will be compared with the various parts of the knowledge base to reach a conclusion about what is to be done by the application.

In this study, both the intents and the natural language instructions provided by the user are still assumed to be in English, because the Python programming language library used to perform the natural language processing is still based on the English language model. In future works, a prototype application will be developed that fully uses the Indonesian language to recognize the intents and objects in the instructions.

Once the features in the application prototype have been mapped into various intent-object-state sets, what is to be done next is pairing these sets with function calls on the application prototype, including the function call arguments. A function call on the application prototype is the realization or execution of the feature represented by an intent-object-state set in the knowledge base. Thus, the paired function call and its function call arguments are also information that must be recorded in the knowledge base.

TABLE I. SOME EXAMPLES OF APPLICATION FEATURES TRANSLATED INTO INTENT, OBJECT AND STATE IN THE APPLICATION

Feature: Move one word to another position in the document, by tapping the word to be moved and continuing by tapping another word in the document that will be the purpose of the move.
  Intent: move a word to a new location
  Object: the word to be moved; the word as the new location reference
  Precondition/State: the word to be moved exists in the document; the word as the new location reference exists in the document

Feature: Move more than one word to another position in the document, by double-tapping the first word of the entire series of words to be moved, followed by touching the last word in the series, then ending with touching another word in the document that will be the purpose of the move.
  Intent: move a series of words to a new location
  Object: the first word of the series of words to be moved; the second word; the word as the new location reference
  Precondition/State: the first word exists in the document; the second word exists in the document; the word as the new location reference exists in the document; the order of the first word and the second word does not matter; the word as the new location reference must not be located between the first word and the second word

In this study, the process of pairing these two things is still done manually by the knowledge engineers. In future works, algorithms and data structures will be researched and developed to maximize the effectiveness and efficiency of the automatic process of pairing the previously mentioned objects in the knowledge base. The algorithms and data structures to be researched in future works will be closely related to the various functions inside the application prototype that are published through its application programming interface (API).

To simplify the parsing process applied to the instructions in natural language provided by the user, the text on how to use the application is arranged in a pre-standardized form and follows certain rules. Thus, in this study, the user manual was compiled based on rules that we established in the first place. In future works, this assumption will be relaxed and the inference process will be done against common, commercially available user manuals. The simplification process is done to facilitate the discovery of simple and basic algorithms and data structures for the initial stage of the study.
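The intent-object-state sets of Table I, paired with function calls as described above, can be pictured as plain records in the knowledge base. A minimal sketch follows; the field names and precondition helpers are illustrative assumptions of this sketch, and only the function name moveAWord is taken from the paper's tables:

```python
# Hypothetical shape of one knowledge-base entry pairing an
# intent-object-state set with its function call (field names are illustrative).
knowledge_base = [
    {
        "intent": "move a word to a new location",
        "objects": ["word-to-move", "destination-word"],
        "preconditions": [
            lambda doc, args: args["word-to-move"] in doc,
            lambda doc, args: args["destination-word"] in doc,
        ],
        "function": "moveAWord",  # function name as listed in Table III
    },
]

def find_entry(kb, intent):
    """Look up the knowledge-base entry for a recognized intent."""
    for entry in kb:
        if entry["intent"] == intent:
            return entry
    return None

def preconditions_met(entry, doc, args):
    """A feature may only be executed when every precondition holds."""
    return all(check(doc, args) for check in entry["preconditions"])

doc = ["move", "this", "word", "here"]
entry = find_entry(knowledge_base, "move a word to a new location")
args = {"word-to-move": "this", "destination-word": "here"}
print(preconditions_met(entry, doc, args))  # both words exist in the document
```

When a precondition fails, the instruction would be rejected rather than executed, which is exactly the state check the inference engine performs before calling the paired function.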


Putting these simplifications into consideration, the various features of the application prototype are described in the user manual by following some rules (only three of the complete six rules explored in the study are explained here):

• Rule I: To intent, tap on first-object and then tap on second-object.
Sample text: To move a word into a new location, tap on the word to be moved and then tap on the word as the new location reference.

• Rule II: To intent, double-tap on object, and finally tap on action-button.
Sample text: To delete a word, double-tap on the word to be deleted, and finally tap on the "delete word" button.

• Rule III: To intent, double-tap on object, then tap on first-action-button, and finally tap on second-action-button.
Sample text: To remove a punctuation at the beginning of a word, double-tap on the word, then tap on the "remove punctuation" button, and finally tap on the "beginning of word" button.

In this study, we use the spaCy library [10], a programming library that can be accessed using the Python programming language, to perform natural language processing. By using this library, the author attempts to recognize the instructions in natural language delivered by the user of the application. Recognition and understanding of an instruction in natural language is done by comparing the language components in the instruction with the various language components contained in the utterance list in the knowledge base. In this study, the list of utterances, or list of commands, that can be recognized by the application prototype is still manually created by the knowledge engineers.

Table II depicts some examples of utterances formulated from the list of intent-object-state sets in the knowledge base. Emphasized words in the utterance column are objects in the knowledge base and, at the same time, the objects in the documents or in the application. All of these objects must be part of an utterance if the utterance is going to be executed as an instruction of the application.

The process of recognizing an utterance using the spaCy library can lead to three possibilities:
1. The utterance can be recognized as one of the instructions contained in the knowledge base.
2. The utterance cannot be recognized as one of the instructions contained in the knowledge base.
3. The utterance can be recognized as one of the instructions contained in the knowledge base, but there are object components that cannot be found inside the utterance.

In this study, the second and third possibilities are treated equally, i.e. an unrecognized instruction is treated as one of the invalid utterances in the knowledge base.

We choose to combine the programming environment and application execution platform based on the Python programming language with the programming environment and execution platform of the Microsoft Excel spreadsheet application. To bridge the two execution platforms, we choose to use the xlwings library [11]. By using the xlwings library and the xlwings add-in for Microsoft Excel applications, program code on the Microsoft Excel spreadsheet application execution platform can call functions within the Python programming language-based execution platform.

With this approach, the user can still use the spreadsheet interface of the Microsoft Excel application to type instructions in natural language (utterances). The instructions in natural language will then be sent to the Python language-based execution platform. There, the spaCy library can be used to perform natural language processing of the directed instructions, and then generate the intents and objects that will be further processed by the application prototype on the Microsoft Excel application execution platform. The Microsoft Excel application execution platform will then check whether the objects found in the previous analysis process are in a state that corresponds to a precondition. If the precondition is met, the instruction that corresponds to the intent-object-state set will be executed.

TABLE II. SOME EXAMPLES OF UTTERANCES THAT CAN BE RECOGNIZED AS INSTRUCTIONS BY THE APPLICATION PROTOTYPE

1. Intent: move a word to a new location
   Object: the word to be moved; the word as the new location reference
   Utterance: Move the first-word before the second-word
2. Intent: delete a word
   Object: the word to be deleted
   Utterance: Delete the word
3. Intent: append a word with a period
   Object: the word to be appended with a period
   Utterance: Append the word with a period
4. Intent: remove a punctuation at the beginning of a word
   Object: the word with the punctuation to be removed
   Utterance: Remove the punctuation at the beginning of the word

Based on the various steps that have been explained previously, we can formulate several models and templates to generate the algorithms. Some of the templates are presented in Table III.

After the discovery of the various templates in the user manual of the application prototype, the process of creating the knowledge base can be continued automatically, especially to find the intent-object-state components. For example, if one of the features of the application execution platform is to change the affix of an Indonesian language word to 'me-', then one can find in the user manual the description "To change the affix of a word to 'me-', double-tap on the word, then tap on the 'change-affix' button, and finally tap on the 'me-' button", and it can be concluded that:

1. The intent of the sentence is to change the affix of a word to 'me-'.
2. The object of the sentence is the word.
3. The precondition state of the sentence is that the word exists in the document and the affix of the word is not 'me-'.
4. The possible function call of the sentence is changeAffixToMe(word).

To reach a conclusion like this one, the inference engine must have the ability to identify the relevant components of each sentence analyzed in the user manual. In identifying the various constituent components of a sentence, the inference engine should pay attention to what components are generally found when analyzing the entire text in the manual. In this study, the algorithms and data structures for identifying the components that are generally found have not been discussed in detail and are left for future works.
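Because a Rule III sentence always has the same fixed shape, its components can be pulled out with ordinary pattern matching. The sketch below uses a plain regular expression as a stand-in for the spaCy-based analysis described in the text; the pattern and the function name are assumptions of this illustration:

```python
import re

# Rule III shape: "To <intent>, double-tap on the <object>, then tap on the
# '<first-action-button>' button, and finally tap on the '<second-action-button>' button"
# A plain regular expression stands in here for the paper's spaCy-based analysis.
RULE_III = re.compile(
    r"To (?P<intent>.+?), double-tap on the (?P<obj>\w+), "
    r"then tap on the '(?P<button1>[^']+)' button, "
    r"and finally tap on the '(?P<button2>[^']+)' button"
)

def parse_rule_iii(sentence):
    """Extract intent, object and action buttons from a Rule III manual sentence."""
    m = RULE_III.search(sentence)
    if m is None:
        return None
    return {
        "intent": m.group("intent"),
        "object": m.group("obj"),
        "buttons": [m.group("button1"), m.group("button2")],
    }

sentence = ("To change the affix of a word to 'me-', double-tap on the word, "
            "then tap on the 'change-affix' button, "
            "and finally tap on the 'me-' button")
print(parse_rule_iii(sentence))
```

Running this on the 'me-' example yields the word as the object and the two button names, from which an intent-object-state entry and a function call such as changeAffixToMe(word) could then be assembled.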


TABLE III. INSTRUCTION RECOGNITION TEMPLATES IN NATURAL LANGUAGE TO AUTOMATICALLY BUILD THE KNOWLEDGE BASE

Sentence rule I in the user manual: To intent, tap on first-object and then tap on second-object
User manual sentence I-A: To move a word into a new location, tap on the word to be moved and then tap on the word as the new location reference
Intent-object-state I-A: intent-object-state no. 1 in Table II
Function call I-A: moveAWord(first-word, second-word)

Sentence rule II in the user manual: To intent, double-tap on object, and finally tap on action-button
User manual sentence II-A: To delete a word, double-tap on the word to be deleted, and finally tap on the "delete word" button
Intent-object-state II-A: intent-object-state no. 2 in Table II
Function call II-A: deleteAWord(word)

Sentence rule III in the user manual: To intent, double-tap on object, then tap on first-action-button, and finally tap on second-action-button
User manual sentence III-A: To append a word with a period, double-tap on the word, then tap on the "add punctuation" button, and finally tap on the "period" button
Intent-object-state III-A: intent-object-state no. 3 in Table II
Function call III-A: appendPeriod(word)

User manual sentence III-B: To remove a punctuation at the beginning of a word, double-tap on the word, then tap on the "remove punctuation" button, and finally tap on the "beginning of word" button
Intent-object-state III-B: intent-object-state no. 4 in Table II
Function call III-B: removeOpeningPunctuation(word)

IV. CONCLUSIONS AND FUTURE WORKS

Based on the implementation of the solution proposed in this study, it can be concluded that, to allow a user to give natural language instructions to execute a feature of an office application, and to grow the list of natural language instructions correctly understood by the inference engine of the application, we need a growing knowledge base. The knowledge base will contain intent-object-state sets paired with related function calls to the application programming interface.

In this study, it has been elaborated that the process of building a knowledge base can be done automatically. However, the success of this process is still limited to a process that depends heavily on the manual preparation that precedes the building of the knowledge base itself. In addition, the success is still limited to the office application prototype specifically built for the purpose of this study.

In the future, it is necessary to research how to make the process of building a knowledge base fully automated for various office applications, or even general computer applications, by using the various kinds of user manuals readily available in book stores.

REFERENCES
[1] S. Chong and R. Pucella, "A Framework for Creating Natural Language User Interfaces for Action-Based Applications," Third International AMAST Workshop on Algebraic Methods in Language Processing, TWLT Report 21, pp. 83–98, 2003.
[2] A.A. Razorenov and V.A. Fomichov, "The design of a natural language interface for file system operations on the basis of a structured meanings model," Procedia Computer Science, 31, pp. 1005–1011, 2014.
[3] W. Wahlster, "Smartkom: Fusion and fission of speech, gestures, and facial expressions," 1st International Workshop on Man-Machine Symbiotic Systems, pp. 213–225, 2002.
[4] J.F. Quesada, D. Torre, and J.G. Amores, "Design of a Natural Command Language Dialogue System," in Specification, Interaction and Reconfiguration in Dialogue Understanding Systems: IST-1999-10516, 2000.
[5] Nuance Inc., "Dragon Anywhere - Professional-grade dictation on your mobile device," video retrieved on December 15, 2019, from https://m.youtube.com/watch?v=rnsqVawvuJU
[6] E. G. Devine and S. A. Gaehde, "Comparative Evaluation of Three Continuous Speech Recognition Software Packages in the Generation of Medical Reports," Journal of the American Medical Informatics Association, 7(5), pp. 462–468, 2000.
[7] A. Cherid, "Improving the Efficiency of Translated English-to-Indonesian Document Editing with Touch Screen Interface and Suffix-Recommendation System," unpublished.
[8] A. Cherid, "English Sentence Construction Application with Microsoft Excel and VBA," unpublished.
[9] A. Cherid, "Generating Academic Advisor Report with Screen Scraping Technology and Visual Basic for Applications from Microsoft Excel," unpublished.
[10] Spacy.io, "Industrial-Strength Natural Language Processing in Python," website retrieved on September 4, 2021, from https://spacy.io
[11] Xlwings.org, "Automate Excel with Python (Open Source and Free)," website retrieved on September 4, 2021, from https://www.xlwings.org


Aspect Based Sentiment Analysis: Restaurant Online Review Platform in Indonesia with Unsupervised Scraped Corpus in Indonesian Language

Samuel Mahatmaputra Tedjojuwono
Business Information Systems Program, Information Systems Department
Faculty of Computing and Media
Bina Nusantara University
Jakarta, Indonesia 11480
* Corresponding author: [email protected]

Clement Neonardi
Business Information Systems Program, Information Systems Department
Faculty of Computing and Media
Bina Nusantara University
Jakarta, Indonesia 11480
[email protected]

Abstract— The paper has designed a dynamic dashboard that shows summarized information about restaurants in Indonesia on four distinct metrics: Food, Service, Ambience and Covid Safety. Each metric shown has its own rating, which gives the detailed score for that aspect of the restaurant. The data inside the dashboard have been developed using a semi-supervised, aspect-based sentiment analysis approach. The idea is to analyze past reviews/comments of each restaurant on a current restaurant online review platform and extract the sentiment as well as the aspect of each of the reviews. The restaurant lists and the reviews have been collected through a web scraping method on one of the most used online review platforms in Indonesia, Tripadvisor. The scraped data has been cleaned through several data pre-processing steps utilizing the Sastrawi and NLTK libraries for the Indonesian language. The machine learning tools that extract the aspect and sentiment in each of the reviews are built on the Monkeylearn machine learning platform through its APIs. The cleaned datasets have been imported into the platform for the data annotation of model training, to identify the set of words belonging to each aspect category as well as their sentiment values. At the end of the analysis, this paper concludes that the accuracy of the analysis may not be ideal due to the lack of negative-sentiment data gathered, which affects the model during the training process. In conclusion, the feature has successfully been built, implemented and deployed onto a web server supported by Ngrok services; however, there is still room for improvement regarding the analysis of the model.

Keywords— Sentiment Analysis, Aspect Based, Semi-Supervised Data, Web Scraping, NLTK, Sastrawi, Monkeylearn

I. INTRODUCTION

A. Background

As more people across generations use the internet, online review platforms have been heavily used by customers to review the products or places they wish to go, and the reviews available on these platforms have become one of the determining factors in customer decisions: according to Kaemingk and research done by Podium, 93% of customers are influenced by online reviews in their purchasing behaviours [1].

This paper will focus specifically on the restaurant industry in Indonesia and its customer review platforms, as a survey done by Weiche indicates that more than 58% of consumers will be influenced by the reviews in their restaurant selection [2], and Sumarsono notes an overall increase in conversion rate due to positive reviews for restaurants in Indonesia, driven by the presence of applications such as Tripadvisor [3].

However, the immense number of reviews available on these platforms may cause a hassle to users: Tripadvisor, one of the leading online review platforms, has an average of 490 million unique monthly visitors and has accumulated over 730 million reviews [4], so users are forced to read huge collections of reviews to understand the whole restaurant sentiment clearly.

Due to this myriad of online reviews written by past consumers on the platform, users may feel tired of reading all the reviews and prefer seeing the star ratings for an easier understanding, especially in the restaurant industry, where there are several metrics that need to be measured, such as the customer service, price and taste of the food itself. Thus, star ratings alone might not exactly reflect the true meaning of the reviews, as a past consumer might give a 1-star rating for a bad customer experience although the food may be great, and vice versa.

This paper has proposed to create a feature that may help users to search and learn about restaurant information by providing aspect ratings on four distinct metrics: Food, Service, Ambience and Covid Safety. This solution will be created with the help of an aspect-based sentiment analysis technique to acquire the necessary data for the development of the feature. Aspect Based Sentiment Analysis is a natural language processing technique that helps to give a sentiment attribute to certain aspects/topics extracted from documents [5].

Fig. 1 Aspect Based Sentiment Analysis Example
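The aspect-sentiment pairs that Fig. 1 illustrates can be represented as simple (aspect, sentiment) tuples per review. The toy keyword matcher below shows the shape of such output; the Indonesian keyword lists and the one-word context window are invented for illustration, not the paper's trained Monkeylearn classifiers:

```python
# Toy illustration of aspect-based sentiment output. The keyword lists and
# the one-word context window are invented assumptions, not the paper's
# trained Monkeylearn model.
ASPECT_KEYWORDS = {
    "Food": ["makanan", "rasa"],             # food, taste
    "Service": ["pelayanan", "pelayan"],     # service, waiter
    "Ambience": ["suasana", "tempat"],       # atmosphere, place
    "Covid Safety": ["masker", "protokol"],  # mask, protocol
}
POSITIVE = {"enak", "ramah", "nyaman", "bersih"}  # tasty, friendly, comfy, clean
NEGATIVE = {"lambat", "kotor", "buruk"}           # slow, dirty, bad

def tag_review(review):
    """Return (aspect, sentiment) pairs found in a single review."""
    tokens = review.lower().split()
    pairs = []
    for aspect, keywords in ASPECT_KEYWORDS.items():
        for i, token in enumerate(tokens):
            if token in keywords:
                window = tokens[max(0, i - 1): i + 2]  # word before and after
                if any(t in POSITIVE for t in window):
                    pairs.append((aspect, "Positive"))
                elif any(t in NEGATIVE for t in window):
                    pairs.append((aspect, "Negative"))
                else:
                    pairs.append((aspect, "Neutral"))
                break
    return pairs

# "the food is tasty but the service is slow"
print(tag_review("makanan enak tapi pelayanan lambat"))
```

Even this crude sketch shows why aspect-level analysis is more informative than a single star rating: one review can carry a positive Food sentiment and a negative Service sentiment at the same time.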

978-1-6654-4002-8/21/$31.00 © 2021 IEEE



Fig. 1 shows a simple representation of Aspect-Based Sentiment Analysis, where a list of topics is extracted from texts/documents. With the machine learning tools that will be built, the tools will later help to identify each aspect as well as the sentiments within every user review on the online review platform. The aspects that will be used by the paper are Food, Service, Ambience and Covid Safety. Due to the recent global pandemic of Covid-19, information about the safety protocols of Covid-19 is important for users to know the situation in the restaurant they wish to go to.

B. Scope

As the documents/corpus attained through web scraping will be in Bahasa Indonesia, the data pre-processing stages such as stemming and stop word removal will be carried out using an open-source Indonesian language library, Sastrawi. As this paper focuses on Aspect Based Sentiment Analysis, there will be two classification processes, one for the aspects/topics and one for the sentiment itself: aspect classification will be processed with the help of TF-IDF and the SVM algorithm, while sentiment analysis will be classified using Naïve Bayes, SVM as well as some linear regression algorithms with the help of the Monkeylearn platform. The results will be visualized in a dynamic dashboard using the Streamlit library and deployed with the help of Ngrok.

C. Aims and Benefits

The aim of the paper is to design an aspect-based sentiment analysis tool in the form of a dynamic dashboard that will be able to discover and break down the aspects of the restaurants given in the corpus, which include Ambience, Customer Service, Food Quality and Covid Safety. Furthermore, this solution has been designed to reduce the time consumed while reading the reviews on the online review platform and to allow users to effectively analyse the sentiments of the restaurant they wish to go to and compare it with another restaurant in a single screen.

D. Hypothesis

The goal of this paper is to design a solution to improve customer satisfaction with the current online review platforms, as the paper has its own hypotheses regarding the current situation: there is a lack of information on restaurant ratings in the online review platform, which may fail users' expectations; there are too many reviews available in the online review platform, thus wasting more time; there is no sentiment analysis for the reviews in the online review platform, thus creating inaccuracies in the overall rating; and information regarding the COVID-19 situation is not properly mentioned, thus affecting users' demand.

II. THEORETICAL FOUNDATION

According to Bing Liu [6] and the ABSA project done by Chinsa [7], ABSA involves the areas of natural language processing, computational linguistics and data mining, and it can be done at different levels: document level, sentence level and topic/aspect level. At the document level, generally the sentiment polarity of the document is determined and classified as positive or negative. At the sentence level, each sentence in the document is examined to decide whether its sentiment is positive, negative, or neutral. In ABSA, the term aspect means the significant attributes of the items evaluated by users (for example, in the case of a restaurant: food, service and ambience). Online reviews are a combination of positive and negative opinions on various viewpoints, and a more fine-grained examination of the online reviews is needed to extract these conclusions. Hence, aspect-based sentiment analysis is preferred in this paper.

This paper uses web scraping as its data collection method. Web scraping is a technique to extract or parse files from HTML pages on the internet, accomplished automatically by a web crawler [8]. This paper uses an open-source Python library, BeautifulSoup4, to parse the collections of reviews available on Tripadvisor, which contain the opinions about each of the restaurants. The results of this web scraping will be stored in csv files and will be the source documents for the aspect-based sentiment analysis; this method was chosen so that the reviews being analyzed are relatable to the current condition of the online review platform.

With the vast amount of data gathered and extracted from the World Wide Web, most of the data gathered will be unstructured or semi-structured. These data may contain lots of noise, which might produce negative consequences in the classification stage [9]. Thus, data pre-processing needs to be performed to transform the data into a more structured form through several processes such as: tokenization, case folding, HTML tag removal, stop-word removal, stemming, correction of common misspelled words, and reduction of replicated characters [10]. Case folding is the process of converting all letters within the parameter into lower case letters, as most text in documents has both capital and lower case letters [11]. Stemming is the process of reducing the morphological variants of a word to its root/base form [12]. This is an important part of the pipeline in natural language processing, where words would be put into tokenized form after being stemmed. However, the Indonesian language has a combination of suffixes and prefixes which differs from the English language: an Indonesian word would mainly have a combination of up to 2 prefixes attached to the root form of a word and end with up to 3 suffixes, which makes it quite unique.

As this paper has chosen to utilize the Monkeylearn analysis platform as the foundation of the classifier mechanisms, the platform itself utilizes several algorithms which combine to create the whole system. According to Monkeylearn itself, SVM is a supervised machine learning model that uses classification algorithms for two-group classification problems. After giving an SVM model sets of labelled training data for each category, it is able to categorize new text. The classification method will mainly use the bag-of-words method and, for the kernel function of the SVM, Monkeylearn sticks to the good old linear kernel [13].

Additionally, Monkeylearn also uses other algorithms to support its platform, implementing the naïve Bayes theorem to help with the probabilistic modelling algorithm, which helps the machine to predict certain words that have been tagged/annotated previously [14]. According to Berrar, naïve Bayes is of fundamental importance for inferential statistics and many advanced machine learning models. Bayesian reasoning is a logical approach to updating the probability of hypotheses in the light of new evidence, and it therefore rightly plays a pivotal role in science, where it allows users to answer questions for which frequentist statistical approaches were not developed [15]. Essentially, this is how Bayes' theorem works: the probability of A, if B is true, is equal to the probability of B, if A is true, times the probability of A being true, divided by the probability of B being true:

P(A|B) = P(B|A) × P(A) / P(B)   [14]
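As a quick numeric check of the theorem in a review-classification setting (the probabilities below are invented purely for illustration):

```python
# Worked numeric check of Bayes' theorem, with invented numbers:
# P(positive | word) = P(word | positive) * P(positive) / P(word)
p_word_given_pos = 0.40  # hypothetical: 40% of positive reviews contain the word
p_pos = 0.70             # hypothetical prior: 70% of reviews are positive
p_word = 0.35            # hypothetical: 35% of all reviews contain the word

p_pos_given_word = p_word_given_pos * p_pos / p_word
print(round(p_pos_given_word, 2))  # 0.4 * 0.7 / 0.35 = 0.8
```

In other words, under these made-up numbers, seeing the word raises the probability that the review is positive from the 0.70 prior to 0.80.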


rightly plays a pivotal role in science, where it allows users to answer questions for which frequentist statistical approaches were not developed [15]. Essentially, Bayes' theorem works as follows: the probability of A, if B is true, is equal to the probability of B, if A is true, times the probability of A being true, divided by the probability of B being true:

P(A|B) = P(B|A) * P(A) / P(B)   [14]

With the combination of those algorithms, this paper can use that machine to build models for aspect modelling and sentiment modelling, by annotating the dataset/corpus gained through the web-scraping method. This corpus is used as the training data to create a smarter model that can predict and model the rest of the testing data. Furthermore, the classification process is done in the same environment by calling the Monkeylearn API from a Python notebook and classifying each of the remaining testing records.

III. PROBLEM ANALYSIS

To gain information about customer behaviour towards the current online review platforms, as well as feedback on the proposed solution of this paper, surveys were conducted in Bahasa Indonesia, since the solution mainly targets Indonesian users, with Google Forms as the questionnaire medium. As this paper applies a clustered sampling method for the surveys, links to the forms were spread throughout social media such as Line and WhatsApp. As this is a semi-random sampling method, some online groups dedicated to food enthusiasts also received the link, for better representation of customer behaviour. The surveys were conducted for 3 days and received a total of 116 respondents. This information is mainly used to support the hypothesis of the paper regarding the possible problems of the current online review platforms.

A. Survey Questions

There are a total of 12 questions in the survey, covering topics such as the importance of Covid-19 information for a restaurant, the types of information users seek when using online review platforms, the possibility of a gap between the ratings and the real experience, the possibility of increased time spent due to the immense number of reviews, and the respondents' take on the proposed solution of the paper. Every question is answered on a linear scale, so the author can get a better grasp of what users want for a better experience when using restaurant online review services, and can see whether the respondents agree that a solution to the current situation is needed.

B. Survey Results and Analysis

After conducting the surveys, the paper concludes that the initial hypothesis is indeed supported by the responses: 70% strongly agreed that information about Covid-19 protocols in restaurants is important, and also agreed that there are some contradictions between the star ratings and the actual sentiment of the review content itself. More than 52% of respondents also agreed that reading reviews on the online review platforms has become inefficient, as there are too many reviews available and doing so consumes more time.

Additionally, more than 70% of the respondents strongly agreed with the idea of providing categorical information in the reviews section so that it is more specific and detailed. Figure 2 also shows that 74% of the respondents strongly agreed that they might use a new online review platform that provides ratings of a restaurant based on the categories of Food, Service, Ambience, and Covid Safety.

Fig. 2 Whether users will use a platform that can provide ratings based on specific categories of a restaurant

IV. PROPOSED DESIGN SOLUTION

The purpose of this analysis is to give users of online restaurant review platforms a better overview of a restaurant's characteristics, which would increase customer satisfaction and experience when looking for restaurant recommendations. The solution targets all loyal customers as well as future leads of the online review platforms, and would help solve the current possible user-experience problems of the top online review platforms in Indonesia.

Fig. 3 Customer Journey Map

Fig. 3 shows that users are likely to be unhappy when they are trying to analyze restaurant reviews and evaluating their opinion of a restaurant. This analysis therefore focuses on solving those problems by providing categorical, detailed information about a restaurant, consisting of Food, Ambience, Service, and Covid-19 safety protocols/cleanliness, with a sentiment attached to each category based on the reviews written by past users.

Additionally, Fig. 4 below shows the design process of this project: web scraping acts as the data-collection step, the data are cleaned through a pre-processing step, and then an aspect model is built to model the topics in the data, so that the aspects within a document can be predicted and each document classified to its topics accordingly.
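The flow just outlined (scraping → pre-processing → aspect prediction → sentiment prediction → polarity analysis → visualization) can be sketched as a minimal pipeline skeleton. This is an illustrative sketch only: the keyword rules, function names, and sample review are stand-ins invented here, not the paper's actual MonkeyLearn models or TripAdvisor data.

```python
# Minimal skeleton of the review-analysis flow described above. The keyword
# "models" below are invented placeholders for the paper's trained
# MonkeyLearn aspect/sentiment classifiers.

def preprocess(text):
    # Stand-in for the Sastrawi/NLTK cleaning step: lowercase, strip punctuation.
    return "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace()).strip()

ASPECT_KEYWORDS = {
    "food": "Food",
    "service": "Service",
    "ambience": "Ambience",
    "covid": "Covid Safety",
}

def predict_aspects(text):
    # Placeholder aspect model: keyword lookup instead of a trained classifier.
    return [label for kw, label in ASPECT_KEYWORDS.items() if kw in text]

def predict_sentiment(text):
    # Placeholder sentiment model: a single negation cue.
    return "Negative" if "not" in text.split() else "Positive"

def analyze(reviews):
    results = []
    for review in reviews:
        clean = preprocess(review)
        results.append({
            "text": clean,
            "aspects": predict_aspects(clean),
            "sentiment": predict_sentiment(clean),
        })
    return results

print(analyze(["The food was great, service too!"]))
# → [{'text': 'the food was great service too',
#     'aspects': ['Food', 'Service'], 'sentiment': 'Positive'}]
```

In the paper itself, the two `predict_*` placeholders correspond to calls to the trained MonkeyLearn aspect and sentiment models, and the polarity/visualization stages consume the labelled output.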

A sentiment model is also built to detect the sentiment polarity within a text; it is then used to predict the rest of the testing data and classify their sentiments accordingly. After each sentence has been classified to its topics and sentiments, a polarity-analysis process is applied to calculate the sentiment within each aspect, and the results are visualized and deployed as a web application that can be accessed by users.

[Fig. 4 flowchart: Web Scraping → Data Pre-processing → Build Aspect Model → Aspect Modelling → Aspect Prediction → Aspect Classification → Build Sentiment Model → Sentiment Modelling → Sentiment Prediction → Sentiment Classification → Polarity Analysis → Data Visualization → Deploy into Web App]

Fig. 4 Solution Design

V. SOLUTION IMPLEMENTATION

For the first stage of the solution, data are gathered as the source of the database. They are obtained with the web-scraping method using the BeautifulSoup4 Python library, and the site targeted as the source of customer reviews is Tripadvisor for restaurants.

Fig. 5 Web Scraping Result

Fig. 5 shows the result of the web-scraping method. The scraped data consist of the restaurant name (column 1), the reviewer's name (column 2), the review comments (column 3) and the rating result (column 4). However, not all of the data are used in this project: since the main objective is restaurant review analysis, the main data used are the restaurant name, which acts as the unique master data, and the customer reviews.

For the data pre-processing step, this paper utilises the Python library Sastrawi, one of the functional Indonesian-language Python libraries available, combined with NLTK functions for better results.

Fig. 6 Data Pre-processing Results

Fig. 6 shows the result of the data pre-processing technique, which cleans all the unnecessary data in the corpus and helps the model-building stage achieve better accuracy. The reason this paper chose these particular libraries is the lack of other viable libraries or open-source projects that could be used for this paper; given the time and knowledge constraints in the creation of this paper, an open-source library is the best tool available.

For the aspect modelling and classification process, this paper utilises the Monkeylearn platform to assist in creating semantic aspects within the texts in the corpus. As Monkeylearn provides trial queries for model building and data annotation, this paper connects its API to the project notebook and uses it as the prediction and classification model. Below is the step-by-step process from aspect creation to aspect classification using the Monkeylearn API platform:

1. Create a classifier model.
2. Create aspect tags.
3. Upload data to the model to be trained.
4. Do data annotation/tagging.
5. Build the model until >70% overall accuracy.
6. Deploy and utilise the model that has been built.
7. Predict the aspects of the testing dataset.
8. Data cleaning process.
9. Data classification process.
10. Save the labelled datasets.

Table. 1 Aspect Model Specification

Aspect       | Precision | Recall | Number | Type of data
Ambience     | 79%       | 98%    | 106    | Text
Service      | 84%       | 97%    | 117    | Text
Food         | 76%       | 89%    | 98     | Text
Covid Safety | 96%       | 46%    | 35     | Text

Table. 1 shows the aspect model statistics acquired through the data-annotation process on the testing dataset. The next step is to predict the rest of the datasets by using this model as the reference for the aspect modelling stage.

Fig. 7 Aspect Modelling Results after several cleaning processes

Fig. 7 shows the result of the aspect modelling and classification, where comments/reviews of a restaurant can be predicted for their aspects/labels with certain levels of confidence. The reason this paper chose Monkeylearn is the lack of semantic valuations for
Indonesian words available, and it would take a very long time to create one; Monkeylearn is therefore used to speed up the process of creating data-semantics dictionaries. Although there are some constraints in using Monkeylearn, such as limited queries and possible inaccuracies due to the small model type, the results are still positive.

The next step is the sentiment prediction and classification stage, where the data gained from the previous aspect classification are used as the source. However, as this paper only uses the trial version of the Monkeylearn platform, certain features are limited: the model can only understand words in a single column. Since the dataset from the aspect prediction has two columns, consisting of the text and the aspect, the solution to this problem is to combine both columns with a semicolon delimiter. This tricks the model into treating the text as a single column, while each entry still contains unique words that create a distinction between identical sentences with different aspects.

Below is the step-by-step process from sentiment prediction to sentiment classification using the Monkeylearn API platform:

1. Prepare the dataset required for the sentiment prediction process.
2. Create a sentiment analysis classifier model.
3. Upload data to the model to be trained.
4. Do data annotation/tagging.
5. Build the model until >70% overall accuracy.
6. Deploy and utilise the model that has been built.
7. Predict the sentiment of the testing dataset.
8. Data cleaning process.
9. Data classification process.
10. Save the labelled datasets.

Fig. 8 Sentiment Model Specification

Fig. 8 shows the sentiment model statistics acquired through the data-annotation process on the testing dataset. The next step is to predict the rest of the datasets by using this model as the reference for the sentiment prediction and classification.

Fig. 9 Sentiment Prediction Results after several cleaning processes.

Fig. 9 shows the results of the whole aspect-based sentiment analysis; however, it still needs to be analyzed for its polarity calculation.

For the visualization, this paper utilizes the Streamlit library to create a dynamic dashboard that visualizes the results of the analysis. As Streamlit can only serve data on localhost, the page needs to be deployed to a world-wide server so that it can be used everywhere by the users; this paper therefore uses the Ngrok library to conduct the deployment. Note, however, that Ngrok itself is not a domain and does not provide a domain server behind the web page. Below is the result of the visualization and deployment stage:

Fig. 10 Dynamic Dashboard Hover Function

Fig. 10 is the visualization of the whole aspect-based sentiment analysis results. Restaurant Name consists of every restaurant available in the Excel database, while Value is the rating result, where 1 is the highest rating and 0.1 the lowest. As a caveat, if no ratings are available, the value is 0.

VI. TESTING AND EVALUATION

Table. 2 Test Data Texts

Test Data ID | Text (English gloss)
TD001 | "Makanan disini enak" (The food here is tasty)
TD002 | "Servis disini bagus" (The service here is good)
TD003 | "Makanan disini tidak enak" (The food here is not tasty)
TD004 | "Suasananya bagus dan makanan disini enak" (The ambience is good and the food here is tasty)
TD005 | "Servis pelayanannya bagus dan protocol kesehatanny sangat dijaga" (The service is good and the health protocols are well maintained)
TD006 | "Suasananya bagus tapi makanan disini tidak enak" (The ambience is good but the food here is not tasty)

Table. 3 Types of Modules for Testing Scenarios

Module 1 | Short texts, 1 aspect/sentiment (Positive + Negative)
Module 2 | Long texts, 2 aspects
Module 3 | Long texts, 2 aspects + sentiment (Positive + Negative)
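The expected-vs-actual comparison applied in the testing stage (Table 4), where multi-aspect labels are joined with the semicolon delimiter described earlier, can be sketched as a small helper. The function names are hypothetical, and treating the label lists as unordered sets is an assumption of this sketch, not something stated in the paper.

```python
# Illustrative pass/fail check mirroring the testing stage; multi-aspect
# labels are joined with the ";" delimiter described earlier.

def parse_labels(cell):
    # "Ambience;Food" -> {"Ambience", "Food"}
    return {part.strip() for part in cell.split(";") if part.strip()}

def check(expected, actual):
    return "Pass" if parse_labels(expected) == parse_labels(actual) else "Fail"

# Two cases taken from Table 4 of the paper.
cases = [
    ("TD004", "Ambience;Food", "Ambience;Food"),
    ("TD006", "Ambience (Positive);Food (Negative)",
              "Ambience (Positive);Food (Positive)"),
]
for case_id, expected, actual in cases:
    print(case_id, check(expected, actual))
# → TD004 Pass
# → TD006 Fail
```

The two sample rows reproduce the Pass/Fail outcomes reported in Table 4: TD004's predicted aspects match exactly, while TD006's sentiment on the Food aspect differs from the expectation.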
Table. 4 Testing Results

Test Data ID | Module | Expected Result | Actual Result | Status (Pass/Fail)
TD001 | 1 | Food | Food | Pass
TD002 | 1 | Service | Service | Pass
TD001 | 1 | Positive | Positive | Pass
TD003 | 1 | Negative | Negative | Pass
TD004 | 2 | Ambience;Food | Ambience;Food | Pass
TD005 | 2 | Covid Safety;Service | Covid Safety;Service | Pass
TD006 | 3 | Ambience (Positive); Food (Negative) | Ambience (Positive); Food (Positive) | Fail

These test cases were created to test the functionality of the analysis itself, and a total of 7 test cases were conducted. The main metric tested is the confidence level of the ABSA machine itself, divided into three different modules as shown in Table. 3 and following the test data in Table. 2.

VII. CONCLUSION AND RECOMMENDATION

A. Conclusion

This paper has successfully created the dynamic dashboard feature that shows the four main aspects of each restaurant by applying an aspect-based sentiment analysis approach according to the scope of the paper. The dashboard has also been successfully deployed to a web server, which allows users all over the globe to access the platform. The author has also purposely created two search engines on a single screen for easier comparison, as stated in the aims and benefits. Although there were some obstacles and hurdles, including inaccuracy of the analysis due to the lack of negative semantic texts in the training data, the final results reflect the main objectives of the paper.

B. Recommendation

The author recommends that future researchers utilize the full version of Monkeylearn if they wish to use the platform because, as seen in this paper, the trial version only provides limited features, and some tweaking had to be done during the sentiment annotation process, such as delimiting each sentence from its aspects with a semicolon to mirror the full-version features of the platform. Additionally, the author suggests adding automatic web-scraping features to the machine, so that the results stay parallel to the current reviews on Tripadvisor every day. Moreover, larger negative-sentiment datasets are recommended for future projects on this topic, especially datasets of negative semantic values, to improve the tool's identification of negative reviews.

ACKNOWLEDGMENT

This work is supported by the Research and Technology Transfer Office, Bina Nusantara University, as a part of Bina Nusantara University's International Research Grant entitled "Computational Intelligence and Advanced Predictive Data Analytics of Covid-19 Transmission Dynamics and Automated Detection of Health Protocol Compliance" with contract number No.017/VR.RTT/III/2021 and contract date 22 March 2021.

REFERENCES

[1] D. Kaemingk, "Online reviews statistics to know in 2021," Qualtrics, 30 October 2020. [Online]. Available: https://www.qualtrics.com/blog/online-review-stats/.
[2] A. Weice, "Online Reviews Study: Restaurants & Reviews," Gatherup, 05 November 2018. [Online]. Available: https://gatherup.com/blog/online-reviews-study-restaurants-reviews/.
[3] S. D, "The influence of TripAdvisor application usage towards hotel occupancy rate in Solo," Journal of Physics: Conference Series, 2019.
[4] M. Bassig, "Tripadvisor Review Analysis: Stats Your Business Should Know," Review Trackers, 21 March 2019. [Online]. Available: https://www.reviewtrackers.com/blog/tripadvisor-review-analysis/.
[5] E. H. A. A. S. S. P. R. R. S. Utami, "Formal and Non-Formal Indonesian Word Usage Frequency in Twitter Profile Using Non-Formal Affix Rule," 1st International Conference on Cybernetics and Intelligent System (ICORIS), vol. 1, pp. 173-176, 2019.
[6] B. Liu, "Sentiment Analysis and Opinion Mining," Synthesis Lectures on Human Language Technologies 5, pp. 1-167, 2012.
[7] C. T. C, "Aspect based Opinion Mining from Restaurant Reviews," International Journal of Computer Applications, 2014.
[8] B. Zhao, "Web Scraping," Springer International Publishing AG, 2017.
[9] M. HaCohen-Kerner, "The influence of preprocessing on text classification using a bag-of-words representation," pp. 1-6, 2020.
[10] L. Garcia, "Data Preprocessing," Intelligent Systems Reference Library, vol. 72, 2015.
[11] P. K, "Comparison Of Classification Methods On Sentiment Analysis Of Political Figure Electability Based On Public Comments On Online News Media Sites," IOP Conference Series: Materials Science and Engineering, pp. 4-8, 2019.
[12] S. Jain, "Introduction to Stemming," Geeksforgeeks, 17 September 2020. [Online]. Available: https://www.geeksforgeeks.org/introduction-to-stemming/. [Accessed 10 March 2021].
[13] "Sentiment Analysis: A Definitive Guide," Monkeylearn, [Online]. Available: https://monkeylearn.com/sentiment-analysis/. [Accessed 10 July 2021].
[14] "A practical explanation of Naive Bayes Classifier," Monkeylearn, [Online]. Available: https://monkeylearn.com/blog/practical-explanation-naive-bayes-classifier/. [Accessed 10 July 2021].
[15] D. Berrar, "Bayes' Theorem and Naive Bayes Classifier," Encyclopedia of Bioinformatics and Computational Biology, vol. 1, pp. 403-412, 2018.

A Review of Signature Recognition Using Machine Learning

Elizabeth Ann Soelistio, Rafael Edwin Hananto Kusumo, Zevira Varies Martan, and Edy Irwansyah
School of Computer Science, Bina Nusantara University, Jakarta, Indonesia 11480
[email protected], [email protected], [email protected], [email protected]

Abstract—Signatures have been used for years for transactions and for consenting to responsibilities. Yet, online or offline, signatures can easily be falsified, as there are no security measures in place to prevent this. Numerous studies have been carried out to find the most accurate and reliable signature recognition and verification system. This study examines the two problems previously mentioned. A primary goal of this study is to determine the best algorithms for recognizing signatures based on the signature type. This systematic literature review is conducted using a PRISMA flow diagram. The results indicate that offline signatures mostly use Convolutional Neural Networks (CNN) for their recognition, while online signatures use Recurrent Neural Networks (RNN) with other architectures.

Keywords—Signature Recognition, Offline Signature, Online Signature, Machine Learning, Handwritten Signature

978-1-6654-4002-8/21/$31.00 ©2021 IEEE

I. INTRODUCTION

A signature is one of the most common instruments used in different transactions, from government to marketplace. As one of the most widely accepted forms of public transaction, an individual's signature is easy to forge and misuse. A preventative measure is to identify the signature and verify it with a system that answers whether it is a forged or a genuine signature.

Researchers have been working on such preventative measures for a long time. Several approaches, such as machine learning and deep learning, are being utilized to solve this problem. In order to recognize and verify offline signatures, they must be scanned and adjusted before being trained, while an online signature only needs to be adjusted in some parts. In both cases, there are many different schemes that use different approaches or combine two or more of them.

Nevertheless, consistency has its challenges, since everyone's signature is affected by external factors such as mood, environment, and many more. Concerns with the algorithms include the output accuracy, the time required for the process, and the performance in real-life testing. The objective of this study is to determine the best model for both online and offline signature identification and verification.

This paper is structured as follows: it begins with a topic introduction; it then reviews literature to analyze other approaches addressing the same problem, followed by a methodology section that explains the manner in which literature was reviewed, with a scope corresponding to the requirements. Additionally, the results from the literature review and the research questions are reported in the results section. The paper concludes by summarizing the findings of all discussions and providing references to the literature reviewed during its preparation.

II. THEORETICAL BACKGROUND

A. Offline Signature Recognition Algorithm

Offline signatures face difficulty in recognition and verification because of their non-digitized format. However, there is much research related to offline signatures using machine learning and deep learning models. All of these models are explained below, together with their development and results.

• Convolutional Neural Network (CNN)

Convolutional Neural Network (CNN) is one of the most commonly used methods in computer vision for image classification [1]. One study achieved an accuracy of 85-89% in forgery detection and 90-94% in signature recognition [2], while another achieved an accuracy of 98.8% [3]. The GoogLeNet architectures Inception-v1 and Inception-v3, which use the CNN model, have also been tested and achieved validation accuracies of 83% and 75%, respectively [4]. By adding normalization, another study achieved accuracies of 96.41% and 98.30% [5]. A further study using a CNN model with Support Vector Machine (SVM) and Radial Basis Function Network (RBFN) approaches achieved an accuracy of 96.6% [6]. When an extra feature-extraction method is added, CNN models can achieve 62.5% success in a writer-independent model and 75% success in a writer-dependent model [7]. DCNN, an improved Convolutional Neural Network model, achieved a 100% image recognition rate [8] and was claimed to be the most powerful CNN model for image recognition.

• Probabilistic Neural Network (PNN)

Probabilistic Neural Network (PNN) is often used for classification problems and follows the rules of Bayes' theorem in decision making [9]. PNN can be
used with Wavelet Transform Average Framing Entropy (AFE), as done in one study; this resulted in an accurate recognition of 92% and verification of 83% [9].

• K-Nearest Neighbor (k-NN)

K-Nearest Neighbor is a classification algorithm that assigns a class to the tested image based on its feature values [10]. One research paper adds a local-binary-pattern feature that allows the extraction of texture-based features and is closely related to face recognition [10]. The highest accuracy achieved using this model is 84.29%.

• Efficient Fuzzy Kohonen Clustering Network (EFKCN)

Efficient Fuzzy Kohonen Clustering Network (EFKCN) is mostly used for classification by means of data clustering [11]. The result of this research shows 70% accuracy in recognition, compared to a previous result of 53% [11].

• Artificial Neural Network (ANN)

ANN is widely used in pattern recognition, as it is powerful and easy to use. Features from the signature can be selected based on the pattern recognized; the network then learns the correlation between the classes and the signatures [12]. Research using this algorithm achieved a success rate of 95% on average in a recognition and verification system [12]. This was done by using MATLAB to design the system.

• Support Vector Machine (SVM)

This research uses a multiclass SVM classifier on a radial basis function (RBF) for training and testing. SIFT with SVM-RBF achieved an accuracy of 98.75%, while SURF with SVM-RBF achieved 96.25% [13]. [14] uses SVM with AlexNet for feature extraction and achieved 80%-100% accuracy in signature recognition for different datasets.

B. Online Signature Recognition Algorithm

Online signatures are widely used in authentication technology through a touchscreen input interface. The growing popularity of online signatures shows that people are still familiar with the offline way of signing, while putting forgers who attempt to imitate the signature at a disadvantage. The algorithms used for this kind of signature recognition and verification vary from the Long-term Recurrent Convolutional Network (LRCN) to the Weighting Relief algorithm. These models are explained below with their development and results.

• Recurrent Neural Network (RNN)

RNN is mostly used in sequential modeling and can handle a large dataset. One study improved this model and shows a 2.37% equal error rate (EER) [15], while another reports below 2.0% EER [16].

• Long-term Recurrent Convolutional Network (LRCN)

LRCN is commonly used for visual recognition and description [17]. Using LRCN as a recognition and verification model, it achieves an equal error rate (EER) of 2.03% in a random forgery test [17]. A lower EER means the model's accuracy is better.

• Relief Algorithm

The Relief algorithm is a filter method for feature selection that is sensitive to interactions [18]. A model in one study achieves an average error rate of 5.31% in false positives, meaning the model approves a signature as genuine instead of forged [18].

III. METHODOLOGY

This systematic literature review uses the PRISMA Flow Diagram. The authors use it because it improves the reporting of systematic reviews relevant to the topic of the literature review. The making of this literature review can be seen below:

Figure 1. Prisma Flowchart

All papers in this review must be written in English and must cover the following topics: Handwritten Signature Recognition, Offline Signature Recognition, Online Signature Recognition, or Machine Learning. A second requirement is that papers have to be published within the years 2017 up to 2021. Papers are included or excluded based on their relevance to the topic being reviewed, and papers are excluded for non-English language, duplicates, and unsubstantiated studies. Included and excluded papers were selected in accordance with these criteria for use in the review paper.

Records are selected based on 4 steps, which are:

1. A database search was conducted using Google Scholar, IEEE, Springer Link, and other libraries. The keywords used to search were (1) Offline Signature Recognition, (2) Online Signature Recognition, (3) Handwritten Signature Recognition, and (4) Handwritten Signature Biometrics.
2. The screening process eliminates duplicates and journal articles that do not correspond to the topic at hand.
3. The introduction and abstract are read to determine eligibility; this eliminates records that are irrelevant to the research questions.
4. Included journals are chosen if they provide key or supporting information that helps in the preparation of the review paper.

A. Data extraction and synthesis

The data are extracted by reading the full text of each journal and then analyzed according to its relevance for the purpose of writing this paper. The extracted items are: ID, Reference, Context, Methodologies, and Topic. The descriptions of the extracted items are as follows.

Table 1. Data Extraction of Each Study

Extracted Data | Description
ID | Unique identity of each record.
Reference | Authors, year of publication, title of the record, and publication location of the reviewed journals.
Context | The purpose and context of the paper written.
Methodologies | The methods, models, algorithms, and dataset used in the writing process.
Topic | The topic related to the writing of the records.

IV. RESULT

The results presented are based on the data extraction performed previously and the items listed in the previous section. Data extraction is done to obtain the results of a critical literature review based on the research questions determined below.

A. What challenges does signature recognition face?

Signature recognition faces many challenges in real life because of the inconsistency of signatures from time to time [19]. This inconsistency depends on the environment and human behavior at the time, making it a challenge to create a model that differentiates a genuine signature from a skilled forgery. This problem can lead either to a strict model that only recognizes the exact same signature as genuine, or to a model that is too lenient and labels different signatures as genuine.

Furthermore, information loss is another challenge that offline signature recognition must face, because of the digitalization process of the signature [1]. The signature must be scanned before it can become a digital image. This creates a loss of dynamic information, such as the position and velocity of the pen and the pressure and stroke of the signature, which can determine the genuineness of a signature.

B. What are the most used algorithms for signature recognition?

The tables below contain information extracted from 34 selected papers and journals and 3 review papers. This information is divided by the type of signature and extracted based on the first research question.

Table 2. Publication of algorithms used for offline signatures.

Signature Type | Algorithm | Number of Papers | Paper ID
Offline | CNN | 10 | [1], [2], [4], [5], [7], [8], [20], [21], [22], [36]
Offline | CNN + SVM | 5 | [3], [6], [19], [23], [24]
Offline | PNN | 1 | [9]
Offline | KNN | 1 | [10]
Offline | EFKCN | 1 | [11]
Offline | ANN | 2 | [12], [36]
Offline | SVM | 2 | [13], [14]

Based on the information extracted, Convolutional Neural Network (CNN) is the most used algorithm for offline signature recognition and verification. A Convolutional Neural Network is effective in recognition systems because it extracts relevant data for classification [2]. Many studies use a variation of the multilayer perceptron (MLP) and introduce feature extractions such as LS2Net [5]. An improvement of this model is the Deep Convolutional Neural Network (DCNN), using AlexNet and VGG16 for feature extraction, which delivers a perfect result of 100% accuracy [8]. The dataset used in [8] is a personal dataset of 600 signatures. This shows that the Convolutional Neural Network (CNN) presents good results in recognizing signatures and delivering accuracy.

Table 3. Publication of algorithms used for online signatures.

Signature Type | Algorithm | Number of Papers | Paper ID
Online | RNN | 6 | [15], [16], [25], [26], [27], [28]
Online | LRCN | 1 | [17]
Online | Relief Algorithm | 1 | [18]
Online | CNN | 2 | [29], [30]
Online | K-ANN | 1 | [31]
Online | ANN | 1 | [32]
Online | DTW Cost Matrix | 2 | [33], [34]

Meanwhile, online signature recognition mostly uses the Recurrent Neural Network (RNN). The RNN architecture falls under machine learning, whereas CNN is a deep learning architecture. The difference between CNN and RNN lies in the learning techniques each uses to learn patterns: a CNN model usually uses a supervised model, while RNN has the ability to learn in both supervised and semi-supervised ways [35]. Thus, if the method and combination are used properly, RNN can produce an acceptable result.
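Several of the results above are quoted as an equal error rate (EER): the operating point where the false acceptance rate (FAR) for forgeries equals the false rejection rate (FRR) for genuine signatures, with lower values being better, as noted for LRCN. A minimal sketch of how an EER can be located follows; the scores are synthetic, illustrative values, not taken from any cited study.

```python
# Synthetic sketch of the equal error rate (EER) metric: the threshold where
# the false acceptance rate of forgeries crosses the false rejection rate of
# genuine signatures. Scores below are illustrative only.

genuine_scores = [0.80, 0.85, 0.90, 0.95]   # higher score = more likely genuine
forgery_scores = [0.10, 0.30, 0.55, 0.82]

def far_frr(threshold):
    # FAR: fraction of forgeries accepted; FRR: fraction of genuines rejected.
    far = sum(s >= threshold for s in forgery_scores) / len(forgery_scores)
    frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
    return far, frr

def approx_eer(thresholds):
    # Pick the threshold where |FAR - FRR| is smallest and average the pair.
    best = min(thresholds, key=lambda t: abs(far_frr(t)[0] - far_frr(t)[1]))
    far, frr = far_frr(best)
    return (far + frr) / 2

thresholds = [i / 100 for i in range(101)]
print(f"approximate EER: {approx_eer(thresholds):.2f}")
# → approximate EER: 0.25
```

On these toy scores the FAR and FRR curves cross at 25%; real systems such as the RNN models cited above report EERs around 2%, meaning the genuine and forgery score distributions barely overlap.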

C. What is the most used feature extraction in
Convolutional Neural Network (CNN) for offline
signature?
Feature extraction is the backbone of the algorithms used for
recognizing signatures. The process increases the efficiency of
processing without losing any relevant information while
simultaneously reducing the amount of redundancy [36].
Nevertheless, not all algorithms use feature extraction [37].
Table 4 summarizes feature extraction from offline signatures
with the CNN algorithm.

Table 4. Feature extraction methods for the convolutional neural
network (CNN) algorithm in offline signature recognition.

Algorithm | Paper ID | Feature Extraction | Dataset | Result
CNN | [1] | SigNet-F (SVM) | GPDS-160 | EER = 1.72%
CNN | [1] | SigNet-F (SVM) | CEDAR | EER = 4.63%
CNN | [1] | SigNet-F (SVM) | MYCT | EER = 2.87%
CNN | [1] | SigNet | Brazilian PUC-PR | EER = 2.01%
CNN | [2] | Crest-Trough Algorithm | 1320 pictures | ACC = 94%
CNN | [4] | GoogLeNet Inception-v1 | GPDS | ACC = 83%
CNN | [4] | GoogLeNet Inception-v3 | GPDS | ACC = 75%
CNN | [5] | LS2Net | MYCT | ACC = 96.4%
CNN | [5] | LS2Net | CEDAR | ACC = 98.30%
CNN | [5] | LS2Net-v2 | GPDS-4000 | ACC = 96.91%
CNN | [8] | AlexNet | 600 signatures | ACC = 100%
CNN | [8] | VGG16 | 600 signatures | ACC = 100%
CNN | [20] | InceptionSVGNet (GoogLeNet Inception-v1) | CEDAR | ACC = 100%
CNN | [20] | InceptionSVGNet (GoogLeNet Inception-v1) | BHSig260 (in Bengali) | ACC = 97.77%
CNN | [20] | InceptionSVGNet (GoogLeNet Inception-v1) | BHSig260 (in Hindi) | ACC = 95.40%
CNN | [20] | InceptionSVGNet (GoogLeNet Inception-v1) | Persian UTSig | ACC = 80.44%
CNN | [21] | ConvNet (CNNs) | 300 signatures | ACC = 99.7%

Equal error rate (EER) represents a threshold value in
comparison to the false acceptance rate (FAR) and false
rejection rate (FRR) [35]. The threshold is commonly used to
compare and evaluate biometric authentication systems [35].
As the EER value decreases, the model becomes more accurate.
Based on accuracy (ACC), the system can identify whether a
signature is fake or authentic. For ACC, a higher number is
preferred, as it displays the level of accuracy of the model.

CNNs use feature extraction and classification as part of
their architectures. Features can be extracted from a dataset in
order to reduce the number of features, while classification is
the likelihood that an input will lead to an outcome.
Furthermore, the difference in feature extraction can lead to
accuracy and overfitting improvements depending on the
complexity of the datasets tested. The highest accuracy
achieved using CNN is 100%, with InceptionSVGNet on the
CEDAR dataset [20] and with the AlexNet and VGG16 feature
extractions of [8]. The lowest accuracy of 75% is obtained
using GoogLeNet Inception-v3 with the GPDS dataset [4].

V. CONCLUSION
This systematic literature review observes and identifies the
implemented methods and models that are used for signature
recognition and verification. The studies used in the writing of
this paper were selected based on the PRISMA Flow Diagram,
from an initial set of 51 studies collected from online databases
down to 37 relevant studies selected for further analysis. The
aim of this systematic literature review is to determine which
algorithms are commonly used for offline and online signature
recognition. This study focuses on two recognition algorithms,
one for each type of signature, which are the Convolutional
Neural Network (CNN) for offline and the Recurrent Neural
Network (RNN) for online signatures.

Based on the offline signature recognition, CNN is
commonly used across different datasets and feature
extractions. CNN has been designed to run tasks for visual
recognition in computer vision, and is thus able to perform well
on offline signatures, since it is fed with images that may or
may not have been preprocessed.

Meanwhile, RNN is mostly used for online signature
recognition. Since RNN is a machine learning method, it will
need time to learn and classify the dataset. However, it still
gives an acceptable result depending on the combination used.

From the analysis of the eligible studies, there are important
factors in implementing offline and online signature
recognition in real environments. The algorithm must be
equipped to handle a large-scale and noisy dataset, and it must
be tested in real scenarios before being deployed into the real
environment, to avoid miscalculations in production. In
addition, for the offline signature type, a standardized image
quality is needed to achieve the same effectiveness for the
feature extraction method.

REFERENCES
[1] L. G. Hafemann, R. Sabourin, and L. S. Oliveira, "Learning features for offline handwritten signature verification using deep convolutional neural networks," in Pattern Recognit., vol. 70, pp. 163–176, 2017, doi: 10.1016/j.patcog.2017.05.012.
[2] J. Poddar, V. Parikh, and S. K. Bharti, "Offline Signature Recognition and Forgery Detection using Deep Learning," in Procedia Comput. Sci., vol. 170, pp. 610–617, 2020, doi: 10.1016/j.procs.2020.03.133.
[3] V. L. F. Souza, A. L. I. Oliveira, and R. Sabourin, "A writer-independent approach for offline signature verification using deep convolutional neural networks features," in Proc. 2018 Brazilian Conf. Intell. Syst. BRACIS 2018, pp. 212–217, 2018, doi: 10.1109/BRACIS.2018.00044.
[4] Jahandad, S. M. Sam, K. Kamardin, N. N. Amir Sjarif, and N. Mohamed, "Offline signature verification using deep learning convolutional neural network (CNN) architectures GoogLeNet inception-v1 and inception-v3," in Procedia Comput. Sci., vol. 161, pp. 475–483, 2019, doi: 10.1016/j.procs.2019.11.147.
[5] N. Çalik, O. C. Kurban, A. R. Yilmaz, T. Yildirim, and L. Durak Ata, "Large-scale offline signature recognition via deep neural networks and feature embedding," in Neurocomputing, vol. 359, pp. 1–14, 2019, doi: 10.1016/j.neucom.2019.03.027.
[6] M. Hanmandlu, A. B. Sronothara, and S. Vasikarla, "Deep Learning based Offline Signature Verification," in 2018 9th IEEE Annu. Ubiquitous Comput. Electron. Mob. Commun. Conf. UEMCON 2018, pp. 732–737, 2018, doi: 10.1109/UEMCON.2018.8796678.


[7] M. M. Yapici, A. Tekerek, and N. Topaloglu, "Convolutional Neural Network Based Offline Signature Verification Application," in Int. Congr. Big Data, Deep Learn. Fight. Cyber Terror. IBIGDELFT 2018 - Proc., pp. 30–34, 2019, doi: 10.1109/IBIGDELFT.2018.8625290.
[8] A. Hirunyawanakul, S. Bunrit, N. Kerdprasop, and K. Kerdprasop, "Deep Learning Technique for Improving the Recognition of Handwritten Signature," in Int. J. Inf. Electron. Eng., vol. 9, no. 4, 2019, doi: 10.18178/ijiee.2019.9.4.709.
[9] K. Daqrouq, H. Sweidan, A. Balamesh, and M. N. Ajour, "Off-line handwritten signature recognition by wavelet entropy and neural network," in Entropy, vol. 19, no. 6, 2017, doi: 10.3390/e19060252.
[10] T. Jadhav, "Handwritten Signature Verification using Local Binary Pattern Features and KNN," in Int. Res. J. Eng. Technol., vol. 6, no. 4, pp. 579–586, 2019. [Online]. Available: www.irjet.net.
[11] D. Suryani, E. Irwansyah, and R. Chindra, "Offline Signature Recognition and Verification System using Efficient Fuzzy Kohonen Clustering Network (EFKCN) Algorithm," in Procedia Comput. Sci., vol. 116, pp. 621–628, 2017, doi: 10.1016/j.procs.2017.10.025.
[12] A. U. Rehman, S. ur Rehman, Z. H. Babar, M. K. Qadeer, and F. A. Seelro, "Offline Signature Recognition and Verification System Using Artificial Neural Network," in Univ. Sindh J. Inf. Commun. Technol., vol. 2, no. 1, pp. 73–80, 2018.
[13] A. T. Nasser and N. Dogru, "Signature recognition by using SIFT and SURF with SVM basic on RBF for voting online," in Proc. 2017 Int. Conf. Eng. Technol. ICET 2017, pp. 1–5, 2018, doi: 10.1109/ICEngTechnol.2017.8308208.
[14] K. Kamlesh and R. Sanjeev, "Offline Signature Recognition Using Deep Features," in Lect. Notes Networks Syst., vol. 141, pp. 405–421, 2021, doi: 10.1007/978-981-15-7106-0_18.
[15] S. Lai, L. Jin, and W. Yang, "Online Signature Verification Using Recurrent Neural Network and Length-Normalized Path Signature Descriptor," in Proc. Int. Conf. Doc. Anal. Recognition, ICDAR, vol. 1, pp. 400–405, 2017, doi: 10.1109/ICDAR.2017.73.
[16] R. Tolosana, R. Vera-Rodriguez, J. Fierrez, and J. Ortega-Garcia, "DeepSign: Deep On-Line Signature Verification," in IEEE Trans. Biometrics, Behav. Identity Sci., vol. 3, no. 2, pp. 229–239, 2021, doi: 10.1109/tbiom.2021.3054533.
[17] C. Y. Park, H. G. Kim, and H. J. Choi, "Robust Online Signature Verification Using Long-term Recurrent Convolutional Network," in 2019 IEEE Int. Conf. Consum. Electron. ICCE 2019, pp. 1–6, 2019, doi: 10.1109/ICCE.2019.8662005.
[18] L. Yang, Y. Cheng, X. Wang, and Q. Liu, "Online handwritten signature verification using feature weighting algorithm relief," in Soft Comput., vol. 22, no. 23, pp. 7811–7823, Dec. 2018, doi: 10.1007/s00500-018-3477-2.
[19] A. B. Jagtap, R. S. Hegadi, and K. C. Santosh, "Feature learning for offline handwritten signature verification using convolutional neural network," in Int. J. Technol. Hum. Interact., vol. 15, no. 4, pp. 54–62, 2019, doi: 10.4018/IJTHI.2019100105.
[20] R. K. Mohapatra, K. Shaswat, and S. Kedia, "Offline Handwritten Signature Verification using CNN inspired by Inception V1 Architecture," in Proc. IEEE Int. Conf. Image Inf. Process., vol. 2019-November, pp. 263–267, 2019, doi: 10.1109/ICIIP47207.2019.8985925.
[21] E. Alajrami, B. A. M. Ashqar, B. S. Abu-Nasser, A. J. Khalil, M. M. Musleh, A. M. Barhoom, and S. S. Abu-Naser, "Handwritten Signature Verification using Deep Learning," in Int. J. Acad. Multidiscip. Res., vol. 3, no. 12, pp. 39–44, 2019. [Online]. Available: https://philarchive.org/archive/ALAHSV.
[22] S. Masoudnia, O. Mersa, B. N. Araabi, A. H. Vahabie, M. A. Sadeghi, and M. N. Ahmadabadi, "Multi-representational learning for Offline Signature Verification using Multi-Loss Snapshot Ensemble of CNNs," in Expert Syst. Appl., vol. 133, pp. 317–330, 2019, doi: 10.1016/j.eswa.2019.03.040.
[23] T. Younesian, S. Masoudnia, R. Hosseini, and B. N. Araabi, "Active Transfer Learning for Persian Offline Signature Verification," in 4th Int. Conf. Pattern Recognit. Image Anal. IPRIA 2019, pp. 234–239, 2019, doi: 10.1109/PRIA.2019.8786013.
[24] S. V. Bonde, P. Narwade, and R. Sawant, "Offline Signature Verification Using Convolutional Neural Network," in 2020 6th Int. Conf. Signal Process. Commun. ICSC 2020, pp. 119–127, 2020, doi: 10.1109/ICSC48311.2020.9182727.
[25] R. Tolosana, R. Vera-Rodriguez, J. Fierrez, and J. Ortega-Garcia, "Exploring Recurrent Neural Networks for On-Line Handwritten Signature Biometrics," in IEEE Access, vol. 6, pp. 5128–5138, 2018, doi: 10.1109/ACCESS.2018.2793966.
[26] R. Vera-Rodriguez, R. Tolosana, M. Caruana, G. Manzano, C. Gonzalez-Garcia, J. Fierrez, and J. Ortega-Garcia, "DeepSignCX: Signature Complexity Detection using Recurrent Neural Networks," in Proc. International Conference on Document Analysis and Recognition, 2019.
[27] C. Li, X. Zhang, F. Lin, Z. Wang, J. Liu, R. Zhang, and H. Wang, "A stroke-based RNN for writer-independent online signature verification," in Proc. Int. Conf. Doc. Anal. Recognition, ICDAR, pp. 526–532, 2019, doi: 10.1109/ICDAR.2019.00090.
[28] C. Nathwani, "Online Signature Verification Using Bidirectional Recurrent Neural Network," in Proc. Int. Conf. Intell. Comput. Control Syst. ICICCS 2020, pp. 1076–1078, 2020, doi: 10.1109/ICICCS48265.2020.9121023.
[29] C. S. Vorugunti, R. K. S. Gorthi, and V. Pulabaigari, "Online signature verification by few-shot separable convolution based deep learning," in Proc. Int. Conf. Doc. Anal. Recognition, ICDAR, pp. 1125–1130, 2019, doi: 10.1109/ICDAR.2019.00182.
[30] C. S. Vorugunti, G. S. Devanur, P. Mukherjee, and V. Pulabaigari, "OSVNet: Convolutional siamese network for writer independent online signature verification," in Proc. Int. Conf. Doc. Anal. Recognition, ICDAR, pp. 1470–1475, 2019, doi: 10.1109/ICDAR.2019.00236.
[31] R. Ravi Chakravarthi and E. Chandra, "Kernel based artificial neural network technique to enhance the performance and accuracy of on-line signature recognition," in J. Internet Technol., vol. 21, no. 2, pp. 447–455, 2020, doi: 10.3966/160792642020032102013.
[32] D. I. Dikii and V. D. Artemeva, "Online handwritten signature verification system based on neural network classification," in Proc. 2019 IEEE Conf. Russ. Young Res. Electr. Electron. Eng. ElConRus 2019, pp. 225–229, 2019, doi: 10.1109/EIConRus.2019.8657134.
[33] A. Sharma and S. Sundaram, "On the Exploration of Information from the DTW Cost Matrix for Online Signature Verification," in IEEE Trans. Cybern., vol. 48, no. 2, pp. 611–624, 2018, doi: 10.1109/TCYB.2017.2647826.
[34] Y. Jia, L. Huang, and H. Chen, "A two-stage method for online signature verification using shape contexts and function features," in Sensors (Switzerland), vol. 19, no. 8, 2019, doi: 10.3390/s19081808.
[35] N. H. Al-Banhawy, H. Mohsen, and N. Ghali, "Signature identification and verification systems: a comparative study on the online and offline techniques," in Futur. Comput. Informatics J., vol. 5, no. 1, p. 3, 2020.
[36] S. Utkarsh and B. Vikrant, "Comparison between CNN and ANN in offline signature verification," in Proc. Second Int. Conf. Comput. Commun. Control Technol., vol. 4, pp. 136–140, 2018.
[37] L. G. Hafemann, R. Sabourin, and L. S. Oliveira, "Offline handwritten signature verification - Literature review," in Proc. 7th Int. Conf. Image Process. Theory, Tools Appl. IPTA 2017, pp. 1–8, 2018, doi: 10.1109/IPTA.2017.8310112.


Student Performance Based on Student Final Exam Prediction

1st Ignasius Kenny Bagus, Information Systems Department, BINUS Online Learning, Bina Nusantara University, Jakarta, Indonesia, [email protected]
2nd Luwita, Information Systems Department, BINUS Online Learning, Bina Nusantara University, Jakarta, Indonesia, [email protected]
3rd Nasrullah, Information Systems Department, BINUS Online Learning, Bina Nusantara University, Jakarta, Indonesia, [email protected]
4th Dina Fityria Murad, Information Systems Department, BINUS Online Learning, Bina Nusantara University, Jakarta, Indonesia, http://orcid.org/0000-0001-8724-9105

Abstract— The Covid-19 pandemic situation has made
changes to the education system. Educational institutions
carried out the shift from the face-to-face learning model to the
distance learning model to adapt to the pandemic situation and
maintain the sustainability of educational activities. Despite
changes in learning models, education providers certainly want
to maintain academic quality by producing graduates with
superior academics, practical knowledge, and innovative
thinking. The problem currently faced is how education
providers can monitor students' performance so that they
complete their studies properly. Therefore, a grade prediction is
needed that helps students, lecturers, and administrators of
educational institutions maintain and improve academic
quality. This study compares the two techniques and shows that
the Naïve Bayes method provides a higher level of accuracy than
the KNN method, which is 96%.

Keywords—exam score prediction, classification, naïve Bayes,
KNN.

I. INTRODUCTION
Education in Indonesia has undergone significant changes
since the Covid-19 pandemic. This is because the conventional
face-to-face learning model, which is the learning model most
widely used by educational institutions, has begun to shift to
the online learning model, also known as distance learning. But
of course, this change requires tremendous effort and support,
including facilities and infrastructure, human resources, and
most importantly, a change in the mindset of students and
teachers [1]. For educational institutions, even though there has
been a change in the learning model, they still want to maintain
academic quality by producing graduates with superior quality
in academics, practical knowledge, and innovative thinking.

Academic achievement can be measured by carrying out
various tests, assessments, and other forms of measurement [2].
However, academic performance can differ from student to
student because each student has a different ability level [3].
Therefore, early information related to student achievement
levels [4] is needed, with a prediction system [5]. The
availability of fast, precise, and highly accurate information
will assist education providers in helping their students. This
assistance can be given in the form of directions to study harder
and recommendations for learning materials or practice
questions [6].

In this study, the data mining classification process was
carried out using the Naïve Bayes and KNN methods to provide
predictions for student final exam scores using past academic
data and student profile data. This study aimed to find the best
classification method between Naïve Bayes and KNN by
comparing the accuracy levels produced by the two methods.

II. RELATED WORK
Various models have been developed by several researchers
related to measuring student performance. A research journal
compiled by [7] states that machine learning methods can help
examine students' historical data and provide predictions for
future performance measurements [8]. It gives a similar
conclusion that data mining techniques can be used to find
interesting patterns that can be used as models in measuring
student performance.

The implementation of Naïve Bayes has been carried out in
[9] to predict students' final grades based on historical data in
Mathematics and Portuguese with an accuracy of 93.6%. This
is in contrast to the research done in [10], which produced an
accuracy of 72% for predicting student performance in
semester exams using Naïve Bayes.

III. METHOD
The method used in this study is to create classification
models with the Naïve Bayes [11] and K-Nearest Neighbor
[12] methods and compare the accuracy resulting from the two
classification models. An overview of the flow of this research
can be seen in Figure 1.

978-1-6654-4002-8/21/$31.00 ©2021 IEEE


Figure 1. Research Framework

A. Dataset
The data used for this research came from the Information
Systems study program repository for the Business Process
Fundamentals course. The sample data obtained contains 75
data records with the data attributes shown in Table 1.

TABLE 1. ATTRIBUTES OF DATASET

No | Attribute | Description | Value
1 | BinusianID | Binusian Identification Number | BNxxx
2 | StudentID | Student ID Number | 22xxxxx
3 | Student Name | Student Name | Xxxxxx
4 | Course Code | Course Code | ISYS6300
5 | Course Name | Course Name | Business Process xxxxxxx
6 | ATT | Attendance Score | 0-100
7 | FOD | Discussion Forum Score | 0-100
8 | PAS1 | Personal Assignment 1 Score | 0-100
9 | PAS2 | Personal Assignment 2 Score | 0-100
10 | QIZ1 | Quiz 1 Score | 0-100
11 | QIZ2 | Quiz 2 Score | 0-100
12 | TAS1 | Team Assignment 1 Score | 0-100
13 | TAS2 | Team Assignment 2 Score | 0-100
14 | TAS3 | Team Assignment 3 Score | 0-100
15 | TAS4 | Team Assignment 4 Score | 0-100
16 | FIN | Final Exam Score | 0-100
17 | Result | Total Score | 13-100
18 | Grade | The grade obtained by students, following the total score accepted | A is the best score, and E is the worst score
19 | Gender of Student | Gender of Student | F = Female, M = Male
20 | Place of Birth | Place of Birth | Jakarta, Bandung, etc.
21 | Date of Birth | Date of Birth | date/mon/year
22 | BULC Code | Academic Prog | BJKXI, BSMXI, etc.
23 | Status | Student's latest education status | SMA, D3
24 | Marital | Student marital status | Single, Married
25 | Number of dependents | Number of children had by students with married status | 0, 1, More than 1
26 | Job-status | Student's employee status | Not working yet, already working
27 | Type of company | The kind of company where students work | IT, Non-IT

B. Data Cleaning
The data cleaning stage needs to be done to improve data
quality by eliminating inconsistencies and noise [13].

C. Attribute Selection
The attributes retrieved from the sample data will be
selected and used in the data mining process according to the
research objectives. There are two groups of attributes defined:
1) Target attribute
The target attribute can also be called the dependent
variable. This attribute will be the reference for the
classification model in providing the output class. The
target attribute selected is the final exam score.
2) Feature attribute
The feature attribute can also be called an independent
variable. This attribute will be the reference for the
classification model in building the model. The feature
attributes selected consist of PAS1, PAS2, QIZ1, QIZ2,
Gender, Status, Date of Birth, Marital, Number of
Dependents, Job Status, Company Type, and Position.

D. Data Transformation
This stage makes changes to the form of data to facilitate
making classifications in the data mining process [13].
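The attribute selection and transformation steps described above can be sketched in Python. The column names follow Table 1, but the sample record, the subset of features used, and the numeric encodings below are illustrative assumptions, not the authors' actual preprocessing:

```python
# Sketch of attribute selection (Section C) and transformation (Section D).
# Field names follow Table 1; the sample record and encodings are assumptions.

FEATURES = ["PAS1", "PAS2", "QIZ1", "QIZ2", "Gender", "Marital", "Job-status"]
TARGET = "FIN"  # the final exam score is the target attribute

def select_attributes(record):
    """Split one raw record into (feature dict, target value)."""
    x = {name: record[name] for name in FEATURES}
    y = record[TARGET]
    return x, y

def transform(x):
    """Encode categorical features as numbers so a classifier can use them."""
    encoded = dict(x)
    encoded["Gender"] = {"F": 0, "M": 1}[x["Gender"]]
    encoded["Marital"] = {"Single": 0, "Married": 1}[x["Marital"]]
    encoded["Job-status"] = {"Not working yet": 0, "Already working": 1}[x["Job-status"]]
    return encoded

# One hypothetical student record with the Table 1 attribute names.
record = {"PAS1": 85, "PAS2": 90, "QIZ1": 70, "QIZ2": 75,
          "Gender": "F", "Marital": "Single",
          "Job-status": "Already working", "FIN": 88}
x, y = select_attributes(record)
print(transform(x), y)
```

The point of the split is that the classifier only ever sees the encoded feature dictionary; the target value is kept aside as the label.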


E. Naïve Bayes classification
Naïve Bayes is a statistical classification that can predict
the probability of class membership based on the Bayes
theorem developed by Thomas Bayes in the 18th century [13].
Naïve Bayes assumes that the effect of an attribute value on a
particular class does not depend on the values of the other
attributes. This assumption simplifies the computation
involved and, in this sense, is considered "naive." The equation
of the Bayes theorem is:

    P(Ci|X) = P(X|Ci) P(Ci) / P(X)    (1)

where:
C = the hypothesis that X is in a specific class.
X = data with an unknown class.
P(C|X) = the probability of hypothesis C given condition X.
P(C) = the probability of hypothesis C.
P(X|C) = the probability of X given condition C.
P(X) = the probability of X.

F. KNN classification
K-Nearest Neighbors (KNN) is a classification algorithm
that groups data based on the closest distance to its neighbors.
The KNN classification is based on analogy learning,
comparing a given test tuple with similar training tuples. Each
tuple represents a point in an n-dimensional space, so all
training tuples are stored in the n-dimensional pattern space.
The steps needed to create the KNN algorithm are:
1. Determine the K value.
2. Calculate the distance between the test data and the
training data, for example using the Euclidean distance
equation below:

    D(Xi, Yi) = sqrt( Σ_{i=1..n} (Xi − Yi)² )    (2)

where:
Xi = training data.
Yi = test data.
D(Xi, Yi) = distance.
i = data variables.
n = data dimensions.

G. Confusion Matrix
A confusion matrix is a helpful tool for analyzing how well
a classifier can recognize data from different classes. There are
4 (four) terms that need to be known in calculating the
evaluation measures [13]:
1. True positives (TP), the number of positive labels
predicted correctly.
2. True negatives (TN), the number of negative labels
predicted correctly.
3. False positives (FP), the number of negative data
incorrectly labeled as positive.
4. False negatives (FN), the number of positive data
incorrectly labeled as negative.

IV. RESULT AND DISCUSSION
In this study, two classification models were made, namely
Naïve Bayes and KNN. The sample data obtained contains 75
data records divided into two groups, namely training data
comprising 70% of the sample data and testing data comprising
30% of the sample data. The data was then mined with the
Naïve Bayes and KNN methods using Jupyter notebooks with
Python.

Figure 2. Simple split [14]

After making the model using the training data, the model
is tested on the testing data to determine the level of accuracy
generated by the model.

A. Implementation of Naïve Bayes using Jupyter Notebook
The packages to start with are Pandas, NumPy, and
Seaborn. Pandas supports data analysis and presentation
functions, NumPy supports work related to matrices and
arrays, and Seaborn supports data visualization and statistical
procedures.

Figure 3. Naïve Bayes classification

Figure 3 shows the process of making a Naïve Bayes
classification model using a Jupyter notebook. The first step is
to call the required packages. The second step is to read the
data from the CSV file, which contains the data attributes
collected in the previous method. The third step is dropping the
columns for features that are not used in the modeling process.
The fourth step is to enter the data into the data frame for the
classification model creation process. The existing data is
divided into 2 (two) groups, namely feature data (x) and target
data (y). The fifth step is to divide the data into 2 (two) parts:
training data and testing data. The sixth step is to call a function
called GaussianNB from the scikit-learn library to create a
model and test the model on the testing data.

Figure 4 shows the accuracy obtained using the confusion
matrix utilities from the scikit-learn library. From this matrix,
it can be seen that the resulting True Positive (TP) value is
relatively high when compared to the True Negative (TN),
False Positive (FP), and False Negative (FN) values. This
indicates that the model used on the test data successfully
classifies positive data correctly, with the highest value when
compared with the TN, FP, and FN values.

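The Gaussian Naïve Bayes pipeline just described (split, fit per class, then score via equation (1)) can also be sketched without scikit-learn. This is a minimal from-scratch illustration; the toy exam-score data and the two classes below are assumptions, not the course dataset:

```python
import math
from collections import defaultdict

def fit_gaussian_nb(xs, ys):
    """Estimate the prior P(C) and a per-feature mean/variance for P(X|C)."""
    by_class = defaultdict(list)
    for x, y in zip(xs, ys):
        by_class[y].append(x)
    model = {}
    for c, rows in by_class.items():
        n = len(rows)
        prior = n / len(xs)
        stats = []
        for j in range(len(rows[0])):
            col = [r[j] for r in rows]
            mean = sum(col) / n
            var = sum((v - mean) ** 2 for v in col) / n + 1e-9  # smoothed
            stats.append((mean, var))
        model[c] = (prior, stats)
    return model

def predict(model, x):
    """Pick the class maximizing log P(C) + Σ log P(x_j|C), i.e. equation (1)."""
    best, best_score = None, -math.inf
    for c, (prior, stats) in model.items():
        score = math.log(prior)
        for v, (mean, var) in zip(x, stats):
            score += -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)
        if score > best_score:
            best, best_score = c, score
    return best

# Toy (assignment, quiz) scores: low scores -> class 0, high scores -> class 1.
xs = [(40, 35), (45, 50), (42, 44), (85, 90), (88, 84), (91, 95)]
ys = [0, 0, 0, 1, 1, 1]
model = fit_gaussian_nb(xs, ys)
print(predict(model, (43, 41)), predict(model, (89, 92)))
```

Note that P(X) in equation (1) is the same for every class, so the sketch compares only the numerators in log space.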

Table 2 describes the results of the TP, TN, FP, and FN in
detail.

Figure 4. Confusion Matrix Naïve Bayes

TABLE 2. TP, TN, FP AND FN EACH CLASS

Class | TP | TN | FP | FN
0 | 7 | 16 | 0 | 0
1 | 1 | 22 | 0 | 0
2 | 6 | 16 | 0 | 1
3 | 0 | 21 | 2 | 0
4 | 2 | 19 | 0 | 2
5 | 0 | 21 | 2 | 0
6 | 2 | 20 | 0 | 1
7 | 1 | 22 | 0 | 0

After the confusion matrix has been formed, the
classification report can be produced, which contains the
precision and recall values for each class label and the overall
accuracy level, where it is known that the resulting accuracy
rate is 0.83 or 83%, as in Figure 5.

Figure 5. Classification Report Naïve Bayes

B. Implementation of KNN using Jupyter Notebook
In making a classification model using KNN, the researcher
uses 3 (three) different K values, namely 3, 5, and 7, to find the
optimal K value by comparing the resulting levels of accuracy.

Figure 6. KNN (K=3)

Figure 6 shows the process of making the KNN
classification model with K = 3 using a Jupyter notebook. In
making a classification model using K-Nearest Neighbors
(KNN), the steps taken are the same as those for the Naïve
Bayes model; the step specific to this process is calling the
KNN classification function. Figure 7 shows the confusion
matrix generated by testing the model against the test data.
Furthermore, Figure 8 shows the classification report, which
contains the precision and recall values for each class label and
the overall accuracy level for KNN with K = 3, where it is
known that the resulting accuracy rate is 0.70 or 70%, along
with the TP, TN, FP, and FN results as in Table 3.

Figure 7. Confusion Matrix KNN (K=3)

TABLE 3. TP, TN, FP AND FN KNN (K=3)

Class | TP | TN | FP | FN
0 | 6 | 14 | 1 | 2
1 | 0 | 22 | 1 | 0
2 | 6 | 16 | 0 | 1
3 | 1 | 19 | 1 | 2
4 | 1 | 20 | 1 | 1
5 | 1 | 21 | 1 | 0
6 | 0 | 20 | 2 | 1
7 | 1 | 22 | 0 | 0
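The reported overall accuracies can be reproduced from the per-class tables: each row sums TP + TN + FP + FN to the 23 test records, and in multi-class evaluation every correct prediction is a true positive for its own class, so accuracy is the sum of the per-class TP values divided by the number of test records:

```python
# Per-class TP values copied from Table 2 (Naïve Bayes) and Table 3 (KNN, K=3).
NB_TP = [7, 1, 6, 0, 2, 0, 2, 1]     # classes 0..7
KNN3_TP = [6, 0, 6, 1, 1, 1, 0, 1]
N_TEST = 23  # TP + TN + FP + FN in every table row sums to 23 test records

def accuracy(per_class_tp, n_test):
    """Overall accuracy: correct predictions (sum of TPs) over all test records."""
    return sum(per_class_tp) / n_test

print(round(accuracy(NB_TP, N_TEST), 2))    # 0.83, matching the Naïve Bayes report
print(round(accuracy(KNN3_TP, N_TEST), 2))  # 0.7, matching the KNN (K=3) report
```

This cross-check (19/23 ≈ 0.83 and 16/23 ≈ 0.70) agrees with the accuracy levels stated in the text.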


Figure 8. Classification Report KNN (K=3)

The following experiment changes the K value from 3 to 5.
The composition of TP, TN, FP, and FN in the confusion
matrix changes slightly at K = 5 compared to K = 3 in the
previous experiment, as seen in Figure 9; Table 4 describes the
TP, TN, FP, and FN values in more detail. Furthermore, Figure
10 shows the classification report, including the precision and
recall values for each class label and the overall accuracy level
for KNN with K = 5, where it is known that the resulting level
of accuracy is 0.74 or 74%.

Figure 9. Confusion Matrix KNN (K=5)

TABLE 4. TP, TN, FP AND FN KNN (K=5)

Class | TP | TN | FP | FN
0 | 7 | 14 | 0 | 2
1 | 0 | 22 | 1 | 0
2 | 5 | 17 | 1 | 0
3 | 0 | 20 | 2 | 1
4 | 2 | 19 | 0 | 2
5 | 1 | 21 | 1 | 0
6 | 1 | 21 | 1 | 0
7 | 1 | 21 | 1 | 0

Figure 10. Classification Report KNN (K=5)

The last experiment uses the value K = 7. Similar to the
previous experiment, the composition of TP, TN, FP, and FN
in the confusion matrix changes slightly compared to K = 5, as
seen in Figure 11 and described in more detail in Table 5.

Figure 11. Confusion Matrix KNN (K=7)

TABLE 5. TP, TN, FP AND FN KNN (K=7)

Class | TP | TN | FP | FN
0 | 7 | 15 | 0 | 1
1 | 0 | 22 | 1 | 0
2 | 6 | 17 | 0 | 0
3 | 1 | 20 | 2 | 1
4 | 2 | 19 | 0 | 2
5 | 1 | 21 | 1 | 0
6 | 0 | 21 | 2 | 0
7 | 1 | 22 | 0 | 0

The resulting classification report based on the test results
with K = 7 states that the level of accuracy is 0.78 or 78%, as
shown in Figure 12.
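The KNN procedure of Section III.F (choose K, compute Euclidean distances with equation (2), then take the majority label among the K nearest neighbors) can be sketched directly. The toy training points below are assumptions used only to illustrate the three K values tried in this study:

```python
import math
from collections import Counter

def euclidean(x, y):
    """Equation (2): straight-line distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def knn_predict(train, test_point, k):
    """Label a test point by majority vote among its k nearest training points."""
    neighbors = sorted(train, key=lambda item: euclidean(item[0], test_point))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Toy training set (assumption): two clusters of (quiz, assignment) score pairs.
train = [((40, 35), "fail"), ((45, 50), "fail"), ((42, 44), "fail"), ((50, 41), "fail"),
         ((85, 90), "pass"), ((88, 84), "pass"), ((91, 95), "pass"), ((86, 92), "pass")]

for k in (3, 5, 7):  # the three K values tried in this study
    print(k, knn_predict(train, (87, 89), k))
```

As in the experiments above, changing K only changes how many neighbors vote; the distance computation itself stays the same.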


Figure 12. Classification Report KNN (K=7)

C. Testing
The results of the classification implementations using
Naïve Bayes and KNN were then tested on the testing data.
After measuring the level of accuracy using a confusion
matrix, it is known that the Naïve Bayes classification model
produces an accuracy of 83%, while the KNN classification
models built with K values of 3, 5, and 7 yield accuracies of
70%, 74%, and 78%, respectively.

V. CONCLUSION
Naïve Bayes and KNN are simple classification methods
used to predict test scores based on past learning data. The
results show that the classification using Naïve Bayes produces
the highest level of accuracy compared to the accuracy
delivered by the KNN method with K values of 3, 5, and 7.
Therefore, the Naïve Bayes method can be used to build
prediction models for test scores.

REFERENCES
[1] D. F. Murad, R. Hassan, Y. Heryadi, B. D. W., and T. D. F., "The Impact of the COVID-19 Pandemic in Indonesia (Face to face versus Online Learning)," in 2020 Third International Conference on Vocational Education and Electrical Engineering (ICVEE), Surabaya, Indonesia, 2020, pp. 1–4, doi: 10.1109/ICVEE50212.2020.9243202.
[2] O. Zahour, E. H. Benlahmar, A. Eddaoui, and O. Hourrane, "Towards a system for predicting the category of educational and vocational guidance questions using bidirectional encoder representations of transformers (BERT)," Int. J. Adv. Trends Comput. Sci. Eng., vol. 9, no. 1, pp. 505–511, 2020, doi: 10.30534/ijatcse/2020/69912020.
[3] A. Elzainy, A. El Sadik, and W. Al Abdulmonem, "Experience of e-learning and online assessment during the COVID-19 pandemic at the College of Medicine, Qassim University," J. Taibah Univ. Med. Sci., pp. 1–7, 2020, doi: 10.1016/j.jtumed.2020.09.005.
[4] D. Sulisworo, M. Fitrianawati, I. Maryani, S. Hidayat, E. Agusta, and W. Saputri, "Students' self-regulated learning (SRL) profile dataset measured during Covid-19 mitigation in Yogyakarta, Indonesia," Data Br., vol. 33, p. 106422, 2020, doi: 10.1016/j.dib.2020.106422.
[5] P. Bertens, A. Guitart, P. P. Chen, and Á. Periáñez, "A Machine-Learning Item Recommendation System for Video Games," Jun. 2018. [Online]. Available: http://arxiv.org/abs/1806.04900.
[6] D. F. Murad, "Recommendation System for Smart LMS using Machine Learning: A Systematic Literature Review," 2018, doi: 10.1109/ICCED.2018.00031.
[7] H. Agrawal and H. Mavani, "Student Performance Prediction using Machine Learning," Int. J. Eng. Res. Technol., vol. 4, no. 3, pp. 111–113, 2015.
[8] A. Abu, "Educational Data Mining & Students Performance Prediction," Int. J. Adv. Comput. Sci. Appl., vol. 7, no. 5, pp. 212–220, 2016.
[9] F. Ünal, "Data Mining for Student Performance Prediction in Education," IntechOpen, p. 12, 2019.
[10] H. Shaziya, R. Zaheer, and G. Kavitha, "Prediction of Students Performance in Semester Exams using a Naïve Bayes Classifier," Int. J. Innov. Res. Sci. Eng. Technol., vol. 4, no. 10, p. 8, 2015.
[11] S. Gupta, M. Kaur, S. Lakra, and Y. Dixit, "Performance evaluation of supervised learning algorithms on hate speech detection," J. Adv. Res. Dyn. Control Syst., 2020, doi: 10.5373/JARDCS/V12SP7/20202440.
[12] S. Bag, A. Ghadge, and M. K. Tiwari, "An integrated recommender system for improved accuracy and aggregate diversity," Comput. Ind. Eng., vol. 130, pp. 187–197, 2019, doi: 10.1016/j.cie.2019.02.028.
[13] J. Han, M. Kamber, and J. Pei, Data Mining: Concepts and Techniques. Massachusetts: Morgan Kaufmann, 2012.
[14] E. Turban, R. Sharda, and D. Delen, Decision Support and Intelligence Systems. New Jersey: Pearson, 2011.
229 28 October 2021, Jakarta - Indonesia

2021 1st International Conference on Computer Science and Artificial Intelligence (ICCSAI)
Development of Smart Restaurant Application for Dine-In

1st Andriatama Bagaskara, 2nd Ahmad Ridhwan Naufal, 3rd Ivano Ekasetia Dhojopatmo, 4th Ali Abdurrab, 5th Widodo Budiharto
Computer Science Department, School of Computer Science, Bina Nusantara University, Jakarta, Indonesia 11480
[email protected], [email protected], [email protected], [email protected], [email protected]
Abstract— The Coronavirus disease has dealt the Food and Beverage industry a great blow. Restaurants have had to reduce their opening hours and the number of customers allowed in order to follow the regulations, and it leaves them at a huge financial loss. We conduct this research to find the optimal possible way for restaurants to stay open and still have customers in this pandemic era. We propose that restaurants make their reservation and booking systems online, as online booking and ordering have become popular lately. The purpose of this application is to reduce the contact needed between the customer and the staff to as little as possible, while also avoiding waiting time for the customer. Our application consists of a reservation system and an ordering system. The reservation system enables customers to book a table online and check in via QR Code. The ordering system enables customers to pick a menu using their smartphone and confirm their order, as well as pay for it directly using electronic money.

Keywords— restaurant reservation, food ordering, QR Code verification, electronic payment, android studio

I. INTRODUCTION

The Coronavirus disease has been threatening humanity since late 2019. Taken from the data as of March 2021, there have been around 120 million confirmed cases around the world, with almost 700 thousand new cases every day. The effects created by this plague threaten not only the health and lives of people, but also take away their entertainment, their social life, and their jobs. People can no longer go outside as they please; they must wear masks and take safety precautions wherever they go. A lot of countries, including first-world countries, have activated lockdown procedures in several cities due to the unstoppable spreading of the virus. In Indonesia alone, there have been 1.3 million cases, with around 6,000 new cases a day. The mortality rate of Covid-19 infection in Indonesia is around 2.7%. The impact of the Coronavirus on the Indonesian economy is unfortunately much more severe. Indonesia's economic growth decreased from 5% in the early months of 2019 to 3% in the same quarter of 2020.

At first, due to the existence of Covid-19, many restaurants could not stay open because of the lockdown procedures to reduce the spread of the virus. Currently, the 'new normal' has been implemented so that people can return to their activities under the terms and conditions that have been set by the government, such as wearing a mask, social distancing, and not forgetting to wash hands after touching things. The existence of the new normal does not mean the disappearance of Covid-19, but it gives people the opportunity to return to their daily activities, on condition that they comply with the health protocol.

In our paper, we offer solutions to reduce face-to-face events that occur in restaurants. With these, people can visit restaurants and order food without face-to-face contact, which can reduce the transmission of Covid-19, under the health protocol that has been regulated by the restaurant, by limiting seats for guests with assistance from QR code scans. By using QR code technology we can order food without a direct face-to-face conversation with the restaurant employee, and we can also reserve our table.

In this project, we have a solution: an application that provides services such as an ordering system and a booking system that reduces direct contact with the waiter or the person concerned. Customers can choose what restaurant they want to visit, or a nearby restaurant, and customers can book the seat they want. Arriving at the restaurant, the customer will scan the QR code that was obtained when ordering or reserving a seat, and they will be directed to the seat that was booked before. After that, the waiter just waits for the notification from the customer about what food the customer ordered. Payment can be made via e-wallets such as OVO, GOPAY, and others.

II. LITERATURE REVIEW

Based on Wajinku Musili [1], Covid-19 has an immediate and lasting impact on trade and development in developing countries. This pandemic has short-term and long-term effects that must be addressed to ensure the fast recovery of the economy. This involves government support in terms of finance and infrastructure for trade, especially in the tourism field, such as restaurants.

The impact of COVID-19 on trade, especially restaurants, has been significant since the introduction of the distancing policy and the limitation on persons in order to decrease person-to-person spread. This has caused restaurants to lose revenue on a large scale, in some cases running a deficit. Restaurants are in need of innovation to counter this problem, especially if they want to survive this pandemic. One of the more
978-1-6654-4002-8/21/$31.00 ©2021 IEEE
acceptable innovations is a restaurant based on online booking and ordering. In this era of technology, online reservation systems have been applied in many sectors, especially in restaurants. Marsyahariani et al. [2] have been successful in making an online system for booking and reservation for a restaurant, where they replace the manual business system with a computerized system. Customers can make reservations online without problems like unavailability of table reservations, and staff do not need to organize the waiting order. With this, the workload of the staff can be lifted significantly, and the manager also has the benefit of monitoring and organizing all the activities efficiently. Another successful case of this online reservation system is from [3], where they use time-series prediction to predict the sequence dependence complexity as a variable in a neural network. In their study, they lay out all the features that are essential in an online-booking application. Customers can pre-book tables, pre-order food, and make changes to a reservation. The restaurant can receive details of the reservation, details of the food order, and details of the customer. The application must have the following functions for the customer and the restaurant: a function to search for a restaurant (for the customer), and functions to add new types of food to the menu and set the timing for orders (for the restaurant).

Another method that can be applied in an online-booking system, proposed by Liyanage et al. [4], is applying GPS to track the customer's location and then recommend restaurants near to them. The application will show the tables that are available in the recommended restaurant, and for each table there is a device that connects through Wi-Fi to the restaurant database, where it is connected with two light bulbs that signal the status of the reservation. The lights will change color based on the reservation status, and a table cannot be reserved by another customer if it is already reserved. With this method, the restaurant can easily manage the number of customers.

Self-ordering systems, or Smart Ordering Systems, that no longer need direct help from the staff are already being used by many restaurants. There are some varieties of this system. Umap et al. [6] use transmitters that are connected to screens that are available on every table. The transmitter is connected with a receiver that can decrypt the NRF module and process the order from the kitchen. Shadaksharappa and Kumar [5] proposed the use of QR codes to access menus and choose orders. After the order, the customer will be directed to a bank portal to do the payment. Kurniawan and Abdul [7] use the AOS-RTF system in their study; the customer uses a smartphone that is connected through the restaurant's Wi-Fi to do the ordering. The order will then be sent to a device in the kitchen that is also connected to the Wi-Fi, and a device that handles the payment at the cashier prints the receipt for the customer.

The payment system can also use non-traditional currency, with debit/credit cards as a solution before being taken over by digital currency. Harpanahalli et al. [8] use radio-frequency identification (RFID) in their study, allowing the customer to do the payment using debit cards without doing the transaction at the cashier. Having a device at each table that uses a Payment Unique Identification (UID) allows the customer to pay with a card using an RFID reader. As stated in [5], a web portal will appear directly after the customer orders and redirect the customer to a payment website, thus allowing them to choose payment via bank transfer.

De Vries et al. [9] analyze the relation between waiting time and customers' behavior in a restaurant environment. They concluded that not only does time waiting for a reservation affect customers' patience with waiting in the queue and leaving for another restaurant; time waiting for paying and ordering also affects the duration of customers staying in the restaurant and their likelihood to go there again. Huang et al. [10] analyze the response of customers to scarcity appeals in online booking, stating that customers tend to behave impulsively and not think about the risk when they are faced with scarcity. This also applies to customers with high buying capability, not only locals but also tourists. Toktassynova and Akbaba [11] describe a content analysis of reviews from customers who use an online booking platform for a certain restaurant. They found that customers give negative reviews when the restaurant is crowded. It can be concluded that users appreciate a platform that can give them access to a table faster and appreciate the existence of an online-booking application.

III. METHODOLOGY

The process of creating the application is divided into two parts, the prototyping and the project development.

A. Prototyping

The purpose of building this prototype model is to define the interface that we want for the app first, before moving into the back-end part. The prototype was created with Figma, and once it was finished, we created a fill-out form and shared the questionnaire for customers to test our app and judge its functionality and appearance.

Fig. 1. Home Page, Login Page and Register Page

This is the home page, where the customer can register or log in if they already have an account. A user who logs in will be redirected to Figure 2, the Home Reservation Page.
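The GPS-based "nearby restaurant" feature discussed in the review above ([4]) can be illustrated with a small distance filter. The sketch below is our own minimal example, not code from any of the cited systems; the restaurant names and coordinates are invented for illustration. It computes great-circle (haversine) distances and keeps restaurants within a radius, nearest first.

```python
from math import asin, cos, radians, sin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two latitude/longitude points, in kilometres."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))  # 6371 km = mean Earth radius

def nearby_restaurants(user_pos, restaurants, radius_km=2.0):
    """Return (name, distance_km) pairs within radius_km, nearest first."""
    lat, lon = user_pos
    found = [(name, haversine_km(lat, lon, rlat, rlon))
             for name, (rlat, rlon) in restaurants.items()]
    return sorted([(n, round(d, 2)) for n, d in found if d <= radius_km],
                  key=lambda pair: pair[1])

# Illustrative data only: three hypothetical restaurants around West Jakarta.
places = {
    "Warung A": (-6.2010, 106.7820),
    "Resto B": (-6.2105, 106.8000),
    "Cafe C": (-6.3000, 106.9000),
}
print(nearby_restaurants((-6.2019, 106.7818), places, radius_km=3.0))
```

A production app would query a spatial index or the platform's location API rather than scanning a flat list, but the filtering logic stays the same.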
Figure 3 is what appears when users click the book option for a restaurant. The information provided on the restaurant includes photos, distance from the user's current location, tables left, and the price range. The user can determine how many guests will be attending the reservation.

Fig. 2. Home Reservation Page

The app is basically parted into two segments, that is the Reservation Segment and the Food Order Segment. In the Reservation Segment, the Home Page provides the user with the list of restaurants that are available on the app. The feature not only provides information regarding nearby restaurants, but also restaurants that offer deals as well. The user can also see their booking history and the promos available to them. If a user does not find the restaurant they are looking for, they can type the restaurant name in the search bar, and the result will pop up.

Fig. 3. Restaurant Detail Page

Fig. 4. Booking Confirmation Page

After the user confirms their reservation, for which the booking fee cannot be refunded if they cancel the reservation, they will receive a confirmation note of their reservation as well as a QR Code that they can scan when they arrive at the restaurant to verify their arrival, and after that they will move on to the second segment, the Food Order Segment.

Fig. 5. Home Ordering Page
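One minimal way the QR-code check-in described above could be verified is to have the server encode a signed reservation ID in the QR image and re-check the signature on scan, in the spirit of hash-based QR verification. The sketch below is our own illustration, not the paper's implementation; the secret key, payload layout, and function names are all assumptions.

```python
import hmac, hashlib

SECRET = b"demo-secret-key"  # illustrative only; a real deployment would manage keys securely

def make_qr_payload(reservation_id: str) -> str:
    """Build the string 'reservation_id.signature' to embed in the QR code."""
    sig = hmac.new(SECRET, reservation_id.encode(), hashlib.sha256).hexdigest()
    return f"{reservation_id}.{sig}"

def verify_qr_payload(payload: str) -> bool:
    """Check that a scanned payload carries a valid signature for its reservation ID."""
    try:
        reservation_id, sig = payload.rsplit(".", 1)
    except ValueError:  # no separator: malformed scan
        return False
    expected = hmac.new(SECRET, reservation_id.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, expected)  # constant-time comparison

payload = make_qr_payload("RSV-20211028-0042")
print(verify_qr_payload(payload))                       # prints True (valid scan)
print(verify_qr_payload("RSV-20211028-0042.deadbeef"))  # prints False (tampered scan)
```

Because only the server knows the key, a forged or edited QR payload fails verification at check-in without any extra database round trip.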
In this segment, the user has checked into the restaurant, verified their reservation, and is ready to order their food. The user will be provided information on every dish available in the restaurant, and occasionally when there is a promo, it will be featured on the page as well. The user can also see what they have already ordered and what remains in their wallet, so they can calculate what to order. A user can still order if they have not checked out yet, after which their session in the restaurant will be over.

Fig. 6. Food Detail Page

This is what is shown when a user clicks on a food item to see its details. The user can see the photo of the dish, the name and details of the dish, as well as its price. The user can directly change the quantity they want to order, then click add to cart. They will then be redirected to Figure 5 and see whether they want to add to their order or want to confirm their order and pay.

Fig. 7. Order Confirmation Page

This is what appears when a user clicks confirm order and pay. They will be asked to follow up with the digital wallet they would like to use to pay. The user must pay their order directly; a postponed payment is not possible. Then they will be shown their paid order as it is being made and then served.

Fig. 8. Checkout Confirmation Page

On this page, the user wants to check out and finish their session in the restaurant. They will be given a checkout receipt as well as a thank-you note from the application, and can direct themselves back to Figure 2, the Home Reservation Page.

As mentioned above, we did spread a fill-out form aimed at getting potential customers' responses to our prototype. We received around 18 responses; most of the respondents are Bina Nusantara students and some of them are lecturers. The responses we received mostly support the development of the application, because the demand for a safe and proper restaurant is high, as well as the difficulty of finding and booking a restaurant when people go out with their family.

B. Planned Project Development

The development of the project will begin with the same step as the prototyping, that is the division of the application work into two segments: the Reservation Segment and the Food Ordering Segment. The reservation segment will be fully focused on the application, while the food ordering segment will not only revolve around the application, but the restaurant system as well.

The application will be made with Android Studio, using JavaScript and React Native as its language and library.

Fig. 9. Reservation Segment Flowchart

This is the flow of the reservation segment. The two processes are searching for a restaurant and requesting a reservation. The process of searching for a restaurant will use an algorithm to prioritize restaurants for recommendation based on rating. Based on [12], the collaborative filtering method based on a user evaluation matrix is an effective method to proceed with this process.

The process of requesting a reservation will use another algorithm to send the request, check whether it is
possible to reserve a place by checking available tables against the user's number of guests, and return the receipt and QR Code for verification if the reservation is successful. Nur Ainin Sofiya [13] points out that using a priority algorithm to sort out queues is efficient enough.

Fig. 10. Restaurant Segment Flowchart

Figure 10 shows the flow when the user has arrived at the restaurant. There are two processes to be shown, the first one being the verification process of the QR Code, the second one being the food ordering process, where the user picks items out of the menu and confirms their order, which will be sent to the kitchen to be processed. This process will be repeated until the user checks out.

Al-Ghaili et al. [14] discussed the usage of a Hash-based Reference Value (HRV) for verifying IDs. They concluded that the method is reliable and secure. The method used by [15], combining the work of the mobile platform of the application and the POS system of the restaurant, gives us an insight on how to approach the method of connecting the application with a restaurant's POS system.

IV. EXPERIMENTAL RESULT

The experiment tests the output from the app based on the user's input. It is currently not possible to connect the application directly to a restaurant's POS system; therefore, that part will not be tested.

The following table presents the user input, the expected output, and the real output of the application.

TABLE 1. DATA OF THE RESULT

No | User Input | Expected Output | Real Output
1 | User logs in (already has an account) | Successful login | Successful login
2 | User logs in (does not have an account) | Failed login | Failed login
3 | User clicks a restaurant | Go to restaurant | Go to restaurant
4 | User uses map to locate nearby restaurants | Found nearby restaurants | No restaurant found
5 | User searches for restaurant by name | Found restaurant by name | Found restaurant by name
6 | User searches for restaurant by category | Found restaurant by category | No restaurant found
7 | User reserves a place at an empty restaurant | Reservation successful | Reservation successful
8 | User reserves a place at a full restaurant | Reservation failed | Reservation successful
9 | User verifies QR Code | Verification successful | Verification successful
10 | User places order | Order successful | Order successful
11 | User pays for order | Payment successful | Payment successful
12 | User checks out | Checkout successful | Checkout successful

V. CONCLUSION

The application is meant to simplify the way customers find a restaurant that maintains the current health and safety protocols, while also reducing possible waiting time. It is useful for customers who want to have a comfortable dine-in experience, without having to fear that they may have done something that could cause an infection of Covid-19.

With that said, the application is far from complete. There are still a few mistakes that must be cleared, as the app needs to partner with restaurants and connect with their POS systems for it to be completely functional. That repair, and testing with a real restaurant system, would be our future goal.

REFERENCES

[1] W. Musili, "Impact Of Covid-19 Pandemic On Trade And Development In Developing Countries," vol. 11, no. 4, pp. 60–69, 2021, doi: 10.29322/IJSRP.11.04.2021.p11208.
[2] N. D. Nik Marsyahariani and A. A. Muhammad Amin, "Restaurant Reservation System Using Electronic Customer Relationship Management," pp. 147–152, 2019.
[3] F. Rarh, D. Pojee, S. Zulphekari, and V. Shah, "Restaurant table reservation using time-series prediction," Proc. 2nd Int. Conf. Commun. Electron. Syst. (ICCES 2017), pp. 153–155, 2018, doi: 10.1109/CESYS.2017.8321254.
[4] V. Liyanage, A. Ekanayake, H. Premasiri, P. Munasinghe, and S. Thelijjagoda, "Foody - Smart restaurant management and ordering system," IEEE Reg. 10 Humanit. Technol. Conf. (R10-HTC), pp. 1–6, 2019, doi: 10.1109/R10-HTC.2018.8629835.
[5] B. Shadaksharappa and D. Kumar, "FOOD HUB: A Model for Ordering in Restaurant Based on QR Code Without Presence of a Waiter at the Table," Int. J. Eng. Res. Comput. Sci. Eng., vol. 5, no. 5, 2018. [Online]. Available: www.menucraft.ca/snscaf/menu.php.
[6] S. Umap, S. Surode, P. Kshirsagar, M. Binekar, and N. Nagpal, "Smart Menu Ordering System in Restaurant," Int. J. Sci. Res. Sci. Technol., vol. 4, no. 7, pp. 207–212, 2018.
[7] B. Kurniawan and M. F. Abdul, "Designing Food Ordering Application Based on Android," IOP Conf. Ser. Mater. Sci. Eng., vol. 662, no. 2, 2019, doi: 10.1088/1757-899X/662/2/022070.
[8] J. Harpanahalli, K. Bhingradia, P. Jain, and J. Koti, "Smart Restaurant System using RFID Technology," Proc. 4th Int. Conf. Comput. Methodol. Commun. (ICCMC 2020), pp. 876–880, 2020, doi: 10.1109/ICCMC48092.2020.ICCMC-000162.
[9] J. De Vries, D. Roy, and R. De Koster, "Worth the wait? How restaurant waiting time influences customer behavior and revenue," J. Oper. Manag., vol. 63, pp. 59–78, 2018, doi: 10.1016/j.jom.2018.05.001.
[10] H. Huang, S. Q. Liu, J. Kandampully, and M. Bujisic, "Consumer Responses to Scarcity Appeals in Online Booking," Ann. Tour. Res., vol. 80, p. 102800, 2020, doi: 10.1016/j.annals.2019.102800.
[11] Z. Toktassynova and A. Akbaba, "Content Analysis of On-line Booking Platform Reviews over a Restaurant: A Case of Pizza Locale in Izmir," pp. 1–26, 2018.
[12] L. Li, Y. Zhou, H. Xiong, C. Hu, and X. Wei, "Collaborative filtering based on user attributes and user ratings for restaurant recommendation," Proc. 2017 IEEE 2nd Adv. Inf. Technol. Electron. Autom. Control Conf. (IAEAC 2017), pp. 2592–2597, 2017, doi: 10.1109/IAEAC.2017.8054493.
[13] N. A. Sofiya, "Hostel Facility Booking System Using Priority Algorithm (Internet Computing) With Honours," Univ. Sultan Zainal Abidin, 2018. [Online]. Available: https://myfik.unisza.edu.my/www/fyp/fyp18semkhas/report/041253.pdf.
[14] A. M. Al-Ghaili, H. Kasim, F. A. Rahim, Z.-A. Ibrahim, M. Othman, and Z. Hassan, "Smart Verification Algorithm for IoT Applications using QR Tag," Comput. Sci. Technol., Lect. Notes Electr. Eng., vol. 481, pp. 107–116, 2018, doi: 10.1007/978-981-13-2622-6_11.
[15] L. W. Hong, "Food Ordering System Using Mobile Phone," 2016. [Online]. Available: http://eprints.utar.edu.my/1943/1/IA-2016-1203135-1.pdf.
Utilization Big Data and GPS to Help E-TLE System in The Cities of Indonesia

David Yu, Andini Artika Dewi, Sindy Nikita Wijaya, Alexander A S Gunawan
Computer Science Department, School of Computer Science, Bina Nusantara University, Jakarta, Indonesia 11480
[email protected], [email protected], [email protected], [email protected]
Abstract— The Indonesian government currently keeps practicing Electronic Traffic Law Enforcement. Over the last three years, since the E-TLE was launched by the Ditlantas Polda Metro Jaya, the government has kept developing a system to provide smoothness and safety against traffic violations in the cities of Indonesia. It uses Big Data analytics to handle the large amount of data from recording all the traffic violations, and GPS for moving activity, which can be used for analysis to identify information. We used a literature review to find out how to develop the E-TLE concept and system and implement it in Indonesia. The main problem in Indonesia is the lack of equipment to support all traffic systems, and the accuracy of the inference algorithm varies greatly depending on the size of the collected data sample. In this study, we find out all the problems that can be caused by the system and how to solve them, using the literature review method for identifying, understanding, and transmitting information to help the E-TLE system. Finally, we concluded that big data is a must for implementing the E-TLE system. Furthermore, GPS and radar sensors are the critical data besides CCTV for the enforcement of electronic traffic law.

Keywords— big data, electronic ticketing system, GPS, camera sensor, smart city

I. INTRODUCTION

Indonesia is the 4th largest country in the world by population, with 271 million people, which could increase again in the next few years. Obviously, with the growing population, an area will become denser, and this will also affect the existing traffic. As we often see, big cities in Indonesia such as Jakarta are always decorated with congestion and traffic violations, from trivial things such as not wearing a helmet to driving drunk. Certainly, the police cannot monitor every existing road segment, especially in big cities in Indonesia. However, with the development of existing technology, Indonesia developed a considerable innovation for the smoothness and safety of traffic. It is supported by the creation of an Electronic Ticketing System, commonly abbreviated as E-TLE.

In fact, this technology has been developing in other countries for a long time, such as our neighbor, Singapore. Finally, on November 25, 2018, the E-TLE was launched by Ditlantas Polda Metro Jaya in order to overcome the high number of traffic violations and accidents while taking advantage of the industrial revolution 4.0; it was placed on 2 roads in Jakarta. Until 2021, E-TLE has been developed in many sections of the city of Jakarta, and there are things that must be considered to implement this new system, such as that every city that will implement this system must have supporting infrastructure such as Big Data, so that this system can run properly [1]. Armed with a very large volume of data and running in real time, using Big Data can certainly help the E-TLE system operate. Although there is no fixed definition of big data, it can be concluded that big data refers to the 3 V's, namely volume, variety, and velocity, which emphasize the large amount of data, the data recording format, and the speed of producing the data. Big Data provides a great opportunity to identify every traffic breach no matter how small it is. In addition, GPS, assisted by cameras from E-TLE, can be used to analyze and identify traffic on a road. Furthermore, a solution can be made so that congestion does not occur, by taking a sample dataset from the GPS. There it can be seen whether the road segment is congested or not, so that the police can further give an appeal to road users to avoid existing congestion [2].

With all the problems mentioned, here we want to find out whether using big data and GPS can help the Electronic Ticketing System work well in Indonesian cities, so that the development of smart cities in Indonesia can be realized properly. We describe an overview of the E-TLE system in chapter 2, followed by the methodology of the systematic literature review (SLR) in chapter 3. The results will be discussed in chapters 4 and 5. And this paper will be closed with a conclusion in chapter 6.

II. LITERATURE REVIEW

As we all know, Indonesia has now started to develop Electronic Traffic Law Enforcement (ETLE). Nevertheless, there are still many things to consider in implementing this new system. Every city that will implement this system must have supporting infrastructure, such as being able to use IoT and Big Data technology, so that the system can work properly [3]. Before that, the geography of a city must be mapped, with the intention that system makers can find out where the strategic places are to put CCTV cameras in the city. There are many technologies that can be utilized, such as using Google Earth to show real-time geographic contours topographically [4]. A Red Light Camera (RLC) can also be used as an alternative to CCTV; an RLC is able to detect risky drivers' behavior such as speeding up and lane changes [5]. RLCs are installed at traffic signals because so many accidents
are caused by Red Light Running, and this implementation is efficient in reducing accidents.

After everything is calculated in detail, what must be done is to implement this E-TLE on each road segment that has been previously determined. By using a CCTV camera equipped with an RFID sensor that can read vehicle plates quickly and in real time, every recorded violation will be immediately entered into the police database for further follow-up [6]. In addition to using CCTV, the government can take advantage of applications which are used by other people on the road, who can report traffic violations on the road even where there is no E-TLE CCTV [7]. Obviously, this way can cut expenses and can also make it easier for law enforcement officials to obtain data about violations that occurred, directly from existing witnesses.

Based on [8], humans generated five exabytes (5 × 10^6 terabytes) of data up to 2003, while in 2012 the same amount of data was generated in just 2 days. Big data also has the potential to improve the security and sustainability of transport systems [9]. Big data can also implement the Intelligent Transportation System (ITS), which is useful as supporting technology, especially for ETLE enforcement [9], and big data has an important role and is very suitable to be implemented because so much data pops up on every CCTV camera; the storage of the data has to be calculated optimally and must run in real time. This system can utilize 2 modules, namely Tracking Radar and Law Enforcement, where all this large amount of data will be stored in NoSQL, which can run in real time optimally [10].

Then how to catch criminals who run away by vehicle quickly? We can take advantage of CCTV E-TLE technology combined with route calculations using the SPFA (Shortest Path Faster Algorithm), which can help the authorities determine the route taken by people who committed a traffic violation. How? The SPFA is based on the development of 3 algorithms, and the improved Prim path selection algorithm based on probability will find the solution route taken by people who committed a traffic violation [11]. In supporting the implementation of ETLE, a GPS-based survey is needed to collect various kinds of information [12]. In addition, we can use GPS to track the position of a person, whether through a vehicle or a cell phone. By using visual analysis whose data comes from GPS, we can easily visualize the route of people who did traffic

from there we can see whether an area can accommodate so many vehicles. From there we can arrange a solution so traffic jams no longer occur [14]. But with the development of technology, there are irresponsible people who try to find loopholes in an existing system to reap profits. From here we can see that Big Data technology is relatively new, and it is vulnerable because it can be used by hackers, for example to manipulate E-TLE to facilitate a crime. From here we can examine the existing anomalies so that this system can run properly without any interference [15]. As time goes by, this technology will continue to develop, and of course its security will be even more guaranteed.

Another matter that needs to be considered in implementing E-Tilang is moving GPS activity data into the database. In the stage of moving activity data to the database, storage is usually done for one activity data table by combining raw data and information from the original data. In transferring data there are many affecting factors, such as the size of the database used, the performance desired, the specific objectives, and the operational environment. Not only that, but incoming data also has different formats from different sensor data [1]. GPS data processing is divided into two steps. The first step is to transfer data from the GPS device to a computer and create a file that can be used for statistical analysis, and the second step is to identify trips and other information. GPS will record all the travel times and location coordinates every second; not only that, the speed and route of the trip can be automatically identified [16]. Storage of GPS data into the database usually uses a queue cache cluster to improve the performance of the entire system; it receives and processes data using a socket connection, and then the information is stored in the database. After the data has successfully entered the database, the server will send a message to the terminal informing it that the data has been received [17].

The use of the database with GPS is to help a security system that covers everything; not only that, the database is used to make it easier for the workforce to check data security with minimum effort and to help prevent attacks by unauthorized people trying to retrieve the data of people who have been hit by E-Tilang [18]. Obstacles that are felt in implementing E-Tilang, especially by using GPS devices, are tracking the location of the person who committed a traffic violation and where they made the mistake, and the accuracy of the information contained in the GPS data; besides that, tracking various types of vehicles on the highway network is very limited. However, this problem can be solved
violence then we can easily visualize their route by using a by using a map matching algorithm and the GPS position
graphical visual query. By using this, the data from vehicle tracking data in this data is useful for determining routes,
GPS and data from the existing paths are processed using recognizing mobility patterns, and general navigation even
Clean Trajectory and Quad-tree Spatial Index which are then though the accuracy is not very high [19]. The data between
put together to be filtered in a visual graphic form. the client and the server will display the mapping information
that is centered on the GPS. The server will tell the location
Then a spatial visualization is performed which provides an data [20]. All locations will be identified and then will receive
overview of all viable routes from the filtered trajectory. Then a unique location code with geographic coordinates which
computation of the existing visual data can be analyzed in direct to the SQL database containing the information [13].
order to obtain a visual of the route that the people might take All locations will be identified and then will receive a unique
to escape [13]. We can use GPS in addition to helping the location code with geographic coordinates which direct to the
continuation process of the E-TLE, it can be used to analyze SQL database containing the information [17]. The use of a
and identify traffic on a road. through this, solutions can be road network database is used to improve the accuracy of
made to facilitate traffic jam does not occur somewhere. By traffic management and road safety analysis [2].
taking a sample dataset that is on the GPS of a vehicle, and
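The SPFA route computation mentioned above can be sketched as follows. This is a generic, minimal implementation of the Shortest Path Faster Algorithm (a queue-based variant of Bellman-Ford), not the system described in the cited works; the road network, node names, and travel times are hypothetical.

```python
from collections import defaultdict, deque

def spfa(graph, source):
    """Shortest Path Faster Algorithm: a queue-based Bellman-Ford variant.
    `graph` maps a node to a list of (neighbor, weight) pairs; returns the
    shortest distance from `source` to every reachable node."""
    dist = defaultdict(lambda: float("inf"))
    dist[source] = 0
    queue = deque([source])
    in_queue = {source}
    while queue:
        u = queue.popleft()
        in_queue.discard(u)
        for v, w in graph.get(u, ()):
            if dist[u] + w < dist[v]:      # relax edge u -> v
                dist[v] = dist[u] + w
                if v not in in_queue:      # enqueue each node at most once
                    queue.append(v)
                    in_queue.add(v)
    return dist

# Hypothetical road network: intersections A-D, travel times in minutes.
roads = {
    "A": [("B", 4), ("C", 2)],
    "B": [("D", 5)],
    "C": [("B", 1), ("D", 8)],
}
print(spfa(roads, "A")["D"])  # prints 8 (fastest route A -> C -> B -> D)
```

On a city-scale road network the same relaxation loop runs over millions of edges, which is where the big data storage discussed above becomes relevant.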

237 28 October 2021, Jakarta - Indonesia


2021 1st International Conference on Computer Science and Artificial Intelligence (ICCSAI)

Various studies have stated that the implementation of Electronic Traffic Law Enforcement (E-TLE) is very helpful in reducing traffic violations. Aggressive driver behavior is one of the main factors in traffic accidents [21]. An experiment conducted in Hong Kong to determine the effectiveness of camera-based punishment in limiting speeding offenses showed a percentage reduction effect [22]. An evaluation of the impact of using cameras on traffic violations in Cali, Colombia also showed the same result: camera enforcement was shown to significantly reduce road accidents [23]. A Red Light Camera (RLC) can detect risky rider behavior such as speeding and sudden lane changes [5]. RLCs are installed at traffic signals because of the many accidents caused by red-light running, and this application is very efficient in reducing those accidents [24].

III. METHODOLOGY
We use the PRISMA checklist in our methodology to help us evaluate the papers used to research our topic. Using PRISMA, we can select literature that is relevant to our topic and helps us develop it.

Figure 1. Prisma Flowchart

From here we can develop research questions to assist us in conducting the literature review, which consist of:
• RQ1: Can Big Data Analytics and GPS help the work of the E-TLE system?
• RQ2: What are the advantages and disadvantages of E-TLE?
• RQ3: Can E-TLE be implemented well in Indonesian cities?

Here we use research articles in English, including journals and literature reviews, that are predominantly about the utilization of Big Data and GPS to help the E-TLE system in the cities of Indonesia.

IV. RESULT
A. RQ1: Can Big Data Analytics and GPS help the work of the E-TLE system?
a. Big Data Analytics
Big Data requires a revolutionary step forward from traditional data analysis, characterized by its three main components: variety, velocity, and volume. Variety covers unstructured, semi-structured, and structured data; velocity covers streamed, real-time, near-real-time, and batch data; and volume refers to data size, such as terabytes, petabytes, exabytes, and zettabytes. From this, we can see that Big Data is indispensable in today's technological developments. Every day the amount of data increases enormously; IBM records that at least 2.5 exabytes of data were created every day over the last two years [17]. Various incoming data, such as the number of vehicles passing by, the number of accidents, and the violations that occur every day, are certainly a concern for the government in supporting smooth traffic flow. Certainly, if we only rely on the police to manage traffic, not all incidents can be handled at once. The graph below shows a comparison of Indonesia's population in 2020 and 2050 (predicted), broken down by age.

Figure 2. Total population of Indonesia 2020 vs 2035 (source: World Population Prospect 2019)

The figures above show only the total population of Indonesia, not counting the growth in the number of vehicles. Compared with the width of the existing roads in Indonesia, the roads will feel very narrow, and it can be predicted that there will be many violations every day. Relying only on police personnel to monitor traffic would be overwhelming, especially in big cities with so many roads. So we need an innovation: a technology that can monitor the flow of traffic at any time. This is where E-TLE (electronic traffic law enforcement) comes in. But to develop an E-TLE in Indonesia that can capture images of hundreds of people at once and identify each vehicle requires Big Data analysis behind it all, as well as the implementation of other technologies that can help the work of the E-TLE system.

b. E-TLE System


Before we ask what E-TLE is, we must know how the E-TLE system works. In fact, the way it works is not simple but quite complex. First, the E-TLE device automatically captures traffic violations, which are monitored directly from the local Polda RTMC; from there, media evidence of the violation is sent to the E-TLE Back Office. An officer then identifies the vehicle data using Electronic Registration and Identification as the source of the vehicle data. After finding the address of the vehicle's owner, the officer sends a confirmation letter to that address as the first step in taking action. The vehicle owner is then obliged to confirm ownership of the vehicle and who was driving it at the time of the violation; if the vehicle no longer belongs to the person who received the confirmation letter, that person must confirm this within 8 days, either through the website or by visiting the Sub-Directorate office directly. Once confirmed, the officer issues a ticket, payable via bank, for each verified violation. From here we can see the importance of Big Data: as we know, the number of vehicles in Indonesia grows every day, and ownership can change hands, which produces very large data updates. The amount and variety of incoming data make Big Data essential. And how does the camera system capture and record traffic violations? We discuss that here.

• RFID Reader
With this system, the camera detects and identifies vehicles that are indicated to have committed traffic violations. RFID (Radio Frequency Identification) is a smart technology that can read vehicle plates quickly, and the information received is sent directly to the head office for immediate action. RFID uses an electromagnetic field to transfer data quickly.

Figure 3. RFID Reader Detection

• CPU with camera sensor
The camera is equipped with a CPU and a camera sensor to detect the plates of offending vehicles on the road. Assisted by a lens that can capture a full road segment, the camera feed is analyzed in real time. If a violation is caught, the sensor captures it and immediately sends a signal to the CPU in the camera, which analyzes the vehicle plate. After analysis, the streaming data is sent to the Back Office, where the database is checked to see whether the vehicle's license plate is registered. By utilizing Big Data, it is easy to find one license plate among millions of other vehicles.

Figure 4. Placement of E-TLE camera at traffic light

c. GPS in E-TLE
For example, on a sunny day some robbers try to escape by car through the streets of an urban area. Although the police may catch the perpetrator by chasing him, this will be difficult because of heavy traffic, and the robber may have planned his escape carefully enough to trick the police. In our opinion, E-TLE will certainly help the police catch criminals. By utilizing a route calculation algorithm (the Shortest Path Faster Algorithm) and Big Data to analyze it, the police are helped to determine the fastest route for ambushing the robber. In addition, by utilizing GPS, which can observe road-segment activity in real time and in streams, the police can track criminals through their cell phones, because most mobile phones nowadays already use GPS. Using visual analysis of GPS data, we can easily visualize the perpetrator's route with a graphical visual query. The data from the vehicle's GPS and from the existing road network are processed using a Clean Trajectory and a Quad-tree Spatial Index, then combined and filtered into visual graphics. A spatial visualization is then carried out, providing an overview of all feasible routes from the filtered path, and the visual data is computed to obtain the routes the perpetrators are likely to take to escape.
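The back-office plate check described above amounts to an indexed lookup: one captured plate is matched against a registry of millions of vehicles. The sketch below is a minimal illustration, not the actual E-TLE back office; the registry contents and field names are hypothetical, and a production system would use a distributed NoSQL store rather than an in-memory dictionary.

```python
# Hypothetical vehicle registry: plate number -> owner record.
registry = {
    "B1234XYZ": {"owner": "A. Example", "address": "Jakarta"},
    "D5678ABC": {"owner": "B. Example", "address": "Bandung"},
}

def check_violation(plate):
    """Return the owner record for a captured plate, or None if the plate
    is not registered (which would trigger a manual review instead)."""
    return registry.get(plate)

record = check_violation("B1234XYZ")
print(record["owner"] if record else "unregistered")  # prints A. Example
```

Because the lookup is keyed on the plate number, finding one plate takes roughly constant time regardless of how many vehicles are registered, which is what makes the "one plate among millions" search tractable.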


Figure 5. Implementation of GPS Trajectory

By filtering GPS data with the trajectory algorithm, we obtain a grid depiction from which multiple routes can be extracted automatically. Initially, the trajectory algorithm creates a uniform grid covering the bounding box; the grid is then divided into cells so that trajectories can be segmented by the cells, and each trajectory can be denoted by its sequence of passing cells.

Each cell collects the trajectory segments that intersect with it. Next, for each cell that contains segments, the system derives the average direction of the trajectory segments inside it. The distance between the police (pursuing the perpetrator) and the perpetrators of the crime is then calculated horizontally and vertically, and the two cells are merged into one, resulting in several routes distinguished by color. After that, using SPFA with the help of Big Data analysis, the fastest route is quickly found so that the perpetrators, who have been tracked by GPS and monitored by E-TLE, can be ambushed immediately.

Figure 6. The process of creating a visual route

In addition, GPS has many other uses in E-TLE. By applying GPS, the police can 'record' routes that are crowded with motorists, so that they can plan measures to keep traffic on the road flowing properly.

B. RQ2: What are the advantages and disadvantages of E-TLE?
a. Advantages
As we all know, electronic traffic law enforcement (E-TLE) has been used and implemented in many countries around the world. The technology is advantageous for law enforcement, criminal investigation, and prosecution, and it transforms police practice [25]. It also helps decrease traffic violations caused by red-light running, illegal lane changing, speeding, and illegal parking through traffic camera enforcement, such as red-light-running cameras and speeding cameras developed from sensing technology and computer vision techniques that can detect the high-risk driving behavior mentioned above. Previous literature, which collected sample data containing detailed traffic information between 2017 and 2018 from the Kunshan Police Department, concluded that traffic camera enforcement is capable of decreasing regional crash risk and that the speeding camera is very effective at decreasing injury crash risk and effective at decreasing Property Damage Only (PDO) crash risk [5].

b. Disadvantages
The previous paragraph described the advantages of using Big Data in the E-TLE system; however, it also creates several problems that have a large impact on the system. For GPS-based surveys, results vary significantly depending on the sample size of the data, the accuracy of the inference algorithm, and the desired complexity of the model; if the data is big enough, the quality of inference may not matter, but in many cases gains from increased data volume can be neutralized by losses in quality. The algorithm will never guarantee the complete accuracy of the data. For data from a GPS-based survey to be usable for travel demand analysis, it needs either to be incredibly big or to be supplemented so that it can be treated as a reliable source of ground truth. If the E-TLE system uses traditional storage, it can cost the government a lot of money to store the big data and a lot of time to analyze it before its benefits can be leveraged.

C. RQ3: Can E-TLE be implemented well in Indonesian cities?
Even with the existing sophistication, it is still difficult to implement E-TLE everywhere in Indonesia. The various obstacles are:
1. Infrastructure in Indonesia is not evenly distributed
2. Human resources in each region are not yet able to operate E-TLE, which is still relatively new
3. The rise of theft of electronic objects that can be resold
There are many other obstacles, but these three are the ones usually encountered. Moreover, to implement E-TLE in all cities in Indonesia, each city must be able to apply the "smart city" principle. One of the important things in implementing a smart city is the integration of techniques and strategic policies. If a smart city were a human being, then its organs would be the internet, IoT, and mobile communication network technologies that can support systems such as GPS.
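The grid-based trajectory segmentation described above can be sketched as follows: each GPS point is assigned to the uniform-grid cell that contains it, and the trajectory is reduced to its sequence of passing cells. This is only an illustration of the idea; the coordinates, grid origin, and cell size are hypothetical.

```python
def cell_sequence(points, origin, cell_size):
    """Map a GPS trajectory (list of (x, y) points) onto a uniform grid
    anchored at `origin`, returning the sequence of cells it passes
    through, with consecutive duplicates collapsed."""
    ox, oy = origin
    cells = []
    for x, y in points:
        cell = (int((x - ox) // cell_size), int((y - oy) // cell_size))
        if not cells or cells[-1] != cell:   # record each cell once per visit
            cells.append(cell)
    return cells

# Hypothetical trajectory over a grid with 1.0-unit cells.
track = [(0.2, 0.1), (0.8, 0.4), (1.3, 0.6), (1.9, 1.2), (2.4, 1.8)]
print(cell_sequence(track, origin=(0.0, 0.0), cell_size=1.0))
# prints [(0, 0), (1, 0), (1, 1), (2, 1)]
```

Once every trajectory is encoded this way, comparing routes reduces to comparing cell sequences, which is what makes per-cell statistics such as the average direction cheap to compute.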


Figure 7. System management of the smart city concept

From the picture above, the E-TLE camera can be an eye in the development of a smart city, monitoring existing traffic; the data can be streamed directly using IoT and then processed by the Data Center for analysis. The aim is to draw conclusions about everything that happens on roads in urban areas. Using all of these, the development of E-TLE in every Indonesian city will run smoothly, because every road segment can be monitored in real time and streamed without fear of significant disruption.

With Big Data Analytics, the government can also monitor traffic developments in the city, for example, finding the locations where traffic jams and violations usually occur. Using E-TLE, the incoming data can be processed directly with Big Data Analytics to produce conclusions about where traffic jams and violations usually happen. All of this can help the police follow up and add personnel to monitor those roads.

Obviously, with the development of the E-TLE concept, the development of smart cities in every Indonesian city will soon be realized, but none of these things are without drawbacks. Every technology carries a risk that haunts its development, for example, the risk of data leakage. Not long ago, we were shocked by the leak of hundreds of millions of population records in Indonesia. Certainly, we realize that this is still a complicated problem faced by IT personnel in Indonesia. If these things are implemented, they must also be developed in terms of security, because if they are not secure, data leakage will be very detrimental to all Indonesian residents. There are also problems with the available technology, because Indonesia still has very little equipment that can support this development quickly. Lastly, Indonesia must also think about its human resources, because sophisticated technology is useless if it is not matched by people who can use it reliably.

V. DISCUSSION
The development of E-TLE in Indonesia is still relatively new, but we can already see that its implementation is good. However, it must be balanced with technology capable enough to support the work of the system itself. With Big Data analysis and GPS helping E-TLE work, it is hoped that this system can develop well and be accepted by many people in the development of Industry 4.0 in Indonesia.

The goal of this study is to show that using Big Data together with GPS can help the work of E-TLE in Indonesian cities. The selection of Big Data and GPS is not arbitrary: because so much data arrives and is updated every day, the performance of E-TLE would be very heavy without them. In addition, GPS can assist the police in catching criminals by calculating routes, using the sensor cameras of E-TLE to monitor and predict the perpetrators' movements. And remember, none of this can be applied if the cities in Indonesia do not yet have facilities that can support it. With further development, however, this method can be implemented well.

VI. CONCLUSION
The development of technology is very rapid, making the amount of circulating data very large, which makes the implementation of Big Data Analytics very useful. By using Big Data, we can make predictions from existing data. Big Data will certainly be very helpful for implementing the E-TLE system, because vehicle data in Indonesia is updated in large numbers every day and cannot be monitored by humans because of its complexity. Therefore, Big Data can certainly help the performance of E-TLE and make it more efficient.

In addition, GPS is also very good when applied in E-TLE as a radar to catch criminals who try to escape. With the implementation described above, the police will be helped to catch fleeing criminals faster. However, none of this can be implemented properly if Indonesian cities do not yet have the facilities to support it. Besides facilities, people must also be able to operate and maintain the system properly so that it can run well. It may seem impossible, but we are sure that within the next few years this kind of implementation can be carried out well in Indonesian cities.

REFERENCES
[1] S. P. Mohanty, U. Choppali, and E. Kougianos, "Everything you wanted to know about smart cities," IEEE Consumer Electronics Magazine, vol. 5, no. 3, pp. 60–70, Jul. 2016, doi: 10.1109/MCE.2016.2556879.
[2] J. X. Cui, F. Liu, J. Hu, D. Janssens, G. Wets, and M. Cools, "Identifying mismatch between urban travel demand and transport network services using GPS data: A case study in the fast growing Chinese city of Harbin," Neurocomputing, vol. 181, pp. 4–18, Mar. 2016, doi: 10.1016/j.neucom.2015.08.100.
[3] N. Gorelick, M. Hancher, M. Dixon, S. Ilyushchenko, D. Thau, and R. Moore, "Google Earth Engine: Planetary-scale geospatial analysis for everyone," Remote Sensing of Environment, vol. 202, pp. 18–27, Dec. 2017, doi: 10.1016/j.rse.2017.06.031.
[4] R. D. Das and S. Winter, "Detecting urban transport modes using a hybrid knowledge driven framework from GPS


trajectory," ISPRS International Journal of Geo-Information, vol. 5, no. 11, 2016, doi: 10.3390/ijgi5110207.
[5] C. Wang, C. Xu, and P. Fan, "Effects of traffic enforcement cameras on macro-level traffic safety: A spatial modeling analysis considering interactions with roadway and land use characteristics," Accident Analysis and Prevention, vol. 144, Sep. 2020, doi: 10.1016/j.aap.2020.105659.
[6] J. Zeng, M. Li, and Y. Cai, "A tracking system supporting large-scale users based on GPS and G-sensor," International Journal of Distributed Sensor Networks, vol. 2015, 2015, doi: 10.1155/2015/862184.
[7] L. Shen and P. R. Stopher, "Review of GPS travel survey and GPS data-processing methods," Transport Reviews, vol. 34, no. 3, pp. 316–334, 2014, doi: 10.1080/01441647.2014.903530.
[8] Y. Lian, G. Zhang, J. Lee, and H. Huang, "Review on big data applications in safety research of intelligent transportation systems and connected/automated vehicles," Accident Analysis and Prevention, vol. 146, Oct. 2020, doi: 10.1016/j.aap.2020.105711.
[9] A. Neilson, Indratmo, B. Daniel, and S. Tjandra, "Systematic review of the literature on big data in the transportation domain: Concepts and applications," Big Data Research, vol. 17, pp. 35–44, Sep. 2019, doi: 10.1016/j.bdr.2019.03.001.
[10] V. A. Paz-Soldan et al., "Strengths and weaknesses of Global Positioning System (GPS) data-loggers and semi-structured interviews for capturing fine-scale human mobility: Findings from Iquitos, Peru," PLoS Neglected Tropical Diseases, vol. 8, no. 6, 2014, doi: 10.1371/journal.pntd.0002888.
[11] M. A. Qadeer, A. Chandra, and S. Jain, "Design and implementation of location awareness and sharing system using GPS and 3G/GPRS," 2012. [Online]. Available: https://www.researchgate.net/publication/237040695
[12] A. Vij and K. Shankari, "When is big data big enough? Implications of using GPS-based surveys for travel demand analysis," Transportation Research Part C: Emerging Technologies, vol. 56, pp. 446–462, 2015, doi: 10.1016/j.trc.2015.04.025.
[13] R. Zhang, "A transportation security system applying RFID and GPS," Journal of Industrial Engineering and Management, vol. 6, no. 1, pp. 163–174, 2013, doi: 10.3926/jiem.668.
[14] X. Zhao, K. Carling, J. Håkansson, and H. Fleyeh, "Reliability of GPS based traffic data: An experimental evaluation," 2014, doi: 10.13140/2.1.2960.2242.
[15] Spatial Database for GPS Wildlife Tracking Data. Springer International Publishing, 2014, doi: 10.1007/978-3-319-03743-1.
[16] A. Sharif, J. Li, M. Khalil, R. Kumar, M. I. Sharif, and A. Sharif, "Internet of things - Smart traffic management system for smart cities using big data analytics," in International Computer Conference on Wavelet Active Media Technology and Information Processing (ICCWAMTIP 2017), Oct. 2017, pp. 281–284, doi: 10.1109/ICCWAMTIP.2017.8301496.
[17] "An efficient algorithm for detecting traffic congestion and a framework for smart traffic control system," thesis report.
[18] R. A. Ariyaluran Habeeb, F. Nasaruddin, A. Gani, I. A. Targio Hashem, E. Ahmed, and M. Imran, "Real-time big data processing for anomaly detection: A survey," International Journal of Information Management, vol. 45, pp. 289–307, Apr. 2019, doi: 10.1016/j.ijinfomgt.2018.08.006.
[19] M. Lu, C. Lai, T. Ye, J. Liang, and X. Yuan, "Visual analysis of multiple route choices based on general GPS trajectories," IEEE Transactions on Big Data.
[20] G. Romanillos, M. Zaltz Austwick, D. Ettema, and J. de Kruijf, "Big data and cycling," Transport Reviews, vol. 36, no. 1, pp. 114–133, Jan. 2016, doi: 10.1080/01441647.2015.1084067.
[21] J. Zhang, W. Meng, Q. Liu, H. Jiang, Y. Feng, and G. Wang, "Efficient vehicles path planning algorithm based on taxi GPS big data," Optik, vol. 127, no. 5, pp. 2579–2585, Mar. 2016, doi: 10.1016/j.ijleo.2015.12.006.
[22] S. S. Pantangi, G. Fountas, P. C. Anastasopoulos, J. Pierowicz, K. Majka, and A. Blatt, "Do High Visibility Enforcement programs affect aggressive driving behavior? An empirical analysis using Naturalistic Driving Study data," Accident Analysis and Prevention, vol. 138, Apr. 2020, doi: 10.1016/j.aap.2019.105361.
[23] T. Chen, N. N. Sze, S. Saxena, A. R. Pinjari, C. R. Bhat, and L. Bai, "Evaluation of penalty and enforcement strategies to combat speeding offences among professional drivers: A Hong Kong stated preference experiment," Accident Analysis and Prevention, vol. 135, Feb. 2020, doi: 10.1016/j.aap.2019.105366.
[24] D. M. Martínez-Ruíz et al., "Impact evaluation of camera enforcement for traffic violations in Cali, Colombia, 2008–2014," Accident Analysis and Prevention, vol. 125, pp. 267–274, Apr. 2019, doi: 10.1016/j.aap.2019.02.002.
[25] K. Shaaban and A. Pande, "Evaluation of red-light camera enforcement using traffic violations," Journal of Traffic and Transportation Engineering (English Edition), vol. 5, no. 1, pp. 66–72, Feb. 2018, doi: 10.1016/j.jtte.2017.04.005.
[26] B. Custers and B. Vergouw, "Promising policing technologies: Experiences, obstacles and police needs regarding law enforcement technologies," Computer Law and Security Review, vol. 31, no. 4, pp. 518–526, Aug. 2015, doi: 10.1016/j.clsr.2015.05.005.

Expert System to Predict Acute Inflammation of
Urinary Bladder and Nephritis Using Naïve Bayes
Method
1st Ria Arafiyah, Computer Science Department, State University of Jakarta, Jakarta, Indonesia, [email protected]
2nd Diyah Anggraeny, Computer Science Department, State University of Jakarta, Jakarta, Indonesia, [email protected]
3rd Rachel Haryawan, Computer Science Department, State University of Jakarta, Jakarta, Indonesia, [email protected]
4th Zakiyah Hamidah, Computer Science Department, State University of Jakarta, Jakarta, Indonesia, [email protected]

Abstract— Bad life habits, such as not consuming enough water and often delaying the urge to urinate, are the causes of bladder-related diseases. In this study, a solution to this problem is offered by developing an expert system that can predict acute nephritis and acute inflammation of the urinary bladder using Naïve Bayes, one of the classification algorithms that is often used and receives much attention from researchers for prediction problems. The data utilized include a list of patient symptoms: fever, nausea, lumbar discomfort, urinary pushing (continuous urination), urethra burning and micturition pain, and itching and urethra fluid swelling. The results of the diagnostic analysis consist of a confusion matrix and system accuracy values. The accuracy of predicting acute inflammation of the urinary bladder is 83%, and that of acute nephritis of renal pelvis origin is 96%.

Keywords— Expert System, Naïve Bayes, Python, Scikit-learn, inflammation of urinary bladder, nephritis

I. INTRODUCTION
There are two types of bladder-related diseases that are the focus of this study, namely inflammation of the urinary bladder and nephritis. Inflammation of the urinary bladder is an infectious disease that attacks the bladder. It is caused by several bad habits, such as consuming too little water and often delaying the urge to urinate. Inflammation of the urinary bladder can also be induced by a chronic or subclinical infection, autoimmunity, or genetic predisposition, all of which can trigger an inflammatory response [10]. Meanwhile, nephritis is a disease in which the kidneys become inflamed. Nephritis can be caused by several factors, such as immune system disease, a history of cancer, and abscesses that arise in other parts of the body and spread to the kidneys through the bloodstream.

According to Indonesia's Health Ministry estimates, 90–100 cases per 100,000 people a year, or around 180,000 new cases every year, is the number of patients with an infected or inflamed bladder in Indonesia [11]. This disease occurs easily when the immune system begins to weaken. At first the disease attacks only the reproductive organs, but it can spread to other organs, accompanied by the bacteria, fungi, and germs that cause infection (Nugroho, 2010). Nephritis, if not treated, can cause the kidneys to enter a critical period and stop working (kidney failure). Kidney failure is the catastrophic disease with the second-highest health costs, after heart disease [1].

Expert systems are becoming increasingly useful in solving problems like predicting inflammation of the urinary bladder and nephritis as science and information technology related to Artificial Intelligence (AI) advance. An expert system is computer software that automates tasks typically undertaken by human experts. A knowledge expert, who examines how human experts make decisions and interprets their rules into the computer, is required to design an expert system [2].

This study aims to model an expert system that uses Naïve Bayes to predict inflammation of the urinary bladder and nephritis, because both of these diseases are deadly. Naïve Bayes is a classification method based on the theorem of Bayes which assumes independence strongly and

978-1-6654-4002-8/21/$31.00 ©2021 IEEE

243 28 October 2021, Jakarta - Indonesia


2021 1st International Conference on Computer Science and Artificial Intelligence (ICCSAI)

naively. This algorithm has several advantages: it is relatively simple to understand and build, faster in predicting classes than other classification algorithms, and easy to train using small datasets [8].

Data collected from patients who experience symptoms of urinary bladder irritation and acute nephritis are employed in this study. 80% of the data are used to train the model, and 20% are utilized to test the trained model for the application of the Naïve Bayes algorithm. The result is a confusion matrix and an accuracy value: 83% for bladder inflammation and 96% for nephritis.

II. METHOD

A. Naïve Bayes
Naïve Bayes is a form of algorithm for the classification of data in which the algorithm is able to construct a series of probabilities from a certain dataset. The underlying method was first presented by the British scientist Thomas Bayes, and it is known as Bayes' theorem. It is used to forecast future possibilities based on past experience. Statistical classification methods are used to predict the probability of membership in a class using the Naïve Bayes algorithm. Naïve Bayes is based on the theorem of Bayes, with decision-making and categorization skills comparable to neural networks. The Bayes theorem approach is then supplemented with naivety, which assumes independence between attributes. Bayes' theorem takes its name from a mathematician who was also a British Presbyterian minister, namely Thomas Bayes (1702-1761) (Bramer, 2007) [3].

Bayes' rule is defined as:

    P(Y|X) = P(X|Y) · P(Y) / P(X)

In another form it can be written as:

    P(Y|X) = P(X|Y) · P(Y) / (P(X|Y) · P(Y) + P(X|~Y) · P(~Y))

Explanation:
X and Y: Events
P(Y|X): Likelihood of event Y, given event X is true.
P(Y): Prior likelihood of Y.
P(X|Y): Likelihood dependent on the hypothesis condition.
P(X): Evidence.

The Naïve Bayes theorem or formula is a combination of Bayes' theorem and the assumption of independence (naivety), so that it becomes a new theorem called Naïve Bayes or Gaussian density. The following is the Gaussian density formula used by Naïve Bayes:

    P(x|C) = (1 / √(2πσ²)) · exp(−(x − µ)² / (2σ²))

with the mean formula (µ), namely:

    µ = (1/n) Σ_{i=1..n} x_i

and the standard deviation formula (σ):

    σ = √( (1/(n−1)) Σ_{i=1..n} (x_i − µ)² )

Explanation:
P = Probability
X = Attribute
C = Class
n = The amount of data
x_i = The i-th data value
µ = Mean
σ = Standard deviation
σ² = Variance

The advantage of using Naïve Bayes is that it is simple to use and useful for big amounts of data. As a result, Naïve Bayes is recognized as one of the finest classification methods [4].


Fig. 1. Flowchart of Naïve Bayes Algorithm Implementation

Figure 1 is a flowchart diagram of how the Naïve Bayes algorithm is used or implemented in this study: it starts with entering the symptom data input, then the disease data input, after which the program searches for the naive probabilities. After completion of the diagnosis, the results are obtained.

B. Dataset
The dataset to be tested is the acute inflammation dataset, which was obtained from the web page {https://archive.ics.uci.edu/ml/datasets/Acute+Inflammations} [5].

A medical expert created the dataset as a collection of data to be tested by an expert system that makes a suspected diagnosis of two diseases of the urinary system.

The main goal of this dataset is to create an expert system algorithm that will perform a suspected diagnosis of two urinary system diseases. It could be a diagnosis of acute bladder inflammation and acute inflammation of nephritis, for example. To understand more about these two diseases, we can consider the definitions of the two diseases given by medical personnel. This data was gathered by a medical expert as part of a test of an expert system that performs a suspected diagnosis of two diseases of the urinary system. Rough Sets Theory is the basis for detection [6].

In this acute inflammation dataset, there are 120 data instances with 6 attributes and 2 classes (decisions). The attributes consist of { "Temperature of patient", "Occurrence of nausea", "Lumbar pain", "Urine pushing (continuous need for urination)", "Micturition pains", "Burning of urethra, itch, swelling of urethra outlet" }. Meanwhile, the 2 class labels (decisions) consist of {"Inflammation of urinary bladder"} and {"Nephritis of renal pelvis origin"}.

TABLE I
Acute Inflammation Dataset Information Table

No.  Attributes                          Value
1    Temperature of patient              [32°C - 42°C]
2    Occurrence of nausea                No, Yes
3    Lumbar pain                         No, Yes
4    Urine pushing                       No, Yes
5    Micturition pains                   No, Yes
6    Burning of urethra                  No, Yes
7    Inflammation of urinary bladder     No, Yes
8    Nephritis of renal pelvis origin    No, Yes

C. Dataset Transform
The obtained dataset is processed using the rough set method. The Rough Set method is a mathematical technique developed by Pawlak in 1982 [7]. It is an efficient technique for KDD (Knowledge Discovery in Databases) processes and Data Mining. The Rough Set method has been used in a variety of applications, including engineering design, image processing, decision analysis, and medicine.

The following is the table of dataset attributes after processing with the rough set method:

TABLE II
Acute Inflammation Dataset Attribute Transformation Table

No.  Attributes                Code  Value
1    Temperature of patient    c1    [1 - 44]
2    Occurrence of nausea      c2    1, 2
3    Lumbar pain               c3    1, 2
4    Urine pushing             c4    1, 2
5    Micturition pains         c5    1, 2

6    Burning of urethra        c6    1, 2

In addition, the classes in the dataset are also processed using the rough set method. The following is the dataset class table after processing:

TABLE III
Acute Inflammation Dataset Class Transformation Table

No.  Class                               Code  Value
1    Inflammation of urinary bladder     d1    1, 2
2    Nephritis of renal pelvis origin    d2    1, 2

Explanation for Table II and Table III:
1-10  = Normal temperature (35°C - 37°C)
11-20 = Subfebrile state (37°C - 38°C)
21-29 = Febrile state (38°C - 40°C)
30-44 = High fever (above 40°C)
1 = No
2 = Yes

Acute urine bladder inflammation is characterized by acute discomfort in the abdomen due to constant urination, pain during urination and, in some instances, urinary retention. Body temperature is elevated, but most often not above 38°C.

Meanwhile, acute nephritis of renal pelvis origin occurs more frequently in women than in men. It starts with a high fever that reaches and sometimes exceeds 40°C. The fever is accompanied by chills and low back pain on one or both sides, which can sometimes be very strong. Not infrequently, nausea and vomiting occur and the pain spreads throughout the abdomen.

Therefore, to determine which criteria indicate acute inflammation of the urinary bladder or acute inflammation of nephritis, a set of rules was first made based on expert system rules, as follows:

R1:  IF (c1=11-20)(c2=1)(c3=2)(c4=1)(c5=1)(c6=1) THEN (d1=1)(d2=1)
R2:  IF (c1=21-29)(c2=1)(c3=1)(c4=1)(c5=1)(c6=1) THEN (d1=1)(d2=1)
R3:  IF (c1=1-10)(c2=1)(c3=2)(c4=1)(c5=1)(c6=1) THEN (d1=1)(d2=0)
R4:  IF (c1=30-34)(c2=1)(c3=2)(c4=2)(c5=1)(c6=2) THEN (d1=1)(d2=2)
R5:  IF (c1=21-29)(c2=2)(c3=2)(c4=1)(c5=2)(c6=1) THEN (d1=1)(d2=2)
R6:  IF (c1=30-44)(c2=1)(c3=2)(c4=2)(c5=1)(c6=2) THEN (d1=1)(d2=2)
R7:  IF (c1=11-20)(c2=1)(c3=1)(c4=2)(c5=2)(c6=1) THEN (d1=2)(d2=1)
R8:  IF (c1=11-20)(c2=1)(c3=1)(c4=2)(c5=2)(c6=2) THEN (d1=2)(d2=1)
R9:  IF (c1=11-20)(c2=1)(c3=1)(c4=2)(c5=1)(c6=1) THEN (d1=2)(d2=1)
R10: IF (c1=1-10)(c2=1)(c3=1)(c4=2)(c5=2)(c6=2) THEN (d1=2)(d2=1)
R11: IF (c1=30-44)(c2=2)(c3=2)(c4=2)(c5=2)(c6=2) THEN (d1=2)(d2=2)
R12: IF (c1=30-44)(c2=2)(c3=2)(c4=2)(c5=2)(c6=1) THEN (d1=2)(d2=2)

Interpreted, R1 (Rule 1) has the following meaning:
IF:
Temperature = 37°C - 38°C
Nausea = NO
Lumbar pain = YES
Urine pushing = NO
Micturition pain = NO
Burning of urethra = NO
THEN (the conclusion):
It is NOT acute inflammatory bladder disease and it is NOT nephritis of renal pelvis origin.

III. ANALYSIS AND RESULT

The expert system to predict acute inflammation of urinary bladder and nephritis with the Naïve Bayes method is implemented using the Scikit-learn library in Python 3.7.
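A minimal Scikit-learn sketch of this workflow (Gaussian Naïve Bayes, an 80/20 split with random_state 0, then a confusion matrix and classification report) might look as follows. The stand-in data below is invented so that the snippet is self-contained; the actual study uses the 120-instance acute inflammation dataset, with symptoms coded 1 = no and 2 = yes:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix, classification_report

# Invented stand-in data: one temperature attribute plus five symptoms
# coded 1 (no) / 2 (yes), mirroring the shape of the real dataset.
rng = np.random.default_rng(0)
n = 120
temperature = np.where(rng.random(n) < 0.5, 36.5, 40.5) + rng.normal(0, 0.3, n)
symptoms = rng.integers(1, 3, size=(n, 5))
X = np.column_stack([temperature, symptoms])
y = np.where(temperature > 38.5, 2, 1)  # toy label rule so the data is learnable

# 80% training / 20% testing split with random_state 0, as in the paper.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = GaussianNB().fit(X_train, y_train)
y_pred = model.predict(X_test)

print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
```

On the real dataset, the same calls produce the confusion matrices and classification reports discussed in the next section.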


In the dataset used there are no missing values, as shown in Figure 2.

Fig. 2. Check Missing Values

Then the data is transformed using a label encoder. The values of Temperature are transformed to nominal values from 1 to 44. For the other symptoms, the value "no" is transformed to 1 and the value "yes" is transformed to 2. The result is shown in Figure 3.

The training data used is 80%, or 96 of the 120 data instances, and the testing data used is 20%, or 24 of the 120 data instances (Figure 3). The random state used is 0.

Fig. 3. Data Transform

Both classes have a confusion matrix with True Positive, False Positive, False Negative and True Negative counts. The term "True Positive" refers to a prediction that is both positive and correct. When a prediction is positive but incorrect, it is called a False Positive. When a prediction is negative but incorrect, it is called a False Negative. And when a prediction is negative and correct, it is called a True Negative.

Both classes also have a classification report with accuracy, precision, recall, F1 score, etc. Precision is the proportion of cases predicted positive that are positive in the actual data. The proportion of actually positive cases that are predicted positive is called recall or sensitivity. The F1 score is the harmonic mean of precision and recall; the best value is 1 and the worst value is 0 [9]. Accuracy is the percentage of data predicted correctly by the system.

For the first class, the confusion matrix and the classification report are shown in Figure 4. For the confusion matrix, the True Positive is 9, the False Positive is 4, the False Negative is 0, and the True Negative is 11. For the classification report, the precision is 1 for the result "no" and 0.73 for "yes", the recall is 0.69 for "no" and 1 for "yes", the F1 score is 0.82 for "no" and 0.85 for "yes", and the accuracy is 0.83 or 83%.

Fig. 4. Class 1 Confusion Matrix & Classification Report

For the second class, the confusion matrix and the classification report are shown in Figure 5. For the confusion matrix, the True Positive is 17, the False Positive is 0, the False Negative is 1, and the True Negative is 6. For the classification report, the precision is 0.94 for "no" and 1 for "yes", the recall is 1 for "no" and 0.86 for "yes", the F1 score is 0.97 for "no" and 0.92 for "yes", and the accuracy is 0.96 or 96%.

Fig. 5. Class 2 Confusion Matrix & Classification Report

IV. CONCLUSION

The expert system to predict acute inflammation of urinary bladder and nephritis using the Naïve Bayes method can diagnose acute inflammation of the urinary bladder with 83% accuracy and nephritis of renal pelvis origin with 96% accuracy. These results are quite good because the percentage of accuracy is more than 80% for the first class and more than 90% for the second class.

REFERENCES
[1] Z. Rozikin, "Sistem Pakar Diagnosis Penyakit Ginjal Dengan Menggunakan Metode Dempster Shafer" [Expert System for Diagnosing Kidney Disease Using the Dempster-Shafer Method], Ph.D. dissertation, Informatics Engineering, Sekolah Tinggi Teknologi Pelita Bangsa, Cikarang, 2018.


[2] S. S. Abu-Naser and M. Z. Shaath, "Expert System Urination Problems Diagnosis," World Wide Journal of Multidisciplinary Research and Development, vol. 2, pp. 9-19, 2016. Available: https://www.researchgate.net/publication/303676962_Expert_system_urination_problems_diagnosis
[3] M. Bramer, Principles of Data Mining, 1st ed. London: Springer, 2007.
[4] N. A. Zaidi, J. Cerquides, M. J. Carman, and G. I. Webb, "Alleviating Naive Bayes Attribute Independence Assumption by Attribute Weighting," Journal of Machine Learning Research [Online], vol. 14, pp. 1947-1988, July 2013. Available: https://jmlr.org/papers/volume14/zaidi13a/
[5] J. Czerniak, Acute Inflammations Data Set [Online], 2019. Available: https://archive.ics.uci.edu/ml/datasets/Acute+Inflammations
[6] J. Czerniak and H. Zarzycki, "Application of rough sets in the presumptive diagnosis of urinary system diseases," in 9th International Conference, ACS'2002, Międzyzdroje, Poland, 2003, pp. 41-51.
[7] Z. Pawlak, "Rough Sets," International Journal of Computer & Information Sciences, vol. 11, pp. 341-356, 1982.
[8] P. Kaviani and S. Dhotre, "Short Survey on Naïve Bayes Algorithm," vol. 11, no. 4, pp. 607-611, 2017. Available: https://www.researchgate.net/publication/323946641_Short_Survey_on_Naive_Bayes_Algorithm
[9] R. Arafiyah et al., "Classification of Dengue Haemorrhagic Fever (DHF) using SVM, naive bayes and random forest," in 3rd Annual Applied Science and Engineering Conference, Bandung, 2018.
[10] S. Grover et al., "Role of Inflammation in Bladder Function and Interstitial Cystitis," Therapeutic Advances in Urology, vol. 3, no. 1, pp. 19-33, February 2011. Available: https://www.researchgate.net/publication/51521136_Role_of_inflammation_in_bladder_function_and_interstitial_cystitis
[11] D. Setiawati, D. Kurniawan, Riskawati, and S. Tarigan, "Gambaran Tingkat Pengetahuan Mengenai Penyakit Infeksi Saluran Kemih pada Mahasiswa/I Semester I dan III di Akademi Keperawatan Husada Karya Jaya" [Overview of the Level of Knowledge About Urinary Tract Infection Among Semester I and III Students of the Husada Karya Jaya Nursing Academy], vol. 1, no. 1, pp. 33-36, March 2015. Available: https://garuda.ristekbrin.go.id/documents/detail/777005


The Search for the Best Real-Time Face Recognition Method for Finding Potential COVID Patients

Kevin Eugene, Reginald Patrick, Samuel Wijaya, Edy Irwansyah
Computer Science Department, School of Computer Science, Bina Nusantara University, Jakarta, Indonesia
[email protected], [email protected], [email protected], [email protected]
Abstract— Face detection and face recognition have been a part of our daily life for a while now. They usually help speed up identification processes in many places. In early 2020, COVID-19 became a worldwide pandemic and drove many people to find solutions to this problem. Our contribution to the case is finding the best method to spot potential COVID patients before they infect more people in public places by using facial detection and recognition, following the PRISMA Flowchart methodology, which helps authors systematically analyze relevant publications and improve the quality of reports and meta-analyses. The first matter to be solved is to find the most used algorithms in facial detection and recognition, followed by finding out which is the best one to implement for our case study. Our findings suggest that algorithms that can detect and recognize faces under occluded conditions work best for this case.

Keywords—Face Recognition, Convolutional Neural Networks, Face Detection, Face Identification, Support Vector Machine

I. INTRODUCTION

In this modern era, face recognition technology is well known and widely utilized. Face Recognition is a facial recognition technology that utilizes artificial intelligence (AI) to recognize the faces of people who have registered in its database. The way Face Recognition works is that the camera and AI scan people's faces in detail, and the data is then stored on a server. Face Recognition is very helpful in improving security, but it also has a disadvantage, namely the occurrence of illegal actions such as selling the scanned data, which threatens people's privacy. The coronavirus (CoV) is a huge virus family that can infect birds, animals, and humans. Recently, a new coronavirus known as COVID-19 emerged, triggering an outbreak in China in December 2019 that broke out in various countries, so that WHO declared it a global pandemic.

Finding patients is very difficult: the faces of different patients must be matched very specifically, body temperature tends to fluctuate, and testing is still manual, which can wrongly flag a patient who is not infected with Covid-19 and makes detection take a very long processing time. Since there are lots of different facial points in humans, and some are the same, a very precise match is needed; when accompanied by body temperature, the accuracy obtained in trials is still small.

The Covid-19 pandemic has had a wide impact on various countries in the world. This made countries issue various policies to detect the spread of the virus. One of them is the use of identity in tracing user contacts with patients who have contracted Covid-19. In facial recognition research to detect patients, each algorithm used has its own challenges, namely face position, detailed sample images, and the database used. Each of these algorithms also has its own level of accuracy for detecting the patient's face. Here, facial recognition works to detect patients infected with Covid-19 through an AI algorithm with infrared thermal technology to identify patients.

In this publication, we propose ways to use face detection and recognition technology to recognize and immediately treat people who are close to patients who have previously been found positive. Various methods will be applied to all available videotapes in all areas that positive patients visited in the past week. Therefore, we raise 2 issues in this application. The first issue is to look for the most widely used means of face recognition and detection. The second issue is to find the most appropriate method to apply to the main problem, namely finding the people around a facial recognition target, the COVID-positive patient. This paper is structured as follows: we talk about face detection and recognition in general in Section 2.1, and its algorithms and techniques are explained in Section 2.2. In Section 3, we explain the methodology used when searching for resources and references about facial detection and recognition. Section 4 is where we describe our research findings through the questions inferred in Section 4.1 and Section 4.2.

II. THEORETICAL BACKGROUND

The PRISMA checklist methodology is used as a model for the systematic review in this literature study. The author has chosen the PRISMA method as it allows the author to define the eligibility criteria, information sources, selection of literature, data collection and selection of data to be included, and after that to report the number of reviews and meta-analyses that the author has carried out in the


process of writing this review paper, and because the PRISMA method covers both the concepts and the topics involved.

A. Face Detection and Recognition
Face recognition is a method applied to existing technologies such as computers and cameras so that the technology can recognize faces. The development of this technology is very helpful for human work, for example for the police and for the health sector. One interesting system to design and realize is one that finds out whether a person's body temperature has increased or decreased; this is very helpful in the fields of health and safety, especially now that there is Covid-19, which requires technology to minimize virus exposure. Before this can be done, the system must first detect the face whose temperature is to be measured by the camera. The method often used to detect the face is the Haar Feature-Based Cascade Classifier, which removes unnecessary features in areas that are not needed.

B. Face Detection and Recognition Techniques and Algorithms
One of the methods used is the Viola-Jones method, which detects human faces in data taken from the device's video camera. This method has three components: the integral image, the AdaBoost machine learning method and the cascade classifier method; areas detected as a face are marked, and there is no marker for undetected areas of the face. The Haar Cascade Algorithm is used as the foundation for object detection, especially face recognition, in an image or video. This algorithm implements a cascade function to train on images through 4 main stages: (1) determine the Haar features, (2) create integral images, (3) AdaBoost training and (4) perform classification with the cascading classifier. The speeded-up robust features (SURF) method is also a face detection method that uses key points; a key point is a part of a face or image whose value remains strong/fixed under scale changes, rotation, blurring, 3-dimensional transformations, lighting changes and changes in shape.

III. METHODOLOGY

The PRISMA checklist approach is used as the model for the systematic review in this literature review. The authors have chosen the PRISMA method as it allows them to report the number of reviews and meta-analyses carried out in the process of writing this review paper, and also because the PRISMA method covers concepts and topics that are generally covered within other systematic reviews [1].

Fig. 1. Prisma Flow Diagram

The inclusion criteria used in this article are articles whose topics are relevant to "Face Detection" and "Face Recognition" and that are written in English. The reference articles used must also have been published in the last 5 years, namely between 2016 and 2021. The exclusion criteria are duplicate articles and articles with no full text available. The inclusion and exclusion criteria were used to select the main studies.

Table 1 below gives a quick overview of the number of journals, papers, and articles that were reviewed for use in authoring this study. The selection process is as follows:

1. Identification, conducted by searching an online database through Google Scholar. Some of the sources used are ScienceDirect, IOP Science, and IEEE. The phrases used in searching for relevant articles are "Face Recognition" and "Face Detection".
2. Screening, where we sort out papers that are duplicated and delete them so that the results are better and relevant to the topic.
3. Eligibility, where, by reading the abstract of each selected article, articles that address irrelevant questions or are off the topic of the research are eliminated.
4. Included, where a review is carried out on all articles that address problems related to the research topic.

A. Data Extraction and Synthesis
The data extraction carried out after selecting the articles was done by reading each article's entire text. In the process of extraction and synthesis, the extracted items are the ID, reference, methodology, and the context evaluated from the study.
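The integral image on which the Viola-Jones and Haar cascade techniques of Section II rely can be sketched in a few lines of pure Python (the 4x4 "image" and the left-half-minus-right-half Haar-like feature below are invented for illustration):

```python
def integral_image(img):
    # ii[y][x] = sum of img over the rectangle [0..y-1] x [0..x-1];
    # an extra row and column of zeros keeps the lookups branch-free.
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii, top, left, bottom, right):
    # Sum of img[top..bottom][left..right] from just four lookups, O(1).
    return (ii[bottom + 1][right + 1] - ii[top][right + 1]
            - ii[bottom + 1][left] + ii[top][left])

# A Haar-like feature is then just a difference of rectangle sums,
# e.g. (left half) - (right half) on an invented 4x4 "image":
img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12],
       [13, 14, 15, 16]]
ii = integral_image(img)
feature = rect_sum(ii, 0, 0, 3, 1) - rect_sum(ii, 0, 2, 3, 3)
```

Because any rectangle sum costs only four lookups, thousands of Haar-like features can be evaluated per detection window, which is what makes cascade classifiers fast.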


TABLE I. DATA EXTRACTION

Extracted Data   Description
ID               A unique letter or code assigned to each paper that is referenced
Reference        Includes the title of the paper, the authors, the method, and the year of publication
Methodology      The methods used by the reference papers, especially the methods related to face recognition
Context          The result of evaluating the methods related to face recognition; the algorithm that is most often used and has the best performance is chosen

IV. RESULT

A. What are the most used algorithms in face detection and recognition?
Several different methods have been applied for Face Recognition and Face Detection. Some of the techniques used in the reviewed papers are CNN (Convolutional Neural Network), SVM (Support Vector Machine), the Hybrid Approach with Geometrical Features, and PCA (Principal Component Analysis).

The CNN method has been used in the research of [2] and [3]. In the paper written by [2], the authors use a framework that contains CNN-based face detection with face tracking and facial recognition algorithms, implemented on an embedded GPU system. The results of their experiments show that the system can recognize multiple faces, up to 8 at once, in real time with a short processing time.

The SVM method has been used in the research of [4]. They use the SVM method as an algorithm for detecting several faces, and their experiments yield fairly good results compared to other face detection methods, with accuracy reaching 90%.

With the development of deep learning, face recognition technology based on CNN (Convolutional Neural Network) has become the main method adopted in the field of face recognition. A CNN architecture is proposed for the face recognition problem, with a solution that is capable of handling facial images that contain occlusions, poses, facial expressions and varying illumination. CNN has two stages, namely classification using feedforward passes and learning using backpropagation. CNN works similarly to an MLP, but in CNN each neuron is presented in two dimensions; CNN is a further development of the MLP because it uses a similar method with more dimensions.

In this CNN algorithm, the preceding layer's input is a 2-dimensional array instead of a 1-dimensional array, and the layers are pooled/merged. If the features correspond to a human face, the first layer reflects strokes in different directions, while the second layer shows elements such as the shape of the eyes, nose, and mouth. On the third layer, which is still in the form of strokes, a combination of the features of the eyes, nose, and mouth is developed, culminating in the face of a specific person.

The CNN (Convolutional Neural Networks) method has been used by several different authors as a method of face recognition and face detection. About 8 of the articles reviewed in this paper use it, so in fact the CNN method is one of the most effective methods in face detection and face recognition.

TABLE II. PUBLICATIONS ON THE MOST COMMONLY USED MACHINE LEARNING BASED TECHNIQUES IN FACE DETECTION AND FACE RECOGNITION

Machine Learning Technique              Number of Papers   Paper Reference
CNN (Convolutional Neural Networks)     8                  [2], [3], [5], [8], [9], [10], [11], [13]
SVM (Support Vector Machine)            2                  [6], [12]
Hybrid Approach Geometrical Features    2                  [4], [7]

B. Which algorithm is better suited to be applied at detecting potential COVID-19 patients?
There are a lot of interesting algorithms and methods used for facial detection and recognition among the papers we have collected. While each method and algorithm has its own advantages, CNN (and its variations) is still the most used algorithm for facial detection and recognition, based on [2], [3], [5], [8], [9], [10], [11], [13]. However, the most used algorithm is not always the answer to every relevant case. Many of the papers we reviewed focus on the highest accuracy of the result. In our case, however, with governments already enforcing rules about public safety protocols and regulations, many people wear face masks to follow them. This heavily influences the ability of many algorithms and techniques to detect and recognize faces. Therefore, we concluded that a sequence of algorithms needs to be applied before the final recognition of the faces: the best algorithm that can detect a face even if it is occluded, the best algorithm that can distinguish between the occlusion and the face, and the best algorithm that can recover and reconstruct the occluded face, before finally using CNN as the preferred method to recognize the face [14], with the review of techniques from [15] to support us.

The proposed algorithm for the best technique that can detect faces under occluded conditions is the Face Attention Network (FAN), an occluded face detection technique based on fusing decisions over sub-regions [16]. With an 88.5% success rate, FAN is the most convincing choice for the first step of the sequence. For the second step, we propose the multiple classifier system technique using the Lophoscopic PCA method, as it is the only technique that can distinguish the face mask region (occlusion) of a detected face from the actual region of the subject's face with a good success rate. Finally, we propose the sparse representation classifier technique, which uses sparse representation classification (SRC) methods for


the final step in the sequence. This is the best technique, as it can yield the best reconstruction results using an iterative recovery stage.

V. CONCLUSION

In this paper, we reviewed recent papers on different kinds of face detection and recognition methods, with the aim of finding the most effective method for detecting potential COVID-19 patients. At first, the highest-accuracy and least time-consuming method seemed preferable. However, with most people wearing masks in public, it became clear that we should use methods that can still detect and recognize faces in an occluded manner. Therefore, we concluded that the best method for this case is the Face Attention Network, followed by Lophoscopic PCA, before finally using sparse representation classification (SRC).

REFERENCES

[1]. Moher D, Liberati A, Tetzlaff J, Altman DG, Prisma Group. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. PLoS Medicine. 2009 Jul 21;6(7):e1000097.
[2]. Saypadith S, Aramvith S. Real-time multiple face recognition using deep learning on embedded GPU system. In 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) 2018 Nov 12 (pp. 1318-1324). IEEE.
[3]. Zeng D, Veldhuis R, Spreeuwers L. A survey of face recognition techniques under occlusion. arXiv preprint arXiv:2006.11366. 2020 Jun 19.
[4]. Kumar S, Singh S, Kumar J. Multiple face detection using hybrid features with SVM classifier. In Data and Communication Networks 2019 (pp. 253-265). Springer, Singapore.
[5]. He R, Wu X, Sun Z, Tan T. Wasserstein CNN: learning invariant features for NIR-VIS face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2018;41(7):1761-73.
[6]. Williford JR, May BB, Byrne J. Explainable face recognition. In European Conference on Computer Vision 2020 Aug 23 (pp. 248-263). Springer, Cham.
[7]. Kaur J, Singh H. Face detection and recognition: A review. BHSBIET.
[8]. Chaves D, Fidalgo E, Alegre E, Alaiz-Rodríguez R, Jáñez-Martino F, Azzopardi G. Assessment and estimation of face detection performance based on deep learning for forensic applications. Sensors. 2020 Jan;20(16):4491.
[9]. Chun LZ, Dian L, Zhi JY, Jing W, Zhang C. YOLOv3: face detection in complex environments. International Journal of Computational Intelligence Systems. 2020 Aug;13(1):1153-60.
[10]. Kak SF, Mustafa FM, Valente P. A review of person recognition based on face model. Eurasian Journal of Science & Engineering. 2018 Sep;4(1):157-68.
[11]. Li Q. An improved face detection method based on face recognition application. In 2019 4th Asia-Pacific Conference on Intelligent Robot Systems (ACIRS) 2019 Jul 13 (pp. 260-264). IEEE.
[12]. Li Y, Shan S, Wang R, Cui Z, Chen X. Fusing magnitude and phase features with multiple face models for robust face recognition. Frontiers of Computer Science. 2018 Dec;12(6):1173-91.
[13]. Malhotra S, Aggarwal V, Mangal H, Nagrath P, Jain R. Comparison between attendance system implemented through haar cascade classifier and face recognition library. In IOP Conference Series: Materials Science and Engineering 2021 (Vol. 1022, No. 1, p. 012045). IOP Publishing.
[14]. Ramadhani AL, Musa P, Wibowo EP. Human face recognition application using PCA and eigenface approach. In 2017 Second International Conference on Informatics and Computing (ICIC) 2017 Nov 1 (pp. 1-5). IEEE.
[15]. Wang F, Chen L, Li C, Huang S, Chen Y, Qian C, Loy CC. The devil of face recognition is in the noise. In Proceedings of the European Conference on Computer Vision (ECCV) 2018 (pp. 765-780).
[16]. Wang J, Yuan Y, Yu G. Face attention network: an effective face detector for the occluded faces. arXiv preprint arXiv:1711.07246. 2017.

Waste Classification Using EfficientNet-B0


William Mulim Muhammad Farrel Revikasha Rivandi
Computer Science Department Computer Science Department Computer Science Department
School of Computer Science School of Computer Science School of Computer Science
Bina Nusantara University Bina Nusantara University Bina Nusantara University
Jakarta, Indonesia Jakarta, Indonesia Jakarta, Indonesia
[email protected] [email protected] [email protected]

Novita Hanafiah
Computer Science Department
School of Computer Science
Bina Nusantara University
Jakarta, Indonesia
[email protected]

Abstract—Waste management has become one of the emerging problems. A way to speed up the whole process is by doing waste sorting, which can be done by a computer using image recognition. EfficientNet-B0 can be utilized in this scenario due to its more efficient architecture and comparable performance with other deep convolutional neural networks. For this experimentation, we did transfer learning and fine-tuning on it, followed by hyperparameter exploration. We also did the same process on a few other models, and EfficientNet-B0 achieves the best accuracy, at 96% on training, with one of the smallest models. While we got 91% accuracy on validation, we also discovered that our model has noticeable difficulty in classifying recyclable waste.

Keywords—Waste Image Classification, EfficientNet, Deep Learning, Confusion Matrix, CNN

I. INTRODUCTION

Waste is an issue that needs serious attention. From year to year, it continues to increase in line with population growth. Without any proper countermeasure or better management strategy, we could add another problem to our environment on top of the many environmental problems that might occur or are still occurring.

In general, organic and inorganic waste is transported and piled up at the final disposal site (TPA). The amount of waste dumped in the TPA can be minimized by recycling. Recycling can turn previously unusable things into useful items.

Recycling organic and inorganic waste involves different processes, so it is necessary to separate them before the recycling process. Waste sorting carried out at the final disposal site (TPA) is more difficult than waste sorting at the trash bin. So far, the only available trash bins are the ones where sorting is carried out manually by people. Generally, according to Damanhuri and Padmi [1], waste can be sorted into two types, namely:

• Organic waste, often referred to as wet waste, is a type of waste that originates from the remains of living things, so it can be easily destroyed and decomposed in a natural way.

• Inorganic waste, often referred to as dry waste, is a type of waste whose substances are composed of non-organic compounds and usually comes from non-renewable natural resources such as petroleum, industrial processes, and minerals or mining.

As it is known, there are currently still many people who do not understand the different types of waste that they want to dispose of, so even though different trash bins for organic and inorganic types are provided, people still dispose of their garbage in the wrong places.

This of course becomes very troublesome in the effort to sort waste at the trash bin, which acts as the first place where the garbage gathers. Because of this, there is a need for a tool with an accurate classification method that can help people distinguish between the types of waste before disposing of it in the trash bin.

An object recognition system uses a camera as its data source; object recognition is a computer vision technique to identify objects in images or video, and is a main output of deep learning and machine learning algorithms. The recognition of these objects becomes an important element for a usable waste classification system. Several studies have conducted experiments in different ways. In general, object recognition on images uses CNN as the model [2]. A development of this model is the DNN, which is able to provide more precise accuracy [3]. Apart from DNNs, the development of CNNs has also produced a model that extracts images based on the constraints of different objects [4]. In addition to classification, the development of CNNs has produced a model that extracts data properly, so that the data can be processed and classified better [5]. However, overly deep DNN development resulted in very heavy and lengthy models to train, without providing significant real-world performance gains [6]. Because of that, an experiment appeared that aims to make DNNs more effective [7]. That experiment provides satisfactory results when compared with other models, without the time and computation the other models need. In the real world, this computation time is also an influencing factor. Therefore, our group plans to create a waste classification model based on the EfficientNet model. It is hoped that the resulting waste classification will give satisfactory results without taking a long time to recognize the objects.

II. LITERATURE REVIEW

Waste classification from obtained images can be approached in many different ways. The most common way is to use a DNN. Although a DNN can still have poorer accuracy when compared to other classifiers, such as SVM, the advantage of using a DNN is that, in the long term, it is easier to avoid overfitting problems with a DNN than with an SVM [8]. Out of

978-1-6654-4002-8/21/$31.00 ©2021 IEEE


the most common DNN models, the one with the best accuracy is ResNet-16 [3]. If explored further, the prediction accuracy on the obtained images can be improved, but at the cost of much longer computation and training time [6]. Some of those problems can be reduced by making modifications to the existing neural network, such as changing the calculation method in an existing layer. Another option is normalization, which accelerates convergence during training while simultaneously increasing accuracy [9]. Modifications to the optimizer (adding or changing it) can also make a difference to the accuracy [10]. Transfer learning from models trained on certain datasets can likewise provide better results without having to train from scratch [11]. Another possible modification is layer expansion, performing different calculations once instead of doing the same calculation multiple times [12].

Other than that, a hybrid model can be used, where a DNN serves as a feature extractor and the extracted features are then classified with other classifiers [10]. A development of the DNN-as-feature-extractor concept is the AutoEncoder, which compresses and extracts existing features from the image and then reconstructs it, so that the obtained features can be processed better [5]; with the right combination of optimizer and classifier, the accuracy can reach above 95%.

Another feasible way to increase accuracy is to modify the dataset to be processed. One option is data augmentation, which increases the dataset both in terms of total data amount and variation [13]. However, this can also reduce the accuracy of the model used.

The performance of waste classification can also be influenced by garbage images obtained through an external device in the form of a sensor [14]. With appropriate sensors, even a simple model can produce satisfactory classification results [15].

Another viable dataset modification lies in the preprocessing stage. One method is to use a feature descriptor [2]. Another option is adding a scoring layer that provides object predictions and an RoI (Region of Interest), so that the shape of the object can be processed better [15].

However, waste recognition is not limited to using DNNs. Object recognition can also be done with other developments of CNN. One of them is R-CNN, which recognizes objects in the image based on regions/blocks that are estimated to be objects [16]. Another model that emerged from the development of R-CNN is YOLO, where the model looks at the data in the image, creates bounding boxes with confidence values, and classifies those boxes in one process [4]. Yet another development of CNN is attention-based, where the CNN creates two image grids with different processing sizes, then combines the results of both to find information from the images that is relevant in the two processes, as though imitating the attention that humans have [17].

III. METHODOLOGY

For the pre-processing stage, we scale the images to 224x224 according to our model, then add an augmentation layer before the input is fed to the model. This is done to avoid over-training/overfitting during training. Our proposed model is EfficientNet-B0 with pre-trained weights from ImageNet.

Fig. 1 Architecture model for the experiment

A. EfficientNet

EfficientNet-B0 as an architecture is based on MnasNet with squeeze-and-excitation optimization in the MBConv layers [18]. EfficientNet differs in that it uses a compound scaling method to scale the model up from EfficientNet-B0. The compound scaling equations are as follows:

    depth:      d = α^φ
    width:      w = β^φ
    resolution: r = γ^φ
    s.t. α · β² · γ² ≈ 2,  α ≥ 1, β ≥ 1, γ ≥ 1

where φ is a user-defined coefficient. Using these equations, the total FLOPS increase by (α · β² · γ²)^φ. Since the constraint fixes α · β² · γ² ≈ 2, the expected FLOPS increase is around 2^φ. With the base model (in the original paper's case, EfficientNet-B0) φ is fixed to 1, so the assumed available resource is around twice the original FLOPS. For EfficientNet-B0, the fitted values were α = 1.2, β = 1.1, γ = 1.15 (equivalently d, w, r at φ = 1); these numbers are fixed as constants, and the model is scaled up by changing φ in the equations.

B. Random Translation

Translation is a process where the image is shifted in a direction. In our augmentation layer, we used a factor of 0.1 for both height and width, meaning that the image is shifted by 0 to 10% of its original position, both vertically and horizontally.

Fig. 2 Example of random translation (right) from an image (left)
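The compound-scaling arithmetic above is easy to check numerically. The sketch below (plain Python, using the α, β, γ values quoted for EfficientNet-B0; the function names are ours, for illustration only) computes the depth/width/resolution multipliers and the expected FLOPS growth for a given φ:

```python
# Compound scaling sketch for EfficientNet (constants from Tan & Le [7]).
# alpha, beta, gamma were searched with phi = 1 and then fixed; larger
# variants are obtained only by increasing the user-defined coefficient phi.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # depth, width, resolution bases

def compound_scale(phi):
    """Return the (depth, width, resolution) multipliers for a given phi."""
    return ALPHA ** phi, BETA ** phi, GAMMA ** phi

def flops_multiplier(phi):
    """FLOPS grow as (alpha * beta^2 * gamma^2) ** phi, i.e. roughly 2 ** phi."""
    return (ALPHA * BETA ** 2 * GAMMA ** 2) ** phi

# The searched constants satisfy the constraint alpha * beta^2 * gamma^2 ~= 2:
assert abs(ALPHA * BETA ** 2 * GAMMA ** 2 - 2.0) < 0.1

d, w, r = compound_scale(1)  # phi = 1 recovers the B0 multipliers 1.2, 1.1, 1.15
print(d, w, r, flops_multiplier(1))
```

With φ = 1 the FLOPS multiplier evaluates to about 1.92 ≈ 2, matching the "around twice the original FLOPS" budget described above.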


C. Random Rotation

Rotation is a process where the image is rotated by some degree. In our augmentation layer, we used a factor of 0.15 for the random rotation. This rotates the picture by 0-15% of 2π (i.e., 0-54 degrees) counter-clockwise.

Fig. 3 Example of random translation and rotation of an image

D. Random Contrast

Contrast in an image affects the visibility between objects in that image. In our augmentation layer, we add contrast to the image by 0 to 12.5% of the original image.

Fig. 4 Example of random translation, rotation, and additional contrast of an image

E. Random Flip

Image orientation can affect the prediction result, as some images are only valid in a certain orientation. In our augmentation layer, we flipped the image horizontally.

Fig. 5 Example of random translation, rotation, additional contrast, and flip of an image

F. Random Zoom

Image zoom affects how close the captured object appears compared to the actual distance. In our augmentation layer, we used zoom-in augmentation with a factor of 0.1 for height and width, which zooms the image in by 0-10% closer than the original image, both vertically and horizontally.

Fig. 6 Example of random translation, rotation, additional contrast, flip, and additional zoom of an image.

IV. EXPERIMENTS AND RESULT

In our experiment, we used the base model EfficientNet-B0 pre-trained with ImageNet weights. We modified the input layer by adding our own Input layer and an additional image augmentation layer. We added a few top layers, such as a 2D pooling layer, a Batch Normalization layer, and a dropout layer with a 0.5 rate. We then added a new Dense output layer with 2 outputs, matching the two classes of our dataset, with a SoftMax activation function.

A. Dataset

For this experiment, we used the Waste Classification data [19] from Kaggle as our dataset. This dataset contains 25,077 images, split with an 80/20 ratio: 20,062 images for training and 5,015 for testing. The dataset contains 13,966 organic images and 11,111 recyclable images. Dataset exploration shows that while the dataset contains various relevant images, the consistency of the image style is low. This could potentially give mixed results in the implementation, as there is a higher chance of misinterpretation. Another thing to note is that the dataset consists of many stock photos, which might not reflect real-life performance and results, as the noise in those photos is minimal.

Fig. 7 Example of the images contained in the dataset
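The augmentation steps in Sections B-F are applied through framework-provided augmentation layers in our pipeline; purely as an illustration, a minimal NumPy sketch of three of them (translation, horizontal flip, contrast), assuming images are float arrays in [0, 1], could look like this:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

def random_translate(img, factor=0.1):
    """Shift the image down/right by up to `factor` of its size, zero-filling."""
    h, w = img.shape[:2]
    dy = int(rng.integers(0, int(h * factor) + 1))
    dx = int(rng.integers(0, int(w * factor) + 1))
    out = np.zeros_like(img)
    out[dy:, dx:] = img[:h - dy, :w - dx]
    return out

def horizontal_flip(img):
    """Mirror the image along its vertical axis."""
    return img[:, ::-1]

def random_contrast(img, max_delta=0.125):
    """Stretch pixel deviations from the mean by up to `max_delta` (12.5%)."""
    factor = 1.0 + rng.uniform(0.0, max_delta)
    mean = img.mean()
    return np.clip(mean + (img - mean) * factor, 0.0, 1.0)
```

This is only a sketch of the operations' semantics; the factors (0.1 translation, 12.5% contrast) mirror the values used in our augmentation layer.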


B. Performance Evaluation

Our initial model uses the Adam optimizer with a learning rate of 0.00001 (1e-5), 25 epochs, and a batch size of 64. We used categorical cross-entropy as the loss, and accuracy, precision, and recall as the metrics. With these hyperparameters, the model reached 94.8% accuracy with a loss of 0.21.

Fig. 8 EfficientNet-B0 Base Model Validation Accuracy and Loss Curve

C. Hyperparameter and Experimentation

To further improve our model, we did hyperparameter tuning on the optimizer, learning rate, number of epochs, and batch size. For the optimizer, we tested SGD with momentum, RMSprop with momentum, and Nadam; all momentum values were set to 0.9. For the batch size, we tested 16 and 32 mini-batch sizes. For the learning rate, we tested 1e-4 and 1e-6 to smoothen model training. For the epochs, we tested 30 and 35 epochs. The best results we got were with Adam at a learning rate of 1e-4 and RMSprop at a learning rate of 1e-5, as shown in Table I.

TABLE I. RESULTS OF HYPERPARAMETER TESTING OF BASE MODEL

Optimizer | Learning rate | Batch Size | Epochs | Accuracy | Precision | Recall | Loss
Nadam     | 1e-5          | 64         | 25     | 94.68    | 94.68     | 94.68  | 0.20
SGDM      | 1e-5          | 64         | 25     | 87.48    | 87.48     | 87.48  | 0.32
RMSprop   | 1e-5          | 64         | 25     | 96.47    | 96.47     | 96.47  | 0.18
Adam      | 1e-4          | 64         | 25     | 96.54    | 96.54     | 96.54  | 0.16
Adam      | 1e-6          | 64         | 25     | 88.67    | 88.67     | 88.67  | 0.30
Adam      | 1e-5          | 16         | 25     | 95.17    | 95.17     | 95.17  | 0.16
Adam      | 1e-5          | 32         | 25     | 95.15    | 95.15     | 95.15  | 0.20
Adam      | 1e-5          | 64         | 30     | 95.17    | 95.17     | 95.17  | 0.18
Adam      | 1e-5          | 64         | 35     | 95.13    | 95.13     | 95.13  | 0.23

With these results in mind, we compared our best model with a few other models, namely MobileNetV3-Small, VGG16, DenseNet121, and ResNet50-V2, with the same fine-tuning layers and hyperparameter tuning.

TABLE II. COMPARISON BETWEEN TUNED MODELS AND THE HYPERPARAMETERS

Model             | Optimizer | Learning rate | Batch Size | Epochs | Accuracy | Precision | Recall | Loss | Parameter Amount
EfficientNet-B0   | RMSprop   | 1e-5          | 16         | 30     | 96.29    | 96.29     | 96.29  | 0.18 | 4M
MobileNetV3-Small | Adam      | 1e-4          | 64         | 25     | 95.47    | 95.47     | 95.47  | 0.17 | 1.5M
VGG-16            | Adam      | 1e-5          | 64         | 25     | 95.65    | 95.65     | 95.65  | 0.15 | 14M
DenseNet121       | Adam      | 1e-5          | 32         | 30     | 96.25    | 96.25     | 96.25  | 0.17 | 7M
ResNet50-V2       | Adam      | 1e-5          | 64         | 35     | 96.17    | 96.17     | 96.17  | 0.19 | 23M

From our experiment, EfficientNet-B0 shows the best overall performance with one of the smallest models. Compared to similarly performing models, EfficientNet-B0 reaches similar accuracy with a much smaller architecture, which could benefit inference time, as a smaller model generally infers faster. On another note, MobileNetV3-Small also gives a very good result on our dataset, with an accuracy of 95.47%. This could be attributed to the clean dataset with low noise and the classification having only 2 classes.
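The loss and metrics reported above are standard; as a reminder of what is being computed, here is a minimal pure-Python sketch of categorical cross-entropy and the confusion-matrix-derived metrics (the counts below are hypothetical, only to show the mechanics, not the paper's data):

```python
import math

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Cross-entropy between a one-hot label and a predicted distribution."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, F1 from binary confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return accuracy, precision, recall, f1

# A confident, correct 2-class prediction gives a small loss: -ln(0.9) ~ 0.105
loss = categorical_cross_entropy([1, 0], [0.9, 0.1])

# Hypothetical counts for one class of a binary problem:
acc, prec, rec, f1 = binary_metrics(tp=90, fp=5, fn=10, tn=95)
```

Note that when accuracy, precision, and recall are computed micro-averaged over a balanced binary problem, they can coincide, which is consistent with the identical columns in Tables I and II.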


To further analyze our model, we used the Non and Biodegradable Material Dataset by Rayhan Zamzamy [20] as a validation dataset, because of its similarity to our training and test dataset. The validation data contains 2,567 organic images and 8,363 recyclable images. We then plot our predictions in a confusion matrix.

Fig. 9 Confusion Matrix of Validation Dataset

The confusion matrix shows that while our model has very good precision for the organic class, it has poor recall, with a high true-positive count but also a noticeable number of false negatives. Indeed, this can be seen from the F1-scores of 0.83 for Organic and 0.94 for Recyclable. This could indicate that our dataset lacks variety, or that there is something more to recyclable waste. Nevertheless, we achieved 91% accuracy on our validation dataset, which could be usable in a real-world scenario.

V. CONCLUSION

In our experiment, we proposed a waste classification system to distinguish between Organic and Recyclable waste using EfficientNet-B0 with RMSprop with momentum as the optimizer, a batch size of 16, a learning rate of 0.00001, and 30 epochs. We got an accuracy of around 96%, which is comparable with other, bigger models. We also discovered that our model encounters difficulty with recyclable images, especially in recall capability. More variety in, and more analysis of, our dataset could potentially alleviate the recyclable-images problem and could potentially help in waste management.

As for future work, we plan to explore this topic deeper with different datasets, models, and fine-tuning. We would also explore recyclable waste in general to see whether our current model is limited by the dataset used, or whether there is more to recyclable waste images in terms of image recognition.

VI. REFERENCES

[1] E. Damanhuri and T. Padmi, "Pengelolaan Sampah," Semester I - 2010/2011.
[2] S. C. W. T. Meng, "A Study of Garbage Classification with Convolutional Neural Networks," in International Conference on Computing, Analytics and Networks, Chiayi, 2020.
[3] D. Gyawali, A. Regmi, A. Shakya, A. Gautam and S. Shrestha, "Comparative Analysis of Multiple Deep CNN Models for Waste Classification," in 5th International Conference on Advanced Engineering and ICT-Convergence 2020, Tongyeong-si, 2020.
[4] Y. Liu, Z. Ge, G. Lv and S. Wang, "Research on Automatic Garbage Detection System Based on Deep Learning and Narrowband Internet of Things," Journal of Physics: Conference Series, vol. 1069, p. 012032, 2018.
[5] M. Toğaçar, B. Ergen and Z. Cömert, "Waste classification using AutoEncoder network with integrated feature selection method in convolutional neural network models," Measurement, p. 107459, 2020.
[6] C. Bircanoğlu, M. Atay, F. Beşer, Ö. Genç and M. A. Kızrak, "RecycleNet: Intelligent Waste Sorting Using Deep Neural Networks," in 2018 Innovations in Intelligent Systems and Applications (INISTA), Thessaloniki, 2018.
[7] M. Tan and Q. V. Le, "EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks," 11 September 2020. [Online]. Available: https://arxiv.org/abs/1905.11946. [Accessed 4 April 2021].
[8] G. E. Sakr, M. Mokbel, A. Darwich, M. N. Khneisser and A. Hadi, "Comparing deep learning and support vector machines for autonomous waste sorting," in IEEE International Multidisciplinary Conference on Engineering Technology, Beirut, 2016.
[9] H. Wang, "Garbage Recognition and Classification System Based on Convolutional Neural Network VGG16," in 2020 3rd International Conference on Advanced Electronic Materials, Computers and Software Engineering (AEMCSE), Shenzhen, 2020.
[10] O. Adedeji and Z. Wang, "Intelligent Waste Classification System Using Deep Learning Convolutional Neural Network," in The 2nd International Conference on Sustainable Materials Processing and Manufacturing, vol. XXXV, pp. 607-612, 2019.
[11] A. M, A. K. M, M. A. TS, N. K. A and M. S, "Garbage Waste Classification Using Supervised Deep Learning Techniques," IJETIE, vol. VI, no. 3, 2020.
[12] C. Shi, R. Xia and L. Wang, "A Novel Multi-Branch Channel Expansion Network for Garbage Image Classification," IEEE Access, vol. VIII, pp. 154436-154452, 2020.
[13] H. N. Kulkarni and N. K. S. Raman, "Waste Object Detection and Classification," CS230 Stanford, Stanford, 2019.
[14] T. J. Sheng, M. S. Islam, N. Misran, M. H. Baharuddin, H. Arshad, M. R. Islam, M. E. H. Chowdhury, H. Rmili and M. T. Islam, "An internet of things based smart waste management system using LoRa and tensorflow deep learning model," IEEE Access, no. 8, pp. 148793-148811, 2020.
[15] F. P. Fantara, D. Syauqy and G. E. Setyawan, "Implementasi Sistem Klasifikasi Sampah Organik dan Anorganik dengan Metode Jaringan Saraf Tiruan Backpropagation," Jurnal Pengembangan Teknologi Informasi dan Ilmu Komputer, vol. II, no. 11, pp. 5577-5586, 2018.
[16] S. Li, M. Yan and J. Xu, "Garbage object recognition and classification based on Mask Scoring RCNN," in Conference on Culture-oriented Science & Technology (ICCST), Beijing, 2020.
[17] J. Y. Yuan, X. Y. Nan, C. R. Li and L. L. Sun, "Research on Real-Time Multiple Single Garbage Classification Based on Convolutional Neural Network," Mathematical Problems in Engineering, 2020.
[18] J. Hu, L. Shen and G. Sun, "Squeeze-and-Excitation Networks," in IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, 2018.
[19] S. Sekar, "Waste Classification data," Kaggle, 25 October 2019. [Online]. Available: https://www.kaggle.com/techsash/waste-classification-data. [Accessed 26 May 2021].
[20] R. Zamzamy, "Non and Biodegradable Material Dataset, Version 1," Kaggle, 6 April 2021. [Online]. Available: https://www.kaggle.com/rayhanzamzamy/non-and-biodegradable-waste-dataset. [Accessed 3 June 2021].
[3] D. Gyawali, A. Regmi, A. Shakya, A. Gautam and S. Shrestha,
"Comparative Analysis of Multiple Deep CNN Models for Waste


A Survey: Crowds Detection Method on Public


Transportation
Darren Anando Leone Handry Novianto Ruben Setiawan
Computer Science Department Computer Science Department Computer Science Department
School of Computer Science School of Computer Science School of Computer Science
Bina Nusantara University Bina Nusantara University Bina Nusantara University
Jakarta, Indonesia Jakarta, Indonesia Jakarta, Indonesia
[email protected] [email protected] [email protected]

Timothy Gilbert Novita Hanafiah


Computer Science Department Computer Science Department
School of Computer Science School of Computer Science
Bina Nusantara University Bina Nusantara University
Jakarta, Indonesia Jakarta, Indonesia
[email protected] [email protected]

Abstract—Face recognition is a computer technology, used in a variety of applications, that identifies human faces in digital images. At this time, face recognition can be used to find out how crowded a public transport is: it can be used to calculate how many people are in a public transport. Nowadays, we are assisted by the existence of Closed-Circuit Television (CCTV); the video and images from CCTV footage can be used to detect crowds in a public transport. Our aim in this paper is to compare which method is most suitable for detecting crowds in a public transport. We used various research papers from the period 2015-2021. From our analysis, there are many methods that can be used to check the crowd in a public transport, such as CNN, RCNN, YOLO, Viola-Jones, and many more. From our research, we found that the CNN method has the highest accuracy rate of the surveyed methods.

Keywords—Face recognition, Crowd Detection, Image, Video, Survey

I. INTRODUCTION

Currently, there are many transportation facilities provided by the government; one of them is public transportation. There are several types of public transportation provided by the government, such as TransJakarta, KRL, MRT, and LRT. Public transportation provides many advantages: for instance, we can save money because there is no need to buy a private vehicle, which is relatively expensive. Moreover, the existence of public transportation can improve air quality, because public transportation can accommodate people better than private vehicles.

However, public transportation still has drawbacks; for example, sometimes public transportation (TransJakarta) is very congested with people. This can make some people feel uncomfortable using it. Furthermore, this full capacity can make public transportation users vulnerable to transmission of the Covid virus.

Solving this problem is not as easy as we imagine, because we need to determine whether the capacity of the public transportation is still normal or not. Moreover, there are also several constraining factors such as facial expressions, pose variations, illumination effects, etc. [2]. Therefore, we need a system that can count the number of passengers in public transportation so that it can be analyzed and monitored. To overcome this problem, it is necessary to conduct a thorough analysis through surveillance camera (CCTV) recordings [1].

As researchers have done before, various methods can be used to analyze the crowd, for example, deep-learning-based analysis [1], the CNN method [4], etc.

II. STUDY REVIEW

A. Planning the review

The keywords used in this paper are Public Transportation Face Recognition, Crowd Detection, Video, and Image. Several research questions were made with the aim of simplifying the data analysis process. The research questions made to analyze all these papers are:

• Q1: What methods can be used to detect overcrowding in public transport?

• Q2: What datasets are used in crowd detection?

The papers we used in this survey come from several paper databases on the internet, such as IEEE, Oxford University Research Archive, arXiv, and ScienceDirect, with the selection criteria shown in Table I.

TABLE I. PAPER SELECTION CRITERIA

Selection criteria                  | Exclusion criteria
International academic papers       | Papers published before 2015
Papers related to this survey paper | Websites, blogs

B. Analysis

After obtaining the papers related to this survey, we compare the methods that can be used in crowd detection.

Q1: What methods can be used to detect overcrowding in public transport?

In detecting the crowd in public transportation, one thing must be considered: for example, whether the crowd is an anomaly or still in a normal condition. Accordingly, there are several methods that can be used to detect crowds, each with its own input base.



There are several studies that use the CNN (Convolutional Neural Network) method [5, 6, 7, 10, 11, 12, 13, 22]. This method belongs to the classification type, with an accuracy rate that can reach 98.95%, and provides advantages such as overcoming the problems of occlusion and background clutter, being able to use standard photogrammetric techniques, and being able to recognize faces with or without accessories.

Moreover, there are studies that use the RCNN (Region-based Convolutional Neural Network) method [8, 15, 21, 23, 31] to classify crowds. This method has an accuracy rate of 90%, with advantages such as working faster than CNN and successfully detecting humans, especially in dark public transports.

There are several studies that use the skin detection method [33], which belongs to the counting type. This method provides an accuracy rate of 85% and is more efficient because there is no tracking loss under constantly changing conditions.

Furthermore, there are the Bootstrap algorithm, Cascading, Bagging, and SVM methods [36], which belong to the counting type with an accuracy rate of 93.6%. These methods are able to detect multi-view faces in complex real scenes.

The Curve Analysis and Images Processing method [30, 38] is also used by several studies. It belongs to the counting type, with an accuracy rate of 85%, and provides advantages such as ease of counting people and the ability to remove noise without building any filters.

Then there is the LBPH and Histogram of Gradient (HOG) method [16, 27], which belongs to the classification type. It provides an accuracy rate of 80% and can be used for face detection with low computational complexity.

There is also a study that uses the You Only Look Once (YOLO) method [32] for classification, with an accuracy rate of 73%. This method is accurate in object detection and able to exceed the speed of other algorithms.

There is also research using the Motion Detection and Blob Analysis method [34], a classification method. It achieves an accuracy rate of 95% and is proven to provide an estimate of the number of people in complex scenes.

Moreover, there is the Virtual Gate Algorithm method [37], which belongs to the counting type. It detects the size and direction of motion of objects that have the same dimensions, and its accuracy can reach 97.6%.

Other studies use Computer Vision, image enhancement, blob detection, and blob tracking [39] for counting, producing an accuracy of 92%. The advantages of this method are two-way counting, flexibility of the algorithm when unexpected situations occur, the ability to monitor large entrances, large spaces, and high-traffic areas, and easy integration that does not restrict access.

There is also a study that uses the Gaussian Mixture Model (GMM) method [25, 40], which belongs to the counting type with an accuracy of 93%. This system is not affected by partial occlusion because it does not use motion and colour properties but a mixture of images and colour depth from a top-view camera.

The Viola Jones method [9, 14, 17, 21, 28] is quite often used by researchers to detect and count crowds. Viola Jones has several advantages: it can be trained to detect other things, can detect faces in different scenarios, and can detect more faces in a crowd. The Viola Jones method is able to detect crowds with an accuracy rate of 91%.

Several studies use the Background Subtraction and Histogram of Gradient method [24, 26] to detect and count crowds. This method belongs to the counting type because it is able to count the people present. It has several advantages: it is suitable for real-time scenarios, and the feature information used is not sensitive to changes in lighting, brightness, and contrast. The Background Subtraction and Histogram of Gradient method is able to detect crowds with an accuracy rate of 81.25%.

The DNN (Deep Neural Network) method [18] can perform accurate detection in an open public transport with good lighting conditions. The DNN method is not very good for detecting crowds, because its accuracy is only 74%.

Furthermore, the thermal scanner method [19, 29] was also used by previous researchers. This method belongs to the classification type and provides an accuracy rate of 76%, with the advantage of not taking up much space.

2D laser range data and camera images [35] are a method that belongs to the counting type, with an accuracy rate of 82%. Its advantage is that human detection can be improved cooperatively.

The internet connection method [20] belongs to the counting type. Its advantage is that it accurately detects all internet users. However, although it is accurate in detecting internet users, it has the lowest accuracy in detecting crowds on public transportation compared to the other methods, namely 58%.

TABLE II. OBJECT DETECTION METHOD

Input base | Object Detection Method | Type | References | Accuracy
Camera Based | CNN | Classification | 5, 6, 7, 10, 11, 12, 13, 22 | 98.95%
Camera Based | RCNN | Classification | 8, 15, 21, 23, 31 | 90%
Camera Based | Skin Detection | Counting | 33 | 85%
Camera Based | Bootstrap algorithm, Cascading, Bagging and SVMs | Counting | 36 | 93.6%
Camera Based | Curve Analysis and Images Processing | Counting | 30, 38 | 85%
Camera Based | LBPH, HOG | Classification | 16, 27 | 80%
Video Based | YOLO | Classification | 32 | 73%
Video Based | Motion Detection and Blob Analysis | Classification | 34 | 95%
Video Based | The virtual gate algorithm | Counting | 37 | 97.6%
Video Based | Computer Vision, Image enhancement, blob detection, and blob tracking | Counting | 39 | 92%
Video Based | Gaussian Mixture Model (GMM) | Counting | 25, 40 | 93%
CCTV Based | Viola Jones | Counting | 9, 14, 17, 21, 28 | 91%
CCTV Based | Background Subtraction and Histogram of Gradient | Counting | 24, 26 | 81.25%
CCTV Based | DNN | Classification | 18 | 74%
Thermal Camera Based | Thermal scanner | Classification | 19, 29 | 76%
Camera Based + 2D Laser Range | 2D laser range data and camera images | Counting | 35 | 82%
Sense Flow System Based | Internet Connection | Counting | 20 | 58%

TABLE III. OBJECT DETECTION METHODS AND THE ADVANTAGES

References | Object Detection Method | Advantages
5, 6, 7, 10, 11, 12, 13, 22 | CNN | Successfully resolves occlusion and background clutter issues; can also reduce computational costs.
8, 15, 21, 23, 31 | RCNN | Successfully detects humans, especially in dark public transports.
33 | Skin Detection | More efficient because there is no tracking loss.
36 | Bootstrap algorithm, Cascading, Bagging and SVMs | Can detect multi-view faces in complex real scenes.
30, 38 | Curve Analysis and Images Processing | Ease of counting people and the ability to remove noise.
16, 27 | LBPH, HOG | Can detect with low computational complexity.
32 | YOLO | Fast and accurate at object detection; able to exceed the speed of other algorithms.
34 | Motion Detection and Blob Analysis | Can provide an estimate of the number of people in a complex scene.
37 | The virtual gate algorithm | Can detect the size and direction of motion of objects that have the same dimensions.
39 | Computer Vision, Image enhancement, blob detection, and blob tracking | Has a flexible algorithm and easy integration that does not restrict access.
25, 40 | Gaussian Mixture Model | Will not be affected by partial occlusion.
9, 14, 17, 21, 28 | Viola Jones | Can be trained to detect other things; can detect faces in different scenarios and more faces in a crowd.
24, 26 | Background Subtraction and Histogram of Gradient | Suitable for real-time scenarios; the feature information used is not sensitive to changes.
18 | DNN | Detection is successfully carried out in an open public transport with high accuracy.
19, 29 | Thermal scanner | Does not take up much space, because the tools used can be embedded on other objects.
35 | 2D laser range data and camera images | Can improve human detection.
20 | Internet Connection | The detection is accurate for all internet users.

Each method certainly has its own advantages and level of accuracy. Among the reported accuracy levels, the CNN method is the highest, reaching 98.95%, while the internet connection method has the smallest accuracy rate, 58%. From the accuracy data, we can conclude that CNN is the better choice for crowd detection using face recognition.

Q2: What datasets are used for crowd detection?

From the experimental papers obtained, various datasets can be used with the various methods to detect crowds on public transportation. Examples are image datasets [5, 6, 28] with a widely varying number of images, ranging from 2500 images obtained from Google search engines [7, 8] to image databases with larger numbers of samples [10, 11, 31]. Each image set has its own characteristics, such as selfie photos, photos of people passing by, and photos of people with other objects. The next dataset type that can be used in crowd detection is video, where the number of videos also varies widely: several studies use 246 videos [24, 26], while another uses 1200 video frames [30]. The videos used as datasets also have their own characteristics, such as recordings of passers-by, train entrance halls, and YouTube videos [11]. Furthermore, several studies use thermal data, scanning people with a thermal camera system [19, 29]. The last dataset is internet connections [20], where crowds are detected by sensing the collection of internet connections present in an area.


TABLE IV. DATASET

Dataset Type | Amount of Data | Characteristics
Image | 25000 images taken from Google [7, 8]; MS COCO datasets consisting of 800 images [32]; the FDDB, AWF, and MALF databases, which provide more than 350,000 faces [10]; the YTF database, which provides 6000 faces [11]; PASCAL VOC, with a total of 2913 images [31] | Photo characteristics such as "people crossing the roads", "students playing basketball", and "participant standing in front of camera"; the COCO datasets have 6 classes; PASCAL VOC includes various images from 20 different classes.
Videos | CCTV recordings [9, 18, 24, 26] producing over 4000 faces; 1200 training frames and 2800 testing frames [30]; 2 videos with images processed at a size of 360x288 pixels using RGB color information [34]; the YTF database, which provides 5000 videos [11] | Video recordings of passers-by; people who gather in one place; video sequences shot in shopping malls and train station entrance halls; YouTube videos.
Thermal | 427 people scanned by a thermal camera system [19, 29] | People passing through an area.
Sense flow | 365 samples of phones detected in a public area [20] | A collection of internet connections from each smartphone in an area.

TABLE V. DATASET DETAIL

Dataset Type | Dataset Name | Class/Object Type | Illumination | Color | Pixel Size
Image | MS COCO | 6 Classes | - | RGB | -
Image | VOC Pascal | 20 Classes | v | RGB | -
Image | FDDB, AWF, MALF | Faces | - | RGB | -
Image | YTF | Faces | - | RGB | -
Video | CCTV record | Faces | - | RGB + B/W | 360 x 288
Video | Youtube Video | Object + Faces | - | RGB | -

III. DISCUSSION

In this study, we found that several input bases can be used to detect crowds. Several studies use camera-based methods such as CNN [5, 6, 7, 10, 11, 12, 13, 22] with an accuracy rate of 98.95%, RCNN [8, 15, 21, 23, 31] with 90% accuracy, Skin Detection [33] with 85% accuracy, the Bootstrap algorithm, Cascading, Bagging and SVMs [36] with 93.6% accuracy, Curve Analysis and Images Processing [30, 38] with 85% accuracy, and LBPH and HOG [16, 27] with 80% accuracy.

Furthermore, there are studies that use video-based methods such as YOLO [32] with an accuracy rate of 73%, Motion Detection and Blob Analysis [34] with 95% accuracy, the virtual gate algorithm [37] with an accuracy rate of 97.6%, Computer Vision, image enhancement, blob detection, and blob tracking [39] with 92% accuracy, and the Gaussian Mixture Model [25, 40] with 93% accuracy.

Several studies use CCTV-based methods such as Viola Jones [9, 14, 17, 21, 28] with an accuracy rate of 91%, Background Subtraction and Histogram of Gradient [24, 26] with an accuracy rate of 81.25%, and DNN [18] with an accuracy rate of 74%.

There are also studies that use a thermal camera base with methods such as thermal scanners [19, 29], with an accuracy rate of 76%.

Moreover, there are studies using a camera base plus 2D laser range, combining 2D laser range data and camera images [35], with an accuracy rate of 82%. Another study used a sense-flow-system base with the internet connection method [20], with an accuracy rate of 58%.

From the several studies we reviewed, we found three effective methods for crowd detection: CNN, RCNN, and Viola Jones. Of these, the CNN method is the best to use, because it successfully overcomes the problems of occlusion and background clutter, recognizes faces with or without accessories, and detects all crowds accurately, from nearly empty to very dense, with the highest accuracy level, reaching 98.95%.

IV. CONCLUSION

In this study, we used 40 research papers related to crowd detection. From the papers we studied, we found that several methods can be used to detect crowds, and some of them are similar to each other. Those methods can be grouped by the input base used. There are six input-base groupings: camera based, video based, CCTV based, thermal camera based, camera based + 2D laser range, and sense flow system based. The datasets used with each method also differ, each with its own characteristics: some methods use images as the dataset, others use video, and still others use thermal or sense-flow data.

We found that CNN, RCNN, and Viola Jones were the most widely used methods because, in the papers we found, they had superior advantages compared to the other methods. Of the existing methods, CNN has the highest level of accuracy; its accuracy in detecting crowds can reach 98.95%.

REFERENCES

[1] J. Ma, Y. Dai and K. Hirota, "A survey of video-based crowd anomaly detection in dense scenes," Journal of Advanced Computational Intelligence and Intelligent Informatics, vol. 21, pp. 235-246, 2017.
[2] M. Sharif, F. Naz, M. Yasmin, M. A. Shahid and A. Rehman, "Face Recognition: A Survey," Journal of Engineering Science & Technology Review, vol. 10, 2017.


[3] A. Bathija and G. Sharma, "Visual Object Detection and Tracking using YOLO and SORT," International Journal of Engineering Research & Technology (IJERT), vol. 8, no. 11, pp. 705-708, 2019.
[4] G. Gao, J. Gao, Q. Liu, Q. Wang and Y. Wang, "CNN-based density estimation and crowd counting: A survey," arXiv preprint arXiv:2003.12783, 2020.
[5] W. Liu, M. Salzmann and P. Fua, "Context-aware crowd counting," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 5099-5108.
[6] O. M. Parkhi, A. Vedaldi and A. Zisserman, "Deep face recognition," 2015.
[7] S. Shao, Z. Zhao, B. Li, T. Xiao, G. Yu, X. Zhang and J. Sun, "Crowdhuman: A benchmark for detecting human in a crowd," arXiv preprint arXiv:1805.00123, 2018.
[8] X. Sun, P. Wu and S. C. Hoi, "Face detection using deep learning: An improved faster RCNN approach," Neurocomputing, vol. 299, pp. 42-50, 2018.
[9] A. Alharbi, A. Aloufi, E. Hamawi, F. Alqazlan, S. Babaeer and F. Haron, "Counting people in a crowd using Viola-Jones algorithm," International Journal of Computer Science & Information Technology, vol. 4, pp. 57-59, 2017.
[10] Z. Hao, Y. Liu, H. Qin, J. Yan, X. Li and X. Hu, "Scale-aware face detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 6186-6195.
[11] W. Liu, Y. Wen, Z. Yu, M. Li, B. Raj and L. Song, "Sphereface: Deep hypersphere embedding for face recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 212-220.
[12] V. Ranjan, H. Le and M. Hoai, "Iterative Crowd Counting," in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 270-285.
[13] M. D. Chaudhari and A. S. Ghotkar, "A study on crowd detection and density analysis for safety control," International Journal of Computer Sciences and Engineering, vol. 6, pp. 424-428, 2018.
[14] A. A. Karim and M. H. Abed, "Crowd Counting From Digital Image Based on Statistical Method," AL-MANSOUR JOURNAL, 2019.
[15] S. W. Cho, N. R. Baek, M. C. Kim, J. H. Koo, J. H. Kim and K. R. Park, "Face Detection in Nighttime Images Using Visible-Light Camera Sensors with Two-Step Faster Region-Based Convolutional Neural Network," Sensors, vol. 18, p. 2995, 2018.
[16] Y. Kortli, M. Jridi, A. Al Falou and M. Atri, "A Novel Face Detection Approach using Local Binary Pattern Histogram and Support Vector Machine," in 2018 International Conference on Advanced Systems and Electric Technologies (IC_ASET), 2018, pp. 28-33.
[17] K. Aashish and Vijayalakshmi, "Comparison of Viola-Jones and Kanade-Lucas-Tomasi Face Detection Algorithms," Oriental Journal of Computer Science and Technology, vol. 10, 2017.
[18] Q. Zhang and A. B. Chan, "Wide-Area Crowd Counting via Ground-Plane Density Maps and Multi-View Fusion CNNs," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 8297-8306.
[19] K. P. Rane, "Design and Development of Low Cost Humanoid Robot with Thermal Temperature Scanner for COVID-19 Virus Preliminary Identification," International Journal, vol. 9, 2020.
[20] K. Li, C. Yuen, S. S. Kanhere, K. Hu, W. Zhang, F. Jiang and X. Liu, "An Experimental Study for Tracking Crowd in Smart Cities," IEEE Systems Journal, vol. 13, pp. 2966-2977, 2018.
[21] T. I. Dhamecha, M. Shah, P. Verma, M. Vatsa and R. Singh, "CrowdFaceDB: Database and Benchmarking for Face Verification in Crowd," Pattern Recognition Letters, vol. 107, pp. 17-24, 2018.
[22] M. H. Alotibia, S. K. Jarraya, M. S. Ali and K. Moria, "CNN-Based Crowd Counting Through IoT: Application For Saudi Public Places," in 16th International Learning & Technology Conference 2019, Saudi Arabia, 2019.
[23] V. Sharma and R. N. Mir, "Saliency guided faster-RCNN (SGFr-RCNN) model for object detection and recognition," Journal of King Saud University - Computer and Information Sciences, pp. 1-12, 2019.
[24] S. A. Chowdhury, M. M. S. Kowsar and K. Deb, "Human detection utilizing adaptive background mixture models and improved histogram of oriented gradients," ICT Express, no. 4, Elsevier B.V., pp. 216-220, 2018.
[25] G. Mariem, E. Ridha and Z. Mourad, "Detection of Abnormal Movements of a Crowd in a Video Scene," International Journal of Computer Theory and Engineering, vol. 8, no. 5, pp. 398-402, 2016.
[26] D. K. Singha, S. Paroothi, M. K. Rusia and M. A. Ansari, "Human Crowd Detection for City Wide Surveillance," in Third International Conference on Computing and Network Communications (CoCoNet'19), Kerala, India, 2020.
[27] S. K and S. S. Manjunath, "Human Detection and Tracking using HOG for Action Recognition," in International Conference on Computational Intelligence and Data Science, India, 2018.
[28] A. Alharbi, A. Aloufi, E. Hamawi, F. Alqazlan, S. Babaeer and F. Haron, "Counting People in a Crowd Using Viola-Jones Algorithm," IJCCIE, vol. 4, no. 1, pp. 57-59, 2017.
[29] M. Samuel, S. M. Ajibade and F. Fudah Moveh, "A.I. Driven Thermal People Counting for Smart Window Facade Using Portable Low-Cost Miniature Thermal Imaging Sensors," 2020.
[30] Z. Al-Zaydi, B. Vuksanovic and I. Habeeb, "Image Processing Based Ambient Context-Aware People Detection and Counting," International Journal of Machine Learning and Computing, vol. 8, no. 3, pp. 268-273, 2018.
[31] V. Sharma and R. N. Mir, "Saliency guided faster-RCNN (SGFr-RCNN) model for object detection and recognition," Journal of King Saud University - Computer and Information Sciences, 2019.
[32] A. Bathija and G. Sharma, "Visual object detection and tracking using YOLO and SORT," International Journal of Engineering Research & Technology, vol. 8, no. 11, 2019.
[33] A. A. Yusuf, F. S. Mohamad and Z. Sufyanu, "Human face detection using skin color segmentation and watershed algorithm," American Journal of Artificial Intelligence, vol. 1, pp. 29-35, 2017.
[34] M. Babiker, O. O. Khalifa, K. K. Htike, A. Hassan and M. Zaharadeen, "Harris corner detector and blob analysis features in human activity recognition," in 2017 IEEE 4th International Conference on Smart Instrumentation, Measurement and Application (ICSIMA), 2017, pp. 1-5, doi: 10.1109/ICSIMA.2017.8312025.
[35] J. Jin, N. M. Nguyen, N. Sakib, D. Graves, H. Yao and M. Jagersand, "Mapless Navigation among Dynamics with Social-safety-awareness: a reinforcement learning approach from 2D laser scans," in 2020 IEEE International Conference on Robotics and Automation (ICRA), 2020, pp. 6979-6985.
[36] J. Orozco, B. Martinez and M. Pantic, "Empirical analysis of cascade deformable models for multi-view face detection," Image and Vision Computing, vol. 42, pp. 47-61, 2015.
[37] K. Kopaczewski, M. Szczodrak, A. Czyzewski and H. Krawczyk, "A method for counting people attending large public events," Multimedia Tools and Applications, vol. 74, no. 12, pp. 4289-4301, 2015.
[38] M. Ravanbakhsh, M. Nabi, H. Mousavi, E. Sangineto and N. Sebe, "Plug-and-play CNN for crowd motion analysis: An application in abnormal event detection," in 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), 2018, pp. 1689-1698.
[39] A. Burbano, S. Bouaziz and M. Vasiliu, "3D-sensing Distributed Embedded System for People Tracking and Counting," in 2015 International Conference on Computational Science and Computational Intelligence (CSCI), 2015, pp. 470-475, doi: 10.1109/CSCI.2015.76.
[40] M. Paolanti, L. Romeo, D. Liciotti, R. Pietrini, A. Cenci, E. Frontoni and P. Zingaretti, "Person Re-Identification with RGB-D Camera in Top-View Configuration through Multiple Nearest Neighbor Classifiers and Neighborhood Component Features Selection," Sensors, vol. 18, p. 3471, 2018.


Performance Analysis Between Cloud Storage and NAS to Improve Company's Performance: A Literature Review

Dimas Sekti Adji, Gabriel, Eduardus Michael, Minawati, Widodo Budiharto
Computer Science Department, School of Computer Science
Bina Nusantara University
Jakarta, Indonesia 11480
[email protected]

Abstract—Storage systems are becoming increasingly important as a framework for storing and accessing information with low retrieval time and a low budget. It is common for every small organization or large company to use storage for its data, and the data stored will later boost its value as an organization or company. The problem is that many companies do not know the best option for their storage system; most companies pick one storage type without knowing the best type for them. This paper aims to identify and compare what kinds of storage companies use nowadays and what kind of storage suits their needs. In this paper, we compare two storage systems, Cloud Storage and Network Attached Storage (NAS), because these two systems are the most commonly used. Based on our research, both storage systems are good, depending on the company's needs. Even so, for small companies, Cloud Storage is the best choice as it costs less, has easier configuration and data back-up, takes up little space, offers scalability, and more.

Keywords—Cloud Storage, Network Attached Storage, Cloud based Storage, Cloud Computing, Architecture

I. INTRODUCTION

Storage is a digital device used to store various kinds of digital data, such as videos, documents, pictures, and raw data, for an indefinite period depending on the age and care of the storage device itself. A data storage system integrates multiple hardware and software components to store, manage, and protect user data [1]. Volatile storage (memory) and non-volatile storage are the two significant types of storage that users are familiar with. Temporary data and application workloads such as cache memory and Random-Access Memory (RAM) are handled by volatile storage as the computer's main storage, while non-volatile storage is often referred to as the secondary storage mechanism. Secondary storage stores data permanently and requires I/O operations; examples are hard disks, USB storage, and other optical media. Currently, storage devices take many physical forms, and what currently concerns many users are virtual and online storage devices such as the cloud. Data storage in the cloud allows users to access data from multiple devices at the same time [2].

Nowadays, online applications and services have prevailed as a fundamental constituent of modern society's survival [3]. Whether small or large, many organizations and companies have come to rely on data storage, as it has many benefits. When companies grow more prominent, it is not just the number of employees that increases, but also the number of departments and types of employees [4]. Using data storage makes their work more accessible, which may lead to efficiency and effectiveness in the company's performance. To improve the company's performance, the storage has to provide high availability, durability, reliability, and scalability. Availability relates to replicating data to multiple servers to reduce single points of failure. Durability relates to data that can be accessed for an extended period without disk failure. Reliability ensures the correctness of data, and scalability reduces data retrieval time by accessing different servers [5]. Besides this, the type of storage an organization or company chooses can also affect its business value. Therefore, companies need to choose the type of data storage to use in accordance with their needs.

There are different types of storage, and the most used ones are cloud storage and Network Attached Storage (NAS). Cloud storage is a cloud computing model in which data is stored on remote servers accessed from the internet [6]. To put it simply, cloud storage is storage in cloud computing and is usually considered a cloud computing system with ample storage capacity. Cloud storage provides functions such as data storage and business access. It assembles many different types of storage devices through application software, based on the functions of cluster applications, grid techniques, distributed file systems, and many others [7].

Meanwhile, NAS is an IP-based file-sharing device attached to a local area network [8] that provides data through its hardware or software. NAS provides the advantage of server consolidation by eliminating the need for multiple file servers through file-level data access and sharing [9]. Nevertheless, developing a financially robust storage business case under the current market conditions remains a massive challenge for stakeholders and investors [10]. With this paper,


we hope to help people in business to choose the proper use the AES symmetric method for their encoding and
storage. decoding processes [17].
II. LITERATURE REVIEW According to Liu and Dong, Cloud Storage is a system that
functions as data storage and business access, which can be
A. Overview of Storage System described as storage in cloud computing, and as cloud
In storage systems, many companies need and want to use computing equipped with large-capacity storage [7].
this. However, the problem is how and which system is there According to [16], Cloud Storage is a service model where
or which is efficient to use. Because many people want to find data is maintained, managed, and backed up remotely
and need this storage, storage systems continue to be (remotely) and is available to users via the network or the
developed from time to time, so that it gives rise to lots of internet. According to [17], cloud Storage stores data on
ideas and more efficient and even other systems. As an servers that can be accessed via the internet so that users can
example, according to one paper, This paper proposes a new view the data anywhere as long as they are connected to the
concept, Cloud Energy Storage, to provide the same service to internet [17]. Cloud storage is one of the primary uses in cloud
these users at lower social costs [11]. This storage, one of computing. With cloud storage, data is stored on several third-
which also uses cloud computing. Cloud computing is also party servers, not on special servers used in traditional data
one way to help especially small companies stay ahead. Even storage [18], making data stored in cloud storage safer and
those who have not used Cloud Computing want to try the reassures users if there are damage or hardware problems.
benefits. The most commonly used storage systems are The essential use of cloud storage is that a user or client
Network Attached Storage (NAS) and Cloud Storage or Cloud can send files via the internet to a data server, and the data
Computing, which are now more in demand than NAS [6]. As server will record or record the information [19]. According
stated by the survey results in the journal, only 8% of to Obrutsky, our clients will also store our cloud storage data
companies that do not currently use Cloud Computing rather than the local system [20]. When a client wants to
services state that they want to implement a Cloud Computing retrieve data, they access the data server via a web-based
solution [12]. After many existing field surveys, it can be interface, and then the server will send the file back to the
proven that Cloud-hosted servers are one of the servers that can provide significant savings for small businesses [13]. The reason for using internet storage for businesses, especially early-stage businesses, as Attaran says, is that the digital revolution can help companies change their business to stay connected with customers, suppliers, and employees [14].

B. Cloud Storage

Cloud storage keeps data on servers that can be accessed via the internet, so users can view their data anywhere as long as they are connected to the internet [6]; NAS, on the other hand, still uses the traditional method of a local connection, so users cannot access stored data if they are not connected to the NAS [12]. Several factors must be considered when implementing data storage technology in the cloud, including reliability, scalability, security, and data confidentiality, and processing data in encrypted form must be a main consideration [15]. Since it was first implemented, cloud computing has grown very quickly, even though experts have not agreed on the terminology itself. Data from 1996 show that cloud storage technology was already in use and has continued to grow in many organizations, educational institutions, and business companies [13].

Ample storage is now a necessity for most users, given the ease with which we can obtain and share information, such as large videos. For example, Google has a cloud storage service, or what we usually know as Google Drive; users only need the Google Drive application to store their data conveniently [14]. We also get 15 GB of free storage when registering for Google Drive.

The number of cloud storage services gives users many choices regarding each provider's price, performance, and credibility. Service providers therefore compete in offering many options and features in terms of performance and security [16]. Usually, the problem is the vulnerability of data to leaks, whether the data on the cloud is public, private, or hybrid. Usually, cloud companies provide a client and allow the client to access and manipulate files on the server itself [19].

Where can we get this cloud storage? Many domestic and foreign companies have started to provide cloud storage services. Zeng, et al. gave examples of providers: IBM, Google, Sun Microsystems, Microsoft, Amazon, EMC, NetApp, HP, Nirvanix, HDS, Symantec, and others. Furthermore, there are also many platforms for using cloud storage, for example, HDFS, GFS, Sun Network.com, SkyDrive, Amazon S3, EMC Atmos, Data ONTAP, HP Upline, CloudNAS, Hitachi Content Platform, FileStore, KFS, and others [21]. The many cloud storage providers likewise give users many choices in terms of price, performance, and company credibility, so providers compete in offering options and features in terms of both performance and security [22].

Beyond the meaning of cloud storage, the architecture is no less important for companies to know, because they must understand it to get maximum results when using such services. According to Huang, et al., cloud storage has a three-layer architecture, as shown in Fig. 1, the first being a service interface that provides a secure interface and API services to clients, designed to avoid violating the rights of either party.

Fig. 1. Architecture of Cloud Storage
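The service-interface layer described above is the part of this architecture a client actually programs against: it exposes simple store-and-retrieve operations while hiding where and how the bytes are physically kept. As a rough sketch only (the class and method names below are hypothetical, not taken from any cited system, and an in-memory dict stands in for the lower layers):

```python
# Illustrative sketch: a minimal client-facing object-store interface.
# All names here are hypothetical; a dict stands in for the
# storage-management and physical-storage layers of the architecture.

class CloudStorageService:
    """Minimal put/get/list interface of the service-interface layer."""

    def __init__(self):
        self._objects = {}  # key -> bytes; placeholder for lower layers

    def put(self, key, data):
        """Store (or overwrite) an object under the given key."""
        self._objects[key] = data

    def get(self, key):
        """Retrieve an object's bytes; raises KeyError if absent."""
        return self._objects[key]

    def list_keys(self, prefix=""):
        """List stored keys, optionally filtered by a key prefix."""
        return sorted(k for k in self._objects if k.startswith(prefix))

# The client stores and retrieves a file without knowing where or how
# the bytes are physically kept -- the point of the interface layer.
svc = CloudStorageService()
svc.put("reports/q1.txt", b"quarterly report")
print(svc.get("reports/q1.txt"))    # b'quarterly report'
print(svc.list_keys("reports/"))    # ['reports/q1.txt']
```

A real service would place such an interface behind an authenticated HTTP API, with the lower layers handling resource allocation and the physical disks.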

264 28 October 2021, Jakarta - Indonesia


2021 1st International Conference on Computer Science and Artificial Intelligence (ICCSAI)

The second layer is storage management, which efficiently manages resource allocation and data management. The last layer, the physical storage infrastructure, deals with physical security and hardware [23].

Cloud storage also has characteristics of its own, described by Spoorthy, et al. The characteristics of cloud storage can be divided into eight types: manageability, the ability to manage a system with few resources; access method, the cloud storage protocol; multi-tenancy, support for many users; scalability, the ability to scale to meet very high demand; data availability, a measure of the system's active time; control, the ability to control the system's cost, performance, or other aspects; storage efficiency, a measure of how efficiently the raw storage is used; and cost, the cost of the storage [24].

There are three types of cloud storage: public cloud, private cloud, and hybrid cloud. A public cloud is a domain where the public internet is used to obtain cloud services provided or maintained by cloud storage providers [19]. A private cloud is maintained by an organization's internal data center, which makes it easier to manage security, maintenance, or upgrades, and provides more control over deployment and usage. A hybrid cloud combines the public and private cloud, where the private cloud is linked to one or more external cloud services [25], so clients can transfer data from one cloud to another through its interface capabilities [18].

Cloud storage has many advantages, namely accessibility, ease of management, cost reduction, data handling (data backup), invisibility, and scalability. Accessibility can be seen in how data can be accessed from anywhere with the help of a network or the internet [18]; clients can store and retrieve data from cloud storage without making a direct connection to the data [26]. Ease of management can be seen in the simplified maintenance of infrastructure (hardware and software): clients can easily manage their data using only a web browser with internet connectivity [25]. Cost reduction can be seen in cloud storage services reducing the systems and workers required to maintain the system [18]. Data handling (data backup) can be seen in cloud storage services providing backup facilities that are very helpful for an organization or company; data is stored in three different locations, and if something happens, it can be quickly recovered [26]. Currently available cloud storage is also assessed for quality by data transfer speed, that is, download speed and upload speed. Some cloud service providers set upload and download speed standards depending on the subscription type the user chooses. Files can also be synchronized directly with cloud service providers; for example, we may want to synchronize data on our computers with Microsoft OneDrive in real time over the internet. Upload and download speeds are therefore very influential in the quality of cloud storage itself [27]. Invisibility can be seen in cloud storage services using virtualization to provide resources to clients: cloud storage involves no physical hardware on the client side, so it does not take up space [26]. Furthermore, for scalability, cloud services can smoothly and efficiently scale with the growing nature of the business in a more cost-effective way; this is also known as elasticity [28].

In addition to its many advantages, cloud storage is also known to have several disadvantages. According to Wu, et al., these include low-level security encryption, stored data not always being correct, the chance of duplicate data, cloud storage requiring more storage, and dependence on the reliability of the system itself [29]. Given these advantages and disadvantages, security is seen as one of the biggest threats [30]. One illustration comes from SCADA systems, which maintain the PLC in charge of sending commands to and receiving data from devices: experiments by Ghaleb and his team demonstrate that, with open-source tools and simple Python scripts, the traffic used by the PLC can be replayed, inspected, and modified [31]. There is therefore quite fierce competition in the market among cloud service providers, which offer different tools and levels of security [32]; as regular users, it is not uncommon for the variety of choices to make it difficult for us to choose a cloud service provider, with all their advantages and disadvantages. For example, Google has a cloud storage service, or what we usually know as Google Drive; users only need the Google Drive application to store their data conveniently, without managing themselves how it is saved [33]. Hence, as users, we only need to manage files as we would store files on our own computer, and let Google Drive do its job.

C. Network Attached Storage (NAS)

A NAS system is a storage device connected to a network that allows both storage and retrieval of data from a centralized location. Network-attached storage was introduced with the early file-sharing Novell NetWare server operating system and NCP protocol in 1983 [18]. NAS itself is already classified as a fairly dated technology when compared to cloud technology. However, NAS also has advantages, such as remaining usable even if the user does not have internet access [19]. In addition, security on the NAS is quite good because of its optimized encryption system, which is flexible and straightforward [20]. Another factor is that, because the NAS is not always connected to the internet, the NAS itself is more secure than the cloud.

A Network Attached Storage (NAS) device is a network device connected to a network infrastructure that provides centralized file services and access for clients across multiple platforms, enabling multiple users or clients to access one centralized storage server [34], [35]. NAS devices are dedicated file servers attached directly to the network, intended for access by end-users using IP addresses [35], [36]. The NAS server is used here as an independent device directly connected to the local network. NAS allows users to store and retrieve large amounts of data more efficiently and is often used by small businesses [34].

According to Jaikar, et al., NAS provides data based on its hardware or software, and its storage unit contains one or more hard disks, with leaner options such as FreeNAS, FreeBSD, and others [37].

After understanding the NAS, a company will also want to look at the architecture of a NAS. According to Nagle, et al., there are two basic NAS architectures, namely NetSCSI and NASD. The NetSCSI architecture makes minimal changes to the hardware and software, allowing NetSCSI disks to send data directly to the user. The second architecture, NASD, shown in Fig. 2, works to provide a command interface that

reduces the number of storage interactions with the user, forwarding them through file management and thereby avoiding file-management bottlenecks [38].

Fig. 2. Architecture of NASD.

NAS has many advantages, namely efficiency and reliability, flexibility, seamless interaction, protected data, and scalability. Efficiency and reliability can be seen from a NAS having its own operating system, unlike a standard server, and using various hardware and software to complete many diverse tasks; the NAS server itself consists of an efficient operating system as well as dedicated hardware and software components. Flexibility can be seen from NAS storage being usable by multiple clients and heterogeneous servers across the network. Seamless interaction can be seen from clients being able to build and deploy most NAS servers with ease, as NAS can support heterogeneous environments, and no particular configuration is needed on the NAS server when using it. Protected data can be seen in the handling of possible disk failure: disk failures can occur in all types of disks and all types of set-up structures, and if data is left in an unprotected configuration, a simple disk failure can result in data loss [34]; NAS has a data recovery feature so that it can recover data if things go wrong.

Moreover, NAS scalability in capacity, connectivity, and bandwidth can be achieved without the limitations of file systems designed around central servers [39]. On the other hand, when compared to the cloud, the NAS lacks the cloud's free synchronization features that let us back up our data anywhere as long as we are connected to the internet, and NAS is vulnerable to damage or failure of its disk storage, although this admittedly depends on the quality of the disk storage used [40]. The NAS itself also carries a very high level of risk, because not all NAS systems have data recovery features, especially low-end budget NAS, and service life differs between low-end and high-end budget NAS.

One example of a defense system in storage is POR (proof of retrievability). Based on the paper by Wang, a reliable POR system is made from traditional POR and trusted logs combined with the DSBT scheme, so that it is efficient and lightweight while maintaining quality [41]. Similarly, for the NAS systems described in the Lanka and Garzevas paper, the current model can show a significant reduction in power consumption due to an extra layer of security in virtual private networks [42]. This security issue has also been well tested: according to a paper by Liu, after analyzing the current problems when using NAS-based storage and the existing protocols on the NAS, the platform has the advantage of high data-storage security and good access [43]. These results prove that businesses can be developed with a cloud system, and future experiments can make storage systems on the internet more secure, cheaper, and lower-power.

According to Mistry, et al., NAS is now starting to be accepted because files can be sent across multiple servers; accordingly, NAS has a mechanism that uses special tools to connect to the network [44]. Even so, it cannot be denied that NAS has the advantage that, if there is no internet connection, this type of storage system will still run well without being constrained by the absence of internet [45]. Therefore some companies still use both types of storage. The question, then, is whether the use of NAS is still efficient in this era of the cloud, or whether one should switch entirely to the cloud instead of running half local servers and half cloud servers. Google and Microsoft themselves consider that we have now entered the era of cloud computing, where most files are stored in the cloud [46], without having to care about storage space, unlike the traditional way of using hard drives, for example.

TABLE I. COMPARISON BETWEEN CLOUD STORAGE AND NETWORK ATTACHED STORAGE (NAS)

Topic | Cloud Storage | Network Attached Storage (NAS)
Accessibility | Users can access the data anywhere as long as they are connected to the internet. | Users can access the data even if they are not connected to the internet.
Data handling (data back-up) | Data can be quickly recovered because data is usually stored in at least three different locations. | Difficult to recover data if it has poor configuration and does not have any storage back-up.
Security | Security level depends on third-party companies' security technology (e.g., Google with Google Drive). | Security on the NAS is quite good because of the encryption system that has been optimized, which is flexible and straightforward.
Scalability | Cloud services can smoothly and efficiently scale the growing nature of the business with a more cost-effective approach (elasticity). | NAS capacity, connectivity, and bandwidth can be scaled without the limitations in file systems designed with central servers.

III. CONCLUSION

Cloud Storage and NAS are storage with the same purpose. Both are good and can be used depending on one's needs; neither is strictly worse or better, because each has its advantages and disadvantages. In this paper, however, we aim to find out which storage serves a company better, NAS or Cloud Storage. Especially for companies that have just started their business, appropriate storage can help, and be used for a long time, to improve their performance; a company therefore needs to find storage suitable for it. This research concludes that NAS is better than Cloud Storage at security because it has an optimized encryption system. Still, Cloud Storage has more advantages than NAS in terms of cost, scalability, data back-up, room space, and others. For small companies, Cloud Storage is the best choice, as it is cheaper and easier to configure than NAS, while also offering good security, convenience, and reliability.


REFERENCES

[1] O. V. Mamoutova, S. V. Shirokova, M. B. Uspenskij, and A. V. Loginova, “The ontology-based approach to data storage systems technical diagnostics,” E3S Web Conf., vol. 91, pp. 1–8, 2019, doi: 10.1051/e3sconf/20199108018.
[2] “What is Storage?” Techopedia.com, 25-Jun-2012. [Online]. Available: https://www.techopedia.com/definition/1115/storage. [Accessed: 21-Apr-2021].
[3] D. Alsmadi and V. Prybutok, “Sharing and storage behavior via cloud computing: Security and privacy in research and practice,” Comput. Human Behav., vol. 85, pp. 218–226, 2018, doi: 10.1016/j.chb.2018.04.003.
[4] A. Aljabre, “Cloud computing for increased business value,” International Journal of Business and Social Science, vol. 3, no. 1, pp. 234–239, Jan. 2012.
[5] A. Jaikar, S. A. R. Shah, S.-Y. Noh, and S. Bae, “Performance Analysis of NAS and SAN Storage for Scientific Workflow,” 2016 International Conference on Platform Technology and Service (PlatCon), 2016.
[6] A. V. V and P. Samuel, “Preventing Pollution Attacks in Cloud Storages,” Procedia Computer Science, vol. 143, pp. 812–819, 2018, doi: 10.1016/j.procs.2018.10.385.
[7] K. Liu and L.-J. Dong, “Research on Cloud Data Storage Technology and Its Architecture Implementation,” Procedia Engineering, vol. 29, pp. 133–137, 2012.
[8] S. Mistry, J. Prajapati, M. Patel, and M. S. S. Saxena, “NAS (Network Attached Storage),” International Research Journal of Engineering and Technology (IRJET), vol. 7, no. 4, pp. 6571–6575, Apr. 2020.
[9] E. K. Boukas, Z. Liu, and P. Shi, “Delay-dependent stability and output feedback stabilization of Markov jump systems with time-delay,” IEE Proc. Part D, Control Theory and Applications, vol. 149, no. 5, pp. 379–386, 2002.
[10] G. Fong, R. Moreira, and G. Strbac, “Economic analysis of energy storage business models,” 2017 IEEE Manchester PowerTech, 2017, doi: 10.1109/PTC.2017.7980829.
[11] J. Liu, N. Zhang, C. Kang, D. Kirschen, and Q. Xia, “Cloud energy storage for residential and small commercial consumers: A business case study,” Appl. Energy, vol. 188, pp. 226–236, 2017, doi: 10.1016/j.apenergy.2016.11.120.
[12] T. Vasiljeva, S. Shaikhulina, and K. Kreslins, “Cloud Computing: Business Perspectives, Benefits and Challenges for Small and Medium Enterprises (Case of Latvia),” Procedia Eng., vol. 178, pp. 443–451, 2017, doi: 10.1016/j.proeng.2017.01.087.
[13] M. Attaran and J. Woods, “Cloud computing technology: improving small business performance using the Internet,” J. Small Bus. Entrep., vol. 31, no. 6, pp. 495–519, 2019, doi: 10.1080/08276331.2018.1466850.
[14] M. Attaran, “Cloud Computing Technology: Leveraging the Power of The Internet to Improve Business Performance,” J. Int. Technol. Inf. Manag., vol. 26, no. 1, pp. 112–137, 2017.
[15] N. Chervyakov, M. Babenko, A. Tchernykh, N. Kucherov, V. Miranda-López, and J. M. Cortés-Mendoza, “AR-RRNS: Configurable reliable distributed data storage systems for Internet of Things to ensure security,” Futur. Gener. Comput. Syst., vol. 92, pp. 1080–1092, 2019, doi: 10.1016/j.future.2017.09.061.
[16] M. M. Saunshi, M. N, M. Ramesh, N. B. T, and V. M, “Efficient and Secure Data Storage in Cloud Computing,” International Research Journal of Engineering and Technology (IRJET), vol. 6, no. 5, pp. 4862–4868, May 2019.
[17] P.-W. Chi and C.-L. Lei, “Audit-Free Cloud Storage via Deniable Attribute-Based Encryption,” IEEE Transactions on Cloud Computing, vol. 6, no. 2, pp. 414–427, 2018, doi: 10.1109/TCC.2015.2424882.
[18] A. Liu and T. Yu, “Overview of Cloud Storage,” International Journal of Scientific & Technology Research, 2020.
[19] T. KamalaKannan, K. Sharmila, C. Shanthi, and R. Devi, “Study on Cloud Storage and its Issues in Cloud Computing,” International Journal of Management, Technology And Engineering, vol. 9, no. 1, pp. 976–981, 2019.
[20] S. Obrutsky, “Cloud storage: Advantages, disadvantages and enterprise solutions for business,” in Proceedings of the Eastern Institute of Technology Conference, p. 10, 2016.
[21] W. Zeng, Y. Zhao, and K. Ou, “Research on cloud storage architecture and key technologies,” in Proceedings of the 2nd International Conference on Interaction Sciences: Information Technology, Culture and Human, pp. 1044–1048, 2009.
[22] L. Peiyu and L. Dong, “The New Risk Assessment Model for Information System in Cloud Computing Environment,” Procedia Engineering, vol. 15, pp. 3200–3204, 2011, doi: 10.1016/j.proeng.2011.08.601.
[23] C.-T. Huang, L. Huang, Z. Qin, H. Yuan, L. Zhou, V. Varadharajan, and C.-C. J. Kuo, “Survey on securing data storage in the cloud,” APSIPA Transactions on Signal and Information Processing, vol. 3, p. e7, 2014.
[24] V. Spoorthy, M. Mamatha, and B. S. Kumar, “A survey on data storage and security in cloud computing,” International Journal of Computer Science and Mobile Computing, vol. 3, no. 6, pp. 306–313, 2014.
[25] Y. Jadeja and K. Modi, “Cloud computing - concepts, architecture and challenges,” 2012 International Conference on Computing, Electronics and Electrical Technologies (ICCEET), 2012.
[26] A. Ghani, A. Afzal Badshah, S. U. Jan, A. A. Alshdadi, and A. Daud, “Issues and Challenges in Cloud Storage Architecture: A Survey,” Researchpedia, vol. 1, no. 1, pp. 50–65, Jun. 2020.
[27] S. Alotaibi, H. Alomair, and M. Elhussein, “Comparing Performance of Commercial Cloud Storage Systems: The Case of Dropbox and One Drive,” 2019 International Conference on Computer and Information Sciences (ICCIS), 2019, doi: 10.1109/ICCISci.2019.8716385.
[28] R.-A.-P. Rajan and S. Shanmugapriyaa, “Evolution of Cloud Storage as Cloud Computing Infrastructure Service,” IOSR Journal of Computer Engineering (IOSRJCE), vol. 1, pp. 38–45, 2012.
[29] J. Wu, L. Ping, X. Ge, Y. Wang, and J. Fu, “Cloud storage as the infrastructure of cloud computing,” in 2010 International Conference on Intelligent Computing and Cognitive Informatics (ICICCI), pp. 380–383, IEEE, 2010.
[30] A. Venkatesh and M. S. Eastaff, “Data Storage Security Issues in Cloud Computing,” Lect. Notes Data Eng. Commun. Technol., vol. 49, no. 1, pp. 177–187, 2020, doi: 10.1007/978-3-030-43192-1_20.
[31] A. Ghaleb, S. Zhioua, and A. Almulhem, “On PLC network security,” Int. J. Crit. Infrastruct. Prot., vol. 22, pp. 62–69, 2018, doi: 10.1016/j.ijcip.2018.05.004.
[32] R. Shaikh and M. Sasikumar, “Trust Model for Measuring Security Strength of Cloud Computing Service,” Procedia Computer Science, vol. 45, pp. 380–389, 2015, doi: 10.1016/j.procs.2015.03.165.
[33] Y. U. Chandra and S. Hartono, “Analysis Factors of Technology Acceptance of Cloud Storage: A Case of Higher Education Students Use Google Drive,” 2018 International Conference on Information Technology Systems and Innovation (ICITSI), 2018, doi: 10.1109/ICITSI.2018.8696095.
[34] A. B. Ahir and H. N. Rakhunde, “An Efficient Approach towards File Storage and Sharing in Network,” International Journal of Science and Research (IJSR), vol. 4, no. 4, pp. 82–84, Apr. 2015.
[35] Y. Yaser, N. D. K, H. Sadiqzada, and B. A. Ayobi, “Open Source Solution for Centralized Storage System using Network Attached Storage (NAS),” International Research Journal of Engineering and Technology (IRJET), vol. 6, no. 12, pp. 537–541, Dec. 2019.
[36] E. Edelson, “Security in network attached storage (NAS) for workgroups,” Network Security, vol. 2004, no. 4, pp. 8–12, 2004.
[37] A. Jaikar, S. A. R. Shah, S. Y. Noh, and S. Bae, “Performance Analysis of NAS and SAN Storage for Scientific Workflow,” in 2016 International Conference on Platform Technology and Service, PlatCon 2016 - Proceedings, pp. 1–4, 2016.
[38] D. F. Nagle, G. R. Ganger, J. Butler, G. Goodson, and C. Sabol, “Network Support For Network-attached Storage,” in Proceedings of Hot Interconnects, vol. 8, 1999.
[39] M. T. O’Keefe, “Shared File Systems and Fibre Channel,” in Proc. 6th Goddard Conf. on Mass Storage Sys. and Technologies, 1998.
[40] J. Bright and J. Chandy, “A scalable architecture for clustered network attached storage,” 20th IEEE/11th NASA Goddard Conference on Mass Storage Systems and Technologies (MSST 2003), Proceedings, 2003, doi: 10.1109/MASS.2003.1194857.
[41] R. Wang, “Research on Data Security Technology Based on Cloud Storage,” Procedia Eng., vol. 174, pp. 1340–1355, 2017, doi: 10.1016/j.proeng.2017.01.286.
[42] A. Lanka and A. Garzevas, “Remotely Accessible, Low Power Network Attached Storage Device,” Proc. Int. Conf. Inven. Commun. Comput. Technol. ICICCT 2018, pp. 1083–1088, 2018, doi: 10.1109/ICICCT.2018.8473164.


[43] S. Liu, “Private cloud storage platform design and implementation based on the NAS,” Proc. - 2017 Int. Conf. Comput. Technol. Electron. Commun. ICCTEC 2017, pp. 641–644, 2017, doi: 10.1109/ICCTEC.2017.00144.
[44] S. Mistry, J. Prajapati, M. Patel, and M.-S. S. Saxena, “NAS (Network Attached Storage),” International Research Journal of Engineering and Technology (IRJET), vol. 7, 2020.
[45] R. Katz, “Network-attached storage systems,” Proceedings Scalable High Performance Computing Conference SHPCC-92, pp. 68–75, Jan. 1992.
[46] M. B. Vaidya and S. Nehe, “Data security using data slicing over storage clouds,” 2015 International Conference on Information Processing (ICIP), pp. 322–325, Dec. 2015, doi: 10.1109/INFOP.2015.7489401.


Usability Evaluation of Learning Management System

Arief Darvin, Jeffry Kosasih, Stefanus, Novita Hanafiah
Computer Science Department, School of Computer Science
Bina Nusantara University, Jakarta, Indonesia
[email protected]

Abstract— In the current era of globalization, many things are done with technology; one of them is education. An LMS is an example of the implementation of technology in education: a platform provided by schools or universities to support teaching and learning activities. Much of the literature shows that with an LMS, online learning can be carried out properly. To make an LMS comfortable and easy to use, it must have good user interface design and user experience. The user interface and user experience are a very important part of developing an LMS; without them, the LMS cannot be used properly. Based on the research, two methods are commonly used to evaluate user interface design and user experience: usability measurement and heuristic evaluation. In addition, features are an important part of an LMS. Based on the research, the most common features in an LMS are discussion forums and learning materials, with which users can discuss the materials with each other and also share learning materials.

Keywords—User Interface; User Experience; Learning Management System

I. INTRODUCTION

In the current era of globalization, the development of digital technology continues to increase rapidly, making communication and information exchange easy. This has an impact on various sectors, including education, so it is hoped that the development of digital technology in the education sector can help and simplify the learning process in all aspects.

One technology application in education is the Learning Management System (LMS), a system designed to support learning activities and owned by each university. In a Learning Management System, users can access learning schedules, submit assignments, share material with friends and lecturers, and collaborate in discussion forums [1]. With a learning system like this, students and lecturers can have materials without having to print them first, which is costly and not environmentally friendly. Using a Learning Management System for learning materials is an effective way to solve problems in learning material needs [2].

Regarding the use of websites, User Experience (UX) is an important factor in estimating whether a website is adequate and accepted by its users, such as users seeing what its uses are, what their needs are, or the limitations of the website. This makes UX a way of describing user interaction with a website that involves the entire product design process, including aspects of the design and the function or usability of a product, where UX itself is formed by user interaction with the user interface of a product. The User Interface (UI) is an important component of an application or software; an unattractive user interface will make users feel uncomfortable using an application.

The purpose of this research is to analyze the user interface of a Learning Management System, so that users can learn through this research the evaluation and analysis methods used for such websites. This research can also assist developers in customizing their websites to suit users' wishes.

II. LITERATURE REVIEW

A. Learning Management System

A Learning Management System, or LMS, is a system where users can access several things that support the learning system. The LMS can be accessed by users via computers and cell phones as long as they are connected to the internet, and it can be used by teachers or students to support learning. Currently, students want an LMS that is mobile-friendly so that it attracts their attention to use it [3]. The LMS becomes a medium, platform, or special service containing activities, content, and other things that support educational programs, combined with technological developments so as to expand the learning process in different spheres of the world of education [4].

B. User Interface

The user interface is a design that helps users do activities more easily; the user interface affects the user experience in accessing a system, especially in Learning Management Systems (LMS) used for learning activities. A user interface design is good when the user can easily and intuitively interact with the application or the

978-1-6654-4002-8/21/$31.00 ©2021 IEEE


system [5]. Therefore, there are three important principles to guide the design of effective user interfaces:

(1) place the user in control,
(2) reduce the user's memory load, and
(3) make the interface consistent.

To achieve an interface that abides by these principles, an organized design process must be conducted. User interface design begins with the identification of user, task, and environmental requirements [5].

C. User Experience

User experience is one of the important factors that can support success in building a website. A good website considers and pays attention to user convenience so that it does not cause difficulties for users when accessing it. UX ensures that the website can meet users' needs and can be used with a pleasant experience. If a website is not designed with good user experience in mind, the needs of its users will go unfulfilled [6].

TABLE I. LEARNING MANAGEMENT SYSTEM FEATURES

Ref. | Discussion Forum | Gradebook | Assignment | Learning Materials | Announcement
[11] v v
[12] v v v v
[13] v v v v v
[14] v v v
[15] v v v
[16] v v v
[17] v
[18] v v v
[19] v v v
[20] v

III. STUDY REVIEW

A Learning Management System is an application that helps provide learning materials to students easily, so it must have several main features to support the system. Some basic features usually exist in a Learning Management System, such as downloading and uploading materials and assignments set by the lecturer or teacher; because of these, students can easily access the materials anywhere and anytime. There are also other features in a Learning Management System, such as a discussion forum, which helps students and lecturers discuss the material easily [7].

Among the UI elements that make a good LMS is a design that is easy to understand; with an easy-to-understand design, the LMS is very efficient to use, so users are not confused about how to use it [8]. The LMS is also responsive, so users know about deadlines for upcoming tasks or projects through notifications from the LMS [9]. There are also competencies to be achieved on the LMS, with supporting materials and links to related learning resources. The LMS also displays all the scores achieved by the students and the average of these scores [10].

The table shows that some features appear in most Learning Management Systems, such as discussion forums and learning materials. These two features are important in a Learning Management System, so we can say that a good Learning Management System must have them.

A. Usability Measurement

Usability consists of several characteristics that determine whether a system has a usability element or not; the usability characteristics used, according to [21], are shown in Table II.

TABLE II. USABILITY MEASUREMENT TABLE

No | Usability Attribute | Usability Measurement Aspect
1 | Learnability | Ease of use and adjustment to a system
2 | Efficiency | Time and speed in completing work when using the system
3 | Memorability | Ability to remember feature functions without having to consult the guide again
4 | Error | Anticipation of errors that occur due to users and systems
5 | Satisfaction | Level of user satisfaction during and after using the system

B. Heuristic Evaluation

Heuristic Evaluation is a method used to find problems in a system related to usability, with various assessment aspects to make it easier to find problems, both in usability and overall, in order to improve the system [22]. The aspects of heuristic evaluation proposed by Nielsen are shown in Table III.

TABLE III. HEURISTIC EVALUATION TABLE

No | Heuristic Evaluation | Evaluation Aspect
1 | Visibility of system status | Inform users of what is going on, through appropriate feedback in a reasonable time.
2 | Match between system and the real world | The concepts of features or language used follow daily life
3 | User control and freedom | The ability of users to easily control or perform activities in the system, either selecting or canceling
4 | Consistency and standards | Use of the same features or functions as other systems that comply with industry standards for system manufacture
understand so that users feel interested to see it. In addition

270 28 October 2021, Jakarta - Indonesia


2021 1st International Conference on Computer Science and Artificial Intelligence (ICCSAI)

TABLE III. HEURISTIC EVALUATION TABLE (continued)
No | Heuristic Evaluation | Evaluation Aspect
5 | Error prevention | Designing features well to reduce errors that can be caused by users
6 | Recognition rather than recall | Ease for users to recognize things in the system, compared to remembering processes or information in the system
7 | Flexibility and efficiency of use | Provides elements of usability and ease of use for both new and experienced users
8 | Aesthetic and minimalist design | Display important information that is relevant according to the needs and purposes of its manufacture
9 | Help users recognize, diagnose, and recover from errors | Recognize errors or problems that occur by displaying messages or notifications from the system
10 | Help and documentation | Provide useful help and documentation as a guide for users in using the system

IV. RESULT AND DISCUSSION

The results of our research show that several LMS features are generally always present in an LMS, such as discussion forums and learning materials. These features are needed to support the needs of students and lecturers in conducting online teaching and learning activities. The discussion forum feature is one of the most common features found in an LMS. This feature is commonly used by teachers and students to discuss learning material, so if students do not understand the learning material, they can directly ask questions and discuss it through the discussion forum. There is also a learning material feature in which students and lecturers can access the learning materials that will be used during teaching and learning activities in the LMS. Besides these two features, there are other features, such as the gradebook, assignments, and announcements, that can also support the needs of users of the LMS.

We also researched the evaluation methods used to assess whether or not an LMS runs well, with several aspects of usability measurement. Usability measurement aims to assure users of the UX of an LMS, so that when an LMS is used, the user does not hesitate to use it because it has been evaluated by usability measurement. Several usability attributes are commonly used to evaluate UX: Learnability, Efficiency, Memorability, Error, and Satisfaction. In addition to evaluating the UX of an LMS, evaluating its UI is also important. The method used to evaluate the UI is Heuristic Evaluation. The aspects used to evaluate the UI can be seen in Table III.

V. CONCLUSION

Based on the results of the research that has been done, the conclusion is that the User Interface and User Experience are important parts of an LMS; without a good User Interface and User Experience, an LMS will not be perfect. There are several methods used to evaluate the User Interface and User Experience so that they can be implemented properly in an LMS. In this study, we found two methods that are always used to evaluate the User Interface and User Experience of an application, and both can be applied to evaluate the User Interface and User Experience of an LMS: Usability Measurement and Heuristic Evaluation. By evaluating with these two methods, it is hoped that the appearance of the LMS can become better and easier to use.

One of the most important parts of an LMS is its features; an LMS cannot run without them. Based on the research, several features are usually present in an LMS, but two of them are the most used: the Discussion Forum and Learning Materials. These features are important to support the LMS, so that the people who use it, whether lecturers or students, can download and upload learning materials. Discussion forums are also useful for students who do not understand the materials, so they can easily discuss them with their friends or lecturers. There are also other useful LMS features, such as the gradebook, assignments, and announcements. Therefore, the conclusion of this research is that a good LMS has good features and a design that follows an evaluation method, especially Usability Measurement and Heuristic Evaluation.

REFERENCES
[1] I. Han and W. S. Shin, "The use of a mobile learning management system and academic achievement of online students," Computers and Education, vol. 102, pp. 79–89, 2016, doi: 10.1016/j.compedu.2016.07.003.
[2] Z. Nurakun Kyzy, R. Ismailova, and H. Dündar, "Learning management system implementation: a case study in the Kyrgyz Republic," Interactive Learning Environments, vol. 26, no. 8, pp. 1010–1022, 2018, doi: 10.1080/10494820.2018.1427115.
[3] Y. J. Joo, N. Kim, and N. H. Kim, "Factors predicting online university students' use of a mobile learning management system (m-LMS)," Educational Technology Research and Development, vol. 64, no. 4, pp. 611–630, 2016, doi: 10.1007/s11423-016-9436-7.
[4] M. Siregar, R. I. Rokhmawati, and H. M. Az-zahra, "Evaluasi Usability dan Pengalaman Pengguna Website Zenius.net Menggunakan Metode TUXEL: A Technique for User Experience Evaluation in e-Learning," Jurnal Pengembangan Teknologi Informasi dan Ilmu Komputer, vol. 3, no. 5, pp. 5058–5067, 2019.
[5] S. Sridevi, "User Interface Design," vol. 2, no. 2, pp. 415–426, 2014, doi: 10.1201/9780203734544.
[6] L. Hardiansyah, K. Iskandar, and H. Harliana, "Perancangan User Experience Website Profil Dengan Metode The Five Planes (Studi kasus: BP3K Kecamatan Mundu)," Jurnal Ilmiah Intech: Information Technology Journal of UMUS, vol. 1, no. 01, pp. 11–21, 2019, doi: 10.46772/intech.v1i01.34.
[7] P. Ramakrisnan, A. Jaafar, F. H. A. Razak, and D. A. Ramba, "Evaluation of user Interface Design for Leaning Management System (LMS): Investigating Student's Eye Tracking Pattern and Experiences," Procedia - Social and Behavioral Sciences, vol. 67, pp. 527–537, 2012, doi: 10.1016/j.sbspro.2012.11.357.
[8] A. T. Wibowo, I. Akhlis, and S. E. Nugroho, "Pengembangan LMS (Learning Management System) Berbasis Web untuk Mengukur Pemahaman Konsep dan Karakter Siswa," Scientific Journal of Informatics, vol. 1, no. 2, pp. 127–137, 2014. [Online]. Available: https://journal.unnes.ac.id/nju/index.php/sji/article/view/4019/3633
[9] F. S. Anggriawan, "Pengembangan Learning Management System (LMS) Sebagai Media Pembelajaran Untuk Sekolah Menengah," Jurnal Kependidikan: Penelitian Inovasi Pembelajaran, pp. 1–10, 2009.
[10] M. Munir, "Penggunaan Learning Management System (LMS) Di Perguruan Tinggi: Studi Kasus Di Universitas Pendidikan Indonesia," Jurnal Cakrawala Pendidikan, vol. 1, no. 1, pp. 109–119, 2010, doi: 10.21831/cp.v1i1.222.
[11] D. A. Back, F. Behringer, N. Haberstroh, J. P. Ehlers, K. Sostmann, and H. Peters, "Learning management system and e-learning tools: An experience of medical students' usage and expectations," International Journal of Medical Education, vol. 7, pp. 267–273, 2016, doi: 10.5116/ijme.57a5.f0f5.
[12] A. Ban, N. C. Pa, J. Din, and N. A. Mohd Yaa'Cob, "Integrating social collaborative features in Learning Management System: A case study," 2017 IEEE Conference on e-Learning, e-Management and e-Services, IC3e 2017, pp. 67–72, 2018, doi: 10.1109/IC3e.2017.8409240.
[13] N. Derakhshan, "Student and Faculty Perceptions of the Features of Mobile Learning Management Systems in the Context of Higher Education," 2012.
[14] M. J. Asiri, R. B. Mahmud, K. Abu Bakar, and A. F. bin Mohd Ayub, "Factors Influencing the Use of Learning Management System in Saudi Arabian Higher Education: A Theoretical Framework," Higher Education Studies, vol. 2, no. 2, pp. 125–137, 2012, doi: 10.5539/hes.v2n2p125.
[15] N. Emelyanova and E. Voronina, "Introducing a learning management system at a Russian university: Students' and teachers' perceptions," International Review of Research in Open and Distance Learning, vol. 15, no. 1, pp. 272–289, 2014, doi: 10.19173/irrodl.v15i1.1701.
[16] R. Garrote Jurado, T. Pettersson, A. Regueiro Gomez, and M. Scheja, "Classification of the Features in Learning Management Systems," 2014.
[17] M. K. Ishak and N. B. Ahmad, "Enhancement of Learning Management System with adaptive features," Proceedings of the 2016 5th ICT International Student Project Conference, ICT-ISPC 2016, pp. 37–40, 2016, doi: 10.1109/ICT-ISPC.2016.7519230.
[18] E. Araka, E. Maina, R. Gitonga, R. Oboko, and J. Kihoro, "University Students' Perception on the Usefulness of Learning Management System Features in Promoting Self-Regulated Learning in Online Learning," International Journal of Education and Development using Information and Communication Technology (IJEDICT), vol. 17, no. 1, pp. 45–64, 2021.
[19] M. Ouadoud, A. Nejjari, M. Y. Chkouri, and K. E. el Kadiri, "Educational modeling of a learning management system," Proceedings of 2017 International Conference on Electrical and Information Technologies, ICEIT 2017, pp. 1–6, 2018, doi: 10.1109/EITech.2017.8255247.
[20] G. W. Wicaksono, G. A. Juliani, E. D. Wahyuni, Y. M. Cholily, H. W. Asrini, and Budiono, "Analysis of Learning Management System Features based on Indonesian Higher Education National Standards using the Feature-Oriented Domain Analysis," 2020 8th International Conference on Information and Communication Technology, ICoICT 2020, 2020, doi: 10.1109/ICoICT49345.2020.9166459.
[21] N. Phongphaew and A. Jiamsanguanwong, "Usability Evaluation on Learning Management System," Advances in Intelligent Systems and Computing, vol. 607, pp. 1–15, 2018, doi: 10.1007/978-3-319-60492-3.
[22] H. P. Nugraha, A. D. Herlambang, and H. M. Az-Zahra, "Perbandingan Usability Pada Learning Management System Moodle dan Edmodo Dengan Menggunakan Metode Heuristic Walkthrough," Jurnal Pengembangan Teknologi Informasi dan Ilmu Komputer, vol. 3, no. 4, p. 7, 2019.


Self-Checkout System Using RFID (Radio Frequency Identification) Technology: A Survey

Fachrurrozi Maulana Nixon Rizky Prawira Putra
Computer Science Department, School of Computer Science,
Bina Nusantara University, Jakarta, Indonesia
[email protected], [email protected], [email protected]

Novita Hanafiah
Computer Science Department, School of Computer Science,
Bina Nusantara University, Jakarta, Indonesia
[email protected]

Abstract—RFID (Radio Frequency Identification) is a combination of radio frequency technology and microchip technology, and an alternative to barcodes in which microchipped tags store and transmit detailed information about tagged items. The number of RFID applications in everyday life is due to the convenience it provides, one of which is the self-checkout system. In this literature review, the authors have analyzed various techniques that can be used to implement RFID in a self-checkout system through methodological and model analysis. The purpose of this study is to explore various possible applications of RFID technology in a self-checkout system. The authors have identified 30 different research papers in the period 2010–2021. The analysis shows that each selected research study has achieved respectable, but imperfect, results, as each study points to its own unique strengths and weaknesses. There are several methods that may suit the application of RFID technology to the self-checkout system, namely using mobile devices through mobile applications, smart shopping carts, cloud systems, and cloud database software.

Keywords—RFID, Self-Checkout, Internet of Things (IoT), Smart Cart, Automatic Identification Technology

I. INTRODUCTION

A Self-Checkout System is a system developed to detect several objects at the same time. The system uses RFID technology to detect objects that are inserted into the self-checkout machine, with sensors that scan the RFID tag on each object. IoT (Internet of Things) has gained widespread acceptance in various walks of life. The Internet of Things forms a network of objects that are interconnected and capable of communicating with other objects in the network.

RFID (Radio Frequency Identification) is a combination of radio frequency technology and microchip technology, and is considered one of the most important applications in fields such as toll collection, automobiles, packaging, loading and unloading warehouses, and retail warehouses. This kind of technology enables companies and organizations to understand the advantages of RFID. According to AIDC (Automatic Identification and Data Capture), "RFID is a technology that uses radio waves to transmit data between a reader and an electronic tag attached to a specific object. The typical use is to identify and track the object." RFID is an alternative to barcodes; it uses microchips on tags to store and transmit detailed information about the tagged items. RFID offers advantages over barcodes, such as the ability to store more data and to change stored data during processing; it does not require line of sight for data transmission, and it is very effective in harsh environments where barcode labels may not work. Therefore, RFID is the general term for technologies that use radio waves to automatically identify people or objects.

Supermarkets are places where customers come to purchase their daily products and pay for them, so there is a need to calculate how many products are sold and to generate the bill for the customer. Cashiers' desks are placed in a position to promote circulation. Supermarkets have problems with customer queues at the cashier, so we have the idea of implementing RFID to solve queuing problems by providing self-checkout for consumers. With the development of our society, supermarkets have become part of our daily life. Due to the wide variety of commodities on the market, we can buy anything we want, but customers may waste a lot of time searching for what they need. The program is intended to let customers feel the convenience that the Internet-of-Things smart supermarket brings to people's lives and to understand what the Internet of Things is and how it truly affects people's lives. In a smart supermarket, we will never hear customers complain about queuing up for shopping and checkout.

II. RELATED WORK

RFID is a method of automatic identification that can store and retrieve data based on radio frequency. RFID technology makes shopping easy. The efficiency of the system can be increased with the help of a PIC microcontroller and RFID technology. It promotes supermarket sales and provides convenience when buying something. The presence of RFID can improve the performance of a store and make it work efficiently, because RFID has many advantages. RFID application leads to significant savings in staff costs, enhances services, and provides efficient results, which leads to almost foolproof security and access control. It not only provides a constant update of the library collection and proper holding management, but also accomplishes real-time services. [14] [23] [29]

The world today is moving immensely towards automation with the rapid advancements in technology. In order to minimize the time required for checkout processes today, there is a need to develop an automated and simple billing procedure. An Auto-Checkout system is a system developed to detect several objects at the same time. The system used RFID technology to wirelessly detect the groceries as trolleys pass the checkout counter. Auto-
978-1-6654-4002-8/21/$31.00 ©2021 IEEE


checkout systems solved the problem of customer queues getting longer and made queuing faster. A future development would upgrade this auto-checkout system with auto-debit from a bank account, so the customer can save time and be more efficient. [2] [13] [25] [8]

Consumers are satisfied with the level of digitalization in retail. Self-service cash registers and other innovations have been well received, and consumers believe that the current level of innovation is sufficient. Regardless of the presence of innovation and technology in stores, they still like to meet sales staff and appreciate human contact. Consumers have also expressed concern about jobs due to the digitalization of business and the introduction of innovative solutions. [5] [9] [17]

III. STUDY REVIEW

A. Planning the Review
This study conducted a comprehensive survey of research on self-checkout systems using RFID (Radio Frequency Identification) technology. In order to analyze all the papers and extract important information, the writers created several research questions. The research questions, which aim to facilitate the process of data collection and analysis, are:

• RQ1: What methods are used to obtain product data for the Self-Checkout System?
• RQ2: What is the proposed system design for the Self-Checkout System?
• RQ3: What are the advantages and disadvantages of using RFID in a Self-Checkout System?
• RQ4: What types of RFID are there?

B. Analysis
After collecting the papers, we analyzed the eligible papers based on the research questions. We then compared the methods used to see their advantages and disadvantages. We present this comparison in the tables provided. The following is our analysis of the reviewed papers.

1) RQ1: What methods are used to obtain product data for the Self-Checkout System?
Product data for the Self-Checkout System is usually obtained by recording things such as product information and the product stock available in the warehouse. The product information recorded is usually the product name, product expiration date, a description of the nutrients contained in the product, and others. All recorded product information is entered into a cloud database and then implemented into an RFID tag [12] [14] [1]. The RFID tag will contain product data such as the product ID, product name, date of manufacture, name of the manufacturer, cost of the product, any special discount being offered on that product, shelf life, payment info, and the RFID tag serial number [4] [5] [8].

After the data is implemented in the RFID tag, the tag can present that product information through an RFID reader, which uses a sensor or scanner to identify the tag. After the RFID tag is read by the RFID reader, the product data contained in the tag is displayed with the help of an LCD monitor. [23] [11] [18] [7]

TABLE I. PRODUCT DATA COLLECTION METHOD COMPARISON
References | Method to obtain product data | Advantages | Disadvantages
[12], [14], [25] | Record product information, then enter it into a cloud database | The data will still be stored properly and safely, because it is stored in a cloud database | The recorded information is very large, so it is necessary to be careful when recording the product information
[24], [6], [8], [18] | Implementing into RFID tags | Very flexible, because all the data on a product can be stored on a single RFID tag | -
[8], [28], [24] | Displaying data using the LCD monitor | All product information can be easily seen by the owner and customers | -

2) RQ2: What kind of self-checkout system design has been implemented at this time?
In the self-checkout system design that has been implemented at this time, every product in the store is fitted with an RFID tag. When the customer wants to pay, the customer can put the purchased items into a basket that has been fitted with an RFID reader in the form of a sensor that reads the RFID tags. After reading the RFID tags, it displays product information such as the product type and price with the help of an LCD monitor installed on the basket, and the customer can choose from the payment methods available in the store [22] [4] [20].

Each customer can find out the total price to be paid while adding or removing products in the basket itself. After the customer makes a payment, the product database system automatically updates the product stock currently available in the store and warehouse to the warehouse database system, so that the store owner can find out the amount of product stock available in the store and knows when the product must be reordered from the vendor if the existing stock in the store starts to run out [30] [2] [19].

In addition, there is the use of a self-checkout system with a mobile application; the application is not much different from the previous system, but there are special features when using a mobile application. In an implementation of the system using a mobile application, when the customer enters the store, the customer receives an RFID tag adapted as a virtual shopping basket (VSB), where communication between the RFID tag and the RFID readers installed on the product shelves in the shop is connected directly with the mobile application. In this system, every product taken by the customer from the shelf displays its product information in the mobile application. In addition, the system can also read the customer's behaviour, so the system offers product recommendations that might attract the customer's attention [2] [6] [11] [17]. When making payments, the system implementation is still the same as the previous one, namely using a basket fitted with an RFID reader and then displaying the product information and the total price to be paid on an LCD monitor [4] [8] [14] [27]. After the customer makes a payment, the system automatically updates the product stock availability in the store and warehouse to the warehouse database system; because it uses a mobile application, product stock can also be updated in the mobile application so that shop owners can


continue to monitor product stock anywhere without having to come to the store or warehouse. In addition to updating product stock at the store, this system can also update product stock in the customer's mobile application, so that when the customer wants to shop at the store and the product they want to buy turns out to be out of stock, the customer does not need to come to the store, which is very flexible for customers. [13] [5] [15] [17]

TABLE II. SELF-CHECKOUT SYSTEM DESIGN COMPARISON
Part of System | Device/tools | Description | Mechanism | References
Customer | Mobile Application | Customer mobile device | The customer turns on Bluetooth when entering the store and is directly connected to store offerings and products via RFID frequency | [10], [24], [26], [18]
Customer | RFID tags | Installed on each product and product shelf | Every time a customer picks up a product, the application offers a product recommendation according to the customer's behavior when taking the product | [2], [7], [30]
Customer | RFID reader with LCD monitor | Mounted on the basket | Every time a customer puts groceries into the basket, it displays information about the product and the total price to be paid on the LCD monitor | [2], [3], [7]
Store Owner | Mobile Application | Store owner mobile device | Updates the product stock, connected to the warehouse database system | [15], [23], [30]
Cloud Database System | Cloud Database System software | - | A place to store product information and product stock, as well as to update product stock | [17], [1], [6]
RFID | RFID tag | Attached to each product | The data stored in the cloud database is implemented into the RFID tag, which is then installed on each product | [19], [24], [18]

3) RQ3: What are the advantages and disadvantages of using RFID in a Self-Checkout System?
RFID is an identification technology that turns out to be easy to use and flexible, so it is very suitable for automated operations. There are many RFID implementations around us, one of which is the self-checkout system, and each implementation brings change and impact.

When compared to other similar devices, RFID is much more convenient to use. In addition, this technology is also more difficult to counterfeit, because RFID itself provides a much higher level of security compared to others. The use of RFID in the self-checkout system can accommodate more product data without using other tools, so it has better economic value. RFID technology has a flexible form, so it is easier to use in various places and situations. The process of reading information can be done easily, because the field or shape does not affect the reading, and the data can also run faster. RFID has a flexible reading distance, depending on the antenna and the type of RFID chip used; examples include stock counting on conveyor belts, automatic payments on toll roads, and gate access [3], [7], [17], [22].

Although RFID has many advantages as described above, it also has some disadvantages. Information collisions can occur if more than one RFID chip is within one reader's range at the same time. If there are two frequencies on the reader in one area, it can send wrong information to the data processor or the computer; in that way, the level of accuracy will also decrease. The level of data security can decrease slightly if the data is read by a reader used by an irresponsible person [7], [26], [18], [25].

TABLE III. RFID ADVANTAGE AND DISADVANTAGE COMPARISON
Reference | Advantages of using RFID | Disadvantages of using RFID
[7], [16], [18], [25] | RFID in the self-checkout system can accommodate more product data | Information collisions can occur
[7], [17] | Has a flexible form | The level of accuracy will also decrease
[22], [3], [7] | The process of reading information can be done easily | The level of data security can decrease
[26], [18] | The data can run faster | -
[22], [14] | RFID has a flexible reading distance | -

4) RQ4: What types are there in RFID?
At the highest level, we can divide RFID devices into two categories: active and passive. Active RFID systems use power and continuously send their own signals. Active RFID tags are usually used as "beacons" to accurately track locations in real time or in high-speed environments such as toll stations. The signal range is far larger than that of passive tags; however, they are also far more expensive. Active RFID systems have three essential parts: a reader, an antenna, and a tag. Active RFID connects to power infrastructure or uses energy held in internal batteries; it is restricted by the stored energy and limited by the quantity of reads that the device can perform. [22] [8] [9] [17]

Passive RFID systems use tags that do not have an internal power source; instead, they use electromagnetic energy transmitted by the RFID reader to work. Passive RFID tags are used in applications such as access control, file tracking, race timing, supply chain management, and smart tags. The lower cost per tag makes passive RFID systems economical in many industries. Passive RFID does not need batteries or maintenance; in addition, these labels are very durable and small enough to fit on practical self-adhesive labels. [17] [24] [11]

Magnetic induction and electromagnetic (EM) wave capture are the main differences between active RFID and passive RFID. The electromagnetic characteristics of HF antennas do not apply to the near field or the far field; depending on the tag type, each will transmit enough energy to the remote tag to maintain its operation [6] [24] [14] [3].
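The end-to-end flow surveyed in RQ1 and RQ2 — product data bound to a tag ID in a cloud database, a basket reader that totals the scanned tags, and a stock update after payment — can be condensed into a minimal sketch. The tag IDs, field names, prices, and the in-memory dictionary standing in for the cloud database are illustrative assumptions only, not any surveyed paper's actual implementation:

```python
# Hypothetical sketch of an RFID self-checkout flow: product records
# keyed by tag ID, a basket reader that totals scanned tags, and a
# stock decrement after payment. All names and values are assumed.

PRODUCT_DB = {  # stands in for the cloud database system
    "TAG-0001": {"name": "Instant Noodles", "price": 3500, "stock": 40},
    "TAG-0002": {"name": "Mineral Water", "price": 2000, "stock": 25},
}

def read_basket(tag_ids):
    """Simulate the basket's RFID reader: look up each scanned tag ID."""
    return [PRODUCT_DB[tag] for tag in tag_ids if tag in PRODUCT_DB]

def checkout(tag_ids):
    """Total the basket and, once paid, decrement the store stock."""
    items = read_basket(tag_ids)
    total = sum(item["price"] for item in items)
    for item in items:  # the stock update pushed back to the database
        item["stock"] -= 1
    return total

# The total would be shown on the basket's LCD monitor.
print(checkout(["TAG-0001", "TAG-0002"]))  # -> 5500
```

In a real deployment, the dictionary would be the cloud database system and `read_basket` the reader firmware described in Table II; the sketch only illustrates how the pieces fit together.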


TABLE IV. RFID TYPE COMPARISON

References | Type | Characteristics | Advantages | Disadvantages
[22], [8], [24], [14], [3] | Active | Three essential parts: a reader, an antenna, and a tag | The signal range is far larger than that of passive tags | Far more expensive
[22], [3], [9], [17], [24], [11] | Active | Usually used to accurately track locations in real time | Have their own power supply | Cannot be used without a battery
[24], [11], [6], [14], [3] | Active | Connect to power, offered infrastructure, or energy held in internal batteries | High-speed transfer rate | -
[22], [8], [9], [3] | Active | The systems use power to continuously send their own signals | Rugged device | -
[8], [9], [17], [24], [11], [3] | Passive | Do not have an internal power source | Do not need batteries or regular maintenance | Smaller signal range
[22], [8], [24], [14], [3] | Passive | Use magnetic energy transmitted by the reader as a power source | Smaller device size and much cheaper | Less effective around water, metal, and outdoors
[17], [24], [11], [6], [14], [3] | Passive | Utilized in applications such as access control, file tracking, competition synchronization, supply chain management, and smart tags | Can last a lifetime without a battery | -

IV. RESULT AND DISCUSSION

This section provides an overview of the methods used in an RFID-based self-checkout system to obtain product data. It also clarifies the types of devices or tools suitable for each method and the advantages of the existing methods.

There are many methods that can be used to obtain product data in an RFID-based self-checkout process. Although each method has its own advantages, we combined them into one table so that they can be compared with each other.

TABLE V. PAPER RESULT OVERVIEW

References | Method to obtain product data | Device/Tools | Description | Advantages
[14], [3], [4], [5], [8], [12], [1], [23], [18], [7], [27] | Scanning RFID tags sends the information to the mobile app via the smartphone | Mobile application | Smartphone | Simple and does not require any other hardware; more effective and efficient; no need to carry any other device; most commonly used
[22], [8], [9], [17], [24], [11], [6], [14] | Scanning RFID tags sends the information to the mobile app via the smartwatch | Mobile application | Smartwatch | Simple and does not require a smartphone; just bring your smartwatch
[4], [5], [8], [23], [18], [7], [27] | Scanning the RFID tag in the shopping cart and displaying it on the cart's LCD screen | Smart shopping cart | RFID technology on a smart shopping cart | Does not require personal devices
[24], [14], [1], [23], [18], [29], [30], [12], [5], [3] | Scanning at the RFID reader sends an update to the cloud system | Cloud system | Cloud device account | No need to bring any device; high-speed transfer rate
[22], [6], [24], [14], [3], [4], [5], [8], [19], [28], [27] | Displaying product data, stored stock, and product stock updates around the store | Cloud system | Cloud database system software | No need to record or update product and stock data manually with handwriting in a book

V. CONCLUSION

In this study, we have reviewed 30 research articles related to RFID (Radio Frequency Identification) technology. From this research, the use of RFID can be seen in five general sections: Device/Tools, Description, Method to obtain product data, Advantage, and Disadvantage. Mobile devices with mobile applications are most often used as self-checkout tools, because obtaining the required product data with them is easier, more effective, and more efficient, without the need for other devices such as smart shopping carts, cloud accounts, or cloud database software.

From the advantages and disadvantages of each use of RFID, it can be seen that the use of mobile devices through mobile applications is better. Users can shop more freely and pay for their purchases easily, and what they buy can be controlled with just their personal mobile device. Overall, this research is in-depth, as it shows the various ways RFID can be used to support self-checkout systems.

REFERENCES
[1] Athauda, T., Marin, J. C., Lee, J., & Karmakar, N. (2018). Robust low-cost passive UHF RFID based smart shopping trolley. Department of Electrical and Computer Systems Engineering, Monash University, 1-7.
[2] Bocanegra, C., Khojastepour, M. A., Arslan, M. Y., Chai, E., Rangarajan, S., & Chowdhury, K. R. (2020). RFGo: A Seamless Self-checkout System for Apparel Stores Using RFID. MobiCom '20, 718-729.
[3] Duroc, Y., & Tedjini, S. (2018). RFID: A key technology for Humanity. Comptes Rendus Physique, 65-70.
[4] Feng, T., Fan, T., Lai, K. K., & Lin, L. (2016). Impact of RFID technology on inventory control policy. Journal of the Operational Research Society, 1-12.
[5] Hauser, M., Günther, S., Flath, C. M., & Thiesse, F. (2017). Leveraging RFID Data Analytics for the Design of an Automated Checkout System. Department of Business Management, University of Würzburg, 1201-1203.


[6] Hussien, N. A., Alsaidi, S. A., & Ajlan, I. K. (2020). Smart Shopping System with RFID Technology Based on Internet of Things. International Journal of Interactive Mobile Technologies, 17-29.
[7] Machhirke, K., Priyanka, G., Rathod, R., Petkar, R., & Golait, M. (2017). A New Technology of Smart Shopping Cart using RFID and ZigBee. International Journal on Recent and Innovation Trends in Computing and Communication, 256-258.
[8] Musavi, A. S. (2019). A Self-service System Using RFID. Department of Computer Science, Earlham College, 1-7.
[9] Pal, A. K., Tripathi, A., & Saigal, A. (2017). RFID Technology: An Overview. International Journal of Research - Granthaalayah, 176-182.
[10] Priyanka, D. D., T, J., Florance, D. D., Jayanthi, A., & Ajitha, E. (2016). A Survey on Applications of RFID Technology. Indian Journal of Science and Technology, 1-4.
[11] Knezevic, B., Mitrovic, I., & Skrobot, P. (2020). Consumers Attitudes towards Self-Checkout Systems in FMCG Retail in Croatia.
[12] Kalange, S. H., Kadam, D. A., Mokal, A. B., & Patil, A. A. (2017). Smart retailing using IoT. International Research Journal of Engineering and Technology (IRJET), 263-268.
[13] Mukerjee, H. S., Deshmukh, G., & Prasad, U. D. (2019). Technology Readiness and Likelihood to Use Self-Checkout Services Using Smartphone in Retail Grocery Stores: Empirical Evidences from Hyderabad, India. Business Perspectives and Research, 1-15.
[14] Song, W., & Li, M. (2012). Localization in Supermarket Based on RFID Technology. Procedia Engineering, 3779-3782.
[15] Yewatkar, A., Inamdar, F., Singh, R., & Bandal, A. (2016). Smart cart with automatic billing, product information, product recommendation using RFID & ZigBee with anti-theft. Procedia Computer Science, 793-800.
[16] Chadha, R., Kakkar, S., & Aggarwal, G. (2019). Automated shopping and billing system using radio-frequency identification. In 2019 9th International Conference on Cloud Computing, Data Science & Engineering (Confluence) (pp. 693-697).
[17] Karjol, S., Holla, A. K., & Abhilash, C. (2017). An IoT based smart shopping cart for smart shopping. In International Conference on Cognitive Computing and Information Processing (pp. 373-385).
[18] Khairnar, P. K., & Gawali, D. H. (2017). Innovative Shopping Cart for Smart Cities. In 2017 2nd IEEE International Conference on Recent Trends in Electronics, Information & Communication Technology (RTEICT) (pp. 1067-1071).
[19] Rezazadeh, J., Sandrasegaran, K., & Kong, X. (2018). A location-based smart shopping system with IoT technology. In 2018 IEEE 4th World Forum on Internet of Things (WF-IoT) (pp. 748-753).
[20] Nallapureddy, B., Das, P., Nagaraj, N., Parameswaran, S., & Zaninovich, J. (2020). Future of Self Checkout. Sutardja Center, 1-25.
[21] Paise, R. I., & Vaudenay, S. (2008). Mutual Authentication in RFID. Proceedings of the 2008 ACM Symposium on Information, Computer and Communications Security, 292-299.
[22] Mekrusavanich, S. (2020). Supermarket Shopping System using RFID as the IoT Application. 2020 Joint International Conference on Digital Arts, Media and Technology with ECTI Northern Section Conference on Electrical, Electronics, Computer and Telecommunications Engineering, 83-86.
[23] Raad, O., Makdessi, M., Mohamad, Y., & Damaj, I. (2018). A System of Smart and Connected Supermarkets. Canadian Conference on Electrical & Computer Engineering, 1-6.
[24] Chaudhari, M., Gore, A., Kale, R., & Patil. (2016). Intelligent shopping cart with goods management using sensors. International Research Journal of Engineering and Technology, 3243-3246.
[25] Want, R. (2004). The Magic of RFID. ACM Queue, 41-48.
[26] Li, R., Song, T., Capurso, N., Yu, J., & Cheng, X. (2016). IoT Applications on Secure Smart Shopping. 2016 International Conference on Identification, Information and Knowledge in the Internet of Things, 238-243.
[27] Lai, F., Hutchinson, J., & Zhang, G. (2006). Radio frequency identification (RFID) in China: opportunities and challenges. International Journal of Retail & Distribution Management, 905-914.
[28] Choi, S., Yang, Y., Yang, B., & Cheung, H. (2015). Item-level RFID for enhancement of customer shopping experience in apparel retail. Computers in Industry, 10-23.
[29] Busu, M., Ismail, I., Saaid, M., & Norzeli, S. (2011). Auto-checkout system for retails using Radio Frequency Identification (RFID) technology. In 2011 IEEE Control and System Graduate Research Colloquium (pp. 193-196).
[30] Kaur, M., Sandhu, M., Mohan, N., & Sandhu, P. S. (2011). RFID Technology Principles, Advantages, Limitations & Its Applications. International Journal of Computer and Electrical Engineering, vol. 3, no. 1, 1793-8163.


Effective Methods for Fake News Detection: A Systematic Literature Review

Rifdah Defrina Abdiansyah
Computer Science Department, School of Computer Science
Bina Nusantara University, Jakarta, Indonesia
[email protected]

Dewi Mutiara Shevila
Computer Science Department, School of Computer Science
Bina Nusantara University, Jakarta, Indonesia
[email protected]

Pannadhika Sumedha
Computer Science Department, School of Computer Science
Bina Nusantara University, Jakarta, Indonesia
[email protected]

Novita Hanafiah
Computer Science Department, School of Computer Science
Bina Nusantara University, Jakarta, Indonesia
[email protected]

Abstract— The spread of fake news is increasingly worrying, which has prompted many researchers to conduct experiments in creating fake news detection systems. Various algorithmic methods have been tested, producing different results. We therefore conducted a study to find out which method is the most effective in detecting fake news, based on total accuracy and on a consideration of each method's advantages and disadvantages. In addition, we analyze the datasets used in each method and paper. By reviewing the methods of 22 journal papers that met the eligibility criteria, we found that Naive Bayes gives the best results, with a highest accuracy of 96.08% and an average of 81.43%.

Keywords—Hoax Detection, Effective, Methods, Research, Algorithm

I. INTRODUCTION

Along with the continuing advance of information technology, the internet has made news increasingly easy and quick to receive. The Ministry of Communication and Information (Kemkominfo) has even stated that today the internet has become the main reference for accessing news and information. From the data presented, 9 out of 10 internet users look for information via social media [1]. This means that all news spread on the internet is very influential for the community.

This then gives rise to new problems. Hoax information and news have begun to spread all over the internet. The spread of hoax news by irresponsible parties can have a very bad impact, from weakening social integration to causing conflict between tribes, countries, and even religions. One of the anti-hoax ambassadors, Olga Lidya, has said that if hoax news continues to spread, it can make someone who was originally skeptical come to believe the news [2].

Various efforts have been made to prevent and combat the spreaders of hoax news. One of them is the construction of systems that can detect news: the news is examined and the system provides a result, namely whether the news is correct according to existing facts or a hoax. Many people have managed to build such systems with various methods. We therefore survey the existing methods to find which ones provide the best results.

The number of news choices is also a separate problem for some people trying to find news that matches their interests. That is why recommendation systems are built: a recommendation system in an application presents recommendations that suit its users. From the various existing recommendation-system methods, we will also conduct a survey to identify the method that provides the most accurate results.

II. LITERATURE REVIEW

This study uses academic journal papers that correlate with certain keywords, either in the title or in the content of the paper. Those keywords are 'fake news', 'detection', 'method', 'research', and 'algorithm'. Several research questions were formulated to guide the gathering and analysis of all the papers used, as well as to obtain important information for this study:

• RQ1: Where did the dataset used in the paper come from?
• RQ2: What is the process that occurs in each method to detect fake news?
• RQ3: What are the learning methods, and which is the most effective for fake news detection?

To find sources of papers, journals, and theses, several certified electronic databases were used. These databases, which only include internationally approved academic documents, are Semantic Scholar and IEEE Xplore. Several criteria were also applied, with the aim of ensuring the eligibility of the selected works.

TABLE I. SELECTION CRITERIA

Selection Criteria | Exclusion Criteria
Must explain the fake news detection method in detail. | Papers with the latest publications (the last 5 years)
Must be relevant to the current research paper. | Web documents.
Internationally published academic paper. | -
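As a rough illustration of how the criteria in Table I can be applied mechanically during paper screening, the sketch below filters a list of candidate papers. The record fields and sample entries are hypothetical, invented for illustration; they are not taken from the study.

```python
# Hypothetical sketch of applying Table I's selection/exclusion criteria
# to candidate paper records; field names and sample data are illustrative.

def is_eligible(paper):
    """Return True when a paper meets all selection criteria and
    triggers no exclusion criterion."""
    selected = (
        paper["explains_detection_method"]      # details a detection method
        and paper["relevant_to_topic"]          # relevant to this study
        and paper["internationally_published"]  # internationally published
    )
    excluded = paper["is_web_document"]         # plain web documents excluded
    return selected and not excluded

candidates = [
    {"title": "Paper A", "explains_detection_method": True,
     "relevant_to_topic": True, "internationally_published": True,
     "is_web_document": False},
    {"title": "Blog B", "explains_detection_method": True,
     "relevant_to_topic": True, "internationally_published": False,
     "is_web_document": True},
]

eligible = [p["title"] for p in candidates if is_eligible(p)]
print(eligible)  # ['Paper A']
```

The same pass over all 30 candidate records would yield the 22 eligible papers reported in the next section.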

978-1-6654-4002-8/21/$31.00 ©2021 IEEE


A. Reviews Included in the Article Eligibility Criteria

At this phase, a review related to the feasibility of each paper is carried out. The analysis and review aim to determine which papers can be used in the final study. Of the 30 papers previously selected, 22 met both the selection and exclusion criteria above (Fig. 1), while the remaining 8 papers were not eligible.

In 2020, the number of papers with the related themes and keywords increased compared to the previous year; a decline then followed in 2021 (Fig. 2).

Fig. 1. Graph of Papers Eligibility

Fig. 2. Graph of Papers Publication Years (2018-2021)

B. Analysis of the Papers

The papers that were collected and found eligible are then analyzed. A comparison of the algorithm models used in each paper is carried out, examined on the basis of their advantages and disadvantages. The following are the results of our analysis of the reviewed papers.

1) RQ1: Where did the dataset used in the paper come from?

Data collection aims to train and test each existing model. Apart from data consistency, the data source is also a consideration. Data sources are usually divided into two main kinds: data created manually by the researcher, and data already available on the internet [3], provided by someone else. Among all the papers collected and found eligible, none of the methods used a manually created dataset; all papers obtained their datasets from data provided by someone and available on the internet.

We divide the data available on the internet into three types. The first is datasets taken from sites that already provide ready-to-use datasets, such as GitHub and Kaggle [4, 5, 6]. The second is public datasets taken from news articles published on online news websites [7]. The last is datasets taken from social media [8].

Ready-to-use datasets are generally already available in spreadsheet form, such as FakeNewsNet, which provides the PolitiFact and Gossip datasets [9, 10]. FakeNewsNet is available on GitHub, which also provides the KaiDMML dataset. Such a dataset can be downloaded and processed directly, without having to go through data preparation first; using ready-to-use datasets can therefore save a lot of time. FA-KES, ISOT, and LIAR are also datasets available to the public [11, 12].

In contrast to ready-to-use dataset providers, datasets taken from news articles cannot be used directly. They must first go through data processing, such as filtering [13]. Several hoax and non-hoax news articles were taken and processed for later use in several papers to test existing methods [14, 15]. For example, https://turnbackhoax.id, cnnindonesia.com, and Cekfact.com are websites that provide hoax and non-hoax news articles and were used as datasets in several papers [16, 17].

Lastly, there are datasets taken from social media, such as Twitter, Facebook, and Weibo. Many articles used datasets from social media because, nowadays, social media has become the main source for getting and spreading news [18, 19]. Unfortunately, not all social media have a feature to detect whether news posted publicly is a hoax or not. Therefore, these papers used either a statement, an article, or a link to an article on social media as the dataset for their hoax detection experiments.

TABLE II. DATASET SOURCE

References | Data Source | Main Input | Data Size
[6] | Online real news | Article | 15,707
[6] | Online fake news | Article | 12,761
[6] | KaiDMML | Article | 402
[20] | George McIntire's fake news | Article | -
[5] | Kaggle | Article | -
[11] | FA-KES | Article | 804
[11] | ISOT | Article | 45,000
[19] | Facebook | Public post | 15,500
[19] | FakeNewsNet | Article | 422
[23] | Online article | Article | 38
[7] | Online news | Article | 250
[15] | News website | Website page | 680
[10] | PolitiFact | Tweet | 14,055
[10] | PolitiFact | Article | 14,055
[3] | BuzzFeedNews | Article | 1,727
[3] | LIAR | Short claim | 12,836
[3] | BS Detector | Link | -
[3] | CREDBANK | Tweet | 60,000,000
[4] | ISOT | Article | -
[4] | Kaggle | Article | -
[22] | Social media | Article | 30
[16] | News website | Article | 250
[17] | News website | Article | 1,000
[9] | LIAR | Short claim | 12,836
[9] | FEVER | Short claim | 185,445
[9] | FakeNewsNet | Article | 23,921
[18] | Twitter | Article | 1,084
[8] | Twitter | Article | -
[8] | Weibo | Short claim | -
[24] | Twitter | Message | 948,373
[21] | BuzzFeed | Article | 1,771
[12] | LIAR-PLUS Master | Article | -
[14] | Online news | Article | 251
[13] | Facebook | Public post | 15,500

2) RQ2: What is the process that occurs in each method to detect fake news?
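A ready-to-use dataset of the kind described above typically arrives as a CSV or spreadsheet that only needs to be loaded and split. The sketch below is a minimal, hedged illustration of that step: the column names "text" and "label", the sample rows, and the 75/25 split are assumptions for the example, not the layout of any specific dataset named here.

```python
# Minimal sketch of preparing a ready-to-use CSV dataset for a detector.
# The "text"/"label" columns and the sample rows are illustrative only.
import csv
import io
import random

# Stand-in for an actual downloaded file (e.g. from Kaggle or GitHub).
raw = io.StringIO(
    "text,label\n"
    "Government confirms new policy,real\n"
    "Celebrity spotted on the moon,fake\n"
    "Stock market closes higher today,real\n"
    "Miracle cure discovered overnight,fake\n"
)

rows = list(csv.DictReader(raw))   # each row: {"text": ..., "label": ...}
random.seed(0)
random.shuffle(rows)               # shuffle before splitting

split = int(0.75 * len(rows))      # e.g. 75% train / 25% test
train, test = rows[:split], rows[split:]
print(len(train), len(test))       # 3 1
```

Datasets scraped from news sites or social media would need an extra cleaning/filtering pass before this point, as the text above notes.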


a. Long Short-Term Memory (LSTM)

LSTM is an improved variation of the RNN. The process of minimizing the error rate (backpropagation) on a recurrent network takes a long time, especially when the error backflow is poor; this is what sometimes makes RNNs unsuitable for learning long-term dependencies. LSTM is more suitable for storing "short-term memories" for a long time, because it has separate gates on the input, output, and forget paths to control the flow of information through the cell state. This makes LSTM suitable for handling NLP problems such as news reports, articles, or other lengthy inputs [20].

• System architecture stage using the data mining section; three processes are carried out:
1. Article Text Web Mining (Search Query → Scrape URL) → Secondary Features / Comparison Model → Similar? → Merge Article Text & Secondary Features (YES) / Ignored (NO) → Preprocessing
2. Article Text → Comparison Model → Similar? → Merge Article Text & Secondary Features (YES) / Ignored (NO) → Preprocessing
3. Article Text → Merge Article Text & Secondary Features (YES) → Preprocessing

• System architecture stage using the deep learning mode; the process carried out is:
Article Text & Secondary Features → Word Vector Model (Word2Vec / GloVe) → Word Vector → Deep Learning Model (FNN / LSTM)

b. Gated Recurrent Unit (GRU)

The GRU is similar to the LSTM, especially in structure and capability, but the GRU has a simpler and more efficient concept because it uses only reset and update gates. The weakness of the LSTM in learning long-term dependencies can be corrected in the GRU, which makes the GRU a good candidate for NLP applications. Using word2vec and GloVe, each initial run over the existing dataset is used to select queries from search engines and discover new features. After that, the word embedding model finds the most suitable vector representation of the generated text, and the results are used to train the deep learning model. Each new trial datum goes through a data mining and vectorization process before being sent to the model used to predict its authenticity; this process is carried out after training on a collection of words that have gone through the modification process. The data mining process is carried out so that other features can be added; the vectorization process follows, and classification is then performed in the neural network [20].

c. Bi-directional LSTM-Recurrent Neural Network

• Convolutional Neural Network

The Convolutional Neural Network is a feed-forward artificial network and a generalized version of the multilayer perceptron. This network is a neural approach that represents the feature function being applied by forming words or n-grams, aimed at obtaining higher-level features. The features obtained from this process can be used for sentiment analysis, machine translation, text classification, and several other NLP tasks. The process of changing sentences into matrices can be done using various word embedding models such as word2vec, GloVe, and others. After going through the convolution filter process, the pooling method forms a representation; this representation is followed by one or more fully connected layers that make the final prediction [5].

• Recurrent Neural Network

Somewhat similar to the Convolutional Neural Network, the Recurrent Neural Network is also an artificial neural network. The RNN handles input sequentially, in variable sizes, and consists of an iterative hidden layer whose activity at each time step depends on the previous time step. Therefore, the RNN is a better choice for cases with long-range contextual information. The RNN maintains the state of information across all time steps, making it possible to process input and output of variable sizes. By using the backpropagation algorithm for training, the error can be minimized by repeatedly taking small steps in the direction of the negative error derivative with respect to the network weights. The result, however, is a vanishing-gradient problem in the lower layers of the network [5].

• Long Short-Term Memory - Recurrent Neural Network

LSTM is a special type of RNN that is skilled at learning long-duration dependencies, and it is a good and effective solution to the vanishing-gradient problem. In an LSTM-RNN, the hidden layer of the basic RNN is replaced by LSTM cells [5]. The process is: Memory Cell Input → Input Gate → Forget Gate → Self-Recurrent Connection → Output Gate → Memory Cell Output.

• Bi-directional Long Short-Term Memory - Recurrent Neural Network

Bi-directional processing is an approach for predicting large text sequences and text classifications. The Bi-directional LSTM network steps through the input sequence in both directions at the same time; each embedding layer is matched with the training data, checked in both orders simultaneously [5].

d. Hybrid CNN-RNN model

The CNN has the ability to extract local features, and the LSTM is used to learn long-term dependencies. A CNN layer based on Conv1D (one-dimensional CNN) is used to process the input vector and extract local text-level features. The result obtained from the CNN layer (the feature map) is the input to the RNN layer of LSTM units that follows it [11].

e. Naïve Bayes

As a technique considered quite simple for classification, Naïve Bayes is commonly used to classify spam and non-spam messages in email [21]. In several studies, this method is also used to detect hoax news. Several stages need to be carried out to apply this method to the goal of detecting hoax news.
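The gate structures that the LSTM and GRU sections above describe informally can be written out explicitly. These are the standard textbook formulations, not taken from any of the reviewed papers: the LSTM keeps input, forget, and output gates $i_t, f_t, o_t$ around a cell state $c_t$, while the GRU replaces them with reset and update gates $r_t, z_t$.

```latex
% Standard LSTM cell (input, forget, and output gates):
i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i), \quad
f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f), \quad
o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)

\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c), \quad
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, \quad
h_t = o_t \odot \tanh(c_t)

% Standard GRU cell (reset gate r_t, update gate z_t):
r_t = \sigma(W_r x_t + U_r h_{t-1} + b_r), \quad
z_t = \sigma(W_z x_t + U_z h_{t-1} + b_z)

\tilde{h}_t = \tanh(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h), \quad
h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t
```

The forget gate $f_t$ is what lets the LSTM preserve long-range information through $c_t$; the GRU achieves a similar effect with one fewer gate, which is the efficiency advantage noted above.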

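The preprocessing, training, and testing stages described next can be sketched end to end. The snippet below is a toy, standard-library-only illustration of a multinomial Naive Bayes hoax classifier with add-one (Laplace) smoothing; the example texts and labels are invented for illustration and are not from any reviewed paper.

```python
# Toy Naive Bayes hoax classifier following the preprocessing/training/
# testing stages described below (illustrative only).
import math
import re
from collections import Counter

def preprocess(text):
    # Lowercase, strip punctuation, split into words,
    # matching the preprocessing stage described in the text.
    return re.sub(r"[^a-z ]", " ", text.lower()).split()

def train(docs):
    """docs: list of (text, label). Returns word counts, token totals,
    class priors, and the vocabulary."""
    counts, totals, priors = {}, Counter(), Counter()
    for text, label in docs:
        priors[label] += 1
        bag = counts.setdefault(label, Counter())
        for w in preprocess(text):
            bag[w] += 1
            totals[label] += 1
    vocab = {w for bag in counts.values() for w in bag}
    return counts, totals, priors, vocab

def classify(text, model):
    counts, totals, priors, vocab = model
    n = sum(priors.values())
    best, best_lp = None, -math.inf
    for label in priors:
        lp = math.log(priors[label] / n)
        for w in preprocess(text):
            # Add-one smoothing so unseen words do not zero out a class.
            lp += math.log((counts[label][w] + 1) /
                           (totals[label] + len(vocab) + 1))
        if lp > best_lp:
            best, best_lp = label, lp
    return best

model = train([
    ("shocking miracle cure doctors hate", "hoax"),
    ("aliens secretly control the government", "hoax"),
    ("parliament passed the annual budget today", "fact"),
    ("the central bank kept interest rates unchanged", "fact"),
])
print(classify("miracle cure for everything", model))  # hoax
```

Note how an unseen word contributes the same smoothed probability to every class, which is the behavior the "Testing and Evaluation" stage below describes for words absent from the training dataset.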

• Preprocessing Data

Initially, the dataset is divided into two: a training dataset and a testing dataset. Data processing is then carried out on the training dataset so that the data can be used for machine learning. The deletion of characters and punctuation marks, conversion of all characters to lowercase, and separation of sentences into individual words are also carried out in this training phase.

• Training

Machine learning offers several algorithms that can be used for text classification, such as Random Forest (RF), K-Nearest Neighbor (KNN), Max Entropy (MAXENT), Support Vector Machine (SVM), and Naive Bayes (NB) [10]. In this machine learning approach, it is necessary to train on the training dataset using a variety of different texts and then test using the testing data [7].

• Testing and Evaluation

After training, testing is carried out to see the accuracy the algorithm can produce. If there is a word the classifier does not know, that word has never appeared in the training dataset, and the classifier cannot classify the news from it. If the word exists and appears several times, the article could possibly be factual. Whether it can be said to be fact or not is determined from the total occurrences and the accuracy calculated for the news.

3) RQ3: What are the learning methods, and which is the most effective for fake news detection?

The data obtained from the collection of papers help the classifier work better. We analyzed the methods used in the reference papers and compared them in Table III. Among the well-known methods, Naïve Bayes has a fast [22] and effective processing time [7]. There is also LSTM, which works better for hoax detection in long sentences [9]. In addition, there is the SVM method, which can classify high-dimensional data but only lasts for the short term [23].

TABLE III. METHOD COMPARISON

References | Method | Advantages | Disadvantages
[20] | Gated Recurrent Unit | Higher performance and efficiency ratio than LSTM | -
[20] | LSTM | Has recurrent network backpropagation | Takes a very long time on error backflow
[5] | Bi-directional LSTM-Recurrent Neural Network | Able to predict large text sequences and text classification | Performance is reduced when contextual news articles are long
[11] | Hybrid CNN-RNN model | Can capture local and sequential characteristics of input data | Difficult to find optimal hyperparameter values
[19] | Content Based | Offers good performance | Dependent on social interactions
[23] | SVM | Classification on high-dimensional data | Lasts only for the short term
[7] | Naïve Bayes | High computational efficiency | -
[15] | Support Vector Machine (SVM) | Good performance with a large input dimension | Irrelevant features
[15] | Stochastic Gradient Descent (SGD) | Can offer a new perspective for solving problems | Gradient descent can be slow on very large datasets
[4] | Ensemble Learner | Average accuracy more than 88% | -
[4] | Perez-LSVM | Performs best among other benchmark algorithms | On the Kaggle dataset, the accuracy rate is less than 80%
[22] | Naïve Bayes | Faster and better | Sentiment accuracy level is not too high
[16] | Naïve Bayes | With or without URLs, the system can yield good results | Performance decreases when news topics are varied
[17] | LSTM | Able to overcome long-term dependence | Needs a word2vec model parameter combination for best results
[9] | LSTM | Works better for long sentences | Temporal dependencies in the entire text
[18] | Geometric Deep Learning | No regularization was used | High complications when generalizing across different domains
[8] | Hybrid Deep Model | Easily generalized to any dataset | Relies on RNN to extract the temporal representation of articles
[24] | Naïve Bayes, Neural Network, SVM | High confidence in detecting fake news | May not represent the whole spectrum of news in the real world
[21] | Naïve Bayes | Each feature is independent of the value of any other feature | Could improve significantly by using more complex models
[12] | Naïve Bayes | Free of predictors | Has a low model size
[14] | Naïve Bayes | Uses probabilities and statistics for rediscovering information | Lacks performance
[14] | Random Forest | The highest accuracy, because it generates random nodes for each node | Must create a decision tree based on attributes and data
[14] | Decision Tree | Can handle continuous data and discrete data | Lacks performance in providing the information requested by the user
[13] | Logistic Regression | Suitable for many variations and applies non-interference | Does not transfer information across users
[13] | Harmonic BLC | Can cope with large datasets | The user is easy to assume

Based on the results of the data obtained from several existing articles, there are several methods that can help the classification work effectively. The selection is made by looking at the advantages and disadvantages


of some of the algorithms taken, namely by looking at the speed of the process or the level of accuracy.

Stance Classification has high eligibility, so it can be said that Stance Classification has a class standard with fairly high accuracy; this method should therefore be efficient enough to be used in comparing fake news detection.

III. DISCUSSION

From the papers used that met the eligibility criteria, the Naive Bayes method is the most widely used method in the experiments. Naive Bayes has advantages in terms of high computational efficiency, so a Naive Bayes system is considered able to provide good results faster and better. The highest accuracy was generated from a dataset taken from Twitter, at 96.08% [24], while the lowest accuracy of 74.51% was achieved by a Naive Bayes method that used a dataset of 100 hoax articles and 151 non-hoax articles [14]. From there, we calculated the average accuracy of the method and obtained 81.43% as the total accuracy.

Then there is the LSTM method, which is no less popular in the experiments of the papers we analyzed. Based on several papers, the highest accuracy with the LSTM method is 83.66% [20] and the lowest is only 64% [17]. Overall, this method works well for long sentences [9] but requires a combination with word2vec models to get maximum results [17]. In addition, while it is able to overcome long-term dependencies [17], it requires a long time for error backflow [20]. The average accuracy value based on the papers we analyzed was 76.39%.

Lastly, the GRU method has a simpler and more efficient concept because the GRU has two gates, namely reset and update, in contrast to the LSTM, which has three gates. The weaknesses in the LSTM can be corrected in the GRU by learning dependencies over a short time. Each new trial datum goes through a data mining and vectorization process before being sent to the model used to verify its authenticity; this process is carried out after training with a collection of words that go through a modification process. In the GRU experiment, the dataset taken has a 1:1 ratio between real and fake news; with a 1:1 ratio, the data is relatively balanced, and the proportions on both sides are almost the same.

IV. CONCLUSION

From the 30 papers we previously selected, there were 22 papers that met the criteria in accordance with the existing conditions. In this study, we reviewed all of the papers against the eligibility criteria, and based on the data we obtained, researchers can expand their research by analyzing hoax detectors that use combined methods, in order to see their effectiveness.

REFERENCES

[1] Andr010, "Berita Kominfo," 10 August 2015. [Online]. Available: https://kominfo.go.id/index.php/content/detail/5421/Kemkominfo%3A+Internet+Jadi+Referensi+Utama+Mengakses+Berita+dan+Informasi/0/berita_satker. [Accessed 22 April 2020].
[2] Yunita, "Sorotan Media," 9 January 2017. [Online]. Available: https://kominfo.go.id/content/detail/8716/bahaya-hoax-bisa-berujung-pada-pembunuhan-karakter/0/sorotan_media. [Accessed 22 April 2021].
[3] K. Shu, A. Sliva, S. Wang, J. Tang and H. Liu, "Fake News Detection on Social Media: A Data Mining Perspective," ArXiv, vol. abs/1708.01967, 2017.
[4] I. Ahmad, M. Yousaf, S. Yousaf and M. O. Ahmad, "Fake News Detection Using Machine Learning Ensemble Methods," Complexity, vol. 2020, 2020.
[5] P. Bahad, P. Saxena and R. Kamal, "Fake News Detection using Bi-directional LSTM-Recurrent Neural Network," Procedia Computer Science, vol. 165, pp. 74-82, 2019.
[6] J. Kapusta, P. Hájek, M. Munk and Ľ. Benko, "Comparison of fake and real news based on morphological analysis," Procedia Computer Science, vol. 171, pp. 2285-2293, 2020.
[7] I. Y. R. Pratiwi, R. A. Asmara and F. Rahutomo, "Study of Hoax News Detection Using Naive Bayes Classifier in Indonesian Language," 2017 International Conference on Information & Communication Technology and System (ICTS), pp. 73-78, 2017.
[8] N. Ruchansky, S. Seo and Y. Liu, "CSI: A Hybrid Deep Model for Fake News Detection," pp. 797-806, 2017.
[9] R. Oshikawa, J. Qian and W. Y. Wang, "A Survey on Natural Language Processing for Fake News Detection," pp. 6086-6093, 2020.
[10] J. Zhang, B. Dong and P. S. Yu, "FAKEDETECTOR: Effective Fake News Detection with Deep Diffusive Neural Network," 2020 IEEE 36th International Conference on Data Engineering (ICDE), pp. 1826-1829, 2020.
[11] J. A. Nasir, O. S. Khan and I. Varlamis, "Fake news detection: A hybrid CNN-RNN based deep learning approach," International Journal of Information Management Data Insights, 2021.
[12] Z. Khanam, B. N. Alwasel and M. Rashid, "Fake News Detection Using Machine Learning Approaches," IOP Conf. Series: Materials Science and Engineering, 2020.
[13] E. Tacchini, G. Ballarin, M. L. D. Vedova, S. Moret and L. de Alfaro, "Some Like it Hoax: Automated Fake News Detection in Social Networks," 2017.
[14] T. T. A. Putri, H. W. S, I. Y. Sitepu, M. Sihombing and S., "Analysis and Detection of Hoax Contents in Indonesian News Based on Machine Learning," JIPN (Journal of Informatics Pelita Nusantara), vol. 4, no. 1, pp. 19-26, 2019.
[15] A. B. Prasetijo, R. R. Isnanto, D. Eridani, Y. A. A. Soetrisno, M.
collected, it can be concluded Bayes method with an overall Arfan and A. Sofwan, "Hoax Detection System on Indonesian News
average accuracy of 81.43%. By comparing the advantages Sites Based on Text Classification using SVM and SGD," pp. 45-49,
and disadvantages of all methods from all existing journals, 2017.
the Naive Bayes method has more advantages, fastest and [16] . B. Zaman, A. Justitia, N. K. Sani and E. Purwanti, "An Indonesian
more efficient because some use probabilities and statistics, Hoax News Detection System Using Reader Feedback and Naïve
even free of predictors. This study proved that to detect fake Bayes Algorithm," vol. 20, no. 1, pp. 82-94, 2020.
news Naive Bayes method is superior to other methods. [17] . A. Apriliyanto and R. Kusumaningrum, " HOAX DETECTION IN
INDONESIA LANGUAGE USING LONG SHORT-TERM
By presenting the different processes and the advantages MEMORY MODEL," SINERGI , vol. 24, pp. 189-196, 2020.
as well as disadvantages from each method, overall this [18] F. Monti, F. Frasca, D. Eynard, D. Mannion and M. M. Bronstein,
research has been quite useful. In the future, research can be "Fake News Detection on Social Media using Geometric Deep
Learning," ArXiv, vol. abs/1902.06673, 2019.
carried out by considering each dataset from each paper in
[19] M. L. D. Vedova, E. Tacchini, S. Moret and G. Ballarin, "Automatic
testing the method. That way, maybe the effectiveness results Online Fake News Detection Combining Content and Social
given can be even more accurate. We also suggest that
researchers who want to conduct similar research can further

282 28 October 2021, Jakarta - Indonesia


2021 1st International Conference on Computer Science and Artificial Intelligence (ICCSAI)

Signals," PROCEEDING OF THE 22ND CONFERENCE OF [23] P. Assiroj, Meyliana, A. N. Hidayanto, H. Prabowo and H. L. H. S.
FRUCT ASSOCIATION, pp. 272-279, 2018. Warnars, "Hoax News Detection on Social Media: A Survey," 2018
[20] D. S and B. Chitturi, "Deep neural approach to Fake-News Indonesian Association for Pattern Recognition International
identification," Procedia Computer Science, vol. 167, no. 2236- Conference (INAPR), pp. 186-191, 2018.
2243, 2020. [24] S. Aphiwongsophon and P. Chongstitvatana, "Detecting Fake News
[21] M. Granik and V. Mesyura, "Fake News Detection Using Naive with Machine Learning Method," 2018 15th International
Bayes Classifier," 2017 IEEE First Ukraine Conference on Conference on Electrical Engineering/Electronics, Computer,
Electrical and Computer Engineering (UKRCON), pp. 900-903, Telecommunications and Information Technology (ECTI-CON), pp.
2017. 528-531, 2018.
[22] H. A. Santoso, E. H. Rachmawanto, A. Nugraha, N. A. Nugroho, D. [25] J. Kapusta, P. Hájek, M. Munk and Ľ. Benko, "Comparison of fake
R. I. M. Setiadi and R. S. Basuki, "Hoax classification and sentiment and real news based on morphological analysis," Third International
analysis of Indonesian news using Naive Bayes optimization," Conference on Computing and Network Communications
TELKOMNIKA Telecommunication, Computing, Electronics and (CoCoNet’19), pp. 2285-2293, 2020.
Control, vol. 18, pp. 799-806, 2020.

283 28 October 2021, Jakarta - Indonesia


2021 1st International Conference on Computer Science and Artificial Intelligence (ICCSAI)

Determining the Best Delivery Service in Jakarta using Tsukamoto Fuzzy Algorithm

Phillips Tionathan, Yohanes Raditya Janarto, Ignatius Hansen, Novita Hanafiah
Computer Science Department, School of Computer Science, Bina Nusantara University, Jakarta, Indonesia
[email protected], [email protected], [email protected], [email protected]

Abstract—With the increase and development of technology, shipping service providers are also multiplying. This research paper helps people choose the best delivery service in Jakarta. Since competition among delivery services is rising, it is confusing for people to choose which delivery service is the best. Using Tsukamoto fuzzy logic, we can determine the best delivery service in Jakarta from a few categories, such as delivery time, price, availability, safeness, and customer service, and then process the collected data with the Tsukamoto fuzzy model. After calculating the data, the best delivery service can be determined by its score. Tsukamoto fuzzy logic is relevant for determining the best delivery service in Jakarta.

Keywords—algorithm, decision making, fuzzy logic, fuzzy system, shipping services.

978-1-6654-4002-8/21/$31.00 ©2021 IEEE

I. INTRODUCTION

Currently, technology is growing and developing. With the presence of advancing technology, users can buy various goods through social media, e-commerce, and other channels, and when doing so, buyers must choose the shipping service used to send the goods they buy. Along with the development of the e-commerce field in Indonesia, a variety of new shipping service providers have emerged. The increasing number of shipping service providers makes users compare one shipping service with another, based on a variety of factors that can influence human decision-making in choosing a shipping service.

By determining the best shipping service, users will feel at ease when they use it. Determining the best shipping service can be helped by algorithms, one of which is the fuzzy algorithm. The fuzzy algorithm itself is divided into three methods, namely Tsukamoto, Sugeno, and Mamdani. Fuzzy logic can be interpreted as the logic used to explain ambiguity [1], or logic that can convert linguistic statements into numerical statements [2]. Therefore, the fuzzy algorithm can be used to help the decision-making process for goods shipping services, since various variables can be converted into decisive factors for the delivery of goods.

Therefore, in this research the Tsukamoto fuzzy method is used to determine the best shipping service, because with Tsukamoto fuzzy, systematic calculations can be realized and the calculation results obtained can be used to support the decision to choose the best shipping service in Jakarta [3].

II. LITERATURE REVIEW

The shipping service business sector is increasingly popular and growing. As the number of shipping services increases, users can get confused about choosing the best one. The best shipping service can be identified using decision-making algorithms such as Fuzzy, AHP, ANP, TOPSIS, SAW, Certainty Factor, WASPAS, Profile Matching, and Decision Tree. The criteria for evaluating the best shipping service include price, service, shipping time and shipping range, the number of branch offices, and security assurance [3].

One of the popular algorithms used for decision-making is fuzzy. The fuzzy algorithm is popular because it is an approach that can solve various kinds of problems [2]. The fuzzy algorithm can find the best shipping service, especially when used to assess delivery time, because when the population value is greater, the results obtained are more accurate [4]. In its use, some apply the MADM type of fuzzy, known as Multi-Attribute Decision Making; MADM can be used to assess or select a limited number of alternatives [5]. Besides fuzzy, SAW can also be used for decision-making, because SAW is a decision system method that can make judgments more precisely, being based on the values of the criteria and the weights of their importance [6]. The advantage of the SAW algorithm in this case is that if the determining variables are limited, it can produce results with high accuracy [7]. Its drawback is that if there are too many determining variables, the accuracy of the results can be reduced [8].

Fuzzy can also be combined with SAW to get a more optimal result, because when combined with fuzzy, the resulting fuzzy matrix is normalized and then produces alternative weights [9]. In this combination, the fuzzy method is used to rank logistics services and the Simple Additive Weighting (SAW) method to weight the logistics service criteria [10].

The AHP method can also be used to assist in the selection of shipping services, because the AHP method is based on the human ability to use information and experience to estimate
relative quantities through pairwise comparisons [11]. The AHP method is suitable for solving problems that have multiple alternatives and multiple criteria [12]. The AHP method can find out in detail which expedition is best based on numerical values [13]. Another advantage of AHP is that it functions well for comparing existing variables based on the data collected [14]. Over time, the AHP method evolved into ANP; the ANP method improves on the structure of AHP by being able to accommodate linkages between criteria or alternatives [15]. The ANP method can also be more efficient when combined with fuzzy, because by combining the two, both quantitative and qualitative aspects can be accommodated [16]. Even though it has evolved, AHP is still effective when used in conjunction with fuzzy, because it can handle decision-making criteria that cannot be predicted or that experience minor changes [17], or in conjunction with the TOPSIS algorithm, because the results of AHP can be locked and specified so that they become more accurate [18].

The TOPSIS method is an effective method for solving decision problems because it is simple and efficient, and it can measure the performance of alternative decisions [19]. Since TOPSIS is a simple algorithm, the assessment criteria can still be used more or less effectively [1], because the decision results are simple mathematical values [20]. The results of TOPSIS decision-making also provide other alternatives in the choice of shipping services [21]. In application, TOPSIS can be combined with AHP, since the data obtained from the AHP method can be considered and then ranked by TOPSIS [22]. In addition to AHP, TOPSIS can be combined with SAW: through the SAW process, the sum of attributes can be normalized into a matrix, and the matrix obtained can then be measured using TOPSIS [23].

In addition to the algorithms mentioned above, the profile matching algorithm, the decision tree, Weighted Aggregated Sum Product Assessment (WASPAS), and the Certainty Factor (CF) can also be used to find the best shipping service. The profile matching algorithm differs from the previous methods because the criteria for determining the best shipping service must be compared between the actual value and the value of a created profile against the expected profile values. The comparison yields a gap value, and the smaller the gap value, the more accurate the results will be [24]. The decision tree algorithm makes it easy for humans to identify the relationships between the factors that affect something [25]. The WASPAS method is suitable for prioritizing alternative options relevant to the weighting used [26]. Finally, there is the Certainty Factor method, which is suitable for finding decisions based on the data held, because its level of accuracy reaches 100% [27].

III. METHODOLOGY

In this research, we use Tsukamoto fuzzy logic to help choose the best delivery service in Jakarta. The Tsukamoto fuzzy method follows the steps shown in Fig 1.

Fig 1. Fuzzy Tsukamoto Flow

In Fig 1 there are five input variables to be processed with fuzzy Tsukamoto; the categories used to determine the best delivery service are delivery time, customer service, price, availability, and safeness. We then process the input with Tsukamoto fuzzy and visualize the result.

A. Delivery Time (DT)

Delivery time is determined by the duration when sending an item, based on how many day(s) it takes to arrive.

Fast: 1 – 3 day(s)
Slow: 2 – 5 day(s)

µ Fast [x] = 1,           1 <= x <= 2
             (3 - x) / 1, 2 < x < 3
             0,           x >= 3

µ Slow [x] = 0,           x <= 2
             (x - 2) / 1, 2 < x < 3
             1,           3 <= x <= 5

Fig 2. Delivery Time graph

B. Customer Service (CS)

Customer service quality is determined by how the provider treats their customers, for example how customer service responds to complaint handling.


Bad: 60 – 80 (score)
Good: 70 – 100 (score)

µ Bad [x] = 1,             60 <= x <= 70
            (80 - x) / 10, 70 < x < 80
            0,             x >= 80

µ Good [x] = 0,             x <= 70
             (x - 70) / 10, 70 < x < 80
             1,             80 <= x <= 100

Fig 3. Customer Service graph

C. Price (P)

The delivery service price for delivering an item.

Cheap: 5000 – 7000 (Rupiah)
Average: 6000 – 8000 (Rupiah)
Expensive: 7000 – 10000 (Rupiah)

µ Cheap [x] = 1,                 5000 < x < 6000
              (7000 - x) / 1000, 6000 < x < 7000
              0,                 x >= 7000

µ Average [x] = 0,                 x <= 6000
                (x - 6000) / 1000, 6000 < x < 7000
                1,                 x = 7000
                (8000 - x) / 1000, 7000 < x < 8000
                0,                 x >= 8000

µ Expensive [x] = 0,                 x <= 7000
                  (x - 7000) / 1000, 7000 < x < 8000
                  1,                 8000 <= x <= 10000

Fig 4. Price graph

D. Availability (A)

Determined by how many branches are in the area.

Few: 10 – 20 (branches)
Many: 15 – 30 (branches)

µ Few [x] = 1,            10 <= x <= 15
            (20 - x) / 5, 15 < x < 20
            0,            x >= 20

µ Many [x] = 0,            x <= 15
             (x - 15) / 5, 15 < x < 20
             1,            20 <= x <= 30

Fig 5. Availability graph

E. Safeness (S)

Measures the safety and completeness of the delivered item: the quality of the item that is received must be the same as its quality when it was sent.

Bad: 60 – 70 (score)
Good: 70 – 100 (score)

µ Bad [x] = 1,             60 <= x <= 70
            (80 - x) / 10, 70 < x < 80
            0,             x >= 80

µ Good [x] = 0,             x <= 70
             (x - 70) / 10, 70 < x < 80
             1,             80 <= x <= 100

Fig 6. Safeness graph
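The piecewise membership functions defined in this section can be sketched directly in code. This is a minimal sketch: the breakpoints are taken from the definitions above, while the function names and the choice of a shared shape for the two score-based "Good" curves are illustrative assumptions, not from the paper.

```python
# Membership functions for the delivery-service criteria (sketch).
# Breakpoints follow the piecewise definitions in the text.

def mu_dt_fast(x):
    """Delivery time 'Fast': full membership up to 2 days, falling to 0 at 3."""
    if x <= 2:
        return 1.0
    if x < 3:
        return (3 - x) / 1  # linear descent between 2 and 3 days
    return 0.0

def mu_dt_slow(x):
    """Delivery time 'Slow': 0 up to 2 days, full membership from 3 to 5."""
    if x <= 2:
        return 0.0
    if x < 3:
        return (x - 2) / 1  # linear ascent between 2 and 3 days
    return 1.0

def mu_price_cheap(x):
    """Price 'Cheap': full membership below Rp 6000, zero at Rp 7000 and above."""
    if x <= 6000:
        return 1.0
    if x < 7000:
        return (7000 - x) / 1000
    return 0.0

def mu_price_average(x):
    """Price 'Average': triangular, peaking at Rp 7000, zero outside 6000-8000."""
    if x <= 6000 or x >= 8000:
        return 0.0
    if x < 7000:
        return (x - 6000) / 1000
    return (8000 - x) / 1000

def mu_score_good(x, lo=70, hi=80):
    """Shared 'Good' shape for customer service / safeness scores (rises lo to hi)."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)
```

For example, a 2.5-day delivery is "fast" to degree 0.5 and "slow" to degree 0.5, reflecting the deliberate overlap of the two ranges.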


TABLE I. RESULT REPRESENTATION OF CRITERIA SCORING

Result | Criteria scoring
Excellent | Fulfils all five criteria: delivery below or equal to 3 days; customer service score higher than or equal to 70; price below or equal to IDR 7000; availability score higher than or equal to 75; safeness score higher than or equal to 70
Very Recommended | Does not fulfil at least 1 criterion
Recommended | Does not fulfil at least 2 criteria
Slightly Recommended | Does not fulfil at least 3 criteria
Not Recommended | Does not fulfil at least 4 criteria

TABLE II. CATEGORY AND MEASUREMENT

Scoring criteria | Category | Range value | Measurement
Delivery Time (DT) | Fast | 1-3 | Delivering a package in 1-3 day(s) in Jakarta
Delivery Time (DT) | Slow | 2-5 | Delivering a package in 2-5 day(s) in Jakarta
Customer Service (CS) | Bad | 60-80 | A 60-80 score represents bad customer service
Customer Service (CS) | Good | 70-100 | A 70-100 score represents good customer service
Price (P) | Cheap | 5000-7000 | Delivery price for the Jakarta area is 5000-7000 Rupiah
Price (P) | Average | 6000-8000 | Delivery price for the Jakarta area is 6000-8000 Rupiah
Price (P) | Expensive | 7000-10000 | Delivery price for the Jakarta area is 7000-10000 Rupiah
Availability (A) | Few | 50-100 | Delivery service has 50-100 branches in Jakarta
Availability (A) | Many | 75-200 | Delivery service has 75-200 branches in Jakarta
Safeness (S) | Bad | 60-80 | A 60-80 score represents bad safeness
Safeness (S) | Good | 70-100 | A 70-100 score represents good safeness

TABLE III. DELIVERY SERVICE CRITERIA

Criteria alias | Criteria
DT | Delivery Time
CS | Customer Service
P | Price
A | Availability
S | Safeness

Rules Establishment

Rules are formed as IF … THEN statements over the variables used. Here are 5 rules to represent the 31 rules that are formed:

TABLE IV. RULE EXAMPLES

Rule 1: IF DT is fast AND CS good AND P cheap AND A high AND S high THEN Excellent
Rule 2: IF DT is fast AND CS good AND P cheap AND S high AND A low THEN Very Recommended
Rule 7: IF DT is fast AND CS good AND P cheap AND S low AND A low THEN Recommended
Rule 17: IF DT is fast AND CS good AND P average AND S low AND A low THEN Slightly Recommended
Rule 27: IF DT is fast AND CS low AND P low AND S low AND A low THEN Not Recommended

Defuzzification

The defuzzification process is done by using the rules above and applying the values set by the available data. Values are set by the highest or lowest value depending on the rule. The numbers produced with the Tsukamoto Fuzzy Inference System are as follows; finally, the crisp value is obtained by combining all αpred data.

αpred1 = 1;  (d - 50) / 50 = 1;  d1 = 100
αpred2 = 0;  (d - 50) / 50 = 0;  d2 = 50
αpred7 = 0;  (d - 50) / 50 = 0;  d7 = 50
αpred17 = 0; (d - 50) / 50 = 0;  d17 = 50
αpred27 = 0; (d - 50) / 50 = 0;  d27 = 50
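The rule firing strengths (αpred) and the weighted-average defuzzification used above can be sketched as follows. This is a minimal sketch: the helper names are illustrative, and the consequent shape (d - 50)/50 on a 50-100 output scale is read from the αpred equations in the text.

```python
# Tsukamoto-style inference sketch for the worked example.

def rule_strength(*memberships):
    """Firing strength (alpha-predicate) of a rule: the minimum of its
    antecedent membership degrees."""
    return min(memberships)

def rule_output(alpha):
    """Solve the consequent membership (d - 50) / 50 = alpha for the
    crisp output d, as done for d1..d27 in the text."""
    return 50 + 50 * alpha

def tsukamoto(alphas):
    """Weighted-average defuzzification over all fired rules."""
    numerator = sum(a * rule_output(a) for a in alphas)
    denominator = sum(alphas)
    return numerator / denominator if denominator else 0.0

# Reproducing the paper's numbers: rule 1 fires fully (alpha = 1) while
# rules 2, 7, 17 and 27 do not fire (alpha = 0), giving d = 100.
print(tsukamoto([1, 0, 0, 0, 0]))  # 100.0
```

Because every non-firing rule contributes zero weight, only rule 1's crisp output (d1 = 100) survives the weighted average, matching the hand computation in the text.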


d = (αpred1*d1 + αpred2*d2 + αpred7*d7 + αpred17*d17 + αpred27*d27) / (αpred1 + αpred2 + αpred7 + αpred17 + αpred27)
d = (1*100 + 0*50 + 0*50 + 0*50 + 0*50) / (1 + 0 + 0 + 0 + 0)
d = 100 / 1
d = 100

Based on the available data, the delivery service whose score comes closest to d = 100 is J&T, with a score of 80.

IV. CONCLUSION

The many delivery services in Jakarta leave people confused about choosing the best one for sending their packages. In this study, the Tsukamoto fuzzy method is used to determine the best delivery service. The assessment is determined by 5 criteria: Delivery Time (DT), Customer Service (CS), Price (P), Availability (A), and Safeness (S). The results of the study showed that the best shipping service, assessed on those criteria, was J&T.

REFERENCES

[1] Irianto, "PEMILIHAN PERUSAHAAN JASA PENGIRIMAN BARANG TERBAIK MENGGUNAKAN METODE TOPSIS," JURNAL TEKNOLOGI INFORMASI (JurTI), pp. 74-79, 2017.
[2] W. Buana, "Penerapan Fuzzy Mamdani Untuk Sistem Pendukung Keputusan Pemilihan Telepon Seluler," Edik Informatika, vol. 2, no. 1, pp. 138-143, 2015.
[3] A. Pamuji and H. S. Setiawan, "Membangun Sistem Pendukung Keputusan Untuk Rekomendasi Pada E-Commerce Melalui Penerapan Logika Fuzzy," Teknik Informatika, Fakultas Teknik, Matematika dan Ilmu Pengetahuan Alam, pp. 341-352, 2016.
[4] E. W. Hidayat, A. W. Widodo and B. Rahadi, "Analisis Optimasi Multiple Travelling Salesman Problem Time Window Pada Algoritma Genetika Terhadap Pemilihan Rute Pengiriman Barang J&T Express Surabaya," Jurnal Pengembangan Teknologi Informasi dan Ilmu Komputer, pp. 3899-3905, 2018.
[5] B. V. Christioko, H. Indriyawati and N. Hidayati, "Fuzzy Multi-Atribute Decision Making (Fuzzy MADM) Dengan Metode SAW Untuk Pemilihan Mahasiswa Berprestasi," Jurnal Transformatika, vol. 14, no. 2, pp. 82-85, 2017.
[6] N. a. M. N. Oktaviani and N. Nurmalasari, "Pemilihan Jasa Pengiriman Terbaik Menggunakan Metode Simple Additive Weighting (SAW)," JUSTIN (Jurnal Sistem dan Teknologi Informasi), vol. 6, no. 4, pp. 223-229, 2018.
[7] A. Putri and S. Wasiyanti, "Pemilihan Jasa Pengiriman Barang Menggunakan Metode Simple Additive Weighting (SAW)," SATIN – Sains dan Teknologi Informasi, pp. 10-19, 2020.
[8] T. Y. Akhirina, "SISTEM PENDUKUNG KEPUTUSAN PEMILIHAN MITRA JASA PENGIRIMAN BARANG MENGGUNAKAN METODE SIMPLE ADDITIVE WEIGHTING," JUPITER, 2019.
[9] V. Putratama and D. L. Sumarna, "Penentuan Jasa Logistik Pada Umkm Kota Cimahi Menggunakan Metode Fuzzy Simple Additive Weighting," Semnas Ristek (Seminar Nasional Riset dan Inovasi Teknologi), 2020.
[10] V. Putratama and D. L. Sumarna, "Penentuan Jasa Logistik Pada Umkm Kota Cimahi Menggunakan Metode Fuzzy Simple Additive Weighting," Semnas Ristek (Seminar Nasional Riset dan Inovasi Teknologi), 2020.
[11] A. Wulan and B. Hendrawan, "Analisis Pemilihan Jasa Forwarder dengan Menggunakan Metode Analytical Hierarchy Process (AHP) di PT. Xyz," Journal of Applied Business Administration, vol. 2, no. 2, pp. 294-306, 2018.
[12] V. Sofica, "Microsoft Excel Pada Metode Analytical Hierarchy Process Untuk Memilih Jasa Pengiriman," Information Management for Educators and Professionals: Journal of Information Management, pp. 54-66, 2016.
[13] J. Astuti and E. Fatma, "EVALUASI PEMILIHAN PENYEDIA JASA KURIR BERDASARKAN METODE ANALYTICAL HIERARCHY PROCESS (AHP)," JURNAL MANAJEMEN INDUSTRI DAN LOGISTIK (JMIL), pp. 44-52, 2017.
[14] N. Kustian, "PENENTUAN DALAM PEMILIHAN JASA PENGIRIMAN BARANG TRANSAKSI E-COMMERCE ONLINE," Journal of Applied Business and Economic, 2016.
[15] A. J. Olanta, M. E. Sianto and I. Gunawan, "Perbandingan metode ANP dan AHP dalam pemilihan jasa kurir logistik oleh penjual gadget online," Widya Teknik, vol. 18, no. 2, pp. 96-101, 2019.
[16] R. Govindaraju and J. P. Sinulingga, "Pengambilan keputusan pemilihan pemasok di perusahaan manufaktur dengan metode fuzzy ANP," Jurnal Manajemen Teknologi, vol. 16, no. 1, pp. 1-16, 2017.
[17] D. Hajar and S. P. Arifin, "Analisis Pengambilan Keputusan Pemilihan Perusahaan Penyedia 3PL di Pekanbaru," Jurnal Komputer Terapan, 2016.
[18] Ilmadi and M. Suryani, "Sistem Pendukung Keputusan Dalam Pemilihan Perusahaan Jasa Pengiriman Terbaik Dengan Menggunakan Metode AHP dan TOPSIS," Statmat, pp. 78-88, 2019.
[19] R. Risnawati and N. Manurung, "SISTEM PENDUKUNG KEPUTUSAN DALAM PENENTUAN MITRA JASA PENGIRIMAN BARANG TERBAIK DI KOTA KISARAN MENGGUNAKAN METODE TOPSIS," JURTEKSI (Jurnal Teknologi dan Sistem Informasi), vol. 5, no. 2, pp. 133-138, 2019.
[20] I. Mutmainah and Y. Yunita, "PENERAPAN METODE TOPSIS DALAM PEMILIHAN JASA EKSPEDISI," Jurnal SISFOKOM (Sistem Informasi dan Komputer), pp. 86-92, 2021.
[21] G. Ginting, Fadlina, Mesran, A. P. U. Siahaan and R. Rahim, "Technical Approach of TOPSIS in Decision Making," International Journal of Recent Trends in Engineering & Research, pp. 58-64, 2017.
[22] Yonathan, "ANALISIS PEMILIHAN VENDOR TERBAIK DALAM PENGIRIMAN PRODUK MINUMAN DALAM KEMASAN MENGGUNAKAN METODE AHP DAN TOPSIS DI PT CS2 POLA SEHAT," Jurnal Logistik Indonesia, 2020.
[23] L. S. Putri, N. Hidayat and Suprapto, "Sistem Pendukung Keputusan Pemilihan Mitra Jasa Pengiriman Barang menggunakan Metode Simple Additive Weighting (SAW) – Technique for Order Preference by Similarity to Ideal Solution (TOPSIS) di Kota Malang," Jurnal Pengembangan Teknologi Informasi dan Ilmu Komputer, vol. 2, no. 3, pp. 1219-1226, 2018.
[24] T. Y. Akhirina, "Komparasi Metode Simple Additive Weighting dan Profile Matching pada Pemilihan Mitra Jasa Pengiriman Barang," Jurnal Edukasi & Penelitian Informatika (JEPIN), pp. 27-33, 2016.
[25] Z. Azmi and M. Dahria, "DECISION TREE BERBASIS ALGORITMA UNTUK PENGAMBILAN KEPUTUSAN," Jurnal Ilmiah Saintikom, 2016.
[26] A. P. Nanda, S. Sucipto and S. Hartati, "Analisis Menentukan Jasa Pengirim Terbaik Menggunakan Metode Weight Aggregated Sum Product Assessment (WASPAS)," Jurnal Manajemen Sistem Informasi dan Teknologi, 2020.
[27] E. D. S. Rizal, H. Ahmad, S. Wibawanto and G. P. Cahyono, "Sistem pendukung keputusan pemilihan driver berbasis metode certain factor," Jurnal Teknologi Elektro dan Kejuruan, 2019.


RTR AR PHOTO BOOTH:
THE REAL-TIME RENDERING AUGMENTED REALITY PHOTO BOOTH

Muhamad Fajar, Yogi Udjaja, Eko Setyo Purwanto, Anderies
Computer Science Department, School of Computer Science, Bina Nusantara University, Jakarta, Indonesia 11480
[email protected], [email protected], [email protected], [email protected]

Abstract — The use of photo booths at several events has become a means of documentation, such as selfies and wefies; however, in some applications the technology has not been utilized properly. This study aims to describe how Augmented Reality technology can be applied to photography services in the form of a photo booth, with real-time rendering techniques from Spark AR and the use of the cloud. We propose a creation schema to implement augmented reality in photo booths for marketing purposes at exhibitions or events. The implementation description covers how the augmented reality technology is used, the development methods, and references on how to use it until it is ready for use. Performance measurement was also carried out using Frames Per Second (FPS) on two different device configurations in several experiments. The results show that the proposed photo booth can run at more than 60 FPS, above standard performance.

Keywords—Augmented Reality, Photo Booth, Photo Corner, Spark AR, One Drive.

978-1-6654-4002-8/21/$31.00 ©2021 IEEE

I. INTRODUCTION

Various daily activities have been transformed by the impact of the opening of a global network known as the Internet [1]. According to the survey published by the Indonesian Internet Service Providers Association in 2019, internet users in Indonesia have reached 171 million, with user growth of up to 10% every year [2]. Developments in the photography sector have accompanied the growth of internet users: photographs are now more widely produced, consumed, and distributed on computers and mobile phones via the internet and social-network sites [3]. Image digitization has also illustrated new, more real-time, collaborative, and networked ways of socializing [3]. This form of socialization is also part of Society 5.0, which refers to a new type of society where innovation in science and technology is emphasized to balance social problems that need to be resolved while ensuring economic development [4].

On the other hand, the Future Today Institute describes one of the technological developments with valuable market potential, Augmented Reality, which includes various application developments in several sectors such as the military, nursing, health, engineering, and entertainment [5]. One form of entertainment need is the satisfaction of sharing information and experiences online, which can take the form of sharing media and photographs online [6].

Photographs taken by non-professionals are often the "selfies" traditionally so termed on social media. Selfies represent a form of real-life behavior that is not contaminated by academic training [7], [8]. This type of behavior has become spontaneous both in individuals and in groups; group selfies are also commonly referred to as "wefies." Most selfies are taken with the front camera while the person monitors their image on the preview screen like a mirror [8], [9]. One form of photography service present in Indonesia, thanks to technological developments, is the photo booth or photo corner [10]–[12].

In line with the development of photography technology, photo booths have been designed both with and without technology. Several studies succeeded in improving user experience using physical attributes, such as Saptodewo's design study of visual media using physical costumes in a photo booth to preserve Indonesian culture [11] and Renzi Anita's study using physical decoration to enhance the guest experience at wedding celebrations [10]. On the other side, several studies using virtual attributes, such as augmented reality, also succeeded in improving user experience, providing unique value and satisfying experiences [12]–[14].

Based on recent developments in photo booth design, this study describes a photo booth service design
using the implementation of Augmented Reality technology through a non-professional photography product in providing real-time, collaborative selfies or wefies and networking, and provide unique experiences in a photo booth.

II. LITERATURE REVIEW

A. Augmented Reality

Augmented Reality (AR) is a variation of Virtual Environment (VE) technology. VE technology makes it seem as if the user has entered a synthetic world [15]. Augmented Reality, on the other hand, makes users feel as if virtual objects are present in the real world. In this case, Augmented Reality adds virtual content to the real world. For example, when we add a light to an empty table, the light signifies a virtual object inserted into the real world [16].

B. Spark AR Studio

Spark AR Studio is software used to create virtual effects and objects that can be integrated with the natural world so that they appear to exist in the real world. Spark AR Studio can display virtual objects on real objects through barcode or image scanning, making the displayed objects appear real. For example, hats, glasses, and unique effects in a photo can make it appear as if the object is worn by a human in the real world [13], [17].

C. Cloud Photo Booth

Cloud storage is a breakthrough in information technology that is expected to answer the need of many people to store large amounts of data that can be kept and accessed remotely. Cloud computing makes it easy for users to carry out activities anywhere and anytime, including sending data to large data storage. Currently, cloud computing can be developed further, for example through the automation of cloud capturing, where users take advantage of cloud computing to save a screen capture (screenshot) into cloud storage. Examples of widely used cloud storage are Microsoft OneDrive, Dropbox, and Google Drive. Google Drive and Microsoft OneDrive have advantages in security when accessing data; in addition, both offer features for real-time collaboration [18]–[20].

D. Photo Booth

A photo booth is a tool that uses a camera to take pictures of oneself and can be used by many people. A photo booth usually has effects that can make the photos more interesting. This is because photo booths fundamentally combine virtual objects with objects in the real world (humans as real-world objects) [9], [11], [17], [21].

III. PROPOSED METHOD

A. The Schematic Creation Flow

We propose The Schematic Creation Flow (TSCF) for implementing The Real-Time Rendering Augmented Reality Photo Booth, which consists of six stages:

1. Gathering and Planning is a data collection process related to the need for photo booth content, which includes determining assets in the form of effects, logic, and animation that are planned to be made in a sketch.
2. Asset Production is the second stage, a planning process in which digital assets are made before implementing them into the engine, or physical assets are created if needed.
3. Development is the third stage, in which the digital assets that have been made are implemented in the Spark AR engine and effects, animations, or interactions are created using the patch editor.
4. Cloud Setup is a step to synchronize the capture results so that they are uploaded to the cloud automatically.
5. Finishing Display is the final stage to complete and adjust the development results from Spark AR on the screen.
6. Performance Test is carried out after the final stage using FPS (Frames Per Second), as in previous research in Augmented Reality [22].

B. The Real-Time Rendering Augmented Reality Photo Booth

We propose an augmented reality photo booth concept with a real-time rendering system using Spark AR. We mirror a display on a PC that shows the real-time rendering in the Spark AR tool to the monitor, so that it appears as if the user is standing in front of a mirror. The concept, which we named The Real-Time Rendering Augmented Reality Photo Booth (RTR AR Photo Booth), can be seen in the architecture in Figure 1.

Fig 1. The Architecture of RTR AR Photobooth

The following is an explanation of the architecture in Figure 1, which works as follows:

1. First, Spark AR on the PC requests an image.
2. Then, the web camera continues the request from the PC by capturing the image.
3. After that, the web camera takes the picture in front of it.
4. Then, the web camera sends the captured image to Spark AR on the PC.
5. The PC mirrors the screen in the real-time rendering field of Spark AR via HDMI.
6. Also, the monitor displays the real-time column

290 28 October 2021, Jakarta - Indonesia


mirroring of the Spark AR rendering.
7. When the user comes to see themselves in the monitor mirroring, the user can request to be captured via the admin.
8. Then, the admin presses the print screen button on the PC keyboard to capture the image in the real-time rendering column of Spark AR.
9. The PC processes the capture by producing an image file in the .jpg format.
10. The file is automatically saved in the local screenshot folder on the PC.
11. There are now changes to the local folder, along with a request to check for new files.
12. After that, OneDrive checks for changes to the folder by checking for new files.
13. OneDrive then picks up the changes to the folder by uploading the new file to an existing folder in the cloud.
14. After the new file has been fully uploaded, the system sends a notification that the file has been added to the cloud folder on OneDrive.
15. The PC displays a message that the files have been successfully added to the cloud.
16. The admin shares the cloud link via URL shortening or a QR code image, which points to the OneDrive link so users can view and download the screenshot file.
17. The user opens the link shared by the admin via smartphone.
18. The user's smartphone accesses the cloud link by requesting to open the cloud folder on OneDrive.
19. OneDrive displays the user's photo image files in the cloud folder.
20. Users view the image file, which can then be downloaded.

IV. RESULT & DISCUSSION

The implementation of The Real-Time Rendering Augmented Reality Photo Booth based on The Schematic Creation Flow is as follows:

1. Gathering and Planning. This stage begins with making a sketch drawing containing the photo booth concept to be used. The sketch concept in Figure 2 shows the use of digital assets on the hat and the image of the event where this photo booth is implemented; the image space on the left and the video space are filled with content related to the event and social media promotion.

Fig 2. Depiction of concepts through sketches

2. Asset Production is the second stage in the processing of asset creation, including 3D-based digital assets using the Blender tools (See Figure 3), video animation for the video space using Adobe Premiere (See Figure 4), and graphic design for the image space using Adobe Photoshop (See Figure 5).

Fig 3. Creating digital assets using Blender

Fig 4. Creating assets for video space using Adobe Premiere Pro

Fig 5. Creating assets for image space using Adobe Photoshop

3. Development. At this stage, the assets created in the previous process are implemented in Spark AR using several additional effects provided by Spark AR (See Figure 6), such as animation, particle effects, textures, and materials.

Fig 6. Development on Spark AR

4. Cloud Setup is a step to synchronize the capture results using Microsoft OneDrive by turning on the "Automatically Save Screenshots capture to OneDrive" option. This feature integrates the print screen or capture results from the PC into the Pictures folder on OneDrive; it is reached by selecting Help & Settings in OneDrive on the Windows 10 taskbar (See Figure 7). Once this section has been set, pressing the print screen button on the keyboard sends the screenshot results directly to the Pictures folder directory on OneDrive on the PC. After that, open onedrive.live.com and log in using the same OneDrive account used on the PC, then go to the Pictures directory folder to configure the link settings and obtain the screenshot link. Other settings, such as link expiration and password, can also be turned on (See Figure 8).

Fig 7. Help & Settings of OneDrive in Windows 10

Fig 8. Link Settings

5. Finishing the Display. At the final stage, open the image space using Microsoft Paint, then open the video with the default Windows 10 application, and arrange the Windows toolbar sequence as shown in Figure 9. After that, make a sequence layout like the concept sketch, as shown in Figure 10. Once the layout has been made, use the Magnifier tool on Windows to adjust it to the monitor size used. When displayed on the monitor, it looks like Figure 11. The final result is as in Figure 12: the real-time photo booth rendering is ready for the screenshot, and the image automatically goes directly to the OneDrive link (See Figure 13) that was set up before in Cloud Setup.

Fig 9. Toolbar Windows 10

Fig 10. Layout View

Fig 11. Display on Monitor

Fig 12. The use of real-time photo booth rendering

Fig 13. The OneDrive link, when accessed by the user on a smartphone
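The automatic-save portion of the workflow (steps 10–13, where OneDrive notices new screenshot files in the local folder and uploads them) can be illustrated with a minimal polling sketch. The class name, the polling approach, and the folder layout here are our own illustrative assumptions; in the actual setup, the OneDrive client performs this change detection itself.

```python
from pathlib import Path

class ScreenshotWatcher:
    """Report .jpg files that appear in a local screenshot folder,
    mimicking how a sync client detects new captures (steps 10-13)."""

    def __init__(self, folder):
        self.folder = Path(folder)
        # Files already present are treated as previously synchronized.
        self.seen = {p.name for p in self.folder.glob("*.jpg")}

    def poll(self):
        """Return the names of .jpg files added since the last poll;
        each returned name would be handed to the uploader."""
        current = {p.name for p in self.folder.glob("*.jpg")}
        new_files = sorted(current - self.seen)
        self.seen = current
        return new_files
```

In the real pipeline, each name returned by `poll()` corresponds to a screenshot that OneDrive uploads and then exposes through the shared link configured in the Cloud Setup stage.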




6. Performance Test. In this section, we measure FPS on two different device configurations: (i) Device 1 with an Intel Core i7-9750H, RTX 2070 6GB, and 16 GB RAM; (ii) Device 2 with an Intel Core i5-10300H, GTX 1650 Ti 8GB, and 8 GB RAM. We conducted five experiments on each device, each with a duration of 60 seconds. Each trial measures the RTR AR Photo Booth capability with several people: the first experiment consisted of 1 person, the second of 2 people, the third of 3 people, the fourth of 4 people, and the fifth of 5 people.

Fig 14. Result of FPS Performance on Device 1

Fig 17. Result of FPS Performance Average on Both Device

The FPS performance results on Device 1 tend to be stable (see Figure 14), where the lowest values range from 55 to 71 (see Figure 16) and the average of the five experiments is above 60 (see Figure 17). Previous research shows that an average above 60 indicates very good performance [23]. This indicates that the RTR AR Photobooth is running perfectly. The results on Device 2 also tend to be stable (see Figure 15). The lowest value occurred in the experiment consisting of 5 people, at 30, and the average was 40.33 (see Figure 16). Looking at the experimental results across the five configurations, the average is around 40 FPS (see Figure 17), which can be classified as acceptable performance because it lies between 30 and 60 FPS [22]. From the results of the two configurations, it is found that the RTR AR Photobooth runs above standard performance, because no FPS value falls below 30 FPS [22]. This means the RTR AR Photo Booth has stable performance across the experiments and can be implemented as a photo booth for marketing purposes in exhibitions or events.
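The FPS bands applied to these averages can be expressed as a small helper. The thresholds follow the cited prior work (below 30 FPS substandard, 30–60 FPS acceptable, 60 FPS and above very good [22], [23]); the function names and sample values here are illustrative only.

```python
def mean_fps(samples):
    """Average FPS over one 60-second trial."""
    return sum(samples) / len(samples)

def classify_fps(avg_fps):
    """Map an average FPS value to the performance bands used above:
    >= 60 very good, 30-60 acceptable, < 30 below standard."""
    if avg_fps >= 60:
        return "very good"       # e.g. Device 1 in these experiments
    if avg_fps >= 30:
        return "acceptable"      # e.g. Device 2 in these experiments
    return "below standard"
```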

Fig 15. Result of FPS Performance on Device 2

Fig 16. Result of FPS Performance on Both Device

V. CONCLUSION

The design and development method described here is one way that industry players can implement photo booths using one of the technologies currently developing, namely Augmented Reality. The results show that the photo booth can be implemented using existing tools. This is supported by several experiments showing that it runs at acceptable performance, able to accommodate selfies and wefies. Furthermore, this paper can be used as a reference or guide for implementing an augmented reality-based photo booth using real-time rendering from Spark AR, and for sharing the results using the cloud on OneDrive. In addition, the aim of this paper is to implement augmented reality technology that can be used for marketing purposes at events to attract people.




REFERENCES

[1] N. Limantara, I. S. Edbert, P. J. Widjaya, and M. Adina, "User acceptance analysis on intacs ERP distribution application using technology acceptance model," ICIC Express Lett., vol. 15, no. 4, pp. 349–355, 2021, doi: 10.24507/icicel.15.04.349.
[2] N. S. R. Rais, M. M. J. Dien, and A. Y. Dien, "Kemajuan teknologi informasi berdampak pada generalisasi unsur sosial budaya bagi generasi milenial," J. Mozaik, vol. 10, no. 2, pp. 61–71, 2018.
[3] R. Rodríguez, F.-J. Molina-Castillo, and G. Svensson, "The mediating role of organizational complexity between enterprise resource planning and business model innovation," Ind. Mark. Manag., vol. 84, pp. 328–341, 2020.
[4] P. Ruivo, T. Oliveira, and M. Neto, "Examine ERP post-implementation stages of use and value: Empirical evidence from Portuguese SMEs," Int. J. Account. Inf. Syst., vol. 15, no. 2, pp. 166–184, 2014.
[5] S. V. Siar, "Harnessing the Fourth Industrial Revolution: Creating Our Future Today," 2019.
[6] P.-F. Hsu, "Integrating ERP and e-business: Resource complementarity in business value creation," Decis. Support Syst., vol. 56, pp. 334–347, 2013.
[7] J. Cirklová, "Reaffirming Identity Through Images. The commodification of Illusions in the Contemporary Presentation of Self," methaodos. Rev. ciencias Soc., vol. 8, no. 1, 2020.
[8] A. Weilenmann and T. Hillman, "Selfies in the wild: Studying selfie photography as a local practice," Mob. Media Commun., vol. 8, no. 1, pp. 42–61, 2020.
[9] B. Martínez, S. Casas, M. Vidal-González, L. Vera, and I. García-Pereira, "TinajAR: An edutainment augmented reality mirror for the dissemination and reinterpretation of cultural heritage," Multimodal Technol. Interact., vol. 2, no. 2, pp. 1–13, 2018, doi: 10.3390/mti2020033.
[10] R. Anita, R. Hidayati, and R. Juliani, "Eksistensi Diri Pengguna Photo Booth Di Kabupaten Aceh Barat," SOURCE J. Ilmu Komun., vol. 5, no. 2, 2020.
[11] F. Saptodewo, "Perancangan karakter bregada keraton Yogyakarta sebagai media visual pendukung photo booth," J. Desain, vol. 5, no. 02, pp. 74–85, 2018.
[12] F. Immanuel and A. P. Widodo, "Pengembangan Aplikasi Photobooth Berbasis Augmented Reality," J. Masy. Inform., vol. 11, no. 1, pp. 22–34.
[13] I. K. A. M. Putra, "Perancangan filter Instagram berbasis augmented reality dengan face mask Spark AR pada akun New Media College," J. Teknol. Inf. dan Komput., vol. 6, no. 3, 2020.
[14] I. Pachoulakis and K. Kapetanakis, "Augmented reality platforms for virtual fitting rooms," Int. J. Multimed. Its Appl., vol. 4, no. 4, p. 35, 2012.
[15] H. Anay, Ü. Özten, and M. Ünal, "A New Environment: Augmented Reality," in Game + Design Education, Springer, 2021, pp. 241–253.
[16] R. Yung and C. Khoo-Lattimore, "New realities: a systematic literature review on virtual reality and augmented reality in tourism research," Curr. Issues Tour., vol. 22, no. 17, pp. 2056–2081, 2019.
[17] T. Kitamura, K. Yasui, and Y. Nakatani, "Proposal of Using Digital Mirror Signage and AR Pictogram for Follow Me Evacuation Guidance," in International Conference on Human-Computer Interaction, 2019, pp. 307–314.
[18] S. Alotaibi, H. Alomair, and M. Elhussein, "Comparing performance of commercial cloud storage systems: The case of Dropbox and OneDrive," 2019 Int. Conf. Comput. Inf. Sci. (ICCIS), pp. 1–5, 2019, doi: 10.1109/ICCISci.2019.8716385.
[19] N. Darmawansyah, "Pengembangan model photo booth berformat gambar bergerak dengan sistem cloud sharing," Universitas Negeri Makassar, 2021.
[20] Q. Zhang et al., "DeltaCFS: Boosting delta sync for cloud storage services by learning from NFS," in 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS), 2017, pp. 264–275.
[21] E. M. Keen, J. Wren, É. O'Mahony, and J. Wray, "catRlog: a photo-identification project management system based in R," Mamm. Biol., pp. 1–10, 2021.
[22] T. Louis, J. Troccaz, A. Rochet-Capellan, and F. Bérard, "Is it real? Measuring the effect of resolution, latency, frame rate and jitter on the presence of virtual entities," ISS 2019 - Proc. 2019 ACM Int. Conf. Interact. Surfaces Spaces, pp. 5–16, 2019, doi: 10.1145/3343055.3359710.
[23] S. G. Roy and U. Kanjilal, "Web-based augmented reality for information delivery services: A performance study," DESIDOC J. Libr. Inf. Technol., vol. 41, no. 3, pp. 167–174, 2021, doi: 10.14429/djlit.41.3.16428.


A Systematic Literature Review: Database Optimization Techniques

Rizki Ashari, Muhammad Fachri Akbar Winata, Dharmawan Thamrin, and Novita Hanafiah
Computer Science Department, School of Computer Science
Bina Nusantara University, Jakarta, Indonesia
[email protected]

Abstract—Big data optimization is the main factor in obtaining accurate and fast data. The amount of data today is very large; therefore, optimization must be done. As technology develops, more and more data is generated, and some data optimization techniques still take a long time to reach optimal results. This research paper aims to identify several ways of optimizing a database using several existing techniques. The design used is a literature review. The criteria for the papers used are those published in 2005-2020. In the papers we have reviewed, there are several ways to optimize a database, along with the methods used and the challenges that arise during database optimization. Based on the collected papers, it was found that there are indeed many ways to optimize a database. Many methods in database optimization use the MapReduce algorithm, and it is proven that the algorithm can reduce the amount of work time when transferring data; there are also several challenges in database optimization. This research paper shows how to optimize a database based on several sources, and explains the methods and algorithms used in database optimization.

Keywords—Big Data Analysis, Cloud Computing, Systematic Literature Review, Optimizing Database, Query Database, NoSQL, Algorithm

I. INTRODUCTION

Data is one of the most important and vital aspects of various activities in today's world, and the amount of data generated every second is very large. The rapid development of data in various domains requires intelligent data analysis tools to meet the requirements for analyzing large amounts of data. In previous studies, some data optimization algorithms still take a long time to reach optimal results. In addition to time, data optimization also depends on the algorithms used in Big Data.

Replication is one of the most studied phenomena in distributed environments. Replication is a strategy where there are multiple copies of data across multiple sites. The reasons for our interest in discussing replication are high availability, high performance, and high reliability. Replication has many benefits, such as improving system performance, increasing data availability, and fault tolerance, but it costs extra to create, maintain, and update data replicas [A Systematic Literature Review of the Data Replication Techniques in the Cloud Environments]. A good way to replicate data in a cloud environment is to create and delete replicas based on changes in user access patterns, storage capacity, and bandwidth, and then choose a data location depending on the current environment information. However, there are still drawbacks, such as the difficulty of collecting runtime information from all data nodes in a complex cloud infrastructure and ensuring the consistency of data files.

II. RELATED WORK

A. Big Data

Data that is larger than server storage and beyond processing capabilities is called Big Data. Big Data cannot be managed by traditional RDBMS or conventional statistical tools. Big Data increases storage capacity as well as processing power so that market predictions can be made easily [18]. New technologies are needed to handle big data files in a very broad, cost-effective, and fault-tolerant way. Big Data is often collected from multiple locations or platforms, which contributes to measurement error and experimental variation [20].

A lot of data in current systems is replicated, and data replication has a problem: once there are multiple replicas of the data, they definitely take up more memory. Data replication involves static and dynamic replication strategies. Static replication is required to follow a deterministic policy; therefore, the number of replicas and host nodes is well defined and predetermined [9]. This strategy is easy to implement but is not often used because it does not adapt to the environment. A dynamic strategy, in contrast, is carried out in the cloud system by creating and removing replicas according to user access patterns, storage capacity, and bandwidth.

In data replication in big data, the following characteristics of big data apply [14]:
a. Volume is the main characteristic of big data.
b. Variety is a characteristic of big data that includes structured, semi-structured, and unstructured data, covering various formats such as text, images, audio, video, and sensor data.
c. Velocity refers to big data that manages high data rates in a real-time process.
d. Veracity is the accumulation of detailed data within its scope.
e. Value is data that offers information in the topic under discussion.

978-1-6654-4002-8/21/$31.00 ©2021 IEEE




f. Variability provides support for transforming data by offering extensibility (addition of new data fields) and scalability (extension of size).
g. Valence (connectedness) is a characteristic that connects common fields to combine different data sets.

Big data characteristics can optimize high dimensions, reduce poor performance, reduce computational costs, and make the algorithm stable.

B. Big Data Concept

1) SQL (Structured Query Language)

Some examples of database management systems that use SQL include Oracle, Sybase, Microsoft SQL Server, and PostgreSQL. MySQL is among the most trusted open-source database platforms used today. Many of the world's most popular and highly trafficked websites are built on MySQL because of its ubiquity across heterogeneous platforms and application stacks and because of its renowned performance, reliability, and ease of use. Research, publication, and community service are among the activities managed by the Institute for Research and Community Services [7]. MySQL configuration parameters can be tuned using the hill-climbing algorithm.

a. Database Optimization

Determining the optimization approach is an important step in SQL statement processing. In cost-based optimization, cost is generally measured as the total time needed to answer a query. Rule-based optimization selects an execution plan based on the available access paths and their ratings; if there is more than one way to execute an SQL statement, the rule-based optimizer always uses the operation with the lower rank [4].

One way to improve database performance is by optimizing PL/SQL using 24 different approaches, including techniques such as using indexes, number-out techniques, reducing subqueries, and combination techniques. These techniques were tested on three databases of different sizes (small, medium, and large) to evaluate efficiency in terms of execution time and system responsiveness [6].

b. Big Data Quality

Data storage in big data is required to be of high quality; the influential factors [13] are the quality of the data content information and the factors that provide the framework in which analysis can be carried out [11].

Information that meets the quality factors of big data means the management system must include several things: relevance, accuracy, completeness, trusted sources, communication with the right people, timeliness, detail, and understanding [7].

2) NoSQL

A NoSQL database does not need a schema and has no relations between tables. The data model in a NoSQL database is classified into four categories: Column-Oriented, which represents columns with a unique partition and must group rows on an optional primary key [14]; Document-Oriented, a data model stored in key-value pairs in which document values are written in XML, JSON, or BSON formats; Graph, a model containing nodes and edges, where the edges represent relationships between entities; and Key-Value, a model that represents key-value tuples [15]. In the key-value model, each key is a unique identifier at an index leading to the value, which can be data of arbitrary type, structure, and size.

3) Cloud Computing

Threats to data integrity are particularly relevant, as interference with data can adversely affect critical business decisions. This problem mainly occurs in the Cloud Computing environment [1]. Cloud computing is a powerful technology for operating at large and complex scale without the need to maintain expensive computer hardware, dedicated space, and software [2].

III. STUDY REVIEW

A. Planning The Review

In this literature study, we obtained sources from several international papers using the keywords 'Big Data', 'Cloud Computing', 'Database', and 'Optimization'. In order to analyze all the information from the existing journal papers, we created several research questions with the aim of obtaining data and analysis:

• RQ1: How is database optimization done?
• RQ2: What methods are used to optimize the database?
• RQ3: What are the challenges in performing database optimization?

Several electronic databases were used as sources for papers, journals, and theses. All electronic databases are certified and include only internationally approved academic documents: AAAI, IEEE Xplore, World Scientific, Elsevier, Taylor & Francis, ResearchGate, Journal TIMES, MECS Press, Springer, Wiley, Nature, Complexity, arXiv, MDPI, and ACM DL. Three selection and exclusion criteria are applied to ensure the eligibility of the selected works.

TABLE I. SELECTION CRITERIA

Selection Criteria                            | Exclusion Criteria
International paper                           | Paper published before 2011
Papers that have methods and algorithms       | The topic of the paper is not related
Papers must be relevant to the research topic | Sourced from web documents and blogs

B. Conducting the Review

Analysis and review of the eligibility of the papers is carried out at this stage, based on the selection and exclusion criteria. As a result, among the 30 papers previously correlated with the keywords, 28 papers were eligible for the final study because they met all the selection criteria (Fig. 1). Fig. 2 shows the trend of publications. SLR-related publication rates have fluctuated over the years: despite a large increase in publications in 2016, articles related to the keywords have rarely appeared in the following years.

C. Analysis

We analyzed and collected the eligible papers based on the research questions, then compared the methods used to see their



advantages and disadvantages. The following is our analysis of the reviewed papers.

Fig.1 Journal distribution group by year

Fig.2 Journal resources

1) RQ1: How is database optimization done?

a. Changing the Database Infrastructure

The database infrastructure can be changed by performing database quality control, thus enabling better database optimization and faster search results. One option is Converged Infrastructure, where the infrastructure in paper [18] provides easier and faster scalability than traditional architectures. Next is infrastructure through data analytics: paper [19] explains optimization by minimizing data infrastructure costs through data analysis. Then there is Infrastructure as a Service, where the infrastructure in paper [27] is deployed in the cloud and is used where the data resides or when performing critical operations. Another approach is Cloud Infrastructure, as in paper [9], which provides computing resources such as network operating systems, storage, networks, hardware, databases, and all software applications in the cloud. Finally, there is Analyzing Infrastructure, described in [17], aimed at managing and processing Big Data with minimal personnel, using analytics to understand what the Big Data contains.

b. Maximizing the Use of Resources

Database optimization can be done by taking advantage of the amount of memory, hardware speed, and processor of the computer being used, to get the best results on factors such as efficiency, productivity, reliability, lifespan, power, and utilization. Various methods have been used to maximize the use of resources. One example can be seen in paper [18], where Vertical Scalability is used to increase the processing strength and capacity of hardware or software by adding resources to make it work more efficiently and faster. In paper [10], Distribution and Parallel Computation is used, where the resources to complete a task can distribute the computation among different nodes of a cluster using different rules. According to paper [22], the Artificial Bee Colony can be used to allocate resources and then connect them so that they appear as a single unit to get the best optimization. Paper [25] exploited multi/many cores, where current multi-core processors, including Intel's Westmere-EX (six, eight, or ten cores) and AMD's 16-core Interlagos, are exploited using multiple parallel programming models and languages to allow each processor core to perform at its best. Lastly, paper [27] used an Open Source Cloud Platform, where the data center provided by the cloud service provider supplies basic needs such as storage, CPU, and dedicated network bandwidth at low cost while achieving fast optimization.

c. Using MapReduce

MapReduce consists of two functions, namely the mapper and the reducer. The Mapper function is responsible for mapping compute subtasks to different nodes and is also responsible for task distribution, load balancing, and managing failure recovery. The Reducer function is responsible for reducing the responses of the compute nodes to a single result; the reduction component combines all the elements together after the completion of the distributed computation. There are several ways to use MapReduce. In the Optimized Job Allocation Scheme in paper [18], MapReduce is deployed and powered by Flex, a flexible and smart distribution system that can improve average response times and reduce the MapReduce load. Then there is the Time-Critical Implementation of MapReduce used in paper [10], based on time-critical software, where the MapReduce cluster offers a file management system that can store large amounts of data. Next is Iterative Optimization in paper [27], where MapReduce has been changed and improved to support efficient iterative MapReduce computation that avoids repeated instantiation. In addition, there is ABC MapReduce in paper [22], a new MapReduce that can be implemented in single-node and multi-node Hadoop; this MapReduce has a Mapper that can find the optimal place for dataset clustering and a Reducer that selects the right cluster according to execution time and classification error.

TABLE II. WAYS AND BENEFITS OF DATABASE

Reference     | Method                            | Way                                                                                                       | Benefits
[9] [18] [19] | Changing Database Infrastructure  | Carry out database quality control and optimization planning for better work environment efficiency       | Provide the best approach
[9] [22] [25] | Maximize the use of resources     | Takes advantage of the amount of memory space available on a given computer, the speed of the hardware and the processor used | Maximizes factors such as efficiency, productivity, reliability, longevity, strength and utilization
[9] [22] [27] | Using MapReduce                   | Mapping computational subtasks to different nodes and merging all elements together after completion of distributed computing | Reducing the response of compute nodes to a single result

2) RQ2: What method is used to optimize the database?

In some cases, the database has problems caused by the time taken to read data or by problems with the search index. To overcome these problems, based on the journal sources we found, the methods presented are the Population_EA algorithm, the Firefly algorithm, the Vigenere Cipher algorithm, the MapReduce algorithm, and the Artificial Bee Colony (ABC) algorithm. The following is an explanation of some of these algorithms:

The first is the Population_EA algorithm. This algorithm performs structure division operations on what is prioritized and chooses the best data; if it finds data that does not match, it deletes that data [20].

Second, there is the Firefly algorithm, applied in [21]. This algorithm works like light: light chooses the objects it can pass through, and, by this analogy, the data that the light passes through cannot be used as the main data.

In addition, there is the Vigenere Cipher. This algorithm is similar to the substitution technique of the Caesar cipher, but the shift of the letters in each period is different, following certain letters [23], [18]. Variations in the number of letter shifts in each period add to the level of complexity; the variation in the number of shifts comes from substituting each letter of the plaintext with each key letter. Depending on the length of

The last is the Artificial Bee Colony (ABC) algorithm, where the method performs a search by analogy with a bee looking for nectar, which is divided among the colony. The best nectar is chosen by the bees, so nectar (data) that the system estimates would make performance take a long time is ignored; a more efficient distance calculation is then performed for the search so that the data is not redundant.

TABLE III. DATABASE OPTIMIZATION ALGORITHMS

Author | Algorithm (A–E)
[28]   | ✓ ✓
[20]   | ✓
[10]   | ✓ ✓
[6]    | ✓ ✓
[17]   | ✓
[8]    | ✓ ✓
[13]   | ✓
[24]   | ✓ ✓
[23]   | ✓
[18]   | ✓ ✓
[27]   | ✓
[2]    | ✓
[26]   | ✓ ✓
[22]   | ✓

According to Table 3, annotation A: Population_EA algorithm; B: Firefly algorithm; C: Vigenere Cipher algorithm; D: MapReduce algorithm; E: Artificial Bee Colony (ABC) algorithm.
the existing key, if it is shorter than plaintext, the key letter 3) RQ3: What are the challenges in performing database
will be repeated. optimization?
Then there are also algorithms that can be divided into In doing database optimization, of course there are some
Merge and Join. challenges in optimizing, this is a normal thing, from the
current challenges, what we need to do is analyze and prevent
• The first is Map-Reduce-Merge, [27] perform and add something that has a bad impact on optimization.
a merge after the reduce phase by combining the two
outputs and reducing the performance of MapReduce To optimize the database, we must analyze the big data
into one job, so that database performance in a cloud infrastructure first, then study it and implement time-critical
environment becomes efficient because the data is applications. However, some infrastructures have not been
already partitioned and sorted by maps and reduction. adapted to meet time-critical applications, which poses several
challenges. One of the challenges is the lack of Time-Critical
• Then the other part is Map-Join-Reduce, the system Big Data facilities, most big data infrastructures are designed
extends and improves the runtime framework by to focus on the main purpose of the application rather than
adding a join phase to the reduce phase to perform considering time-critical. This infrastructure takes a long time
complex data analysis on large clusters. They to process a lot of data across multiple compute nodes. Among
implemented a new data processing strategy so that the list of issues to consider when it is appropriate to design
they ran the aggregation filtering job with two an effective time critical infrastructure for big data
consecutive MapReduce jobs. applications, should address the aspects namely: Ability to
From the two MapReduce algorithms, it will reduce the process high volumes of data, distribution and parallel
amount of work time when transferring data, and finally there computation, data locality, and error models. This is described
in a paper by [10], [4], [1].
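The repeated-key shifting described in the Vigenere Cipher discussion above can be sketched in a few lines of Python. This is an illustrative toy restricted to uppercase A-Z, not the modified cipher evaluated in [23]:

```python
# Toy Vigenere cipher: each plaintext letter is shifted by the
# corresponding key letter, and the key repeats whenever it is
# shorter than the plaintext (uppercase A-Z only, for illustration).

def vigenere_encrypt(plaintext: str, key: str) -> str:
    out = []
    for i, ch in enumerate(plaintext):
        shift = ord(key[i % len(key)]) - ord("A")  # key letter for this period
        out.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
    return "".join(out)

def vigenere_decrypt(ciphertext: str, key: str) -> str:
    out = []
    for i, ch in enumerate(ciphertext):
        shift = ord(key[i % len(key)]) - ord("A")
        out.append(chr((ord(ch) - ord("A") - shift) % 26 + ord("A")))
    return "".join(out)

print(vigenere_encrypt("ATTACKATDAWN", "LEMON"))  # LXFOPVEFRNHR
```

Because the shift changes at every position according to the key, identical plaintext letters map to different ciphertext letters, which is the extra complexity the text attributes to the varying per-period shifts compared with a single-shift Caesar cipher.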
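The map, shuffle/sort and reduce phases that the Map-Reduce-Merge and Map-Join-Reduce variants build on can be illustrated with a single-process word-count toy in Python; real frameworks such as Hadoop run these phases across many compute nodes, so this sketch only shows the data flow, not the distribution:

```python
from collections import defaultdict

def map_phase(records):
    # Map: emit (key, value) pairs from each input record.
    for line in records:
        for word in line.split():
            yield word, 1

def shuffle_phase(pairs):
    # Shuffle/sort: partition and group the mapped pairs by key,
    # producing the sorted, grouped input that the reducers consume.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))

def reduce_phase(groups):
    # Reduce: merge each key's values into a single result.
    return {key: sum(values) for key, values in groups.items()}

counts = reduce_phase(shuffle_phase(map_phase(["big data big", "data nodes"])))
print(counts)  # {'big': 2, 'data': 2, 'nodes': 1}
```

Map-Reduce-Merge then adds one more pass that merges the outputs of two such jobs, which is why it benefits from the data already being partitioned and sorted by the map and reduce stages.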


Then the second challenge is Security and Privacy, which can be seen in the papers by [17], [6]. Big data systems are very concerned with maintaining a safe and private environment. Challenges regarding security and privacy include the definition of secure computing in distributed programming frameworks, secure data storage, transaction logs, validation endpoints, real-time security, data-centric security, access control details, auditing and data origination.

The challenges of database optimization are not only encountered through the website; mobile users also face several challenges, such as access control and database privacy for mobile phone users and maintaining the continuity of their database, which can be seen in the paper by [17].

Furthermore, the papers [32], [5], [11], [25], [29] report that choosing relevant or irrelevant data can also affect security in the big data environment itself.

Table IV lists each reference together with aspects of the challenges raised in the different papers.

TABLE IV. ASPECTS & CHALLENGES

Reference | Challenge | Aspects
[10] [4] [1] | Lack of Time-Critical Big-Data Facility; Time-Critical Algorithm and Analysis; Security and Privacy | Process high volume of data; large parallel distribution and computing; storage; real-time security; data source security
[17] [6] | Privacy issues | Safe use of the process; research facility security
[31] [8] [17] [19] | Access control and privacy for mobile users; database continuity | Efficient technique; access control and integration; phone internal security information; balanced phone database
[32] [5] [25] [29] | Selecting relevant and important data; unconnected data points; security related to Big Data collection | Relevant data separation; real-time angles; network management; big data analysis management

IV. DISCUSSION

In this literature review there are indeed many ways to optimize databases, such as changing the database infrastructure to a cloud-based one, as described in the paper [9]. There are also many methods (algorithms) for database optimization; particularly interesting are the algorithms in the paper [27], which uses the MapReduce algorithm method, divided into Map-Reduce-Merge and Map-Join-Reduce, where it is proven that the algorithm can reduce the amount of working time when transferring data. And there are some challenges in doing database optimization, drawn from all the papers we have read.

V. CONCLUSION

The amount of data generated every second is increasing day by day, as is the development of that data. With the large amount of data generated, we conducted research by looking at several techniques for optimizing the database, so that the large amount of data entered into the database can be adjusted using an optimization method. The optimization we research is by way of Cloud Computing. In this study we not only explain Cloud Computing; we also describe the concept of Big Data itself. We also show how to optimize databases from several sources, namely Changing Database Infrastructure, Maximizing resource usage, and Using Mapreduce. Then there is an explanation of the methods or algorithms used in optimizing the database, such as the Population_EA algorithm.

REFERENCES

[1] Gaetani, E., Aniello, L., Baldoni, R., Lombardi, F., Margheri, A., & Sassone, V. (2017). Blockchain-based database to ensure data integrity in cloud computing environments.
[2] Hashem, I. A. T., Yaqoob, I., Anuar, N. B., Mokhtar, S., Gani, A., & Khan, S. U. (2015). The rise of "big data" on cloud computing: Review and open research issues. Information Systems, 47, 98-115.
[3] Maabreh, K. S. (2018). Optimizing Database Query Performance Using Table Partitioning Techniques. IEEE.
[4] More, Bhagat, S. B., & Rajendra, H. (2015). Artificially intelligent optimized database (AIOD). 1-4.
[5] Rahman, Abid, M. H., Zaman, F. B., Akhtar, M., & Nasim, M. (2015). Optimizing and enhancing performance of database engine using data clustering technique. 198-201.
[6] Saisanguansat, D., & Jeatrakul, P. (2017). Improving optimization performance on PL/SQL. 1-6.
[7] Satoto, K. I., Isnanto, R. R., Kridalukmana, R., & Martono, K. T. (2016). Optimizing MySQL database system on information systems research, publications and community service. 1-5.
[8] Sun, X., Jiang, B., & He, X. (2018). Database Query Optimization Based on Distributed Photovoltaic Power Generation. 2382-2386.
[9] Alami Milani, B., & Navimipour, N. J. (2017). A systematic literature review of the data replication techniques in the cloud environments. Big Data Research, 1-7.
[10] Basanta-Val, P., Audsley, N. C., Wellings, A. J., Gray, I., & Fernández, N. (2016). Architecting time-critical big-data systems. IEEE Transactions on Big Data, 310-324.
[11] Khanra, S., Dhir, A., Islam, A. K. M. N., & Mäntymäki, M. (2020). Big data analytics in healthcare: a systematic literature review. 14(7), 878-912.
[12] Feki, M., Boughzala, I., & Fosso Wamba, S. (2016). Big Data Analytics-enabled Supply Chain Transformation: A Literature Review. 49th Hawaii International Conference on System Sciences (pp. 1123-1132).
[13] Clarke, R. (2016). Big data, big risks. Information Systems Journal, 26(1), 77-90.
[14] Martinez-Mosquera, D., Navarrete, R., & Lujan-Mora, S. (2020). Modeling and management of big data in databases - A systematic literature review. Sustainability, 12(2), 634.
[15] Mazumdar, S., Seybold, D., Kritikos, K., & Verginadis, Y. (2019). A survey on data storage and placement methodologies for the Cloud-Big Data ecosystem. Journal of Big Data, 6(1), 1-37.
[16] Nazir, S., Nawaz, M., Adnan, A., Shahzad, S., & Asadi, S. (2019). Big Data Features, Applications, and Analytics in. IEEE Access, 7, 143742-143771.
[17] Sivarajah, U., Kamal, M. M., Irani, Z., & Weerakkody, V. (2017). Critical analysis of Big Data challenges and analytical methods. Journal of Business Research, 70, 263-286.
[18] Roy, C., Rautaray, S. S., & Pandey, M. (2018). Big Data Optimization Techniques: A Survey. International Journal of Information Engineering & Electronic Business, 10(4).


[19] Zheng, K., Yang, Z., Zhang, K., Chatzimisios, P., Yang, K., & Xiang, W. (2016). Big data-driven optimization for mobile networks toward 5G. IEEE Network, 30(1), 44-51.
[20] Bhattacharya, M., Islam, R., & Abawajy, J. (2016). Evolutionary optimization: a big data perspective. Journal of Network and Computer Applications, 59, 416-426.
[21] Wang, H., Wang, W., Cui, L., Sun, H., Zhao, J., Wang, Y., & Xue, Y. (2018). A hybrid multi-objective firefly algorithm for big data optimization. Applied Soft Computing, 69, 806-815.
[22] Ilango, S. S., Vimal, S., Kaliappan, M., & Subbulakshmi, P. (2019). Optimization using artificial bee colony based clustering approach for big data. Cluster Computing, 22(5), 12169-12177.
[23] Alin, Z. (2016). Improving the security of cloud computing data using a modified Vigenere cipher algorithm. TIMES Journal, 5(1), 23-27.
[24] Chen, S., Wang, W., & Pan, S. J. (2019, July). Deep neural network quantization via layer-wise optimization using limited training data. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 33, No. 01, pp. 3329-3336).
[25] Kejariwal, A. (2012, November). Big data challenges: A program optimization perspective. In 2012 Second International Conference on Cloud and Green Computing (pp. 702-707). IEEE.
[26] Chen, Y., Alspaugh, S., & Katz, R. (2012). Interactive Analytical Processing in Big Data Systems. arXiv preprint arXiv:1208.4174.
[27] Ji, C., Li, Y., Qiu, W., Awada, U., & Li, K. (2012, December). Big data processing in cloud computing environments. In 2012 12th International Symposium on Pervasive Systems, Algorithms and Networks (pp. 17-23). IEEE.
[28] Xu, J., Huang, E., Chen, C. H., & Lee, L. H. (2015). Simulation optimization: A review and exploration in the new era of cloud computing and big data. Asia-Pacific Journal of Operational Research, 32(03), 1550019.
[29] Labrinidis, A., & Jagadish, H. V. (2012). Challenges and opportunities with big data. Proceedings of the VLDB Endowment, 5(12), 2032-2033.
[30] Marx, V. (2013). The big challenges of big data. Nature, 498(7453), 255-260.
[31] Bertino, E., & Sandhu, R. (2005). Database security - concepts, approaches, and challenges.
[32] Toshniwal, R., Dastidar, K. G., & Nath, A. (2015). Big data security issues and challenges. Complexity, 2(2).


Study on Face Recognition Techniques


Vanessa Giovani Ivyna Johansen Dea Asya Ashilla
Computer Science Department Computer Science Department Computer Science Department
School of Computer Science School of Computer Science School of Computer Science
Bina Nusantara University Bina Nusantara University Bina Nusantara University
Jakarta, Indonesia Jakarta, Indonesia Jakarta, Indonesia
[email protected] [email protected] [email protected]

Novita Hanafiah
Computer Science Department
School of Computer Science
Bina Nusantara University
Jakarta, Indonesia
[email protected]

Abstract—Face recognition is a favoured research subject, as it is a more conventional method to record students' attendance: an identification method using individuals' faces, through images or a (real-time) camera. In this paper, the authors have researched and discussed the different techniques implemented in face detection, face training, and face recognition through their methodology and their algorithms' accuracy. The purpose of this research is to explore the different algorithms implemented in creating a face recognition system, together with the accuracy each system's algorithm achieves. The authors have identified 30 different research papers, conference papers, and journal papers within the period 2012-2020. Each paper discussed and achieved results based on the accuracy of the algorithm used, demonstrating its advantages and disadvantages, thus giving a conclusion on which algorithm is suitable for a face recognition system.

Keywords—face detection, face training, face recognition, algorithm, methodology, accuracy

I. INTRODUCTION

Attendance is essential for institutes to record their students' presence in class. Even though paper-based attendance is still used in some institutes in Indonesia, nowadays it is common to use a Radio-Frequency Identification (RFID) tag to detect the arrival of students in the class. Implementing an RFID tag reduces the loss of time whilst increasing data security and storage. Contrarily, students can falsify their attendance by asking their peers to tap their tags on the RFID machine. Biometric systems such as fingerprint, iris scan, or facial recognition are also commonly used to take attendance. Therefore, if attendance is implemented with face recognition, it will improve the time consumption and give accurate timing of students' presence on site.

Many methods have been used for implementing face recognition systems. Face recognition has been widely implemented in the education system, and recently many studies have developed face recognition to track students' attendance. In this study we use several libraries in Python, including OpenCV and NumPy, and two algorithms: the Haar-cascade algorithm and the Local Binary Pattern Histogram (LBPH) algorithm. Using a dataset to train, the Haar-cascade algorithm is used for face detection, to identify and locate faces in an image with a fast process and suitable accuracy. Meanwhile, face recognition uses the LBPH algorithm to recognize whose face it is by comparing the input image histogram with the database histograms.

II. LITERATURE REVIEW

Many methods and algorithms have been proposed and implemented for face recognition. The papers we have researched demonstrated face recognition systems mainly using the Haar Cascade algorithm, Local Binary Pattern (LBP), Local Binary Pattern Histogram (LBPH) and the OpenCV library. Some also used Convolutional Neural Networks (CNN), Principal Component Analysis (PCA) and Deep Learning. We have learned that the Haar cascade classifier is able to detect low-quality images as low as 45px [1], and another paper proved that it can produce up to 98.01% accuracy when the cascade is extended with additional classifiers such as skin tone matching and eye and mouth detection [2]. The Viola-Jones algorithm in the paper by Zulgifar et al. (2019) is used to detect faces and showed that the Squeezenet model is a suitable trade-off with 98.6% accuracy [3]. The paper by Mady, H. H., & Hilles, S. M. (2017) described a system which uses the Viola and Jones algorithm to detect faces under constant motion, achieving a high recognition rate of 69.38% and 92.78% accuracy [4].

Another algorithm that has been used is CNN. A research in 2017 [5] added 2 normalization operations to two CNN layers and achieved an error rate of nearly 0. Meanwhile, a paper [6] analysed VGG-Face and Lightened CNN models, providing promising classification results. The algorithm may also be used in Bayesian learning, which reached accuracies of 89% and 91% on the EKFD and AT&T databases [7]. Using CNN and Conditional Random Field (CRF) models on a Bayesian basis, the paper has proven the proposed method can reach a higher accuracy of 94.3% on the YouTube Faces database [8]. Asim Jan's group [9] also experimented with the same algorithm to extract facial parts instead of whole faces, as well as taking both depth and texture into account, obtaining a maximum accuracy of 88.54%.

In addition, Jones, M., & Kobori, H. (2017) used a CNN algorithm to detect faces, demonstrating a fast algorithm that results in 83.5% accuracy [10]. Winarno, E., Al Amin, I. H., Februariyanti, H., Adi, P. W., Hadikurniawati, W., & Anwar, M. T. (2019) proposed a system using the same type of algorithm, CNN with PCA, to detect faces and recognize framework models. This hybrid algorithm produces 98% accuracy in detecting faces from an OpenCV dataset [11]. A similar system built using CNN is proposed by HL, D. G., Vishal, K., Dubey, N. K., & Pooja, M. R. The algorithm uses hyperplane similarity, the linear SVM method, DHA and L2-linear to calculate the accuracy of detecting the dataset from IARPA Janus Benchmark-A (IJB-A) and YouTube Faces. It depicts

978-1-6654-4002-8/21/$31.00 ©2021 IEEE


that using hyperplane similarity gives more accurate results in detecting faces [12].

Then there is the biometric method, which has been implemented in the papers of Akanksha et al. (2018) and Kumar et al. (2017). In the former [13], two images were tested to identify the correct database, and it is stated that the face recognition model is based on psychological biometric characteristics with relevance of expressions. The author used various datasets, increasing the detection rate to around 80% accuracy, with 89% performance from LBP, 89.5% from an Artificial Neural Network, and 60% performance with a Support Vector Machine [14]. Moreover, a deep learning algorithm in Arsenovic, M., Sladojevic, S., Anderla, A., & Stefanovic, D. (2017) is also used to recognize faces by implementing deep neural network (DNN) classification to train the given dataset whilst augmenting the images, obtaining a 95.02% accuracy in recognizing faces [15]. Bhatti, K. L., Mughal, L., Khuhawar, F. Y., & Memon, S. A. (2018) executed a similar system using a deep learning algorithm which also included a Histogram of Oriented Gradients system. This system achieved a high accuracy of 96% when recognizing a single face; however, it loses accuracy when more than one face is detected [16].

For the recognition or identification part, there are several algorithms which are commonly used, like the Local Binary Pattern Histogram (LBPH), Latent Dirichlet Allocation (LDA), and PCA. A research in 2019 [17] used the LBPH algorithm, yielding an accuracy of almost 80% by 50 epochs, while Serign Modou Bah and Fang Ming [18] got 90.49% accuracy with the same algorithm. The latter then combined the LBPH algorithm with image processing techniques such as Contrast Adjustment, Bilateral Filter, and Histogram Equalization, which led them to a 99% success rate. Then, there is a research titled "Comparison of PCA and LDA Techniques for Face Recognition Feature Based Extraction With Accuracy Enhancement", which analysed and compared the reliability of PCA and LDA in varying conditions such as illumination, glasses and non-glasses, as well as specified expressions [19]. Based on this individual experiment, the outcome concluded that PCA might be more efficient than the other, with overall performances of 74.47% and 72.72% respectively. This is supported by Anissa Lintang Ramadhani and her group's work [20], which used PCA and got a minimum accuracy of 92% and a maximum of 99%. Ashim Saha's and Krishnan, M. G.'s groups have also proven this algorithm to give desirable results [21][22]. Not to mention, this algorithm has a fast recognition time which, if optimized, only takes 1.4 to 1.5 seconds [23].

Another paper used the author's own face images for the dataset and used the PCA algorithm to increase the method's accuracy much further, resulting in 65% accuracy in the left direction, 86% on the frontal face, and 69% in the right direction [24]. The PCA algorithm is also used to detect and recognize faces in real-time video as proposed by Adeniji, K., Awosika, O., Ajibade, A., & Onibonoje, M. (2019), resulting in a 73% accuracy in marking attendance of student faces [25]. A paper in 2018 [26] combined LBPH for recognition and LDA for gender classification, along with Haar cascade classifiers. This method was reliable because the gender classification made it easier to identify the subjects. Finally, for storing the user's information data, some papers stated that they use PostgreSQL [27] and Cloud Storage [28]. The Haar-cascade classifier is also used by Yusof, Y. W. M., Nasir, M. M., Othman, K. A., Suliman, S. I., Shahbudin, S., & Mohamad, R. (2018) to detect faces whilst combining it with LBPH, resulting in a system showing the students' attendance [29]. Another paper, presented by Sunaryono, D., Siswantoro, J., & Anggoro, R. (2019), is based on classifiers namely logistic regression, linear discriminant analysis, and k-nearest neighbour, which led them to a 97.29% accuracy by employing LDA, and the system takes 0.000096s to detect faces in the sample dataset [30].

III. METHODOLOGY

A. Planning the review

The study involves academic papers and articles related to the selected keywords, which are 'face recognition', 'haar-cascade', 'Viola-Jones', 'LBPH', 'biometric', 'LBP', 'LDA', 'CNN', 'PCA' and 'attendance'. After gathering the papers which contain all the information regarding our research, we analyze and observe the collected papers by proposing research questions:

• RQ1: What are the algorithms used for Face Recognition?
• RQ2: What are the processes that are implemented for Face Recognition?
• RQ3: What is the result of each Face Recognition research paper?

Several algorithms and databases were used as references from papers, articles, and journals. According to certain eligibility requirements, we applied several selection criteria and exclusion criteria.

TABLE I. SELECTION CRITERIA

Selection Criteria | Exclusion Criteria
International reference publication papers. | Paper which was published prior to 2010.
Paper is either a journal or conference paper. | Non-academic paper.
Paper is related to face recognition. |

B. Conducting the Review

Analysis and review have been conducted with reference to the accuracy of each algorithm used in the papers, carried out based on the selection criteria in Table 1. In conclusion, 30 papers correspond to the keywords above.

C. Analysis

Following the completion of collecting the papers, we analyze the eligibility of the papers based on the research questions. Next, we conduct a comparison of the algorithms used to view their accuracy. The following is a graph that analyzes the reviewed papers.

Fig 1. Papers Years of Publication Graph
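Several of the papers compared above ([17], [18]) rely on the LBP/LBPH family, in which every pixel is compared with its eight neighbours to form an 8-bit code and the codes are then histogrammed for matching. The sketch below is a plain-Python approximation of the basic 3x3 operator on a grayscale image given as a nested list; it is illustrative only, not the OpenCV implementation used in the reviewed systems:

```python
def lbp_code(img, r, c):
    # Compare the 8 neighbours of img[r][c] with the centre pixel,
    # clockwise from the top-left, to build an 8-bit binary pattern.
    centre = img[r][c]
    neighbours = [img[r - 1][c - 1], img[r - 1][c], img[r - 1][c + 1],
                  img[r][c + 1], img[r + 1][c + 1], img[r + 1][c],
                  img[r + 1][c - 1], img[r][c - 1]]
    code = 0
    for bit, value in enumerate(neighbours):
        if value >= centre:
            code |= 1 << (7 - bit)
    return code

def lbp_histogram(img):
    # Histogram of LBP codes over all interior pixels; LBPH-style
    # recognizers compare such histograms between face images.
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist

print(lbp_code([[9, 9, 9], [9, 5, 9], [9, 9, 9]], 1, 1))  # 255
```

A recognizer in the LBPH style would compute one such histogram per region of a face image and label a probe face by the nearest stored histogram, which is the input/database comparison the reviewed systems describe.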


Algorithms

In order to detect faces based on images or videos, a classification algorithm is used. The papers we have researched used the Haar Cascade algorithm, Local Binary Pattern (LBP), Local Binary Pattern Histogram (LBPH), the OpenCV library, Convolutional Neural Networks (CNN), Principal Component Analysis (PCA) and Deep Learning. From these methods, we want to find out which has the best detection rate, as well as to what extent each algorithm maintains its accuracy under varying conditions like low-resolution images or multiple faces in one picture.

Next, we need a recognition algorithm to identify the faces. From the papers we have reviewed, the most popular algorithms are the Local Binary Pattern Histogram (LBPH), Latent Dirichlet Allocation (LDA), and PCA. Therefore, we conduct our analysis to observe each algorithm's advantages and disadvantages in the form of a table.

TABLE II. ALGORITHM IN FACE RECOGNITION PROCESS COMPARISON

Reference | Algorithm | Advantage | Disadvantage
1, 2, 26, 27, 28, 29 | Haar Cascade | Fast in calculating; reduces the number of training samples | Low performance in detecting faces from various races and at long distance
3, 4, 16 | Viola-Jones | Efficient in reducing error rate | Time-consuming in detecting faces; sensitive to lighting conditions
13, 14 | Biometric | Fast processing in limited time; effective in dealing with challenges | -
17, 18 | LBP | High data accuracy in real-time environments | -
17, 18, 26, 29 | LBPH | Applicable in a controllable environment; easy to recognize front and side faces | -
19, 26, 30 | LDA | High efficiency in classification problems | Illumination problem
5, 6, 7, 8, 9, 10, 11, 12 | CNN | High performance over hand-crafted techniques; minimum error rate; brightness improvisation; fast approach | Sensitive to low depth and texture
11, 19, 20, 21, 22, 23, 24, 25 | PCA | High performance in face and mobile recognition | Illumination problem
15, 16 | Deep Learning (HOG and CNN) | Reduces computation time; high accuracy with many data and angles | Sensitive to the number of people and directions

Process

From the reviewed papers, there are 3 general processes in face recognition: face detection, face training, and face recognition.

1. Face Detection

In face detection, the classifier is trained to differentiate objects into faces and non-faces. This is mainly done by extracting the facial features of a human face, such as the eyes, nose, and mouth. Among the papers reviewed, some changes were made in order to test and compare effectiveness, as shown in Table III. One of these changes is the use of three additional weak classifiers on the Haar cascade: skin tone matching, eye detection, and mouth detection. By using the skin tone classifier, non-human faces with illegible skin tones are rejected; with the eye and mouth classifiers, residual non-human faces are further removed. It is simple to implement because of the availability of OpenCV, but skin hues can change under different lighting, so the range of races for detection is still limited [2].

Another method is to add a batch normalization process after two different layers of a CNN. This can stabilize the learning process, giving a powerful recognition algorithm with a significant decrease in error rate compared to the original CNN. However, the process of recognizing each face becomes slower because the system needs to undergo additional layers [5]. A paper proposed a system which extracts the whole face instead of facial features, as well as taking depth and texture into account. By capturing the face in 3D, the proposed method is able to capture all muscle movements accurately, regardless of lighting and pose. With this method, there are two challenges to face: first, aligning 3D data to 2D data is difficult; second, different muscle activations have different impacts on the face shape [9].

In contrast, a hybrid feature extraction, CNN-PCA, which is a combination of CNN and PCA, did it the other way around: CNN extracts 2D face images and constructs their 3D model, and with PCA the face image resolution is reduced. Face recognition becomes faster and more accurate [11]. However, facial attendance using 2D-to-3D conversion methods is still rare and unexplored.

A few methods can be implemented for deep learning, such as DNN and HOG. By using DNN, augmented images in the dataset can be used to adapt to partial noise data, allowing the system to work with poor traffic and on low-quality devices [15]. On the other hand, HOG works well with different poses and variations; however, it still needs some improvement, as it still fails to detect faces from a distance [16].

TABLE III. FACE DETECTION PROCEDURE COMPARISON

References | Name | Category | Advantage | Disadvantage
2 | Addition of 3 weak classifiers | Classifier | Fewer false-positive results | Races that can be detected are still limited
5 | Batch normalization process | Method | Fast and regulated learning process | Takes more time in prediction
9 | 3D extraction | Method | Captures all muscle movement | Difficulty in aligning/converting 3D


data to 2D; different muscle activations have different impacts on the face shape
11 | CNN-PCA combination algorithm | Model | Uses 2D and low-resolution images | 2D to 3D conversion method is still unexplored
15 | DNN implementation in deep learning | Model | Works in poor traffic and applicable on cheap hardware | -
16 | Histogram of Oriented Gradients (HOG) | Method | Performs well in different poses and variations | Difficult in detecting faces in the distance

2. Face Training

After detecting faces, this step is necessary to train the faces, so that the system can identify a face based on its 'matched' accuracy later. Some papers perform dataset training, where the dataset is used as the sample dataset for the experiment. The LR500 dataset [1], AT&T face dataset [7][14], EURECOM Kinect Face Database [7][14], MEDUE-S-V-DB videos [4], AR Face Database [6], CMU PIE Database [6][14], Extended Yale Dataset [6][19], Color FERET Database [6][14], FRGC Database [6], Facial Bounding Box Extension [6], CASIA-WebFace Database [8], LFW Dataset [8], YTF Dataset [8][10], BU-3DFE Database [9], IJB-A Dataset [10], CUHK03 Dataset [10], AR Database [14], CAS-PEAL Face Database [14], CK Database [14], Georgia Tech Database [5], Face94 Database [23], and other self-made dataset images are applied in the collected research papers.

We discovered several different methods that were used in the face training process and collected them in Table IV. In paper [6], the VGG-Face model is proven to have good transferability by displaying better robustness against pose variations through subtracting each image. The Mahalanobis distance method is applied in paper [11] to determine the degree of similarity between features in order to produce an optimal face recognition result. Another method is proposed in papers [12][15] to learn a lower-dimensional effective metric space where images are represented by points and images of the same class are clustered together. Feature-based, holistic, and hybrid approaches are used to classify and train the tested face: while the feature-based approach segments local features like the eyes, nose, and lips, holistic uses the whole face of the person, and hybrid is a combination of feature-based and holistic

purpose is because eigenvectors are derived from a covariance matrix, which has a great probability distribution and vector space dimension to identify the face reconstruction. The paper [24] applies eigenface recognition because of its efficiency and ease of implementation; however, the result's accuracy can decrease because of the light intensity, scale, and orientation of the image. The LDA algorithm, which is the method in papers [26][27], performs classification among pre-processed images of faces for gender classification, with alignment at eye level. In another case, SQL queries are used to test the database, and the database is checked from the phpMyAdmin control panel [29]. As another method, the paper [30] used LR, LDA and KNN classifiers to calculate the average accuracy and total training time of 2-fold and 5-fold cross validation to observe the differences.

TABLE IV. FACE TRAINING PROCEDURE COMPARISON

References | Name | Category | Advantage | Disadvantage
6 | VGG-Face System (deep learning) | Model | Subtract images. | Long inference time.
11 | Mahalanobis Distance | Method | Identify similarities of face features | -
12, 15 | Bayesian Deep Learning | Model | Clustered in the same class of image. | -
3, 13 | Feature Base, Holistic, and Hybrid Approach | Method | Minimized effects of shortcomings to improve recognition performance | High convolutional cost.
14 | Various Biometric Techniques | Method | Performs well in different poses and variations | Difficult in detecting faces in the distance
18 | Linear Blending | Method | Blend into a single image | Overfitting image.
3, 17, 18 | LBP | Algorithm | Generate histogram and vectors | Low speed of process
9, 20, 21, 22, 23 | PCA | Algorithm | Create output (subspace) to store training information. | Low accuracy on the side faces.
26, 27 | LDA | Algorithm | Classify alignment at eye level, less human error | -

3. Face Recognition

Different algorithms are used for the recognition process, shown in Figure 2. The algorithms used in the papers are the Haar Cascade algorithm, Local Binary Pattern (LBP), Local Binary Pattern Histogram (LBPH), the OpenCV library, Convolutional Neural Network (CNN), Principal Component
approach [3][13]. Analysis (PCA) and Deep Learning. The accuracy of the
Analyzing age, gender, and other facial conditions, the algorithms are based on how precisethe system is to recognise
paper [14] trains various biometrics techniques to give an the faces of the dataset and matching it. Therefore, for each
optimal accuracy result. Using JavaScript Object Notation algorithm we have calculated the percentage of accuracy for
(JSON), thepaper [16] uses this method to store and train data each algorithm in different conditions.
where the data form can be any type of form. According to the We have conducted that Viola-Jones algorithm results
paper [20][21][22][23], the research performs calculation of with the highest accuracy rate in recognising faces of its
eigenvalues and eigenvectors using PCA as a subspace. The dataset. Where its total accuracy of paper [3][4][16] is

304 28 October 2021, Jakarta - Indonesia


2021 1st International Conference on Computer Science and Artificial Intelligence (ICCSAI)

81.08%. The second most accurate algorithm is used in papers [15][16]: deep learning recognition trained on its own dataset, obtaining 74.71% accuracy. Subsequently, the Haar-Cascade algorithm and the Principal Component Analysis (PCA) algorithm come to a close percentage in accuracy, obtaining 53.14% and 52.07% respectively, as used in papers [1], [2], [13], [26], [27], [28], [29] for the Haar-Cascade algorithm and [11], [19], [20], [21], [22], [23], [24], [25], [14] for the PCA algorithm.

According to papers [17][18], applying the LBP algorithm for its high data accuracy in a real-time environment only results in 49.5%, due to the environmental factors affecting the image. The LDA algorithm follows with 48.88%, as implemented in papers [19][26][30]. In line with the total papers researched, the Convolutional Neural Network (CNN) is the second most used algorithm after Principal Component Analysis (PCA), adding up to only 32.69%, which makes it the second least error-free algorithm according to our research, based on papers [5][6][7][8][9][10][11][12]. The LBPH algorithm used in papers [17], [18], [26], [29] does not result in an accuracy percentage (zero percent), as none has been stated in those papers.

Accuracy
Table V compares the accuracy of each approach.

TABLE V. RESULT COMPARISON

References | Face detection | Face recognition | Accuracy
1 | Haar Cascade classifier and AdaBoost algorithm | LBP and LBPH algorithm | 92%
2 | Haar Cascade with 3 weak classifiers: skin tone, eye, and mouth | - | 98.01% Positive Predictive Value
3 | Viola-Jones and AdaBoost algorithm | LBP algorithm; Feature Base, Holistic, and Hybrid Approach | 98.76% on the SqueezeNet model and 99.41% on the ResNet50 model
4 | Viola-Jones, Frontal CART classifier, and Profile Face classifier | - | 96.97% highest and 63.27% lowest
5 | Batch normalization process | CNN model | 98.8%
6 | - | CNN structure; VGG Face network model and Lightened CNN model | 97.95%
7 | Bayesian Network | Bayesian DCNN model | 98.1% on the FKD database and 100% on the AT&T database
8 | Bayesian Deep Network | CNN model | 1.44% and 2.06% on the LFW and YTF databases respectively
9 | 2D and 3D extraction using SVM model | PCA algorithm, SVM model | 88.54%
10 | VGG-Face | VGG-Face with CNN model | 77.7% False Acceptance Rate
11 | Viola-Jones algorithm | Mahalanobis distance method, CNN model | 90% - 96% on the face extraction using PCA; the CNN-PCA algorithm produces an accuracy between 90% - 98%
15 | Deep Learning | CNN model | 95.02%
16 | Deep Learning (Histogram of Oriented Gradient) | Euclidean distance, CNN model | 95%-100% as the highest accuracy and 70% as the lowest accuracy
19 | LDA algorithm | Eigenvector, eigenvalues of covariance matrix, and Euclidean distance classifier | 74.4%
20 | Cascade classifier | PCA algorithm and Eigenface classifier | 96.3%
21, 22 | PCA algorithm | Haar cascade classifier | 98.7% detection rate and 95% recognition rate for frontal face
24 | Haar cascade classifier | OpenCV library and Python | 65%, 86%, and 69% for left direction, frontal, and right direction
28 | Haar-cascade classifier | LDA and Euclidean distance | 97%
30 | Viola-Jones algorithm | Logistic Regression, LDA algorithm, and KNN | True Positive Rate and True Negative Rate of 97.44% and 99.87%

IV. DISCUSSION
Based on the papers we have reviewed, standard cameras and web cameras can be used, and other good cameras used for facial attendance were the Raspberry Pi Camera [26][29], which utilizes graphic processing, and an IP camera for easy access from the computer [15][25]. Of the 9 papers which mentioned their training method, 6 used the PCA algorithm via OpenCV [9][11][21][22][24][25]. For detection and classification, the most popular method was Haar cascade classifiers [2][17][20][24][26][27][28][29], followed by Viola-Jones [3][4][11][30]. Meanwhile, the most used algorithms for the face recognition process are Local Binary Pattern / Local Binary Pattern Histogram [1][17][18][26][29] as well as Euclidean distance [13][15][16][19][28].

The highest accuracy an experiment achieved was 100%, by Zafar's group [7], using the Bayesian DCNN model on the AT&T database. The second highest result, a 99.87% true negative rate, was reached by Sunaryono's group, which used LDA for the recognition process [30]. Next, a paper using face detection with the Viola-Jones algorithm,


AdaBoost algorithm, and a CNN model obtained 98.76% accuracy [3]. An experiment got 98.7% accuracy for face detection using OpenCV with Haar cascade, yet the number fell to 95% after the recognition process [21].

V. CONCLUSION
To improve the efficiency, accuracy, and security of the attendance system, many institutes have used face recognition for recording students' absence, but with various algorithms. This research was therefore conducted to observe and analyze the methodology, procedures, and algorithms applied in all the reference papers. After reviewing 30 related papers, we have discovered that OpenCV, Haar cascade and Viola-Jones, and LBPH and Euclidean distance are the most popular methods for training, detection or classification, and recognition respectively. However, the most accurate results were obtained with Bayesian DCNN and LDA. Nevertheless, each of these two methods has been tested in only one paper; they have been shown to work well, but the evidence is not strong enough to completely conclude our research.

REFERENCES
[1] Ahmed, A., Guo, J., Ali, F., Deeba, F., & Ahmed, A. "LBPH based improved face recognition at low resolution," in 2018 International Conference on Artificial Intelligence and Big Data (ICAIBD), May 2018, pp. 144-147.
[2] Cuimei, L., Zhiliang, Q., Nan, J., & Jianhua, W. "Human face detection algorithm via Haar cascade classifier combined with three additional classifiers," in 2017 13th IEEE International Conference on Electronic Measurement & Instruments (ICEMI), October 2017, pp. 483-487.
[3] Zulfiqar, M., Syed, F., Khan, M. J., & Khurshid, K. "Deep face recognition for biometric authentication," in 2019 International Conference on Electrical, Communication, and Computer Engineering (ICECCE), July 2019, pp. 1-6.
[4] Mady, H. H., & Hilles, S. M. "Efficient real time attendance system based on face detection case study "mediu staff"," International Journal of Contemporary Computer Research, vol. 1(2), pp. 21-25, 2017.
[5] Coşkun, M., Uçar, A., Yildirim, Ö., & Demir, Y. "Face recognition based on convolutional neural network," in 2017 International Conference on Modern Electrical and Energy Systems (MEES), November 2017, pp. 376-379.
[6] Ghazi, M. M., & Ekenel, H. K. "A comprehensive analysis of deep learning-based representation for face recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2016, pp. 34-41.
[7] Zafar, U., Ghafoor, M., Zia, T., Ahmed, G., Latif, A., Malik, K. R., & Sharif, A. M. "Face recognition with Bayesian convolutional networks for robust surveillance systems," EURASIP Journal on Image and Video Processing, vol. 1, pp. 1-10, 2019.
[8] Wang, H., Song, W., Liu, W., Song, N., Wang, Y., & Pan, H. "A Bayesian scene-prior-based deep network model for face verification," Sensors, vol. 18(6), pp. 1906, 2018.
[9] Jan, A., Ding, H., Meng, H., Chen, L., & Li, H. "Accurate facial parts localization and deep learning for 3D facial expression recognition," in 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018), May 2018, pp. 466-472.
[10] Jones, M., & Kobori, H. "Improving face verification and person re-identification accuracy using hyperplane similarity," in Proceedings of the IEEE International Conference on Computer Vision Workshops, 2017, pp. 1555-1563.
[11] Winarno, E., Al Amin, I. H., Februariyanti, H., Adi, P. W., Hadikurniawati, W., & Anwar, M. T. "Attendance System Based on Face Recognition System Using CNN-PCA Method and Real-time Camera," in 2019 International Seminar on Research of Information Technology and Intelligent Systems (ISRITI), December 2019, pp. 301-304.
[12] HL, D. G., Vishal, K., Dubey, N. K., & Pooja, M. R. "Face Recognition based Attendance System," International Journal of Engineering Research & Technology (IJERT), vol. 9(6), June 2020.
[13] Akanksha, Kaur, J., & Singh, H. "Face detection and Recognition: A review," in 6th International Conference on Advancements in Engineering & Technology (ICAET-2018), Sangrur, February 2018.
[14] Kumar, S., Singh, S., & Kumar, J. "A study on face recognition techniques with age and gender classification," in 2017 International Conference on Computing, Communication and Automation (ICCCA), May 2017, pp. 1001-1006.
[15] Arsenovic, M., Sladojevic, S., Anderla, A., & Stefanovic, D. "FaceTime—Deep learning based face recognition attendance system," in 2017 IEEE 15th International Symposium on Intelligent Systems and Informatics (SISY), September 2017, pp. 000053-000058.
[16] Bhatti, K. L., Mughal, L., Khuhawar, F. Y., & Memon, S. A. "Smart Attendance Management System Using Face Recognition," EAI Endorsed Transactions on Creative Technologies, vol. 5(17), 2018.
[17] Deeba, F., Ahmed, A., Memon, H., Dharejo, F. A., & Ghaffar, A. "LBPH-based enhanced real-time face recognition," International Journal of Advanced Computer Science and Applications, vol. 10(5), pp. 274-280, 2019.
[18] Bah, S. M., & Ming, F. "An improved face recognition algorithm and its application in attendance management system," Array, vol. 5, pp. 100014, 2020.
[19] Vyas, R. A., & Shah, S. M. "Comparision of PCA and LDA techniques for face recognition feature based extraction with accuracy enhancement," International Research Journal of Engineering and Technology, vol. 4(6), pp. 3332-3336, 2017.
[20] Ramadhani, A. L., Musa, P., & Wibowo, E. P. "Human face recognition application using PCA and eigenface approach," in 2017 Second International Conference on Informatics and Computing (ICIC), November 2017, pp. 1-5.
[21] Kar, N., Debbarma, M. K., Saha, A., & Pal, D. R. "Study of implementing automated attendance system using face recognition technique," International Journal of Computer and Communication Engineering, vol. 1(2), pp. 100, 2012.
[22] Krishnan, M. G., & Balaji, S. B. "Implementation of automated attendance system using face recognition," International Journal of Scientific & Engineering Research, vol. 6(3), pp. 30-33, 2015.
[23] Abdullah, M., Wazzan, M., & Bo-Saeed, S. "Optimizing face recognition using PCA," arXiv preprint arXiv:1206.1515, 2012.
[24] Puthea, K., Hartanto, R., & Hidayat, R. "The Attendance Marking System based on Eigenface Recognition using OpenCV and Python," Journal of Physics: Conference Series, vol. 1551(1), pp. 012012, May 2020.
[25] Adeniji, K., Awosika, O., Ajibade, A., & Onibonoje, M. "Exploring Internet of Thing on PCA Algorithm for Optimization of Facial Detection and Tracking," Review of Computer Engineering Research, vol. 6(2), pp. 76-83, 2019.
[26] Shrivastava, K., Manda, S., Chavan, P. S., Patil, T. B., & Sawant-Patil, S. T. "Conceptual model for proficient automated attendance system based on face recognition and gender classification using Haar-Cascade, LBPH algorithm along with LDA model," International Journal of Applied Engineering Research, vol. 13(10), pp. 8075-8080, 2018.
[27] Siddiqui, M. F., Siddique, W. A., Ahmedh, M., & Jumani, A. K. "Face Detection and Recognition System for Enhancing Security Measures Using Artificial Intelligence System," Indian Journal of Science and Technology, vol. 13(09), pp. 1057-1064, 2020.
[28] Prangchumpol, D. "Face Recognition for Attendance Management System Using Multiple Sensors," Journal of Physics: Conference Series, vol. 1335(1), pp. 012011, October 2019.
[29] Yusof, Y. W. M., Nasir, M. M., Othman, K. A., Suliman, S. I., Shahbudin, S., & Mohamad, R. "Real-time internet based attendance using face recognition system," International Journal of Engineering & Technology, vol. 7(3.15), pp. 174-178, 2018.
[30] Sunaryono, D., Siswantoro, J., & Anggoro, R. "An android based course attendance system using face recognition," Journal of King Saud University - Computer and Information Sciences, 2019.


Big Data For Smart City: An Advance Analytical Review
Evaristus Didik Madyatmadja, Information Systems Department, School of Information Systems, Bina Nusantara University, Jakarta, Indonesia 11480, [email protected]
Asnan Habib Munassar, Information Systems Department, School of Information Systems, Bina Nusantara University, Jakarta, Indonesia 11480, [email protected]
Sumarlin, Teknik Informatika, Institut Teknologi dan Bisnis Indonesia, Medan, Indonesia, [email protected]
Agung Purnomo, Entrepreneurship Department, BINUS Business School Undergraduate Program, Bina Nusantara University, Jakarta, Indonesia 11480, [email protected]

Abstract— The use of IoT has evolved in many ways over the years, and many people have developed various methods of using these technologies; it has even reached the point where it can support disaster response. Although no system is perfect, there is always room for improvement through maintenance and through refinement at every development milestone. The biggest users of IoT are big companies, cities, and everyday users, and one of the biggest applications of these technologies is the smart city. The development of a smart city requires information from many departments, since a smart city covers many domains of its citizens' daily life, from energy, transportation, and public services to security and more; massive traffic data are gathered to keep the smart city productive on a daily basis. This paper therefore discusses the challenges of implementing a smart city, the role of big data in a smart city, and an advanced review of the systematics of the big data system in the smart city. The authors hope that the results of this research will improve governments' understanding of how to build an appropriate system for a smart city and its citizens.

Keywords— analytics, challenges, data analytics, smart city, big data, data visualization, data mining, governance, IoT.

I. INTRODUCTION
Technologies have evolved rapidly over the years, and many of them have helped enable support for smart cities: various digitalizations that support the main components of a smart city, which are adequate water supply, assured electricity supply, sanitation (including waste management), efficient urban mobility and public transport, affordable housing (especially for the poor), robust IT connectivity and digitalization, good governance (especially e-governance and citizen participation), and a sustainable environment [1][2]. Most of the technologies involved in the core elements of a smart city are hardware, but that is not always the case, as software tools have helped many workflows in building and maintaining a smart city to become more integrated and to function far more efficiently.

Due to the rapid growth of the urban population and of urbanization, managing this urbanization depends not only on hardware and other physical infrastructure, but also on the availability of knowledge and communication provided by good data management, which is one of the reasons why big data will play a big role in smart city development. Beforehand, we need to know what big data is: big data is a collection of various data commonly used for business and scientific purposes, usually generated from text, videos, audio, images, click streams, search queries, social networking interactions, logs, emails, science data, and mobile or web applications. This information is stored in a big database to store, manage, share, and visualize the data with various database tools such as SQL [3][4].

Fig. 1. The landscape of smart city and big data technologies [1].

The data are then collected and stored in a certain data center or cloud-based database such as SQL. The size of the data set affects the programming model required to process the database: the larger the data set, the more complicated the programming model used for data analytics, because processing large data sets with parallel algorithms is required to obtain value from the stored data, as shown in Figure 1 on the landscape of smart city and big data technology [5].
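As a concrete illustration of the collect-store-query flow just described, the sketch below loads a few hypothetical city sensor readings into an SQL table and aggregates them per domain. It is only a minimal sketch: the table layout, sensor names, and values are invented for the example, and an in-memory SQLite database stands in for the data-center or cloud database a real smart city deployment would use.

```python
import sqlite3

# In-memory SQLite stands in for the city's real data store; the schema
# and the readings below are invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sensor_readings ("
    "sensor_id TEXT, domain TEXT, ts TEXT, value REAL)"
)

# Hypothetical readings from two smart-city domains (energy, transport).
rows = [
    ("meter-01",   "energy",    "2021-10-28T08:00", 13.0),
    ("meter-01",   "energy",    "2021-10-28T09:00", 16.0),
    ("bus-stop-7", "transport", "2021-10-28T08:00", 42.0),
    ("bus-stop-7", "transport", "2021-10-28T09:00", 57.0),
]
conn.executemany("INSERT INTO sensor_readings VALUES (?, ?, ?, ?)", rows)

# The kind of aggregate query a planning dashboard would run against
# the collected data: average reading per domain.
per_domain = dict(
    conn.execute(
        "SELECT domain, AVG(value) FROM sensor_readings GROUP BY domain"
    ).fetchall()
)
print(per_domain)  # {'energy': 14.5, 'transport': 49.5}
```

Larger data sets push this same pattern toward the parallel processing models the paragraph above mentions, but the underlying storage-and-query structure stays the same.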


The smart city plays a big role in transforming various fields of human life by taking part in improving health, education, energy, and transportation [6]. For example, a smart city can make public transport more efficient and accessible for its citizens, like the computerized line on public train transportation, or TransJakarta, which uses an integrated database for scheduling transport departures and eases the payment process through fintech. Moreover, traffic management has made latency in their scheduling very unlikely, and not only on a backend scale: the technology of the transportation itself, like the MRT, lets people travel faster without polluting or worsening the traffic in the city. These are just small examples of how a smart city can change lives in certain cities [7][8].

These transformations enable the smart city to utilize the information it obtains through big data analytics to ease daily life, which improves the quality of life and the intelligent management of natural resources and city facilities. Taking advantage of merging such technologies can also reduce cost and resource consumption, so that those resources and costs can be allocated to other infrastructural development the city needs [9].

Even though implementing a smart city might sound very prosperous and advantageous, not every city in the world can implement it easily, mainly because transforming a traditional city into a smart city also depends on the living standards of the citizens themselves. It might be easy to implement in big cities whose many citizens have quite high living standards, but at a lower scale, where citizens' living standards are lower, the transformation into a smart city would be more challenging [10].

II. LITERATURE REVIEW

A. Challenges in Designing and Implementing a Smart City
Designing and implementing a smart city from a regular traditional city is not as easy as it seems. As mentioned before, a smart city might seem convenient and could really ease the way of life in a certain city, but in reality it is not easy to implement one, especially for a city that wants to implement certain aspects of a smart city but does not quite meet the criteria in its citizens' standard of living [11][12]. The problems do not appear only when we implement or transform the city: even once we gain the acceptance and willingness to cooperate from the citizens, we still need to figure out how to maintain the smart city and make it better for the future, including future opportunities and ICT developments that could help the city grow, as well as future threats that could endanger it, whether natural disasters, economic problems, or human-made disasters such as floods. This section discusses what makes a city a smart city, how to implement it, and the challenges in implementing and designing one [13][14]. Those challenges are:

1. The information and technology availability.
For a city that depends on technology usage, integrated information and technology, whether in the form of software or hardware, is important to keep the city's activities running. Integrated data gathered from various resources is needed to keep the required information flowing in the smart city, whether real-time information such as traffic, or information based on past data used to calculate the anticipations that need to be made based on past evaluations of the smart city [15][11].

2. Security threats in the smart city.
Security threats do not only concern security in general, such as when an activity is held publicly by a certain organization, or perhaps a car-free day; such activities might not require much effort to manage their security.

3. Requires big investment.
As seen in the previous point, implementing a smart city requires a large amount of investment, mainly in technology usage, which is why cities in developing or poor countries sometimes need to consider many factors before implementing a smart city; cultural influences are also an important consideration [11][16].
Since implementing a smart city requires a big investment, a regional government cannot rely solely on the governmental budget for smart city development; the involvement of private organizations and other business organizations is required to successfully implement a smart city and create an ideal environmental ecosystem that is integrated and sustainable [11].

4. The availability of resources required for infrastructural development.
The infrastructural development of ICT, from communication channels to sensors and actuators in fixed physical space, is considered a challenge in taking the initiative for smart city development. The development of IT infrastructure is one of the key elements in connecting the infrastructural development of the smart city by integrating the system's information in every part of the city. Well-developed IT infrastructure is an important part of smart city development [11].

5. Adapting to the social, environmental, and cultural conditions of the city.
A smart city can seem like an ideal solution for various problems faced by the city, such as urbanization, natural disaster resilience, traffic management, and many other city problems that could be solved with the right usage of technologies. In reality, however, not many cities can implement a smart city easily; there are many cultural and environmental factors that must be considered. For example, if a smart city wanted to introduce a more modern, cashierless grocery store, that might seem very convenient and innovative, but it is not always so: small private or family businesses in the city could be affected economically by this change, resulting in more unemployed people, since such a store no longer needs cashiers, which makes it less costly but at the same time makes people lose their jobs. It is important for the government to consider this challenge when implementing new technologies and/or policies in the smart city [11].

6. Application development.


The growth of applications, such as mobile applications that support citizens' living, takes part in the growth of the smart city, because the data acquired by the applications themselves can help governmental services run more efficiently, using the data accrued from the analytical traffic of these applications. One of the most common applications supporting the smart city is online transportation, which uses the mobile phone as its medium, making it very convenient for anyone to use; the government can use data from such applications to support the development of public transportation in the smart city, making it more convenient to use and less costly. Not only do these mobile applications support the smart city in this way, they also help the smart city define itself as one by making the daily life of many citizens more convenient and easier, which is the very definition of a smart city: using technology to make lives more convenient and the city run more efficiently [11].

B. The role of big data in the smart city
The usage of technology in a smart city takes many forms, whether hardware or software, but the most influential one is software, specifically the use of data. As mentioned in the challenges above, this part of the paper discusses how big data plays a big role in the smart city. Since a smart city runs multiple activities on a daily basis, the data that need to be integrated and managed are huge as well, and that is where big data takes part: using big data and its software, we can easily access data from multiple users in real time to fully utilize the data gathered and stored in the database [17]. According to the multiple journals the authors gathered, here are various cases of how big data plays a big role in the smart city [18].

1. Urban planning and building the smart city using IoT and big data analytics.
The demand for IoT used to support the growth and development of the smart city has increased as time progresses, especially in urban planning for the smart city and in demands for improving the infrastructure and services provided. Big data and IoT play a big role in urban planning, especially for the smart city; this use of technology helps the smart city completely implement the ten humanitarian urban planning principles, which consist of: engage the community, use data efficiently, opportunities come from overlap, place matters, design matters because place matters, politics persist, civil society plays a big role, be inclusive, be visionary, and have a long-term plan [19].
Because urban planning uses multiple technologies as well as the internet, such a system requires a large number of devices communicating with each other in real time, which generates a large amount of data; managing these massive data is where big data takes place. By using big data analytics we can enrich the smart technologies used in urban planning for the smart city. Big data analysis helps us better understand the useful information that will be used for planning and development, providing a clear insight into the knowledge contained in the big data [19].
The most important factor in urbanization is how to meet demand with profiling services to achieve enhanced efficiency given the recent advancement of the environment. The main challenge of urban planning in the smart city using IoT and big data analytics is to achieve a synchronized link between the smart city system and the IoT information by providing relay nodes, aggregation classifiers, etc., all of which generate abundant data at high speed, which is termed Big Data. To process these data efficiently, a Hadoop system is used in this architecture, which describes how the sensors are deployed to generate data; the proposed architecture is an example of the database architecture used in urban planning implementation with big data.
In urban planning using IoT, the data from all systems are collected and aggregated into one aggregation server; since the data are received at high speed, the process aggregates them enough to be analyzed through the IoT system. In urban planning, we use past data generated from the same IoT devices for smart city IoT services to plan any future development of the city, for example by analyzing data on electricity consumption from previous years to predict and match future demands [20].

2. Smart transportation using big data.
The development of the smart city involves various IoT-based projects to improve the quality of life in the city, and the effectiveness of governance performance greatly affects the smart city; big data as a technology plays an important role in building smart transportation. For example, in India the Aadhaar card is one example of smart transportation using big data technology. Using this card, governments can aggregate data on their citizens, and it can be extended into various services based on interventions and feedback so that governance can improve its services. The Aadhaar card consists of a 12-digit unique identity number obtained voluntarily by residents of India or passport holders [21][22].
As for the framework of smart transportation in the smart city, it consists of multiple layers of collected information used for processing and decision making. The data stored in the data layer are collected from multiple sources and then stored for analysis. The data are then fetched using multiple tools in a parallel computing environment so that they can be accessed in real time; once the data are completely inferred, they are forwarded to the communication layer for publishing in real time through media such as the internet, mobile, radio, and/or television [23][24].

3. Using big data to enhance crisis response and disaster resilience for the smart city.
Technology has grown to the point that it can even be used to predict any potential threat of crisis and disaster that could happen in the city. Even though science and technology have evolved tremendously enough to handle this problem, there is no such thing as a perfect

309 28 October 2021, Jakarta - Indonesia


2021 1st International Conference on Computer Science and Artificial Intelligence (ICCSAI)

system; there are bound to be flaws and limits in it, and therefore sustainable and well-developed disaster management has become one of the most problematic worldwide problems. All of these data sources are then supported by multiple engines and tools, which are: a data mining engine, stream processing engine, search engine, video analysis engine, audio analysis engine, and text analysis engine [25].

4. Real-time secure communication for a smart city with high-speed data.
Since many activities in a smart city involve the use of IoT, it is important to have the right information at the right time based on the current city scenario, but these data are mainly collected from sensors and devices that are connected through the internet. Some of the information that the smart city holds includes information on all the people, houses, daily routines, video from surveillance, and many others, which is why it is very crucial for a smart city to have secure communication when obtaining these data. It is the responsibility of the city to take care of the rights and privacy of its citizens' data while designing and developing a smart city.
Since these data are transmitted over a public link across the internet, an intermediate adversary can capture the data and do anything to the data, as well as observe activity on the data. With this we can watch for any suspicious activity regarding these data: since authorization for accessible data is given only to certain people responsible for it, any other party accessing the data will automatically be tracked and given limited access, so that the data will not be accessible to outside parties. All of this is done to secure the overall data management activities of the smart city [26].

III. RESEARCH METHODOLOGY

The method for identifying and reviewing the role of big data for smart cities in an advanced analytical review is the Systematic Literature Review: selecting multiple related studies that would support the research the author is conducting, by identifying and gathering previous studies on the topic of this paper. The aim of this study is to fully understand and analyze what big data and smart cities are, the role of big data in a smart city, and an advanced review of how big data plays a big role in a smart city. There are 29 journals selected for this paper, gathered from Google Scholar, IEEE, and Science Direct. Most of the selected papers are ones following the IEEE standard, and most of them are from Google Scholar, since it efficiently provides the right papers to support the research of this paper. Most journals in this paper concern topics related to Information Systems, Computer Science, and Cyber Security studies [27].

IV. RESULT AND DISCUSSION

A. Study Found
In order to gather all of the required journals for this paper, the first search used keywords such as Big data, Smart city, Big data analytics, and E-governance; the second search was for the methodology, which is the Systematic Literature Review. The total number of gathered papers is 29, which were then examined and noted manually to determine the relevance of these papers to this topic.

B. Candidate Studies
As a result, all 29 papers were chosen based on their studies, which cover an advanced view of using big data for a smart city; because this paper is an advanced review that also shows how big data is implemented, a framework for the big data is also required for this paper.

C. Selected Studies
The selected articles that are relevant to this topic meet one of the required criteria, which are:
• The research focuses on the challenges of implementing a smart city.
• The research is conducted on big data and/or smart city related issues.
• The research involves frameworks on how big data takes part in smart city development.
• The published articles are between the years 2016–2019.
• The overall research is a mix of recent and past research related to big data, smart city, and IoT.

Fig. 2. Searching strategy for Systematic Literature Review.

TABLE I. NUMBER OF STUDIES IN SELECTED SOURCES

Source | Studies Found | Candidate Studies | Selected Studies
Google Scholar | 12 | 12 | 12
Science Direct | 12 | 12 | 12
IEEE / ACM | 1 | 1 | 1
IEEE International Conference | 4 | 4 | 4
Total | 29 | 29 | 29

D. Participation Factor
The participating papers are mainly matched based on the paper title; the gathered papers are then categorized by their relevance to certain fields of this paper. The participation factors for this paper are mainly the very
definition of smart city and big data, the challenges of implementation, and a framework example of how the big data system is implemented with various tools from IoT [28][29].

V. CONCLUSION

The word smart in smart city means that a city is smart because it has technology involved in its structure; a smart city is not complete without the participation of IoT in its development and implementation. To achieve this, big data plays a big role in the development of a smart city, and from this paper we can conclude that big data can be used in various ways based on the literature review: for urban planning, secure communication using high-speed data, disaster and crisis resilience, transportation management, and many more. In this paper we also learned that implementing a smart city has challenges that we need to overcome, which include funding and implementing technologies, as well as looking at factors that could affect the implementation, such as social and environmental factors, and many more, as shown in the table below.

TABLE II. LIST OF FACTORS

No. | Factors | Factor's categories
1 | Knowledge on IoT | Technological
2 | System development knowledge | Technological
3 | Social commitment | Social
4 | Saliency | Culture
5 | Information system quality | Technological
6 | Security and privacy | Technological
7 | Accessibility | Technological
8 | Public service awareness | Behavioral
9 | Environmental influence | Behavioral
10 | Governance | Political
11 | Traffic management | Technological
12 | Challenges and strategy |
13 | Knowledge on big data technologies | Technological
14 | Public service quality | Technological
15 | Trust | Behavioral
16 | Attitude | Behavioral
17 | Technology quality | Technological

TABLE III. THE USAGE OF BIG DATA ON SMART CITY

No. | Smart city feature | Field/Issues | Tools/technology used
1 | Crisis response and disaster resilience | Environmental | IoT; Big Data (Hadoop)
2 | Big data for governance policy | Governance | CDR; Big Data; MPD; Mobile software
3 | Microservice-oriented big data (MOBDA) for smart transport | Transportation | ITS; HDFS; hardware such as CCTV, traffic lights, etc.
4 | Real-time secure communication using big data | Security | ICB; IDS; ACL; Big data; Software applications
5 | Real-time smart traffic management using big data and IoT | Transportation | CDM; Big data; Mobile application; GPS; PDC
6 | Urban planning and building smart city using big data analytics and IoT | Governance | Big Data (Hadoop); HDFS; SQL; IoT hardware and software

The author of this paper hopes that future research can be supported by the research the author is conducting, and that it can also help governments understand the factors that could affect the development of a smart city, as well as provide a clear framework example of how a smart city is supported by big data. For future research, it is advisable to understand the dimensions of the various uses of big data in developing a smart city as well as a clear framework for how the system itself works; the quality of the gathered information influences any public participation [30].

REFERENCES

[1] I. A. T. Hashem et al., "The role of big data in smart city," Int. J. Inf. Manage., vol. 36, no. 5, pp. 748–758, 2016, doi: 10.1016/j.ijinfomgt.2016.05.002.
[2] K. Soomro, M. N. M. Bhutta, Z. Khan, and M. A. Tahir, "Smart city big data analytics: An advanced review," Wiley Interdiscip. Rev. Data Min. Knowl. Discov., vol. 9, no. 5, pp. 1–25, 2019, doi: 10.1002/widm.1319.
[3] M. Batty, "Big data, smart cities and city planning," Dialogues Hum. Geogr., vol. 3, no. 3, pp. 274–279, 2013, doi: 10.1177/2043820613513390.
[4] A. F. C. Santos, Í. P. Teles, O. M. P. Siqueira, and A. A. de Oliveira, "Big data: A systematic review," Adv. Intell. Syst. Comput., vol. 558, pp. 501–506, 2018, doi: 10.1007/978-3-319-54978-1_64.
[5] Z. Khan, A. Anjum, and S. L. Kiani, "Cloud based big data analytics for smart future cities," Proc. 2013 IEEE/ACM 6th Int. Conf. Util. Cloud Comput. (UCC 2013), pp. 381–386, 2013, doi: 10.1109/UCC.2013.77.
[6] B. Cheng, S. Longo, F. Cirillo, M. Bauer, and E. Kovacs, "Building a Big Data Platform for Smart Cities: Experience and Lessons from Santander," Proc. 2015 IEEE Int. Congr. Big Data, pp. 592–599, 2015, doi:
10.1109/BigDataCongress.2015.91.
[7] A. Parlina, H. Murfi, and K. Ramli, "Smart city research in Indonesia: A bibliometric analysis," 2019 16th Int. Conf. Qual. Res. (QIR 2019) – Int. Symp. Electr. Comput. Eng., pp. 1–5, 2019, doi: 10.1109/QIR.2019.8898264.
[8] R. A. Alshawish, S. A. M. Alfagih, and M. S. Musbah, "Big data applications in smart cities," Proc. 2016 Int. Conf. Eng. MIS (ICEMIS 2016), pp. 1–7, 2016, doi: 10.1109/ICEMIS.2016.7745338.
[9] I. Vilajosana, J. Llosa, B. Martinez, M. Domingo-Prieto, A. Angles, and X. Vilajosana, "Bootstrapping smart cities through a self-sustainable model based on big data flows," IEEE Commun. Mag., vol. 51, no. 6, pp. 128–134, 2013, doi: 10.1109/MCOM.2013.6525605.
[10] A. Herdiyanti, P. S. Hapsari, and T. D. Susanto, "Modelling the smart governance performance to support smart city program in Indonesia," Procedia Comput. Sci., vol. 161, pp. 367–377, 2019, doi: 10.1016/j.procs.2019.11.135.
[11] C. E. W. Utomo and M. Hariadi, "Strategi Pembangunan Smart City dan Tantangannya bagi Masyarakat Kota," J. Strateg. dan Bisnis, vol. 4, no. 2, pp. 159–176, 2016.
[12] C. Lim, K. J. Kim, and P. P. Maglio, "Smart cities with big data: Reference models, challenges, and considerations," Cities, vol. 82, pp. 86–99, 2018, doi: 10.1016/j.cities.2018.04.011.
[13] M. Khan, M. Babar, S. H. Ahmed, S. C. Shah, and K. Han, "Smart city designing and planning based on big data analytics," Sustain. Cities Soc., vol. 35, pp. 271–279, 2017, doi: 10.1016/j.scs.2017.07.012.
[14] A. Alsaig, V. Alagar, Z. Chammaa, and N. Shiri, "Characterization and efficient management of big data in IoT-driven smart city development," Sensors (Switzerland), vol. 19, no. 11, pp. 1–29, 2019, doi: 10.3390/s19112430.
[15] W. Villegas-Ch, X. Palacios-Pacheco, and S. Luján-Mora, "Application of a smart city model to a traditional university campus with a big data architecture: A sustainable smart campus," Sustain., vol. 11, no. 10, 2019, doi: 10.3390/su11102857.
[16] E. Okwechime, P. Duncan, and D. Edgar, "Big data and smart cities: a public sector organizational learning perspective," Inf. Syst. E-bus. Manag., vol. 16, no. 3, pp. 601–625, 2018, doi: 10.1007/s10257-017-0344-0.
[17] E. D. Madyatmadja, F. L. Gaol, E. Abdurachman, and B. W. Pudjianto, "Social media based government continuance from an expectation confirmation on citizen experience," International Journal of Mechanical Engineering and Technology, vol. 9, no. 7, pp. 869–876, 2018.
[18] D. Pal, T. Triyason, and P. Padungweang, "Smart Cities under the Lens of Big-Data: State-of-Art Research and Challenges," Indones. J. Electr. Eng. Informatics, vol. 6, no. 4, pp. 351–360, 2018, doi: 10.11591/ijeei.v6i1.543.
[19] M. M. Rathore, A. Ahmad, A. Paul, and S. Rho, "Urban planning and building smart cities based on the Internet of Things using Big Data analytics," Comput. Networks, vol. 101, pp. 63–80, 2016, doi: 10.1016/j.comnet.2015.12.023.
[20] International Rescue Committee, "Humanitarian Action in a New Urban World: The IRC's role in Improving Urban Crisis Response," Aug. 2015. [Online]. Available: https://www.rescue-uk.org/sites/default/files/IRC Urban position paper WHS Budapest PRINT.pdf
[21] S. P. R. Asaithambi, R. Venkatraman, and S. Venkatraman, "MOBDA: Microservice-oriented big data architecture for smart city transport systems," Big Data Cogn. Comput., vol. 4, no. 3, pp. 1–27, 2020, doi: 10.3390/bdcc4030017.
[22] S. Shukla, K. Balachandran, and V. S. Sumitha, "A framework for smart transportation using Big Data," Proc. 2016 Int. Conf. ICT Business, Ind. Gov. (ICTBIG 2016), pp. 1–3, 2017, doi: 10.1109/ICTBIG.2016.7892720.
[23] P. Rizwan, K. Suresh, and M. Rajasekhara Babu, "Real-time smart traffic management system for smart cities by using Internet of Things and big data," Proc. IEEE Int. Conf. Emerg. Technol. Trends Comput. Commun. Electr. Eng. (ICETT 2016), 2017, doi: 10.1109/ICETT.2016.7873660.
[24] A. Sharif, J. Li, M. Khalil, R. Kumar, M. I. Sharif, and A. Sharif, "Internet of things – Smart traffic management system for smart cities using big data analytics," 2016 13th Int. Comput. Conf. Wavelet Act. Media Technol. Inf. Process. (ICCWAMTIP 2017), pp. 281–284, 2017, doi: 10.1109/ICCWAMTIP.2017.8301496.
[25] C. Yang, G. Su, and J. Chen, "Using big data to enhance crisis response and disaster resilience for a smart city," 2017 IEEE 2nd Int. Conf. Big Data Anal. (ICBDA 2017), pp. 504–507, 2017, doi: 10.1109/ICBDA.2017.8078684.
[26] M. M. Rathore, A. Paul, A. Ahmad, N. Chilamkurti, W. H. Hong, and H. C. Seo, "Real-time secure communication for Smart City in high-speed Big Data environment," Futur. Gener. Comput. Syst., vol. 83, pp. 638–652, 2018, doi: 10.1016/j.future.2017.08.006.
[27] A. Nightingale, "A guide to systematic literature reviews," Surgery, vol. 27, no. 9, pp. 381–384, 2009, doi: 10.1016/j.mpsur.2009.07.005.
[28] J. Frith, "Big Data, Technical Communication, and the Smart City," J. Bus. Tech. Commun., vol. 31, no. 2, pp. 168–187, 2017, doi: 10.1177/1050651916682285.
[29] B. N. Silva, M. Khan, and K. Han, "Integration of Big Data analytics embedded smart city architecture with RESTful web of things for efficient service provision and energy management," Futur. Gener. Comput. Syst., vol. 107, pp. 975–987, 2020, doi: 10.1016/j.future.2017.06.024.
[30] E. D. Madyatmadja, Meyliana, and H. Prabowo, "Participation to public e-service development: A systematic literature review," J. Telecommun. Electron. Comput. Eng., vol. 8, no. 3, pp. 139–143, 2016.

Analysis of Big Data in Healthcare Using Decision Tree Algorithm

Evaristus Didik Madyatmadja
Information Systems Department, School of Information Systems, Bina Nusantara University, Jakarta, Indonesia 11480
[email protected]

Antonius Rianto
Information Systems Department, Faculty of Technology and Design, Universitas Bunda Mulia, Jakarta, Indonesia 14430
[email protected]

Johanes Fernandes Andry
Information Systems Department, Faculty of Technology and Design, Universitas Bunda Mulia, Jakarta, Indonesia 14430
[email protected]

Hendy Tannady
Department of Management, Institut Teknologi dan Bisnis Kalbis, Jakarta, Indonesia 13210
[email protected]

Aziza Chakir
Faculty of Law, Economics and Social Sciences, Economic Department, Hassan II University, Morocco 8110
[email protected]

Abstract—In this era of technological development, big data has been widely implemented in many companies, especially in healthcare. Big data has opened up new gaps in health care and has the potential to improve it. The effective use of big data can reduce health care problems such as how to provide proper care and maximal care solutions, and can improve existing health care systems. There are six defining domains of big data, beginning with Volume. Big data represents a variety of opportunities to improve the quality and efficiency of healthcare. Big data in healthcare needs to be expanded and explored, utilizing big data analytics to gain valuable knowledge. Big data analytics is used to capture valuable information from all kinds of sources in healthcare, which can be used for better decision making. Big data analytics in healthcare has the prospect of improving healthcare by using decision trees to discover and understand formats and trends in medical record data. The cardiovascular illness dataset is big data in healthcare; it is one of the resources in the health sector, used as part of facilitating the process of documenting medical records, and it must be analysed to offer an effective solution to problems in healthcare. This paper provides valuable information by applying big data analytics to cardiovascular disease medical data, to provide effective solutions for problems in healthcare and also to show how important big data is for healthcare.

Keywords— Healthcare, Analysis of Big Data, Medical Records.

I. INTRODUCTION

Right now, in the era of big data, information technology has been widely utilized in enterprises. Medical data are growing explosively, and managing, storing, and tabulating these data are challenges of the 21st century [1]. An efficient data acquisition, processing, and consumption methodology has been a theme of great attention for decades across enterprises [2]. A rich source of data has the potential to enhance understanding of disease mechanisms and improve health care [3].

Big data refers to wide and complex data sets that are beyond the ability of classic database management systems to store, manage, and process [4]. Big data raises various challenges in data retrieval, transfer, encryption, storage, analysis, and visualization. Healthcare relies on medical data in its decision-making process. Big data analytics can be used to obtain valuable information from all sorts of sources in healthcare that are too large, raw, or unstructured [5]. BDA has the potential to improve healthcare by finding associations and understanding patterns and trends in medical records [6].

In other countries, healthcare deals with huge volumes of electronic health data, for example on cardiovascular disease [7]. Big data analytics can be used to obtain valuable information from large and complicated datasets such as cardiovascular disease data, to improve medical treatment and healthcare [5]. Cardiovascular disease care is one of the services in healthcare and is used as a part of facilitating the process of documenting medical records [8]. Big medical data have a large analytical capability that can be used to provide productive solutions to problems in healthcare [9].

II. BACKGROUND STUDY

A. Big Data

Big Data (BD) generally refers to the enormous volumes of data that the usual data tools and practices are not ready to handle. It presents unprecedented opportunities to advance science and inform resource management through data-intensive approaches, and big data technologies are enabling new types of activism in the process [10].

BD has the features of wide scale, high dimensionality, diversity, complexity, unstructuredness, incompleteness, and noise, which make it feasible to gather valuable data and information [11].

B. Big Data Healthcare

Big healthcare data covers large collections of data from various healthcare institutions, which are then stored, managed, analyzed, visualized, and delivered as information for effective conclusions. Big healthcare data comes from primary sources (such as clinical decision support systems, electronic health records, etc.) and secondary sources (such as laboratories, insurance firms, pharmacies, etc.) [12].

III. RESEARCH METHOD

Figure 1 shows the stages of the research: first problem formulation, then data collection from Kaggle; the authors then analyze that source, and after that make a report or visualizations; the next step is evaluation, and when the data is satisfactory the process is finished.

Fig. 1. Research Stages [13]

IV. RESULT AND DISCUSSION

Massive amounts of data, driven by record keeping, regulatory compliance and requirements, and patient care, are generated by healthcare. Healthcare uses big data analytics to analyze these data and extract valuable information to improve healthcare performance.

RapidMiner is the software the authors use to analyze the data with algorithms to get useful information for healthcare. The authors analyzed the data using a classification method and a decision tree algorithm to classify cardiovascular disease, which can be used for predictive and prescriptive analytics. Predictive analysis predicts what might happen in the future, and prescriptive analysis recommends actions that can be taken to act on these results.

The authors used a cardiovascular disease dataset that was collected at the moment of medical examination. There are 3 types of features in this dataset:
1. Goal: factual data and information.
2. Calibrations: outcomes of the medical examination.
3. Subjective: information specified by the patient.

This dataset has 70K rows of patient records and 12 attributes of information about the patients and the results of their medical examination. The following are the attributes of the cardiovascular disease dataset:
1. Age: factual information about the patient's age (in days).
2. Height: factual information about the patient's height (in cm).
3. Weight: factual information about the patient's weight (in kg).
4. Gender: factual information about the patient's gender (1 is a woman and 2 is a man).
5. Systolic blood pressure: result of the medical examination of the patient's systolic blood pressure (in mmHg).
6. Diastolic blood pressure: result of the medical examination of the patient's diastolic blood pressure (in mmHg).
7. Cholesterol: result of the medical examination indicating the patient's cholesterol (1: normal; 2: above normal; 3: well above normal).
8. Glucose: result of the medical examination of the patient's glucose (1: normal; 2: above normal; 3: well above normal).
9. Smoking: information given by the patient about smoking (0: non-smoker; 1: smoker).
10. Alcohol intake: information given by the patient about alcohol intake (0: not an alcohol drinker; 1: alcohol drinker).
11. Physical activity: information given by the patient about physical activity (0: no physical activity; 1: has physical activity).
12. Presence or absence of cardiovascular disease: the target label for the decision tree analytics (0: absence; 1: presence).

In the preprocessing phase, raw data are transformed into a useful and efficient format [14]. The authors explore the cardiovascular disease dataset to clean the data. In this phase, they identify missing attributes and blank fields; clean or replace missing values, duplicate or wrong data, and inconsistent data [15]; and examine the data for completeness, correctness, and consistency. Problematic data that have not been identified and analyzed can produce misleading results, so this phase is important for producing good results from the analyzed data.

After the preprocessing phase, the data are put into RapidMiner to start the analytics. Before analyzing the data, the attribute types are set based on the data types; the attribute types have to be set correctly in order to produce correct results. The authors analyze the data using the decision tree algorithm in RapidMiner. The decision tree algorithm is used to classify the presence or absence of cardiovascular disease. First, the data are processed using the decision tree algorithm to generate the rules and the decision tree, and then the results are analyzed. Figure 2 shows the result of the decision tree.
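The cleaning checks described for the preprocessing phase can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the authors' RapidMiner workflow: the column keys follow the common Kaggle naming for this dataset (e.g. `ap_hi`/`ap_lo` for systolic/diastolic pressure, as in the rule listing later in the paper), and the systolic-below-diastolic rule is a hypothetical example of an inconsistency check.

```python
# Illustrative preprocessing pass: drop incomplete rows, implausible
# blood-pressure readings, and exact duplicate records.
REQUIRED = ("age", "height", "weight", "gender", "ap_hi", "ap_lo",
            "cholesterol", "gluc", "smoke", "alco", "active", "cardio")

def clean(rows):
    seen = set()
    cleaned = []
    for row in rows:
        # completeness: every attribute present and not blank
        if any(row.get(field) in (None, "") for field in REQUIRED):
            continue
        # consistency (hypothetical rule): systolic pressure should
        # not be below diastolic pressure
        if row["ap_hi"] < row["ap_lo"]:
            continue
        # duplicates: keep only the first occurrence of identical records
        key = tuple(row[k] for k in REQUIRED)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned
```

Rows that survive all three checks are the ones handed on to the modelling step.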

Fig. 2. Decision Tree Results

Figure 2 shows the result of the decision tree and the hidden pattern generated from the cardiovascular disease dataset using the decision tree algorithm in RapidMiner. The decision tree algorithm provides a decision tree to find the classification rules in the data. In this decision tree, the root node or predictor is ap_hi (systolic blood pressure), the internal nodes are the other attributes contained in the data, and the leaf node is cardio. The decision tree for cardiovascular disease is used to explain or understand the classification result, based on the attributes tested at the root node and the internal nodes.

An advantage of a decision tree is that it can be expressed as rules or a description: the rules of a decision tree are if-else statements. Following are the rules from the decision tree:

ap_hi > 138.500
| ap_lo > 5
| | height > 190.500: 1 {0= 0, 1= 20}
| | height ≤ 190.500
| | | weight > 168.500: 1 {0= 0, 1= 9}
| | | weight ≤ 168.500
| | | | weight > 166.500: 0 {0= 1, 1= 1}
| | | | weight ≤ 166.500
| | | | | weight > 158.500: 1 {0= 0, 1= 9}
| | | | | weight ≤ 158.500
| | | | | | weight > 155.500
| | | | | | | age > 19339: 1 {0= 0, 1= 2}
| | | | | | | age ≤ 19339: 0 {0= 3, 1= 0}
| | | | | | weight ≤ 155.500: 1 {0= 3130, 1= 16225}
| ap_lo ≤ 5: 0 {0= 8, 1= 3}
ap_hi ≤ 138.500
| age > 23681: 1 {0= 0, 1= 5}
| age ≤ 23681
| | height > 58
| | | weight > 21.500
| | | | ap_hi > -130
| | | | | ap_hi > 129.500: 1 {0= 3719, 1= 5541}
| | | | | ap_hi ≤ 129.500
| | | | | | age > 22269
| | | | | | | weight > 38.500
| | | | | | | | height > 191.500: 0 {0= 3, 1= 0}
| | | | | | | | height ≤ 191.500: 1 {0= 1853, 1= 2382}
| | | | | | | weight ≤ 38.500: 0 {0= 4, 1= 0}
| | | | | | age ≤ 22269: 0 {0= 26300, 1= 10776}
| | | | ap_hi ≤ -130: 1 {0= 0, 1= 2}
| | | weight ≤ 21.500: 1 {0= 0, 1= 2}
| | height ≤ 58: 1 {0= 0, 1= 2}

These rules are generated from the decision tree, starting from the root node or predictor down to the leaf nodes. They give a clear analytical view of the result of the decision tree; the whole decision process of the tree can be understood through these rules.

After the decision tree was generated from the cardiovascular disease dataset using RapidMiner, its performance was tested to determine whether the analysis is accurate or not. Table I shows the result of the performance testing.

TABLE I. RESULT OF PERFORMANCE TESTING

Accuracy: 72.17%

             | true 0  | true 1  | class precision
pred. 0      | 26319   | 10780   | 70.94%
pred. 1      | 8702    | 24199   | 73.55%
class recall | 75.15%  | 69.18%  |
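Because the tree is reported as if-else rules, the listing above can be transcribed directly into code. The sketch below is an illustrative plain-Python transcription of the printed rules (the leaf labels, including the {0= 1, 1= 1} tie labeled 0, are taken exactly as printed; age is in days, as in the dataset):

```python
def predict_cardio(ap_hi, ap_lo, height, weight, age):
    """Transcription of the printed decision-tree rules (age in days)."""
    if ap_hi > 138.5:
        if ap_lo <= 5:
            return 0
        if height > 190.5:
            return 1
        if weight > 168.5:
            return 1
        if weight > 166.5:
            return 0
        if weight > 158.5:
            return 1
        if weight > 155.5:
            return 1 if age > 19339 else 0
        return 1                      # weight <= 155.5, the largest leaf here
    # ap_hi <= 138.5
    if age > 23681:
        return 1
    if height <= 58 or weight <= 21.5 or ap_hi <= -130:
        return 1                      # tiny leaves from implausible records
    if ap_hi > 129.5:
        return 1
    if age > 22269:
        if weight <= 38.5:
            return 0
        return 0 if height > 191.5 else 1
    return 0                          # age <= 22269, the largest leaf here
```

For example, `predict_cardio(150, 80, 195, 80, 20000)` follows the ap_hi > 138.5 branch down to the height > 190.5 leaf and returns 1.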

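As a cross-check, every figure in Table I can be re-derived from the four confusion-matrix counts alone; a short plain-Python sketch of that arithmetic:

```python
# Confusion-matrix counts as reported in Table I
tn, fn = 26319, 10780    # predicted 0: actually 0, actually 1
fp, tp = 8702, 24199     # predicted 1: actually 0, actually 1

total = tn + fn + fp + tp          # 70,000 records in the dataset
accuracy = (tn + tp) / total       # share of correct predictions
precision_0 = tn / (tn + fn)       # class precision for "absence"
precision_1 = tp / (fp + tp)       # class precision for "presence"
recall_0 = tn / (tn + fp)          # class recall for true 0
recall_1 = tp / (fn + tp)          # class recall for true 1

print(f"accuracy  = {accuracy:.2%}")                             # 72.17%
print(f"precision = {precision_0:.2%} / {precision_1:.2%}")      # 70.94% / 73.55%
print(f"recall    = {recall_0:.2%} / {recall_1:.2%}")            # 75.15% / 69.18%
```

The computed values match the accuracy, class precision, and class recall percentages reported in Table I.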
Table I shows the result of the performance testing of the classification of the cardiovascular disease dataset in RapidMiner. The performance testing shows the level of accuracy, class precision, and class recall. The authors obtained 72.17% accuracy in classification. The class precision for prediction 0 (Absence) is 70.94% and the class precision for prediction 1 (Presence) is 73.55%. For the class recall, 75.15% was obtained for true 0 (Absence) and 69.18% for true 1 (Presence). The measured levels of accuracy, precision, and recall achieved a high performance. This means that the classification improved efficiency and effectiveness for cardiovascular disease.

Big data analytics with a classification method using the decision tree algorithm for cardiovascular disease can improve efficiency and effectiveness. The decision tree provides an explanation of the hidden pattern in the data, so the information obtained from the data with the decision tree algorithm can be understood. The authors can then use the classification results from the decision tree for predictive or prescriptive analytics. Healthcare can predict which patients have cardiovascular disease and provide preventive care to them. Big data in healthcare is important; here are the opportunities for healthcare that implements big data:

1. Improved Preventive Care
Big data analytics using medical data in healthcare can improve prevention for patients. With big data analytics, healthcare can capture, analyze, and compare patient symptoms. Healthcare with improved preventive care can treat patients well and prevent or delay illness and disease. As in this research, classifying cardiovascular disease lets healthcare determine the right treatment for its patients more effectively and efficiently.

2. Improved Diagnostic Symptoms
By doing big data analytics, healthcare improves the diagnosis of symptoms for its patients. Diagnosis is the process of determining whether a patient has a disease, and it improves by using the knowledge collected from the hidden patterns in patients' data. With improved diagnosis of symptoms, healthcare can diagnose its patients more effectively and efficiently. As in this research, cardiovascular disease can be classified from the various attributes that determine the disease, so healthcare can diagnose its patients more accurately from their symptoms.

3. Reducing Healthcare Cost
Big data can help reduce the cost of providing medical treatment. BDA for healthcare can extract valuable information from medical record data to improve the system. With analyzed data, improved medical treatment can analyze and diagnose

V. CONCLUSION

In this research, the authors used a classification method with the decision tree algorithm on a cardiovascular disease dataset. The results are the decision tree and rules that classify the presence of cardiovascular illness. The performance was tested to determine whether the analysis is accurate or not: 72.17% accuracy was obtained in classification. The class precision for prediction 0 (Absence) is 70.94% and the class precision for prediction 1 (Presence) is 73.55%. For class recall, 75.15% was obtained for true 0 (Absence) and 69.18% for true 1 (Presence). The analyzed data can be used for predictive and prescriptive analytics.

Big data analytics with a classification method using the decision tree algorithm for cardiovascular disease can improve efficiency and effectiveness for healthcare. Healthcare can predict which patients have cardiovascular disease and provide them with preventive care. With the help of big data, healthcare has the opportunity to improve: better preventive care, better diagnosis of symptoms, and reduced healthcare costs.

In further research, classification methods using the decision tree algorithm can be applied to other datasets and can be developed by combining or comparing with other classification algorithms to get better results.

REFERENCES

[1] Y. Zhang, M. Qiu, C. W. Tsai, M. M. Hassan, and A. Alamri, "Health-CPS: Healthcare cyber-physical system assisted by cloud and big data," IEEE Syst. J., vol. 11, no. 1, pp. 88–95, 2017, doi: 10.1109/JSYST.2015.2460747.
[2] T. Schultz, "Turning healthcare challenges into big data opportunities: A use-case review across the pharmaceutical development lifecycle," Bull. Am. Soc. Inf. Sci. Technol., vol. 39, no. 5, pp. 34–40, 2013, doi: 10.1002/bult.2013.1720390508.
[3] N. V. Chawla and D. A. Davis, "Bringing big data to personalized healthcare: A patient-centered framework," J. Gen. Intern. Med., vol. 28, no. SUPPL.3, pp. 660–665, 2013, doi: 10.1007/s11606-013-2455-8.
[4] R. Nambiar, R. Bhardwaj, A. Sethi, and R. Vargheese, "A look at challenges and opportunities of Big Data analytics in healthcare," Proc. 2013 IEEE Int. Conf. Big Data, pp. 17–22, 2013, doi: 10.1109/BigData.2013.6691753.
[5] L. Wang and C. A. Alexander, "Big Data Analytics in Healthcare Systems," Int. J. Math. Eng. Manag. Sci., vol. 4, no. 1, pp. 269–276, 2018, doi: 10.1109/ICoAC44903.2018.8939061.
[6] W. Raghupathi and V. Raghupathi, "Big data analytics in healthcare: promise and potential," Heal. Inf. Sci. Syst., vol. 2, no. 1, pp. 1–10, 2014, doi: 10.1186/2047-2501-2-3.
[7] U. Srinivasan and B. Arunasalam, "Leveraging big data analytics to reduce healthcare costs," IT Prof., vol. 15, no. 6, pp. 21–28, 2013, doi: 10.1109/MITP.2013.55.
[8] A. Prasad and S. Prasad, "Imaginative geography, neoliberal globalization, and colonial distinctions: Docile and dangerous bodies in medical transcription 'outsourcing,'" Cult. Geogr., vol. 19, no. 3, pp. 349–364, 2012, doi: 10.1177/1474474012445734.
[9] M. Bouhriz and H. Chaoui, "Big data privacy in healthcare moroccan context," Procedia Comput. Sci., vol. 63, pp. 575–
patient more accurately, effective and efficient. With 580, 2015, doi: 10.1016/j.procs.2015.08.387.
more accurately, effective and efficient healthcare, [10] S. S. Hasan, Y. Zhang, X. Chu, and Y. Teng, “The role of big
patient will pay less and get correct treatment than the data in China’s sustainable forest management,” For. Econ.
ordinary medical treatment. Rev., vol. 1, no. 1, pp. 96–105, 2019, doi: 10.1108/fer-04-2019-
0013.
[11] C. W. Tsai, C. F. Lai, H. C. Chao, and A. V. Vasilakos, “Big
data analytics: a survey,” J. Big Data, vol. 2, no. 1, pp. 1–32,
2015, doi: 10.1186/s40537-015-0030-3.

[12] S. SA, "Big Data in Healthcare Management: A Review of Literature," Am. J. Theor. Appl. Bus., vol. 4, no. 2, p. 57, 2018, doi: 10.11648/j.ajtab.20180402.14.
[13] D. P. Acharjya and K. A. P, "A Survey on Big Data Analytics: Challenges, Open Research Issues and Tools," Int. J. Adv. Comput. Sci. Appl., vol. 7, no. 2, pp. 511-518, 2016, doi: 10.26438/ijcse/v6i6.12381244.
[14] E. D. Madyatmadja and M. Aryuni, "Comparative Study of Data Mining Model for Credit Card Application Scoring in Bank," J. Theor. Appl. Inf. Technol., vol. 59, no. 2, pp. 269-274, 2014.
[15] M. Aryuni and E. D. Madyatmadja, "Feature Selection in Credit Scoring Model for Credit Card Applicants in XYZ Bank: A Comparative Study," Int. J. Multimed. Ubiquitous Eng., vol. 10, no. 5, pp. 17-24, 2015, doi: 10.14257/ijmue.2015.10.5.03.


Detrimental Factors of the Development of Smart City and Digital City

Evaristus Didik Madyatmadja
Information Systems Department, School of Information Systems
Bina Nusantara University, Jakarta, Indonesia 11480
[email protected]

Betley Heru Susanto
Information Systems Department, School of Information Systems
Bina Nusantara University, Jakarta, Indonesia 11480
[email protected]

Darian Handoro
Information Systems Department, School of Information Systems
Bina Nusantara University, Jakarta, Indonesia 11480
[email protected]

Abstract— Modern civilization strives to integrate technology into our lives more and more every day. Such technology evolves continuously, improving in every aspect of its performance. Every technology used to improve the quality of life of its users consumes power in order to operate, and the power source varies depending on the item used. Every power source needs fuel and produces residue, which can range from eco-friendly to environmentally hazardous. Governments try to address such problems by using smart, eco-friendly technologies to reduce pollution. With all this technology in use, our lives also become dependent on it; we try to integrate it into every aspect of our lives until, eventually, the whole human settlement is integrated with technology in its every aspect, at which point it is called a smart city or a digital city. The development of both the digital city and the smart city is never without problems and challenges. This research is intended to identify and dissect the concepts of both the smart city and the digital city, to help improve the development process of both concepts and to reduce the problems encountered by developers.

Keywords— Smart City, Digital City, Technology, Quality of Life

I. INTRODUCTION
In modern times our lives are connected to technology almost 24/7, and as our lives become more and more inseparable from technology, problems also show up along with the advancement of technology and its implementation into our lives. The idea of integrating technology into our daily lives has existed ever since modern and advanced technology became more and more accessible to the public [1].
Along the course of history there have been a number of smart city and digital city initiatives around the world, such as Amsterdam. Back in 1993, an experiment was done by a private group to make a DDS as a digital city, in order to close the gap between politicians and citizens for the upcoming election. It was a success, but after several years the project started to manifest problems, such as changing the design without any support from the users and funding problems, and the DDS finally stopped being a digital city and was converted into an ordinary internet access provider [12].
Taking example from Amsterdam, this paper is intended to help people avoid the mistakes made in that case, as well as other problems usually found when developing and maintaining smart cities and digital cities.
The development of civilization is synchronous with the development of technology, so as technology advances, so does the development of human settlements such as cities and villages. Such advances drive the development of cities because of the increasing needs of the humans inhabiting the settlements; therefore the development of cities is connected to the development of technology. The development of cities and technology will eventually mix and give birth to a concept called the smart city.
The background of this research is that smart cities are usually defined as high-tech cities, but the real meaning of a smart city is not only focused on the technology aspect; it also needs to be highly developed, innovative, environment-friendly, and to incorporate the relevant aspects of economy, technology, mobility, and the quality of life of its residents [2].
The purpose of this study is to research the development of smart cities and digital cities in order to find the most common mistakes and the improvements made to the development of smart cities. The benefit of this research is that any other individual or group developing a smart city will be able to avoid repeating mistakes that were made during the development of past smart cities and to improve its development in the future. The scope of this research is to use studies related to smart cities and digital cities and how these concepts have been developed by many individuals along the course of history.

II. LITERATURE REVIEW

A. Definition of a Smart City
The first concept to understand is what smart cities are. A smart city is a place where traditional networks and services are made more efficient with the use of digital and telecommunication technologies for the benefit of its inhabitants and businesses [3]. A smart city is a concept that, if successfully executed, will be able to contribute to the


improvement of the life of the people, the environment, and its surroundings. Here are some more examples of definitions of a smart city:
1. According to Renata Dameri in her article [4], "a smart city is a well-defined geographical area, in which high technologies such as ICT, logistic, energy production, and so on, cooperate to create benefits for citizens in terms of well-being, inclusion and participation, environmental quality, intelligent development; it is governed by a well-defined pool of subjects, able to state the rules and policy for the city government and development".
2. Smart cities can be built as a solution either to existing or to completely new problems that basic cities face, so as to upgrade the quality of life of citizens and to mark development in a country [5].
3. A smart city is an urban environment encircled by or embedded with smart systems, or a city with ideas and people that provide clever perceptions [6].
4. A smart city is a city in which Information and Communications Technology is integrated and merged along with traditional infrastructures, and then integrated and coordinated utilizing brand-new digital technologies [7].
5. The concept of the smart city is a city that is strongly dependent on the implementation of technology [8].
6. A smart city is a city that meets the requirements of green cities; therefore a smart city is also sustainable and environmentally friendly [9].
7. A smart city is an urban locality where information and communication technology is used extensively to provide a higher quality of life to its inhabitants [10].

B. Definition of Digital City
Based on observation at the time this research was conducted, the concept of the digital city was mostly discussed from the 1990s up to the early 2000s; ever since the concept of the smart city became more popular, the digital city has become less and less relevant, although it is still mentioned whenever a new smart city development starts. Digital cities are regarded as the predecessors of smart cities. Here are some examples of definitions of the digital city:
1. The existence of digital cities is mostly purely digital, hence the name, and it is based on activities on web services like the internet [11].
2. A digital city is a representation of a town or village on the World Wide Web as a local information and communication system [12].
3. A digital city is a trend that emerges from the use of smart and digital-based devices, which encourages local governments to create a supply of digital-based services [13].

C. Difference of smart city and digital city
The difference between a smart city and a digital city is that a smart city exists in the real world, while a digital city exists in the digital world. Smart cities are, in a sense, the successors of digital cities, because digital cities rely on modern technology to run since they live on the internet, while smart cities usually rely on the internet in order to connect the citywide network and run properly.

D. The similarity of smart city and digital city
Both use Information and Communication Technology (ICT) to promote economic development and quality of life, and to sustain a green environment that, one way or another, sustains the city [14].

III. RESEARCH METHODOLOGY
The research method employed for this research is a Systematic Literature Review, used to analyze and summarize existing researchers' data in order to reach a conclusion.
The sources for this research come mostly from ScienceDirect, Springer, Elsevier, ResearchGate, Emerald, IOPscience, Academia, IEEE, Taylor & Francis, MDPI, SAGE, ePrints, and countless other sources from Google Scholar. All the sources mentioned are used because of their reliability and their up-to-date repositories of papers and articles.

IV. RESULT

A. Research Stages
The papers and articles selected must meet the following criteria:
- the research's theme must be based on the smart city or the digital city
- articles published between 2005-2020
- the research must discuss the identification, development or implementation of the smart city or the digital city

The selection proceeded in three stages: a keyword search (300 studies found), abstract screening against the research questions (62 candidate papers), and a full-text check against the research questions (25 selected papers).

Fig. 1. Illustration for the Paper Filtering Stages
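The three-stage filtering illustrated in Fig. 1 (keyword search, abstract screening, full-text check) can be sketched as a simple screening pipeline. This is an illustrative sketch only; the predicate functions and the paper fields ("title", "year", and the two boolean flags) are hypothetical stand-ins for the manual checks described above.

```python
# Illustrative sketch of the Fig. 1 screening pipeline: keyword search,
# then abstract screening, then a full-text check. The predicates and
# paper fields are hypothetical stand-ins for the manual review steps.

def keyword_match(paper):
    # Stage 1: the theme must be smart city or digital city.
    title = paper["title"].lower()
    return "smart city" in title or "digital city" in title

def abstract_match(paper):
    # Stage 2: published 2005-2020 and the abstract answers the
    # research questions.
    return 2005 <= paper["year"] <= 2020 and paper["abstract_relevant"]

def full_text_match(paper):
    # Stage 3: the paper discusses identification, development, or
    # implementation of smart/digital cities.
    return paper["discusses_development"]

def screen(papers):
    found = [p for p in papers if keyword_match(p)]           # 300 in the paper
    candidates = [p for p in found if abstract_match(p)]      # 62 in the paper
    selected = [p for p in candidates if full_text_match(p)]  # 25 in the paper
    return found, candidates, selected
```

Each stage only narrows the previous stage's output, which is why the counts can only decrease from 300 to 62 to 25.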


TABLE I. STUDIES FOUND

Studies found: 300
Candidate studies: 62
Selected studies: 25

The total number of literature articles found that met the criteria was 87 papers, and the total number of papers selected as references is 25, with 62 candidate studies.

TABLE II. SOURCE AND NUMBER OF PAPERS USED

Source             | Total | Candidate | Selected
ScienceDirect      | 10    | 8         | 2
Springer           | 8     | 3         | 5
Elsevier           | 4     | 2         | 2
ResearchGate       | 5     | 1         | 4
Emerald            | 2     | 2         | 0
iopscience.iop.org | 3     | 1         | 2
Academia           | 4     | 2         | 2
IEEE               | 8     | 6         | 2
Taylor & Francis   | 6     | 6         | 0
MDPI               | 2     | 2         | 0
SAGE               | 1     | 0         | 1
Google Scholar     | 34    | 29        | 5
Total              | 87    | 62        | 25

B. Overview of selected studies
The 25 articles selected for this research were written by 37 authors affiliated with 18 nations; 35 authors are affiliated with universities and 2 authors are affiliated with governments.

TABLE III. LIST OF AUTHOR AFFILIATIONS

Author Name             | Affiliation | Nation
Toru Ishida             | university  | Japan
Peter van den Besselaar | university  | Netherlands
Abhishek Bhati          | university  | Singapore
Michael Hansen          | university  | Singapore
Ching Man Chan          | university  | Singapore
Vasja Roblek            | university  | Slovenia
Sumiyatun               | university  | Indonesia
Adiyuda Prayitna        | university  | Indonesia
Tan Yigitcanlar         | university  | Australia
Renata Paola Dameri     | university  | Italy
Annalisa Cocchia        | university  | Italy
D. Çinar Umdu           | university  | Turkey
E. Alakavuk             | university  | Turkey
Zurinah Tahir           | university  | Malaysia
Jalaluddin Abdul Malek  | university  | Malaysia
Li Hao                  | university  | China
Xue Lei                 | university  | China
Zhu Yan                 | university  | China
Yang ChunLi             | university  | China
Mehrdad Mohasses        | university  | United Arab Emirates
Anton Safiullin         | university  | Russia
Lyudmila Krasnyuk       | university  | Russia
Zoya Kapelyuk           | university  | Russia
Aravindi Samarakkody    | university  | Sri Lanka
Dilum Bandara           | government  | Australia
Ignasi Capdevila        | university  | France
Matías I. Zarlenga      | university  | Spain
Luca Mora               | university  | UK
Roberto Bolici          | university  | Italy
Emine Mine Thompson     | university  | UK
Shihe Fu                | university  | China
Iraklis Argyriou        | government  | France
T. M. Vinod Kumar       | university  | India
Bharat Dahiya           | university  | India
Catherine Griffiths     | university  | UK
Claire Thorne           | university  | UK
Leonidas G. Anthopoulos | university  | Greece

Fig. 2. Chart depicting the ratio of affiliation of the authors (university 95%, government 5%)
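The affiliation and nationality tallies summarized above can be reproduced by simple counting over the rows of Table III. The snippet below is an illustration using only a subset of the table's 37 rows; the counts shown in its comments apply to this subset, not to the full table.

```python
# Reproducing the affiliation and nationality tallies behind Table III
# and the accompanying charts. Only a few of the 37 Table III rows are
# included here for illustration.
from collections import Counter

authors = [
    ("Toru Ishida", "university", "Japan"),
    ("Peter van den Besselaar", "university", "Netherlands"),
    ("Li Hao", "university", "China"),
    ("Dilum Bandara", "government", "Australia"),
    ("Iraklis Argyriou", "government", "France"),
]

# Tally how many authors fall into each affiliation type and nation.
affiliations = Counter(aff for _, aff, _ in authors)
nations = Counter(nation for _, _, nation in authors)

print(affiliations.most_common(1))  # [('university', 3)]
```

Run over the full 37-row table, the same two counters yield the 35/2 university-to-government split and the per-nation bar chart reported in the paper.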


The city that implements technologies to create green, safe


Author's nationality and sustainable environment without environmental
problems and living hazards.
6 -Smart people
5 The equal distribution of each role will help sustain a smart
4 city and help it develop further.
3
-Smart living
The city where every resident contributes and become an
2 active participant in the community to ensure the high life
1 quality through health status, cultural objects, housing
0 quality, and security. [15]
-Smart governance
Australia

Sri Lanka

Turkey
Malaysia

Indonesia
Japan

United Kingdoms

France

Spain

United Arab Emirates

Netherlands

Greece
Italy

Russia

India
China

Slovenia
Singapore This is achieved by taking advantage of technology to help
support and facilitate decision making and planning when
governing and providing public services.

Components of the process into making a city into a smart


city [17]
Need Driven by growths in technical information and urban
Fig. 3. Chart of the Geological Location of Authors
centers
Enablers: skills, appetite, data
C. Method of Collecting Data Implementation environment: infrastructure, economy,
the method of collecting data for this research comes from governance
the use of google scholar, emerald and sci hub to search for Approach: engagement, implementation, adoption
paper and articles that meet the criteria of paper needed Outputs: smart living, smart mobility, smart economy

Source Furthermore, four key principles have been selected for


guiding the development of both the strategy and individual
40
projects [18]:
35
1) Collective effort: a highly collaborative approach is
30
considered fundamental for achieving results. For this
25
reason, cooperation between the public and private sectors is
20
constantly stimulated and supported in every project,
15
together with the involvement of citizens (Public-Private-
10
People Partnership);
5
2) Economic viability: only the most advantageous projects
0
can be considered for potential large-scale
implementation.
3) Tech push/pull demand: the action against the climate
change has to be supported through technological
innovation and the stimulation of behavioral change.
4) Knowledge dissemination: sharing and spreading the
Fig. 4. Chart Depicting the Sources of Papers Used knowledge acquired during the path towards the
smart city transformation is considered as actions of crucial
D. Development of smart city importance
There are several criteria in which when all the criteria are E. Development of digital city
met a city can be classified as a smart city [2] and those Digital city is intended to provide communication, supply
criterias are: Information, e-services, and finally connect the citizens to
-Smart economy the public administration and to each other. [19]
A city that implements ICT based economy and technologies Related use of a digital cities are as follows:
into their business models and production processes [15] -promoting the quality of local public services
-Smart mobility -external and internal administrative usage
A city that implements modern transportation technologies,
logistics, and new transportation systems to improve the Relation between Internet of Things and Smart City
urban living and mobility [15] It can be concluded that in the digital age everything is
-Smart environment connected to the internet and the internet is a good media that
enables fast communication between people or even gadgets


and the internet helps a smart city connect one component to another in order to run and sustain itself.

F. Relation between internet of things and digital city
Digital cities are online communities; therefore the internet is crucial for digital cities to operate, or even to exist in the first place.

G. Implementations of Smart cities
Vienna: energy conservation (the smart city of Vienna) [20]
The main goal of developing Vienna into a smart city is to reduce energy consumption per capita by 40% by 2050. Projects that are included:
- Wien Energie
- Clean Heat and Stable Power Grid
- Energy from Metro brakes
- Smart Traffic Lights

Dubai: smart government (how Dubai is becoming a smart city) [21]
The implementation of cloud-based technology and intelligent service delivery, along with collaboration between the private and public sectors, helped to realize 1000 smart services for the citizens.

Hangzhou: Dream Town Internet village [22] and green smart city [23]
The Internet village project's goal is to improve the quality of life and promote economic growth around internet-based startups. The green smart city project's goal is to develop Hangzhou into a smart city by implementing new eco-friendly materials, environment protection, smart industry, digital city management, an emergency system, and smart transportation.

Singapore: energy conservation [24]
First, data from everyday electronic appliances is collected and stored in the cloud; next, the data is analyzed; lastly, the data is used to optimize the use of every smart technology, so energy consumption is minimized.

Yogyakarta: smart tourism using an app [25]
Since digital cities are less and less discussed, we can draw the conclusion that in the modern era the digital city is implemented as social media or apps that help citizens or tourists access the e-services provided by local governments, or as social media-based communities of people who live in the same city.

TABLE IV. FACTORS THAT ARE DETRIMENTAL TO THE DEVELOPMENT OF SMART CITY AND DIGITAL CITY

Factor | Explanation | Citation
Eco friendly | Being able to sustain the environment could help the longevity of the smart city | [2] [9]
Internet | A good connection to the internet will improve the accessibility of the e-services | [22] [11] [14]
Citizen's contribution | The contribution of the citizens during development will ensure the fulfillment of the people's needs | [8] [12] [18] [26]
Equal distribution of facilities | Distribution of facilities could help improve the mobility of the people | [26]
Energy conservation | Being able to conserve energy could help the city save budget and allocate it to other sectors in need | [24] [6] [20]
Smart government | Managing the city and being able to solve city-wide problems quickly is a must for smart cities | [5] [4] [21]
Modern technologies | Usage of modern technology is the best option during development in order to prolong the city's life expectancy | [7] [24]
Smart economy | Providing business opportunities will help the city grow economically | [3] [13] [15] [16] [27]
Quality of life | The smart city and digital city concepts were created to promote the people's quality of life | [10] [19]
Availability to the public | Every piece of information and every facility must be available for public use | [25] [17]
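The Singapore example above describes a collect, analyze, and optimize loop over appliance energy data. A minimal sketch of that loop is shown below; the appliance readings, the wattage threshold, and the naive flagging policy are all invented for illustration and are not taken from the cited study.

```python
# Minimal sketch of the collect -> analyze -> optimize loop described
# for Singapore's energy-conservation example. Readings, the threshold,
# and the policy are invented for illustration only.

def collect(readings_by_appliance):
    # In practice, readings would be streamed from home appliances into
    # cloud storage; here they are passed in directly.
    return readings_by_appliance

def analyze(readings):
    # Compute the average power draw (watts) per appliance.
    return {name: sum(vals) / len(vals) for name, vals in readings.items()}

def optimize(averages, limit_watts=100.0):
    # Flag appliances whose average draw exceeds a hypothetical limit,
    # so their usage can be scheduled or reduced.
    return [name for name, avg in averages.items() if avg > limit_watts]

readings = collect({"aircon": [950.0, 1020.0], "lamp": [9.0, 11.0]})
flagged = optimize(analyze(readings))
print(flagged)  # ['aircon']
```

The point of the sketch is the data flow itself: raw device readings are only useful once they are aggregated, and only the aggregated view can drive an optimization decision.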


V. CONCLUSION

The literature analysis reveals that both the digital city and the smart city involve many criteria and issues that are encountered sooner or later in their development and lifetime. This shows that both the digital city and the smart city are still far from perfect, and that both still need polishing in every aspect. The purpose of this paper is to identify which factors are detrimental to the development of the digital city and the smart city; the factors detrimental to the development are displayed in Table IV. We hope this paper is useful for every future smart city and digital city development, so that developers will polish the development even more and build a smart city or digital city that is able to meet the definition of the digital city or the smart city respectively.

REFERENCES

[1] E. D. Madyatmadja, M. and H. Prabowo, "Participation to Public e-Service Development: A Systematic Literature Review," Journal of Telecommunication, Electronic, and Computer Engineering, vol. 8, no. 3, pp. 133-137, 2016.
[2] Z. Tahir and J. A. Malek, "Main Criteria in the Development of Smart Cities Determined Using Analytical Method," p. 4, 2016.
[3] Ç. U. D and A. E, "Understanding of Smart Cities, Digital Cities and Intelligent Cities: Similarities and Differences," in 2020 5th International Conference on Smart City Applications, Virtual Safranbolu, 2020.
[4] R. P. Dameri, "Searching for Smart City definition: a comprehensive proposal," International Journal of Computers & Technology, vol. 11, no. October, pp. 1-9, 2013.
[5] A. Samarakkody, D. Bandara and U. Kulatunga, "What Differentiates a Smart City?," in Proceedings of the 8th World Construction Symposium, Colombo, 2019.
[6] L. G. Anthopoulos, "The rise of the smart city," in Understanding Smart Cities: A Tool for Smart Government or an Industrial Trick?, Cham, Springer, 2017, pp. 5-45.
[7] E. M. Thompson, "What makes a city 'smart'?," International Journal of Architectural Computing, vol. 14, no. 4, pp. 358-371, 2016.
[8] I. Capdevila and M. I. Zarlenga, "Smart city or smart citizens? The Barcelona case," Journal of Strategy and Management, vol. 8, no. 3, pp. 266-282, 2015.
[9] A. A. Hameed, "Smart city planning and sustainable development," IOP Publishing Ltd, Baghdad, 2019.
[10] T. Yigitcanlar, "Smart cities in the making," International Journal of Knowledge-Based Development, vol. 8, no. 3, pp. 201-205, 2017.
[11] T. Ishida, "Digital City, Smart City and Beyond," in International World Wide Web Conference, Perth, 2017.
[12] P. V. D. Besselaar and D. Beckers, "The Life and Death of the Great Amsterdam Digital City," in Lecture Notes in Computer Science, Berlin, 2005, pp. 66-96.
[13] A. Cocchia, "Smart and Digital City: A Systematic Literature Review," in Smart City, Cham, Springer, 2014, pp. 13-43.
[14] R. P. Dameri, "Comparing Smart and Digital City: Initiatives and Strategies in Amsterdam and Genoa. Are They Digital and/or Smart?," in Smart City, Cham, Springer, 2014, pp. 45-88.
[15] A. Safiullin, L. Krasnyuk and Z. Kapelyuk, "Integration of Industry 4.0 technologies for 'smart cities' development," IOP Publishing Ltd, Saint-Petersburg, 2019.
[16] B. Dahiya and T. V. Kumar, "Smart Economy in Smart Cities," in Advances in 21st Century Human Settlements, Singapore, Springer, 2016, pp. 3-76.
[17] C. Thorne and C. Griffiths, "Smart, Smarter, Smartest: Redefining Our Cities," in R. Dameri and C. Rosenthal-Sabroux (eds), Smart City, Cham, Springer, 2014, pp. 89-99.
[18] L. Mora and R. Bolici, "How to Become a Smart City: Learning from Amsterdam," in Smart and Sustainable Planning for Cities and Regions: Results of SSPCR 2015, 2017, pp. 251-266.
[19] R. Dameri and A. Cocchia, "Smart City and Digital City: Twenty years of terminology evolution," in Conference of the Italian Chapter of AIS, 2013.
[20] V. Roblek, "5 - The smart city of Vienna," in Smart City Emergence, Elsevier, 2019, pp. 105-127.
[21] M. Mohasses, "How Dubai is Becoming a Smart City?," in 2019 International Workshop on Fiber Optics in Access Networks, Dubai, 2019.
[22] I. Argyriou, "The smart city of Hangzhou, China: the case of Dream Town Internet village," in Smart City Emergence, Paris, Elsevier, 2019, pp. 195-218.
[23] L. Hao, X. Lei and Z. Yan, "The application and Implementation research of Smart City in China," in 2012 International Conference on System Science and Engineering, Dalian, 2012.
[24] A. Bhati, M. Hansen and C. M. Chan, "Energy conservation through smart homes in a smart city: A lesson for Singapore," Energy Policy, 2017, pp. 230-239.
[25] Sumiyatun and A. Prayitna, "Pengembangan Konsep Mobile City Menuju Jogja Smart City" [Development of the Mobile City Concept Towards Jogja Smart City], STMIK Akakom Yogyakarta, Yogyakarta, 2020.
[26] S. Fu, "Smart Café Cities: Testing human capital externalities in the Boston metropolitan area," Journal of Urban Economics, vol. 61, pp. 86-111, 2007.
[27] E. D. Madyatmadja, F. L. Gaol, E. Abdurachman and B. W. Pudjianto, "Social media based government continuance from an expectation confirmation on citizen experience," International Journal of Mechanical Engineering and Technology, vol. 9, no. 7, pp. 869-876, 2018.


Application of Internet of Things in Smart City: A Systematic Literature Review

Evaristus Didik Madyatmadja
Information Systems Department, School of Information Systems
Bina Nusantara University, Jakarta, Indonesia 11480
[email protected]

Hendro Nindito
Information Systems Department, School of Information Systems
Bina Nusantara University, Jakarta, Indonesia 11480
[email protected]

Albert Verasius Dian Sano
Computer Science Department, School of Computer Science
Bina Nusantara University, Jakarta, Indonesia 11480
[email protected]

Agung Purnomo
Entrepreneurship Department, BINUS Business School Undergraduate Program
Bina Nusantara University, Jakarta, Indonesia 11480
[email protected]

Dimas Rizki Haikal
Information Systems Department, School of Information Systems
Bina Nusantara University, Jakarta, Indonesia 11480
[email protected]

Corinthias P.M. Sianipar
Department of Global Ecology, Graduate School of Global Environmental Studies (GSGES), and Division of Environmental Science and Technology, Graduate School of Agriculture (GSA)
Kyoto University, Kitashirakawa, Oiwakecho, Sakyo Ward, Kyoto, Japan 606-8502
[email protected]

Abstract— Nowadays, every urban area in developed countries needs to have a "Smart City" to compete with other countries and to make the name "Future City" become a reality. By using the Smart City concept, the development of a country will increase greatly with the number of Things connected to the Internet, which not only increases the development of a country but also increases the productivity of the people that use those things. To get the right results according to this purpose, this paper uses a research method, namely the Systematic Literature Review (SLR). This study collects data from previous research, including journal and conference papers, to be observed. The results of this paper can be used as a reference for newer journals on the same topic; as its end goal, this paper gives examples of countries that have already applied the "Smart City" and the benefits of using the Smart City itself.

Keywords—Smart City, Internet of Things, SLR.

I. INTRODUCTION

Today, over 50% of the global population inhabits cities across the globe, and it is reported that in urban areas such as big cities like Jakarta, urban residency will increase up to 68% within the next 30 years [1]. This growth will force several challenges on cities, including the quality of life of the people that live in these urban areas. Therefore, governments use Information and Communication Technologies (ICT) as their solution to overcome the growth of the urban area and to increase the quality of life of the people that live in the affected area [2]. Problems such as excessive pollution (air pollution, water pollution) and environmental damage have become the main problems in urban areas, and it is a must for the government to control and also monitor these massive problems [3]. In this way ICT can be the main solution, by using the Internet of Things and data analysis of the data that the machine works on, collecting data such as intelligent identification, location, tracking, and monitoring at specific locations [4]. The use of IoT in recent growth has resulted in innovation and the development of advanced analysis that includes Machine Learning (ML), Deep Learning (DL), and computing infrastructure [5]. With this development a lot of traditional technologies have become unpopular and are gradually changing into IoT-implemented things [6]. With the changing technologies, IoT and Big Data Analysis have become popular research domains over the last decade [7-10]. In countries such as India, the use of Big Data Analytics (BDA) is a solution for the creation of smart destinations; with these technologies the local authorities in India made significant changes to become a smart departure point [11-12]. In an implementation at the University of Rhode Island, using fog computing in contrast to cloud computing, the technology performs not only latency-sensitive applications at the edge of the network, but also latency-tolerant ones at powerful computing nodes in the intermediate layers of the network.
The use of Smart things by people, such as "Smart Home" technologies, and the more advanced analytics in information analysis could play a massive role in the development of Information and Communication Technologies (ICT). Great information analysis will allow a good understanding and a full view of the future in the planning and development of the people in those countries, and therefore provide the government with insight into the knowledge of big data [14,15]. The advancement of things that helped the Smart City develop also includes sensors and wireless IoT applications; these devices generate a very large volume of data that, in the end, improves efficiency in the management and control of urban operations and services [16].
The assessment of data reliability is challenging, especially for newer implementations of a smart city, as it has to keep up with data velocity in order to provide up-to-date responses [17]. The basis of a "Smart City" is the Internet of Things; this technology has 4

978-1-6654-4002-8/21/$31.00 ©2021 IEEE

324 28 October 2021, Jakarta - Indonesia



layers in total; these layers are a must-have so the technology can reach its target. The four layers are:
• Device Layer: sensory devices, Radio Frequency Identification (RFID), and other things that can be connected with each other through the Internet.
• Data Archive Layer: on this layer the database collects unstructured data, because the data has not yet been categorized.
• Data Computing Layer: at this layer the unstructured data has been categorized by systems such as Hadoop, Spark, etc.
• Application Service Layer: this layer is the place where people meet the machine, through applications such as traffic control, water monitoring, and other applications that help people look at the data that has been processed by the system [18, 19, 20].

II. BACKGROUND STORY

A. Definition and Approaches for Smart City

In some of the literature gathered by the authors, Smart City appears as a somewhat blurred idea that always searches for and collects huge amounts of processed data connected to the people who live in a certain area, as a means to justify the main planning and to manage that area or place [21-24]. Meanwhile, this study highlights literature in which selected countries chose the Internet of Things to develop their cities.

Reference (IBM, 2012) stated that the major players who promote and support this specific vision of a "Future City", or lawmakers, should approach this future ideal of a living city as a "complex of interconnected systems". Reference (New York Times, Lehrer, J., 2010) states that a smart city continuously creates new information that can in the end be used for monitoring, measuring, and also managing the progress of urban life in countries or cities, by "influencing some of the processed information to make better decisions, anticipate and fix issues that are proactively found, and coordinate the machines that operate on the present data more effectively".

The efforts to build a fully scientific city were often temporary, because some technologies in the cities were still not updated to what is required for the cities to fully function as "urban science", despite the obvious seasonal history of the urban planner [19]. Reference [24] stated that one may want to recognize "Smart City" as a reasonable, scientific way and a freedom to understand, and as a phase for the city to develop. It is crucial to remember that "Smart City" has already been recognized and criticized due to its little correspondence with current reality, even after major changes in the era of the Internet of Things and Big Data Analysis. References [25, 26] stated that the stratification of the people living in the affected area of change is one of the most important phases for the government to continuously improve.

B. Hong Kong

The development of cities in Hong Kong gives free access to the data platform for the public and individual companies, which will bring prosperity to the quality of life of the citizens of Hong Kong; the goal of this development is to offer the citizens public services, such as applications that can detect the next bus arrival times, the location of the nearest coffee shops, etc. [27]

C. Germany

Development in Germany, in the general view, is that the government wants to innovate on "Smart City" from an entrepreneurial point of view. The data collection is from entrepreneurs who live in Germany, with the major goal of managing the progress and development while the government takes the main role in spreading the necessary resources and stakeholders.

D. Hangzhou, China

In the city of Hangzhou, China, the government has started to develop a five-year plan for National Economic and Social Development that includes clear directions for resource efficiency, environmental suitability, and the carbon intensity that the city's economy still holds on to; this development was planned for 2011-2015 [27].

E. Kitakyushu, Japan

In the city of Kitakyushu, Japan, the development using the Internet of Things focuses on localized intervention policy implementations, improvement of the information infrastructure, stabilizing the distribution of sustainable energy in smart micro-grids, smart monitoring, and smart building that focuses on reducing the city's carbon emissions.

F. Peterborough, United Kingdom

In Peterborough, United Kingdom, the government is beginning to work together with universities around Manchester to focus on collecting data and smart tools to support businesses.

III. RESEARCH METHOD

The research method that will be implemented is the Systematic Literature Review, as this method is the most useful for collecting, identifying, and reviewing the journal and conference papers that will be selected to support the topic of this research paper. Additionally, the reviews carried out have to meet specific conditions and critical analysis so that a paper can be picked as a selected journal. The media used to find the journals, i.e. the portals for the collection of journal and conference articles, have to be well-known and trusted publishers. There are 4 selected portals, namely ACM DL, Springer Link, Science Direct, and Google Scholar. The selected papers were mostly collected from Science Direct and Google Scholar. The publication date is also limited to 2015-2020, plus 1 paper from 2012; these journals will be reviewed to select the most proper journals as the selected papers.
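The four-layer IoT stack described in the Introduction (device, data archive, data computing, application service) can be sketched as a minimal data pipeline. This is an illustrative sketch only, not code from the paper: the function names and sample readings are hypothetical, and a plain dictionary grouping stands in for engines such as Hadoop or Spark.

```python
from collections import defaultdict

# Device Layer: hypothetical raw readings standing in for sensors,
# RFID, and other connected things.
def device_layer():
    return [
        {"sensor": "traffic-cam-01", "kind": "traffic", "value": 132},
        {"sensor": "water-probe-03", "kind": "water", "value": 7.1},
        {"sensor": "traffic-cam-02", "kind": "traffic", "value": 88},
    ]

# Data Archive Layer: store readings as-is; nothing is categorized yet.
def data_archive_layer(readings):
    return list(readings)

# Data Computing Layer: categorize the unstructured data. The paper names
# Hadoop and Spark; a simple dict grouping stands in for them here.
def data_computing_layer(archive):
    grouped = defaultdict(list)
    for reading in archive:
        grouped[reading["kind"]].append(reading["value"])
    return dict(grouped)

# Application Service Layer: expose processed data to end-user apps
# (traffic control, water monitoring), here as an average per category.
def application_service_layer(grouped):
    return {kind: sum(vals) / len(vals) for kind, vals in grouped.items()}

report = application_service_layer(
    data_computing_layer(data_archive_layer(device_layer())))
print(report)  # {'traffic': 110.0, 'water': 7.1}
```

The point of the sketch is only the direction of flow: raw, uncategorized data enters at the bottom layer and only the top layer presents processed results to people.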


IV. RESULT

A. Study Found

The main keywords used to find the specific journals were (Smart City) AND (Internet of Thing OR Big Data Analysis); using these keywords, 104 papers were obtained. In the second search, (Smart City Internet of Things) AND (Smart City Big Data Analysis) was used and 26 papers were obtained, and in the third search the keyword used was "Application Smart City", in which 21 papers were obtained.

From the keywords that were used, we got one hundred and fifty-one papers; every article was noted and had to be assessed as related to the topic.

B. Candidate Studies

In this phase, the collected 151 papers were reduced to 74 candidate papers, the condition being adjustment to the main research question and needs.

C. Selected Studies

The selected papers need to meet the specific conditions below:
• The paper must focus on research in Smart City using the Internet of Things.
• The paper must discuss the activity of the users in the community in accordance with the provisions.
• The paper must answer the main question of the current research.
• The article was published between the years 2010-2021.
• The collected article is the latest research relating to the application of Smart City and the Internet of Things.

The number of papers that meet the conditions is a total of 30 journal articles and conference articles; the review of articles is explained in Figure 1. Then, the data procurement, which is the number of studies from the selected papers, can be seen in Table 1.

Fig. 1. Searching Strategy for Systematic Literature Review

TABLE I. NUMBER OF STUDIES IN SELECTED SOURCES

D. Candidate Studies

From the 30 chosen articles, there are 73 participating authors, 31 institutions, and 27 universities involved in the making. A couple of authors wrote 2 papers while the rest wrote only 1 paper, and likewise a couple of universities produced 2 papers while the rest produced 1. The institutions are located in Korea, Taiwan, China, India, USA, Morocco, Hong Kong, Australia, United Kingdom, Tunisia, Saudi Arabia, Romania, Pakistan, France, Indonesia, Italy, Belgium, Greece, and Iran.

All of the authors worked in 11 departments: Computer Science and Engineering, Business Administration, Information System, Electrical and Biomedical, Geography, Building and Real Estate, Engineering, Communication Engineering, Data Science, Management, and Urban Planning and Regional Development.

Fig. 2. Author Demography

The authors' academic backgrounds appear in Figure 3, while the universities involved, by country, can be viewed in Figure 4. The articles were selected with publication dates between 2012 and 2020, as shown in Figure 5.

Fig. 3. Author's Academic Background
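The search-and-screening funnel reported in Section IV can be summarized in a short sketch. The query strings and counts below are the ones the paper gives (104 + 26 + 21 = 151 identified, 74 candidates, 30 selected); the `meets_criteria` predicate is a hypothetical stand-in for the manual review against the inclusion criteria, not the authors' actual procedure.

```python
# Query strings and hit counts as reported in Section IV.A.
searches = {
    "(Smart City) AND (Internet of Thing OR Big Data Analysis)": 104,
    "(Smart City Internet of Things) AND (Smart City Big Data Analysis)": 26,
    '"Application Smart City"': 21,
}

identified = sum(searches.values())  # 151 papers noted for screening
candidates = 74                      # papers matching the research questions
selected = 30                        # papers passing the criteria below

# Hypothetical stand-in for the manual screening against the paper's
# inclusion criteria (topic fit and publication year 2010-2021).
def meets_criteria(paper):
    return paper["topic"] == "smart city IoT" and 2010 <= paper["year"] <= 2021

print(f"{identified} identified -> {candidates} candidates -> {selected} selected")
```

Running the sketch prints `151 identified -> 74 candidates -> 30 selected`, matching the funnel the section describes.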


Fig. 4. University in the Country

Fig. 5. Publication Year

E. Participation Factor Category

From the literature review, there are four categories, namely: technology, culture, social, and political. With technology, how a Smart City is developed depends on the needs and on how big the problems are that caused the city to make a big change. With politics, the implementation of Smart City can also change how the country works and how government policies evolve over time due to adaptation to the Smart City implementation. And with culture and social factors, one views the people inside the affected area: given the change in technology, did the people use the change to be more productive, or did it make them slack off?

F. Participation Countries in Smart City

This paper discusses the implementation of Smart City in the several countries described beforehand. The table below lists the research for a number of countries.

No. | Continent | Country and City | Implementation of Smart City
1 | Asia | Hong Kong | Open Data Project; Internet of Things; Smart Environment; Smart Home
2 | Europe | Germany | Entrepreneur-based; Smart Life; Smart Education; Business-based Urban Development
3 | Asia | Hangzhou, China | Digital City; Smart Environment; Smart Life; Internet of Things
4 | Asia | Kitakyushu, Japan | Smart Life; Internet of Things; Smart Environment
5 | Europe | Peterborough, United Kingdom | Smart Life; Internet of Things; Smart Environment; Smart Government

V. CONCLUSION

There are a lot of factors that influence countries to change their technology from traditional to a more modern "Smart City", but with the influence of "Smart City" there are also a lot of benefits to be gathered, like an increase in people's productivity, and the standard of a country can increase significantly. This research had the main purpose of showing how important the development of Smart City is to a country, so that countries still hesitant to develop Smart City can be inspired to do it. The issues in developing countries can be addressed with the implementation of Smart City.

ACKNOWLEDGEMENT

This work is supported by the Research and Technology Transfer Office, Bina Nusantara University, as a part of Bina Nusantara University's International Research Grant entitled "Aplikasi Smart City: Akselerasi Partisipasi Masyarakat dalam Membangun Kota yang Berkelanjutan" or "Smart City Application: Accelerating Community Participation in Building a Sustainable City", with contract number 017/VR.RTT/III/2021 and contract date 22 March 2021.

REFERENCES

[1] T. Schultz, "Turning healthcare challenges into big data opportunities: A use-case review across the pharmaceutical development lifecycle," Bull. Am. Soc. Inf. Sci. Technol., vol. 39, no. 5, pp. 34–40, 2013, doi: 10.1002/bult.2
[2] Atitallah, Safa Ben, Maha Driss, Wadii Boulila, and Henda Ben Ghezala. 2020. "Leveraging Deep Learning and IoT Big Data Analytics to Support the Smart Cities Development: Review and Future Directions." Computer Science Review 38: 100303. https://doi.org/10.1016/j.cosrev.2020.100303.
[3] Din, Sadia et al. 2017. "SDIoT: Software Defined Internet of Thing to Analyze Big Data in Smart Cities." Proceedings - 2017 IEEE 42nd Conference on Local Computer Networks Workshops, LCN Workshops 2017: 175–82.
[4] Lu, Sheng Qiang, Gang Xie, Zehua Chen, and Xiaoming Han. 2015. "The Management of Application of Big Data in Internet of Thing in Environmental Protection in China." Proceedings - 2015 IEEE 1st International Conference on Big Data Computing Service and Applications, BigDataService 2015: 218–22.
[5] Eremia, Mircea, Lucian Toma, and Mihai Sanduleac. 2017. "The Smart City Concept in the 21st Century." Procedia Engineering 181: 12–19.
[6] Babar, Muhammad et al. 2019. "Urban Data Management System: Towards Big Data Analytics for Internet of Things Based Smart Urban


Environment Using Customized Hadoop." Future Generation Computer Systems 96: 398–409.
[7] Yadav, Preeti, and Sandeep Vishwakarma. 2018. “Application of
Internet of Things and Big Data towards a Smart City.” Proceedings -
2018 3rd International Conference on Internet of Things: Smart
Innovation and Usages, IoT-SIU 2018: 1–5.
[8] Babar, Muhammad, and Fahim Arif. 2017. “Smart Urban Planning
Using Big Data Analytics to Contend with the Interoperability in
Internet of Things.” Future Generation Computer Systems 77: 65–76.
[9] Rathore, M. Mazhar, Awais Ahmad, Anand Paul, and Seungmin Rho.
2016. “Urban Planning and Building Smart Cities Based on the Internet
of Things Using Big Data Analytics.” Computer Networks 101: 63–80.
[10] Khajenasiri, Iman, Abouzar Estebsari, Marian Verhelst, and Georges
Gielen. 2017. “A Review on Internet of Things Solutions for Intelligent
Energy Control in Buildings for Smart City Applications.” In Energy
Procedia, Elsevier Ltd, 770–79.
[11] Wu, Yung Chang, Yenchun Jim Wu, and Shiann Ming Wu. 2019. “An
Outlook of a Future Smart City in Taiwan from Post-Internet of Things
to Artificial Intelligence Internet of Things.” In Smart Cities: Issues
and Challenges Mapping Political, Social and Economic Risks and
Threats, Elsevier, 263–82.
[12] Balasaraswathi, M. et al. 2020. “Big Data Analytic of Contexts and
Cascading Tourism for Smart City.” Materials Today: Proceedings.
[13] Kummitha, Rama Krishna Reddy, and Nathalie Crutzen. 2019. “Smart
Cities and the Citizen-Driven Internet of Things: A Qualitative Inquiry
into an Emerging Smart City.” Technological Forecasting and Social
Change 140: 44–53.
[14] Tang, Bo et al. 2015. “A Hierarchical Distributed Fog Computing
Architecture for Big Data Analysis in Smart Cities.” ACM
International Conference Proceeding Series 07-09-Ocob.
[15] Rathore, M. Mazhar, Awais Ahmad, and Anand Paul. 2015. “Big Data
and Internet of Things: An Asset for Urban Planning.” ACM
International Conference Proceeding Series 20-23-Octo: 58–65.
[16] Babar, Muhammad, and Fahim Arif. 2017. “Smart Urban Planning
Using Big Data Analytics Based Internet of Things.” In
UbiComp/ISWC 2017 - Adjunct Proceedings of the 2017 ACM
International Joint Conference on Pervasive and Ubiquitous
Computing and Proceedings of the 2017 ACM International
Symposium on Wearable Computers, , 397–402.
[17] Puangpontip, Supadchaya, and Rattikorn Hewett. 2019. “Assessing
Reliability of Big Data Stream for Smart City.” ACM International
Conference Proceeding Series: 18–23
[18] Osman, Ahmed M.Shahat. 2019. “A Novel Big Data Analytics
Framework for Smart Cities.” Future Generation Computer Systems
91: 620–33.
[19] Nasiri, Hamid, Saeed Nasehi, and Maziar Goudarzi. 2018. “A Survey
of Distributed Stream Processing Systems for Smart City Data
Analytics.” ACM International Conference Proceeding Series.
[20] Angelidou, Margarita. 2015. “Smart Cities: A Conjuncture of Four
Forces.” Cities 47: 95–106.
http://dx.doi.org/10.1016/j.cities.2015.05.004.
[21] Chin, Jeannette, Vic Callaghan, and Ivan Lam. 2017. “Understanding
and Personalising Smart City Services Using Machine Learning, the
Internet-of-Things and Big Data.” IEEE International Symposium on
Industrial Electronics: 2050–55.
[22] Morozov, Evgeny, and Francesca Bria. 2018. “Rethinking the Smart
City - Democratizing Urban Technology.” Rosa Luxemburg Stiftung:
56.
[23] Lv, Zhihan, and Amit Kumar Singh. 2020. "Big Data Analysis of Internet of Things System." ACM Transactions on Internet Technology 21(2).
[24] Okoli, Chitu, and Kira Schabram. 2012. “A Guide to Conducting a
Systematic Literature Review of Information Systems Research.”
SSRN Electronic Journal.
[25] E. D. Madyatmadja, H. Nindito, D. Pristinella, “Citizen Attitude:
Potential Impact of Social Media Based Government”, ICEEL 2019:
Proceedings of the 2019 3rd International Conference on Education
and E-Learning, pp. 128–134, 2019
https://doi.org/10.1145/3371647.3371653
[26] Madyatmadja, E.D., Gaol, F.L., Abdurachman, E., Pudjianto, B.W.
“Social media based government continuance from an expectation
confirmation on citizen experience”. International Journal of
Mechanical Engineering and Technology, 9(7), pp. 869–876, 2018.
[27] Ma, Ruiqu, and Patrick T.I. Lam. 2019. “Investigating the Barriers
Faced by Stakeholders in Open Data Development: A Study on Hong
Kong as a ‘Smart City.’” Cities 92(January): 36–46.
https://doi.org/10.1016/j.cities.2019.03.009.


Smart Tourism Services: A Systematic Literature Review

Evaristus Didik Madyatmadja, Information Systems Department, School of Information Systems, Bina Nusantara University, Jakarta, Indonesia 11480 ([email protected])
Debri Pristinella, Faculty of Psychology, Atma Jaya Catholic University of Indonesia, Jakarta, Indonesia ([email protected])
Nicklaus Rahardja, Information Systems Department, School of Information Systems, Bina Nusantara University, Jakarta, Indonesia 11480 ([email protected])
Raheliya Br Ginting, Informatics Engineering, Institut Teknologi dan Bisnis Indonesia, Medan, Indonesia ([email protected])

Abstract—Smart tourism is a new technology in tourism and a way of growing the tourism industry. This technology can help tourists and tourism agents: through the smart tourism concept, massive income can be transformed into value propositions in tourism objects and elsewhere. In this paper we can learn from successful smart tourism: the Province of Bali, Indonesia has succeeded in transforming the Sukawati art market in Ubud from a traditional market into smart tourism, helping tourists find the shop they want to buy from and helping shop owners promote the products in their shops on social media. Smart tourism can help tourists and shop owners use new smart tourism trends, and can then help the government to make tourism better. Smart tourism can be improved and can turn traditional tourism into modern tourism in this era.

Keywords—services, systematic literature review, smart tourism

I. INTRODUCTION

Smart tourism technology can improve the tourism industry, specifically for areas that are often crowded. With technology such as big data, IoT, blockchain, etc., and their existing capabilities, it can encourage foreign tourists to visit an area and can increase local community income, starting from tourism objects to traditional markets that want to switch to currently popular technology. In supporting the work program of the local government in advancing tourism areas, a smart tourism program will be able to help increase the income of the community around these tourist objects and at the same time help domestic and foreign tourists in traveling to a tourist attraction [1].

Indonesia has an example of successful smart tourism: from reference [2] we can conclude that the Sukawati art market is a famous art market in the Ubud area and a favorite for local and foreign tourists. It is a favorite tourist spot in Bali because it sells various handicrafts such as clothes, sarongs, paintings, wood sculptures, etc.; the market is about 40 m from the center of Bali, Denpasar.

This market is crowded every day and has become the economic pulse of Ubud and the surrounding areas; the market is always the place where visitors can find goods at a low price with the best quality, and they can also bargain with the sellers.

The main problem in this case is that, with technological advances and encouragement from the central and regional governments in making Bali a smart city pilot city, there is a desire to advance the businesses there, especially the Sukawati market, from a traditional art market to a modern art market that can raise the income of the traders who trade in that market and promote the Sukawati market in the international arena.

So, research and application development can encourage tourists to visit special tourist sites such as the Sukawati market, and at the same time can bring more income to the government and especially to the traders who trade in a tourist attraction.

II. BACKGROUND STUDY

A. Definition of Smart Tourism

Smart tourism is a new technology in tourism and a way of growing the tourism industry. This technology can help tourists and tourism agents: through the smart tourism concept, massive income can be transformed into value propositions in tourism objects and elsewhere [3]. In an era where technology has developed, smart tourism is highly dependent on the existing infrastructure. Four components build smart tourism, as follows:
• Smart experience
• Smart eco-tourism
• Smart business tourism
• Smart destination

ICT-enabled devices and applications can be widely accessed and entrusted with improving the world of tourism to increase the income of the country, specifically the income of the people in the area and of the people who trade in the tourism area. These are examples of successful smart tourism internationally [4]:
• Barcelona (http://smartcity.bcn.cat/en/bicing.html)
• The city of Brisbane (http://goo.gl/QidSOC)
• Amsterdam (http://amsterdamsmartcity.com/)


• Seoul (http://www.visitseoul.net)
• Jeju (https://youtu.be/d3C7vS-IbAY)

III. RESEARCH METHODOLOGY

A literature review is chosen as the methodology to identify and review the knowledge of the tourism community before beginning the research. The aim is not only to summarize the research but also to include elements of analytic criticism. The research results are reviewed and analyzed as the data for the literature review. There are five online research databases chosen for the articles: ACM Digital Library, Springer Link, Science Direct, Emerald, and Google Scholar. ACM and Springer were selected because they have a good procedure regarding smart tourism and blockchain. Science Direct and Springer are also good choices for finding material about smart tourism, with many research articles offering good results for understanding this topic.

IV. RESULT

A. Study Found

The first search, using the keyword (Smart tourism), gave an output of 112 papers. The second search used the keyword (tourism), and the output of this search was 16 papers.

B. Candidate Studies

According to the systematic review, 25 research journals were chosen based on the method of identification and review, in the form of a cumulative approach, which later relies more on pictures and explanations than on numbers in the sub-topics.

C. Selected Studies

The selected articles or papers meet the following criteria:
• The research is about smart tourism.
• The research was published between the years 2018-2021.
• The research is the latest, because research relating to smart tourism began around 2018.

The result of this research, the 25 papers that met the review criteria, can be seen in Table 1.

Fig. 1. A Searching Strategy for Systematic Literature Review (keyword search; title matches topic: 171 papers; abstract matches topic: 45 papers; full result matches topic: 25 papers)

TABLE I. LIST OF SOURCES

SOURCE | STUDIES FOUND | CANDIDATE STUDIES | SELECTED STUDIES
ACM Digital Library | 61,447 | 250 | 5
Science Direct | 6,634 | 150 | 15
Springer | 8,198 | 70 | 6
Emerald | 4,000 | 50 | 2
Google Scholar | 559,000 | 150 | 10
TOTAL | 639,279 | 670 | 38

D. Candidate Study

From the 38 selected papers, 56 authors are referenced. Each author wrote only one paper, and each institution likewise has only a single paper. The institutions are located in Norway, Russia, Oman, UAE, Singapore, UK, Thailand, USA, Canada, Argentina, Brazil, Hungary, South Korea, Ukraine, Egypt, Tunisia, and Germany.

Fig. 2. Research Country Bar

This research covers 54 departments, including Information System, Management, Computer Science, Financial, Hotel Management, Creative Designer, Tourism, and others.

Fig. 3. Faculty Chart
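The totals row of Table I can be cross-checked with a few lines of code. The per-source counts below are the ones the table reports (source name spellings normalized); the data structure is just an illustration, not part of the paper.

```python
# (studies found, candidate studies, selected studies) per source, as in Table I.
table1 = {
    "ACM Digital Library": (61447, 250, 5),
    "Science Direct": (6634, 150, 15),
    "Springer": (8198, 70, 6),
    "Emerald": (4000, 50, 2),
    "Google Scholar": (559000, 150, 10),
}

# Sum each column across the sources and compare with the TOTAL row.
totals = tuple(sum(col) for col in zip(*table1.values()))
print(totals)  # (639279, 670, 38), matching the reported TOTAL row
```

The computed column sums agree with the table's TOTAL row of 639,279 studies found, 670 candidates, and 38 selected.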


Fig. 4. Picture Chart

Fig. 5. University in the World

Fig. 6. Publication Year

E. Participation Dimension

The research dimension concerns social media; this theory is about social media and its abilities:

i. Tourists' use of social media in smart tourism

By reference [4], social media is a tool for sharing feelings during a visit somewhere; at a travel destination, sometimes a brand will share photos of visitors on social media. Social media can be used as a promotion for a tourist attraction in an area because, besides being cheaper, those who promote it are usually artists or YouTubers who have a very large number of followers or fans.

Based on existing references, social media use is divided into four (4) ways: tourists use social media as a means of making decisions in choosing and promoting a tourist attraction, which can help the local and central governments promote place tourism; at the same time, this must be accompanied by an increase in facilities that can increase the willingness of tourists to use applications in promoting a tourist attraction [5].

ii. The ability of smart tourism

[6] Smart tourism is able to advance a tourism business and tourism objects from a traditional concept to a modern market, specifically for tourist objects that are often visited by busy people and are already well-known, so that they become more famous and can provide more income to the government and the communities around these tourism objects, at the same time promoting Indonesian tourism objects in the world of international tourism, so that people who have never been to Indonesia will want to visit Indonesia [7].

People have different personalities, habits, and preferences in determining their vacation; usually they will determine their vacation from their hobby. For example, when photographers go on vacation, they will choose a place that suits them, or a young couple will choose a romantic place like the beach or a vacation to an island in the ocean. They usually share their photos on their social media, where they will be seen by their friends, who give comments, and so on [8].

F. Participation Conclusion

[9] Regarding this reference, a conclusion can be drawn that good or bad consumer commentary on changes to tourism objects must be weighed by how consumers share their experiences on their social media, or by the large number of viewers of their videos on social media; but the changes that occur in tourism objects must make smart tourism in those objects better and able to satisfy the desires of the people who work in the tourism industry. Therefore, it must be able to assist the people who are striving to build tourism, so that they can attract many more people to visit a tourism object [10].

From the conclusion of [11], which can be replicated in Bali, tourism objects have big, important factors that affect the acceptance of smart tourism by domestic and international tourists. Good smart tourism can be accepted by international and local tourists. After all, good smart tourism can be accepted and used well, because everyone differs in learning and using a facility or application, and usually different habits become an obstacle in using an application; whether tourism objects are good or bad must be seen through the opinion of consumers about smart tourism: it should provide an increase in community income, and it should be easy for people to use the application [12].

G. Tourism Factor

The reference states that whether an application or ICT is good or bad must be seen from how the user can learn and use it, because the design of an application influences the user to use it, and an application must be able to provide security for user data and information, because applications in the field of tourism usually store confidential and limited user data, such as banking data, etc. [13].

The systematic literature review resulted in the dimensions of government, tourism partner, and tourist. This systematic literature review resulted in 12 factors that affect


the citizens' participation in the development of the system (see Table 2).

V. CONCLUSION

This is the conclusion of the research: in making cooperation between tourism agents and financial people, each factor was mapped by sorting the tourist categories. In future research, many topics will require a structured research approach, and success in smart tourism needs the government to fully support the developers of smart tourism toward better use of the application. This research can help people working in the world of tourism to encourage the interest of tourists in visiting tourist objects in Indonesia, and it can increase foreign exchange and the income of the people who trade in tourist areas.

TABLE II. LIST OF FACTORS

No | Factor | Factor categorization | Reference No
1 | Tourism social media | Social | 2, 12, 14, 16, 17, 26
2 | VR reality | Technology | 1, 10, 38, 16, 8, 19, 20, 25
3 | AR | Technology | 1, 7, 17

REFERENCES

[4] O. O. Afolabi, A. Ozturen, and M. Ilkan, "Effects of privacy concern, risk, and information control in a smart tourism destination," Econ. Res. Istraz., 2020, doi: 10.1080/1331677X.2020.1867215.
[5] S. Amir, N. H. Dura, M. A. Yusof, H. Nakamura, and R. A. Nong, "Challenges of smart tourism in Malaysia eco-tourism destinations," Plan. Malaysia, vol. 18, no. 4, pp. 442–451, 2020, doi: 10.21837/pm.v18i14.844.
[6] A. Erceg, J. D. Sekuloska, and I. Kelic, "Blockchain in the tourism industry - A review of the situation in Croatia and Macedonia," Informatics, vol. 7, no. 1, p. 5, Feb. 13, 2020, doi: 10.3390/informatics7010005.
[7] J. M. Gomis-López and F. González-Reverté, "Smart tourism sustainability narratives in mature beach destinations. Contrasting the collective imaginary with reality," Sustain., vol. 12, no. 12, Jun. 2020, doi: 10.3390/su12125083
[8] U. Stankov and U. Gretzel, "Digital well-being in the tourism domain: mapping new roles and responsibilities," Inf. Technol. Tour., Mar. 2021, doi: 10.1007/s40558-021-00197-3
[9] I. Tussyadiah, "A review of research into automation in tourism: Launching the Annals of Tourism Research Curated Collection on Artificial Intelligence and Robotics in Tourism," Ann. Tour. Res., vol. 81, p. 102883, Mar. 2020, doi: 10.1016/j.annals.2020.102883.
[10] M. S. Khan, M. Woo, K. Nam, and P. K. Chathoth, "Smart city and smart tourism: A case of Dubai," Sustain., vol. 9, no. 12, Dec. 2017, doi: 10.3390/su9122279
[11] Meiliana, D. Irmanti, M. R. Hidayat, N. V. Amalina, and D. Suryani, "Mobile Smart Travelling Application for Indonesia Tourism," in Procedia Computer Science, Jan. 2017, vol. 116, pp. 556–563, doi: 10.1016/j.procs.2017.10.05
[12] Y. Jarrar, A. O. Awobamise, and P. S. Sellos, "Technological readiness index (TRI) and the intention to use smartphone apps for tourism: A focus on indubai mobile tourism app," Int. J. Data Netw. Sci., vol. 4, no. 3, pp. 297–304, Jul. 2020, doi: 10.5267/j.ijdns.2020.6.003.
[13] T. Um and N. Chung, "Does smart tourism technology matter? Lessons from three smart tourism cities in South Korea," Asia Pacific J. Tour.
(Argumented Res., 2019, doi: 10.1080/10941665.2019.1595691.
Realty) [14] M. K. Pradhan, J. Oh, and H. Lee, “Understanding travelers’ behavior
for sustainable smart tourism: A technology readiness perspective,”
4 IoT Technology 1,7,23,19 Sustain., vol. 10, no. 11, Nov. 2018, doi: 10.3390/su10114259
[15] L. Ardito, R. Cerchione, P. Del Vecchio, and E. Raguseo, “Big data in
smart tourism: challenges, issues and opportunities,” Current Issues in
5 Big data Technology 17,7, 26 Tourism, vol. 22, no. 15. Routledge, pp. 1805–1809, Sep. 14, 2019,
doi: 10.1080/13683500.2019.1612860
[16] J. L. C. Ortega and C. D. Malcolm, “Touristic stakeholders’
6 Blockchain Technology 2 perceptions about the smart tourism destination concept in Puerto
Vallarta, Jalisco, Mexico,” Sustain., vol. 12, no. 5, Mar. 2020, doi:
7 Attitude Cultural 8,34,3,12,23, 27, 10.3390/su12051741.
28 [17] García-Milon, E. Juaneda-Ayensa, C. Olarte-Pascual, and J. Pelegrín-
Borondo, “Towards the smart tourism destination: Key factors in
8 Trust Social 1,2,21,28,19,21, information source use on the tourist shopping journey,” Tour. Manag.
27, 28 Perspect., vol. 36, Oct. 2020, doi: 10.1016/j.tmp.2020.100730.
[18] U. Stankov and U. Gretzel, “Digital well-being in the tourism domain:
9 E-Bus Technology 1,7,6 mapping new roles and responsibilities,” Inf. Technol. Tour., Mar.
schedule 2021, doi: 10.1007/s40558-021-00197-3.
[19] T. Zhang, C. Cheung, and R. Law, “Functionality Evaluation for
10 Wi-Fi internet Technology 1,7,6,35,34,17,25 Destination Marketing Websites in Smart Tourism Cities,” J. China
Tour. Res., vol. 14, no. 3, pp. 263–278, Jul. 2018, doi:
10.1080/19388160.2018.1488641
11 Charger Technology 1,7,6,24
[20] B. H. Ye, H. Ye, and R. Law, “Systematic review of smart tourism
Station research,” Sustainability (Switzerland), vol. 12, no. 8. MDPI AG, Apr.
12 GPS Technology 1,7,6,25 01, 2020, doi: 10.3390/SU12083401
[21] . García-Milon, E. Juaneda-Ayensa, C. Olarte-Pascual, and J. Pelegrín-
Borondo, “Towards the smart tourism destination: Key factors in
information source use on the tourist shopping journey,” Tour. Manag.
REFERENCES Perspect., vol. 36, Oct. 2020, doi: 10.1016/j.tmp.2020.100730
[22] U. Gretzel, M. Sigala, Z. Xiang, and C. Koo, “Smart tourism:
foundations and developments,” Electron. Mark., vol. 25, no. 3, pp.
[1] U. Gretzel, M. Sigala, Z. Xiang, and C. Koo, “Smart tourism: 179–188, Sep. 2015, doi: 10.1007/s12525-015-0196-8.
foundations and developments,” Electron. Mark., vol. 25, no. 3, pp. [23] C. Lim, N. Mostafa, and J. Park, “Digital omotenashi: Toward a smart
179–188, Sep. 2015, doi: 10.1007/s12525-015-0196-8 tourism design systems,” Sustain., vol. 9, no. 12, Nov. 2017, doi:
[2] Sukawati Art Market - inbali.org.” 10.3390/su9122175.
https://www.inbali.org/place/sukawati-art-market/ (accessed Apr. 18, [24] I. Tussyadiah, “A review of research into automation in tourism:
2021). Launching the Annals of Tourism Research Curated Collection on
[3] J. M. Gomis-López and F. González-Reverté, “Smart tourism Artificial Intelligence and Robotics in Tourism,” Ann. Tour. Res., vol.
sustainability narratives in mature beach destinations. Contrasting the 81, p. 102883, Mar. 2020, doi: 10.1016/j.annals.2020.102883.
collective imaginary with reality,” Sustain., vol. 12, no. 12, Jun. 2020, [25] T. Zhang, C. Cheung, and R. Law, “Functionality Evaluation for
doi: 10.3390/su12125083 Destination Marketing Websites in Smart Tourism Cities,” J. China

332 28 October 2021, Jakarta - Indonesia


2021 1st International Conference on Computer Science and Artificial Intelligence (ICCSAI)

Tour. Res., vol. 14, no. 3, pp. 263–278, Jul. 2018, doi:
10.1080/19388160.2018.1488641.
[26] A. Jasrotia and A. Gangotia, "Smart cities to smart tourism destinations: A review paper," J. Tour. Intell. Smartness, vol. 1, no. 1, pp. 47–56, Sep. 2018. [Online]. Available: https://dergipark.org.tr/en/pub/jtis/446754.
[27] E. D. Madyatmadja, H. Nindito, and D. Pristinella, "Citizen Attitude: Potential Impact of Social Media Based Government," in ICEEL 2019: Proceedings of the 2019 3rd International Conference on Education and E-Learning, pp. 128–134, 2019, doi: 10.1145/3371647.3371653.
[28] E. D. Madyatmadja, F. L. Gaol, E. Abdurachman, and B. W. Pudjianto, "Social media based government continuance from an expectation confirmation on citizen experience," International Journal of Mechanical Engineering and Technology, vol. 9, no. 7, pp. 869–876, 2018.


Indonesia China Trade Relations, Social Media and Sentiment Analysis: Insight from Text Mining Technique
Eka Miranda
Information Systems Department, School of Information Systems, Bina Nusantara University, Jakarta, Indonesia 11480
[email protected]

Rangga Aditya Elias
Department of International Relations, Binus University
[email protected]

Tia Mariatul Kibtiah
Department of International Relations, Binus University
[email protected]

Aditya Permana
Department of International Relations, Binus University
[email protected]

Abstract—Sentiment Analysis (SA) is employed for detecting, extracting, and classifying people's opinions about an issue. Social media is a channel where people show their opinions and thoughts. This study aimed to detect and classify Indonesian public opinion from Twitter, written in the Indonesian language, on the topic of trade relations between Indonesia and China, using text mining techniques. The result was a model that detected and classified the sentiment of public opinion into negative, neutral, or positive sentiment. The sentiment was detected by lexicon-based and rule-based sentiment analysis; VADER was chosen as the tool for lexicon-based sentiment analysis. The text classification process was the training stage for the model. The experiment revealed that the SVM classifier performed with a higher accuracy value than Naïve Bayes, 67.28% and 64.68% respectively.

Keywords—social media, twitter, sentiment analysis, SVM, naïve bayes

I. INTRODUCTION

Globalization is a phenomenon that opens our eyes and shows that a country is actually part of a global society in which countries depend on each other [1]. Economic globalization reveals opportunities for direct investment and technology transfer from developed countries to developing countries, which bring on progressive economic development [2]. Therefore, the relationship between countries is an essential need for a country, as it is for Indonesia. Indonesia has already developed relationships with several countries in various aspects. One crucial aspect is the trade relationship. One of Indonesia's main trading partners is China. The partnership between Indonesia and China has deepened since the declaration of the BRI (Belt and Road Initiative) Silk Road. This program was launched to improve global connectivity for both developed and developing countries. BRI focuses on the establishment of networks that enable efficient and productive free trade as well as enhanced integration in international markets, both physical and digital [3]. Bilateral relations between Indonesia and China have rapidly progressed in line with the enhanced cooperation as a Comprehensive Strategic Partnership in 2013 [3]. China is the second-biggest investor country in Indonesia; Chinese investment in Indonesia reached US$ 4.7 billion in 2019 [4]. Export and import values between Indonesia and China also increase every year. However, the overall trade value experienced a deficit condition [5].

Therefore, some Indonesian people are concerned about the trade relationship between Indonesia and China. A national survey sponsored by the ISEAS-Yusof Ishak Institute in 2017 shows that 62.4% of the Indonesian people believe the trade relationship between Indonesia and China will not benefit Indonesia. On the other hand, the Indonesian government believes China is a major and important partner. Another fact shows that Indonesian people believe the massive number of Chinese products in the Indonesian market is a threat to the Indonesian local market [5].

Social media is a channel where people show their opinions and thoughts. The most frequently used social media channel is Twitter [6]. Opinion analysis from social media has become an alternative tool for mining public opinion on a specific topic.

Sentiment Analysis (SA) is employed for detecting, extracting, and classifying people's opinions about an issue. Sentiment Analysis is a type of Natural Language Processing (NLP) used to find out the public's opinion about any topic, such as laws, policies, products, or social issues. Opinion mining inspects people's opinions on an issue by investigating whether the opinion tendency, or sentiment, is positive, neutral, or negative.

A plethora of studies on sentiment analysis have been performed. Research by [6] demonstrated the Naïve Bayes algorithm for sentiment analysis; the accuracy value reached 80% with 1700 datasets. That study suggested further work exploring another algorithm, Support Vector Machine (SVM), using a larger number of datasets. Other research by [7] revealed a text mining technique for sentiment analysis of Indonesian tweets using a lexicon and Naïve Bayes approach; the result performed with an accuracy value of 84%.

This study aimed to detect and classify Indonesian public opinion from Twitter, written in the Indonesian language, on trade relations between Indonesia and China, using text mining techniques. Research questions were established to achieve the research objective, namely: 1. How can text mining techniques, SVM and Naïve Bayes, be implemented to detect and classify Indonesian public opinion from Twitter on trade relations between Indonesia and China? 2. How can the classification result be measured?

978-1-6654-4002-8/21/$31.00 ©2021 IEEE


II. LITERATURE STUDY

A. Sentiment analysis

Sentiment analysis relates to the Natural Language Processing and Text Mining fields, which aim to detect, extract, analyze, and classify people's opinions on a specific issue [8]. Moreover, opinion mining classifies whether the opinion tendency is positive, neutral, or negative [9].

One of the most frequently used techniques for sentiment (tendency) detection is the lexicon-based approach, which automatically categorizes the sentiment of a person's opinion. This technique estimates the preference of a document based on a semantic approach [17].

Several packages are provided in Python for sentiment analysis, namely TextBlob and VADER.

1. TextBlob: TextBlob is a Python library for processing textual data. It offers an API for natural language processing (NLP) tasks, for example part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, and translation [10]. The TextBlob sentiment analyzer produces two properties for an input sentence: 1. Polarity, a float in [-1, 1], where -1 specifies negative sentiment and +1 specifies positive sentiment; 2. Subjectivity, a float in [0, 1]; subjective sentences usually denote opinion, emotion, or judgment. TextBlob takes words and expressions into account to determine polarity and obtain a final score [10].

2. VADER (Valence Aware Dictionary and sEntiment Reasoner): VADER is a lexicon- and rule-based sentiment analysis tool for determining sentiments on social media [11]. It employs lexical features that include a positive, negative, or neutral label based on the semantic focus for calculating the sentiment preference [11]. This study used VADER as its sentiment analysis tool.

B. Text Classification

Text classification is an important part of NLP with many applications, such as sentiment analysis, information retrieval, document ranking, document indexing, and document classification [12]. Several studies on text classification have involved traditional machine learning algorithms such as Support Vector Machine, Naïve Bayes, K-Nearest Neighbors, and Logistic Regression, and produced high accuracy values [12]. The text classification methods used in this study are the Naïve Bayes classifier and the Support Vector Machine.

1. Naïve Bayes classification: The Naïve Bayes method is commonly known as a simple probabilistic classification technique wherein the probability value of a hypothesis is calculated by counting the frequency of an event and combinations of conditions in a dataset. A study by [13] compared the performance of Naïve Bayes, decision tree, and nearest neighbour classifiers on opinions; the results show that Naïve Bayes performance leads the other two techniques. The advantages of the Naïve Bayes algorithm are its simplicity and low computational complexity, but the feature-independence assumption of Naïve Bayes, which influences the accuracy value, needs to be taken into consideration. The Naïve Bayes formula is shown in equation 1 [13]:

P(c|x) = P(x|c) P(c) / P(x)    (1)

2. Support Vector Machine (SVM): The Support Vector Machine (SVM) is a classification algorithm often used for sentiment analysis. The purpose of the SVM method is to find the optimum hyperplane (separator function). The optimum hyperplane is the one that stands clearly in the middle of two class labels and has an equal distance to both classes. The basic idea of the SVM algorithm is to identify the hyperplane that provides the best separation between training data instances [14]. Figure 1 shows the optimum hyperplane separating two class labels: SVM looks for the optimum hyperplane by maximizing the margin, or distance, between two different sets of classes [14].

Fig. 1. Optimum hyperplane

Research by Severyn et al. (2016) explored SVM for opinion mining on YouTube. The research aims to detect the type


and polarity of opinion. The results reveal that SVM is a potential and suitable method for opinion classification [15].

C. Classification Measurement

Evaluating a machine learning algorithm is an essential part. Classification accuracy is mostly employed for measuring the accuracy of the classification model; it is the ratio of the number of correct predictions to the total number of input samples. The confusion matrix is commonly used for calculating the correctness and accuracy of the model [16]; the confusion matrix can be seen in Figure 2. The formulas for classification measurement based on the confusion matrix are shown in equations 2-5:

Accuracy = (TP + TN) / (TP + TN + FP + FN)    (2)
Precision = TP / (TP + FP)    (3)
Recall = TP / (TP + FN)    (4)
F-measure = 2 · Precision · Recall / (Precision + Recall)    (5)

Fig. 2. Confusion matrix

III. DATA AND METHODS

A. Dataset

The data crawling was carried out with the Twint library. The Twint library does not involve the Twitter API to pull data. The keyword used to search the data was "Indonesia-China trade". The Since and Until parameters were involved to determine the time period for the tweets withdrawn, and Filter_retweets was involved to avoid tweet redundancy. The tweets were written in the Indonesian language; the Lang statement was employed to determine the language used and to restrict it to ID (the code for the Indonesian language). The remove_duplicates() function was used to remove duplicate tweets by filtering the Twitter ID. The data collection achieved 1,033 data points. The dataset consists of tweets in the Indonesian language, but Python libraries mostly detect the English language; therefore, translation was involved to translate Indonesian into English. The GoSlate library and Google Translate services were employed for the translation process.

B. Text Pre-Processing

Text preprocessing prepares text for prediction and analysis with a machine learning algorithm. The types of text preprocessing techniques performed are shown in Figure 3.

Fig. 3. Text pre-processing steps

Converting text to lower case turns all text into lower case. Removing punctuation removes the set of symbols [!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~]. Tokenization splits the text into smaller pieces called tokens; a token can involve words, numbers, and punctuation marks. Part-of-speech tagging assigns a part of speech to each word (such as noun, verb, adjective, and others). This study used NLTK as the preprocessing tool. Lemmatization is employed to reduce inflectional forms to a common form without losing meaning; lemmatization has a pre-defined dictionary that stores the context of words and checks the word in the dictionary while reducing it. Stopwords are usually deleted from the text because they carry little or no meaning.

C. Sentiment Analysis

Sentiment analysis was performed built upon the lexicon-based method. The lexicon-based method works by first creating a dictionary of opinion words (a lexicon) and subsequently using the dictionary to detect the opinion of a sentence [17]. Three sentiments are detected from opinions, namely positive, neutral, and negative. The sentiment detection steps are shown in Figure 4. The VADER package available in Python was employed for detection.
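The preprocessing chain described above (lower-casing, punctuation removal, tokenization, stopword removal) can be sketched with the Python standard library alone. This is a simplified stand-in for the NLTK tooling the paper uses, and the tiny stopword list here is illustrative, not NLTK's actual list.

```python
import re
import string

# Illustrative stopword list; NLTK's English stopword list is much larger.
STOPWORDS = {"the", "a", "an", "is", "are", "and", "of", "to", "in"}

def preprocess(text: str) -> list[str]:
    """Lower-case, strip punctuation, tokenize, and drop stopwords."""
    text = text.lower()
    # Remove the punctuation set listed in the paper.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Whitespace tokenization (NLTK's word_tokenize is more sophisticated).
    tokens = re.split(r"\s+", text.strip())
    return [t for t in tokens if t and t not in STOPWORDS]

print(preprocess("The trade relation between Indonesia and China is improving!"))
# → ['trade', 'relation', 'between', 'indonesia', 'china', 'improving']
```

Lemmatization and part-of-speech tagging are omitted here because they genuinely need a dictionary-backed tool such as NLTK's WordNetLemmatizer.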


Fig. 4. Sentiment detection steps

IV. RESULT AND DISCUSSION

A. Text Pre-processing Result

The Regex library was involved in removing mentions, HTML entities, website URLs and symbols, hashtags, numbers, and special characters. Removing mentions was performed by deleting words containing @ in front of the text and replacing them with an empty string. Removing hashtags was completed by detecting words containing # in front of the text and replacing them with a space. Removing website URLs was achieved by identifying the URL and website symbols and replacing them with a space; this also removes other symbols such as hashtags. Removing numbers was executed by replacing strings containing numbers with an empty string. Figure 5 shows the tweet preprocessing result and Figure 6 shows the translation result.

Fig. 5. Tweet preprocessing result

Fig. 6. Text preprocessing translation result

B. Sentiment Analysis Result

VADER works by calculating the score of each sentence or tweet. The resulting scores cover positive, neutral, and negative sentiment. The scores are subsequently combined to produce a compound value that determines the class label. The positive class has a compound value > 0, the neutral class has a compound value = 0, while the negative class has a compound value < 0. Figure 7 shows class label detection using VADER.

Fig. 7. Class label detection using VADER

The class label detection involved 894 tweets; the results showed 405 tweets detected as the positive class, 150 tweets detected as the neutral class, and 339 tweets detected as the negative class.

C. Classification Measurement Result

Within machine learning, there are two basic approaches: supervised learning and unsupervised learning. The main difference is that one uses labeled data to help predict outcomes, while the other does not. Naïve Bayes and SVM are types of supervised learning. The hold-out method was used to determine the training and testing data (70% and 30% respectively). TF-IDF (Term Frequency — Inverse Document Frequency) was subsequently performed to quantify the words in the documents. TF-IDF was calculated by the TfidfVectorizer() method from Scikit-Learn, and the max_features statement was employed to determine the maximum number of top terms desired. The TF-IDF calculation script is shown in Figure 8.

Fig. 8. TF-IDF calculation script

All data that passed the data pre-processing process then continued to the text classification process, which is the training stage. The classification stage was performed with two algorithms, namely Support Vector Machine (SVM) and Naïve Bayes. Lastly, the classification measurement was calculated based on the confusion matrix for accuracy, recall, precision, and F-measure. The confusion matrices for SVM and Naïve Bayes are shown in Figure 9 and Figure 10 respectively.
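Reading the paper's equations 2-5 off a three-class confusion matrix can be sketched as below. The counts in the matrix are made-up illustrative values, not the actual numbers from Figure 9 or Figure 10.

```python
# Rows = actual class, columns = predicted class; order: negative, neutral, positive.
# These counts are illustrative, not the paper's Figure 9/10 values.
cm = [
    [62, 10,   8],  # actual negative
    [12,  4,   9],  # actual neutral
    [ 5,  6, 115],  # actual positive
]

n = len(cm)
total = sum(sum(row) for row in cm)

# Overall accuracy: correct predictions (the diagonal) over all samples.
accuracy = sum(cm[i][i] for i in range(n)) / total

def precision(c):
    """TP / (TP + FP): column c sums everything predicted as class c."""
    col = sum(cm[r][c] for r in range(n))
    return cm[c][c] / col if col else 0.0

def recall(c):
    """TP / (TP + FN): row c sums the actual members of class c."""
    row = sum(cm[c])
    return cm[c][c] / row if row else 0.0

def f_measure(c):
    """Harmonic mean of precision and recall (equation 5)."""
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if p + r else 0.0

print(f"accuracy = {accuracy:.4f}")
for c, name in enumerate(["negative", "neutral", "positive"]):
    print(f"{name}: precision={precision(c):.3f} recall={recall(c):.3f} F={f_measure(c):.3f}")
```

scikit-learn's `classification_report` produces the same per-class breakdown directly from the label vectors; the hand-rolled version above only makes the arithmetic behind Figures 9 and 10 explicit.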


Fig. 9. Confusion matrix: SVM

SVM, VADER, and NLTK with lemmatization successfully performed sentiment prediction. The results showed 62 data points (tweets) with the same prediction (negative sentiment) for the actual and predicted labels, 4 data points (tweets) with the same prediction (neutral sentiment) for the actual and predicted labels, and 115 data points (tweets) with the same prediction (positive sentiment) for the actual and predicted labels.

Fig. 10. Confusion matrix: Naïve Bayes

Naïve Bayes, VADER, and NLTK with lemmatization also successfully performed sentiment prediction. The results showed 61 data points (tweets) with the same prediction (negative sentiment) for the actual and predicted labels, 0 data points (tweets) with the same prediction (neutral sentiment) for the actual and predicted labels, and 113 data points (tweets) with the same prediction (positive sentiment) for the actual and predicted labels. This study revealed that SVM produced a higher accuracy value than Naïve Bayes, with the VADER package for sentiment analysis and a preprocessing stage coupling part-of-speech tagging and lemmatization.

V. CONCLUSION AND FUTURE WORK

In this research, the authors aimed to detect and classify Indonesian public opinion from Twitter, written in the Indonesian language, on the topic of trade relations between Indonesia and China. The result was a model that detected and classified the sentiment of public opinion into negative, neutral, or positive sentiment. The sentiment was detected by lexicon-based and rule-based sentiment analysis; VADER was chosen as the tool for lexicon-based sentiment analysis. The sentiment analysis was performed by calculating the polarity score for each data point (tweet): positive sentiment is marked by a polarity value > 0, negative sentiment by a polarity value < 0, and neutral sentiment by a polarity value = 0. The text classification process was the training stage for the model. The experiment revealed that the SVM classifier performed with a higher accuracy value than Naïve Bayes. With the help of text analysis based on machine learning, people's opinions on social media can be analyzed based on their content. Furthermore, for future work this model could be employed to detect people's opinions on other topics. Additionally, coupling other preprocessing, sentiment analysis, and classification techniques could be developed to achieve higher accuracy results.

ACKNOWLEDGMENT

Binus International Research Grant entitled "Indonesian Perception on China's and Taiwan's Digital Public Diplomacy", Contract Number: No: 017/VR.RTT/III/2021, dated 22 March 2021.

REFERENCES

[1] P. Samimi and H. S. Jenatabadi, "Globalization and Economic Growth: Empirical Evidence on the Role of Complementarities," PLoS One, vol. 9, issue 4, pp. 1-7, 2014.
[2] S. Marginean, "Economics Globalization: From Microeconomic Foundation to National Determinants," in Procedia, 22nd International Economic Conference – IECS 2015, "Economic Prospects in the Context of Growing Global and Regional Interdependencies," Sibiu: Elsevier, pp. 731-735, 2015.
[3] L. Zou, Series on China's Belt and Road Initiative, World Scientific Publishing, vol. 1, 2018, pp. 10-11.
[4] Indonesia Investment Coordinating Board, "Looking ahead, China-Japan investment is in tight competition," Indonesia Investment Coordinating Board, 2020. [Online]. Available: https://www.bkpm.go.id/id/publikasi/siaran-pers/readmore/2396401/51101 [Accessed 15 July 2021].
[5] F. D. Radityo, G. Rara, I. Amelia, and R. Efraim, "China Geopolitics In Southeast Asia: Trade Routes," Jurnal Asia Pacific Studies, vol. 3, no. 1, pp. 87-94, 2019.
[6] A. Rachmadany, Y. M. Pranoto, Gunawan, M. T. Multazam, A. B. D. Nandiyanto, A. G. Abdullah, and I. Widiaty, "Classification of Indonesian quote on Twitter using Naïve Bayes," in Proceedings, The 2nd Annual Applied Science and Engineering Conference, Bandung: IOPScience, pp. 1-5, 2017.
[7] M. Ahmad, M. F. Octaviansyah, A. Kardiana, and K. F. Prasetyo, "Sentiment Analysis System of Indonesian Tweets using Lexicon and Naïve Bayes Approach," in Proceedings, Fourth International Conference on Informatics and Computing, Semarang: IEEE Xplore, pp. 1-5, 2019.
[8] S. Saad and B. Saberi, "Sentiment Analysis or Opinion Mining: A Review," International Journal on Advanced Science Engineering and Information Technology, vol. 7, no. 5, pp. 1660-1666, 2017.


[9] X. Fang and J. Zhan, "Sentiment Analysis Using Product Review Data," Journal of Big Data, vol. 2, issue 5, pp. 1-14, 2015.
[10] S. Ahyja and G. Dubey, "Clustering and Sentiment Analysis on Twitter Data," in Proceedings, 2nd International Conference on Telecommunication and Networks, pp. 1-5, Pradesh: IEEE, 2017.
[11] C. Hutto and E. Gilbert, "VADER: A Parsimonious Rule-Based Model for Sentiment Analysis of Social Media Text," Association for the Advancement of Artificial Intelligence, 2014, pp. 1-10.
[12] A. Fesseha, S. Xiong, E. D. Emiru, M. Diallo, and A. Dahou, "Text Classification Based on Convolutional Neural Networks and Word Embedding for Low-Resource Languages: Tigrinya," Information, vol. 12, no. 52, pp. 1-17, 2021.
[13] H. Parveen and S. Pandey, "Sentiment analysis on Twitter Data-set using Naive Bayes algorithm," in Proceedings, the 2016 2nd International Conference on Applied and Theoretical Computing and Communication Technology, pp. 416–419, Karnataka: IEEE, 2016.
[14] G. Ignatow and R. Mihalcea, An Introduction to Text Mining: Research Design, Data Collection and Analysis. Thousand Oaks: SAGE, 2017, pp. 20-25.
[15] A. Tripathy, A. Agrawal, and S. K. Rath, "Classification of Sentiment Reviews Using N-Gram Machine Learning Approach," Expert Systems with Applications, vol. 57, pp. 117-126, 2016.
[16] J. D. Novaković, A. Veljović, S. S. Ilić, Z. Papić, and M. Tomović, "Evaluation of Classification Models in Machine Learning," Theory and Applications of Mathematics & Computer Science, vol. 7, no. 1, pp. 39–46, 2017.
[17] C. S. G. Khoo and S. B. Johnkhan, "Lexicon-based sentiment analysis: Comparative evaluation of six sentiment lexicons," Journal of Information Science, April, pp. 1-21, 2017.


Sinophobia in Indonesia and Its Impact on Indonesia-China Economic Cooperation with the SVM (Support Vector Machine) Approach

Tia Mariatul Kibtiah
International Relations Department, Bina Nusantara University, Jakarta, Indonesia 11480
[email protected]

Rangga Aditya
International Relations Department, Bina Nusantara University, Jakarta, Indonesia 11480
[email protected]

Eka Miranda
Information Systems Department, School of Information Systems, Bina Nusantara University, Jakarta, Indonesia 11480
ekamiranda@binus.ac.id

Aditya Permana
International Relations Department, Bina Nusantara University, Jakarta, Indonesia 11480
[email protected]

Abstract—This article describes Sinophobia in Indonesia and its impact on China and Indonesia's economic cooperation. Sinophobia is an extreme dislike or fear of strangers, customs, religion, etc.; it often overlaps with forms of prejudice including racism and homophobia. This condition in Indonesia occurred as a result of colonialism and the privileging of ethnic Chinese during the Suharto era, and it continued to increase sharply during the Jokowi administration. In addition to using qualitative methods, to analyze anti-Chinese sentiment in Indonesian society and its influence on Indonesia-China economic cooperation, this study uses the SVM (Support Vector Machine). The research steps start from crawling data from Twitter, cleaning the data, translating the data into English, and testing the classification results by calculating accuracy. SVM text classification with a training:testing data ratio of 80:20 resulted in an accuracy of 76.40%, precision of 74.44%, and F-measure of 74.48%.

Keywords: Indonesia, China, Sinophobia, SVM, Economic Cooperation

I. INTRODUCTION

China continues to make progress in its economy. Since the Deng Xiaoping government, China began to open up from isolation and receive various economic investments in several special areas. Since then, China has continued to grow and has become the country with the strongest economy after the United States. President Xi Jinping confirmed his vision of the Belt and Road Initiative (BRI) by providing various loans and investments for infrastructure development in several regions to integrate the supply chain. The supply chain is very important for China, as the largest industrial country, to distribute its products. The Belt and Road Initiative is a foreign policy that strengthens China's position in geopolitics. BRI policy is a form of China's economic interests that can be traced back to the Han Dynasty, when the land route called the Silk Road connecting China to Europe brought prosperity to the empire because of trade. To implement BRI, China established two financial institutions, namely the Asian Infrastructure and Investment Bank (AIIB) and the New Silk Road Fund (NSRF) (Iqbal, Rahman, & Sami, 2019) [1].

China's vision coincides with Indonesia's vision of becoming the World Maritime Axis during the reign of Joko Widodo. To realize this vision, the government needs assistance funds to build and renovate infrastructure such as 300 ports and supporting industries, as well as to enhance Indo-Pacific relations and cooperation. Indonesia hopes that through China the vision of the World Maritime Axis can be realized. The World Maritime Axis policy is in Indonesia's interest in taking advantage of Indonesia's strategic geographical position: two-thirds of the world's shipping passes through check points that enter Indonesian waters. In addition, Indonesia is also an archipelagic country that depends on sea transportation (Kosandi, 2016) [2].

However, this collaboration encountered problems related to the negative perception of the Indonesian public towards China during the Jokowi administration. This increase can be seen from the public's response through non-mainstream media to cases related to China, such as COVID-19, the arrival of Chinese migrant workers, and cases of violations of the waters in the Natuna Sea in 2017. In addition, there has been debate in the Indonesian Parliament due to the provision of debt and investment from China, which is pursuing its Belt and Road Initiative vision, which is feared to be a "debt trap" like what happened at Sri Lanka's Hambantota Port (Herlijanto, 2017) [3]. The migration of ethnic Chinese to Indonesia has often been colored by horizontal conflicts with the surrounding population, giving rise to anti-Chinese sentiment or Sinophobia (Charlotte, 2016) [4]. Ethnic Chinese in Indonesia are often victims of discrimination from the policies of the ruling regime.

978-1-6654-4002-8/21/$31.00 ©2021 IEEE

340 28 October 2021, Jakarta - Indonesia



At the end of President Soekarno's reign, the G30S/PKI rebellion turned ethnic Chinese back into a marginalized ethnic group. The relationship of several ethnic Chinese people with the Indonesian Communist Party and the Chinese Communist Party made them targets of the crushing of PKI members at that time (Kosandi, 2016). [5] Christian Chua explained that the Chinese ethnicity during the New Order era was used to divert issues. The Chinese ethnicity, which is less than 3% of the population, was rumored to control more than 70% of the economy, resulting in social jealousy in the community. This was done by Suharto to divert people from the real economic problems by blaming certain groups (Chua, 2004). [6] The civil rights of ethnic Chinese continued to be trimmed during the New Order with the assimilation program and the prohibition of Chinese New Year celebrations.

During Suharto's time, Indonesia severed diplomatic relations with China. China-Indonesia relations began to improve during the presidency of Abdurrahman Wahid, who, among other policies, allowed ethnic Chinese to celebrate Chinese New Year and visited China again. By the time of President Susilo Bambang Yudhoyono, who pursued a high-diplomacy policy, Indonesia's diplomatic relations had become much closer to China, and the two countries signed a strategic partnership in 2005 (Anwar, 2019). [7] Since 2010, China has been ASEAN's largest trading partner. Foreign Direct Investment (FDI) from China to Indonesia increased in the 2000s (Booth, 2011). [8] Based on these problems, this study aims to analyze whether the presence of xenophobic sentiments in Indonesian society will affect Indonesia's economic cooperation with China during the Joko Widodo administration.

II. LITERATURE STUDY

The literature study in this research explores how the Indonesian people perceive the Chinese ethnicity. For example, Evi Fitriani wrote about how various groups in Indonesia responded to the "rise of China". Three things are seen in the analysis of the rise of China: first, each subject has a different perspective based on its position; second, the term China's rise has different meanings, namely economic and military; third, China's influence on other countries, China's participation in international organizations, and also "bold diplomacy" in the South China Sea. Stakeholder perception in Indonesia towards the rise of China was positive due to the US embargo. Perceptions changed when China violated the Natuna Sea (Fitriani, 2018). [9] Other research on perception is discussed by Johannes Herlijanto. Perceptions of China became bad at the beginning of the New Order because of the PKI rebellion. The Pew Research Institute obtained data from 2005 to 2014 showing that about 55%-73% of Indonesians were positive about China, whereas in 2007 only 29% rated the United States positively, and 62% were negative towards China in 2009. The issue of immigrants and investment from China went viral due to hoaxes and rejection by several anti-China groups. However, the data in this study explain that there are still more people who rate the Indonesia-China economic relationship positively and optimistically (Herlijanto, 2017). [10] Merlyna Lim analyzes the relationship between social media and electoral politics in Indonesia further. Just like the method used by the author, Lim uses quantitative data in the form of traffic from the websites under study. The difference is that, in this study, the author directly uses SVM to measure the saturation of negative sentiments of the Indonesian people towards the Chinese ethnicity (Lim, 2017). [11] The negative sentiment continued into the time of President Jokowi. President Jokowi's political opponents used this issue for political purposes: they managed to win the election for Governor of DKI Jakarta by supporting Anis Baswedan against Ahok, who was of Chinese ethnicity (Setijadi, 8 June 2017). [12] The hatred of ethnic Chinese has increased with the COVID-19 pandemic, just as it has in the United States (Zhanga, 2020). [13]

III. DATA AND METHODS

This study uses a qualitative method, namely an approach that seeks to build a social reality based on in-depth and detailed case knowledge (Neuman, 2014). [14] This research process is usually carried out by asking questions to informants, collecting data obtained from participants, and analyzing the data from specific themes to general themes (Creswell & Poth, 2016). [15] Sources of data are obtained from journals, e-books, and websites related to the topic by means of literature studies. In addition, this study also uses SVM (Support Vector Machine) as a measure of the accuracy of Sinophobia in Indonesia and its impact on Indonesia-China economic cooperation. The process runs through crawling data from Twitter, cleaning the data, translating it into English, and testing the classification results by calculating accuracy.

IV. RESULT AND DISCUSSION

Racism is common in all parts of the world. According to Grosfoguel, racism is a continuation of the hierarchy of a sense of superiority over politics, culture, and economy that gives birth to injustice (Grosfoguel, 2016). [16] According to Baş, racism can be perpetrated by individuals, cultures, and institutions; an example Baş gives is the ban on Chinese migrants from studying in San Francisco public schools. The phenomenon of Sinophobia, then, is present as a part of racism which refers to China, both the country and the people (Baş, 2020). [17]

This happens as a result of social interaction in the community. Viewed from the theory put forward by Theys, namely constructivism, racism and Sinophobia are things that are socially constructed, not given. What constructs them is the agent. Agents interact with each other and construct the "structure" of the social situation that occurs, namely racism. As a result of a "structure" condition created by the agent, an "identity" is created that characterizes a community, in this case a country that communicates with other countries. Based on constructivism, there is interaction between the state and other countries due to interests, and these interests come from the identities that have been made by each country, such as Indonesia and China (Theys, 2017). [18] In addition to constructivism theory, which sees anti-Chinese sentiment in




Indonesian society as a result of agents, SVM also participates in analyzing this problem. The approach is to take data from the social media platform Twitter, because social media in Indonesia is currently a place for free public opinion (Aditiawarman & Raflis, 2019). [19]

SVM (Support Vector Machine)

The analysis of these opinions is a source of data to find out about public interest in a topic. Opinion Mining (OM) or Sentiment Analysis (SA) is the process of detecting, extracting, and classifying opinions about something. It is a type of Natural Language Processing (NLP) used to find out public opinion on various matters, including the public's perception of China-Indonesia economic cooperation, whether it is negative, positive, or neutral (Saad & Saberi, 2017). [20] In this study, opinion mining is analyzed using SVM (Support Vector Machine), one of the classification algorithms that is often used for sentiment analysis. The purpose of the SVM method is to find the most optimal hyperplane (separator function) (Suyanto, 2018). [21] The fundamental idea behind the SVM algorithm is to identify the hyperplane that provides the best separation between training data instances (Ignatow & Mihalcea, 2018). [22]

For the process, 1,033 tweets were taken from Twitter within a period of one year. [23] As the data source, this research took data from Twitter in the form of text in Indonesian. The data was translated into English for further analysis, because Indonesian is a low-resource language: the corpora (collections of written speech), dictionaries, tools, and scripts that support processing and analyzing text directly in Indonesian are still lacking. Tools that help with stemming, POS tagging, and normalizing non-standard words in Indonesian are still very few. In addition, corpora of Indonesian texts with data that is clean and ready for pre-processing, classification, and text analysis are still very rare (Adi, Wulandari, Mardiana, & Muzakki, 2018). [24]
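The pipeline described above (an 80:20 training:testing split, SVM classification, and confusion-matrix-based metrics) can be sketched with scikit-learn. This is a minimal illustration only: the toy English tweets and labels below are placeholders, not the study's data, and LinearSVC is one common SVM implementation assumed here.

```python
# Minimal sketch of the SVM sentiment pipeline: TF-IDF features,
# an 80:20 train/test split, and accuracy/precision/recall/F1.
# The tiny corpus is illustrative; the study used 1,033 translated tweets.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

texts = [
    "trade cooperation with china benefits indonesia",
    "chinese investment creates local jobs",
    "reject chinese workers taking our jobs",
    "china debt trap will ruin the economy",
] * 10  # repeated so both classes survive the stratified split
labels = ["positive", "positive", "negative", "negative"] * 10

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.20, random_state=42, stratify=labels)

vectorizer = TfidfVectorizer()
clf = LinearSVC()
clf.fit(vectorizer.fit_transform(X_train), y_train)
y_pred = clf.predict(vectorizer.transform(X_test))

print("accuracy :", accuracy_score(y_test, y_pred))
print("precision:", precision_score(y_test, y_pred, average="weighted"))
print("recall   :", recall_score(y_test, y_pred, average="weighted"))
print("f1       :", f1_score(y_test, y_pred, average="weighted"))
```

The weighted averages mirror how a single precision/recall/F1 figure can be reported over both sentiment classes, as in the results quoted above.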



No | Date | Username | Tweet
1 | 2020-10-25 14:52:23 UTC | 0pierdwu | Rezim pemerintahan Om jokowi berhasil memajukan Indonesia etnis Cina dg strategi 1. Mengecilkan peran Pribumi Muslim dlm bidang ekonomi perdagangan 2. Membuka seluas-luasnya investasi China dg TKA China. 3. BUMN dan lembaga2 Hukum & Polri menjadi company Milik Oma megawatiPDIP 🙄

2 | 2020-10-12 06:23:43 UTC | henkmizell | @Beritasatu *BAJINGAN KAMPUNG BARU PEGANG JABATAN DI NEGARA BEGITULAH! HEY INTINYA PERDAGANGAN BILATERAL ANTARA DUA NEGARA ITU SOAL DEFISIT ATAU SURPLUSNYA, NYED! JIKA KAU PEMIMPIN ASLI PASTI MENYATAKAN: PERDAGANGAN INDONESIA DEFISIT 2,1 MILYAR DOLLAR DENGAN CHINA PERIODE JAN AUGUST 2020NYED

3 | 2020-02-04 15:17:27 UTC | faizalassegaf | Pernyataan Dubes China Xiao Qian sangat arogan & jelas melecehkan pemerintahan JKW & selurah rakyat Indonesia. Itulah bukti watak kejahatan Rezim Komunis Cina. Tolak #coronaviruschina. China Peringatkan Kerugian Jika RI Setop Perdagangan karena Corona https://t.co/VpW5KYk4Ui

4 | 2021-01-19 08:15:34 UTC | muhamma35215575 | donald trump sangat tolol bodoh , melebihi presiden indonesia yang bernama jokowi beraliran ideologi cina komunis pkc atheis ,.camkan itu semuanya bangsa negara manapun ,tidak paham aturan peraturan ketentuan mekanisme tata negara ,ekonomi ,keuangan ,bisnis perdagangan ,politik . https://t.co/50iWTfmGwH

5 | 2020-11-21 15:04:12 UTC | enjoyjoki | Di Indonesia cina bukan hanya menguasai perdagangan tp juga menguasai pemerintahan

6 | 2020-11-09 02:53:19 UTC | aisyah_gozali | Lebih suka profile ibunya, karena biasanya presiden US suka bikin kebijakan yang terlalu pro Israel, jadi ya biasa ajalah. Daripada ntar kesel. Lagian Indonesia udah terlalu akrab dengan Cina, dan Biden bakal meneruskan kebijakan menekan perdagangan bebas dengan Cina.

7 | 2020-10-07 14:21:08 UTC | 0pierdwu | Grand plan LippoGrup menguasai Indonesia. Era eyang Soeharto setelah Tragedi G30S PKI menekan Cina dr segi ekonomi dan budaya. Cina2 PKI banyak lari ke Shanghai kena PP 10 1959. Ternyata Muchtar Riady LippoGrup gagal menyusup mentri ekonomi perdagangan kabinet Soeharto 🙄

8 | 2020-09-19 23:16:20 UTC | sumoburloff | @geloraco Bukankah indonesia sudah mrpk bagian dari cina? Seluruh kebijakan,perdagangan, berbau cina semua.

9 | 2020-09-15 17:09:22 UTC | manahara20 | Indonesia jgn tolerir lg neg Komunis Cina itu. Mulai skrg siapkn perdagangan dg neg2 selain Cina. Disaat RI putus hub Diplomatik nantinya dg Cina maka smua sektor perdagangn msh normal. Bergabunglah Asean dg AS hancurkan Neg Cina Komunis itu, otaknya KOTOR https://t.co/MEzXVykTrP

10 | 2020-09-12 13:03:57 UTC | 0pierdwu | Strategi global LippoGate JamesRiady agenCMI pengacak Negeri ini .. Pribumi fokus pertengkaran politik dan ideologi, agar bidang ekonomi perdagangan keuangan Indonesia dikuasai 12 jt etnis Cina. Pribumi Bodok @SumaUI @BEMUndip_ @BEMUNS @BEMFEBUNAIR @bemkm_ugm @UB_Official

Fig. 1. Sample of Tweet
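The "cleaning data" step applied to raw tweets like those in Fig. 1 can be sketched as follows. The specific rules (dropping URLs, mentions, hashtags, emoji, and HTML escapes) are assumed here as a typical recipe; the paper does not enumerate its exact cleaning rules.

```python
# Illustrative cleaning step for raw crawled tweets.
# The concrete regex rules are assumptions, not the paper's exact recipe.
import re

def clean_tweet(text: str) -> str:
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs
    text = re.sub(r"[@#]\w+", " ", text)       # drop mentions and hashtags
    text = text.replace("&amp;", "&")          # undo HTML escaping
    text = re.sub(r"[^\w\s&]", " ", text)      # strip emoji and punctuation
    return re.sub(r"\s+", " ", text).strip().lower()

raw = "@Beritasatu PERDAGANGAN INDONESIA DEFISIT https://t.co/VpW5KYk4Ui"
print(clean_tweet(raw))  # -> "perdagangan indonesia defisit"
```

After cleaning, the Indonesian text would be machine-translated into English before classification, as described in Section III.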


Calculation Accuracy

The result of this study is that SVM text classification with a training:testing data ratio of 80:20 results in an accuracy of 76.40%, precision of 74.55%, recall of 76.40%, and F-measure of 74.48%.

This study uses a confusion matrix-based accuracy calculation method, which is a method that is generally used




to calculate the level of accuracy in data mining. The confusion matrix contains information about the classifications that are correctly predicted by a classification system (Hamillton, 2017). [25]

The confusion matrix consists of 4 metrics, namely:

a) True Positive (TP): the number of positive predictions where the result is correct (positive)

b) True Negative (TN): the number of negative predictions where the result is correct (negative)

c) False Positive (FP): the number of positive predictions where the result is wrong (not positive)

d) False Negative (FN): the number of negative predictions where the result is wrong (not negative)

V. CONCLUSION AND FUTURE WORK

This article describes the effect of Sinophobia in Indonesia on China-Indonesia economic cooperation. This research uses a qualitative method with a constructivism theory approach to look at the social phenomena of Indonesian society towards anti-Chinese issues that affect China-Indonesia economic cooperation. In addition to constructivism, SVM (Support Vector Machine) also analyzes this issue so that negative, positive, or neutral accuracy is produced. From the chart, it is known that the negative result is quite large, although it remains below the positive. This means that the anti-Chinese issue in Indonesia has considerably affected China-Indonesia economic cooperation.

ACKNOWLEDGMENT

I appreciate the BINUS International Research Grant entitled Indonesian Perception on China's and Taiwan's Digital Public Diplomacy, contract number: 017/VR.RTT/III/2021, date: March 22, 2021.

REFERENCES

[1] Iqbal, B. A., Rahman, N. M., & Sami, S. (2019). Impact of Belt and Road Initiative on Asian Economies. Global Journal of Emerging Market Economies, 260-277.
[2] Kosandi, M. (2016). China's Maritime Silk Road and Indonesia's Maritime Nexus Policies: Towards Policy Convergence? International Conference on Social Politics.
[3] Herlijanto, J. (2017). Public Perceptions of China in Indonesia: The Indonesia National Survey. Yusof Ishak Institute.
[4] Charlotte, S. (2016). 'A Beautiful Bridge': Chinese Indonesian Associations, Social Capital and Strategic Identification in a New Era of China–Indonesia Relations. Journal of Contemporary China, 822-835.
[5] Kosandi, M. (2016). China's Maritime Silk Road and Indonesia's Maritime Nexus Policies: Towards Policy Convergence? International Conference on Social Politics.
[6] Chua, C. (2004). Defining Indonesian Chineseness under the New Order. Journal of Contemporary Asia, 465-479.
[7] Anwar, D. F. (2019). Indonesia-China Relations: To Be Handled With Care. Yusof Ishak Institute.
[8] Booth, A. (2011). China's economic relations with Indonesia: Threats and opportunities. Journal of Current Southeast Asian Affairs, 141-160.
[9] Fitriani, E. (2018). Indonesian perceptions of the rise of China: dare you, dare you not. The Pacific Review, 391-405.
[10] Herlijanto, J. (2017). Public Perceptions of China in Indonesia: The Indonesia National Survey. Yusof Ishak Institute.
[11] Lim, M. (2017). Freedom to hate: social media, algorithmic enclaves, and the rise of tribal nationalism in Indonesia. Critical Asian Studies.
[12] Setijadi, C. (8 June 2017). Ahok's Downfall and the Rise of Islamist Populism in Indonesia. Perspective, Singapore: ISEAS – Yusof Ishak Institute. Retrieved from https://www.iseas.edu.sg/images/pdf/ISEAS_Perspective_2017_38.pdf
[13] Zhanga, B. B. (2020). The Intersection of Racism and Xenophobia on the Rise Amid COVID-19 Pandemic: A Qualitative Study Investigating Experiences of Asian Chinese International Students in America. Revista Argentina de Clínica Psicológica, 1145-1156.




[14] Neuman, W. L. (2014). Social Research Methods: Qualitative and Quantitative Approaches (7th ed.). Pearson Education Limited. https://doi.org/10.2307/3211488
[15] Creswell, J. W., & Poth, C. N. (2016). Qualitative Inquiry and Research Design: Choosing Among Five Approaches (4th ed.). SAGE Publications Ltd.
[16] Grosfoguel, R. (2016). What is racism? Journal of World-Systems Research, 9-15.
[17] Baş, D. (2020). Rising Sinophobia in Kyrgyzstan: The Role of Political Corruption. MS thesis, Middle East Technical University.
[18] Theys, S. (2017). Constructivism. In S. Mcglinchey, R. Walters, & C. Scheinpflug (Eds.), International Relations Theory, 36-41. Bristol: E-International Relations.
[19] Aditiawarman, M., & Raflis. (2019). Hoax dan Hate Speech di Dunia Maya. Padang: Lembaga Kajian Aset Budaya Indonesia Tonggak Tuo.
[20] Saad, S., & Saberi, B. (2017). Sentiment Analysis or Opinion Mining: A Review. International Journal on Advanced Science Engineering and Information Technology. Retrieved from https://www.researchgate.net/publication/320762707_Sentiment_Analysis_or_Opinion_Mining_A_Review
[21] Suyanto. (2018). Machine Learning Tingkat Dasar dan Lanjut. Bandung: Informatika Bandung.
[22] Ignatow, G., & Mihalcea, R. (2018). An Introduction to Text Mining: Research Design, Data Collection and Analysis. Thousand Oaks: SAGE.
[23] Results of data processing from social media Twitter in 2021.
[24] Adi, S., Wulandari, M., Mardiana, A. K., & Muzakki, A. (2018). Survei: Topik dan Tren Analisis Sentimen Pada Media. Seminar Nasional Teknologi Informasi dan Multimedia 2018, 55-60.
[25] Hamillton, H. (2017). Confusion Matrix. Retrieved March 24, 2018, from http://www2.cs.uregina.ca/~hamilton/courses/831/notes/confusion_matrix/confusion_matrix.html




Towards Classification of Personality Prediction Model: A Combination of BERT Word Embedding and MLSMOTE

Henry Lucky, Roslynlia, Derwin Suhartono
Computer Science Department, School of Computer Science, Bina Nusantara University, Jakarta, Indonesia
[email protected], [email protected], [email protected]

Abstract— The rise in internet usage has improved digital communication and increased user data, particularly on social media. The information supplied by social media, including Twitter, can be used to retrieve user personality. In this paper, we experiment with predicting a user's personality based on the Big Five Personality Traits on Twitter, particularly for Indonesian users. We focus on using the XGBoost classifier as it gave promising results in a previous study. We experiment with multiple Bidirectional Encoder Representations from Transformers (BERT) models for extracting contextual word embeddings from tweet data to find the best model. We also address the imbalanced dataset problem with the Multilabel Synthetic Minority Oversampling Technique (MLSMOTE). Our research found that the IndoBERT model, which is pre-trained with general data including Indonesian Twitter tweets, has the best overall performance on our dataset. We also found that using MLSMOTE could increase accuracy by up to 19.91% and F1 by up to 19.38%, a huge increment that shows MLSMOTE works well with our dataset.

Keywords— BERT, Five-factor model, Indonesian personality prediction, MLSMOTE, Twitter dataset

I. INTRODUCTION

The growth of internet usage urges better digital communication and increases the amount of user data, especially on social media. In 2017 there were 2.46 billion users, equivalent to 33.6% of the world's population, using the internet [1]. There was a decent increase in internet usage of 10.12%, reaching 171.7 million users in Indonesia, based on a survey held by Asosiasi Penyelenggara Jasa Internet Indonesia in 2019 [2]. Moreover, Twitter's country industry head claimed Indonesia had one of the most influential growth rates of daily Twitter usage, surpassing daily global usage [3]. Based on the data provided, it is obvious that social media, including Twitter, can accommodate digital data analysis. For instance, it provides user activities such as microblog posts, searches, preferences from who they followed or liked, and also a circle of friends.

User activities nowadays can be explored to retrieve user personality and interests. The importance of personality assessment is to study an individual's behavior, thinking, and feeling, to be used in the hiring process, picking the right career, recommendation systems, etc. [4], [5]. There are various personality trait assessment techniques, such as data gained from observation, interviews, and questionnaires. Data from questionnaires are processed by various types of personality models, such as the Big Five Inventory, NEO Personality Inventory (NEO-PI), and Myers-Briggs Type Indicator (MBTI) [2], [6]–[8]. The NEO-PI model is recognized as the most comprehensive model with 240 questions, but it is uncommon. Instead, the Big Five Inventory and MBTI are widely used in personality assessment [2], [9]. The MBTI model contains four dimensions: Thinking-Feeling (T-F), Judging-Perceiving (J-P), Extraversion-Introversion (E-I), and Sensing-Intuitive (S-N) [9]. In contrast, the Big Five Inventory is grouped into five dimensions with 44 questions. The dimensions are Openness (OPN), Conscientiousness (CON), Extraversion (EXT), Agreeableness (AGR), and Neuroticism (NEU), hence also known as the OCEAN dimensions [2], [9]. However, each MBTI dimension is correlated with four of the Big Five dimensions, excluding Neuroticism [9].

Previously, personality trait assessment entailed a psychologist and thus required their energy and time. Therefore, researchers proposed assessing individual personality automatically by extracting the user's social media account, such as Facebook and Twitter. In automatic personality assessment, machine learning and deep learning yield promising results, especially in English. Some researchers use Linguistic Inquiry Word Count (LIWC), Medical Research Council (MRC), and Structured Programming for Linguistic Cue Extraction (SPLICE). However, those methods may not cover the dynamic data of sentences and are limited to a certain language [1], [9]. Other approaches come up with n-grams and TF-IDF as the most frequent feature extraction methods [9]–[11]. Recently, the presence of pre-trained models such as Bidirectional Encoder Representations from Transformers (BERT) [12] and Embeddings from Language Models (ELMo) [13] has made researchers change their approach by implementing transformer models to increase model performance, including in the personality trait area. The BERT model also supports multiple languages and learns a deep bidirectional representation of data. This pre-trained model is used for personality assessment and gives promising performance even in the fuzziness of language structure [14]. For Indonesian, there are also specific BERT models called IndoBERT [15], [16].

This study explores the newest approach for personality assessment using some BERT models, including Multilingual BERT (BERT-M) and two IndoBERT models, for extracting Indonesian social media data. Afterwards, we combined them with a machine learning approach for classifying personality classes using XGBoost, which is commonly used in this area [9],





[17], [18]. The rest of this paper is organized as follows: Section 2 discusses previous work on automatic personality trait assessment models. Section 3 states our proposed solution for escalating model accuracy and F1 score by using a pre-trained BERT model as the feature extractor and XGBoost as the classifier. In Section 4, we elaborate on our experimental results from different scenarios. Finally, we conclude our study in Section 5.

II. RELATED WORKS

Automatic personality assessment relies on the user's digital footprint on their social media, including text, pictures, and user profile. The data are afterwards processed by machine learning and deep learning algorithms. In recent years, researchers in this area have commonly been limited to the English language and the use of the Big Five personality traits [10], [11], [14], [18]–[21]. At first, researchers implemented machine learning algorithms, as shown in [9]–[11], by utilizing TF-IDF as a weighting scheme for the extracted features. However, the algorithms used for classifying or assessing personality traits differ. In [10], the authors also use the ULMFiT language model and linear support vector regression (Linear SVR) as a regressor. Another approach proposed utilizing GloVe word embeddings and balancing classes using the Synthetic Minority Oversampling Technique (SMOTE), then comparing XGBoost and Support Vector Machine (SVM) classifiers, with XGBoost proving more prominent than SVM in terms of F1 score and accuracy [9]. Lastly, the authors of [11] apply the dimensionality reduction technique of Linear Discriminant Analysis to improve model performance by learning latent features.

A new era of automatic personality assessment began when researchers used deep learning algorithms with neural network-based models. A simple deep learning model was proposed by [18] with the elaboration of the ULMFiT language model. Thereafter, convolutional neural networks were explored as classification techniques [19], [21]. In [19], they extract word-level features as vectors using the word2vec algorithm; they then use a convolutional neural network (CNN) as the classification network. In comparison, the authors of [21] use the pre-trained word vector GloVe for word vector representation and 2CLSTM to classify user personality traits. This classification architecture consists of two parts: bi-directional long short-term memory (BiLSTM) and CNNLSG, which uses a CNN to learn latent sentence group (LSG) features. Based on the study of deep learning in personality assessment, it can learn more deeply about the features used and may result in better performance compared with machine learning approaches [18], [21].

As the attention-based mechanism and transformer model came up, they gained researchers' attention. Reference [20] implements the attention mechanism on the LSTM model, with the LDA model converting users' topics to be added as attention to the LSTM model, and word2vec's skip-gram method processing word vectors. On the other hand, research by [15] uses an attention-based neural network at sentence level and utilizes the BERT model for sentence encoding, which supports multilingual language data for 105 languages, including English. The proposed solution could decrease model MSE by 30% for each trait. Based on these results, the attention mechanism and transformer model have become state-of-the-art.

Fig. 1. Proposed Methodology

III. METHODOLOGY

The workflow of the study is divided into four parts, mainly data preparation, feature extraction, oversampling, and training of personality classifiers. The overall workflow of this study is provided in Fig. 1.

A. Dataset

The dataset used was taken from [2], which consists of Indonesian Twitter tweets and user profile data from 508 users. Up to a maximum of 100 tweets were collected from each user in the dataset gathering process. However, we only use the last 25 tweets from each user due to the limitation of the pre-trained models used in this work; also, experts expressed that at least 20 tweets are needed to assess a user's personality [22]. The dataset has five classes that represent the Big Five personality traits, where each class consists of two labels: 'High' and 'Low'. Three psychology experts labeled the dataset with a voting system: for each trait of one user, the label that gets the most votes from the experts becomes the label in the dataset. The label distribution can be seen in Table I.

B. Data Pre-processing

Our approach to text pre-processing follows the pre-processing methods from the original dataset [22]. To remove noise while processing the data, several text cleansing methods were performed, which are:
1) Stopwords removal.
2) Mention and hashtag replacement to [UNAME] and [HASHTAG] accordingly.
3) URL removal.
4) Retweets omitting.
5) Emoji removal.

Some of these text pre-processing steps are included by default in the Indonesian Twitter dataset used in this study, such as the mention and hashtag replacement. The tweets will then be concatenated for each user with the special token [SEP] to make a user-level personality prediction model. We did not use Twitter metadata because the previous study mentioned that metadata did not result in a significant difference to the result [2].

C. Feature Extraction

This study focuses on using pre-trained BERT models as feature extractors to extract contextual word embeddings from textual data, due to the promising results of pre-trained models in various NLP tasks. We use several models to see the comparison of results and capabilities between them. The main reason is to investigate the suitability and performance of these various models as feature extractors for personality modelling.

We used the base model of two IndoBERT models [15], [16] and the BERT-Multilingual (BERT-M) model [12] as our




feature extractor. The first IndoBERT [16] is pre-trained with a general dataset, including Wikipedia, online articles, subtitles, Twitter tweets, etc., while the second [15] is pre-trained mainly with Wikipedia and news datasets. BERT-M is pre-trained with sources in multiple languages, including Indonesian, so it is still relevant to use with Indonesian data. The multilingual model could also capture text with multiple languages better than unilingual models. In addition, we also experiment whether to use the models' pooled output or the concatenation of the last four layers as our features, as suggested in the original BERT paper.

TABLE I. LABEL DISTRIBUTION OF THE DATASET

| Trait | High        | Low         |
|-------|-------------|-------------|
| OPN   | 272 (53,5%) | 236 (45,5%) |
| CON   | 131 (25,8%) | 377 (74,2%) |
| EXT   | 363 (71,5%) | 145 (28,5%) |
| AGR   | 278 (54,7%) | 230 (45,3%) |
| NEU   | 221 (43,5%) | 287 (56,5%) |

TABLE II. EXPERIMENTAL SCENARIOS WITH XGBOOST CLASSIFIER

| No. | MLSMOTE | Feature Extractor (FE) | FE Output Type |
|-----|---------|------------------------|----------------|
| 1   | No      | IndoBERT               | Pooled         |
| 2   | No      | IndoBERT               | Last 4 Layers  |
| 3   | Yes     | IndoBERT               | Pooled         |
| 4   | Yes     | IndoBERT               | Last 4 Layers  |
| 5   | No      | IndoBERT 2             | Pooled         |
| 6   | No      | IndoBERT 2             | Last 4 Layers  |
| 7   | Yes     | IndoBERT 2             | Pooled         |
| 8   | Yes     | IndoBERT 2             | Last 4 Layers  |
| 9   | No      | BERT-M                 | Pooled         |
| 10  | No      | BERT-M                 | Last 4 Layers  |
| 11  | Yes     | BERT-M                 | Pooled         |
| 12  | Yes     | BERT-M                 | Last 4 Layers  |
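The two feature variants described in Section C differ only in which hidden states are kept: the pooled-style output versus the concatenation of the last four layers. The sketch below illustrates the shapes involved. It is a stand-in under our own assumptions: the hidden states are simulated with random arrays instead of being produced by an actual pre-trained model, and the pooled output is approximated by the last-layer [CLS] vector (the real BERT pooled output additionally applies a dense + tanh head).

```python
import numpy as np

# Stand-in for a 12-layer BERT encoder's hidden states: 13 arrays
# (embedding layer + 12 transformer layers), each (seq_len, hidden_size).
# A real run would request these from a pre-trained model; here they are
# random so the sketch stays offline.
rng = np.random.default_rng(0)
seq_len, hidden_size = 16, 768
hidden_states = [rng.standard_normal((seq_len, hidden_size)) for _ in range(13)]

def pooled_feature(states):
    """Pooled-style feature: the first-token ([CLS]) vector of the last layer."""
    return states[-1][0]  # shape (hidden_size,)

def last4_concat_feature(states):
    """Concatenation of the [CLS] vectors of the last four layers, giving a
    4 * hidden_size feature, as suggested in the original BERT paper."""
    return np.concatenate([layer[0] for layer in states[-4:]])

print(pooled_feature(hidden_states).shape)        # (768,)
print(last4_concat_feature(hidden_states).shape)  # (3072,)
```

Either vector is then used as the per-user input feature for the downstream classifier; the last-four-layers variant simply carries four times as many dimensions.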

D. Oversampling with MLSMOTE

Because of the heavy imbalance of some traits in the dataset, as shown in Table I, and the amount of data, which is perceived as lacking, this study uses oversampling on the dataset. The Synthetic Minority Over-sampling Technique (SMOTE) was used in [23]. However, our dataset can be seen as a multilabel dataset, as each sample could have more than one label. The SMOTE algorithm only does oversampling over a binary class, which means it does oversampling once for each of the five classes of our dataset, thus producing a different amount of synthetic data for each class. So, it is not the best technique to implement for our dataset. Instead, to keep the amount of produced synthetic data the same over all classes, this study uses the Multilabel Synthetic Minority Over-sampling Technique (MLSMOTE), the multilabel version of SMOTE, which is a good match for our multilabel dataset. Because MLSMOTE works with numeric features, we input the extracted features from the previous step to oversample.

E. XGBoost Classifier

This study utilized gradient boosted trees, called XGBoost, due to the promising results delivered in the previous study using the same algorithm. Five binary personality classifiers were built for this use case, one for each Five-Factor Model personality trait. None of the hyperparameters is tuned, meaning that we use the default value for each hyperparameter.

In order to separate the training data from the test data, we use 10-fold cross validation because of the limited amount of data, as the train-test splitting approach will give high bias if we have limited data. Cross validation is used in a stratified manner to keep the label distribution the same for each split. For evaluation metrics, we use accuracy to see the general performance and macro F1 for deeper insight, due to the imbalanced classes in the dataset. Table II shows the breakdown of experimental scenarios to be performed on the XGBoost classifier.

IV. RESULT AND DISCUSSION

TABLE III. RESULTS ON ACCURACY (10-FOLD CROSS VALIDATION)

| No. | O     | C     | E     | A     | N     | Average Accuracy |
|-----|-------|-------|-------|-------|-------|------------------|
| 1   | 46,67 | 68,66 | 70,27 | 52,16 | 64,71 | 60,49            |
| 2   | 47,67 | 70,45 | 72,06 | 53,37 | 65,10 | 61,73            |
| 3   | 61,98 | 70,45 | 72,25 | 66,75 | 74,67 | 69,22            |
| 4   | 63,79 | 75,33 | 76,49 | 66,89 | 73,50 | 71,20            |
| 5   | 48,25 | 71,04 | 68,88 | 53,17 | 59,81 | 60,23            |
| 6   | 45,29 | 70,62 | 70,67 | 48,84 | 59,42 | 58,97            |
| 7   | 63,53 | 71,21 | 75,32 | 65,33 | 72,10 | 69,50            |
| 8   | 65,20 | 71,33 | 75,85 | 65,07 | 72,61 | 70,01            |
| 9   | 52,79 | 70,64 | 67,71 | 51,59 | 58,42 | 60,23            |
| 10  | 53,17 | 68,64 | 67,53 | 50,02 | 60,19 | 59,91            |
| 11  | 65,07 | 76,87 | 76,10 | 63,66 | 75,44 | 71,43            |
| 12  | 65,07 | 72,11 | 77,00 | 65,21 | 73,00 | 70,48            |

All classification results can be seen in Tables III and IV, with bolded scores for the top five scores. Table III shows the results on the accuracy metric, while Table IV shows the results on the F1 metric. The tables show that the scenarios with the highest average scores, on both accuracy and F1, are scenarios 4, 7, 8, 11, and 12. From this, it can be concluded that using MLSMOTE can boost the performance of the classifier, with up to a 19,91% increment on accuracy (OPN trait on scenarios 8 and 6) and 19,38% on F1 score (CON trait on scenarios 12 and 10), in contrast to [23], which mentioned that resampling techniques (including SMOTE) could not improve the performance. The highest average accuracy is 71,43%, obtained by using BERT-


M and its pooled output as features, then oversampled with MLSMOTE. By using the same configuration, the highest average F1, 66,37%, is obtained.

TABLE IV. RESULTS ON F1 (10-FOLD CROSS VALIDATION)

| No. | O     | C     | E     | A     | N     | Average F1 |
|-----|-------|-------|-------|-------|-------|------------|
| 1   | 45,15 | 50,71 | 53,18 | 49,81 | 59,32 | 51,63      |
| 2   | 46,10 | 50,42 | 53,64 | 50,82 | 60,56 | 52,31      |
| 3   | 57,39 | 65,20 | 61,22 | 63,98 | 69,49 | 63,46      |
| 4   | 61,49 | 69,69 | 64,66 | 64,88 | 67,44 | 65,63      |
| 5   | 47,48 | 48,88 | 49,38 | 51,16 | 55,69 | 50,52      |
| 6   | 44,37 | 47,37 | 50,76 | 47,15 | 54,48 | 48,83      |
| 7   | 58,77 | 64,02 | 61,74 | 62,60 | 66,29 | 62,68      |
| 8   | 59,94 | 64,27 | 61,49 | 62,08 | 66,91 | 62,94      |
| 9   | 51,54 | 51,33 | 49,67 | 49,87 | 55,17 | 51,52      |
| 10  | 51,97 | 46,40 | 48,05 | 48,28 | 56,15 | 50,17      |
| 11  | 62,02 | 72,79 | 64,82 | 61,07 | 71,13 | 66,37      |
| 12  | 62,11 | 65,78 | 63,04 | 63,43 | 68,61 | 64,59      |

A. Feature Extractor

In this study, we used three models as our feature extractors: two IndoBERT models and BERT-M. From Tables III and IV, it could be concluded that BERT-M has the best performance across all the classes, with the highest accuracy and F1. However, the total performance of each model is calculated from the four scenarios for that model, and the results can be seen in Table V. From Table V, we can observe that the IndoBERT model actually has the best performance across all the classes and scenarios. Even though it does not score as high as BERT-M with MLSMOTE, it already has the highest performance even before using MLSMOTE, both with the pooled and the last-four-layers output.

This behaviour may be possible because IndoBERT is already pre-trained with Twitter tweets, which gives it an advantage over the other models. While the other models dropped in score when using the last-four-layers output, IndoBERT's performance is boosted consistently, with or without MLSMOTE. This is consistent with the original BERT paper, which recommends using the concatenation of the last four layers as features, as it gives richer features.

However, BERT-M's overall performance is right behind IndoBERT, with differences of only 0,15% on accuracy and 0,1% on F1 score. This may happen because the user tweets contain many English or other-language words. Another example of this behaviour is the performance of the OPN, EXT, and NEU classifiers in Tables III and IV. The OPN classifier consistently gives higher scores with BERT-M as the feature extractor, while the EXT and NEU classifiers consistently give higher scores with IndoBERT. This can indicate that users with the OPN trait are more likely to tweet in multiple languages, while users with EXT and/or NEU traits are more likely to tweet in Indonesian.

TABLE V. RESULTS BASED ON PRE-TRAINED MODELS

| Pre-trained Model | Mean Accuracy | Mean F1 |
|-------------------|---------------|---------|
| IndoBERT          | 65,66         | 58,26   |
| IndoBERT 2        | 64,68         | 56,24   |
| BERT-M            | 65,51         | 58,16   |

B. MLSMOTE Performance

From Tables III and IV, it can be observed that the CON and EXT traits have the highest accuracy in all scenarios. However, they show a significant drop on the F1 score. This is caused by the imbalanced classes on the CON and EXT traits. This is fixed with MLSMOTE, which significantly boosts the F1 scores on the CON and EXT traits. To dig more into the MLSMOTE behaviour, we present the data label distribution from the models with the best and worst average F1 in Tables VI and VII. The best model is taken from scenario 11, while the worst is taken from scenario 7. As we can see, MLSMOTE helped the imbalanced CON label. However, it did not help with the imbalanced EXT label. Interestingly, it changed the main label of the NEU trait from Low, with 56,5% distribution, to High, with 61,57% distribution, in Table VI. Regardless, we still got higher scores with MLSMOTE. This may be caused by the synthetic data produced by MLSMOTE, which makes the classifier learn more easily.

We can also observe that the label distributions of the best model and the worst model have slight differences. The worst model has more imbalanced labels than the best model. For example, the worst model has 1,8% more High labels than the best model on the OPN trait, which makes it more imbalanced. As MLSMOTE works with numerical features and uses the nearest neighbours to produce synthetic data, the input features will affect the final label distribution. This means that the BERT-M pooled output is the best input feature for MLSMOTE, as it gives the biggest score boost on both accuracy and F1 score.

TABLE VI. LABEL DISTRIBUTION WITH MLSMOTE ON MODEL WITH BEST MEAN F1

| Trait | High         | Low          |
|-------|--------------|--------------|
| OPN   | 472 (60,67%) | 306 (39,33%) |
| CON   | 265 (34,06%) | 513 (65,94%) |
| EXT   | 561 (72,11%) | 217 (27,89%) |
| AGR   | 462 (59,38%) | 316 (40,62%) |
| NEU   | 479 (61,57%) | 299 (38,43%) |

TABLE VII. LABEL DISTRIBUTION WITH MLSMOTE ON MODEL WITH WORST MEAN F1

| Trait | High         | Low          |
|-------|--------------|--------------|
| OPN   | 486 (62,47%) | 292 (37,53%) |
| CON   | 256 (32,9%)  | 522 (67,1%)  |
| EXT   | 571 (73,39%) | 207 (26,61%) |
| AGR   | 471 (60,54%) | 307 (39,46%) |
| NEU   | 485 (62,34%) | 293 (37,66%) |
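Section IV.B attributes the score boost to synthetic samples that MLSMOTE interpolates between neighbouring minority instances, which also explains why the final label distribution shifts. The sketch below illustrates that mechanism; it is a simplified stand-in, not the exact published MLSMOTE algorithm: here minority labels are simply those present in under half of the samples, and the seed's full label set is copied to the synthetic sample, whereas real MLSMOTE selects minority labels via the imbalance ratio and assigns labels by a neighbour vote. All data below is made up.

```python
import numpy as np

def mlsmote_sketch(X, Y, n_synthetic, k=5, seed=0):
    """Simplified MLSMOTE-style oversampling: pick a sample carrying a
    minority label, interpolate its features with a random one of its k
    nearest neighbours, and copy the seed's label set."""
    rng = np.random.default_rng(seed)
    minority = np.where(Y.mean(axis=0) < 0.5)[0]       # under-represented labels
    seeds = np.where(Y[:, minority].any(axis=1))[0]    # samples carrying one
    X_new, Y_new = [], []
    for _ in range(n_synthetic):
        i = rng.choice(seeds)
        d = np.linalg.norm(X - X[i], axis=1)           # distances to the seed
        nn = np.argsort(d)[1:k + 1]                    # k nearest, excluding self
        j = rng.choice(nn)
        gap = rng.random()                             # interpolation factor
        X_new.append(X[i] + gap * (X[j] - X[i]))
        Y_new.append(Y[i])
    return np.vstack([X, np.array(X_new)]), np.vstack([Y, np.array(Y_new)])

# Toy data: 40 samples, 8-dim extracted features, 5 binary trait labels
# with deterministic (im)balance roughly like Table I.
rng = np.random.default_rng(1)
X = rng.standard_normal((40, 8))
Y = np.zeros((40, 5), dtype=int)
Y[:20, 0] = 1   # balanced
Y[:10, 1] = 1   # 25% minority High, like CON
Y[:28, 2] = 1   # 70% High, like EXT
Y[:22, 3] = 1
Y[:17, 4] = 1
Xr, Yr = mlsmote_sketch(X, Y, n_synthetic=20)
print(Xr.shape, Yr.shape)  # (60, 8) (60, 5)
```

Because the synthetic labels come from the minority-carrying seeds, the overall label distribution after oversampling depends on which samples are close together in the input feature space, which is the effect discussed above for the different feature extractors.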


V. CONCLUSIONS

In this study, we have attempted user-level personality prediction with Twitter as our source of data, using the Big Five as the personality model and focusing on Indonesian users. With a focus on the XGBoost classifier, we analyzed three settings: feature extractors, output type, and the use of MLSMOTE. Our proposed method was able to reach decent results, with a best average accuracy of 71,43% and a best average F1 score of 66,37%. The best feature extractor across all traits and scenarios is IndoBERT, as it is already familiar with Indonesian Twitter tweets. We also deduced the users' language choice from the results of the different feature extractors. For instance, users with the OPN trait are more likely to tweet in multiple languages, while users with EXT and/or NEU traits are more likely to tweet in Indonesian. As for MLSMOTE, it consistently gives a performance boost to the classifier, and we found that the best input features come from the BERT-M model.

For future study, this research plans to build up the dataset in terms of size and reliability so we can use other scenarios in our system, such as train-test splitting. We also plan to use deep learning models as classifiers and to make a system that uses all 100 tweets per user to improve this prediction system.

REFERENCES

[1] G. Y. N. N. Adi, M. H. Tandio, V. Ong, and D. Suhartono, "Optimization for Automatic Personality Recognition on Twitter in Bahasa Indonesia," in Procedia Computer Science, 2018, vol. 135, pp. 473–480, doi: 10.1016/j.procs.2018.08.199.
[2] N. H. Jeremy, C. Prasetyo, and D. Suhartono, "Identifying personality traits for Indonesian user from twitter dataset," Int. J. Fuzzy Log. Intell. Syst., vol. 19, no. 4, pp. 283–289, 2019, doi: 10.5391/IJFIS.2019.19.4.283.
[3] B. Clinten, "Pengguna Aktif Harian Twitter Indonesia Diklaim Terbanyak" [Indonesia's daily active Twitter users claimed to be the largest], kompas.com, 2019.
[4] M. L. Kern, P. X. McCarthy, D. Chakrabarty, and M. A. Rizoiu, "Social media-predicted personality traits and values can help match people to their ideal jobs," in Proceedings of the National Academy of Sciences of the United States of America, 2019, vol. 116, no. 52, pp. 26459–26464, doi: 10.1073/pnas.1917942116.
[5] J. Philip, D. Shah, S. Nayak, S. Patel, and Y. Devashrayee, "Machine Learning for Personality Analysis Based on Big Five Model," in Advances in Intelligent Systems and Computing, 2019, vol. 839, pp. 345–355, doi: 10.1007/978-981-13-1274-8_27.
[6] R. N. Harahap and K. Muslim, "Peningkatan Akurasi pada Prediksi Kepribadian Mbti Pengguna Twitter Menggunakan Augmentasi Data" [Improving accuracy of MBTI personality prediction for Twitter users using data augmentation], J. Teknol. Inf. dan Ilmu Komput., vol. 7, no. 4, pp. 815–822, 2020, doi: 10.25126/jtiik.2020743622.
[7] N. Abood, "Big five traits: A critical review," Gadjah Mada Int. J. Bus., vol. 21, no. 2, pp. 159–186, 2019, doi: 10.22146/gamaijb.34931.
[8] M. M. Tadesse, H. Lin, B. Xu, and L. Yang, "Personality Predictions Based on User Behavior on the Facebook Social Media Platform," IEEE Access, vol. 6, pp. 61959–61969, 2018, doi: 10.1109/ACCESS.2018.2876502.
[9] K. N. P. Kumar and M. L. Gavrilova, "Personality traits classification on twitter," in 2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS 2019), 2019, pp. 1–8, doi: 10.1109/AVSS.2019.8909839.
[10] N. Akrami, J. Fernquist, T. Isbister, L. Kaati, and B. Pelzer, "Automatic Extraction of Personality from Text: Challenges and Opportunities," in Proceedings - 2019 IEEE International Conference on Big Data (Big Data 2019), 2019, pp. 3156–3164, doi: 10.1109/BigData47090.2019.9005467.
[11] D. R. Jaimes Moreno, J. Carlos Gomez, D. L. Almanza-Ojeda, and M. A. Ibarra-Manzano, "Prediction of personality traits in twitter users with latent features," in CONIELECOMP 2019 - 2019 International Conference on Electronics, Communications and Computers, 2019, pp. 176–181, doi: 10.1109/CONIELECOMP.2019.8673242.
[12] J. Devlin, M. W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," in NAACL HLT 2019 - 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies - Proceedings of the Conference, 2019, vol. 1, pp. 4171–4186.
[13] M. E. Peters et al., "Deep contextualized word representations," in Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), 2018, pp. 2227–2237, doi: 10.18653/v1/N18-1202.
[14] S. Leonardi, D. Monti, G. Rizzo, and M. Morisio, "Multilingual transformer-based personality traits estimation," Information, vol. 11, no. 4, pp. 1–21, 2020, doi: 10.3390/info11040179.
[15] F. Koto, A. Rahimi, J. H. Lau, and T. Baldwin, "IndoLEM and IndoBERT: A Benchmark Dataset and Pre-trained Language Model for Indonesian NLP," in International Conference on Computational Linguistics, 2021, pp. 757–770, doi: 10.18653/v1/2020.coling-main.66.
[16] B. Wilie et al., "IndoNLU: Benchmark and Resources for Evaluating Indonesian Natural Language Understanding," in Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing, 2020, pp. 843–857. [Online]. Available: http://arxiv.org/abs/2009.05387.
[17] S. K. R, S. R. G, R. Priyan, and P. T, "Personality Prediction and Classification Using Twitter Data," Int. Res. J. Eng. Technol., vol. 07, no. 07, pp. 4878–4882, 2020.
[18] M. H. Amirhosseini and H. Kazemian, "Machine learning approach to personality type prediction based on the Myers–Briggs type indicator®," Multimodal Technol. Interact., vol. 4, no. 1, 2020, doi: 10.3390/mti4010009.
[19] M. A. Rahman, A. Al Faisal, T. Khanam, M. Amjad, and M. S. Siddik, "Personality Detection from Text using Convolutional Neural Network," in 1st International Conference on Advances in Science, Engineering and Robotics Technology 2019 (ICASERT 2019), 2019, pp. 1–6, doi: 10.1109/ICASERT.2019.8934548.
[20] J. Zhao, D. Zeng, Y. Xiao, L. Che, and M. Wang, "User personality prediction based on topic preference and sentiment analysis using LSTM model," Pattern Recognit. Lett., vol. 138, pp. 397–402, 2020, doi: 10.1016/j.patrec.2020.07.035.
[21] X. Sun, B. Liu, J. Cao, J. Luo, and X. Shen, "Who am I? Personality detection based on deep learning for texts," in IEEE International Conference on Communications, 2018, pp. 1–6, doi: 10.1109/ICC.2018.8422105.
[22] V. Ong, A. D. S. Rahmanto, W. Williem, N. H. Jeremy, D. Suhartono, and E. W. Andangsari, "Personality Modelling of Indonesian Twitter Users with XGBoost Based on the Five Factor Model," Int. J. Intell. Eng. Syst., vol. 14, no. 2, pp. 248–261, 2021, doi: 10.22266/ijies2021.0430.22.
[23] T. Tandera, D. Suhartono, R. Wongso, and Y. L. Prasetio, "Personality Prediction System from Facebook Users," in Procedia Computer Science, 2017, vol. 116, pp. 604–611.


Level of Password Vulnerability


Indira Mannuela
Computer Science Department, School of Computer Science
Bina Nusantara University, Jakarta, Indonesia 11480
[email protected]

Jessy Putri
Computer Science Department, School of Computer Science
Bina Nusantara University, Jakarta, Indonesia 11480
[email protected]

Michael
Computer Science Department, School of Computer Science
Bina Nusantara University, Jakarta, Indonesia 11480
[email protected]

Maria Susan Anggreainy
Computer Science Department, School of Computer Science
Bina Nusantara University, Jakarta, Indonesia 11480
[email protected]

Abstract— Nowadays, password vulnerability is very dangerous for accounts on the internet. The need to create an account is very important, as it can properly store personal data. With a password, an account can maintain the integrity and authenticity of the account owner. Several factors make passwords vulnerable, such as the criteria used in making passwords, namely the length of the password, the elements used in the password, reuse of passwords, how frequently passwords are changed, and other things that will be discussed in this research, in order to determine the password vulnerability of current users. From the questionnaire data, the most vulnerable keys in password security are password reuse and passwords that are not changed frequently.

Keywords— Security; Password Strength; Text Password; Usable Security; Password Component

I. INTRODUCTION

Passwords are the main part of many security technologies; they are the most commonly used authentication method. For a password system to be secure, users must make a good choice about what password to use [1].

Some people choose to use a short password and some choose to use a long password; both password lengths have advantages and shortcomings. A short password can be easily guessed, which makes it a vulnerability, but the user can remember it with ease. A long password gives the user more protection against an attacker, but the user needs more time to type it, and it is also hard for some people to remember a long password [2].

Although the password has many security problems, like attacker threats, and usability problems, like human error, it is the primary form of user authentication; it has been used for more than half a century and many believe it will still be used in the future [3][4].

Even though users are more familiar with password security and vulnerability, remembering complex, long, and multiple passwords is hard. To make things easier, many users choose to use the same password for many of their accounts, write the password down, and not change the password regularly. This, of course, makes the password more vulnerable to many threats. Das et al. found that out of 224 participants, 61% memorize their password. One way to make a user account more secure is using a biometric sensor, but a biometric sensor has its own set of problems, like cost, convenience, and familiarity. With all these problems, the password is still the most popular authentication mechanism [5].

For the case of password reuse in multiple accounts, a combined measurement study finds that 43% of participants reuse their password or change their password with only a small modification. The rate of password reuse also increases with the number of user accounts, making users with more accounts more at risk; this happens because reused passwords are the second source of high-probability password guesses [6][7].

Hijacking accounts can result in financial loss, stress, and embarrassment. In an online attack against passwords, the attacker usually tries to guess the user's password on the live system, so the service provider may block the attacker after several failed attempts. However, the attacker can work around this with the offline attack model: the attacker may be able to steal the hashed password file from the service provider, and thus manage to compromise many accounts and hurt many users [8].

This research will use a qualitative method for our analysis, with survey research as our research type, to give a numeric description of the opinions of the population about password vulnerability.

II. RESEARCH METHODOLOGY

The research methodology that will be used can be seen in Fig 1, starting with the research problem, research purpose, data collection, data analysis, and lastly the research conclusion.

Fig 1. Research Diagram

978-1-6654-4002-8/21/$31.00 ©2021 IEEE


In this day and age, the development of internet technology is very fast, and the use of online services and social media (e.g. Instagram, Facebook, LinkedIn, etc.) for various needs, such as entertainment and business, is widespread. Authentication is very important in accessing accounts on the internet; in information security, authentication is useful for identifying someone's identity. There are several categories of authentication, such as knowledge-based authentication (e.g. textual), biometric authentication (e.g. voice, fingerprint, retina), and token-based authentication (e.g. smart cards, mobile phones). There are also other alternatives, such as two-way authentication using smart cards or biometrics. However, this is troublesome for the user, because they have to carry the item everywhere at any time they want to access the system [9].

What will be discussed in this research is knowledge-based authentication, because this research focuses on the combination of usernames and passwords that is used by many people. It is known that many people have accounts on the internet, and all of them require a username, email, and password for authentication. For example, Gmail alone, which was launched in 2004, already has 1.8 billion users in the world. From here it can be imagined how many user passwords there are and how many commonly used passwords have been leaked and are known to thieves. A password such as "helloworld" can take 3 minutes to crack because it is in the dictionary, but a variation of "helloworld" that is not in the dictionary can take 15 days to crack [10].

With data from the dark web or some other leaky website, data thieves can guess the passwords of user accounts. This data can be in the form of personally identifiable information (PII) and passwords, which are commonly leaked [6]. According to the 2016 Verizon DBIR, user login IDs and passwords were used in over 50% of all data breaches in 2015, and an analysis in 2015 also determined that 63% of passwords were weak, default, or stolen [11]. Recently, in 2020, it was discovered that Tokopedia, a unicorn from Indonesia, had 15 million accounts whose personal data had been leaked. Many employees could access Tokopedia's internal system. In addition, the official Twitter page @underthebreach also reported that data of around 91 million Tokopedia users was being sold online for US$5,000 [12]. The data obtained consists of emails, password hashes, and names, so many thieves take advantage of this leaked data to guess victims' passwords.

Sometimes, the security of a password doesn't matter because the user got phished; on the other hand, it is most useful to make a password that is not trivially predictable [13]. The worst case is that user behavior generally makes passwords easy to guess. Many users like to use the same password, slightly change previously used passwords, put PII into passwords, or use only digits or only letters. To increase the strength of a password, users are advised to comply with password generation rules: in general, passwords need to be at least eight characters in length, require at least one uppercase letter and one number, and must not contain a username. In addition, to avoid security failure, users must have different passwords for each account. According to one study, users generally have 25 online password-required accounts and use 8 passwords per day, which makes it difficult for users to remember passwords [14]. There is also a method whereby the user stores all of their passwords in a password management system, so only one single password, named the "master password", is required to handle all passwords. However, this is also very dangerous: if the master password is cracked, all passwords can be seen.

1) Research Purpose
The purpose of this research paper is to get information about the level of password vulnerability of users and user concern about their password strength, so that the data can be analyzed and can provide insight into the password vulnerability level.

2) Data Collection
To get the data to be analyzed, several questions will be provided to obtain information on the password vulnerability of users, using Google Forms. A questionnaire is a data collection method that contains questions to get information from the answerer [15]. This research will use a questionnaire so that information gathering can be easily performed and reach further.

3) Data Analysis
The data analysis will be done after the data collection is finished and the data has been processed.

4) Research Conclusion
The conclusion of this research is that the reader gets an insight into the average level of vulnerability in user passwords and user concern about their password strength.

III. RESULT

The results of our research are shown in pie charts. As seen in Fig 2, 15% of respondents have a password length of less than 8 characters, 56% between 8 and 12 characters, 14% between 12 and 16 characters, and 15% more than 16 characters. With this, the average respondent has a password length between 8 and 12 characters; this happens because most websites require users to have at least 8 characters in their password.

Fig 2. Password Length Question Result

For the elements in the password, shown in Fig 3: 71% of passwords have an uppercase letter, 79% have a lowercase letter, 83% have a number, and 36% have one or more symbols. With this, more passwords contain a number than contain a lowercase/uppercase letter; this happens because some passwords are of the PIN (Personal Identification Number) type, which contains only numbers.
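The composition rules summarized above (at least eight characters, at least one uppercase letter and one digit, and no username inside the password) can be sketched as a simple checker, together with a naive keyspace estimate of brute-force effort. Both the exact rule set and the estimate are illustrative choices of ours, not a standard; real attackers use dictionaries and leaked lists, which is why a dictionary word like "helloworld" falls far faster than its keyspace size suggests.

```python
import math
import string

def meets_rules(password, username=""):
    """Illustrative composition check: >= 8 chars, at least one uppercase
    letter and one digit, and the username must not appear inside it."""
    return (len(password) >= 8
            and any(c.isupper() for c in password)
            and any(c.isdigit() for c in password)
            and (not username or username.lower() not in password.lower()))

def naive_keyspace_bits(password):
    """Rough brute-force estimate: log2(alphabet ** length), where the
    alphabet is the union of character classes actually used."""
    alphabet = 0
    if any(c.islower() for c in password):
        alphabet += 26
    if any(c.isupper() for c in password):
        alphabet += 26
    if any(c.isdigit() for c in password):
        alphabet += 10
    if any(c in string.punctuation for c in password):
        alphabet += len(string.punctuation)
    return len(password) * math.log2(alphabet) if alphabet else 0.0

print(meets_rules("helloworld"))    # False: no uppercase letter, no digit
print(meets_rules("H3lloWorld19"))  # True
print(round(naive_keyspace_bits("H3lloWorld19")))
```

The keyspace number only upper-bounds the work of an exhaustive search; dictionary and reuse-based guessing, as discussed above, needs far fewer attempts.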


Fig 3. Password Element Question Result

As for PII (Personally Identifiable Information), the result is shown in Fig 4: 54% of passwords don't contain it, and 46% of passwords do. Because the optimal password is one that doesn't contain PII, the high number of passwords that contain PII means those passwords are more vulnerable to an attacker who has the user's PII. Unlike PII, most passwords don't contain a dictionary word. As shown in Fig 5, 71% of the passwords don't use a dictionary word and 29% of them do. As passwords that contain a dictionary word are easier to attack than passwords that contain PII, this result is good for the safety of user accounts.

Fig 4. Password With PII Question Result

Fig 5. Password With Dictionary Words Question Result

For the accounts that respondents have, 60% of respondents have 1 - 10 accounts, 24% have 10 - 20 accounts, 9% have 20 - 30 accounts, and 7% have more than 30 accounts. For daily logins, 73% of respondents log in to their accounts 1 - 5 times daily, 18% log in 5 - 10 times daily, and 9% log in 10 - 15 times daily; both of these data are shown in Fig 6 and Fig 7 respectively.

Fig 6. Account Amount Question Result

Fig 7. Daily Login Question Result

For password reuse, the result is shown in Fig 8: 74% of respondents reuse their password and 26% don't. Passwords are reused mostly because remembering a different password for each account is hard, so most respondents choose convenience in exchange for safety. As shown in Fig 9, most respondents also don't change their password regularly: only 26% of respondents change their password frequently, while 74% don't.

Fig 8. Password Reuse Question Result

Fig 9. Password Change Question Result
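The percentages reported in Figs 2-9 are simple frequency tallies over the questionnaire answers. A minimal sketch of that aggregation, with made-up response data since the raw questionnaire is not published here:

```python
from collections import Counter

def tally_percentages(responses):
    """Count each answer option and convert the counts to whole-number
    percentages, as used for the pie charts in Figs 2-9."""
    counts = Counter(responses)
    total = len(responses)
    return {option: round(100 * n / total) for option, n in counts.items()}

# Hypothetical answers to the password-reuse question; the survey above
# reported 74% "yes" / 26% "no".
answers = ["yes"] * 37 + ["no"] * 13
print(tally_percentages(answers))  # {'yes': 74, 'no': 26}
```

The same helper applies to multi-option questions such as password length, where each bucket (under 8, 8-12, 12-16, over 16 characters) is one answer option.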


IV. CONCLUSION

The conclusion of this research is that, out of the 8 indicators of password vulnerability observed (password length, password elements, password with PII, password with dictionary word, user account count, daily logins, password reuse, and regular password change), the vulnerable indicators are PII, password reuse, and password change, with more than 50% of respondents' passwords containing PII, and more than 70% of respondents reusing their password and not changing their password frequently.

With the result data, 5 out of 8 indicators are secure, and improvement can be made on the password reuse indicator and the regular password change indicator. Hopefully, this finding can help people make their passwords more secure and guide future security improvement.

REFERENCES

[1] R. Wash, E. Rader, R. Berman, Z. Wellmer, "Understanding password choices: how frequently entered passwords are re-used across websites," in Proceedings of the Twelfth Symposium on Usable Privacy and Security, Colorado, 2016, pp. 175-189.
[2] M. Golla, M. Dürmuth, "On the accuracy of password strength meters," in Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, Ontario, 2018, pp. 1567-1582. https://doi.org/10.1145/3243734.3243769
[3] S.I. Alqahtani, S. Li, H. Yuan, P. Rusconi, "Human-generated and machine-generated ratings of password strength: what do users trust more?," EAI Endorsed Transactions on Security and Safety, vol. 6, issue 21, August 2019. http://dx.doi.org/10.4108/eai.13-7-2018.162797
[4] R. Chatterjee, A. Athayle, D. Akhawe, A. Juels, T. Ristenpart, "Password typos and how to correct them securely," in 2016 IEEE Symposium on Security and Privacy, California, 2016, pp. 799-818. 10.1109/SP.2016.53
[5] N. Woods, M. Siponen, "Improving password memorability, while not inconveniencing the user," International Journal of Human-Computer Studies, vol. 128, pp. 61-71, August 2019. https://doi.org/10.1016/j.ijhcs.2019.02.003
[6] S.G. Lyastani, M. Schiling, S. Fahl, M. Backes, S. Bugiel, "Better managed than memorized? studying the impact of managers on password strength and reuse," in Proceedings of the 27th USENIX Security Symposium, Maryland, 2018, pp. 203-220.
[7] H. Habib, et al., "Password creation in the presence of blacklists," in Proceedings of the Workshop on Usable Security (USEC), California, 2017.
[8] R. Shay, et al., "Designing password policies for strength and usability," ACM Transactions on Information and System Security, vol. 18, no. 4, pp. 13:1-13:34, May 2016. https://doi.org/10.1145/2891411
[9] V. Taneski, M. Heričko, B. Brumen, "Systematic overview of password security problems," Acta Polytechnica Hungarica, vol. 16, no. 3, pp. 143-165, 2019. 10.1016/j.infsof.2017.09.012
[10] M. Awam, Z. Al-Qudah, S. Idwan, A.H. Jallad, "Password security: password behavior analysis at a small university," in Proceedings of the 2016 5th International Conference on Electronic Devices, Systems and Applications (ICEDSA), Ras al Khaimah, 2016. 10.1109/ICEDSA.2016.7818558
[11] S. Aurigemma, T. Mattson, L. Leonard, "So Much Promise, So Little Use: What is Stopping Home End-Users from Using Password Manager Applications?," in Proceedings of the 50th Hawaii International Conference on System Sciences, Waikoloa, 2017, pp. 4061-4070.
[12] E.A. Eloksari, "Tokopedia data breach exposes vulnerability of personal data," thejakartapost.com. https://www.thejakartapost.com/news/2020/05/04/tokopedia-data-breach-exposes-vulnerability-of-personal-data.html (accessed Mar. 24, 2021).
[13] B. Ur, et al., "Do users' perceptions of password security match reality?," in Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, California, 2016, pp. 3748-3760. https://doi.org/10.1145/2858036.2858546
[14] M. Yıldırım, I. Mackie, "Encouraging users to improve password security and memorability," International Journal of Information Security, vol. 18, issue 6, pp. 741-759, December 2019. 10.1007/s10207-019-00429-y
[15] H. Taherdoost, "Validity and reliability of the research instrument; how to test the validation of a questionnaire/survey in a research," International Journal of Academic Research in Management, vol. 5, no. 3, pp. 28-36, 2016. 10.2139/ssrn.3205040

354 28 October 2021, Jakarta - Indonesia


2021 1st International Conference on Computer Science and Artificial Intelligence (ICCSAI)

Cultural Tourism Technology Used and Themes: A Literature Review

Hendro Nindito
Department of Computer Science, BINUS Graduate Program - Doctor of Computer Science,
Bina Nusantara University, Jakarta, Indonesia 11480
[email protected]

Harjanto Prabowo
Department of Computer Science, BINUS Graduate Program - Doctor of Computer Science,
Bina Nusantara University, Jakarta, Indonesia 11480
[email protected]

Spits Warnars Harco Leslie Hendric
Department of Computer Science, BINUS Graduate Program - Doctor of Computer Science,
Bina Nusantara University, Jakarta, Indonesia 11480
[email protected]

Sfenrianto
Department of Information Systems Management,
Bina Nusantara University, Jakarta, Indonesia 11480
[email protected]

Abstract—In today's circumstances, there is a phenomenon of people wanting to find out about cultures outside their own neighborhood. This can be realized by combining technology with cultural tourism. This study discusses the application of ICT in the cultural tourism domain by collecting peer-reviewed articles using keywords and analyzing their content. The study finds trends and technology themes in the cultural tourism domain and visualizes them, making it easier for researchers in this field to understand and further develop the domain.

Keywords—Cultural Tourism Technology, Technology Themes, Visualized.

I. INTRODUCTION

Research in the field of cultural tourism is expanding rapidly in areas such as cultural motivation, cultural consumption, heritage preservation, anthropology, cultural tourism economics, and the relationship with the creative economy [1].

Digitization has changed the travel and tourism industry significantly. The impact of digital technology on tourist behavior before, during, and after a trip assumes increasing significance and weight, transforming traditional travelers into digital, savvy travelers. Smart tourists use technology with several functions, such as recognizing tourist attractions, displaying events around tourist attractions, displaying the nearest police station and hospital for emergencies, and keeping a history of recognized objects [2].

Previous research was conducted by [3], which comprehensively reviews and analyzes studies in the context of Internet applications for tourism. Other academic research, on mobile technologies and applications in smart tourism published between 2012 and June 2017, was conducted by [4].

This research is intended to answer the question "What technologies are widely used in the cultural tourism sector?". To answer it, the researchers conducted a literature study of the Scopus database from 2001 to 2020. Cultural tourism is a specific domain of tourism that will be explained further in the next section.

II. LITERATURE STUDY

A. Cultural Tourism

Cultural tourism is one type of tourism that received a new operational definition from the UNWTO at the 22nd Session of the General Assembly held in Chengdu, China [1]: "Cultural tourism is one type of tourism activity whose main motivation for visitors is to learn, discover, experience, and consume tangible and intangible cultural attractions/products in a tourism destination."

According to [5], [6], "Knowledge Workers", also known as "Mature Tourists" or "Experienced Tourists", are the main segment for cultural tourism products. Their goal is more than just a trip; rather, it is to experience direct involvement in the cultural traditions and activities of the local community.

B. Digital Technology in Cultural Tourism

The boundaries between the virtual world and the real world are bridged by digital technology, whose use increases the level of immersion and experience [7], [8]. In the tourism domain, virtual reality (VR) and augmented reality (AR) technology has been successfully integrated, with the resulting benefit of increased visitor engagement both before the visit and in remembering the visited tourist destination.

To date, AR studies in tourism have mainly been carried out for the enhancement of experiences and interactions, where real scenes are enhanced with multimedia to provide personalized interactive information in a user-friendly interface [8]. VR applications emerged for marketing promotions and to enhance and create memorable travel experiences in destinations, as well as off-site [9].

Tourism activities and experiences contribute to the preservation and management of local culture, including community-based cultural tourism and cultural tourism initiatives. This causes cultural heritage to become the center of tourist destinations [10]. Hence, as many tourism objectives are

978-1-6654-4002-8/21/$31.00 ©2021 IEEE


focused on local heritage, technology-based applications have a great opportunity to contribute to tourism experience management and heritage preservation, and also to improve the tourism experience [7].

The focus of heritage conservation and conversion in recent years has been the incorporation of digital technology in place of traditional methods. Physical heritage artifacts and sites are preserved employing methods such as photogrammetry, digital document storage, and 3D scanning [8], [11]. In addition, AR and VR have been applied to heritage preservation with a greater focus on user engagement. Another application of AR and VR in heritage preservation is to increase the involvement of the tourist experience, not only in the context of museums but for a wider range of fields and tourist destinations [8], [12].

C. Cultural Tourism as a Government Service

One of the tasks of the government is to formulate policies and implement public services, where one of the strategies is to use social media-based information technology [13].

From the literature study, six stakeholders who interact with the tourism service domain can be classified, namely visitors, employees, citizens, non-profits, businesses, and government. The interactions are Government to Employees (G2E), Government to Visitors (G2V), Government to Citizens (G2C), Government to Nonprofits (G2N), Government to Business (G2B), and Government to Government (G2G) [14].

III. METHODOLOGY

In this research, a search for articles was conducted to find publications on the success factors of cultural technology in the tourism industry. The keywords "Mobile", "Technology", and "Applications" were combined with "Cultural Tourism" to explore the abstracts, titles, and keywords of journal documents from the Scopus database written in English, obtaining 49 documents.

The search covered 2001 to 2021. Chosen journal articles were exported directly to the reference manager Mendeley, which assisted the authors in managing the research. The metadata was also exported to VOSviewer to find relationships among keyword occurrences and classify them into clusters. After the data collection tasks, we independently evaluated all 39 chosen journal articles to improve reliability and validity.

IV. RESULT AND DISCUSSION

The outcomes of the research are described as the dissemination of articles by period, the distribution of articles by source, the distribution of articles by research method and by country, and the most cited articles.

A. Distribution of Articles per Year

Articles on the topic of cultural tourism technology were first published in 2001; from 2002 to 2005 and from 2007 to 2010, no articles were published. The topic started to rise again in 2008. Among the 39 identified articles, the most (eight articles) were published in 2020, covering research themes and technology features.

Figure 1. Distribution of Articles by Period

B. Distribution of Articles per Source

Observing the top five publishing sources, it appears that CEUR Workshop Proceedings and Sustainability (Switzerland) published the most articles, with four articles each.

Figure 2. Distribution of Articles by Source

C. Distribution of Articles by Country

The five countries with the most publications on this topic are Italy, China, Greece, the United Kingdom, and Malaysia.

Figure 3. Distribution of Articles by Country

D. The Most Cited Articles

The article with the most citations is "ARCHEOGUIDE: First Results of an Augmented Reality, Mobile Computing System in Cultural Heritage Sites" by Vlahakis, V., Karigiannis, J., Tsotros, M., (...), Carlucci, R., Ioannidis, N., published in Proceedings VAST 2001: Virtual Reality, Archeology, and Cultural Heritage, pp. 131-139, with a total of 146 citations.
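The screening described in Section III, matching the keywords "Mobile", "Technology", and "Applications" together with "Cultural Tourism" against titles, abstracts, and keywords inside a year window, can be sketched as below. The field names and the two sample records are illustrative assumptions, not the authors' actual Scopus export:

```python
# Hypothetical sketch of the literature-screening step: keep an article when
# it falls in the year window, mentions "cultural tourism", and mentions at
# least one of the technology keywords. Sample records are invented.

TECH_KEYWORDS = {"mobile", "technology", "applications"}

def matches(article, start=2001, end=2021):
    # Search the title, abstract, and author keywords as one lowercase text.
    text = " ".join(
        [article["title"], article["abstract"], " ".join(article["keywords"])]
    ).lower()
    return (
        start <= article["year"] <= end
        and "cultural tourism" in text
        and any(k in text for k in TECH_KEYWORDS)
    )

articles = [
    {"title": "A mobile guide for cultural tourism", "abstract": "",
     "keywords": ["augmented reality"], "year": 2018},
    {"title": "Beach resorts revisited", "abstract": "",
     "keywords": ["hospitality"], "year": 2015},
]
selected = [a for a in articles if matches(a)]  # keeps only the first record
```

In the actual study this filtering was performed by the Scopus search interface itself; the sketch only mirrors the stated inclusion criteria.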


TABLE I. TOP FIVE MOST CITED ARTICLES

Ref | Title | Year | Citations
[15] | "ARCHEOGUIDE: First Results of an Augmented Reality, Mobile Computing System in Cultural Heritage Sites" | 2001 | 146
[16] | "From Territory to Smartphone: Smart Fruition of Cultural Heritage for Dynamic Tourism Development" | 2014 | 42
[17] | "GeoGuides, Urban Geotourism Offer Powered by Mobile Application Technology" | 2018 | 33
[18] | "Personalisation Systems for Cultural Tourism" | 2013 | 18
[19] | "Augmented Reality Smart Glasses (ARSG) visitor adoption in cultural tourism" | 2019 | 15

E. Research Themes Mapping

The keywords used in the topic of tourism technology were analyzed and mapped visually, as shown in Fig. 4, using the VOSviewer tool. The minimum repetition of keywords used is three times; of the 365 keywords obtained from the process, 15 meet this criterion. The themes of these topics can be grouped as follows:

1) Mobile Applications (red). This cluster contains the keywords mobile applications, computer applications, cultural heritage site, history, human-computer interaction, information science, mobile computing, mobile device, and smart tourism.

2) Heritage Tourism (green). In this cluster the keywords are heritage tourism, data analytics, environmental impact, innovation, knowledge, mobile communication, tourism development, and tourism management.

3) Cultural Tourism (blue). This cluster contains cultural tourism, creative tourism, emerging technologies, information systems, and mobile technologies.

4) Cultural Heritage (yellow), containing the keywords cultural heritage, ICT, mobile application, sustainable development, urban development, and virtual reality.

5) Augmented Reality (purple), containing the keywords augmented reality, 3D modeling, and three-dimensional computing.

6) Cloud Computing (light blue), containing cloud computing, information technology, and mobile telecommunication.

Figure 4. Research Mapping
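The minimum-occurrence rule mentioned above (keywords must repeat at least three times before VOSviewer maps them) amounts to a simple frequency filter. The keyword lists below are invented examples, not the study's 365-keyword dataset:

```python
# Count keyword occurrences across articles and keep those reaching the
# VOSviewer-style minimum-occurrence threshold (three in the paper).
from collections import Counter

def frequent_keywords(keyword_lists, min_occurrences=3):
    counts = Counter(k for kws in keyword_lists for k in kws)
    return {k for k, n in counts.items() if n >= min_occurrences}

docs = [
    ["augmented reality", "cultural tourism"],
    ["augmented reality", "mobile applications"],
    ["augmented reality", "cultural tourism"],
    ["cultural tourism", "heritage"],
]
kept = frequent_keywords(docs)  # {"augmented reality", "cultural tourism"}
```

VOSviewer then clusters the surviving keywords by their co-occurrence links; the filter above only reproduces the thresholding step.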

F. Technology Features Used in Cultural Tourism

The technology feature most used in the current cultural tourism domain is Augmented Reality. The utilization of other features can be seen in Table II below:

TABLE II. TECHNOLOGY FEATURES

No | Technology Feature | Articles
1 | 3D | [20], [21], [22]
2 | Augmented Reality | [16], [19], [23], [24], [25], [26], [27], [28], [22], [29], [30], [31], [32], [15]


TABLE II. TECHNOLOGY FEATURES (continued)

No | Technology Feature | Articles
3 | Cloud Computing | [33], [32]
4 | Digital Storytelling | [33], [34]
5 | Virtual Reality | [35], [21], [15]
6 | Location-based Technology | [36], [27], [37], [38]
7 | Multimedia | [18], [39], [17], [23]
8 | Social Media | [40], [41], [42], [39]
9 | Web | [40], [38], [43]

AR is the most widely applied technology in the cultural tourism domain for improving the cultural experience of tourists, for example to intelligently promote cultural heritage in Italy [16] and to present personalized audio-visual interactive information for museums and cultural tourism on users' personal mobile devices [23].

VR technology is often used in mobile applications to produce a better user perspective, for example in an application for Romanian cultural heritage sites [35] and in the simulation and exploration of the vast landscape of the Quanzhou Maritime Silk Road to support cultural tourism [21].

Location-based technology is used in mobile applications as a navigation medium at cultural tourism locations, such as in a mobile tourism application based on cultural tourism sites in Malaysia [38].

Social media technology is very widely used and is one of the most effective ways to spread information. This medium stores data regarding user profiles and interactions and provides means for exchanging information, making reviews, and checking locations. Social media has been integrated with multimedia for the benefit of cultural tourism [39].

V. CONCLUSION

This research systematically reviewed 39 articles related to the application and themes of ICT in the cultural tourism domain. The reviewed articles indicate the use of AR, VR, digital storytelling, multimedia, location-based services, social media, and the web in this domain. Another finding is that the themes of technology application in this domain can be classified into six clusters, namely Mobile Applications, Heritage Tourism, Cultural Tourism, Cultural Heritage, Augmented Reality, and Cloud Computing.

In carrying out this research, the researchers experienced several limitations, including the small sample of articles, limited access to journal resources, and limited depth of analysis in the field of cultural tourism technology. Future research can improve on this by critically and progressively analyzing the findings.

REFERENCES

[1] G. Richards, "Cultural tourism: A review of recent research and trends," J. Hosp. Tour. Manag., vol. 36, pp. 12–21, 2018.
[2] Meiliana, D. Irmanti, M. R. Hidayat, N. V. Amalina, and D. Suryani, "Mobile Smart Travelling Application For Indonesia Tourism," Procedia Comput. Sci., vol. 116, pp. 556–563, 2017.
[3] D. Buhalis and R. Law, "Progress in information technology and tourism management: 20 years on and 10 years after the Internet — The state of eTourism research," vol. 29, pp. 609–623, 2008.
[4] D. Jelena, K. Jelena, and M. Suzana, "Mobile technologies and applications towards smart tourism – state of the art," Tour. Rev., vol. 74, no. 1, pp. 82–103, Jan. 2019.
[5] C. Surugiu and M. R. Surugiu, "Is the Tourism Sector Supportive of Economic Growth? Empirical Evidence on Romanian Tourism," Tour. Econ., vol. 19, no. 1, pp. 115–132, Feb. 2013.
[6] N. M. Iversen, L. E. Hem, and M. Mehmetoglu, "Lifestyle segmentation of tourists seeking nature-based experiences: the role of cultural values and travel motives," J. Travel Tour. Mark., vol. 33, no. sup1, pp. 38–66, Apr. 2016.
[7] A. Bec, B. Moyle, K. Timms, V. Schaffer, L. Skavronskaya, and C. Little, "Management of immersive heritage tourism experiences: A conceptual model," Tour. Manag., vol. 72, pp. 117–120, 2019.
[8] T. Jung, M. C. Tom Dieck, H. Lee, and N. Chung, "Effects of Virtual Reality and Augmented Reality on Visitor Experiences in Museum," in Information and Communication Technologies in Tourism …, 2016, pp. 621–635.
[9] Y. Huang, K. Backman, S. Backman, and L.-L. Chang, "Exploring the Implications of Virtual Reality Technology in Tourism Marketing: An Integrated Research Framework," Int. J. Tour. Res., vol. 18, pp. 116–128, 2015.
[10] M. Ursache, "Tourism – Significant Driver Shaping a Destinations Heritage," Procedia - Soc. Behav. Sci., vol. 188, pp. 130–137, 2015.
[11] N. Yastikli, "Documentation of cultural heritage using digital photogrammetry and laser scanning," J. Cult. Herit., vol. 8, no. 4, pp. 423–427, 2007.
[12] J. Martins, R. Gonçalves, F. Branco, L. Barbosa, M. Melo, and M. Bessa, "A multisensory virtual experience model for thematic tourism: A Port wine tourism application proposal," J. Destin. Mark. Manag., vol. 6, no. 2, pp. 103–109, 2017.
[13] E. Madyatmadja, A. Sano, C. Sianipar, H. Nindito, and R. Bhaskoro, Factors Influencing the Uses of Social Media within the Government: A Systematic Literature Review, 2020.
[14] N. Kalbaska, T. Janowski, and E. Estevez, "E-Government Relationships Framework in the Tourism Domain. A First Map," no. March 2018, pp. 71–86, 2016.
[15] V. Vlahakis et al., "ARCHEOGUIDE: First Results of an Augmented Reality, Mobile Computing System in Cultural Heritage Sites," in Proceedings VAST 2001: Virtual Reality, Archeology, and Cultural Heritage, 2001, pp. 131–139.
[16] C. Garau, "From Territory to Smartphone: Smart Fruition of Cultural Heritage for Dynamic Tourism Development," Plan. Pract. Res., vol. 29, no. 3, pp. 238–255, 2014.


[17] A. Pica et al., "GeoGuides, Urban Geotourism Offer Powered by Mobile Application Technology," Geoheritage, vol. 10, no. 2, pp. 311–326, 2018.
[18] K. Kabassi, "Personalisation Systems for Cultural Tourism," Smart Innovation, Systems and Technologies, vol. 25, pp. 101–111, 2013.
[19] D.-I. D. Han, M. C. Tom Dieck, and T. Jung, "Augmented Reality Smart Glasses (ARSG) visitor adoption in cultural tourism," Leis. Stud., vol. 38, no. 5, pp. 618–633, 2019.
[20] A. Berlino, L. Caroprese, A. La Marca, E. Vocaturo, and E. Zumpano, "Augmented reality for the enhancement of archaeological heritage: A Calabrian experience," in 1st International Workshop on Visual Pattern Extraction and Recognition for Cultural Heritage Understanding, VIPERC 2019, 2019, vol. 2320, pp. 86–94.
[21] T.-C. Hsiao, R. Yan, C.-Y. Chang, C.-C. Chen, and M. Guo, "Application of Virtual Reality Technology to Display of 'Maritime Silk Route' Culture," Sensors Mater., vol. 33, no. 2, pp. 815–823, 2021.
[22] S. Xin, S. Qingting, L. Zhiqiang, and C. Tengfei, "Application of 3D tracking and registration in exhibition hall navigation interaction," in 2020 International Conference on Intelligent Computing, Automation and Systems, ICICAS 2020, 2020, pp. 109–113.
[23] I. Deliyiannis and G. Papaioannou, "Augmented reality for archaeological environments on mobile devices: A novel open framework," Mediterr. Archaeol. Archaeom., vol. 14, no. 4, pp. 1–10, 2014.
[24] A. Fiore, L. Mainetti, L. Manco, and P. Marra, "Augmented reality for allowing time navigation in cultural tourism experiences: A case study," in 1st International Conference on Augmented and Virtual Reality, SALENTO AVR 2014, vol. 8853. Springer Verlag, 2014, pp. 296–301.
[25] M. Epstein and S. Vergani, "Mobile technologies and creative Tourism: The history unwired pilot project in Venice, Italy," in 12th Americas Conference on Information Systems, AMCIS 2006, 2006, vol. 3, pp. 1361–1369.
[26] C.-C. Chiu, W.-J. Wei, L.-C. Lee, and J.-C. Lu, "Augmented reality system for tourism using image-based recognition," Microsyst. Technol., vol. 27, no. 4, pp. 1811–1826, 2021.
[27] A. Ramtohul and K. K. Khedo, "A prototype mobile augmented reality systems for cultural heritage sites," in 5th International Conference on Information System Design and Intelligent Applications, INDIA 2018, vol. 863. Springer Verlag, 2019, pp. 175–185.
[28] V. A. Memos, G. Minopoulos, C. Stergiou, K. E. Psannis, and Y. Ishibashi, "A Revolutionary Interactive Smart Classroom (RISC) with the Use of Emerging Technologies," in 2nd International Conference on Computer Communication and the Internet, ICCCI 2020, 2020, pp. 174–178.
[29] M. Lv, L. Wang, and K. Yan, "Research on Cultural Tourism Experience Design Based on Augmented Reality," in 8th International Conference on Culture and Computing, C and C 2020, held as part of the 22nd International Conference on Human-Computer Interaction, HCII 2020, vol. 12215 LNCS. Springer, 2020, pp. 172–183.
[30] X. Huo, H. Chen, Y. Ma, and Q. Wang, "Design and realization of relic augmented reality system of integration positioning and posture sensing technology," in 1st International Symposium on Water System Operations, ISWSO 2018, 2018, vol. 246.
[31] L. Zhang, W. Qi, K. Zhao, L. Wang, X. Tan, and L. Jiao, "VR games and the dissemination of cultural heritage," in 6th International Conference on Distributed, Ambient and Pervasive Interactions, DAPI 2018, held as part of HCI International 2018, vol. 10921 LNCS. Springer Verlag, 2018, pp. 439–451.
[32] C. De La Nube Aguirre Brito, "Augmented reality applied in tourism mobile applications," in 2015 2nd International Conference on eDemocracy and eGovernment, ICEDEG 2015, 2015, pp. 120–125.
[33] F. Clarizia, S. Lemma, M. Lombardi, and F. Pascale, "An ontological digital storytelling to enrich tourist destinations and attractions with a mobile tailored story," in 12th International Conference on Green, Pervasive and Cloud Computing, GPC 2017, vol. 10232 LNCS. Springer Verlag, 2017, pp. 567–581.
[34] Q. Wu, "Commercialization of digital storytelling: An integrated approach for cultural tourism, the Beijing Olympics and wireless VAS," Int. J. Cult. Stud., vol. 9, no. 3, pp. 383–394, 2006.
[35] A. Briciu, V.-A. Briciu, and A. Kavoura, "Evaluating how 'smart' Brasov, Romania can be virtually via a mobile application for cultural tourism," Sustain., vol. 12, no. 13, 2020.
[36] P. N. Lumpoon and P. Thiengburanathum, "Effects of integrating a mobile game-based learning framework in a cultural tourism setting," in 10th International Conference on Software, Knowledge, Information Management and Applications, SKIMA 2016, 2017, pp. 281–285.
[37] P.-L. Wu, M.-S. Chen, Y.-F. Kao, and L.-L. Guo, "Application of Mobile Navigation Systems in Social Networks of Historic Sites," in 4th IIAI International Congress on Advanced Applied Informatics, IIAI-AAI 2015, 2016, pp. 723–724.
[38] M. S. Panahi, P. Woods, and H. Thwaites, "Designing and developing a location-based mobile tourism application by using cloud-based platform," in 2013 International Conference on Technology, Informatics, Management, Engineering and Environment, TIME-E 2013, 2013, pp. 151–156.
[39] Y. Ping, L. Yang, and S. Cao, "Design and Implementation of Mobile Multimedia System in


Cultural Tourism Field under the Condition of Media Convergence," in 2020 International Conference on Culture-Oriented Science and Technology, ICCST 2020, 2020, pp. 582–586.
[40] T. Kalvet, "Innovative tools for tourism and cultural tourism impact assessment," Sustain., vol. 12, no. 18, 2020.
[41] G. Jun, "Research on the Marketing Innovation of 'live + Short Video' in the Culture and Tourism Industry in We Media Era," in 2021 International Conference on Tourism, Economy and Environmental Sustainability, TEES 2021, 2021, vol. 251.
[42] L. Zhang, C. Zhang, and M. Shi, "Applying Mobile Technology for Developing Cultural and Creative Products in Tourism: A Case Study on the Forbidden City," in 20th IEEE International Conference on Software Quality, Reliability, and Security, QRS 2020, 2020, pp. 542–549.
[43] C. E. Lorea, "Searching for the Divine, handling mobile phones: Contemporary lyrics of baul songs and their osmotic response to globalisation," Hist. Sociol. South Asia, vol. 8, no. 1, pp. 59–88, 2014.


IoT Sensors Integration for Water Quality Analysis

Hermantoro Suparman
Agricultural Engineering Department, Faculty of Agricultural Technology,
Instiper Agricultural University, Yogyakarta, Indonesia
[email protected]

Dominikus Sutrisno
Agricultural Engineering Department, Faculty of Agricultural Technology,
Instiper Agricultural University, Yogyakarta, Indonesia
[email protected]

Ariyanto
Agricultural Engineering Department, Faculty of Agricultural Technology,
Instiper Agricultural University, Yogyakarta, Indonesia
[email protected]

Reza Rahutomo
Information System Department, School of Information Systems,
Bina Nusantara University, Jakarta, Indonesia 11480
[email protected]

Teddy Suparyanto
Bioinformatics & Data Science Research Center,
Bina Nusantara University, Jakarta, Indonesia 11480
[email protected]

Bens Pardamean
Computer Science Department, BINUS Graduate Program - Master of Computer Science,
Bina Nusantara University, Jakarta, Indonesia 11480
[email protected]

Abstract—Water quality data is important for analysis in many domain applications. This research aims to collect water quality data through an Internet of Things (IoT) approach that integrates several sensors and a micro-controller. The research is conducted by constructing a research framework that covers conceptual design, component selection, design realization, and sensor accuracy and precision tests. An integrated sensor with high accuracy and precision is provided as the research outcome. It is suggested that future research explore water quality classification and surpass the limited visualization with a modern method.

Index Terms—Internet of Things (IoT), Integration, Sensors, Water Quality Analysis

I. INTRODUCTION

The rapid development of sensing technology has made the Internet of Things (IoT) an important and well-known model to be implemented in various domains. Gubbi, Buyya, Marusic, and Palaniswami supported this statement, as IoT has already become an influential technology for fulfilling the need for enormous amounts of research data [1]. Soumyalatha and Hegde added that its advantage in facilitating data communication, aligned with wireless networking such as cloud computing [2; 3], enables IoT to play a major role in today's data environment, which puts forward openness, volume, and variety of data [4].

A benefit of collecting natural resource data with an emerging data retrieval method like IoT sensing can be seen in how deep subsequent research can go. For example, Caraka et al. conducted a number of analyses in various areas such as electric load [5], plant patterns [6], and rainfall forecasting [7; 8], mostly using Support Vector Regression (SVR) [9; 10; 11]. Also, the Bioinformatics and Data Science Research Center (BDSRC) has been conducting air quality analysis for some time by utilizing governmental data collection resources. The research center discovered that the Air Quality Index (AQI) in Jakarta showed improvement two weeks after the Large-Scale Social Restriction (LSSR) that was established in March 2020. In addition, it was discovered that the establishment of the LSSR shifted the PM 2.5 apex time without changing the nadir [12; 13]. In 2021, Pardamean et al. continued the research by conducting a chi-squared test to prove that the LSSR contributed to the changes of Jakarta air quality over a longer timeline according to the observed pollutants [14].

As a subset of natural resources management, water quality analysis has been done with limited resources. Research from Pule, Yahya, and Chuma emphasizing water quality monitoring offers wireless sensors with promising benefits, namely affordability and the capability of conducting measurements remotely. However, several limitations in terms of processing power, memory, communication bandwidth, and energy or power must be considered before implementation [15]. Adu-Manu et al. highlighted various sensors for measuring water quality parameters. One can learn that to analyze water quality, amounts of chemical content such as chlorine, calcium salts, and magnesium salts must be examined with pH, turbidity, and TDS (Total Dissolved Solids) sensors [16].

In this research, a prior challenge in water quality analysis is developing a data collection tool that collects important parameters with high accuracy and precision at the same time via the Internet of Things (IoT). The research focuses on designing a tool for multi-purpose water quality analysis in the future, with the usage of a number of sensors determined as studied by Adu-Manu et al. [16]. An integrated sensor with high accuracy and precision is presented as the outcome.

II. RELATED STUDIES

A study from Vijaykumar and Ramya focused on monitoring drinking water quality. Temperature, turbidity, pH, conductivity, and dissolved oxygen sensors are installed on a Raspberry Pi B+ kit prepared for pointing out contamination


in drinking water. Visualization of the examined object can be seen in a web platform that is accessible using either a desktop computer or a smartphone [17].

Motivated by preventing the waste of large amounts of water, Rao, Srinija, Bindu, and Kumar utilized IoT sensors for water monitoring and quality analysis. The research utilized a Raspberry Pi micro controller connected to a number of integrated sensors, namely ultrasonic, turbidity, and pH, which are designed for placement in overhead water tanks. The combination of the sensors successfully detected the water supply and determined the hygiene of the water from pH and turbidity measurements. Data from the integrated sensors is visualized in an Android mobile application with cloud storage empowerment [18].

In order to enhance water quality control in the Bristol Harbour area, Chen and Han proposed an integration of water quality sensors, cloud computing, and IoT technology focused on the future of smart city infrastructure. The research proposed an integration of temperature, dissolved oxygen, and pH sensors to determine the quality of water in Bristol Harbour specifically. The proposed solution successfully detected changes of water quality across seasons remotely. Even so, the sensors sometimes generated unusual measurements that could not be included in further data processing [19].

III. RESEARCH METHODOLOGY

In order to deliver integrated IoT sensors for water quality analysis, the research is based on four main activities. Figure 1 explains the four stages in integrating IoT sensors to analyze the quality of various water samples. It begins with conceptual design, continues with component selection and design realization, and finalizes with the sensor accuracy and precision test.

Fig. 1. Research Framework

A. Water Quality Parameters

This research must be capable of collecting the following water quality parameters:
1) Temperature.
2) pH.
3) Turbidity.
4) Total Dissolved Solids.

Temperature is one of the most significant parameters between seasons [19; 20]. According to a previous survey, pH represents the amount of acidity or alkalinity of a water sample [15]. A normal water sample contains a pH between 6.0 and 9.0. Meanwhile, turbidity indicates the concentration of suspended and colloidal material in a water sample. Measured in nephelometric turbidity units (NTU), drinkable water must contain less than 1 NTU. Lastly, Corwin and Yemoto explained that total dissolved solids refer to chemical contents such as salts, minerals, cations, or anions contained in a water sample [21].

TABLE I
WATER SAMPLES

Sample | Description             | Temp. (°C) | pH (pH) | Turbidity (NTU) | TDS (PPM)
1      | Drinking water          | 30.13      | 6.16    | 0.48            | 132
2      | Ground water with sugar | 29.53      | 1.13    | 20.13           | 206
3      | Tea                     | 29.86      | 5.26    | 81.67           | 274
4      | Ground water with dirt  | 29.50      | 5.40    | 1350            | 354
5      | Coffee                  | 29.73      | 2.66    | 1160            | 545

B. Water Samples

Table 1 shows the five different water samples utilized in this research. Consumable water samples are represented by samples 1, 3, and 5, while samples 2 and 4 are prepared for agricultural purposes. To complete the description of the water samples, laboratory measurements covering temperature, pH, turbidity, and TDS (Total Dissolved Solids) are included as well.

C. Sensor Accuracy and Precision Test

Equations 1 and 2 are utilized to calculate the accuracy and error of the sensors respectively, while equation 3 is utilized to calculate the precision of the sensors. The sensors will be tested with the five different water samples described in Table 1.

Accuracy = (1 − |Lab − Sensor| / Lab) × 100%    (1)

Error = (|Lab − Sensor| / Lab) × 100%    (2)

Where 'Lab' means values collected from laboratory tests, 'Sensor' means values measured by the sensors, and 'Average Sensor' means the value obtained by dividing the sum total of a set of sensor data by the number of sensor recording activities.

IV. RESULT AND DISCUSSION

A. Conceptual Design

Figure 2 illustrates the concept of the integrated sensor for water quality analysis. In the big picture, the concept consists of three parts which cover input sensors, processing units, and visualizations. Four stacks of input sensors covered


temperature, pH, turbidity, and TDS. The sensors measured the water samples, followed by data compilation at the processing unit, which consists of an Arduino Mega 2560 R3 micro controller and a module to stream the data. To support output and simple visualization, a 0.96" wide LCD OLED screen is connected to the micro-controller. We also added a SIM800L module, in preparation for further research, that enables data transmission to a web platform.

Fig. 2. Conceptual Design

B. Component Selection

Table 2 lists the required components for the research. In order to integrate the water temperature, pH, turbidity, and TDS sensors, the Arduino Mega 2560 R3 is chosen as the micro controller, coupled with the SIM800L module for facilitating data streaming to the web platform. In addition, an LCD OLED screen with a 0.96" width is prepared for data visualization. Lastly, several supporting tools are included to support the installation process.

TABLE II
LIST OF COMPONENTS

Component                             | Qty     | Description
Microcontroller Arduino Mega 2560 R3  | 1 unit  | An integrated circuit to govern a specific operation
Sensor pH 4502c                       | 1 unit  | Input sensor for pH
Sensor GE Turbidity                   | 1 unit  | Input sensor for turbidity
Temperature Sensor DS18B20 Waterproof | 1 unit  | Input sensor for water temperature
TDS (Total Dissolved Solids) Sensor   | 1 unit  | Input sensor for TDS
LCD OLED 128x64 0.96"                 | 1 unit  | Output display
SIM800L Module                        | 1 unit  | Data transmitter to web-portal
Push Button Switch                    | 5 units | A simplified electricity mechanism to turn on/off
Printed Circuit Board                 | 1 unit  | Connects electronic components using conductive pathways, tracks, or signals
Socket CB 3 pin                       | 3 units | Lockable metal connector
LM2695 5 volt                         | 1 unit  | Step-down switching regulator
Adaptor 12 V                          | 1 unit  | Power supply
Jump wire                             | 4 units | To complete / by-pass a break in an electrical circuit
LED module                            | 1 unit  | A module that emits light when electrical current passes through

C. Design Realization

Figure 3 illustrates the realization of the integrated sensors. The micro controller, an Arduino Mega 2560 R3, is connected with three different lines to every component: the Data, VCC, and GND lines. The VCC line carries the positive 5V supply voltage while the GND line is the ground return of the 5V supply.

Fig. 3. Design Realization

D. Sensor Accuracy Test

The accuracy of the temperature, turbidity, pH, and TDS sensors on all water samples is listed in Table 3 to Table 6 respectively. Overall, all sensors achieve more than 95% accuracy with less than a 5% error rate.

Table 3 lists the temperature measurements, which were successfully recorded at 99.19% average accuracy. While the water temperature of Samples 1-4 is measured with 99% accuracy, the water temperature of Sample 5 is measured slightly less accurately at 98.92%. In examining the temperature sensor with the water samples, the room temperature must be checked prior to the start for optimal sensor performance.

Turbidity measurements are presented in Table 4 with 96% average accuracy. The accuracy levels range from a minimum of 84.12% (Sample 2) up to 100% (Sample 1). Given this significant range of accuracy, it is necessary to clean the sensor


regularly to prevent miscalculation due to the lack of stability of the GE-type turbidity sensor.

Next, the pH levels of the water samples are listed in Table 5 with 98.20% average accuracy. The pH of water sample 5 is measured with perfect accuracy, while the accuracy for water samples 1-4 varies with minor differences, ranging from 96.08% (Sample 4) to 98.84% (Sample 2). As with the turbidity sensor, regular cleansing is necessary to maintain the accuracy of the pH sensor.

TABLE III
TEMPERATURE SENSOR ACCURACY TEST

Sample  | Lab Result | Sensor Data | Diff. | Accuracy | Error
1       | 30.13      | 29.91       | 0.22  | 99.26 %  | 0.73 %
2       | 29.53      | 29.29       | 0.24  | 99.18 %  | 0.81 %
3       | 29.86      | 29.62       | 0.24  | 99.19 %  | 0.80 %
4       | 29.50      | 29.32       | 0.18  | 99.38 %  | 0.61 %
5       | 29.73      | 29.41       | 0.32  | 98.92 %  | 1.07 %
Average |            |             | 0.24  | 99.19 %  | 0.80 %

TABLE IV
TURBIDITY SENSOR ACCURACY TEST

Sample  | Lab Result | Sensor Data | Diff. | Accuracy | Error
1       | 0.48       | 0.48        | 0     | 100 %    | 0 %
2       | 20.13      | 23.93       | 3.8   | 84.12 %  | 15.87 %
3       | 81.87      | 82.09       | 0.22  | 99.73 %  | 0.26 %
4       | 1350       | 1361        | 11    | 99.19 %  | 0.81 %
5       | 1160       | 1175        | 15    | 98.72 %  | 1.29 %
Average |            |             | 6.04  | 96.00 %  | 4.00 %

TABLE V
PH SENSOR ACCURACY TEST

Sample  | Lab Result | Sensor Data | Diff. | Accuracy | Error
1       | 6.16       | 6.01        | 0.15  | 97.56 %  | 2.43 %
2       | 1.13       | 1.19        | 0.06  | 98.84 %  | 1.16 %
3       | 5.26       | 5.34        | 0.08  | 98.50 %  | 1.52 %
4       | 5.40       | 5.62        | 0.22  | 96.08 %  | 4.07 %
5       | 2.66       | 2.66        | 0.00  | 100 %    | 0.00 %
Average |            |             | 0.10  | 98.20 %  | 1.84 %

TABLE VI
TDS SENSOR ACCURACY TEST

Sample  | Lab Result | Sensor Data | Diff. | Accuracy | Error
1       | 132        | 124         | 8     | 96.55 %  | 3.44 %
2       | 206        | 196         | 10    | 95.14 %  | 4.85 %
3       | 274        | 267         | 7     | 97.44 %  | 2.55 %
4       | 354        | 349         | 5     | 98.58 %  | 1.41 %
5       | 545        | 543         | 2     | 99.63 %  | 0.36 %
Average |            |             | 6.4   | 97.46 %  | 2.52 %

Lastly, the accuracy of the TDS sensor on the water samples is shown in Table 6. While the average accuracy is 97.46%, the sensor's accuracy in testing each sample ranges from 95.14% (Sample 2) to 99.63% (Sample 5). The variation is due to sensor sensitivity being affected by the electrical conductivity of each water sample.

E. Sensor Precision Test

15 trials were conducted on water sample 2 to establish the consistency of the integrated sensor. Table 7 shows the 15 recording trials of all sensors. Although all sensors operate with high precision, the turbidity sensor operates with the lowest precision (94.40 %) while the pH sensor operates with the highest (99.15 %).

TABLE VII
SENSOR PRECISION TEST

Trial         | Temp. (°C) | pH   | NTU   | TDS
1             | 29.12      | 1.19 | 23.93 | 190
2             | 29.06      | 1.19 | 22.19 | 197
3             | 29.06      | 1.20 | 19.97 | 196
4             | 29.00      | 1.16 | 22.19 | 190
5             | 29.00      | 1.18 | 22.19 | 196
6             | 29.00      | 1.17 | 24.42 | 190
7             | 29.00      | 1.16 | 26.62 | 194
8             | 28.94      | 1.18 | 17.74 | 191
9             | 28.94      | 1.18 | 19.97 | 191
10            | 28.94      | 1.19 | 26.64 | 194
11            | 28.94      | 1.21 | 24.42 | 190
12            | 28.94      | 1.20 | 22.19 | 190
13            | 28.87      | 1.18 | 23.93 | 190
14            | 28.94      | 1.22 | 22.19 | 191
15            | 28.94      | 1.21 | 19.97 | 191
Average       | 28.97      | 1.18 | 22.59 | 192
Precision (%) | 98.10      | 99.15| 94.40 | 97.95

F. Discussion

Like the related studies, this research measures the turbidity and pH of water samples. The research differs from the other related studies mainly in its utilization of a different type of micro controller, the Arduino, while all the other water quality studies use the more popular Raspberry Pi. Additional sensors are included in the research design, namely temperature and TDS. All sensors are able to operate with accuracy and precision of up to 99%.

If this research is compared closely with Chen and Han's, this solution is also disrupted by several aspects during field testing. In this case, a modification of the sensor is necessary, as the capability of the turbidity sensor in collecting data is affected by room temperature and light intensity.

Compared to Vijayakumar and Ramya, this research tests more water samples, equipped with a more modern dissolved-solids sensor, named TDS, that indicates how many milligrams of soluble solids are dissolved in one litre of water, instead of a regular dissolved-solids sensor. However, the research focus is limited to integrating the sensors and making sure they operate at high performance. Therefore, visualization is one thing to consider in the future.

Comparing this research to Rao, Srinija, Bindu, and Kumar, it is found that their solution is emphasized in more tasks. Not only water quality analysis, but their solution also provides water supply monitoring. Despite being designed for basic analysis of water quality, this research includes more sensor types. In the scope of visualization, this research relies on the


installed LCD OLED that simplifies the shown information, while their research is completed with the additional use of an Android mobile application instead of a web platform.

The continuation of this research will be focused on several dimensions. The proximity of water quality analysis and IoT applications to various domains such as aquaculture [22] and agriculture [23; 24; 25] becomes the main concentration for further studies. In addition, a web portal is considered as an enhancement on the visualization side to facilitate information delivery to users, especially for learning purposes [26; 27; 28; 29].

V. CONCLUSION

This research successfully combines temperature, turbidity, pH, and TDS sensors with an Arduino micro-controller. Based on the accuracy and precision tests, the proposed integrated sensors score up to 99%. This success will be followed by further research on how to classify water quality and interpret water quality analysis in a comprehensive application.

REFERENCES

[1] J. Gubbi, R. Buyya, S. Marusic, and M. Palaniswami, "Internet of things (iot): A vision, architectural elements, and future directions," Future Generation Computer Systems, vol. 29, no. 7, pp. 1645–1660, 2013.
[2] S. B. Saraf and D. H. Gawali, "Iot based smart irrigation monitoring and controlling system," in 2017 2nd IEEE International Conference on Recent Trends in Electronics, Information Communication Technology (RTEICT), pp. 815–819, 2017.
[3] M. F. Kacamarga, B. Pardamean, and H. Wijaya, "Lightweight virtualization in cloud computing for research," in International Conference on Soft Computing, Intelligence Systems, and Information Technology, pp. 439–445, Springer, 2015.
[4] S. G. H. Soumyalatha, "Study of iot: understanding iot architecture, applications, issues and challenges," in 1st International Conference on Innovations in Computing & Net-working (ICICN16), CSE, RRCE. International Journal of Advanced Networking & Applications, no. 478, 2016.
[5] R. E. Caraka, R. C. Chen, T. Toharudin, B. Pardamean, S. A. Bakar, and H. Yasin, "Ramadhan short-term electric load: a hybrid model of cycle spinning wavelet and group method data handling (csw-gmdh)," IAENG Int J Comput Sci, vol. 46, pp. 670–676, 2019.
[6] R. Caraka, M. Tahmid, R. Putra, A. Iskandar, M. Mauludin, N. Goldameir, H. Rohayani, B. Pardamean, et al., "Analysis of plant pattern using water balance and cimogram based on oldeman climate type," in IOP Conference Series: Earth and Environmental Science, vol. 195, p. 012001, IOP Publishing, 2018.
[7] R. E. Caraka, M. Ulhusna, B. D. Supatmanto, N. E. Goldameir, B. Hutapea, G. Darmawan, D. C. R. Novitasari, and B. Pardamean, "Generalized spatio temporal autoregressive rainfall-enso pattern in east java indonesia," in 2018 Indonesian Association for Pattern Recognition International Conference (INAPR), pp. 75–79, IEEE, 2018.
[8] R. E. Caraka, B. D. Supatmanto, M. Tahmid, J. Soebagyo, M. A. Mauludin, A. Iskandar, and B. Pardamean, "Rainfall forecasting using pspline and rice production with ocean-atmosphere interaction," in IOP Conference Series: Earth and Environmental Science, vol. 195, p. 012064, IOP Publishing, 2018.
[9] R. E. Caraka, S. A. Bakar, B. Pardamean, and A. Budiarto, "Hybrid support vector regression in electric load during national holiday season," in 2017 International Conference on Innovative and Creative Information Technology (ICIMTech), pp. 1–6, IEEE, 2017.
[10] R. E. Caraka, R. C. Chen, T. Toharudin, M. Tahmid, B. Pardamean, and R. M. Putra, "Evaluation performance of svr genetic algorithm and hybrid pso in rainfall forecasting," ICIC Express Lett Part B Appl, vol. 11, no. 7, pp. 631–639, 2020.
[11] R. E. Caraka, R. C. Chen, S. A. Bakar, M. Tahmid, T. Toharudin, B. Pardamean, and S.-W. Huang, "Employing best input svr robust lost function with nature-inspired metaheuristics in wind speed energy forecasting," IAENG Int. J. Comput. Sci, vol. 47, no. 3, pp. 572–584, 2020.
[12] R. Rahutomo, K. Purwandari, J. W. Sigalingging, and B. Pardamean, "Improvement of jakarta's air quality during large scale social restriction," in IOP Conference Series: Earth and Environmental Science, vol. 729, p. 012132, IOP Publishing, 2021.
[13] R. Rahutomo, K. Purwandari, B. Pardamean, and A. A. Hidayat, "South jakarta's air quality using pm 2.5 data at the beginning of covid-19 restriction,"
[14] B. Pardamean, R. Rahutomo, T. W. Cenggoro, A. Budiarto, and A. S. Perbangsa, "The impact of large-scale social restriction phases on the air quality index in jakarta," Atmosphere, vol. 12, no. 7, p. 922, 2021.
[15] M. Pule, A. Yahya, and J. Chuma, "Wireless sensor networks: A survey on monitoring water quality," Journal of Applied Research and Technology, vol. 15, no. 6, pp. 562–570, 2017.
[16] K. S. Adu-Manu, C. Tapparello, W. Heinzelman, F. A. Katsriku, and J.-D. Abdulai, "Water quality monitoring using wireless sensor networks: Current trends and future research directions," ACM Transactions on Sensor Networks (TOSN), vol. 13, no. 1, pp. 1–41, 2017.
[17] N. Vijayakumar and R. Ramya, "The real time monitoring of water quality in iot environment," in 2015 International Conference on Innovations in Information, Embedded and Communication Systems (ICIIECS), pp. 1–5, IEEE, 2015.
[18] K. R. Rao, S. Srinija, K. H. Bindu, and D. S. Kumar, "Iot based water level and quality monitoring system in overhead tanks," International Journal of Engineering & Technology, vol. 7, no. 2, pp. 379–383, 2018.


[19] Y. Chen and D. Han, “Water quality monitoring in smart


city: A pilot project,” Automation in Construction, vol.
89, pp. 307–316, 2018.
[20] M. S. Gradilla-Hernandez, J. de Anda, A. Garcia-Gonzalez, D. Meza-Rodriguez, C. Y. Montes, and Y. Perfecto-Avalos, "Multivariate water quality analysis of lake cajititlan, mexico," Environmental Monitoring and Assessment, vol. 192, no. 1, pp. 1–22, 2020.
[21] D. L. Corwin and K. Yemoto, “Salinity: Electrical
conductivity and total dissolved solids,” Soil Science
Society of America Journal, vol. 84, no. 5, pp. 1442–1461,
2020.
[22] U. Darmalim, F. Darmalim, S. Darmalim, A. A. Hidayat,
A. Budiarto, B. Mahesworo, and B. Pardamean, “Iot
solution for intelligent pond monitoring,” in IOP
Conference Series: Earth and Environmental Science,
vol. 426, p. 012145, IOP Publishing, 2020.
[23] R. Rahutomo, B. Mahesworo, T. W. Cenggoro, A.
Budiarto, T. Suparyanto, D. B. S. Atmaja, B. Samoedro,
B. Pardamean, et al., “Ai-based ripeness grading for oil
palm fresh fruit bunch in smart crane grabber,” in IOP
Conference Series: Earth and Environmental Science,
vol. 426, p. 012147, IOP Publishing, 2020.
[24] B. Samodro, B. Mahesworo, T. Suparyanto, D. B. S.
Atmaja, B. Pardamean, et al., “Maintaining the quality
and aroma of coffee with fuzzy logic coffee roasting
machine,” in IOP Conference Series: Earth and
Environmental Science, vol. 426, p. 012148, IOP
Publishing, 2020.
[25] E. Firmansyah, H. G. Mawandha, B. Pardamean, D. P.
Putra, C. Ginting, and T. Suparyanto, “Development of
artificial intelligence for variable rate application based
oil palm fertilization recommendation system,”
[26] J. W. Baurley, A. S. Perbangsa, A. Subagyo, and B.
Pardamean, “A web application and database for
agriculture genetic diversity and association studies,”
International Journal of Bio-Science and Bio-
Technology, vol. 5, no. 6, pp. 33–42, 2013.
[27] B. Pardamean, J. W. Baurley, A. S. Perbangsa, D. Utami,
H. Rijzaani, and D. Satyawan, “Information technology
infrastructure for agriculture genotyping studies,” Journal
of Information Processing Systems, vol. 14, no. 3, pp.
655–665, 2018.
[28] J. W. Baurley, A. Budiarto, M. F. Kacamarga, and B.
Pardamean, “A web portal for rice crop improvements,”
in Biotechnology: Concepts, Methodologies, Tools, and
Applications, pp. 344–360, IGI Global, 2019.
[29] E. Firmansyah, D. Nurjannah, S. Dinarti, D. Sudigyo, T. Suparyanto, and B. Pardamean, "Learning management system for oil palm smallholder-owned plantations,"


Street View Object Detection for Autonomous Car Steering Angle Prediction Using Convolutional Neural Network

Ilvico Sonata, Yaya Heryadi, Widodo Budiharto, Antoni Wibowo
Computer Science Department, BINUS Graduate Program – Doctor of Computer Science
Bina Nusantara University, Jakarta, Indonesia 11480
[email protected]

Abstract— Autonomous car research is currently developing rapidly to find optimal and accurate steering angle and speed control. Various sensors such as cameras, LIDAR, and RADAR are used to recognize the surrounding environment to determine the correct steering angle prediction in avoiding obstacles. In addition to being expensive, LIDAR and RADAR have several drawbacks, such as a level of accuracy that depends on the weather and a limited ability to detect adjacent objects. This paper proposes the prediction of steering angle and speed control in autonomous cars based on the detection of street view objects such as cars in front, traffic signs, pedestrians, and lane lines. The process of object detection and prediction of steering angle, as well as prediction of speed control, uses a convolutional neural network (CNN) through video captured with a single camera. In this method, other sensors such as LIDAR and RADAR are no longer needed, so the costs required are lower and the weaknesses found in LIDAR and RADAR can be eliminated. The results obtained are very good, with 92% accuracy for steering angle prediction and 85% for speed control prediction. The autonomous car can run well in the simulator environment through video taken on a real road.

Keywords—CNN, object detection, single camera, steering angle

I. INTRODUCTION

An autonomous car must have the ability to detect obstacles such as pedestrians and vehicles in front of it and adjust its steering angle or speed to avoid those obstacles. In addition, autonomous cars must also be able to follow lane lines and traffic signs in order to follow the applicable rules. Without human intervention, autonomous cars must be able to carry out their driving duties properly and provide comfort for their passengers.

Research in the field of artificial intelligence for autonomous car control is increasing along with the need for more effective and efficient models. At the same time, the use of sensors such as infrared cameras, RADAR, LIDAR, and SONAR is increasing to support artificial intelligence decision-making systems despite their many shortcomings [1].

To improve the effectiveness and efficiency of the autonomous car model, the shortcomings of LIDAR and RADAR will be overcome through a CNN deep neural network approach using camera sensors.

The next chapter will discuss the literature review of previous work related to the use of cameras, RADAR, and LIDAR as autonomous driving controllers, the proposed new methods, and the experimental results that have been carried out.

II. LITERATURE REVIEW

In several previous studies, the use of LIDAR and RADAR as sensors to detect the surrounding area has been proposed by Lee et al. [2], Hajri & Rahal [3], and Faraq [4] for estimating the distance between vehicles. However, RADAR and LIDAR have some drawbacks. Based on research conducted by Zimmerman & Wotawa [5], RADAR has the disadvantage that it cannot distinguish close objects, and the accuracy of LIDAR is highly dependent on weather conditions such as rain, snow, dust, and fog.

Several studies using computer vision to detect objects for autonomous cars have also been carried out by previous researchers. For instance, Tarmizi & Aziz [6] and Fan et al. [7] proposed vehicle detection using CNN. Chen & Huang [8] and Hbaieb et al. [9] proposed pedestrian detection using HOG, SVM, and CNN. Pizzati et al. [10] and Mamidala et al. [11] proposed lane line detection using CNN. Alghmgham et al. [12] proposed traffic sign detection using CNN. None of these studies included steering angle predictions in their results.

On the other hand, some previous studies predicting steering angle using artificial intelligence did not cover all the object detection systems described above. Examples include the research conducted by Chishti et al. [13], Kocic et al. [14], Bojarski et al. [15], and Do et al. [16].

This paper will discuss the prediction of steering angle and speed control using deep learning CNN based on the detection of street view objects such as cars in front, pedestrians, lane lines, and traffic signs.
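The two-stage approach summarized above (street-view object detection feeding a steering-angle and brake/gas predictor) can be sketched as a minimal pipeline. This is an illustrative sketch only, not the authors' code: the class names, the `Detection` fields, and the stub detector/predictor are hypothetical stand-ins for the two trained CNNs.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical detection record: a class label plus a bounding box
# (x, y, width, height) in pixel coordinates.
@dataclass
class Detection:
    label: str   # "car", "pedestrian", "lane_line", or "traffic_sign"
    box: tuple

# Hypothetical driving command: steering angle plus active pedal.
@dataclass
class Command:
    steering_angle: float  # degrees; negative = left, positive = right
    pedal: str             # "gas" or "brake"

def drive_frame(frame,
                detect: Callable[[object], List[Detection]],
                predict: Callable[[object, List[Detection]], Command]) -> Command:
    """Stage 1: detect street-view objects in the camera frame;
    Stage 2: predict the steering angle and brake/gas command."""
    detections = detect(frame)
    return predict(frame, detections)

# Stub stand-ins for the trained CNNs, for illustration only.
def stub_detector(frame) -> List[Detection]:
    return [Detection("car", (120, 80, 60, 40))]

def stub_predictor(frame, detections) -> Command:
    # Brake whenever a car is detected ahead, otherwise accelerate.
    ahead = any(d.label == "car" for d in detections)
    return Command(steering_angle=0.0, pedal="brake" if ahead else "gas")
```

In the paper both stages are CNNs; the stubs here only fix the interface between them so the data flow of the proposed method is explicit.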


III. PROPOSED METHODS

CNN deep learning has been widely used for image classification and detection. In general, the block diagram of CNN deep learning can be seen in Fig. 1. The image from the camera is pre-processed first to reduce the pixel size through a reshaping process. The pre-processed image is then processed by convolution to get a feature map, in the form of an array, from the feature extractor. Furthermore, the feature map is reduced without losing important information through a down-sampling process, by max pooling, which takes the maximum value of each window. The output of max pooling is converted into a one-dimensional matrix through the flatten process before being fed into a deep neural network for object detection [17].

Fig. 1. CNN deep neural network (image from camera, pre-processing, convolution, max pooling, flatten, fully connected layer, output).

In order to predict the steering angle and control the speed of autonomous cars, object detection on the street view must be carried out first. So, in this method, two processes are carried out using CNN deep learning, namely object detection and prediction of steering angle and speed control, as shown in Fig. 2.

Fig. 2. The proposed method for steering angle and speed control prediction.

A. Street View Object Detection

In this method, the object detection carried out covers car detection, lane line detection, traffic sign detection, and pedestrian detection.

Before the detection process is carried out, the CNN must be trained to detect the desired objects. This training process requires datasets in the form of images of cars, lane lines, traffic signs, and pedestrians. To create a dataset, the image from the camera must be pre-processed. The image pre-processing steps can be seen in Fig. 3.

Fig. 3. Image pre-processing steps (image from camera, cropping, convert to grey, augmentation).

The image from the camera is pre-processed first to remove unnecessary parts through a cropping process. With this cropping process, the image size becomes smaller and the detection process becomes more focused on the desired object. After the cropping process, a conversion to dark is carried out in order to provide a higher contrast level on the desired object. An augmentation process is also carried out to obtain several combinations of images, such as flipping, rotation, mirroring, and zooming, to produce generalizations of the objects to be detected [18].

After the dataset is created, the next process is to create classes through a labeling process on the dataset. This dataset has 4 classes, namely cars, lanes, traffic signs, and pedestrians. The traffic sign dataset is also divided into 4 further classes, namely do not turn left, do not turn right, forbidden, and others such as U-turn, no parking, and mandatory speed. With this dataset, the CNN will be trained to detect the desired objects as shown in Fig. 4.

Fig. 4. CNN object detection training process (car, lane line, traffic sign, and pedestrian datasets feeding a trained CNN object detection model).

After the training process is completed, a trained CNN model will be obtained to detect cars, lane lines, traffic signs, and pedestrians as shown in Fig. 5.

Fig. 5. CNN object detection process (image from camera, trained CNN object detection model, object detected).
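The max-pooling and flatten steps described for Fig. 1 can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation, just the standard operations the text names, written without any framework for clarity.

```python
from typing import List

def max_pool_2x2(feature_map: List[List[float]]) -> List[List[float]]:
    """Down-sample a 2D feature map by taking the maximum of each
    non-overlapping 2x2 window, as in the max-pooling stage."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - 1, 2):
        row = []
        for c in range(0, cols - 1, 2):
            window = (feature_map[r][c], feature_map[r][c + 1],
                      feature_map[r + 1][c], feature_map[r + 1][c + 1])
            row.append(max(window))
        pooled.append(row)
    return pooled

def flatten(feature_map: List[List[float]]) -> List[float]:
    """Convert a 2D feature map into a one-dimensional vector
    before the fully connected layers."""
    return [value for row in feature_map for value in row]

# A toy 4x4 feature map: pooling halves each spatial dimension
# while keeping the strongest activation in each window.
fm = [[1.0, 3.0, 2.0, 0.0],
      [4.0, 2.0, 1.0, 1.0],
      [0.0, 1.0, 5.0, 2.0],
      [1.0, 0.0, 2.0, 6.0]]
pooled = max_pool_2x2(fm)   # [[4.0, 2.0], [1.0, 6.0]]
vector = flatten(pooled)    # [4.0, 2.0, 1.0, 6.0]
```

In a real CNN these operations run per channel inside the framework; the sketch only shows why pooling reduces the map "without losing important information" in the sense of keeping the maximum response.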


B. Steering Angle and Speed Control Prediction

After the objects contained in the street view, such as cars, lane lines, traffic signs, and pedestrians, can be detected, predictions are made for steering angle and speed control. This prediction process also uses a CNN, where the dataset needed for the training process is an image of the detection result equipped with a label in the form of the steering angle and the active position of the brake or gas pedal. The complete training process can be seen in Fig. 6.

In the early stages, the CNN predicts the steering angle and speed control, through the brake and accelerator positions, based on the detection results for cars, lane lines, traffic signs, and pedestrians. This prediction result is then compared with the actual value from the dataset. The difference between the predicted value and the actual value becomes the input for backpropagation to set the CNN weight values [19]. This process repeats until the predicted value is close to the actual value.

Fig. 6. CNN steering angle and speed control training process (car, lane line, traffic sign, and pedestrian detections feeding a CNN whose steering angle and brake/gas predictions are compared with the actual values to update the weights).

After the training process is complete, the system will generate a steering angle prediction model to determine the direction of the autonomous car, and a brake or gas prediction to determine the speed of the autonomous car, based on the street view object detection.

Fig. 7 shows the proposed steering prediction model and speed control system to control autonomous cars based on the detection of street view objects.

Fig. 7. CNN autonomous car driving model (image from camera, street view object detection, trained CNN, steering angle and brake/gas prediction).

IV. EXPERIMENT AND RESULT

In the CNN training process for street view object detection, 6,500 car images were taken from Udacity (https://s3.amazonaws.com/udacity-sdc/Vehicle_Tracking/vehicles.zip), 7,000 traffic sign images were taken from the German Traffic Sign Recognition Benchmark (GTSRB) (https://benchmark.ini.rub.de/), 6,000 lane images were taken from the Tusimple dataset (https://s3.us-east-2.amazonaws.com/benchmark-frontend/datasets/1/train_set.zip), and 4,000 pedestrian images were taken from VIPER: VISUAL PERSON DETECTION MADE RELIABLE (https://iiw.kuleuven.be/onderzoek/eavise/viper/dataset).

An example of the image pre-processing carried out can be seen in Fig. 8.

Fig. 8. Examples of image pre-processing (cropping and conversion to dark).

The training process used a laptop with an i5-4200U CPU, 12GB of memory equipped with an NVIDIA Geforce 740M


GPU. 80% of the images were used for training and 20% were used for validation. The CNN architecture used consisted of 3 convolution layers, where each layer used max pooling for down-sampling. One flatten layer and 2 fully connected layers were also used to complete this network. The training process used 10 epochs and took 3 hours to complete.

The detection results using the CNN network can be seen in Fig. 9.

Fig. 9. Street view object detection results.

After detecting the street view objects, the next step is to predict the steering angle and speed. Prediction of speed regulation is done by recommending the use of the brake or gas pedal. At this stage, training on the CNN network is carried out to obtain the desired prediction model based on the image dataset containing the results of street view object detection.

The dataset used consisted of 2,000 images of street view object detection results, equipped with a steering angle and the active position of the brake or gas as a label for each image. The same laptop as in the previous training process was used, where 80% of the images were used for training and 20% were used for validation. The CNN network used consisted of 4 convolution layers with max pooling for each layer, one flatten layer, and 2 fully connected layers. The training and validation results using 10 epochs can be seen in Fig. 10 and Fig. 11.

Fig. 10. Training and validation accuracy results.

Fig. 11. Training and validation loss results.

Using a simple Python program, the steering angle prediction display and the speed adjustment recommendations via the brake and gas pedals can be seen in the simulator, taken from video footage. Fig. 12 shows a simulation on a straight road without obstacles in front, so that the steering angle tends to be straight and the gas pedal is pressed. Fig. 13 shows a simulation on a turning road with an obstacle in the form of a car in front, so that the steering angle shows the right-turn position and the brake is pressed.

Fig. 12. Simulator when road conditions were straight and there were no obstacles.

370 28 October 2021, Jakarta - Indonesia


2021 1st International Conference on Computer Science and Artificial Intelligence (ICCSAI)

From the graphs, it can be seen that the accuracy obtained is 92% for steering angle prediction and 85% for speed control prediction. The autonomous car was able to run smoothly in the simulator environment.
Based on the experiments carried out, the number of convolution layers and fully connected layers used did not cause a significant delay in the processing time of the resulting steering angle and speed regulation predictions.

V. CONCLUSION
The proposed model can run well in the simulator
environment through videos captured on the road. The system
can recognize street view objects in the form of cars, lane
lines, traffic signs, and pedestrians to produce steering angle
predictions with 92% accuracy and speed control with 85%
accuracy.
Further development can be done by adding objects to be detected in the street view, such as traffic lights, motorcycles, and rail crossing gates, so that the ability of autonomous cars to avoid obstacles in front of them increases.

Fig. 13. Simulator when the road conditions turned right and there was an obstacle in front of it.

From the experiments carried out, the system was found to be able to follow the line and to avoid a collision with the car in front of it by adjusting the steering angle and speed via gas or brake recommendations. The predicted steering angle also did not lead to roads prohibited by traffic signs.

Fig. 14 shows the comparison between the actual and predicted values of the steering angle and of the speed control through the gas/brake pedal.
[Fig. 14 chart: 'Actual' and 'Prediction' curves for the steering angle, and the gas/brake state, plotted against the video frame number.]

Fig. 14. Actual vs. predictions for steering angle and gas/brake.


Extract Transform Loading (ETL) Based Data Quality for Data Warehouse Development

Munawar
Dept of Computer Science
Esa Unggul University
Jakarta, Indonesia
[email protected]

Abstract—Extract Transform Loading (ETL) plays a decisive role in data warehouse (DW) construction. It involves retrieving information from multiple sources to improve information quality in the DW for the decision making process. A DW development relies on the development of ETL. Therefore, an ETL conceptual model not only represents an overview of the overall process, but also serves as a mapping amongst data sources, DW targets and the required transformations, to make sure that data quality (DQ) dimensions are incorporated in order to meet the requirements. In this paper, an ETL framework is proposed which incorporates data quality to improve information processes in data warehouse development through 'the story' of the process, whilst other frameworks take a more technical approach. In order to be useful, the proposed framework is compared with other frameworks in terms of advantages and disadvantages for future improvement.

Keywords—data quality, data quality incorporation, ETL, style, data warehouse

I. INTRODUCTION
The information in a DW (data warehouse) is mainly obtained from operational systems through a set of processes of extracting, transforming and loading data, called ETL, into the DW. ETL is an important component of a DW, starting with extraction of data from various heterogeneous data sources, transforming those data to the required format and then loading those data into the DW. More than 60% to 80% of the total effort in DW development is allocated to ETL construction [1; 2]. Therefore ETL has a very important role in DW development.
With the large amount of generated data, the fast speed at which data arrives and the diversity of heterogeneous data, the error rate of data is above 30% [3]. This leads to considering the ETL stage as an area plentiful in data quality (DQ) problems [4].
Data Quality (DQ) issues are highly essential matters for consideration in DW projects [5]. Most of the time, however, DW developers neglect the effects of low-quality data on such initiatives [1]. Many organizations are already aware of DQ issues, but their improvement efforts are generally focused on only data accuracy; many other equally important DQ attributes and dimensions are disregarded [6].
Several surveys indicate that a significant percentage of DWs fail to satisfy expectations or are outright failures. Failure rates vary, but these usually average from 20% to 50% [7; 8; 9]. The failure can be traced to a single cause: the absence of quality [10]. These statistics highlight the necessity of developing methods designed to incorporate DQ dimensions that are critical to DW development.
With the large amount of data processed in ETL (by various processes for converting data into DW data; to a certain extent, therefore, these processes also affect DQ), it becomes harder to keep a high quality of data. Wrong data analysis results and wrong figures in reports are examples of low DQ that will affect the day-to-day operation of an organisation. Therefore, using a framework for managing DQ in ETL is necessary to guarantee that high quality levels are maintained.
The risk of a DW project and the DW development time can be reduced sharply by developing a general standard and a good-performance ETL framework [2]. Without a proper ETL framework with DQ consideration, even an accumulation of minor errors can result in loss of revenue, inefficient processes and failure to comply with industry and government regulations (the butterfly effect [11]).
An ETL framework should be made in order to show the overall process and also as a mapping amongst data sources, DW targets and the required transformations, so that the requirements are met and the required structure and content exist.
Mapping of ETL processes is time consuming, prone to failure [12] and costly [13], taking at least a third of the efforts and budget expenses of a DW [12]. Therefore, the feasibility and ease of maintaining a conceptual model of the ETL process is urgently needed. Several ETL design guides have been proposed; however, guides are not enough [12], they indicate subjective decisions [14], and this can lead to difficult-to-maintain ETL processes.
Traditionally, the ETL process has been designed by taking into account a certain vendor tool from the early stage of the DW project life cycle. There is a lack of vendor-neutral and platform-independent approaches for developing ETL processes. To resolve this issue, this work proposes ETL-based DQ for DW development. This framework incorporates all important DQ dimensions in the entire ETL processes, and then ETL visualization can be used to show 'the story' of the ETL process.
This paper is arranged in the following order. The first section presents ETL processes. After that, data quality in the data warehouse is discussed, followed by data quality incorporation in ETL and a sample case study to implement the proposed framework in a real case. Furthermore, results and discussion are presented in the next section. Finally, this paper ends with conclusion and future work.

II. ETL PROCESSES
ETL is a process of integrating data from many sources (usually heterogeneous) into a DW database. Considering the huge amount of data processed and the number of data sources that should be integrated, designing ETL processes is very complicated and considerably time consuming. There are three important parts in the ETL process: extraction, transformation and loading. The following is a detailed explanation of these processes [15]:
a. Extraction: the extract function reads data sources from all types of heterogeneous data and cleanses the data, which is the reason for all the work.
b. Transformation: the transform function works with the extracted data from the above process by some prearranged rules

978-1-6654-4002-8/21/$31.00 ©2021 IEEE


or lookup tables, or by combining the data with other data, and checking redundant, ambiguous and incomplete data to convert it to the desired state/format. Information about data sources and targets has to be known in order to complete the transformation process from the source format to the target DW format.
c. Loading: the load function is used to import the above data into the target DW system, in full or by planned increments, which may or may not previously exist.
A few methods for designing ETL are available [16]. Immediate and deferred ETL designs are options suitable for data extraction. Immediate extraction can be adopted for real-time data extraction or after a transaction is initiated. By contrast, deferred extraction is appropriate when data extraction is not performed immediately after the initiation of a transaction.
Immediate extraction can be classified into three options: extraction through the transaction logs from the database source; extraction through database triggers; and extraction of files from source systems. Immediate extraction using source files imposes additional loads on on-line transaction processing (OLTP) source systems because of the need for extra processing to identify modifications to source data. This extraction method is typically more costly than extraction via transaction logs and database triggers, but it may serve as a last option in case legacy file-oriented systems are used to store data. The most attractive option is data extraction through transaction logs, given that it imposes no extra overhead on OLTP systems. This method is appropriate for cases wherein a database contains all the source data available. Extraction through database triggers is another favourable option because it imposes little extra load on system resources; nevertheless, writing trigger procedures remains a requirement.
Deferred extraction techniques can be categorised into extraction using time stamping and extraction using file comparison programmes. The former is preferable because it is less time consuming than the file comparison technique.
When near-real-time data are needed in a DW, immediate extraction should be adopted in transferring data sources to the DW, also in a near-real-time manner. Otherwise, the deferred technique can be used to minimise negative effects on OLTP source systems.

TABLE I. TRANSFORMATION TYPE IN ETL [15]

ETL Process: Transformation Type
- Extraction: Immediate (transaction log; database trigger; source file); Deferred (time stamp based; file comparison)
- Transformation: manual; automated; manual/automated
- Loading: incremental; full refresh

III. DATA QUALITY IN ETL
In the DW process, data are processed in several phases, wherein each phase causes different changes to the data, which then satisfies user requirements by providing information in the form of a chart or report.
The ETL process starts with data extraction from data sources to be loaded into the DW. To avoid mispopulating data into the DW, formal evaluation is needed to assess the quality of this process. However, to the best of our knowledge, there have been no proposals evaluating the quality of ETL processes. Therefore, we try to ensure the quality of ETL processes by incorporating DQ dimensions in the entire ETL processes. DQ leads to data that is truly fit for business needs: consistent, accurate, complete and unambiguous. Data cleaning leads to activities to define and detect unwanted, inconsistent, incorrect and corrupted data to improve DQ.
The most often cited benefit of DW construction is improvement in information quality for the decision making process [17] through improvement of information processes [18]. In today's era, information plays a decisive role in organizational success. All information obtained by decision makers must be of good quality to support decision making. The literature discusses several initiatives for the inclusion of DQ dimensions in ETL, as summarised in Table 2.

IV. DATA QUALITY INCORPORATION IN ETL
ETL processes can be represented in a high-level description. First, source data (OLTP or legacy systems) are extracted. The extracted data is information that has just been inserted, updated or deleted, which is different from the information obtained in the previous ETL execution. One extract file matches the source table, having the same attributes as the source tables, but with data types similar to the target table in the DW. Transformation of one or more extract files forms a pre-loaded file, whose attributes are the same as the target tables. Both the extract file and the pre-loaded file (as intermediate files) are in flat file format, to assure the unification of ETL performance and to improve the running speed of ETL. Homogenisation, cleansing, filters and checks are the most frequently applied transformations, to ensure that data put into the DW complies with business regulations and integrity constraints, along with schema transformation to assure the suitability of data to the target DW schema. Ultimately, the data are loaded to the central DW and all its counterparts (e.g. data marts/DMs and data views).
Data receiving, entering, integrating, maintaining and processing (extraction, transformation and cleansing) and loading determine the quality of data in ETL. Data quality has to be incorporated in the ETL design in order to keep the DW trustworthy for business users to support their decision making process. ETL plays a major role in data cleansing and the DQ process as it helps automate most of the cleansing process. During extract, transform and load, DQ can be incorporated. Table 2 shows the DQ dimensions that should be considered in ETL.
ETL is a complex process because many goals must be achieved [19]. In this paper, the ETL process is simplified by separating the flow of data containing records of transformation only for attribute transformation. A visualization process can be used to simplify the data flow, keys can be used to represent records, and descriptions of attributes are explained in an accompanying table. The representation of a table (source or target) can be seen in Figure 1a. It represents the following information:
a) table type expresses the type of table, whether database table or flat file
b) type of load shows the loading type of the table, including the following: TI (truncate/insert), I (insert), U (update), M


(merge) or SCDx (slowly changing dimension, where x states the type) if possible.
c) number of rows or size denotes the quantity of projected or actual lines and the overall size of the table in bytes.
The representation of transformation-based recording can be seen in Figure 1b. Each transformation shows what kind of transformation occurs, the key or condition used, the transformation description, the DQ dimensions that can be incorporated as shown in Table 2, and the transformation type as indicated in Table 1.
Regular data refreshment needs to be made for a DW with data collected from different source systems. However, these data must be fully accurate, consistent and correct. In the cleansing phase, data is cleaned and adjusted to the ETL process and then selected, aggregated and aligned to the business process dimension model. Based on the above description, a proposed quality-based framework for ETL can be depicted in Figure 2.

TABLE II. DATA QUALITY DIMENSIONS IN ETL

Information Quality [20], by category:
- Content: Comprehensiveness [24]; Accuracy [27;31;24;29;32;23]; Relevance; Clarity [24]; Applicability [24]
- Soundness: Conciseness [24;26]; Consistency [27;31;24;29;32;26]; Correctness [24;23]; Currency [24;29;26]
- Process: Convenience [26]; Timeliness [27;31;24;29;32;26]
- Access: Accessibility [28;30;1;21;29;26]; Security [28;23]
- Infrastructure: Maintainability [25;24;22]; Speed [24;26]

After the data flow visualization has been obtained, a series of sentences can be decomposed. From one transformation to the next, the combination of the transformation types conducted, the keys and descriptions, and the DQ dimensions incorporated together describes the 'story' of the transformation of data records. The overall description of this process can then be conveyed amongst ETL architects, developers, business users and other interested groups or individuals.

Fig. 1. Visual representation of tables (a) and ETL operation (b)

V. PRACTICAL EVIDENCE
To explain the use of the ETL framework in Figure 2, consider the following illustration. The real condition in the system as shown in Figure 3a is then translated into a DW billing fact table which is correlated with dimensions of student and date, as illustrated in Figure 3b.
Based on the proposed ETL framework as shown in Figure 2, the flow of data is represented by four kinds of files: data sources, extract files, preload files and target tables. In this case, the 'bill' will be processed into the following tables: Bill, Bill_Extract, Bill_Preload and finally loaded to the Billing fact. Data will be loaded in batches every day at night. A visual representation of this ETL process can be seen in Figure 4.

Fig. 2. Proposed quality-based framework for ETL design in data warehouse development
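The student billing flow of Section V (Bill extracted to Bill_Extract, aggregated into Bill_Preload, then loaded to the Billing fact) can be sketched as a minimal pipeline with a completeness check at each hand-off. The field names (ID_bill, ID_student, amount, updated) and the in-memory lists standing in for flat files are illustrative assumptions, not the paper's actual schema.

```python
# Minimal sketch of the Bill -> Bill_Extract -> Bill_Preload -> Billing fact
# flow, with a DQ completeness check and an error table for bad rows.
# Field names and sample data are assumptions for illustration.

def extract(bill_rows, last_run):
    """Extract only rows inserted/updated since the previous ETL execution."""
    return [r for r in bill_rows if r["updated"] > last_run]

def check_completeness(rows, required):
    """DQ check: split rows into complete rows and error rows."""
    ok, errors = [], []
    for r in rows:
        (ok if all(r.get(f) is not None for f in required) else errors).append(r)
    return ok, errors

def transform(rows):
    """Aggregate billing-detail amounts per ID_bill into preload records."""
    preload = {}
    for r in rows:
        p = preload.setdefault(r["ID_bill"], dict(r, amount=0))
        p["amount"] += r["amount"]
    return list(preload.values())

def load(preload_rows, student_dim):
    """Load rows whose student exists in the dimension; route the rest to an error table."""
    fact, error_table = [], []
    for row in preload_rows:
        (fact if row["ID_student"] in student_dim else error_table).append(row)
    return fact, error_table

bills = [
    {"ID_bill": 1, "ID_student": "S1", "amount": 100, "updated": 2},
    {"ID_bill": 1, "ID_student": "S1", "amount": 50, "updated": 2},
    {"ID_bill": 2, "ID_student": "S9", "amount": 70, "updated": 2},
]
extracted = extract(bills, last_run=1)
clean, bad = check_completeness(extracted, ["ID_bill", "ID_student", "amount"])
fact, error_table = load(transform(clean), student_dim={"S1"})
```

Routing the row whose student is missing from the dimension to an error table mirrors the correction plan described later in the DQ discussion.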


Fig. 3. Model of source and target tables

Fig. 4. Visualization of ETL data flow process for student billing

Based on the visualization process in Figure 4, 'the story' of this process can be constructed as follows: the process reads the current day's transactions in billing and billing detail from the database and joins them (after checking their quality, such as completeness), recording the result in the flat file bill_extract. Any time bill_extract receives a new record, it will be aggregated (after checking its quality, such as completeness), grouped by ID_bill. Every bill will be joined (after checking its quality, such as completeness) to recognize the student on the bill. A bill which has all the data needed is recorded to the Billing_Fact table, and billing without a student or with missing data is saved to an error table.

VI. RESULT AND DISCUSSIONS
Populating a DW database with fresh data, extracted from appropriate sources, transformed and cleansed, is the responsibility of ETL, for compliance with target schemata. A series of transformation processes (from simple to complex operations) must be passed by data sources. The question is whether the data resulting from this transformation process is still accurate and reliable after being populated into the DW. The next question is whether the data that should be extracted is really extracted, transformed and loaded.
DQ enforcement is very time-consuming because of the complexity of the process that must be carried out, from parsing data from the source tables to fixing, normalizing and then integrating it into the DW to get a better business picture. During this process DQ can be incorporated as follows:
• Accuracy: accuracy is used to ensure that data is accurately loaded to the target DW as per the expectation, through value comparison between data in the source and the target DW. Common operators are the minus and intersect operators. The result of these operators can be considered as the deviation in value between the target DW and the source system.
• Applicability: specific strategies can be employed to judge applicability of ETL in DW marketing:
a) Auditability. It is essential to be able to trace the path that data takes from source to DW target and to be able to identify any transformations that are applied on values along the way.
b) Subject orientation. Workloads are to be divided into business subject areas rather than source system groupings.
c) Repeatability. Re-run jobs are needed to achieve consistent and predictable results each time.
• Conciseness: removal of unneeded elements (conciseness) is common in ETL, although in fact the users in the university just pay attention to the need for information, not to conciseness.
• Consistency: consistency can be reached through referential integrity across various data sources.
• Correctness: correctness of the data and its structures being transformed can be measured against the rules and requirements of the organization itself.
• Currency: the ETL process is a regular event; DW data refreshments are processed daily. ETL also changes and evolves as the DW evolves, so the ETL process must be designed for ease of modification; the ETL process must be automated and the operational procedure documented.
• Convenience: convenience means correspondence to user need. Therefore, user needs should be the driver of ETL. Input from business users is sometimes needed to deal with issues related to existing rows, as described in the following scenarios. Usually, before facts are processed, it is important to ensure that each dimension table associated with the facts contains at least one row. But the ETL process must be able to handle the unexpected. Therefore, one solution was prepared: save the bad fact row to an error table in order to keep the fact table and dimension tables clean. However, a correction plan was already prepared to correct the system:
a) Error table rows are automatically sent via the surrogate key pipeline when additional data are inserted in the dimension members
b) There is a need to assign specific people to evaluate the errors and communicate them to those concerned for correction
• Timeliness: to ensure data timeliness, the DW is refreshed on a daily basis, which means the users have a one-day delayed view.
• Traceability: traceability can be used in ETL for resolution of source elements in the target models during transformation, in order to avoid repeated mapping execution. Traceability facilitates audit processes and establishes information credibility.
• Accessibility: access to data on demand can be realized by ETL through integration of many data sources.
• Security: for confidential data in the university, security requirements in ETL can be:
a) secure data transfer


b) secure server access
c) secure data access.
Legislative requirements have to be adhered to during the data transformation, while also taking into account any wider implications.
• Maintainability: ease of maintenance is very important in the ETL workflow. The simpler the ETL design, the easier it is to maintain; the more complex the ETL design, the more difficult it is to track changes.
• Speed: speed won't be a serious problem at the beginning due to the small amount of data. However, as the university data grows larger, the ETL process will take progressively longer. The following are scenarios to speed up the ETL:
a) Load data incrementally to improve ETL performance.
b) Partition large tables, such as admission, to improve data processing.
c) Data caching speeds up the process through direct access to memory; otherwise, access to the hard drive will be slower than access to memory.
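Two of the DQ checks discussed above, accuracy via the "minus" (set-difference) comparison of source versus target values, and consistency via referential integrity, can be sketched as follows. All table contents here are made-up examples, not data from the university case study.

```python
# Illustrative sketch of an accuracy check (set difference of source vs.
# target values, mirroring SQL's MINUS operator) and a referential-integrity
# consistency check. Sample keys and values are assumptions.

def accuracy_deviation(source_rows, target_rows):
    """Rows in the source missing from the target, and rows only in the target."""
    src, tgt = set(source_rows), set(target_rows)
    return src - tgt, tgt - src

def referential_violations(fact_keys, dimension_keys):
    """Fact foreign keys that have no matching row in the dimension table."""
    dims = set(dimension_keys)
    return [k for k in fact_keys if k not in dims]

missing, extra = accuracy_deviation(
    {("B1", 150), ("B2", 70)},   # values in the source system
    {("B1", 150), ("B3", 20)},   # values loaded into the target DW
)
violations = referential_violations(["S1", "S9"], ["S1", "S2"])
```

Any non-empty result from either check flags a deviation between the source system and the target DW that should be investigated before the load is accepted.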
In order to be useful, the proposed framework should be compared with other frameworks. Through comparison with other frameworks, the advantages and disadvantages of each framework can be known as the basis for future improvement.

TABLE III. COMPARISON BETWEEN THE ETL PROPOSED FRAMEWORK AND OTHER FRAMEWORKS

DQ incorporated:
- ETL Proposed Framework: comprehensiveness, accuracy, clarity, applicability, conciseness, consistency, correctness, currency, convenience, timeliness, traceability, interactivity, accessibility, security, maintainability, speed
- Automated ETL Testing [33]: completeness, consistency, uniqueness, validity, timeliness, accuracy
- UML activity diagram for modeling ETL [34]: usability and ease of maintenance

DQ incorporation technique:
- ETL Proposed Framework: value comparison, trace the path, business subject areas rather than source system groupings, referential integrity, automated process, save the bad fact row to an error table, partition large tables
- Automated ETL Testing [33]: integrity constraint, field mapping, data freshness, measure aggregation
- UML activity diagram for modeling ETL [34]: no specific explanation

ETL process:
- ETL Proposed Framework: the ETL process shows 'the story' of the process
- Automated ETL Testing [33]: more of a technical approach
- UML activity diagram for modeling ETL [34]: more of a technical approach

Tool used:
- ETL Proposed Framework: flow of processes
- Automated ETL Testing [33]: sequence diagram
- UML activity diagram for modeling ETL [34]: UML activity diagram

Strength(s):
- ETL Proposed Framework: ETL processes can be described more clearly; ETL processes can be understood not only technically by IT people, but also by everyone; ETL process models are more usable
- Automated ETL Testing [33]: automated process; efficient in detecting errors with different data volumes
- UML activity diagram for modeling ETL [34]: ETL maintenance effort can be easily predicted by designers

Based on the comparison between the proposed framework and the other frameworks [33;34], the following conclusions can be drawn:
• The proposed framework uses the flow of process to show 'the story' of the process, whilst the other frameworks take more of a technical approach and are therefore difficult for non-IT people to understand.
• DQ dimensions are incorporated in the entirety of the ETL processes, from data extraction until loading the data into the DW database.
Ease of customisation is the advantage of the proposed framework, because the ETL processes are easy to understand.

VII. CONCLUSIONS AND FUTURE WORK
In this paper, we proposed an ETL framework which incorporates data quality to improve information processes in data warehouse development. To be a single framework, the ETL design of a DW should be treated as an integral part of the DW development. Therefore, our future work is synthesizing our proposed ETL DQ-based framework with our previous DQ-based works in requirements analysis and conceptual design [35], logical design [36] and physical design [37], to be a single framework for DW development as described in [38,39].

REFERENCES
[1] Kimball, R., Ross, M., Thornthwaite, W., Mundy, J. and Becker, B. (2008). The Data Warehouse Life Cycle Toolkit: Practical Techniques for Building Data Warehouse and Business Intelligent Systems. Second Edition. Wiley Publishing, Inc. IN 46256
[2] Li, Lunan. (2010). A Framework Study of ETL Processes Optimization Based on Metadata Repository. 2nd International Conference on Computer Engineering and Technology. 978-1-4244-6349-7/IEEE
[3] Celko, J. and McDonald, J. (1995). Don't warehouse dirty data. Datamation 41(19), 42–53
[4] Conner, D. (2003). Data warehouse failures commonplace. Network World, vol. 20, p. 24.
[5] English, L.P. (1999). Improving Data Warehouse and Business Information Quality. John Wiley & Sons
[6] Prakash, N., Singh, Y. and Gosain, A. (2004). Informational Scenarios for Data Warehouse Requirements Elicitation. In: Atzeni, P., Chu, W., Lu, H., Zhou, S., Ling, T.-W. (eds.) ER 2004. LNCS, vol. 3288, pp. 205–216. Springer, Heidelberg
[7] Agosta, L. (2004). Data Warehousing Lessons Learned: A Time of Growth for Data Warehousing. DM Review Magazine, 2004. Retrieved on 29/3/2011 from http://www.dmreview.com/article_sub.cfm?articleId=1012461
[8] Conner, D. (2003). Data warehouse failures commonplace. Network World, vol. 20, p. 24.
[9] Watson, H., Ariyachandra, T. and Matyska, Jr, R. J. (2001). Data Warehousing Stages of Growth. Information Systems Management, Vol. 18, Issue 3, June 2001, pp. 42–50
[10] Cowie, J. and Burstein, F. (2007). Quality of data model for supporting mobile decision making. Decision Support Systems 43, 1675–1683
[11] Sarsfield, S. (2011). The Butterfly Effect of Data Quality. The Fifth MIT Information Quality Industry Symposium
[12] Simitsis, A. and Vassiliadis, P. (2008). A Method for the Mapping of Conceptual Designs to Logical Blueprints for ETL Processes. DOI:10.1016/j.dss.2006.12.002
[13] Saha, B. and Srivastava, D. (2014). Data Quality: The Other Face of Big Data.
[14] Romero, R., Mazon, J. N., Trujillo, J. and Serano, M. (2009). Quality of Data Warehouses. Encyclopedia of Database Systems, 2230-2235


Spread of COVID-19 Deaths in Jakarta: Cluster and Regression Analysis
1st Intan Saskia, Statistics Department, School of Computer Science, Bina Nusantara University, Jakarta 11480, Indonesia, [email protected]
2nd Ro’fah Nur Rachmawati, Statistics Department, School of Computer Science, Bina Nusantara University, Jakarta 11480, Indonesia, [email protected]
3rd Derwin Suhartono, Computer Science Department, School of Computer Science, Bina Nusantara University, Jakarta 11480, Indonesia, [email protected]

Abstract—Jakarta, as the capital city of Indonesia, has very high mobility and population density. As a result, the spread of COVID-19 cases also shows a very high increasing trend. Regional clustering and the detection of variables that affect COVID-19 deaths can serve as an early warning or as the basis for government policies in handling the spread of disease outbreaks. This study aims to classify areas at the sub-district level in Jakarta based on the distribution of COVID-19 cases using the K-Means method. After the regional clusters were formed, Bayesian regression analysis was carried out in each cluster and sub-district to identify variables that had an effect on COVID-19 deaths. The number of deaths is assumed to have a Normal distribution, and statistical inference in the Bayesian regression uses the Integrated Nested Laplace Approximation (INLA) approach. This study produced several interesting results, including: (1) there are 4 clusters that indicate areas prone to spread, with a high case rate, fairly high risk, low risk, and very low risk; (2) most of Jakarta's sub-districts, about 45%, are included in areas with a fairly high risk of spreading; (3) in general, the number of recovered cases is a significant variable in the majority decrease in the number of COVID-19 deaths in each cluster.

Keywords—K-Means, Bayesian regression, INLA, COVID-19, cluster analysis

I. INTRODUCTION

Corona Virus Disease-19, commonly referred to as Covid-19, comes from the SARS-CoV-2 virus. The WHO (World Health Organization) declared the virus a pandemic on March 11, 2020. Up to May 31, 2021, according to WHO data, there were 170,051,718 confirmed cases of COVID-19 around the world. Each country has a different way of suppressing the spread of Covid-19 cases, and some countries implement a lockdown system, closing access to entry or exit completely.

Corona viruses can infect humans and animals. In humans, the virus attacks the respiratory organs. Symptoms of Covid-19 infection in humans are a fever of 38°C, dry cough, shortness of breath, sore throat or pain when swallowing, and aches or feeling tired [1]. DKI Jakarta is the province with the highest number of cases in Indonesia. Up to May 31, 2021, there were 430,059 positive cases with a mortality rate of 1.7%. Regional clustering and the detection of variables that affect COVID-19 deaths can serve as an early warning or as the basis for government policies in handling the spread of disease outbreaks.

Several researchers have conducted research on Covid-19 globally [2] and nationally [3] using clustering with the K-Means method. K-Means is a non-hierarchical method that divides objects based on their characteristics: objects with almost the same characteristics are put together in the same group/cluster, while objects with different characteristics are combined into another cluster. The cluster is determined by looking at the closest distance between objects; the closest objects become one cluster. Therefore, this study aims to classify areas at the sub-district level in Jakarta based on the distribution of COVID-19 cases using the K-Means method. After the regional clusters were formed, Bayesian regression analysis was carried out in each cluster and sub-district to identify variables that had an effect on COVID-19 deaths.

Section 2 of this paper describes the dataset and its pre-processing, including defining the number of clusters used. The methodology and a brief review of K-Means and Bayesian regression using INLA are described in Section 3. The results and discussion of this research are described in Section 4. Some closing statements and possible developments of this research are described in Section 5.

II. DATASET DESCRIPTION

A. Data

The cluster analysis of the spread of COVID-19 cases in Jakarta uses a dataset of positive, recovered, and death cases. The data are daily cumulative data for each sub-district for the period January 1, 2021 to May 31, 2021. The dataset was obtained from the Jakarta COVID-19 official website.

B. Pre-processing

Data pre-processing in this research includes data preparation, cleaning, normalization, and transformation [4], and also defines the optimal number of clusters K used in this research. In the regression analysis, the death cases, as the response variable, are assumed to have a Normal distribution after a square root transformation.

1) Optimal K Number of Clusters

To find the optimal K value, or number of clusters, many methods can be used, such as the Elbow, Silhouette, GAP Statistics, and NbClust methods. In this research, we use the Elbow method. The optimal K is obtained as shown in Figure 1 [5]. The graph shows that K = 4 is the optimal angle for the number of clusters, because there is a drastic decline followed by a stable graph when the point is at 4.

2) Fit the Distribution for the Response Variable

Finding the distribution of the data is one of the steps needed to use INLA regression. The variable used as the response (dependent) variable is the death cases. The distribution of the data is shown in Figure 2 (left). The histogram shows that the distribution of the data is not symmetrical and tends to be skewed to the right. This indicates that the mean, median, and mode are not the same and are quite far from Normal. In addition, the frequency of the number of cases

978-1-6654-4002-8/21/$31.00 ©2021 IEEE


dying in the range 75-100 is more than 800, and the number of cases with more than 300 deaths has a frequency close to zero.

Fig. 1. Elbow Method

Fig. 2. Histogram of Death Cases (left: dataMeninggal, the raw death counts; right: sqrtMeninggal, after the square root transformation)

The data are transformed using the square root, and a comparison with the previous distribution can be seen in Figure 2 (right). The data become more spread out in variance and closer to a Normal distribution. Having determined that the fitted distribution is Normal, the next step is to perform the INLA regression with the dependent variable assumed to be normally distributed.

III. METHOD

The purpose of clustering is to identify clusters in which the cluster members have similar characteristics and are quite different from the members of other clusters [6, 7]. The research flowchart is represented in Figure 3. Clustering using K-Means and regression analysis for each cluster and sub-district are the fundamental analyses of this research.

Fig. 3. Research Flowchart

A. K-Means

K-Means is the simplest clustering algorithm; it uses K input parameters and then divides n objects into K groups. The variation between groups is large, while the variation among the members within a group is small. The grouping steps are as follows:

a. Choose the number of groups K. Using the Elbow method, K = 4 is selected.
b. Initialize the cluster centers using random values, although there are various other ways of doing the initialization.
c. Allocate the data to the closest group. The clustering is based on the distance between the data and the center of the group. Generally, the distance used is the Euclidean distance concept.
d. Recalculate the group centers from the group members. A group center can be calculated using the group mean or median.
e. Re-allocate the data using the new cluster centers. If the centers of the groups do not change, the grouping process is complete. If a center changes, the process is repeated from step c [9].

B. Euclidean Distance

In a clustering process using the Euclidean distance, data points lying in different directions but at the same distance from the data center will be in the same cluster [10]. The Euclidean distance can be defined as follows:

d(X, Y) = sqrt( Σ_i (x_i - y_i)^2 )   (1)

where X and Y are two classes (data objects).

In [11], the accuracy obtained for the Euclidean distance method is 93.33%; this fairly high accuracy indicates that the Euclidean distance method is good to use.

C. Elbow Method

K-Means depends on K, which needs to be specified to perform the clustering analysis [12]. The optimal K value can be found by various methods. The Elbow method is considered useful for determining K and has the advantage of easy visualization and interpretation of the results. The optimal K value can be seen visually by taking the total within-cluster variation (WSS) as a function of the total number of clusters: the larger the number of groups, the smaller the WSS value. The formula for WSS is:

WSS = Σ_{j=1}^{K} Σ_{x_i ∈ C_j} (x_i - c_j)^2   (2)

where K is the number of clusters, n is the number of objects, x_i is the ith element in a cluster, and c_j is the centroid of the jth cluster.
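The K-Means steps a to e, the Euclidean distance of Eq. (1), and the WSS-based Elbow rule of Eq. (2) can be sketched together in a few lines of Python. This is a minimal illustration on synthetic two-dimensional data, not the paper's Jakarta dataset; the blob layout, seeds, and restart count are arbitrary choices:

```python
import math
import random

def euclidean(x, y):
    # Eq. (1): Euclidean distance between two points
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def kmeans(data, k, seed, iters=100):
    rng = random.Random(seed)
    centers = rng.sample(data, k)  # step b: random initial centers
    groups = [[] for _ in range(k)]
    for _ in range(iters):
        # step c: allocate each point to the closest center
        groups = [[] for _ in range(k)]
        for p in data:
            groups[min(range(k), key=lambda j: euclidean(p, centers[j]))].append(p)
        # step d: recompute each center as the mean of its members
        new_centers = [tuple(sum(v) / len(g) for v in zip(*g)) if g else centers[j]
                       for j, g in enumerate(groups)]
        if new_centers == centers:  # step e: centers stable, grouping is complete
            break
        centers = new_centers
    return centers, groups

def wss(centers, groups):
    # Eq. (2): total within-cluster sum of squared distances
    return sum(euclidean(p, c) ** 2 for c, g in zip(centers, groups) for p in g)

# Elbow method: WSS as a function of K, on data with 4 well-separated blobs
rng = random.Random(7)
data = [(cx + rng.gauss(0, 0.2), cy + rng.gauss(0, 0.2))
        for cx, cy in [(0, 0), (5, 0), (0, 5), (5, 5)] for _ in range(20)]
curve = [min(wss(*kmeans(data, k, seed)) for seed in range(10))  # best of 10 restarts
         for k in range(1, 8)]
# The curve declines steeply up to K = 4 and then flattens: the "elbow".
```

With real data, one would standardize the variables first and read the optimal K off a plotted WSS curve, as the paper does in Figure 1.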

D. Integrated Nested Laplace Approximations (INLA)

The INLA method was developed as an alternative to traditional Markov chain Monte Carlo. INLA is introduced for models that can be expressed as latent Gaussian Markov random fields and can be described through a bridge named the "linear predictor". The linear predictor can include terms for covariates and different types of random effects. The strategy used in INLA is to reformulate the problem as a series of subproblems and to apply the Laplace approximation only to the most Gaussian densities [14, 15].

This method can be divided into three main tasks: first, proposing an approximation π̃(θ | y) to the posterior π(θ | y); second, proposing an approximation π̃(x_i | θ, y) for the marginal of the conditional distribution given the data and the hyperparameters, π(x_i | θ, y); and last, exploring π̃(θ | y) and using it for numerical integration. The posterior marginal approximation returned by INLA then has the following form:

π̃(x_i | y) = Σ_k π̃(x_i | θ_k, y) π̃(θ_k | y) Δ_k,   (3a)

π̃(x_i | θ, y) ∝ π(x, θ, y) / π̃_G(x_{-i} | x_i, θ, y),   (3b)

where x_{-i} denotes the vector x with the element x_i excluded, and π̃(θ_k | y) is the density value calculated during the exploration of π̃(θ | y) [16, 17].

IV. RESULTS AND DISCUSSIONS

The best value of K from the Elbow method is 4; therefore, we form 4 clusters. Detailed area information per cluster has been successfully identified with the K-Means method, and the result is presented in Table 1.

TABLE 1. SUB-DISTRICT DISTRIBUTION IN EACH CLUSTER

Cluster 1 (2 sub-districts): P. Seribu Utara, P. Seribu Selatan.
Cluster 2 (6 sub-districts): Cengkareng, Duren Sawit, Jagakarsa, Tanjung Priok, Cakung, Pasar Minggu.
Cluster 3 (20 sub-districts): Grogol Petamburan, Kali Deres, Kebon Jeruk, Kembangan, Palmerah, Kemayoran, Cilandak, Kebayoran Lama, Pesanggrahan, Tebet, Cipayung, Ciracas, Jatinegara, Kramat Jati, Makasar, Pasar Rebo, Pulo Gadung, Cilincing, Koja, Penjaringan.
Cluster 4 (16 sub-districts): Taman Sari, Tambora, Cempaka Putih, Gambir, Johar Baru, Menteng, Sawah Besar, Senen, Tanah Abang, Kebayoran Baru, Mampang Prapatan, Pancoran, Setiabudi, Matraman, Kelapa Gading, Pademangan.

Fig. 4. Cluster Distribution (confirmed/Positif, recovered/Sembuh, and death/Meninggal cases per cluster)

Based on Figure 4, cluster 1 contains areas with a quite low level of confirmed (positive), recovered, and death cases. Cluster 2 contains areas with a high level of confirmed, recovered, and death cases. Cluster 3 contains areas with a fairly high level of confirmed, recovered, and death cases. Cluster 4 contains areas with a low level of confirmed, recovered, and death cases. With 4 clusters, the data can be presented in a more detailed, more effective, and more informative way.

Figure 5 presents the pie chart of the number of sub-districts in each cluster. It shows that the majority of sub-districts in Jakarta, around 45%, are in Cluster 3, an area with a fairly high risk of COVID-19.

Fig. 5. Number of sub-districts per Cluster

Cluster 1 is included in the low-risk cluster category for COVID-19 cases. This cluster consists of 2 sub-districts, namely P. Seribu Selatan and P. Seribu Utara. Figure 6 represents the map of cluster 1.

Fig. 6. Map of Cluster 1

Fig. 7. Percentage of Significant Variables in Cluster 1


In cluster 1, the confirmed cases variable has a significant effect at a rate of 50%, as shown in Figure 7; that is, in only 1 of the 2 sub-districts, P. Seribu Utara and P. Seribu Selatan. The recovered cases have a significant effect at a rate of 100% in this cluster.

Fig. 8. Percentage of Significant Variables in Cluster 2

Figure 8 shows the percentage of significant variables in cluster 2. There are 3 of 6 sub-districts whose confirmed cases have a significant effect on death, and in 5 of 6 sub-districts the recovered cases have a significant effect on the death of COVID-19 cases.

In Figure 9, the confirmed cases variable has a significant effect in the area with code 1, namely Cengkareng, which is the only sub-district in West Jakarta included in cluster 2 and the only sub-district where confirmed cases is the sole significant variable. The recovered cases are a significant variable in the areas with codes 3, 4, and 6: Pasar Minggu in South Jakarta, Cakung in East Jakarta, and Tanjung Priok in North Jakarta. There are 2 sub-districts where both confirmed and recovered cases are significant variables for death cases, namely the area with code 2, Jagakarsa in South Jakarta, and code 5, Duren Sawit in East Jakarta.

Fig. 9. Map of Cluster 2

Fig. 10. Percentage of Significant Variables in Cluster 3

Figure 10 shows the percentage of significant variables in cluster 3. There are 15 of the 20 sub-districts whose confirmed cases have a significant effect on death, and 17 of the 20 sub-districts whose recovered cases have a significant effect on death.

Fig. 11. Map of Cluster 3

In Figure 11, the red areas with codes 10, 12, and 14 are the sub-districts of Tebet, Ciracas, and Kramat Jati, with significant confirmed cases. Tebet is the only such sub-district in South Jakarta where confirmed cases significantly increase death cases, while Ciracas and Kramat Jati are located in East Jakarta. The green areas are scattered: in West Jakarta with code 2, Kebon Jeruk; in South Jakarta with codes 7 and 8, Cilandak and Kebayoran Lama; in East Jakarta with code 17, Pulo Gadung; and in North Jakarta, Cilincing, with code 18. There are 11 sub-districts colored blue, which indicates that both confirmed and recovered cases have a significant effect on death cases. North Jakarta has 2 such sub-districts, Koja and Penjaringan, with codes 19 and 20. Kemayoran, with code 6, is the only sub-district in Central Jakarta where both confirmed and recovered cases are significant, as is Pesanggrahan, with code 9, the only one in South Jakarta. West Jakarta has 3 such sub-districts, Grogol Petamburan, Kembangan, and Palmerah, with codes 1, 4, and 5. Codes 11, 13, 15, and 16 are located in East Jakarta: Cipayung, Jatinegara, Makassar, and Pasar Rebo.
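The significance results above come from Bayesian regressions whose posterior marginals are computed as in Eq. (3a): a weighted sum of conditional densities evaluated on a grid of hyperparameter values. The sketch below illustrates only that weighted-mixture idea in plain Python, on a hypothetical Gaussian toy problem; it is not the R-INLA implementation the paper actually uses, and the grid, conditional density, and weight function are all illustrative assumptions:

```python
import math

def normal_pdf(x, mu, sd):
    # Density of a Normal(mu, sd) distribution at x
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

# Toy setting: the hyperparameter theta is a noise standard deviation.
# The weights evaluated on the grid stand in for pi~(theta_k | y), mirroring
# INLA's numerical exploration of the hyperparameter posterior.
theta_grid = [0.5 + 0.1 * k for k in range(11)]           # grid points theta_k
delta = 0.1                                               # grid spacing Delta_k
weights = [normal_pdf(t, 1.0, 0.2) for t in theta_grid]   # stand-in for pi~(theta_k | y)
norm = sum(w * delta for w in weights)                    # normalize the weights

def posterior_marginal(x):
    # Eq. (3a): pi~(x | y) = sum_k pi~(x | theta_k, y) * pi~(theta_k | y) * Delta_k
    return sum(normal_pdf(x, 2.0, t) * (w / norm) * delta
               for t, w in zip(theta_grid, weights))

# Sanity check: the mixture is itself a proper density (integrates to ~1)
area = sum(posterior_marginal(-4 + 0.01 * i) * 0.01 for i in range(1201))
```

In R-INLA, the conditional marginals in this sum come from Laplace approximations of the form of Eq. (3b) rather than from a closed-form Gaussian.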

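The per-cluster percentages of significant variables quoted throughout these results are simple shares of sub-district counts. A quick arithmetic check in Python, assuming the counts reported in the paper (1 of 2, 3 of 6, 15 of 20, and 13 of 16 sub-districts for confirmed cases, with the corresponding recovered counts):

```python
# (cluster, sub-districts in cluster, significant confirmed, significant recovered)
counts = [(1, 2, 1, 2), (2, 6, 3, 5), (3, 20, 15, 17), (4, 16, 13, 12)]

percentages = {cluster: (round(100 * conf / n, 2), round(100 * rec / n, 2))
               for cluster, n, conf, rec in counts}
# e.g. cluster 4: confirmed cases significant in 13 of 16 sub-districts -> 81.25%
```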

Fig. 12. Percentage of Significant Variables in Cluster 4

Cluster 4 is a low-risk cluster category for Covid-19 cases and has 16 sub-districts. With a percentage of 81.25%, 13 of the 16 sub-districts have significant confirmed cases, and in 12 of the 16 sub-districts, or 75%, the recovered cases are significant variables for death cases.

Fig. 13. Map of Cluster 4

In Figure 13, the red areas with codes 4, 5, 12, and 14 have significant confirmed cases and are spread across Central Jakarta, namely Gambir and Johar Baru, as well as Pancoran in South Jakarta and Matraman in East Jakarta. Regions in green indicate regions with significant recovered cases, as in Sawah Besar in Central Jakarta with code 7, and in North Jakarta, namely Kelapa Gading and Pademangan with codes 15 and 16. There are 9 sub-districts in the blue area, which means that both confirmed and recovered cases are significant for death cases: 2 sub-districts in West Jakarta with codes 1 and 2, namely Taman Sari and Tambora; 4 sub-districts in Central Jakarta with codes 3, 6, 8, and 9, namely Cempaka Putih, Menteng, Senen, and Tanah Abang; and 3 sub-districts in South Jakarta with codes 10, 11, and 13 for Kebayoran Baru, Mampang Prapatan, and Setiabudi.

TABLE 2. SUMMARY OF SIGNIFICANT REGRESSION VARIABLES

Cluster | Significant Confirmed (%) | Significant Recovered (%) | Sub-districts with Significant Confirmed | Sub-districts with Significant Recovered
1 | 50% | 100% | 1 | 2
2 | 50% | 83.33% | 3 | 5
3 | 75% | 85% | 15 | 17
4 | 81.25% | 75% | 13 | 12

Table 2 summarizes the significance of the regression variables and shows that the effect of confirmed cases differs in each cluster. Confirmed cases have a fairly high percentage in cluster 4, the low-risk category, while recovered cases have the highest influence in cluster 3, the fairly high-risk category. This indicates that confirmed cases have the majority influence on the increase in death cases in areas in the low-risk category, while recovered cases have the majority influence on the decline in deaths in areas in the fairly high-risk category. Overall, however, recovered cases dominate across the clusters and sub-districts, indicating that recovered cases have a significant effect on the majority decrease in death cases.

V. CONCLUSIONS

This study uses the K-Means method to classify areas at the sub-district level based on the spread of COVID-19 cases in Jakarta. This research aims to classify sub-districts from low to high risk levels, and to investigate variables that significantly affect COVID-19 death cases through cluster and regression analysis.

Dividing the data into 4 clusters makes the presentation more effective and informative. The four clusters are cluster 1 with quite low risk, cluster 2 with a high-risk case rate, cluster 3 with fairly high risk, and cluster 4 with low risk of COVID-19 cases. Using the INLA regression method makes it easier to find variables that have a significant effect in each cluster and sub-district. From the results obtained, the majority of Jakarta's sub-districts, around 45%, are included in cluster 3, the category with a fairly high risk of COVID-19 death. In general, in each cluster and sub-district, the number of recovered cases dominates; in other words, it has a significant effect on the majority decrease in death cases.

For further research, additional variables such as humidity, air temperature, and population mobility during the COVID-19 pandemic, with daily data, can deepen the investigation of the relationship between the disease and its environmental conditions, so that the resulting analysis can be more detailed. The map of the analysis results shows that areas in the same cluster tend to be close together, so for further research spatial effects can be taken into account [18, 19, 21], as can other clustering methods such as [20], in order to produce broader information.

REFERENCES

[1] Ministry of Health Republic of Indonesia, 2020.
[2] D. V. C. Chandu, "Identification of spatial variations in COVID-19 epidemiological data using KMeans clustering algorithm: a global perspective," 2020.
[3] N. Dwitri, J. A. Tampubolon, S. Prayoga, P. F. I. R. Zer and D. Hartama, "Penerapan Algoritma K-Means Dalam Menentukan Tingkat Penyebaran Pandemi Covid-19 di Indonesia," Jurnal Teknologi Informasi, vol. 4, pp. 128-132, 2020.
[4] J. Luengo, D. García-Gil, S. Ramírez-Gallego, S. García and F. Herrera, Big Data Preprocessing, Springer, 2020.
[5] C. Shi, B. Wei, S. Wei, W. Wang, H. Liu and J. Liu, "A quantitative discriminant method of elbow point for the optimal number of clusters in clustering algorithm," EURASIP Journal on Wireless Communications and Networking, 2021.
[6] W. Cai, J. Zhao and M. Zhu, "A real time methodology of cluster-system theory-based reliability estimation using k-means clustering," Reliability Engineering and System Safety, 2020.
[7] A. A. Aldino, D. Darwis, A. T. Prastowo and C. Sujana, "Implementation of K-Means Algorithm for Clustering Corn Planting Feasibility Area in South Lampung Regency," Journal of Physics: Conference Series, 2021.
[8] S. P. Patel and S. Upadhyay, "Euclidean distance based feature ranking and subset selection for bearing fault diagnosis," Expert Systems With Applications, p. 16, 2020.
[9] R. Zubaedah, F. Xaverius, H. Jayawardana and S. H. Hidayat, "Comparing euclidean distance and nearest neighbor algorithm in an


expert system for diagnosis of diabetes mellitus," Enfermería Clínica, pp. 374-377, 2020.
[10] M. Ahmed, R. Seraj and S. M. S. Islam, "The k-means Algorithm: A Comprehensive Survey and Performance Evaluation," Electronics, 2020.
[11] F. Liu and Y. Deng, "Determine the number of unknown targets in Open World based on Elbow method," IEEE Transactions on Fuzzy Systems, vol. 29, no. 5, 2021.
[12] R. Gustriansyah, N. Suhandi and F. Antony, "Clustering optimization in RFM analysis based on k-means," Indonesian Journal of Electrical Engineering and Computer Science, vol. 18, pp. 470-477, 2020.
[13] M. R. Mahmoudi, D. Baleanu, Z. Mansor, B. A. Tuan and K.-H. Pho, "Fuzzy clustering method to compare the spread rate of Covid-19 in the high risks countries," Chaos, Solitons and Fractals, pp. 1-9, 2020.
[14] M. Blangiardo and M. Cameletti, Spatial and Spatio-Temporal Bayesian Models with R-INLA, John Wiley & Sons, 2015.
[15] F. Lindgren and H. Rue, "Bayesian spatial modelling with R-INLA," Journal of Statistical Software, vol. 63, no. 19, 2015.
[16] A. Djuraidah, R. N. Rachmawati, A. H. Wigena and I W. Mangku, "Extreme Data Analysis Using Spatio-Temporal Bayes Regression with INLA in Statistical Downscaling Model," International Journal of Innovative Computing, Information and Control, vol. 17, no. 1, 2021.
[17] R. N. Rachmawati, A. Djuraidah, A. H. Wigena and I W. Mangku, "Extreme Data Analysis Using Generalized Bayes Spatio-Temporal Model with INLA for Extreme Rainfall Prediction," ICIC Express Letters, vol. 14, no. 1, 2019.
[18] R. N. Rachmawati and N. H. Pusponegoro, "Spatial Bayes Analysis on Cases of Malnutrition in East Nusa Tenggara, Indonesia," Procedia Computer Science, vol. 179, 2021.
[19] R. N. Rachmawati, "Additive Bayes Spatio-temporal Model with INLA for West Java Rainfall Prediction," Procedia Computer Science, vol. 157, 2019.
[20] M. R. Mahmoudi, D. Baleanu, Z. Mansor, B. A. Tuan and K.-H. Pho, "Fuzzy clustering method to compare the spread rate of Covid-19 in the high risks countries," Chaos, Solitons and Fractals, pp. 1-9, 2020.
[21] R. N. Rachmawati, D. Suhartono and A. Rahayu, "Mapping Sub-Districts-Level and Predicting Spatial Spread of Covid-19 Death Case in Jakarta," Communication in Mathematical Biology and Neuroscience, vol. 2021, 2021.


Indonesian Banking Stock Price Prediction with LSTM and Random Walk Method
1st Mike Christ Heru, Mathematics and Computer Science Department, School of Computer Science, Bina Nusantara University, Jakarta 11480, Indonesia, [email protected]
2nd Ro’fah Nur Rachmawati, Statistics Department, School of Computer Science, Bina Nusantara University, Jakarta 11480, Indonesia, [email protected]
3rd Derwin Suhartono, Computer Science Department, School of Computer Science, Bina Nusantara University, Jakarta 11480, Indonesia, [email protected]

Abstract—Investing in the stock market is challenging for every new investor, as the market always moves in a dynamic way. By using a technical or fundamental analysis approach, an investor can reduce the probability of loss and increase the probability of profit. When one tries to analyze stock market data, many techniques can be used: for example, LSTM, a part of neural networks and machine learning, which needs past data to train a model and tries to give the best prediction based on the model generated from the data. The other technique used in this paper is the Random Walk, taken from the Integrated Nested Laplace Approximation (INLA) library of the R language, which approximates Bayesian inference. Both methods are used to obtain the best prediction result. For comparison, the data can be split into several periods, and from these choices the best result can be selected. As a result, LSTM always produces the best prediction (compared using the RMSEP, the Root Mean Square Error of Prediction), and the more data fed to the model, the lower the RMSE, which is good for the prediction result.

Keywords—stock market, machine learning, LSTM, INLA, random walk

I. INTRODUCTION

Investment always focuses on the rate of return. In the investment scope, the phrase "high risk produces high return" is very important to understand. Since investing in the stock market is one of the high-risk investment instruments, the prediction of a trend, especially in the stock market, is a very challenging task due to the many uncertainties involved and the many variables that influence the market (e.g., economic conditions, investors' sentiments, political events, etc.) [1].

In order to minimize the risk of loss, the investor needs certain skills, such as analysis skills, which can be divided into fundamental analysis and technical analysis. Fundamental analysis examines the underlying forces that affect the well-being of the economy, industry groups, and companies. Technical analysis uses market data to obtain supply and demand data for a certain company or for the entire market [2]. Technical analysis uses previous price movements and/or other market data to assist in the decision-making process in the trading asset market [3].

Technology has developed rapidly, and one example is machine learning. Deep learning, as a form of

entirely different way from the conventional digital (von Neumann) computer [5].

The structure of a neural network itself can be divided into three parts: the input layer, the part that receives the input values; the hidden layer(s), a set of neurons (the smallest units) between the input and output layers; and the output layer, which usually has only one neuron giving the output value [6]. The structure also includes the connections between neurons, which are called weights.

One ANN model that is widely used is Long Short-Term Memory (LSTM), which was introduced back in 1995; after 2016 the model became widely used in many research papers. The impact of the LSTM network has been notable in language modeling, speech-to-text transcription, machine translation, and many other applications [7]. LSTM is one of the most successful RNN (Recurrent Neural Network) architectures; it introduces the memory cell, a unit of computation that replaces traditional artificial neurons in the hidden layer [8]. An RNN, or Recurrent Neural Network, is a type of neural network with closed-loop feedback [9].

Another option for creating a prediction based on a model is the Random Walk method. A Random Walk, in simple terms, is a random movement that does not depend on previous movements. A Random Walk is also a stochastic process created from the successive addition of independent and identically distributed random variables [10].

The Random Walk theory of stock price movement has been the object of considerable academic and professional interest. The theory implies that, statistically, stock-price fluctuations are independent over time and may be described by a random process, e.g., the tossing of a coin or the selection of a sequence of numbers from a random number table [11].

In this paper, we try to make a prediction of the Indonesia Stock Market Exchange, especially the banking sector, using the LSTM and Random Walk methods. The LSTM is created in the Python programming language and the Random Walk (RW) in the R programming language. The RW method is already available in R-INLA, a package in R that does approximate Bayesian inference for latent Gaussian models. Also in this paper, the LSTM will use Adam as the optimization algorithm, which
the machine learning that train a model to make predictions
has a better optimization performance than the Stochastic
from new data, have improved dramatically the ability of
Gradient Descent (SGD) [12].
computers to recognize and label images, recognize and
translate speech, etc. [4] Neural Network or Artificial Neural The purpose of this paper is to evaluate and predict as well
Network (ANN), which is part of deep learning, is a simple the Banking companies stock price movement in IDX using
mathematical model of human brain which computes in an LSTM and RW. Also to analyze the result of the two method

978-1-6654-4002-8/21/$31.00 ©2021 IEEE

385 28 October 2021, Jakarta - Indonesia


2021 1st International Conference on Computer Science and Artificial Intelligence (ICCSAI)

proposed, based on the prediction results, so as to be able to choose the best-performing method and model.

The remainder of the paper is organized as follows. Section II presents the theoretical background of the models used in this paper. Section III reviews previous related work. Section IV describes the methodology. Section V explains the model results and the comparison between the chosen models, and Section VI concludes and summarizes those results.
II. THEORETICAL BACKGROUND

A. Machine Learning

Machine learning (ML) is part of Artificial Intelligence (AI). In general, AI brings a positive impact to daily social and economic activities [13]. ML itself is the part of AI used for task-oriented studies, cognitive simulation, and theoretical analysis [14]. The Neural Network (NN) represents a simple mathematical model of the human brain, motivated by the computation process inside the human brain, which differs from that of a conventional digital computer [15].

B. Recurrent Neural Networks

Recurrent Neural Networks (RNNs) are a class of neural networks which exhibit temporal behaviour due to directed connections between units of an individual layer [16]. RNNs have been widely used since the 1980s, starting with the work of Rumelhart, Hinton, and Williams on learning strings of characters [17]. RNNs are usually used with sequenced data (time series, text, audio).

Fig. 1. Structure of RNN [16]

An RNN consists of an input layer x, followed by a hidden layer s and an output layer y. Adding the time dependence of those layers, we write the input layer as x(t), the hidden layer as s(t), and the output layer as y(t). The functions of x(t), s(t), and y(t) can be written as

    x(t) = w(t) + s(t-1)                                   (4)
    s_j(t) = f( Σ_i x_i(t) u_ji )                          (5)
    y_k(t) = g( Σ_j s_j(t) v_kj )                          (6)
    f(z) = 1 / (1 + e^(-z)),  g(z_m) = e^(z_m) / Σ_k e^(z_k)   (7)

where f and g represent a sigmoid and a softmax activation, respectively [18].

C. LSTM

LSTM is a type of RNN which has gained popularity in time series / sequential analysis. The hidden layer of an LSTM network contains memory cells, which in turn contain three gates (input gate, output gate, forget gate) that are responsible for updates to its cell state [19].

Fig. 2. LSTM cell [16]

The LSTM network can be represented by the following equations:

    f_t = σ( W_f · [h_(t-1), x_t] + b_f )                  (8)
    i_t = σ( W_i · [h_(t-1), x_t] + b_i )                  (9)
    c̃_t = tanh( W_c · [h_(t-1), x_t] + b_c )               (10)
    c_t = f_t ∗ c_(t-1) + i_t ∗ c̃_t                        (11)
    o_t = σ( W_o · [h_(t-1), x_t] + b_o )                  (12)
    h_t = o_t ∗ tanh(c_t)                                  (13)

where f_t is the forget gate result, i_t the input gate, o_t the output gate, c_t the cell state, c̃_t the temporary (candidate) cell state, b the bias vector, W the weight matrix, and h_t and x_t the output result and input, respectively.

D. INLA

The Integrated Nested Laplace Approximation (INLA) is an approach to perform approximate fully Bayesian inference on the class of latent Gaussian models (LGMs) [20]. The computation can be finished in minutes or seconds, while the Markov Chain Monte Carlo (MCMC) technique needs several hours to days [21].

For many years, Bayesian inference has relied upon MCMC methods to compute the joint posterior distribution of the model parameters. This is usually computationally very expensive, as this distribution is often in a space of high dimension [22]. In order to make Bayesian inference faster, Rue et al. [20] suggest focusing on the individual posterior marginals of the model parameters.

INLA itself has three main components that make it work successfully: the Latent Gaussian Model (LGM) framework, the Gaussian Markov Random Field (GMRF), and the Laplace approximation [23].

E. Random Walk

Random Walk is a term introduced by Karl Pearson. A Random Walk can be explained as a random movement which does not depend on the previous movement. The movement is not only described as a movement from one place to another, but can also be defined as a movement in spatial terms, number changes, price movements, etc. A Random Walk is also a stochastic process resulting from successive additions of independent and identically distributed random variables [10]. A Random Walk is used if the data move in discrete points; if the movement happens in continuous time, the Brownian Motion should be used instead.
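As a concrete illustration of the LSTM gate equations (8)-(13), the single time step below is a minimal NumPy sketch; the weight shapes, dictionary keys, and function names are illustrative assumptions, not the paper's code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step following Eqs. (8)-(13).
    W and b hold the four gate parameters, keyed 'f', 'i', 'c', 'o'.
    Each W[k] has shape (hidden, hidden + input)."""
    z = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    f_t = sigmoid(W['f'] @ z + b['f'])       # forget gate, Eq. (8)
    i_t = sigmoid(W['i'] @ z + b['i'])       # input gate, Eq. (9)
    c_tilde = np.tanh(W['c'] @ z + b['c'])   # candidate cell state, Eq. (10)
    c_t = f_t * c_prev + i_t * c_tilde       # new cell state, Eq. (11)
    o_t = sigmoid(W['o'] @ z + b['o'])       # output gate, Eq. (12)
    h_t = o_t * np.tanh(c_t)                 # new hidden state, Eq. (13)
    return h_t, c_t

# Tiny usage example with random (untrained) weights.
rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W = {k: rng.normal(size=(n_hid, n_hid + n_in)) for k in 'fico'}
b = {k: np.zeros(n_hid) for k in 'fico'}
h, c = np.zeros(n_hid), np.zeros(n_hid)
h, c = lstm_step(rng.normal(size=n_in), h, c, W, b)
print(h.shape)  # (4,)
```

Because o_t lies in (0, 1) and tanh maps into (-1, 1), every component of h_t stays inside (-1, 1) regardless of the weights, which is one reason the cell state c_t, rather than h_t, carries long-range information.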


A Random Walk is defined through a series of independent and identically distributed (i.i.d.) random variables over time:

    S_0 = 0
    S_t = S_(t-1) + ε_t                                    (14)

where t = 1, 2, 3, … and ε_t is a series of i.i.d. N(0, 1) random variables. Since S_t follows (14), we can expand it as

    S_t = S_(t-2) + ε_(t-1) + ε_t

and, continuing this way, we get the simpler equation

    S_t = S_0 + ε_1 + ⋯ + ε_t

Since we already know that S_0 = 0,

    S_t = ε_1 + ⋯ + ε_t                                    (15)

with S_t ~ N(0, t).

The first-order Random Walk (RW1) for the Gaussian vector x = (x_1, …, x_n) is constructed using an assumption of independent increments:

    Δx_i = x_i - x_(i-1) ~ N(0, τ^(-1))                    (16)

The density of x is derived from its n - 1 increments as

    π(x | τ) ∝ τ^((n-1)/2) exp{ -(τ/2) Σ_i (Δx_i)² }
             = τ^((n-1)/2) exp{ -(1/2) xᵀQx }              (17)

where Q = τR and R is the structure matrix reflecting the neighbourhood structure of the model. τ is the precision parameter, represented as θ = log τ, and a prior is assigned to θ.

The second-order Random Walk (RW2) for the Gaussian vector x = (x_1, …, x_n) is constructed using an assumption of independent second-order increments:

    Δ²x_i = x_i - 2x_(i+1) + x_(i+2) ~ N(0, τ^(-1))        (18)

The density of x is derived from its n - 2 increments as

    π(x | τ) ∝ τ^((n-2)/2) exp{ -(τ/2) Σ_i (Δ²x_i)² }
             = τ^((n-2)/2) exp{ -(1/2) xᵀQx }              (19)

where Q and τ are derived from the same formulas as for the first-order Random Walk.

III. RELATED WORK

In 2015, a stock market prediction was created using machine learning [24]. The artificial neural network received 10 variables as the input vector: simple 10-day moving average (SMA10), weighted 10-day moving average, momentum, stochastic K%, stochastic D%, Relative Strength Index (RSI), MACD, Larry William's R%, A/D oscillator, and Commodity Channel Index (CCI).

A comparison of the SVR and LSTM algorithms has also been used for stock market prediction in Indonesia [25]. The research was conducted in 2020 and the comparison is based on the Mean Square Error (MSE) value.

The Random Walk is not only used for financial data; its application is very broad, including in the field of statistical hydrometeorology, such as research on the estimation of extreme rainfall patterns in West Java, Indonesia [26], and extreme data analysis using spatio-temporal Bayes regression [27], both of which use a Random Walk as a significant temporal effect.

IV. METHODOLOGY

We need to filter the sample from the population, as the focus of the research is Indonesian banking companies. The main reason for choosing BBRI (PT Bank Rakyat Indonesia Tbk), BBCA (PT Bank Central Asia Tbk), and BJBR (PT Bank Pembangunan Daerah Jawa Barat dan Banten Tbk) is that these three candidates represent each banking sector (using the data of December 31st, 2020). BBRI represents the state-owned enterprises, as BBRI is the biggest state-owned bank in Indonesia. Meanwhile, BBCA is the largest private bank in Indonesia, with total assets of more than 1,075 trillion Rupiah (about 76.5 billion USD), and BJBR is the largest local (province-level) development bank in Indonesia, with total assets of more than 140.93 trillion Rupiah (about 10 billion USD).

Then, the historical price data for these three companies are gathered from the Yahoo Finance site. The historical data period runs from January 1st, 2011 until December 31st, 2020. The LSTM (using Python) can fetch the stock historical price data automatically using the Yahoo Finance library (yfinance), but the Random Walk still needs to import the data from a csv file.

The test dataset should be a one-year data period (2020), as the purpose of the research is to get the prediction for a one-year investment. The train dataset should be at least a one-year and at most a nine-year data period (since the raw data cover 2011 until 2020). Usually, one takes 50-80% of the dataset as the train dataset, but in this research the test dataset must be the one-year 2020 data, and the train dataset comes in several options (2011-2019, 2016-2019, 2018-2019, and 2019 only), which, combined with the one-year test dataset, represent 10, 5, 3, and 2 years of raw data, respectively. These several options are created to make sure that we can get the best option from several possibilities of different timeframes.

Then, each model is fitted with the train data to obtain the optimal model. Prediction on the train data, compared with the real train data, produces the root mean square error (RMSE) of the train data; the same computation on the test data produces the RMSE of Prediction (RMSEP). After we gather all the RMSE and RMSEP values from both methods, we can draw some conclusions based on the research results.

A. Long-Short Term Memory (LSTM)

Once the dataset has been split into train and test sets, the model is created. The model itself comes in several options. The consideration behind trying several possible options is to find the best among them and, if possible, to generalize the parameters used by the best option across different stocks and timeframes. The variations of the testing are separated into 3 parts: models with 1 layer of LSTM, 2 layers, and 3 layers of LSTM inside the neural network. Most of the options also involve a Dropout layer to help reduce the probability of overfitting. One of the options uses only pure LSTM combined with a Dense layer (without Dropout) to get a comparison with the models that use a Dropout layer. The rate of the Dropout layer may differ between tests, but in general the tests involve Dropout layers with rates of 0.3 and/or 0.6.

To get an additional comparison, we add GridSearchCV to get the optimal result from several options of parameters (e.g. number of epochs, validation split ratio, and batch size). Obviously, when using GridSearchCV, the runtime of each test is longer than the regular test runtime, because GridSearchCV tries all the possible parameter combinations.
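Equations (14)-(15) can be simulated directly. The sketch below is an illustrative NumPy version (not the paper's R-INLA code): it builds a Gaussian random walk as the cumulative sum of i.i.d. N(0, 1) increments.

```python
import numpy as np

def simulate_random_walk(t_max, seed=42):
    """Simulate S_t = S_{t-1} + eps_t with S_0 = 0 (Eq. 14),
    eps_t i.i.d. N(0, 1), so that S_t = eps_1 + ... + eps_t (Eq. 15)."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(t_max)
    # Prepend S_0 = 0, then accumulate the increments.
    s = np.concatenate([[0.0], np.cumsum(eps)])
    return s

walk = simulate_random_walk(250)   # roughly one trading year of steps
print(walk[0], walk.shape)         # 0.0 (251,)
```

By (15), Var(S_t) = t, so individual sampled paths spread out in proportion to sqrt(t); this is also why a posterior mean of an RW model plots as a smooth line while posterior samples fluctuate.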


After fitting produces the best model, we continue the process with prediction. The model predicts both the train and test datasets. The result of predicting the train dataset is compared with the real data and the root mean square error is computed, producing the RMSE of the train/fit. The same goes for the test dataset, which produces the RMSE of Prediction (RMSEP). To present the data more objectively, plotting the results is a good choice, which makes it easier for the reader to understand and compare the results.

B. Random Walk (RW)

The process line of the Random Walk is simpler. The process begins with reading the csv file that contains all the historical price data of a certain stock. Before running the model, we need to make sure that all the important parameters have been declared in their variables.

The next step is to declare the formula of the model and fit the model (as train_result). Given the data and the choice of model, train_result contains a model that can be used to give a random prediction for the next period of time (in this case one year of data: 2020). The prediction procedure is almost no different from the fitting procedure, except that the prediction data are set to null values, since the "next" data are to be predicted using the model.

Then, the prediction result should be translated using posterior samples, because if we translate the result directly to a plot, the result will be a straight line or a half-parabolic curve. By using posterior samples, the result is more realistic, since it has fluctuating movement.

V. RESULTS

A. Data

The raw data from Yahoo Finance are retrieved using the yfinance library (for the LSTM method) and downloaded manually as csv files (for RW). We also need to take the appropriate columns and store them in the corresponding variables. In this case, the dataset contains the closing price (the raw file contains the date, opening price, highest price, lowest price, closing price, and transaction volume). The data can be described as follows: 1) Stock name / company: PT Bank Rakyat Indonesia Tbk (BBRI), PT Bank Central Asia Tbk (BBCA), and PT Bank Pembangunan Daerah Jawa Barat dan Banten Tbk (BJBR). 2) Number of rows: all rows from 2011-2020 consist of 2,486 rows, and the 2020 data alone consist of 241 rows. The 1-year train dataset (full 2019 period) consists of 258 rows, the 2-year train dataset (2018-2019) of 519 rows, the 4-year train dataset (2016-2019) of 1,019 rows, and the 9-year train dataset (2011-2019) of 2,245 rows.

B. Results

Before arriving at the results in Table II, we gathered the results from every test case created earlier. For the LSTM model, there are 22 test cases in total: 7 test cases use one layer of LSTM, 7 use three layers, and 8 use two layers of LSTM. Most of the test cases use a Dropout layer after each LSTM layer and Dense(1) as the last layer to get the expected output. In total, there are 88 test cases for each stock, since the 4 different train dataset periods are combined with the 22 test cases each. For further explanation of the LSTM test cases, please refer to Table I.

TABLE I. SAMPLE OF LSTM PARAMETERS AND LAYERS

Test Case | Parameters and layers
1         | LSTM(100), DO 0.3, Dense 1, Epochs 500, Batch size 32, Validation split 0.3
5         | LSTM(100), DO 0.3, Dense 1, Epochs 1000, Batch size 32, Validation split 0.3
6         | LSTM(100), DO 0.6, Dense 1, Epochs 1000, Batch size 32, Validation split 0.3
8         | LSTM(100), DO 0.3, LSTM(100), DO 0.3, Dense 1, Epochs 500, Batch size 32, Validation split 0.3
9         | LSTM(100), LSTM(50), Dense(10) relu, Dense(1) linear, Epochs 500, Batch size 32, Validation split 0.3
16        | LSTM(100), DO 0.3, LSTM(100), DO 0.3, LSTM(100), DO 0.3, Dense 1, Epochs 500, Batch size 32, Validation split 0.3
21        | LSTM(100), DO 0.3, LSTM(100), DO 0.3, LSTM(100), DO 0.3, Dense 1, Epochs 1000, Batch size 32, Validation split 0.3

For the RW, there are two methods, RW1 and RW2, which also use the 4 different periods combined with a total of 5 test cases each; the reported result is the average value over those 5 test cases.

TABLE II. RESULTS TABLE

Model |       | BBRI     | BBCA     | BJBR
LSTM  | RMSE  | 0.010696 | 0.006906 | 0.012707
      | RMSEP | 0.021312 | 0.017625 | 0.011419
RW1   | RMSE  | 0.000884 | 0.001283 | 0.000646
      | RMSEP | 0.576570 | 0.546065 | 0.361881
RW2   | RMSE  | 0.027613 | 0.016703 | 0.033218
      | RMSEP | 1.446038 | 0.708617 | 1.019239

Table II shows the best result from each model for each stock. We found that both the LSTM and the RW models (RW1 and RW2) produce their best results with the 9-year train dataset (2011-2019 period). For the LSTM model, the results are all obtained from the 3-layer LSTM. The plotting results for each model (LSTM, Random Walk 1, Random Walk 2) are presented below (Figures 3-11).

Fig. 3. BBRI stock prediction result of the test dataset 2020 using LSTM

Fig. 4. BBRI stock prediction result of the dataset 2011-2020 using Random Walk 1
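The RMSE / RMSEP comparison above boils down to a single formula applied to scaled prices. The helper below is an illustrative sketch, not the paper's code; in particular, the min-max normalization is an assumption made here to explain why the reported errors fall on a 0-1 scale, since the paper does not state its normalization.

```python
import numpy as np

def minmax_normalize(prices):
    """Scale a price series to [0, 1] (assumed preprocessing)."""
    p = np.asarray(prices, dtype=float)
    return (p - p.min()) / (p.max() - p.min())

def rmse(actual, predicted):
    """Root mean square error between two equal-length series."""
    a, p = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.sqrt(np.mean((a - p) ** 2)))

# Toy usage: RMSE on the fit period plays the role of "RMSE",
# RMSE on the held-out 2020 period the role of "RMSEP".
actual = minmax_normalize([3400, 3450, 3500, 3475, 3525])
predicted = actual + np.array([0.01, -0.01, 0.02, 0.0, -0.02])
print(round(rmse(actual, predicted), 6))  # 0.014142
```

On this normalized scale, the Table II values mean the best LSTM predictions deviate by roughly 1-2% of each stock's 10-year price range, while the RW prediction errors are a large fraction of that range.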


Fig. 5. BBRI stock prediction result of the dataset 2011-2020 using Random Walk 2

Fig. 6. BBCA stock prediction result of the test dataset 2020 using LSTM

Fig. 7. BBCA stock prediction result of the dataset 2011-2020 using Random Walk 1

Fig. 8. BBCA stock prediction result of the dataset 2011-2020 using Random Walk 2

Fig. 9. BJBR stock prediction result of the test dataset 2020 using LSTM

Fig. 10. BJBR stock prediction result of the dataset 2011-2020 using Random Walk 1

Fig. 11. BJBR stock prediction result of the dataset 2011-2020 using Random Walk 2

TABLE III. RMSE AND RMSEP COMPARISON BY LENGTH OF TRAIN DATASET PERIOD

Stock Name | Model | Train dataset (No. of Test) | RMSE     | RMSEP
BBRI       | LSTM  | 1 (5)                       | 0.023868 | 0.033683
           |       | 2 (12)                      | 0.023061 | 0.034599
           |       | 4 (2)                       | 0.016859 | 0.028043
           |       | 9 (5)                       | 0.010696 | 0.021312
           | RW1   | 1                           | 0.001229 | 0.989123
           |       | 2                           | 0.001142 | 0.805056
           |       | 4                           | 0.001049 | 0.608940
           |       | 9                           | 0.000884 | 0.576570
           | RW2   | 1                           | 0.079346 | 3.309870
           |       | 2                           | 0.076676 | 2.877598
           |       | 4                           | 0.046995 | 1.984291
           |       | 9                           | 0.027613 | 1.446038
BBCA       | LSTM  | 1 (5)                       | 0.022515 | 0.037593
           |       | 2 (6)                       | 0.014824 | 0.027759
           |       | 4 (1)                       | 0.011672 | 0.022242
           |       | 9 (5)                       | 0.006906 | 0.017625
           | RW1   | 1                           | 0.000903 | 1.417132
           |       | 2                           | 0.001283 | 1.286490
           |       | 4                           | 0.001460 | 0.771824
           |       | 9                           | 0.001283 | 0.546065
           | RW2   | 1                           | 0.089042 | 6.414885
           |       | 2                           | 0.057918 | 1.888152
           |       | 4                           | 0.026919 | 1.764594
           |       | 9                           | 0.016703 | 0.708617
BJBR       | LSTM  | 1 (5)                       | 0.021807 | 0.019997
           |       | 2 (8)                       | 0.016049 | 0.011694
           |       | 4 (1)                       | 0.016606 | 0.011682
           |       | 9 (5)                       | 0.012707 | 0.011419
           | RW1   | 1                           | 0.001847 | 0.564671
           |       | 2                           | 0.002398 | 0.435946
           |       | 4                           | 0.000744 | 0.392123
           |       | 9                           | 0.000646 | 0.361881
           | RW2   | 1                           | 0.041567 | 1.658118
           |       | 2                           | 0.038364 | 1.439865
           |       | 4                           | 0.042495 | 1.485498
           |       | 9                           | 0.033218 | 1.019239

For further reference, Table III describes the best result for every train dataset period used. The data show that the longer the train dataset, the better the result produced.
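That trend can be checked directly on the Table III numbers. The snippet below is a small illustrative check using the BBRI RW1 RMSEP column copied from the table; note that the trend is not strictly monotone for every column (e.g. the BBCA RW1 RMSE and the BJBR 4-year LSTM RMSE tick upward), so it is best read as a general tendency.

```python
# RMSEP values for BBRI with RW1, copied from Table III,
# keyed by train-dataset length in years.
rmsep_bbri_rw1 = {1: 0.989123, 2: 0.805056, 4: 0.608940, 9: 0.576570}

# Check the claim: a longer train period gives a lower prediction error.
values = [rmsep_bbri_rw1[k] for k in sorted(rmsep_bbri_rw1)]
is_monotone_decreasing = all(a > b for a, b in zip(values, values[1:]))
print(is_monotone_decreasing)  # True
```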


This applies not only to the LSTM, but also to the first-order and second-order Random Walk.

VI. CONCLUSION AND FUTURE WORK

This paper conducted an experiment comparing the prediction results of the LSTM and Random Walk models to get the best result in predicting Indonesian stock market prices. Some findings and insights were discovered: 1) Normalization for the Random Walk is necessary, because the program could stop due to a run-time error, especially for computations with large numbers. This problem occurred when we used the BBCA dataset (which has the largest values) and was solved after translating every row of data by the mean value of the data. 2) The more data used for the prediction, the lower the RMSE of fitting and of prediction (RMSEP). This happens because when the knowledge available to the model is large enough, the model can produce good weight matrices that yield an optimal result.

Therefore, we can conclude that the LSTM model produces the best result compared to both the first-order and second-order Random Walk models. The Random Walk cannot produce a better RMSEP than the LSTM because INLA itself computes the posterior marginal distributions: when we plot the prediction results for the future as posterior expectations, we get straight lines from both the first-order and second-order Random Walk. If we want to modify the result, we can use the function inla.posterior.sample(), which creates n samples (n being the number of samples to draw, set by the user) from the approximate joint posterior distribution of the latent effects and the hyperparameters. Each sample can be a different set of data, since the samples also implement the Random Walk model; these always-different sets of data directly cause a different RMSEP for each sample when compared with the real data.

The LSTM in this case is an effective model for predicting patterns in large amounts of data. However, there are some limitations in this research that may be addressed in future work. The stock prediction only focused on comparing the best model between LSTM and Random Walk, not on predicting a long period such as one full year of future data. Also, external conditions like geopolitical issues, pandemics (e.g. the Covid-19 global pandemic), homeland security, etc. may affect the stock price movement.

ACKNOWLEDGMENT

The authors want to thank Finn Lindgren and Håvard Rue for helping the authors finish the Random Walk part, from code generation to fixing errors.

REFERENCES

[1] Khaidem, L., Saha, S., & Dey, S. R. (2016). Predicting the direction of stock market prices using random forest. ArXiv, abs/1605.00003.
[2] AS, S. (2013). A study on fundamental and technical analysis. International Journal of Marketing, Financial Services & Management Research, 2(5), 44-59.
[3] Nti, I. K., Adekoya, A. F., & Weyori, B. A. (2019). A systematic review of fundamental and technical analysis of stock market predictions. Artificial Intelligence Review, 1-51.
[4] Heaton, J. B., Polson, N. G., & Witte, J. H. (2017). Deep learning for finance: deep portfolios. Applied Stochastic Models in Business and Industry, 33(1), 3-12.
[5] Csáji, B. C. (2001). Approximation with artificial neural networks. Faculty of Sciences, Eötvös Loránd University, Hungary, 24(48), 7.
[6] Kukreja, H., Bharath, N., Siddesh, C. S., & Kuldeep, S. (2016). An introduction to artificial neural network. Int J Adv Res Innov Ideas Educ, 1, 27-30.
[7] Sherstinsky, A. (2020). Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network. Physica D: Nonlinear Phenomena, 404, 132306.
[8] Roondiwala, M., Patel, H., & Varma, S. (2017). Predicting stock prices using LSTM. International Journal of Science and Research (IJSR), 6(4), 1754-1756.
[9] Fausett, L. V. (1994). Fundamentals of Neural Networks. Prentice-Hall.
[10] Lawler, G. F., & Limic, V. (2010). Random Walk: A Modern Introduction (Cambridge Studies in Advanced Mathematics). Cambridge: Cambridge University Press. doi:10.1017/CBO9780511750854
[11] Van Horne, J. C., & Parker, G. G. (1967). The Random-Walk Theory: An Empirical Test. Financial Analysts Journal, 23(6), 87-92.
[12] Zhang, Z. (2018, June). Improved adam optimizer for deep neural networks. In 2018 IEEE/ACM 26th International Symposium on Quality of Service (IWQoS) (pp. 1-2). IEEE.
[13] Lu, H., Li, Y., Chen, M., Kim, H., & Serikawa, S. (2018). Brain intelligence: go beyond artificial intelligence. Mobile Networks and Applications, 23(2), 368-375.
[14] Mitchell, R., Michalski, J., & Carbonell, T. (2013). An artificial intelligence approach. Berlin: Springer.
[15] Haykin, S. (2009). Neural Networks and Learning Machines. Ontario: Pearson.
[16] Karim, F., Majumdar, S., Darabi, H., & Chen, S. (2017). LSTM Fully Convolutional Networks for Time Series Classification. IEEE Access, 6, 1662-1669.
[17] Medsker, L. R., & Jain, L. C. (2001). Recurrent Neural Networks: Design and Applications. CRC Press.
[18] Mikolov, T., Karafiát, M., Burget, L., Černocký, J., & Khudanpur, S. (2010). Recurrent neural network based language model. Eleventh Annual Conference of the International Speech Communication Association.
[19] Ojo, S. O., Owolawi, P. A., Mphahlele, M., & Adisa, J. A. (2019). Stock Market Behaviour Prediction using Stacked LSTM Networks. 2019 International Multidisciplinary Information Technology and Engineering Conference (IMITEC), 1-5.
[20] Rue, H., Martino, S., & Chopin, N. (2009). Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71(2), 319-392.
[21] Martins, T. G., Simpson, D., Lindgren, F., & Rue, H. (2013). Bayesian computing with INLA: new features. Computational Statistics & Data Analysis, 67, 68-83.
[22] Gómez-Rubio, V. (2020). Bayesian inference with INLA. CRC Press.
[23] Wang, X., Yue, Y. R., & Faraway, J. J. (2018). Bayesian regression modeling with INLA. Chapman & Hall/CRC.
[24] Patel, J., Shah, S., Thakkar, P., & Kotecha, K. (2015). Predicting stock market index using fusion of machine learning techniques. Expert Systems with Applications, 42(4), 2162-2172.
[25] Arfan, A., & Lussiana, E. T. P. (2020). Perbandingan Algoritma Long Short-Term Memory dengan SVR Pada Prediksi Harga Saham di Indonesia [Comparison of the Long Short-Term Memory Algorithm with SVR for Stock Price Prediction in Indonesia]. Petir: Jurnal Pengkajian dan Penerapan Teknik Informatika, 13(1), 33-43.
[26] Rachmawati, R. N. (2021). Estimation of Extreme Rainfall Patterns Using Generalized Linear Mixed Model for Spatio-temporal Data in West Java, Indonesia. 5th International Conference on Computer Science and Computational Intelligence 2020, 179, 330-336. https://doi.org/10.1016/j.procs.2021.01.013
[27] Djuraidah, A., Rachmawati, R. F. N., Wigena, A. H., & Mangku, I. W. (2021). Extreme Data Analysis using Spatio-Temporal Bayes Regression with INLA in Statistical Downscaling Model. International Journal of Innovative Computing, Information and Control, 17(1), 259-27.


Exploration of React Native Framework in designing a Rule-Based Application for healthy lifestyle education

Anik Hanifatul Azizah, Information System, Universitas Esa Unggul, Jakarta, Indonesia ([email protected])
Siti Zuliatul Faidah, Informatics Engineering, Universitas Esa Unggul, Jakarta, Indonesia ([email protected])
Muhammad Bahrul Ulum, Informatics Engineering, Universitas Esa Unggul, Jakarta, Indonesia ([email protected])
Putri Handayani, Public Health, Universitas Esa Unggul, Jakarta, Indonesia ([email protected])

Abstract— Research indicates that the implementation of hybrid applications is more profitable as a mobile application development solution. Hybrid applications combine the advantages of web and native applications. React Native is a hybrid framework for developing mobile applications. The React Native framework can create applications for two platforms by compiling code written in React. The utilization of React Native in a rule-based application can provide a solution for healthy lifestyle education. The aim of this study is to build a rule-based application for healthy reminders in daily activities. By developing an application in React Native, the study designs a comprehensive mobile application that is easy for users to use. Moreover, this study explores and constructs an application that guides the user in maintaining their daily health. To develop the application, the authors use the waterfall methodology. Before building the application, a systematic survey was conducted to gain relevant data from the users and also to devise a rule base that forms the reasoning of the application design. The results indicate that the React Native framework can be utilized in building a reminder application for healthy lifestyle education. This study built a mobile application product that has good performance and helps users change their lifestyle in order to improve the quality of a healthy lifestyle.

Keywords— Rule-based application, React Native, daily reminder, healthy lifestyles.

I. INTRODUCTION

A healthy lifestyle is a pattern of living habits that adheres to the principle of maintaining health. Consuming sufficient mineral water in a day is one form of effort to maintain health [1]. In addition, Mander (2012) revealed that exercising, eating a balanced diet, and getting adequate rest are other efforts to maintain health [2]. Our bodies need calories from the food we eat as a source of energy to carry out daily activities; an excess or a lack of calories in the body is not good for health, and if the body lacks calories it will become weak and underweight. The application of a healthy lifestyle needs to start early so that it has a positive impact on the body [3] [4].

According to research in Pacitan district, 78.6% of respondents consume insufficient water [5]. In another study by Jaka, Darwin, and Ari (2015), as many as 82.2% of a sample of 197 Riau students had poor sleep hours [6]. Furthermore, research conducted by Rony, Eka, and Elda (2015) found that 81.93% of medical student respondents at the University of Riau did not exercise regularly. Based on the above, many people in Indonesia are still not aware of the application of a healthy lifestyle [7]. Information about, and reminders of, the lifestyle carried out by the community are still lacking, regarding for example the lack of awareness of drinking water, sports activities, lack of sleep, and the number of calories of food consumed versus the number of calories burned during activities. Therefore, it is not uncommon for many people of productive age to be infected with various diseases.

Marpaun (2018) states that humans cannot be separated from smartphones, which have become part of their lifestyle [8]. The growth of the mobile platforms (Android, iOS, Windows Phone) causes the development of native mobile applications to be inefficient both in terms of time and development costs [9] [10] [11]. Therefore, the implementation of hybrid applications is more promising as a mobile application development solution. React Native is a hybrid framework for developing mobile applications [12] [13] [14] [15].

Based on the description previously explained, the purpose of this research is to create an application, namely the Exploration of React Native Framework in designing a Rule-Based Application for healthy lifestyle education. Regarding this purpose, some research questions are derived as follows:
1. How to design a system to determine the health quality of users by utilizing React Native technology.
2. How users can understand and monitor health quality to implement a healthy lifestyle.
3. How to build a system for healthy lifestyle education by using rule-based theory.

II. LITERATURE REVIEW

React Native is a framework for creating mobile applications using JavaScript code. The React Native framework has a set of components for the iOS and Android platforms to build mobile apps with a truly native look. Using the React Native framework, we can render user interfaces for both the iOS and Android platforms. React Native is an open-source framework, which could be compatible with other platforms such as Windows or macOS [16] [17] [18] [19].

React Native uses native components from Android

978-1-6654-4002-8/21/$31.00 ©2021 IEEE

391 28 October 2021, Jakarta - Indonesia
and iOS but remains in the JavaScript programming language. Basically, React Native is native, unlike PhoneGap/Cordova, which are mobile-web based. Development with React Native can be faster, and many companies use this framework because of its convenience and performance [20] [21].

The principle of React Native is essentially identical to that of React, except that React Native does not manipulate the DOM through the Virtual DOM. It runs in a background process (which interprets the JavaScript written by the developer) directly on the end device and communicates with the native platform via serialized, asynchronous, batched bridges [21]. React components encapsulate existing native code and interact with native APIs through React's declarative UI paradigm and JavaScript. This enables native app development for entire teams of new developers and can make existing native teams work faster [22] [23].

A rule-based system is created to solve problems with rules made from expert knowledge. Each rule has a condition (if) and an action (then). These rules are entered into the application engine, which performs pattern matching and rule application: the engine matches the existing facts and determines the corresponding rules. A rule-based system is easy to use and understand, but it cannot create new rules or modify existing rules by itself, because it is not designed to learn [24].

Supporting factors for a healthy lifestyle: 1. Body Mass Index (BMI). BMI, or the Quetelet index, is a screening method used to measure body composition from weight and height through the BMI formula [25]. BMI is a value taken from the calculation between a person's weight and height. In Indonesia, BMI is categorized into four levels, namely thin, normal, fat, and obese. Physical activity is any movement produced by skeletal muscles that requires energy expenditure. Lack of physical activity is an independent risk factor for chronic disease and is estimated to cause deaths globally (Kemenkes RI, 2014).

According to the Indonesian Ministry of Health, the BMI threshold is determined by referring to the provisions of FAO/WHO. For Indonesia, the threshold was modified based on clinical experience and research results in several developing countries. The BMI threshold for Indonesia is as follows:

Figure 1. Weight status according to WHO

The Body Mass Index (BMI) is calculated to determine a person's nutritional status from the comparison of weight and height: the BMI value is weight divided by height squared, and it indicates whether the body's nutritional status is normal or not [26].

To figure out the daily calorie needs of men and women, the authors use the TDEE (total daily energy expenditure) calculation, which estimates how many calories a person usually burns in a day depending on their activity level. Before calculating the TDEE, the basal metabolic rate is computed with the Mifflin-St Jeor equation, which involves weight, height, and age [27].

III. METHODOLOGY

A healthy lifestyle is a life habit that adheres to the principle of maintaining health. Various factors affect a healthy lifestyle, including age, gender, occupation, and weight. Factors supporting a healthy lifestyle:
1. Body Mass Index (BMI)
2. The habit of drinking enough water each day
3. Total sleep
4. Sufficient physical activity
5. Maintaining a normal weight
6. Consuming a variety of healthy foods

The method of the study started with a survey. The survey was held to obtain data about existing applications on this topic and to develop a comparison between the proposed application and other applications. A questionnaire was also distributed to 166 respondents; 83% of the respondents had not yet implemented a healthy lifestyle, and 86% of the 166 respondents had difficulty implementing one.

The authors applied the waterfall methodology to implement the application. The methodology is divided into five stages, namely requirements definition and analysis, system and software design, implementation and unit testing, integration and system testing, and operation and maintenance [28] [29].

IV. RESULT AND DISCUSSION

Several factors cause people to experience an unhealthy lifestyle: laziness, lack of activity reminders, environmental factors, and not knowing how to maintain a healthy lifestyle properly. To help users solve these problems, the proposed application provides several features as follows:
• Displays the calorie information needed
• Shows the number of calories of food consumed
• Shows the number of calories burned according to the type of activity
• Displays the user's sleep time information
• Displays the amount of water consumed by the user
• Displays the user's Body Mass Index (BMI), BMI status, and ideal weight

The study was developed using React Native version 0.63.4. The dependencies used in application development are React Native Firebase, React Navigation, React Native Paper, React Native Picker, and supporting components. The database structure was developed with three roots: activity, food, and users.
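The calculations described earlier (BMI as weight divided by height squared, basal metabolic rate via Mifflin-St Jeor, and TDEE as BMR scaled by activity level) can be sketched in plain JavaScript, the language React Native apps are written in. This is a minimal sketch: the Mifflin-St Jeor constants are the standard published ones, but the activity multiplier table is a common convention assumed here, since the paper does not list the values it uses.

```javascript
// BMI: weight (kg) divided by height (m) squared [26].
function bmi(weightKg, heightCm) {
  const heightM = heightCm / 100;
  return weightKg / (heightM * heightM);
}

// Basal metabolic rate (kcal/day) via the Mifflin-St Jeor equation,
// which involves weight, height, and age [27]; +5 for men, -161 for women.
function bmrMifflinStJeor({ weightKg, heightCm, age, gender }) {
  const base = 10 * weightKg + 6.25 * heightCm - 5 * age;
  return gender === "male" ? base + 5 : base - 161;
}

// TDEE = BMR x activity multiplier. This multiplier table is a common
// convention, assumed here for illustration only.
const ACTIVITY_MULTIPLIER = {
  sedentary: 1.2,
  light: 1.375,
  moderate: 1.55,
  active: 1.725,
};

function tdee(profile, activityLevel) {
  return bmrMifflinStJeor(profile) * ACTIVITY_MULTIPLIER[activityLevel];
}
```

For example, a 25-year-old male weighing 70 kg at 175 cm has a BMI of about 22.9, a BMR of 1673.75 kcal/day, and, at a moderate activity level, a TDEE of about 2594 kcal/day.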

Figure 2. Database structure

The use case diagram for users of the application is shown in Figure 3, and the use case diagram for admin is shown in Figure 4.

Figure 3. Use case diagram for user

Figure 4. Use case diagram for admin

The information architecture of the application was developed in three main forms, namely Profile, Summary, and Home. The Summary form starts with identification of the user's age, gender, height, weight, and activity level. The Profile form consists of the information registered when the user first logs in, while the Summary form consists of the date, total BMI, amount of water consumed, total calories consumed, total calories burned, and BMI status. The Home form involves categorizing the user's activities, identifying the calories needed, identifying the amount of water and sleep time needed, and reminding the user of any inappropriate status among the health conditions above.

Below are the main menus built in this study with the React Native framework: the home interface, the calories form interface, the sleep information, and the summary of BMI status.

Figure 5. User interface design
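The reminder behaviour of the Home form follows the rule-based condition/action model from the literature review. A minimal sketch in JavaScript is shown below; the rules, thresholds, and messages are hypothetical, since the paper does not publish its exact rule set.

```javascript
// Each rule pairs a condition (if) with an action (then), following the
// rule-based model described in the literature review. The thresholds
// and messages below are illustrative assumptions.
const rules = [
  { if: (day) => day.waterMl < 2000, then: "Drink more water" },
  { if: (day) => day.sleepHours < 7, then: "Get more sleep" },
  {
    if: (day) => day.caloriesIn - day.caloriesBurned > day.calorieTarget,
    then: "Calorie intake exceeds the daily target",
  },
];

// The engine matches a daily summary against every rule and collects
// the actions of the rules whose conditions fire.
function remind(daySummary) {
  return rules.filter((rule) => rule.if(daySummary)).map((rule) => rule.then);
}
```

For a day with 1500 ml of water, 8 hours of sleep, and a 300 kcal deficit margin exceeded, `remind` returns the two matching reminders. Because the rule set is plain data, adding a rule does not require changing the engine, which matches the rule-based property that the system applies, but never learns, its rules [24].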

The application was developed in the Indonesian language because the users would be public citizens of Indonesia with various cultures, ethnicities, ages, and social statuses. The user interface was designed with various features to support

V. CONCLUSION

This study contributes to helping users maintain a healthy lifestyle and informs users through the BMI (Body Mass Index) feature, the measurement of the number of calories of food consumed, the recording of the number of calories burned, the number of hours the user sleeps, the amount of water consumed by the user, and information on the number of calories for each activity and meal. This application could help change the user's lifestyle in order to balance the calories consumed against the activity level. This study helps users change their lifestyle and improve the quality of a healthy lifestyle through daily records.

The study has produced a reminder application for healthy lifestyle education by applying the concepts of the React Native Framework, the use of styling, and the components of the React Native Framework.

Future suggestions to develop a more effective application include embedding more detailed information on food and physical activity; augmenting the application with location authentication, which would be a powerful development; and tracking users' activities in real time, which can increase the accuracy of the application.

REFERENCES
[1] An, R., & McCaffrey, J. (2016). Plain water consumption in relation to energy intake and diet quality among US adults, 2005–2012. Journal of Human Nutrition and Dietetics, 29(5), 624–632.
[2] Mander, T. (2012). Better life better health - Lifestyle and diet for a healthy future. Menopause International, 18(4), 123–124.
[3] Public Health England. (2014). Everybody Active, Every Day. Public Health England, October, 1–23.
[4] Suharjana. (2012). Kebiasaan Berperilaku Hidup Sehat Dan Nilai-Nilai Pendidikan Karakter. 2, 189–201.
[5] Aprillia, D. D. (2015). Konsumsi Air Putih, Status Gizi, Dan Status Kesehatan Penghuni Panti Werda Di Kabupaten Pacitan. Jurnal Gizi Dan Pangan, 9(3), 167–172.
[6] Jaka Sarfriyanda, Darwin Karim, & Ari Pristiana Dewi. (2015). Hubungan Antara Kualitas Tidur Dan Kuantitas Tidur Dengan Prestasi Belajar Mahasiswa. JOM Vol. 2 No. 2, 2(37), 1–31.
[7] Wahyudi, R., Bebasari, E., & Nazriati, E. (2017). Gambaran tingkat stres pada mahasiswa Fakultas Kedokteran Universitas Riau tahun pertama. Jurnal Ilmu Kedokteran, 9(2), 107-113.
[8] Marpaun, J. (2018). Pengaruh Penggunaan Gadget Dalam Kehidupan. KOPASTA: Jurnal Program Studi Bimbingan Konseling, 5(2), 55–64.
[9] Xanthopoulos, S., & Xinogalos, S. (2013, September). A comparative analysis of cross-platform development approaches for mobile applications. In Proceedings of the 6th Balkan Conference in Informatics (pp. 213-220).
[10] Batmetan, J. (2018). Studi Komparasi Tingkat Adopsi Sistem Operasi Berbasis Smartphone Pada Generasi Milenial (Studi Kasus di Universitas Negeri Manado).
[11] Pressman, R. S., & Maxim, B. R. (2015). Software Engineering: A Practitioner's Approach (8th ed.).
[12] Danielsson, W. (2016). React Native application development. Linköpings universitet, Swedia, 10(4).
[13] Hansson, N., & Vidhall, T. (2016). Effects on performance and usability for cross-platform application development using React Native. 92.
[14] Lelli, A., & Bostrand, V. (2016). Evaluating Application Scenarios with React Native. November, 63.
[15] Wijonarko, D., & Aji, R. F. (2018). Perbandingan Phonegap Dan React Native. 1(2), 1–7.
[16] Eisenman, B. (2017). Learning React Native (2nd ed.).
[17] Masiello, E., & Friedmann, J. (2017). Mastering React Native. Packt Publishing Ltd.
[18] Wu, W. (2018). React Native vs Flutter, cross-platform mobile application frameworks.
[19] Zammetti, F. (2018). React Native: a gentle introduction. In Practical React Native (pp. 1-32). Apress, Berkeley, CA.
[20] Paul, A., & Nalwaya, A. (2019). React Native for Mobile Development. Apress, Berkeley, CA. https://doi.org/10.1007/978-1-4842-4454-8.
[21] Fentaw, A. E. (2020). Cross platform mobile application development: a comparison study of React Native vs Flutter.
[22] Anggit, L., Pamungkas, B., Informatika, F., & Telkom, U. (2020). Analisa Perbandingan Kinerja Cross Platform Mobile Framework React Native dan. 7(1), 2195–2203.
[23] Abrahamsson, R., & David, B. (2017). Comparing modifiability of React Native and two native codebases. 52.
[24] Chen, H., Jakeman, A. J., & Norton, J. P. (2008). Artificial Intelligence techniques: An introduction to their use for modelling environmental systems. Mathematics and Computers in Simulation, 78, 379–400.
[25] Destiara, F., Hariyanto, T., & Ragil, C. A. (2016). Hubungan Indeks Massa Tubuh (IMT) dengan Body Image pada Remaja di Asrama Putri Sangau Malang. Journal Nursing News, XI(1), 31–37.
[26] Dattilo, A. M., & Saavedra, J. M. (2020). Nutrition education: application of theory and strategies during the first 1,000 days for healthy growth. In Nutrition Education: Strategies for Improving Nutrition and Healthy Eating in Individuals and Communities (Vol. 92, pp. 1-18). Karger Publishers.
[27] Mugnolo, C. E. (2021). The Adolescent in American Print and Comics (Doctoral dissertation, UC Irvine).
[28] Yeung, M. K., & Chan, A. S. (2021). A systematic review of the application of functional near-infrared spectroscopy to the study of cerebral hemodynamics in healthy aging. Neuropsychology Review, 31(1), 139-166.
[29] Kuitunen, M. (2019). Cross-Platform Mobile Application Development with React Native.

Design of Water Information Management System in Palm Oil Plantation

Andreas Wahyu Krisdiarto, Faculty of Agricultural Technology, Institute of Agriculture STIPER, Yogyakarta, Indonesia 55282 ([email protected])
Eddy Julianto, Faculty of Industrial Engineering, Atma Jaya Yogyakarta University, Yogyakarta, Indonesia 55281 ([email protected])
Irya Wisnubhadra, Faculty of Industrial Engineering, Atma Jaya Yogyakarta University, Yogyakarta, Indonesia 55281 ([email protected])
Teddy Suparyanto, Bioinformatics and Data Science Research Center, Bina Nusantara University, Jakarta, Indonesia 11480 ([email protected])
Digdo Sudigyo, Bioinformatics and Data Science Research Center, Bina Nusantara University, Jakarta, Indonesia 11480 ([email protected])
Bens Pardamean, Bioinformatics and Data Science Research Center, Bina Nusantara University, Jakarta, Indonesia 11480 ([email protected])

Abstract—The water level on peatlands is a critical factor in the production of oil palm plantations on peatlands, because oil palm requires water but should not be inundated. The optimal water depth from the surface should be controlled from 70 to 80 cm by opening or closing the drain gate. Currently, most measurements are made with a piezometer, and the opening and closing of the sluice gate at the end of the primary channel are done manually. In many cases, the distance between the plantation block and the floodgate is quite far and is limited by inaccessible infrastructure (roads), so opening or closing takes a lot of time and money. Recently, an automatic water level measurement system has been designed using a microcontroller. This study aims to develop a design of a drainage management information system in oil palm plantations. The information system includes peatland water level data in oil palm plantations taken from the water level sensor in the drainage system. The system uses the data from the water level sensor to manage the drainage system's sluice gates. This system was developed using the SDLC waterfall method. The final result of this research was a design of a water information management system that can regulate the water level of peatland oil palm plantations automatically and in real time.

Keywords—Oil palm, water level, automatic control, information management system

I. INTRODUCTION

Palm oil is the largest foreign exchange contributor from agriculture for Indonesia, with a land area of 14.3 million hectares in 2018, of which around 1 million hectares of oil palm plantations are planted in peatland areas [1], [2].

Oil palm is a plant that requires water but cannot stand flooding. The optimum groundwater table depth for oil palm plantations on drained peatlands ranges from 60-85 cm [3]. When the water depth is less than 45 cm from the soil surface, the roots will be inundated, which can cause the tree to rot. When the groundwater depth is more than 85 cm, the roots cannot absorb water well, and the tree will dry out. Controlling the water level is vital because excess water (flooding) or lack of water can cause oil palm production losses of up to 40%.

In water management on peatlands, the most important thing to pay attention to is the groundwater level. The groundwater level must be adjusted so that the land is not too wet and plants can grow well. However, peatland moisture must also be maintained so that the peat does not dry out. One of the water management techniques in peatlands is to make ditches/channels so that the groundwater in peatlands is appropriate to the needs of the plants being cultivated. Water availability is controlled by installing a sluice gate (flap gate) that can regulate the water level of the peat soil while keeping water out of the land [4]. The drainage network must also be able to drain stagnant water quickly; this is necessary because delays in drying will cause agricultural production to decrease or plants to rot [5]. In the use of peatlands for oil palm plantations, care must be taken that the peat does not become inundated for a long time, to support plant growth, but also that it does not become too dry, so that the peat does not dry out irreversibly and form pseudo-sand. Singh et al. [6] state that a good water table depth for oil palm is 50-75 cm. The water level in the secondary canal is generally maintained at a depth of 60-70 cm during the rainy season. In the dry season, the water level in the middle of the oil palm plantation blocks can reach 50 cm; in this condition, the water level in the secondary canal can reach 70–80 cm. The distance of the secondary channel to the water level measurement point in the center of the block is 150 m [7].

Water level measurements in oil palm blocks generally use a piezometer, which is read manually [8], [9]. The opening and closing of the floodgates are also done manually by the operator. The distance between the water level measurement points in the planting block and the floodgates ranges from 2-20 km, with varying access; in some locations, the operator must even use a speed boat to reach the floodgates. During the rainy season, control of the floodgates is frequently delayed, resulting in flooded oil palm trees. Because oil palm trees' roots cannot withstand flooding, their productivity is hampered. In addition, if a palm tree remains inundated for 2-3 days, it can die. Likewise, if the gate stays open (draining water) until the water is too shallow, the oil palm tree will lack water. Lack of water in the land causes late flowering, and production of Fresh Fruit Bunches (FFB) then decreases. Speed and accuracy of sluice gate settings are therefore significant for maintaining oil palm productivity. The digital water level monitoring tool developed by Krisdiarto [10] can be the basis for applications in oil palm plantations that have a considerable distance between the plant blocks and the floodgates.

The current regulatory system suffers delays and considerable costs over long distances or with inadequate access. However, solutions to these problems can be made more efficient,


precise, and faster with the application of automation technology. The application connects to the measurement of the water level in the block, and the technology operates an automatic sluice opening and closing mechanism at the end of the water channel. This drainage automation requires a system for collecting and managing water level data as well as related land condition parameters.

Data collection and management to build information systems related to environmental/ecological data [11]–[16] and agricultural production data [17]–[22] have been used extensively in answering agronomic problems [23]–[27]. In research, both are the main focus in increasing yield production and anticipating natural events through an automated monitoring system. Since drainage in peatlands is one of the main problems affecting oil palm quality, a monitoring system is needed that uses the water level data in the drainage.

This study aimed to develop a design of a drainage management automation system on peatland oil palm plantations. The information system includes water level data of oil palm plantations taken from the water level sensor in the drainage system. The system analyzes data from the water level sensor to determine how to control the sluice gates in the drainage system. The design of the drainage management information system is expected to be able to regulate the drainage system in oil palm plantations efficiently, precisely, and quickly. The water level in the oil palm block is controlled so that plant health is maintained and the productivity of the oil palm is guaranteed.

II. METHODOLOGY

This research created a design for an automated drainage management system for oil palm plantations on peatland: an information system that collects water level data and uses it to manage the drainage system's sluice gates.

Fig. 1. Waterfall Model [30]

The research method used in the study is the Software Development Life Cycle (SDLC) with the Waterfall model [28]. The waterfall model was proposed by Winston W. Royce in 1970 to describe a possible software engineering practice [29]. SDLC consists of 5 stages, namely (1) Requirements Definition, (2) System and software design, (3) Implementation, (4) Testing & Integration, and (5) Maintenance. Figure 1 shows the SDLC with the Waterfall model from Ian Sommerville [30]. This paper will only present the first two stages of the SDLC, which are (1) Requirement Definition and (2) System and Software Design.

III. RESULT AND DISCUSSION

A. Requirements Definition

Istomo [31] states that uncontrolled drainage can result in subsidence because peat has non-re-wettable, or irreversible, drying characteristics. This means that once the peat experiences excessive dryness (is over-drained), it cannot return to its original physical properties: the destroyed colloidal nature of the peat causes it to no longer hold water. Peat depth usually ranges from 1-8 m, although in some areas it can reach 24 m [32].

The water level in the drainage channels of water systems in oil palm plantations is expected not to exceed 40 cm below the ground surface [33]. This water level limit serves to (1) keep the peat wet, (2) prevent the roads from collapsing, since the high cost of road construction on peatlands and efforts to maintain its service time are among the main priorities for cultivation, and (3) reduce the rate of land subsidence.

Excessive land subsidence carries several risks, so controlling the water level (1) limits land subsidence in the peat-downstream area, which would lower the peat soil level below river level in the future and raise the cost of cultivating the peatland; (2) keeps trees from becoming prone to tilting and collapsing, which reduces productivity; (3) minimizes the risk of Cu nutrient deficiency, since fertilizers for this mineral are relatively expensive and delays in correcting fertilizer deficiencies can result in plant death; (4) prevents oxidation of the pyrite layer (in acid sulfate peatlands), which results in plant poisoning; especially for acid sulfate peatlands, temporary flows originating from tidal rivers are regulated to facilitate land washing and overcome drought; and (5) maintains a well-kept water level in the drainage channels, especially during the dry season, where this maintained water functions as a natural firebreak to reduce the possibility of fire propagation between blocks.

The main structures that function to maintain the water level are canal blocks and sluice gates. Based on best practices issued by RSPO (2013), one canal block is created for every 20 cm water level drop in the collection drain. However, optimization of the number and placement of canal blocks must be based on (1) detailed contour maps, (2) the velocity of water flow, (3) a map of the direction of water flow in the drainage system during the dry and rainy seasons, (4) historical maps of floods, droughts, and fires (if any), (5) data on the maximum and minimum water levels of rivers, as well as maximum and minimum flood heights in each affected block, (6) data on the rate of land subsidence, and (7) daily monitoring data on the water level in the primary (main drain) and secondary (collection drain) ditches [34].

1) Sluice Control with Microcontroller

Gupta (2018) conducted a study on automatic irrigation of plants using the PIC16F877A microcontroller [35]. Samsugi performed the same research with a different controller, namely Arduino [36]. Furthermore, Bishnu and


colleagues conducted a similar study using the ATMEGA-328 microcontroller and concluded that their automatic plant irrigation system met the desired goals: the hardware and software performed their functions properly to produce the results required by farmers in the irrigation field. The designed system helps farmers carry out the irrigation process at night as well and does not require their physical presence in the fields during irrigation; the system is monitored automatically and switches the pump on and off [37].

For large fields, wireless technology can overcome the distances involved. Aruna implemented wireless technology in a sensor network placed at several locations, so that each sensor sends its data to the server wirelessly. The system was built using a 16F877A microcontroller and a ZigBee wireless module [38]. To reach a wider area, Suresh uses GSM wireless technology instead of ZigBee [39].

B. System Design

The main principle of water management in peatlands is that "the water level elevation in drainage canals should be maintained as high as possible, but is still expected to provide optimum groundwater depth for plant growth". The minimum groundwater depth that still allows plant growth is also known as the optimum groundwater depth; such a groundwater depth has a positive effect on plant growth and soil maturity.

Figure 2 shows the scheme of waterways in oil palm plantations [40]. The water level sensor is placed on each collection drain. The sensors and the motor controller for each flap gate are connected to an Arduino module. The Arduino then sends the data via its GSM module to the Drainage Information System server. The server checks the received data against the required limits and then sends commands to the flap gate motor controller via the GSM module in the Arduino. The same applies to the sensors and motor controllers on the main drain. Figure 3 shows a schematic of the sensor and motor controller installation.

Fig. 3. Sensor and Motor Controller Installation Schematic

The water level setting limit for each field drain is between 40-60 cm [41]. The water level in each field drain will certainly not differ much from that of the collection drain in the same plantation block. If the sensor reads the water position at the upper or lower limit of the settings made, the Arduino sends data to the server; via the Arduino, the server then instructs the motor controller on the block to open or close the gate. The same is done for the main drain. Figure 3 shows a flow diagram of the gate arrangement on the plantation.

IV. CONCLUSION

1. Oil palm production on peatland areas can be optimized by automatic control of the water level.
2. A Water Information Management System is an important support for automatic water level control through the sluice opening and closing mechanism. It consists of (1) a monitoring sensor system matching the water channel design, (2) a data transfer system, (3) a data analysis system, and (4) a water gate opening and closing mechatronic system.
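The server-side decision described in the System Design section (instruct the gate motor when the sensor reads the upper or lower limit of the 40-60 cm setting) can be sketched as follows. This is an illustrative sketch only: the function name, the command strings, and the open/close convention (open to drain when the water table is too high, close to retain when it is too low) are assumptions, since the paper presents the logic only as a flow diagram.

```javascript
// Decide a gate command from one water-level reading, given in cm below
// the ground surface. The 40-60 cm limits follow the field-drain setting
// in the text [41]; names and commands here are hypothetical.
function gateCommand(depthBelowSurfaceCm, limits = { min: 40, max: 60 }) {
  if (depthBelowSurfaceCm < limits.min) return "OPEN";  // too wet: drain
  if (depthBelowSurfaceCm > limits.max) return "CLOSE"; // too dry: retain
  return "HOLD"; // within the configured setting
}
```

In the proposed design this decision would run on the Drainage Information System server, which receives the sensor reading from the Arduino over GSM and sends the resulting command back to the flap-gate motor controller over the same link.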
Fig. 2. Drainage schemes in oil palm plantations [40]

The information system developed in this study includes three types of channels, namely (1) field drains, which trap existing water and drain it from the ground surface; (2) collection drains, which collect water from a specific area and drain it to the disposal; and (3) outlet drains, which remove water from the specific area.

REFERENCES
[1] Direktorat Jenderal Perkebunan, "Statistik Perkebunan Indonesia 2017-2019: Komoditas Kelapa Sawit," 2018. [Online]. Available: http://ditjenbun.pertanian.go.id/?publikasi=buku-statistik-kelapa-sawit-palm-oil-2011-2013.
[2] M. Sihombing, "PP Gambut Kurangi Lahan Sawit Indonesia 1 Juta Ha," Bisnis.com, 2017. https://ekonomi.bisnis.com/read/20170427/99/648893/indonesia.
[3] S. E. Page, R. Morrison, C. Malins, A. Hooijer, J. O. Rieley, and J. Jauhiainen, "Review of peat surface greenhouse gas emissions from oil palm plantations in Southeast Asia," ICCT white Pap., vol. 15, pp. 1–


78, 2011.
[4] S. M. Napitupulu and B. Mudian, "Pengelolaan sumber daya air pada lahan gambut yang berkelanjutan," in Proceedings ACES (Annual Civil Engineering Seminar), 2016, vol. 1, pp. 330–337.
[5] B. Slamet, "Manajemen Hidrologi di Lahan Gambut," J. Lestari, vol. 1, no. 1, pp. 101–105, 2016.
[6] R. P. Singh, A. Embrandiri, M. H. Ibrahim, and N. Esa, "Management of biomass residues generated from palm oil mill: Vermicomposting a sustainable option," Resour. Conserv. Recycl., vol. 55, no. 4, pp. 423–434, 2011.
[7] S. Sabiham and S. Sukarman, "Pengelolaan lahan gambut untuk pengembangan kelapa sawit di Indonesia," J. Sumberd. Lahan, vol. 6, no. 2, 2012.
[8] P. D. Turner, R. A. Gillbanks, and others, Oil Palm Cultivation and Management. Kuala Lumpur: Incorporated Society of Planters, 2003.
[9] S. D. Tarigan, "Neraca Air Lahan Gambut Yang Ditanami Kelapa Sawit di Kabupaten Seruyan, Kalimantan Tengah," J. Ilmu Tanah dan Lingkung., vol. 13, no. 1, pp. 14–20, 2011.
[10] A. W. Krisdiarto, "Alat Ukur Tinggi Muka Air Lahan Perkebunan," S00201811112, 2018.
[11] R. E. Caraka, S. A. Bakar, B. Pardamean, and A. Budiarto, "Hybrid support vector regression in electric load during national holiday season," in 2017 International Conference on Innovative and Creative Information Technology (ICITech), 2017, pp. 1–6.
[12] R. E. Caraka et al., "Analysis of plant pattern using water balance and cimogram based on oldeman climate type," in IOP Conference Series: Earth and Environmental Science, 2018, vol. 195, no. 1, p. 12001.
[13] R. E. Caraka et al., "Generalized Spatio Temporal Autoregressive Rainfall-Enso Pattern In East Java Indonesia," in 2018 Indonesian Association for Pattern Recognition International Conference (INAPR), 2018, pp. 75–79.
[14] R. E. Caraka, R. C. Chen, T. Toharudin, B. Pardamean, S. A. Bakar, and H. Yasin, "Ramadhan short-term electric load: a hybrid model of cycle spinning wavelet and group method data handling (CSW-GMDH)," IAENG Int. J. Comput. Sci., vol. 46, pp. 670–676, 2019.
[15] R. E. Caraka et al., "Employing Best Input SVR Robust Lost Function with Nature-Inspired Metaheuristics in Wind Speed Energy Forecasting," IAENG Int. J. Comput. Sci., 2020.
[16] R. E. Caraka, R. C. Chen, T. Toharudin, M. Tahmid, B. Pardamean,
smallholder-owned plantations," 2021 IOP Conf. Ser. Earth Environ. Sci., 2021.
[21] E. Firmansyah, D. Nurjannah, S. Dinarti, D. Sudigyo, T. Suparyanto, and B. Pardamean, "Learning Management System for oil palm smallholder-owned plantations," 2021 IOP Conf. Ser. Earth Environ. Sci., 2021.
[22] E. Firmansyah, B. Pardamean, C. Ginting, H. Gahara, D. Putra Suwarno Saragih, and T. Suparyanto, "Development of Artificial Intelligence for Variable Rate Application Based Oil Palm Fertilization Recommendation System," 2021 International Conference on Information Management and Technology (ICIMTech), 2021.
[23] B. Pardamean, J. W. Baurley, A. S. Perbangsa, D. Utami, H. Rijzaani, and D. Satyawan, "Information technology infrastructure for agriculture genotyping studies," J. Inf. Process. Syst., vol. 14, no. 3, pp. 655–665, 2018.
[24] J. W. Baurley, A. S. Perbangsa, A. Subagyo, and B. Pardamean, "A web application and database for agriculture genetic diversity and association studies," Int. J. Bio-Science Bio-Technology, vol. 5, no. 6, pp. 33–42, 2013.
[25] R. E. Caraka et al., "Rainfall forecasting using PSPline and rice production with ocean-atmosphere interaction," in IOP Conference Series: Earth and Environmental Science, 2018, vol. 195, no. 1, p. 12064.
[26] A. Budiarto et al., "SMARTD Web-Based Monitoring and Evaluation System," in 2018 Indonesian Association for Pattern Recognition International Conference (INAPR), 2018, pp. 172–176.
[27] H. Soeparno, A. Perbangsa, and B. Pardamean, "Best Practices of Agricultural Information System In the Context of Knowledge and Innovation," 2018, doi: 10.1109/ICIMTech.2018.8528187.
[28] S. Shylesh, "A study of software development life cycle process models," in National Conference on Reinventing Opportunities in Management, IT, and Social Sciences, 2017, pp. 534–541.
[29] W. W. Royce, "Managing the development of large software systems: concepts and techniques," in Proceedings of the 9th international conference on Software Engineering, 1987, pp. 328–338.
[30] I. Sommerville, Software Engineering (9th Edition). Boston: Pearson Education, Inc, 2011.
[31] C. Kusmana, C. W. Istomo, B. SW, S. IZ, and S. S. T Tiryana, "Manual of mangrove silviculture in Indonesia," Korea Int. Coop. Agency
(KOICA), Jakarta, 2008.
and R. M. Putra, “Evaluation performance of SVR genetic algorithm
and hybrid PSO in rainfall forecasting,” ICIC Express Lett Part B Appl, [32] W. Giesen and A. Euroconsult, “Part of the project on promoting the
vol. 11, no. 7, pp. 631–639, 2020. river basin and ecosystem approach for sustainable management of SE
[17] T. W. Cenggoro, A. Budiarto, R. Rahutomo, and B. Pardamean, Asian lowland peat swamp forests: case study Air Hitam Laut river
basin,” ARCADIS Euroconsult, Arnhem, the Netherlands, p. 125, 2004.
“Information System Design for Deep Learning Based Plant Counting
Automation,” in 2018 Indonesian Association for Pattern Recognition [33] J. M. Saragih and others, “Pengelolaan Lahan Gambut di Perkebunan
International Conference (INAPR), 2018, pp. 329–332. Kelapa Sawit di Riau,” Bul. Agrohorti, vol. 4, no. 3, pp. 312–320, 2016.
[18] J. W. Baurley, A. Budiarto, M. F. Kacamarga, and B. Pardamean, “A [34] Redaksi, “Tata Air Lahan Gambut Untuk Manajemen Banjir Dan
web portal for rice crop improvements,” in Biotechnology: Concepts, Kekeringan Di Perkebunan Kelapa Sawit,” Majalah Sawit Indonesia,
Methodologies, Tools, and Applications, IGI Global, 2019, pp. 344– 2016. https://sawitindonesia.com/tata-air-lahan-gambut-untuk-
360. manajemen-banjir-dan-kekeringan-di-perkebunan-kelapa-sawit-
bagian-pertama-2/ (accessed Jul. 20, 2021).
[19] D. P. Putra, M. P. Bimantio, A. A. Sahfitra, T. Suparyanto, and B.
Pardamean, “Simulation of Availability and Loss of Nutrient Elements [35] A. Gupta, S. Kumawat, and S. Garg, “Automatic Plant Watering
in Land with Android-Based Fertilizing Applications,” 2020 System,” Imp. J. Interdiscip. Res., vol. 2, no. 4, 2016.
International Conference on Information Management and [36] S. Samsugi, Z. Mardiyansyah, and A. Nurkholis, “Sistem Pengontrol
Technology (ICIMTech), 2020, pp. 312–317. Irigasi Otomatis Menggunakan Mikrokontroler Arduino UNO,” J.
[20] A. Umami et al., “Application of expert system for oil palm Teknol. dan Sist. Tertanam, vol. 1, no. 1, pp. 17–22, 2020.

398 28 October 2021, Jakarta - Indonesia


2021 1st International Conference on Computer Science and Artificial Intelligence (ICCSAI)

[37] B. D. Kumar, P. Srivastava, R. Agrawal, and V. Tiwari, 5654–5657, 2014.


“Microcontroller based Automatic Plant Irrigation System,” Int. Res. J. [40] K. H. Lim, S. S. Lim, F. Parish, and R. Suharto, “RSPO manual on best
Eng. Technol., vol. 4, no. 5, pp. 1436–1439, 2017. management practices (BMPs) for existing oil palm cultivation on
[38] P. Aruna, R. Ranjith Raja, N. Rajeshwaran, M. Vignesh, and M. peat.” RSPO Secretariat Sdn Bhd, Kuala Lumpur, 2014.
Vijayakumar, “Automatic Irrigation Control Using Wireless Sensor [41] H. Soewandita and others, “Kajian Pengelolaan Tata Air Dan
Network,” Int. Res. J. Eng. Technol., vol. 3, no. 2, pp. 1300–1302, Produktivitas Sawit Di Lahan Gambut (Studi Kasus: Lahan Gambut
2016. Perkebunan Sawit PT Jalin Vaneo di Kabupaten Kayong Utara,
[39] R. Suresh, S. Gopinath, K. Govindaraju, T. Devika, and N. S. Vanitha, Provinsi Kalimantan Barat),” J. Sains \& Teknol. Modif. Cuaca, vol.
“GSM based automated irrigation control using raingun irrigation 19, no. 1, pp. 41–50, 2018.
system,” Int. J. Adv. Res. Comput. Commun. Eng., vol. 3, no. 2, pp.

399 28 October 2021, Jakarta - Indonesia


2021 1st International Conference on Computer Science and Artificial Intelligence (ICCSAI)

A Hydrodynamic Analysis of Water System in Dadahup Swamp Irrigation Area

Sentot Purboseno
Department of Agricultural Engineering
Institute of Agriculture STIPER
Yogyakarta, Indonesia 55283
[email protected]

Teddy Suparyanto
Bioinformatics and Data Science Research Center
Bina Nusantara University
Jakarta, Indonesia 11480
[email protected]

Alam Ahmad Hidayat
Bioinformatics and Data Science Research Center
Bina Nusantara University
Jakarta, Indonesia 11480
[email protected]

Bens Pardamean
Bioinformatics and Data Science Research Center,
Computer Science Department, BINUS Graduate Program - Master of Computer Science
Bina Nusantara University
Jakarta, Indonesia 11480
[email protected]

Abstract—Dadahup Swamp Irrigation Area (DIR) in Kapuas Regency, Central Kalimantan is developed for agricultural activities to provide food security after the pandemic. The water system consists of various channels, gates, and a flood embankment structure, and is connected to three different rivers: the Barito River, the Mangkatif River, and the Kapuas Murung River. To quantify flood risk in the area, we perform a hydrodynamic analysis using the unsteady flow method via HEC-RAS software version 4.1.0. We use geometrical profiles from topography measurements and the outer boundary of the Dadahup DIR as input for the hydrologic model. Our results estimate the maximum surface water level from both cross and longitudinal sections in the channels and find that several points near the flood embankment area are prone to flood events. Our modeling approach provides a preliminary assessment for the local government to formulate a flood mitigation plan and policy in the area.

Keywords—flood, hydrology, modeling, river, water system

I. INTRODUCTION

The development of the Food Estate (Lumbung Pangan) Swamp Irrigation Area (Daerah Irigasi Rawa (DIR)) Dadahup, Kapuas Regency, Central Kalimantan, Indonesia is expected to tackle the lack of food security that has been predicted before and after the COVID-19 pandemic [1]. The water system developed in DIR Dadahup has its main water resource in the Barito River, whose stream flows through the main primary channel (Saluran Primer Utama (SPU)) into the system of DIR Dadahup. Along with the Barito River, two other rivers, the Kapuas Murung and the Mangkatif, also contribute to the interconnecting streams of water inside the DIR system.

The water surface movement in the Dadahup DIR system as well as the flow direction affects the supply of water to crops, which can impact the outcome of the agricultural activities around the area. The impact depends on whether the water surface elevation is far below the root zones of the plants or high enough to submerge them. Meanwhile, the flow direction can determine the quality of water transported within the system, as the water is required to flow in one direction. A two-way flow potentially causes toxic wastewater, the side product of agricultural activities that use chemical substances (e.g., fertilizer and pesticides), to get trapped in the water system and affect the pH level of the water, since the mass of water only flows back and forth between the upstream and the downstream. This condition takes place because the SPU, as the main channel connecting the three rivers, is exposed to different river flow directions simultaneously.

The mechanism of the dynamics of water flow in DIR Dadahup, and the root problems that arise from the complexity of the water system, can be modeled using a hydrodynamic analysis. One of the most common methods in hydrodynamic analysis is to simulate the geometrical conditions and the hydraulic system of the water flow using an unsteady flow hydrodynamic analysis via HEC-RAS software [2]. This is a software package developed by the Hydrologic Engineering Center of the U.S. Army Corps of Engineers, which was initially designed for one-dimensional simulation. However, the current version, i.e., HEC-RAS ver. 5 and above, now includes capabilities for various two-dimensional analyses.

II. LITERATURE REVIEW

The most prominent application of simulations via HEC-RAS is to estimate disaster risks, for example, flood risk modeling and the identification of flood risk areas. The diagnosis of both natural and non-natural hazards via a modeling approach for risk assessments can provide a preliminary description to plan a more accurate mitigation plan and policy for the local government [3], [4]. Besides, our previous studies show that mathematical modeling methods can also be used to assess social vulnerability in the area of study, which is a key consideration for future disaster mitigation policy [3], [5]. A study by Kamal et al. demonstrated using HEC-RAS simulation along with a GIS tool to perform spatiotemporal flood risk modeling in the Toudgha River, Morocco [6]. Next, another study employed a hydrological model via two-dimensional HEC-RAS with GIS data to

978-1-6654-4002-8/21/$31.00 ©2021 IEEE
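For reference, the unsteady-flow engine of a 1D river model such as HEC-RAS solves the Saint-Venant (shallow water) equations. The paper does not reproduce them, so the standard SI formulation is given here as background:

```latex
% 1D Saint-Venant equations (continuity and momentum) for unsteady open-channel flow.
% A: flow area, Q: discharge, q_l: lateral inflow per unit length,
% z_s: water surface elevation, S_f: friction slope (Manning), R: hydraulic radius.
\frac{\partial A}{\partial t} + \frac{\partial Q}{\partial x} = q_l ,
\qquad
\frac{\partial Q}{\partial t}
  + \frac{\partial}{\partial x}\!\left(\frac{Q^{2}}{A}\right)
  + g A \left(\frac{\partial z_s}{\partial x} + S_f\right) = 0 ,
\qquad
S_f = \frac{n^{2}\, Q \,\lvert Q \rvert}{A^{2} R^{4/3}} .
```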


simulate the distribution of floods along with discharge quantification for different return periods in the Juwana River, Indonesia [7]. Moreover, GIS data in the format of a Triangulated Irregular Network (TIN) for the Digital Elevation Model (DEM) was integrated by Tunas et al. into a hydrologic model via HEC-RAS to simulate more accurate flood risk assessments in the Lantikadigo River in Central Sulawesi, Indonesia [8]. Meanwhile, Prastica et al. performed flood discharge analysis in the Bengawan Solo River in Bojonegoro, Indonesia by comparing two different methods: the Nakayasu synthetic unit hydrograph and HEC-RAS analysis [9]. The simulation from both methods identified that the unaccommodated water loading capacity in the river causes the floods in Bojonegoro city. On the other hand, Suprayogi et al. showed that simulating flood inundation with varying return periods identifies some flood risk parts of the Winongo River, Yogyakarta, which could be related to land-use conversion in the urban areas [10]. A study by Sholichin et al. used the software to characterize floods in the Ciliwung River in terms of the water level profile and the reduction of the inundation area [11].

III. MATERIALS AND METHODS

A. The Existing Macro Water System

The water system of DIR Dadahup consists of a 28-km SPU, a supporting primary channel (Saluran Primer Pembantu) with a length of 72.9 km, an 82-km secondary channel, and a 62.7-km collector channel. Moreover, DIR Dadahup is surrounded by a flood embankment structure. The Barito River is located on the east side of the structure, while the Mangkatif River is on the west. Our study area is depicted in Fig. 1.

Fig. 1. The study area of DIR Dadahup

Several areas in the flood embankment experience a condition that causes a decrease in the water level and also have an open channel. This implies that when Barito River floods take place, a large amount of water can enter the water system and potentially give rise to flood events in DIR Dadahup. The floodwater from the Barito River can also break into DIR Dadahup via secondary channels that are connected to the SPU. DIR Dadahup is equipped with gates to regulate water flow, starting from the gates positioned in front of the Barito River, which are called L1 and P1. Next, gates S1 and V1 as well as gate V directly face the Kapuas Murung River, while gates P2, S2, and V2 face the Mangkatif River. Furthermore, there are two gates in the supporting primary channel, i.e., gates M1 and M2, which are located along the path of the secondary channel from L1 to L2. In this study, we focus on analyzing the area of the water system from the channel that links gate L1 to L2 to the channel from V1 to V2.

To perform more precise hydrodynamic analyses, we first need to examine the current condition of the macro water system of DIR Dadahup as initial conditions for our modeling.

A.1. Water Gates

The remaining water gate structures in DIR Dadahup are not maintained properly, causing water to enter or leave the system unregulated. During the wet season, water can easily break into the system. On the other hand, water can also leak out of the system during the dry season; therefore, the conservation of water mass within the system may be violated. In addition, the channels near the gates are mostly covered by water hyacinth and other plants that may potentially clog the water flow (see Fig. 2).

Fig. 2. The location and the condition of gate L1

A.2. The Main Primary Channel (SPU)

The channel transports water from the Barito River to DIR Dadahup and is therefore the main water source for the area. The channel starts from point A in the Ranggailung DIR system and ends at point V in the Tabukan area. The water flow inside the Ranggailung system from point A to F is regulated by the SPU from A to E and leaves at point F. Meanwhile, points H to L belong to the Janamas DIR system, and the Dadahup system runs from L to V. Both systems are also regulated by the SPU. The area of our study only covers the Dadahup area from L to V; hence the channel connecting gate L1 to L2, which originally functioned as a drainage channel, is now operated as a supplementary channel to transport water through point L from the Barito. The main water source of the SPU was initially obtained from the Barito in the Ranggailung system (point A). However, because of heavy sedimentation in the SPU of the Janamas system, the water from the Ranggailung is unable to run properly to point L (see Fig. 3). The condition of the SPU from L to V is relatively better than that of the SPU from A to L.
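The mass-conservation problem caused by the unregulated gates can be illustrated with a lumped water-balance bookkeeping step. This is only a sketch; the function name, time step, and all numbers below are hypothetical and are not measurements from the study.

```python
# Lumped water balance for an irrigation system with leaky gates.
# All names and figures here are illustrative, not data from DIR Dadahup.

def storage_step(storage_m3, inflow_m3s, outflow_m3s, leakage_m3s, dt_s=86_400):
    """Advance the stored water volume by one time step (default: one day)."""
    new_storage = storage_m3 + (inflow_m3s - outflow_m3s - leakage_m3s) * dt_s
    return max(new_storage, 0.0)  # storage cannot go negative

# Dry-season example: no inflow, a small irrigation draw, unregulated leakage.
storage = 5_000_000.0  # m^3
for _ in range(30):    # one month of daily steps
    storage = storage_step(storage, inflow_m3s=0.0, outflow_m3s=0.8, leakage_m3s=0.5)
```

With a combined loss of 1.3 m³/s, the system loses about 112,320 m³ per day, which is why unmaintained gates quickly drain the stored volume in the dry season.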


Fig. 3. The condition of the SPU in the Janamas system

It is known that the water from the Barito River is typically turbid and cloudy and has a brownish color. However, when the water reaches point N, the water gets clearer (discolored). This change is related to the sedimentation of the suspended load contained in the water of the Barito River. Two major factors accelerate the sedimentation process: the availability of coagulants and the low-velocity flow of water [12], [13].

When the water level in the Kapuas Murung River is high, a reverse flow of water happens from the south area (point V) to the north (point L). Meanwhile, once the water level gets low, the Barito River flows from L to V. This condition runs continuously and causes slack water that eventually decreases the pH level of the water.

The low pH level in the SPU gives rise to a faster coagulation process in the sedimentation of the suspended load, since one of the major factors affecting coagulation is acidity (a low pH level) [14]. Besides, several areas have lower-velocity flow because the water moves back and forth from and to the same points. The velocity measured during the changes of flow direction is estimated to be between 0.08 m/s and 0.1 m/s. This accelerates the sedimentation of flocs, which result from the coagulation process, at the bottom of the water channel. Moreover, the bathymetry survey has shown that the thickness of the sediments varies based on the flow velocity, the turning points of the stream, and the pH level in the area. The thickest sediment found was in the area around point O, which is the closest point to point L. It is known that point L is the main gate for water from the Barito to enter the Dadahup system. Other points at which the thickness of the sediments is above average are points Q and S. At these two points, the current that flows back into point L always occurs when the surface water level in the Kapuas Murung is relatively high. The measurements of the bathymetry survey are visualized in Fig. 4.

Fig. 4. The bathymetry of the main cross-section of the SPU in Dadahup DIR

A.3. Supporting Primary Channel (SPP)

The SPP aims to supply fresh water from the Barito River; its current condition is almost similar to the SPU: the area contains a large number of sediments and is covered by invasive plants that potentially clog the flow as well (see Fig. 5).

Fig. 5. The current condition of the SPP in Dadahup DIR

A.4. The Secondary Channel

The condition of this channel is akin to the SPU and SPP, and it also causes a clog for the influx and the outflux of water in Dadahup DIR. The channel initially had two functional purposes: a supplying channel and a drainage channel. Because of the direct connection to the SPU, it is always assumed that both functional channels are the same.

A.5. Collector Channel

The channel collects and contains the influx of water from the drainage channel. The water can be transported to the Mangkatif, the Kapuas Murung, or even the Barito when the water level hits low in one or more rivers. The sediment condition in this channel is the same as in the previous channels.

B. The Hydrodynamic Analysis of the Macro Water System

The hydrodynamic analysis in this work is based on the above description of the macro water system of Dadahup DIR. The analysis is performed using HEC-RAS software version 4.1.0. The goal of the modeling via HEC-RAS is to estimate the water elevation level in correlation to the flood risk assessment. For the purpose of modeling, we require geometrical data of Dadahup DIR, which were obtained from measurements of topography, and also the outer boundary of the Dadahup system, which is formed by the Barito, the Kapuas Murung, and the Mangkatif. The outer boundary for the Barito contains two points: gates L1 and P1.
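Stage estimates of this kind ultimately rest on open-channel hydraulics. As a self-contained illustration, not the HEC-RAS solver itself and with purely hypothetical channel parameters, the normal depth of a trapezoidal channel section can be recovered from Manning's equation by bisection:

```python
import math

def manning_flow(depth, n, slope, bottom_width, side_slope):
    """Discharge (m^3/s) of a trapezoidal section via Manning's equation (SI units)."""
    area = depth * (bottom_width + side_slope * depth)
    wetted = bottom_width + 2.0 * depth * math.sqrt(1.0 + side_slope ** 2)
    radius = area / wetted  # hydraulic radius
    return (1.0 / n) * area * radius ** (2.0 / 3.0) * math.sqrt(slope)

def normal_depth(Q, n, slope, bottom_width, side_slope, lo=1e-6, hi=50.0, tol=1e-8):
    """Depth at which the Manning discharge equals Q (bisection; Q is monotone in depth)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if manning_flow(mid, n, slope, bottom_width, side_slope) < Q:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)
```

A full unsteady model such as the one used in the paper solves the time-dependent equations over the whole channel network, but this steady, single-section calculation shows how roughness, slope, and geometry jointly fix the water level for a given discharge.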


Fig. 6. The hydrograph of the Mangkatif at gates L2 and P2

We divided the Dadahup system into several segmented channels: the SPP, the SPU, the secondary channel (SK), the right collector channel (SKKa), and the left collector channel (SKKb). From this, the total number of junctions in our model is 48 points.

Fig. 7. The geometry of the Dadahup DIR

IV. RESULTS AND DISCUSSION

A. The Main Primary Channel (SPU)

The result of our analysis of the existing condition of the water system shows that the water in the SPU flows back and forth from L (north) to V (south). This is observed from the pattern of wave ripples on the water surface, which is produced when the two streams meet in opposite directions. The Barito is unable to fully transport water to point V due to this flow condition. It is well known that the pH level in the Barito at point L is about 5.5–6.5, and therefore the river is suitable as the main water resource for the Dadahup system. However, as the water does not flow perfectly, the water at points O and S only moves back and forth between the upriver (L) and the downriver (V). Water with this hydrodynamic condition is known as slack water. From the research studies, the pH level of slack water eventually gets lower if the dynamics of the flow keep continuing in the above condition. During the wet season, the pH level of water for crop irrigation can decrease to 3.2, which is far below the safe level for rice cultivation.

Fig. 8. The longitudinal section of the water surface of the SPU from point L to V

According to the above-simulated water movement, the water level in the right and left parts of the flood embankment is never high, and therefore these parts are safe from flood events. However, the SPU is always reported by the local people as the source of floods for the crop area. Meanwhile, from the cross-section of the SPU shown in Fig. 9, the maximum level of the surface water is estimated between 1.51 meters above sea level (masl) and 2.38 masl. Therefore, considering that the elevation of the crop area is between 1.1 masl and 1.5 masl, a flood may occur if the water level rises beyond the elevation of the area.

Fig. 9. The cross-section of the water surface in the SPU

B. The Secondary Channel P

Fig. 10. The longitudinal section of the water surface in channel P

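The flood screening criterion used in Section IV-A, comparing the simulated maximum stage against the ground elevation of the crop area, can be stated directly in code. The helper name is ours; the numeric ranges are the ones reported for the SPU.

```python
# A location is flagged as inundated when the simulated maximum water stage
# exceeds its ground elevation (both in meters above sea level, masl).

def is_inundated(max_stage_masl, ground_elev_masl):
    return max_stage_masl > ground_elev_masl

# Ranges reported in Section IV-A: SPU maximum stage 1.51-2.38 masl,
# crop-area elevation 1.1-1.5 masl.
stage_range = (1.51, 2.38)
crop_elev_range = (1.1, 1.5)

# Even the lowest simulated stage exceeds the highest crop elevation,
# so every combination in the reported ranges is flagged as flood-prone.
flags = [is_inundated(s, e) for s in stage_range for e in crop_elev_range]
```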

Fig. 11. The cross-section of the water surface in channel P

For the longitudinal section of the water surface in channel P, as shown in Fig. 10, the water elevation is estimated between -1.4 m and 2.0 m. Meanwhile, for the cross-section of the water surface in channel P, based on Fig. 11, the water elevation is estimated between -1.4 m and 2.2 m.

C. Secondary Channel Q

Fig. 12. The longitudinal section of the water surface in channel Q

Fig. 13. The cross-section of the water surface in channel Q

From Fig. 12, we estimated that the water elevation of the longitudinal section in channel Q is between -3 m and 1.7 m. On the other hand, the water elevation in the cross-section is about -0.8 m to 1.5 m (see Fig. 13).

D. Secondary Channel R

Fig. 14. The longitudinal section of the water surface in channel R

Fig. 15. The cross-section of the water surface in channel R

The longitudinal section of the water surface in channel R has a water elevation from around -2.5 m to 2.0 m (see Fig. 14). Meanwhile, for the cross-section of the water surface in channel R, based on Fig. 15, the water elevation is estimated between -1.4 m and 1.5 m.

E. The Collector Channel L1 – V1

Fig. 16. The longitudinal section of the collector channel

The simulation result for the collector channel next to both the Barito River and the Kapuas Murung River demonstrated that several points in the embankment area around points M1 to N1 are prone to flood events because the maximum surface water level there is higher than the structure itself (see Fig. 16).

F. The Collector Channel L2 – V2

Fig. 17. The longitudinal section of the water surface in channel L2 – V2

The simulation in Fig. 17 for the collector channel next to the Mangkatif showed that the height of the embankment structure is still below the estimated maximum water level in the area. The part of the structure that is prone to floods is located between Q2 – S2 and T2 – V2.

Finally, our hydrologic analysis can be improved to take into account the weather and climate conditions of the area as well as agricultural implications for the food estate in the Dadahup DIR. Apart from understanding the flood risk assessment in the area, utilizing predictive models to understand the pattern of local climate data, such as rainfall prediction, is very useful to predict crop productivity [15]–[17]. This climate data analysis is formulated as a time series analysis that finds much application in various fields and has been used in our previous studies [18]–[20]. Predicting the maximum water level and the changes in inundated areas that affect agricultural activities for different return periods from our hydrologic model can be approached via time series models. Furthermore, the efforts for achieving food security in the food estate can be combined with multimodal data, such as the availability and loss of nutrients in the crop area, into the simulated spatiotemporal climate and hydrological model [21]. Using our prototype of agricultural


knowledge and information system from our past studies in oil palm plantations [22]–[24], the collected data for the complete modeling approach, which includes hydrologic properties, crop properties, climate conditions, and socioeconomic conditions, can be used to plan accurate mitigation plans as well as to quantify the analysis of food security in the area of the DIR Dadahup.

V. CONCLUSION

In this work, we established a hydrologic model to identify risk factors related to the potential flood risk. We mapped the maximum water surface level due to water discharge in the area, given the initial condition of the current water system and also the flood embankment structure surrounding the system. The water in the main primary channel (SPU) was found not to flow fully; it only moved back and forth between the upriver and the downriver, creating slack water, which is dangerous for the crops. Besides, the relatively low elevation of the crop area compared with the simulated maximum water level might cause it to be one of the inundated areas in Dadahup DIR. Other channels, such as the collector channels, were estimated to have maximum water levels beyond the height of the flood embankment, which makes them prone to floods. The hydrologic model can be further integrated with modeling of cropping patterns [25], weather conditions [15], and also the socioeconomic situation of the area [22] to provide a more accurate mitigation plan and policy to protect the food estate in the Dadahup DIR.

REFERENCES

[1] “Peningkatan Penyediaan Pangan Nasional melalui Kawasan Food Estate di Provinsi Kalimantan Tengah,” Kementerian Koordinator Bidang Perekonomian Republik Indonesia, 2021. https://www.ekon.go.id/publikasi/detail/3118/peningkatan-penyediaan-pangan-nasional-melalui-kawasan-food-estate-di-provinsi-kalimantan-tengah (accessed Jul. 29, 2021).

[2] “HEC-RAS.” https://www.hec.usace.army.mil/software/hec-ras/ (accessed Jul. 13, 2021).

[3] R. E. Caraka et al., “Cluster around Latent Variable for Vulnerability towards Natural Hazards, Non-Natural Hazards, Social Hazards in West Papua,” IEEE Access, vol. 9, pp. 1972–1986, 2021, doi: 10.1109/ACCESS.2020.3038883.

[4] A. A. Hidayat and B. Pardamean, “Count time series modelling of Twitter data and topic modelling: A case of Indonesia flood events,” 2021 Int. Conf. Biosph. Harmon. Adv. Res., 2021.

[5] P. A. Kaban, R. Kurniawan, R. E. Caraka, B. Pardamean, B. Yuniarto, and Sukim, “Biclustering Method to Capture the Spatial Pattern and to Identify the Causes of Social Vulnerability in Indonesia: A New Recommendation for Disaster Mitigation Policy,” Procedia Comput. Sci., vol. 157, pp. 31–37, 2019, doi: 10.1016/j.procs.2019.08.138.

[6] E. Kamal, A. Ahmed, A. Abdelah, A. Mohamed, and F. Abdelouhed, “Flood risk modelling using hydrologic data, HEC-RAS and GIS tools: Case of Toudgha River (Tinghir, Morocco),” Disaster Adv., vol. 13, no. 5, 2020.

[7] R. R. Syafri, M. P. Hadi, and S. Suprayogi, “Hydrodynamic Modelling of Juwana River Flooding Using HEC-RAS 2D,” IOP Conf. Ser. Earth Environ. Sci., vol. 412, p. 12028, 2020, doi: 10.1088/1755-1315/412/1/012028.

[8] I. G. Tunas, Y. Arafat, and H. Azikin, “Integration of Digital Elevation Model (DEM) and HEC-RAS Hydrodynamic Model for flood routing,” IOP Conf. Ser. Mater. Sci. Eng., vol. 620, p. 12026, 2019, doi: 10.1088/1757-899x/620/1/012026.

[9] R. M. S. Prastica, C. Maitri, A. Hermawan, P. C. Nugroho, D. Sutjiningsih, and E. Anggraheni, “Estimating design flood and HEC-RAS modelling approach for flood analysis in Bojonegoro city,” IOP Conf. Ser. Mater. Sci. Eng., vol. 316, p. 12042, 2018, doi: 10.1088/1757-899x/316/1/012042.

[10] S. Suprayogi, R. Latifah, and M. A. Marfai, “Preliminary Analysis of Floods Induced by Urban Development in Yogyakarta City, Indonesia,” Geogr. Tech., vol. 15, no. 2, pp. 57–71, 2020.

[11] M. Sholichin, T. B. Prayogo, and M. Bisri, “Using HEC-RAS for analysis of flood characteristic in Ciliwung River, Indonesia,” IOP Conf. Ser. Earth Environ. Sci., vol. 344, p. 12011, 2019, doi: 10.1088/1755-1315/344/1/012011.

[12] “Coagulation and Flocculation in Water and Wastewater Treatment.” https://www.iwapublishing.com/news/coagulation-and-flocculation-water-and-wastewater-treatment (accessed Jul. 29, 2021).

[13] “Sediment Transport and Deposition.” https://www.fondriest.com/environmental-measurements/parameters/hydrology/sediment-transport-deposition/#std6a (accessed Jul. 29, 2021).

[14] R. S. W., B. Iswanto, and W., “Pengaruh pH pada Proses Koagulasi dengan Koagulan Aluminium Sulfat dan Ferri Klorida,” Indones. J. Urban Environ. Technol., vol. 5, no. 2, 2009, doi: 10.25105/urbanenvirotech.v5i2.676.

[15] R. E. Caraka et al., “Rainfall forecasting using PSPline and rice production with ocean-atmosphere interaction,” IOP Conf. Ser. Earth Environ. Sci., vol. 195, p. 12064, 2018, doi: 10.1088/1755-1315/195/1/012064.

[16] R. E. Caraka, R. C. Chen, T. Toharudin, M. Tahmid, B. Pardamean, and R. M. Putra, “Evaluation performance of SVR genetic algorithm and hybrid PSO in rainfall forecasting,” ICIC Express Lett. Part B Appl., vol. 11, no. 7, pp. 631–639, 2020.

[17] R. E. Caraka et al., “Generalized Spatio Temporal Autoregressive Rainfall-Enso Pattern In East Java Indonesia,” in 2018 Indonesian Association for Pattern Recognition International Conference (INAPR), 2018, pp. 75–79, doi: 10.1109/INAPR.2018.8627042.

[18] R. E. Caraka et al., “Employing Best Input SVR Robust Lost Function with Nature-Inspired Metaheuristics in Wind Speed Energy Forecasting,” IAENG Int. J. Comput. Sci., vol. 47, no. 3, 2020.

[19] R. E. Caraka, S. A. Bakar, B. Pardamean, and A. Budiarto, “Hybrid support vector regression in electric load during national holiday season,” in 2017 International Conference on Innovative and Creative Information Technology (ICITech), 2017, pp. 1–6, doi: 10.1109/INNOCIT.2017.8319127.

[20] R. E. Caraka, R. C. Chen, T. Toharudin, B. Pardamean, S. A. Bakar, and H. Yasin, “Ramadhan Short-Term Electric Load: A Hybrid Model of Cycle Spinning Wavelet and Group Method Data Handling (CSW-GMDH),” IAENG Int. J. Comput. Sci., vol. 46, no. 4, 2019.

[21] D. P. Putra, M. P. Bimantio, A. A. Sahfitra, T. Suparyanto, and B. Pardamean, “Simulation of Availability and Loss of Nutrient Elements in Land with Android-Based Fertilizing Applications,”


in 2020 International Conference on Information Management and Technology (ICIMTech), 2020, pp. 312–317, doi: 10.1109/ICIMTech50083.2020.9211268.

[22] H. Soeparno, A. S. Perbangsa, and B. Pardamean, “Best Practices of Agricultural Information System in the Context of Knowledge and Innovation,” in 2018 International Conference on Information Management and Technology (ICIMTech), 2018, pp. 489–494, doi: 10.1109/ICIMTech.2018.8528187.

[23] A. Budiarto et al., “SMARTD Web-Based Monitoring and Evaluation System,” in 2018 Indonesian Association for Pattern Recognition International Conference (INAPR), Sep. 2018, pp. 172–176, doi: 10.1109/INAPR.2018.8627034.

[24] A. Umami et al., “Application of expert system for oil palm smallholder-owned plantations,” IOP Conf. Ser. Earth Environ. Sci., 2021.

[25] R. E. Caraka et al., “Analysis of plant pattern using water balance and cimogram based on oldeman climate type,” IOP Conf. Ser. Earth Environ. Sci., vol. 195, p. 12001, 2018, doi: 10.1088/1755-1315/195/1/012001.



Spatiotemporal Features Learning from Song for Emotions Recognition with Time Distributed CNN

Andry Chowanda
Computer Science Department
School of Computer Science
Bina Nusantara University
Jakarta, 11480 Indonesia
[email protected]

Abstract—Building a system that can naturally interact with humans has been one of the ultimate goals for researchers in the computer science field. The system should be able to interpret both verbal and non-verbal meanings from the messages conveyed by the interlocutors. A song can also be a vehicle to express a message to the listeners, and capturing the emotions from a song automatically can provide a system with a digital feeling for what it is listening to. Emotions can be automatically captured and processed through several modalities via sensors. Deep learning has become the de facto standard of learning architecture in many fields, and emotions recognition models can be trained well with several deep learning architectures. Convolutional Neural Networks (CNN) are well suited to training models with multi-dimensional input features; however, they have a limitation when dealing with features that carry temporal information. This research aims to add Time Distributed layers to a CNN architecture to learn Spatio-temporal features from songs (audio signals). Eight architectures were proposed in this research to explore the potential of learning Spatio-temporal features from songs with a CNN architecture. The best model presented in this paper achieved 99.95%, 93.41%, 1.84 and 2.03 in training accuracy, testing accuracy, training loss and testing loss, respectively.

Index Terms—Emotions Recognition, Audio Signals, Spatio-Temporal Features, Deep Learning, Time Distributed

I. INTRODUCTION

Emotions are an essential part of social interactions. They convey essential meanings as part of communication. Humans interact with each other via several communication methods, one of which is through songs. Singers communicate with their listeners through songs, expressing emotions and other messages through the meaning of the lyrics, the singers' intonation, and the harmonic tone of the music [1]. To build a good system that can naturally interact with humans, both the emotions and the messages conveyed during the interaction should be captured, processed, and appropriately displayed to the human interlocutors [2], [3]. Emotions can be automatically captured and processed through several modalities, such as facial expressions, speech prosody and body gestures, via sensors such as microphones and cameras. Automatic emotions recognition systems can be implemented in a wide variety of applications. Recognising the emotions conveyed by the human interlocutors provides the system with several interaction strategies. For example, the system can mirror the human interlocutors' emotions to provide more sympathy during the interaction [4]. Moreover, displaying the correct emotions increases trust in systems such as virtual humans [2], [5]. The affective game is another example of an automatic emotions recognition system in practice: with the players' emotions captured and recognised, the game system can dynamically adjust the game difficulty. The implementation of emotions recognition in games evidently enhances the players' game experiences [6]. Moreover, by recognising emotions from songs, a sophisticated and improved human-computer interaction system (e.g. virtual humans) can be designed. Virtual humans can have a digital feeling when they are listening to songs; hence they can appreciate the songs not only from the meaning of the lyrics but also from the emotions imbued in the songs.

Deep learning has become the de facto standard of learning architecture in many fields, and emotions recognition models can be trained well with several deep learning architectures. Convolutional Neural Networks (CNN) are well suited to training models with multi-dimensional input features; however, they have a limitation when dealing with inputs/features that carry temporal information. This is where Recurrent Neural Networks (RNN) architectures (e.g. vanilla RNN, Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU)) usually step in to deal with the temporal information. One of the significant drawbacks of the RNN architectures is that the training process requires enormous resources (e.g. memory and time), as the recurrent nature of the process makes it impossible to run in parallel. This research aims to add Time Distributed layers to a CNN architecture to learn Spatio-temporal features from songs (audio signals). Time Distributed layers allow applying the CNN layers (e.g. Convolution, Max Pooling, Activation, Flatten and Dense layers) to every temporal slice of the features; hence they allow the learning architecture to learn Spatio-temporal features from the inputs. Eight architectures were proposed in this research to explore the potential of learning Spatio-temporal features from songs with a CNN architecture. The best model proposed in this paper achieved 99.95%, 93.41%, 1.84 and 2.03 in training accuracy, testing accuracy, training loss and testing loss, respectively. The rest of the paper is organised as follows: the following section discusses the related work done in the emotions recognition

978-1-6654-4002-8/21/$31.00 ©2021 IEEE




fields. The methodology proposed in this research is illustrated in the Time Distributed Architecture section, and the results are discussed in the Results and Discussion section. Finally, the conclusion and future work directions are presented in the last section.

II. RELATED WORK

A. Emotions Recognition

Automatic emotions recognition can be implemented in several systems, such as virtual humans, games and other affective systems. During social interaction, the interlocutors exchange social signal cues as the means of communication [7], [8]. Those signals can be automatically captured, processed, interpreted and synthesised by using sensors and several tools [7], [8]. Emotions can be captured from facial cues (e.g. facial expressions), speech (voice prosody, the meaning of the words) and body gestures. Several emotional models have been used to help researchers classify emotions. Examples of emotions classifications are the discrete basic emotions, such as the basic emotions proposed by James, Ekman, and Lazarus, and the dimensional models of emotion, such as the Circumplex model, the Vector model and Plutchik's model. Machine learning techniques can be implemented to train emotions recognition (e.g. classification) models with the captured signals as input. Most of the research implements uni-modality (e.g. only using facial expressions or only using voice prosody) for the learning input. There have also been several efforts to use multi-modal inputs or features to improve the training results. The literature shows that multi-modal features provide superior results compared to uni-modal features. However, there is still no golden standard for which combination of modalities should be used to train the automatic emotions recognition model [7].

B. Emotions Recognition from Signals with Deep Learning

Deep learning architectures have been widely implemented to train emotions recognition models from signal features or inputs. The general pipeline of emotions recognition model training using deep or machine learning is: data pre-processing, feature extraction and dimension reduction, and model training. Several techniques can be applied in the pre-processing phase, such as framing and windowing, where the signals are framed, segmented and windowed. The features are then extracted and selected using several techniques. Three general groups of features can be implemented in this domain, namely Prosodic, Spectral, or a combination of both. Prosodic features deal with the auditory quality of the signals; some examples are Entropy, Auto Correlation, Jitter, Shimmer, Pitch, Energy and Duration [9]. The Spectral features are frequency-based features, for example, the Mel Frequency Cepstrum Coefficients (MFCC) [10]. Finally, several machine learning algorithms and deep learning architectures can be implemented to train the automatic emotions recognition model. Some researchers explore machine learning techniques to model automatic emotions recognition, such as Support Vector Machine (SVM) [11], Logistic Regression [11] and Tree-Based Modelling [5], [11]. Other researchers explore deep learning architectures, such as Deep Neural Networks (DNN) [11], [12], Convolutional Neural Networks (CNN) [10], [12]–[14] and Recurrent Neural Networks (RNN) [12], [13], [15], [16]. The literature demonstrates that models trained with deep learning are superior to those trained with conventional machine learning algorithms.

III. TIME DISTRIBUTED ARCHITECTURE

Fig. 1. Research Methodology

Fig 1 demonstrates the proposed research methodology, which consists of five phases: Dataset, Pre-Processing, Feature Extraction, Model Training, and Model Evaluation. The dataset used in this research is the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) [17]. The RAVDESS dataset consists of audio and visual data of speech and song from 24 professional actors. The dataset was annotated with six basic emotions (happy, sad, angry, fear, surprise, and disgust) plus neutral (calm) for the Speech data and four basic emotions (happy, sad, angry and fear) plus neutral (calm) for the Song data. The song dataset was used in this research to model emotions recognition from songs (i.e. audio signals). Several preprocessing methods were applied to the dataset before the feature extraction process took place. The label for each file was extracted from the audio data filename; gender information can also be extracted from the filename. There were 920 original audio data files used in this research. The data was loaded with a sample rate of 16.0 kHz, and the maximal padding length for each audio clip was 3.0 s. Moreover, the data was also augmented with noise, with the Signal-to-Noise ratio low bound set to 15 and the high bound set to 30. Fig 2 shows a sample of the original signal (left) and the noise-augmented signal (right) of an angry female singer. With the augmented data, there are a total of 2,392 audio data, where 2,208 (92.31%) were used for training and 184 (7.69%) for testing. There are 444 neutral (calm),


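The noise augmentation described above (a Signal-to-Noise ratio drawn between 15 and 30) can be sketched in NumPy. The paper does not publish its augmentation code, so the function names, the use of white Gaussian noise, and the uniform SNR sampling are illustrative assumptions:

```python
import numpy as np

def add_noise_at_snr(signal, snr_db, rng):
    """Mix white Gaussian noise into `signal` at the requested SNR (in dB)."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

def augment(signal, snr_low=15.0, snr_high=30.0, seed=None):
    """Return a noisy copy of `signal` with an SNR drawn uniformly
    from [snr_low, snr_high] dB, as in the augmentation described above."""
    rng = np.random.default_rng(seed)
    snr_db = rng.uniform(snr_low, snr_high)
    return add_noise_at_snr(signal, snr_db, rng)
```

Calling `augment` on each original clip one or more times is how a 920-file dataset can be enlarged toward the 2,392 clips used for training and testing.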


450 happy, 456 sad, 435 angry and 423 fear audios in the training data, and 31 neutral (calm), 34 happy, 38 sad, 53 angry and 28 fear audios in the testing data.

TABLE I
THE EXPERIMENTS DETAILS

No  Name     Type              N Frame  Dropout
1   MODEL-1  Time Distributed  5        No
2   MODEL-2  Time Distributed  5        No*
3   MODEL-3  Time Distributed  5        0.1
4   MODEL-4  Time Distributed  5        0.3
5   MODEL-5  Normal            1        No
6   MODEL-6  Normal            1        No*
7   MODEL-7  Normal            1        0.1
8   MODEL-8  Normal            1        0.3

The next step was to extract the features from the augmented audio data. The Mel-Spectrogram representation was used for feature extraction from the audio files. The overall process was to sample the features with a window of 256 and a hop length of 512, then pad with zeros to match the vector length subject to the Fast Fourier Transform (FFT, i.e. 2048). The next step was to transform the time domain to the frequency domain for each window using the FFT. The number of audio samples between adjacent Short-Time Fourier Transform (STFT) columns was set to 128. Next, Mel frequencies were generated with 128 Mel bins and a maximum frequency of 4.0 kHz. For each window, the Mel-Spectrogram was generated in correspondence to the Mel frequencies. To generate the Spatio-temporal features, the features were framed with a time distributed window, with a window step of 128 and a window size of 64. Therefore, five windows were extracted from each audio clip, resulting in a total of 920 testing data and 11,040 training data (see Fig 4). Fig 3 demonstrates the Mel-Spectrogram representation of random audio signals from songs with Anger (above) and Sad (below) emotions and Male (left) and Female (right) subjects.

Eight architectures were proposed and explored in the Model Training phase of this research. Table I lists the architectures proposed in this paper. MODEL-1 to MODEL-4 are the proposed architectures that implement Time Distributed layers to learn Spatio-temporal features from songs (i.e. audio signals). MODEL-5 to MODEL-8 serve as the baseline architectures in this research. MODEL-1 to MODEL-4 implement Time Distributed layers with five time frames from the features and several dropout settings. MODEL-5 to MODEL-8 take the form of a typical CNN architecture, where no temporal information is learned from the features. The dropout settings between the baseline architectures (i.e. MODEL-5 to MODEL-8) and the proposed architectures (Time Distributed CNN, i.e. MODEL-1 to MODEL-4) are identical. No dropout is implemented in any layer for MODEL-1 and MODEL-5, while for MODEL-2 and MODEL-6 the dropout layer is only implemented after the last block of layers, with a dropout rate of p = 0.7. MODEL-3 and MODEL-7 implement dropout layers in all the blocks of layers in the architecture with a dropout rate of p = 0.1. MODEL-4 and MODEL-8 implement dropout layers in all the blocks of layers in the architecture with a dropout rate of p = 0.3. Fig 4 illustrates the proposed Time Distributed CNN architectures. The architecture consists of a block of feature extraction layers (consisting of Conv2D, Batch Norm, Exponential Linear Unit (ELU) activation (see equation 1), Max Pooling, and Dropout layers), a block of flattening and dense layers (consisting of two flatten layers and two dense layers) and a Softmax activation layer. The architecture takes the input of temporal features and extracts Spatio-temporal features with six blocks of feature extraction layers (with 64 filters in the first three blocks and 128 filters in the last three blocks). The activation function implemented in the feature extraction blocks was the ELU activation function (see equation 1). The ELU can handle negative values and saturates smoothly towards −α for large negative x.

f(x) = x, for x > 0;  f(x) = α(e^x − 1), for x ≤ 0    (1)

θ_{t+1} = θ_t − (α / (√v̂_t + ε)) · m̂_t    (2)

The first flatten layer flattens the temporal layers (i.e. the Time Distributed layers), and the second flatten layer makes sure all the features are flattened to 1x128. The features are then densed to 128 and then to 64 before being classified using the Softmax activation function. The learning architecture implements the ADAM optimiser (see equation 2) [18]. The updated weight at time t + 1 (θ_{t+1}) is computed from the weight at time t (θ_t) with the learning rate α, the bias-corrected exponential moving average (EMA) of the squared gradient (v̂_t), the bias-corrected EMA of the gradient ∇f(x_t) (m̂_t), and the regularisation value ε that prevents division by zero. The hyper-parameters used in this research were the recommended hyper-parameters from Keras: α = 0.001, β1 = 0.9, β2 = 0.999, ε = 1e−08, λ = 5e−6 and a batch size of 64. The final step of this research was to evaluate each model trained with the eight proposed architectures. The performance metrics used in this research were training accuracy, testing accuracy, training loss, testing loss and the confusion matrix for each class from the best model.

IV. RESULTS AND DISCUSSION

From 2,392 audio data, a total of 2,208 (92.31%) were used for training and 184 (7.69%) for testing in the eight architectures. There are 444 neutral (calm), 450 happy, 456 sad, 435 angry and 423 fear audios in the training dataset. Moreover, there are 31 neutral (calm), 34 happy, 38 sad, 53 angry and 28 fear audios in the testing dataset. The models were trained on the Google Colaboratory platform with an NVIDIA K80 12 GB GPU for a total of 125.1 minutes (2.09 hours), excluding the exploratory process. Overall, the architectures with Time Distributed layers implemented took the longest time compared to the architectures without the


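The feature framing and the two update rules above can be restated compactly in NumPy. This is a sketch under stated assumptions: the exact window arithmetic (five windows of width 64 per clip) is not fully specified in the paper, so `frame_features` uses illustrative defaults, while `elu` and `adam_step` simply transcribe equations (1) and (2):

```python
import numpy as np

def frame_features(mel_spec, n_windows=5, win_size=64, step=64):
    """Slice a (n_mels, T) Mel-spectrogram into fixed-size time windows,
    shaped (n_windows, n_mels, win_size, 1) so Time Distributed CNN
    layers can treat every window as one image."""
    n_mels, total = mel_spec.shape
    needed = (n_windows - 1) * step + win_size
    if total < needed:  # zero-pad clips that are too short
        mel_spec = np.pad(mel_spec, ((0, 0), (0, needed - total)))
    return np.stack([mel_spec[:, i * step:i * step + win_size]
                     for i in range(n_windows)])[..., np.newaxis]

def elu(x, alpha=1.0):
    """Equation (1): identity for x > 0, alpha*(e^x - 1) for x <= 0."""
    x = np.asarray(x, dtype=float)
    return np.where(x > 0, x, alpha * np.expm1(np.minimum(x, 0.0)))

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Equation (2): one ADAM update with bias-corrected moment estimates."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)   # bias-corrected EMA of the gradient
    v_hat = v / (1 - beta2 ** t)   # bias-corrected EMA of the squared gradient
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

Note that `elu` saturates towards −alpha for large negative inputs, which is the smoothing behaviour the paper describes for equation (1).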


Fig. 2. Sample of Original Signals (Left) vs. Noise Augmented Signals (Right)

Fig. 3. Log Mel Spectrogram of Anger (Above) and Sad (Below) Emotions with Male (Left) and Female (Right) Subjects

Time Distributed layers. This result was expected, as the architectures with Time Distributed layers learned both spatial and temporal features. In total, 113.37 minutes (1.89 hours) were required to finish all the training processes for architectures MODEL-1 to MODEL-4, while it only took 11.65 minutes (0.195 hours) to complete all the training processes for architectures MODEL-5 to MODEL-8. The time reported in this paper does not include the feature extraction with the Mel-Spectrogram representation or the exploratory process (e.g. trying out layers and exploring the hyper-parameters). The maximum number of epochs was set to 200, and an early stopping method was applied to the training process, resulting in 67 epochs (16.68 minutes), 70 epochs (17.53 minutes), 95 epochs (25.93 minutes), and 200 epochs (53.22 minutes) for MODEL-1, MODEL-2, MODEL-3 and MODEL-4, respectively. Moreover, a maximum of 78 epochs (2.63 minutes), 51 epochs (1.733 minutes), 158 epochs (5.32 minutes) and 58 epochs (1.97 minutes) were reached by MODEL-5, MODEL-6, MODEL-7 and MODEL-8, respectively.

Fig 5 demonstrates the overall training and testing results of the eight models (MODEL-1 to MODEL-8). There are two bars for every MODEL: the left bar illustrates the model's training accuracy, and the right bar shows the model's testing accuracy. The best training accuracy was achieved by several models (MODEL-1, MODEL-2 and MODEL-5, with a training accuracy score of 100%). In contrast, the best testing accuracy was achieved by the Time Distributed CNN architecture that only implements the dropout layer in the last block (dropout rate = 0.7), MODEL-2 (93.40%). MODEL-1 and MODEL-5 demonstrate a relatively big gap between


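The early-stopping rule used during training (stop once the monitored metric stops improving, up to the 200-epoch cap) can be sketched as a small helper. The paper does not state the patience value or the monitored quantity, so both are assumptions here:

```python
class EarlyStopping:
    """Stop training when the monitored loss has not improved by at least
    `min_delta` for `patience` consecutive epochs."""

    def __init__(self, patience=10, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")  # best (lowest) loss seen so far
        self.wait = 0             # epochs since the last improvement

    def should_stop(self, current_loss):
        if current_loss < self.best - self.min_delta:
            self.best = current_loss
            self.wait = 0
        else:
            self.wait += 1
        return self.wait >= self.patience
```

A training loop would call `should_stop(val_loss)` after every epoch and break as soon as it returns True, which is how runs such as MODEL-2's could end at 70 of the 200 allowed epochs.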


Fig. 4. Proposed Architecture

Fig. 5. Overall Results

Fig. 6. Accuracy - MODEL-2

Fig. 7. Loss - MODEL-2

their training accuracy and testing accuracy, indicating that the models trained were over-fitted. In general, all the models trained with the architectures without Time Distributed layers demonstrate a relatively good training accuracy score but an eminently low testing accuracy score. Albeit the best training accuracy score of the models trained with the architectures without Time Distributed layers was 100%, their best testing accuracy score was only 69.60%. These results indicate that all the models trained with the architectures without Time Distributed layers were highly over-fitted.

The results in Fig 5 indicate that MODEL-1 and MODEL-2 provide the best models to recognise emotions from a song (i.e. audio signals). Despite having a training score identical to MODEL-2, the model trained with the MODEL-1 setting has a lower testing accuracy score (88.0%). Hence, the model trained with the MODEL-2 setting suffers less from over-fitting problems than the one trained with MODEL-1. Fig 6 illustrates the training and testing accuracy history of MODEL-2, and Fig 7 demonstrates the training and testing loss history of MODEL-2. The training process was automatically stopped at 70 epochs by the early stopping mechanism set in the training process. The training started with 27.99% training accuracy, 19.02% testing accuracy, 2.03 training loss and 1.84 testing loss. It stopped with 100% training accuracy, 93.40% testing accuracy, 0.005 training loss and 0.23 testing loss. Moreover, the model achieved its best values of 100%, 93.40%, 0.004 and 0.22 for training accuracy, testing accuracy, training loss and testing loss, respectively. Fig 8 demonstrates the confusion matrix for each class of the model trained with the MODEL-2 settings during the testing phase. The Y-axis indicates the predicted classes, while the X-axis indicates the


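A per-class confusion matrix like the one reported for the best model can be computed in a few lines of NumPy. This is a generic sketch, not the authors' evaluation code; the orientation follows the description in the paper (rows = predicted class, columns = actual class), and the label ordering is taken from the class indices it lists:

```python
import numpy as np

# Class indices as described for the test set: 0..4.
LABELS = ["Neutral (Calm)", "Happy", "Sad", "Angry", "Fear"]

def confusion_matrix(y_true, y_pred, n_classes=5):
    """Rows index the predicted class (Y-axis), columns the actual
    class (X-axis)."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for actual, predicted in zip(y_true, y_pred):
        cm[predicted, actual] += 1
    return cm
```

The diagonal counts the correctly classified clips per emotion, so dividing each diagonal entry by its column sum gives the per-class recall.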


actual classes. The labels for each class are Neutral (Calm), Happy, Sad, Angry, and Fear (Fearful) for the 0, 1, 2, 3 and 4 labels on the X-axis, respectively. The matrix indicates relatively good results for each class in the model trained with the MODEL-2 setting.

Fig. 8. Confusion Matrix - MODEL-2

V. CONCLUSION AND FUTURE WORK

Eight architectures were proposed, explored, and evaluated using the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) dataset. The dataset was augmented, and the features were extracted from the audio signals using the Mel-Spectrogram representation. A Time Distributed CNN architecture was used to learn Spatio-temporal features from songs (audio signals). Time Distributed layers allow the CNN layers to be applied to every temporal slice of the input, enabling the learning architecture to learn Spatio-temporal features from the inputs. The results showed that the architectures with Time Distributed layers performed better than those without them. In general, the models trained with CNN architectures without the Time Distributed layers were remarkably over-fitted. Albeit the models trained with CNN architectures with Time Distributed layers require more computational power and time, overall they demonstrate superior results. The best result was achieved by a model trained with the MODEL-2 settings, which attains the best values of 100%, 93.40%, 0.004 and 0.22 for training accuracy, testing accuracy, training loss and testing loss, respectively.

Several experiments are planned for future research. A combination of architectures can be explored; for example, the Time Distributed CNN and RNN layers (e.g. LSTM) can be combined to obtain better results. Moreover, attention models such as the Transformer architecture can also be explored to improve the results. More variations of the dataset can be used in the next research experiments. Furthermore, the architectures should be general enough to be implemented in other areas or problems, such as facial expression recognition (visual), activity recognition (signal processing) and text classification (Natural Language Processing). Finally, emotions recognition models can be implemented in several affective systems such as virtual humans, games and others.

REFERENCES

[1] A. M. Proverbio, E. Camporeale, and A. Brusa, "Multimodal recognition of emotions in music and facial expressions," Frontiers in Human Neuroscience, vol. 14, p. 32, 2020.
[2] A. Chowanda, P. Blanchfield, M. Flintham, and M. Valstar, "ERiSA: Building emotionally realistic social game-agents companions," in International Conference on Intelligent Virtual Agents. Springer, 2014, pp. 134–143.
[3] D. Suryani, V. Ekaputra, and A. Chowanda, "Multi-modal Asian conversation mobile video dataset for recognition task," International Journal of Electrical and Computer Engineering (IJECE), vol. 8, no. 5, pp. 4042–4046, 2018.
[4] E. Lieskovská, M. Jakubec, R. Jarina, and M. Chmulík, "A review on speech emotion recognition using deep learning and attention mechanism," Electronics, vol. 10, no. 10, p. 1163, 2021.
[5] R. Sutoyo, A. Chowanda, A. Kurniati, and R. Wongso, "Designing an emotionally realistic chatbot framework to enhance its believability with AIML and information states," Procedia Computer Science, vol. 157, pp. 621–628, 2019.
[6] M. T. Akbar, M. N. Ilmi, I. V. Rumayar, J. Moniaga, T.-K. Chen, and A. Chowanda, "Enhancing game experience with facial expression recognition as dynamic balancing," Procedia Computer Science, vol. 157, pp. 388–395, 2019.
[7] A. Vinciarelli, M. Pantic, D. Heylen, C. Pelachaud, I. Poggi, F. D'Errico, and M. Schroeder, "Bridging the gap between social animal and unsocial machine: A survey of social signal processing," IEEE Transactions on Affective Computing, vol. 3, no. 1, pp. 69–87, 2011.
[8] J. Wagner, F. Lingenfelser, T. Baur, I. Damian, F. Kistler, and E. André, "The social signal interpretation (SSI) framework: multimodal signal processing and recognition in real-time," in Proceedings of the 21st ACM International Conference on Multimedia, 2013, pp. 831–834.
[9] H. K. Palo and M. N. Mohanty, "Analysis of speech emotions using dynamics of prosodic parameters," in Cognitive Informatics and Soft Computing. Springer, 2020, pp. 333–340.
[10] M. D. Pawar and R. D. Kokate, "Convolution neural network based automatic speech emotion recognition using Mel-frequency cepstrum coefficients," Multimedia Tools and Applications, vol. 80, no. 10, pp. 15563–15587, 2021.
[11] A. Chowanda, R. Sutoyo, S. Tanachutiwat et al., "Exploring text-based emotions recognition machine learning techniques on social media conversation," Procedia Computer Science, vol. 179, pp. 821–828, 2021.
[12] Z. Yao, Z. Wang, W. Liu, Y. Liu, and J. Pan, "Speech emotion recognition using fusion of three multi-task learning-based classifiers: HSF-DNN, MS-CNN and LLD-RNN," Speech Communication, vol. 120, pp. 11–19, 2020.
[13] U. Kumaran, S. R. Rammohan, S. M. Nagarajan, and A. Prathik, "Fusion of Mel and gammatone frequency cepstral coefficients for speech emotion recognition using deep C-RNN," International Journal of Speech Technology, vol. 24, no. 2, pp. 303–314, 2021.
[14] T. Anvarjon, S. Kwon et al., "Deep-Net: A lightweight CNN-based speech emotion recognition system using deep frequency features," Sensors, vol. 20, no. 18, p. 5212, 2020.
[15] S. Mirsamadi, E. Barsoum, and C. Zhang, "Automatic speech emotion recognition using recurrent neural networks with local attention," in 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2017, pp. 2227–2231.
[16] S. Yang, Z. Gong, K. Ye, Y. Wei, Z. Huang, and Z. Huang, "EdgeRNN: A compact speech recognition network with spatio-temporal features for edge computing," IEEE Access, vol. 8, pp. 81468–81478, 2020.
[17] S. R. Livingstone and F. A. Russo, "The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English," PLoS ONE, vol. 13, no. 5, p. e0196391, 2018.
[18] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.




AR-Mart: The Implementation of Augmented Reality as a Smart Self-Service Cashier in the Pandemic Era

1st Chasandra Puspitasari
Computer Science Department
School of Computer Science
Bina Nusantara University
Jakarta, Indonesia
[email protected]

2nd Gusti Pangestu
Computer Science Department
School of Computer Science
Bina Nusantara University
Jakarta, Indonesia
[email protected]

3rd Anita Rahayu
Computer Science Department
School of Computer Science
Bina Nusantara University
Jakarta, Indonesia
[email protected]

4th Bening Insaniyah Al-Abdillah
Computer Science Department
School of Computer Science
Bina Nusantara University
Jakarta, Indonesia
[email protected]

Abstract—Considering the recent boom of cashier-less checkout technology and the future trend of increasing self-service and contactless conditions due to Covid-19, it is necessary to find an alternative checkout concept suitable for local retail conditions. Using augmented reality, the aim of this study is to compare the proposed method with the conventional method of cashier checkout using a barcode scanner. The method used in this study is marker-based tracking, where the marker is an image file uploaded to the Vuforia SDK. The proposed AR-Mart smart cashier is faster and more accurate, and can reduce the duration of cashier checkout significantly.

Keywords—Augmented Reality, Vuforia, AR-Mart, Smart Cashier

I. INTRODUCTION

The retail world, especially supermarkets, both large and small, has stagnated for decades in the service sector and in-store shopping experience [1]. Smart technologies (e.g., smart devices and mobile apps) have become an integral part of modern lifestyles and consumer practices, driving a global transformation of the business environment; among the industries hardest hit is global retail [2][3]. It is not uncommon for large supermarkets to pile up queues of buyers who want to pay for their groceries. The long queue is often one of the causes of the decline in consumer shopping satisfaction. In fact, retailers and supermarket franchise owners have been trying to reduce queues and shorten check-out times for a long time, but a truly practical shopping experience has only been widely introduced in recent years with the help of technology (cashier-less checkout).

Since its launch in 2018, Amazon Go has made the concept of cashier-less checkout even more recognizable and has provided a new shopping experience [4]. This shopping concept is not completely new, because for the past few years there has been a lot of discussion about it as an answer to the stagnation of the in-store shopping experience and long queues at checkout. The concept is basically a combination of sensor and camera technology placed in various corners of stores and on grocery shelves, artificial intelligence, various smartphone/device applications, and other technologies that ultimately support the practicality of shopping. Although the cashier-less checkout concept has boomed abroad, especially in countries with a capable IoT ecosystem, in Indonesia this concept is still in the early stages of application and still requires a lot of trial and error before it can be implemented en masse, especially to avoid buyers running away without paying.

2020 was a tough year for shop owners and self-service/franchise entrepreneurs who provide in-store-based shopping needs. The existence of the coronavirus (Covid-19) requires that the shopping process is not carried out through touch, to prevent transmission of the virus. Various disinfectant facilities are provided in front of the store as well as inside it, and arrangements are made to reduce contact between buyers and shop assistants and among buyers themselves. This clearly indicates the need to accelerate new adaptations of the in-store shopping process and to integrate new shopping concepts.

Technology continues to develop at a remarkable rate. Augmented reality (AR) has emerged as a new technology available to retailers to interact with customers in a unique and engaging way [5][6]. Reflecting on the cashier-less checkout technology that has boomed in recent times and the future trend of increasingly self-service and non-touch conditions due to Covid-19, it is necessary to look for alternative checkout concepts that are adapted to local retail conditions. Some of the practical requirements that need to be met include being easy to use and to implement, and being operable not only at large retailers/supermarket franchises, but also in small and medium-sized shops and booths in shopping centers [1]. This is important considering that it takes a large amount of capital to equip local shops/supermarkets with various sensors and cameras. However, the fundamental transformation of the retail environment brought about by AR application technology and its increasing popularity among customers has accelerated the need for retail brands to better understand the impact of AR applications on consumer behavior [2].

The basic idea of using AR-based technology is that buyers simply put their groceries on the checkout counter and, after the items are scanned by a camera at the payment desk, the system automatically calculates the total to be paid. In addition, AR makes buyers and cashiers able to see the details of the prices of goods purchased and can

978-1-6654-4002-8/21/$31.00 ©2021 IEEE

413 28 October 2021, Jakarta - Indonesia


2021 1st International Conference on Computer Science and Artificial Intelligence (ICCSAI)

correct if there are errors in price labeling or the listed prices are not appropriate. Then, with the development kit available in applications that are easy to access and use, buyers and retailers can shorten the time needed to calculate costs, because the calculation is carried out in real time and in parallel, reducing the potential for queues and minimizing touch/contact during this pandemic. On the other hand, the opportunity to develop the use of AR is still wide open in terms of improving the quality of the shopping experience and the user experience when shopping in-store [7]. Thus, there are still many potential uses of AR in the world of shopping that can be researched and developed in the future.

II. LITERATURE REVIEW

A. Augmented Reality
Research related to Augmented Reality (AR) technology has been in great demand in various parts of the world in recent years. AR technology has been applied in many fields, including tourism, the arts, commerce, the manufacturing industry, health, entertainment, and education [8]. For example, the application of AR technology at tourist attractions can introduce historical sites using only cellphone cameras, screen software, and other technological means to reconstruct the appearance of the actual historical sites. In addition to seeing the original appearance, additional information can also be obtained using this technology.

In the retail sector, AR technology allows consumers to view all information about a product without opening the packaging. By scanning the image or QR code on the product, consumers can not only see some information about the product, but also get other visual information about the product, how to use it, and advertisements relevant to the product [7][9]. The use of AR in the retail world is currently very diverse and has targeted various industries ranging from food and beverage, clothing, and footwear to furniture and eyewear manufacturers. For example, leading cosmetic manufacturers such as L’Oréal and Sephora have used AR to create augmented simulations of the use of their make-up products [6]. IKEA has done the same by creating an AR application based on its product catalog so that potential customers can virtually simulate the placement of furniture in their rooms [2][9].

In the context of simplifying cashier payments, the opportunity to use AR is in line with the self-checkout model, cashier-less checkout, or what is more widely known as self-service technology. These concepts put forward reduced interaction between shopkeepers and consumers, which results in a more practical shopping experience and shorter queue times. They are applied at self-checkout stations using machines that can be computers, self-scanning devices (handheld barcode scanners), or smartphones. Various applications such as mobile wallets, checkout systems optimized with AI, service robots, machine learning, augmented reality, and other automations are also increasingly being used in retail today [10]. Especially during the current Covid-19 pandemic, retailers need to adapt to avoid too many touches in the checkout area to minimize the risk of infection.

B. Image Target Detection
An image target is an image that can be identified and tracked by the Vuforia SDK. No special black-and-white areas or codes are required to recognize these images. The Vuforia SDK uses a variety of algorithms to identify and track the features present in the image. These features are detected by comparing them with known objects in the database. Once detected, Vuforia tracks the image in the camera's field of view.

Vuforia is an Augmented Reality Software Development Kit (SDK) for mobile devices that enables the creation of AR applications [11][12]. The Vuforia SDK can also be combined with Unity through the Vuforia AR Extension for Unity. Vuforia is an SDK provided by Qualcomm to help developers create Augmented Reality (AR) applications on mobile phones (iOS, Android), and it has been successfully used in several mobile applications on both platforms [11]. Vuforia provides a way of interacting that uses the mobile phone camera as an input device, an electronic eye that recognizes certain markers, so that a combination of the real world and the world drawn by the application can be displayed on the screen.

C. Marker-based Tracking
Marker-based Augmented Reality, also known as marker-based tracking, is a type of Augmented Reality that recognizes markers and identifies the pattern of the marker to add a virtual object to the real environment [13]. The marker is a black-and-white square illustration with bold black sides, a black pattern in the center of the square, and a white background.

AR algorithms are classified into two methods: the marker-based AR method, which uses artificial markers, and the markerless AR method, which uses natural features instead of artificial markers [14]. The marker-based method uses markers with a special pattern so that when the camera detects a marker, three-dimensional objects can be displayed. The virtual coordinate point on the marker serves to determine the position of the virtual object to be added to the real environment. The position of the virtual object will be perpendicular to the marker: the virtual object stands along the Z-axis, perpendicular to the X-axis (right or left) and the Y-axis (front or back) of the virtual marker's coordinates. An illustration of the virtual marker coordinates can be seen in Figure 1.

III. METHODOLOGY

Augmented reality (AR) is a display technology that uses computers or smartphones to superimpose digital content on the real environment, such as 2D images, 3D models, sounds, videos, texts, etc. The viewing angle depends on the location of the device [14]. Therefore, to improve the user experience, two components need to be considered: the detection part and the display part [15].

The method used in this study is marker-based tracking, where the marker is an image file with the .JPG extension which is uploaded to Vuforia. Markers are important in AR technology because they act as triggers that are recognized by the camera to run the AR application. Markers that have been uploaded are assessed for quality by the Vuforia system. Vuforia is an AR Software Development Kit (SDK) for mobile devices that enables the creation of the AR-Mart application.

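The marker-anchored placement described above (a virtual object standing along the marker's Z-axis, perpendicular to its X- and Y-axes) can be sketched with a plain homogeneous transform. This is an illustrative sketch only, not the paper's implementation: in practice Vuforia computes the marker pose internally, and the function names here are hypothetical.

```python
import math

def marker_pose(tx, ty, tz, yaw_deg):
    """Hypothetical 4x4 pose of a detected marker: a rotation about the
    marker's Z-axis (its normal) plus a translation in world coordinates."""
    c, s = math.cos(math.radians(yaw_deg)), math.sin(math.radians(yaw_deg))
    return [[c, -s, 0.0, tx],
            [s,  c, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]

def to_world(pose, local_point):
    """Map a point from marker-local coordinates into world coordinates."""
    x, y, z = local_point
    v = (x, y, z, 1.0)
    return tuple(sum(pose[r][k] * v[k] for k in range(4)) for r in range(3))

# A virtual price label placed 5 units along the marker's Z-axis stays
# anchored to the marker wherever the marker lies in the world.
pose = marker_pose(10.0, 20.0, 0.0, 90.0)
print(to_world(pose, (0.0, 0.0, 5.0)))
```

Because the label's local offset lies entirely on the Z-axis, rotating the marker about its normal does not move the label relative to the marker, which matches the "standing on the marker" behavior described in the text.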

Fig. 1. Illustration of Marker and Coordinates

Fig. 2. Conceptual Framework Diagram

When the AR tracking system recognizes the marker, it knows the location of the virtual world in the real world and which functions will be used at that location [14]. With the Unity 3D game engine, any media or game can be created in the virtual world at a location controlled by markers. Since a scene in Unity3D has the same orientation as a scene in the real world, i.e., the X-axis is width (horizontal), the Y-axis is height (vertical), and the Z-axis is depth, the new position can be calculated when the camera moves or rotates around the markers. In this study, we want to get the size and shape of the object, and then find the position of each marker around the object. The shape of the object can be calculated using the AR technology of the virtual world. There are four step concepts that generally describe the process from marker detection to the formation of the total price according to the detected markers. The conceptual framework diagram can be seen in Figure 2.

A. Marker Database Training
Database training was conducted to determine the key points of each marker. The result of this process is rating information marked with star signs, as in Figure 3. A higher rating indicates that the marker is more unique and distinct from the others, which reduces the potential for detection errors. Figure 3 shows the 5 marker data used for testing in this study. Each marker has one type of image, each with its own uniqueness and characteristics. It can be seen that 4 of the 5 markers have the maximum rating of 5 stars. This indicates that those markers are distinctive and unique, so they are less likely to cause detection errors.

Fig. 3. Rating result of each marker

Fig. 4. Display of marker detection results on the screen

B. Define the Object of each Marker
Marker-based AR requires a marker to activate augmentation. Markers are clear patterns that cameras can easily recognize and process and that are visually independent of the environment around them; they can be paper objects or physical objects that exist in the real world. Marker-based AR works by scanning a tag that causes an enhanced experience (object, text, video, or animation) to appear on the device. It usually requires software in the form of an application that allows users to scan tags from their device with a camera.

The AR-Mart prototype that we propose uses 2D objects. It requires a marker to bring up objects in the AR camera, which can be applied using a smartphone camera. The created object appears in the AR camera when the marker is recognized. Each loaded object varies based on the marker processed. Then material is applied to the previously modeled object so that a realistic impression appears; giving material to the object defines the color of the intended object.

C. Method OnTrackingFound
At this stage, the object detection process is carried out by the camera. In this study, a smartphone with at least a 5-megapixel camera is used so that objects can be captured and detected clearly. Objects are detected if their appearance matches the database. In Figure 4, two objects caught by the camera can be seen, along with the price of each object. The price of each object is marked with a number in rupiah written in yellow, and the green number is the sum, or total, of the prices. In this study, we used the detection process provided by Vuforia.

    if (mTrackableBehaviour.TrackableName == "twisko") {
        arlist.Add(2000);       // record the item price
        count.tambah(2000);     // add to the running total (tambah = add)
    } else if (mTrackableBehaviour.TrackableName == "biskuat") {
        arlist.Add(5000);
        count.tambah(5000);
    } else if (mTrackableBehaviour.TrackableName == "lays") {
        count.tambah(10000);
    } else if (mTrackableBehaviour.TrackableName == "oreo") {
        count.tambah(9000);
    } else if (mTrackableBehaviour.TrackableName == "jetz") {
        count.tambah(2000);
    } else {
        arlist.Add(0);          // unknown marker contributes nothing
    }
    int tot = count.hitungTotal();     // hitungTotal = compute total
    Debug.Log(tot);
    string tottext = tot.ToString();
    count.changeText(tottext);         // update the on-screen total

Fig. 5. Method OnTrackingFound() Script

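The if/else chain in Figure 5 maps marker names to fixed prices and accumulates a running total. The same logic can be sketched as a lookup table; this is an illustrative Python sketch with hypothetical names, not the paper's Unity C# implementation:

```python
# Hypothetical price table mapping marker names to prices in rupiah.
PRICES = {"twisko": 2000, "biskuat": 5000, "lays": 10000, "oreo": 9000, "jetz": 2000}

class RunningTotal:
    """Accumulates prices as markers are detected, like count.tambah()."""
    def __init__(self):
        self.items = []

    def on_tracking_found(self, trackable_name):
        price = PRICES.get(trackable_name, 0)  # unknown markers add 0
        self.items.append(price)
        return price

    def total(self):
        return sum(self.items)

cart = RunningTotal()
for name in ["twisko", "oreo", "lays"]:
    cart.on_tracking_found(name)
print(cart.total())  # 21000
```

A dictionary lookup keeps prices out of the control flow, so adding a product means adding one table entry instead of another branch.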

Based on the script in Figure 5, the mTrackableBehaviour attribute is an object of the TrackableBehaviour class. The object is used to get the important attributes of the TrackableBehaviour class, namely the TrackableName attribute in the form of a string. The main method used in this implementation is OnTrackingFound(), a method that is executed when the application detects a marker. It is in this method that the TrackableName attribute is matched against the marker name attribute.

D. Creating a Method for Total Calculation
At this stage, a method is created to calculate the total price that needs to be paid by the customer. Figure 4 shows that the prices of the objects have been added up and the total price automatically appears on the screen, marked with a green number on the left of the screen. In this method, the hitungTotal() method is used to get the total amount.

This total calculation is done automatically in one step when the AR-Mart application is run. When implemented, this proposed method can make it easier and faster for buyers to get the total price of goods purchased at retail stores. AR-Mart is much easier and faster than the barcode-scanner method that is still widely used in retail stores such as supermarkets, which causes long queues at the cashier because items must be scanned one by one. This matters because, in a pandemic, people need to buy basic food for daily needs safely and in accordance with health protocols. AR-Mart can therefore be a solution that avoids long queues and is easy to implement.

IV. RESULT AND DISCUSSION

The prototype AR-Mart application is implemented using Unity 2018.2 as the engine and Visual Studio 2017 as the IDE, with the program developed in C#. This application is not final and will be developed further in the next study. In the testing process, two scenarios were used: testing the duration of marker detection, and testing marker-detection accuracy when the markers are set to random positions.

A. Duration of Marker Detection
In this test, we compare the time required for marker detection between the proposed method and barcode detection using a scanner, which has been widely used in retail stores and supermarkets. The test results can be seen in Table I, and they differ significantly. Using the method proposed in this study, the time needed to get the total price is shorter and more effective: all markers captured simultaneously by the camera can be directly calculated to get the total price. Meanwhile, with the conventional method, barcode scans must be done one by one, which takes time.

TABLE I. THE TESTING RESULTS OF TIME MARKER DETECTION

Number of markers    Proposed Method           Barcode Scanner
1                    1,006.81 milliseconds     1,587.34 milliseconds
2                    1,103.24 milliseconds     4,693.27 milliseconds
3                    1,189.58 milliseconds     7,402.92 milliseconds
4                    1,361.63 milliseconds     11,945.65 milliseconds
5                    1,497.47 milliseconds     15,847.26 milliseconds

B. Marker Detection Accuracy
In the second test, the accuracy of the detected objects is high even though the objects are randomly positioned. As Table II shows, all objects in all positions can still be detected accurately, and the total price (green font) shows the same result. To maximize accuracy, further study is required, especially in detecting smaller objects and removing the white boundary of the object. Also, this study is still limited to two-dimensional image markers and has not used real objects.

TABLE II. MARKER DETECTION ACCURACY AT RANDOM POSITIONS

Position    Marker position    Number of detected markers
1           (screenshot)       5
2           (screenshot)       5
3           (screenshot)       5
4           (screenshot)       5
5           (screenshot)       5

V. CONCLUSION

Based on the results shown in Tables I and II, it can be concluded that this method is faster and more accurate than the conventional method of barcode scanning. Furthermore, the Augmented Reality (AR) used in this study is built with the Unity engine, which is easy to implement and mature as an application development platform. It also does not require many sensors or cameras in its operation. This AR-Mart solution can be very helpful for retailers and customers, especially in this era of the Covid-19 pandemic, when indoor activities, including in retail stores and supermarkets, should be significantly reduced or limited. The proposed method can help reduce the time needed for checking out at the self-service cashier and avoid direct contact. Furthermore, when combined with a smart cashier (cashless cashier), the time used for queuing and checking out will be significantly reduced.

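As a quick sanity check on Table I, the speedup of the proposed method over barcode scanning implied by the reported timings can be computed directly:

```python
# Timings transcribed from Table I (milliseconds).
proposed = {1: 1006.81, 2: 1103.24, 3: 1189.58, 4: 1361.63, 5: 1497.47}
barcode  = {1: 1587.34, 2: 4693.27, 3: 7402.92, 4: 11945.65, 5: 15847.26}

# Speedup grows with basket size because markers are detected in parallel
# while barcodes must be scanned one by one.
speedup = {n: round(barcode[n] / proposed[n], 2) for n in proposed}
print(speedup)  # {1: 1.58, 2: 4.25, 3: 6.22, 4: 8.77, 5: 10.58}
```

The near-flat growth of the proposed column (about 1.0 to 1.5 seconds regardless of item count) against the linear growth of the barcode column is what drives the roughly tenfold speedup at five items.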

VI. ACKNOWLEDGMENT

This work is supported by the Research and Technology Transfer Office, Bina Nusantara University, as part of Penelitian Terapan Binus entitled “Design and Prototyping AR-Mart sebagai Solusi Digitalisasi Swalayan Berbasis Teknologi Augmented Reality (AR)” with contract number No.018/VR.RTT/III/2021, dated March 22, 2021.

REFERENCES

[1] V. Shankar et al., “How Technology is Changing Retail,” J. Retail., 2020.
[2] S. R. Nikhashemi, H. H. Knight, K. Nusair, and C. B. Liat, “Augmented reality in smart retailing: A (n) (A) Symmetric Approach to continuous intention to use retail brands’ mobile AR apps,” J. Retail. Consum. Serv., vol. 60, p. 102464, 2021.
[3] V. Shankar, “How Artificial Intelligence (AI) is Reshaping Retailing,” J. Retail., vol. 94, no. 4, pp. vi–xi, 2018.
[4] J. Leng, Y. Feng, J. Wu, and J. Li, “Overview of cashier-free stores and a virtual simulator,” ICIT Int. Conf. Proceeding Ser., pp. 393–399, 2019.
[5] G. McLean and A. Wilson, “Shopping in the digital world: Examining customer engagement through augmented reality mobile applications,” Comput. Human Behav., vol. 101, pp. 210–224, 2019.
[6] M. Y. C. Yim, S. C. Chu, and P. L. Sauer, “Is Augmented Reality Technology an Effective Tool for E-commerce? An Interactivity and Vividness Perspective,” J. Interact. Mark., vol. 39, pp. 89–103, 2017.
[7] A. Poushneh, “Augmented reality in retail: A trade-off between user’s control of access to personal information and augmentation quality,” J. Retail. Consum. Serv., vol. 41, pp. 169–176, 2018.
[8] Y. Chen, Q. Wang, H. Chen, X. Song, H. Tang, and M. Tian, “An overview of augmented reality technology,” J. Phys. Conf. Ser., vol. 1237, no. 2, 2019.
[9] P. Ashok Kumar and R. Murugavel, “Prospects of augmented reality in physical stores using a shopping assistance app,” Procedia Comput. Sci., vol. 172, pp. 406–411, 2020.
[10] P. Sharma, A. Ueno, and R. Kingshott, “Self-service technology in supermarkets – Do frontline staff still matter?,” J. Retail. Consum. Serv., vol. 59, 2021.
[11] F. Bellalouna, “The Augmented Reality Technology as Enabler for the Digitization of Industrial Business Processes: Case Studies,” Procedia CIRP, vol. 98, pp. 400–405, 2021.
[12] H. Pranoto and F. M. Panggabean, “Increase the interest in learning by implementing augmented reality: Case studies studying rail transportation,” Procedia Comput. Sci., vol. 157, pp. 506–513, 2019.
[13] S. Siltanen, Theory and Applications of Marker-Based Augmented Reality. VTT Technical Research Centre of Finland, 2012.
[14] C. Kaewrat and P. Boonbrahm, “Identify the object’s shape using augmented reality marker-based technique,” Int. J. Adv. Sci. Eng. Inf. Technol., vol. 9, no. 6, pp. 2193–2200, 2019.
[15] P. A. Rauschnabel, “Augmented reality is eating the real-world! The substitution of physical products by holograms,” Int. J. Inf. Manage., vol. 57, p. 102279, 2021.


Immersive Experience with Non-Player Characters Dynamic Dialogue

Muhammad Fikri Hasani
Computer Science Department
School of Computer Science
Bina Nusantara University
Jakarta, Indonesia 11480
[email protected]
[email protected]

Yogi Udjaja*
Computer Science Department
School of Computer Science
Bina Nusantara University
Jakarta, Indonesia 11480
[email protected]

Abstract—Non-Player Characters (NPCs) are one of the important elements in a game, because NPCs can liven up the atmosphere through intense interaction with players in various functions. This makes the game experience more immersive. This study provides an overview of how NPCs can hold dynamic dialogue with players; it also discusses chatbots as an emerging communication technology and their impact when combined with NPCs.

Keywords—Non-Player Character, Dynamic Dialogue, Dialogue System, Chatbot, Immersive, Game Experience.

I. INTRODUCTION

The main thing in games is enjoyment, or feeling fun when playing [1]; this is what makes many people interested in playing games. As time goes by, many games have been made, and some of them are similar. Therefore, game developers continually improve their games so that they become the best. The more games develop, the more criteria must be improved, such as multimedia elements, gameplay, and so on [2]. All game developers compete to increase interest in the games they make, so a standard of game experience is needed that can bring out the charm of the created game [3].

Studies on game experience have been carried out by many researchers: game experience for learning [4], for treating the elderly [5], for the blind [6], for social interaction [7], for cognitive enhancement [8], for competitive games [9][10], etc. These studies all depend on the case being studied. This inspired the present research to improve the game experience in general, regardless of who the player is.

With the help of artificial intelligence, immersive game experiences can be created. Artificial intelligence becomes a driving system for an object so that the object in question has abilities of its own [11]. These objects can be the environment, obstacles, characters, etc. Before a game is made, these objects are designed for various purposes [12]. One of the objects raised in this study is the non-player character, commonly called an NPC. In general, NPCs are an element that adds to the aesthetics of a game and supports several quests when a challenge is given. However, existing NPCs are programmed with a fixed dialogue system: when asked to talk again, the same dialogue appears (fixed dialogue). What if an NPC spoke according to its own will and possessed human-like emotions?

To achieve this, the initial study used the chatbot concept. A chatbot is a dialogue system that seeks to imitate humans in conversation [13]. Chatbots are either task-oriented or non-task-oriented [13]. A task-oriented chatbot talks with humans to achieve a task, for example an FAQ chatbot, a chatbot for data collection, or a chatbot for consultation. A non-task-oriented chatbot can talk with humans in general, such as Google Assistant, Siri, Alexa, or Cortana. Several chatbots have also been developed to detect the sentiments and emotions of users [14], as well as to accept inputs other than text (multimodal) to increase the awareness of the chatbot itself [15][16].

With the presence of chatbots combined with NPCs, it is hoped that the experience felt by players will be more immersive, so that engagement with the game will also increase.

II. CHATBOT AND NON-PLAYER CHARACTER

A. Non-Player Character in Games
A Non-Player Character (NPC) is an object that cannot be controlled by the player but has an important role in the game itself, making the world in the game feel alive [17]. These roles can include helping players, providing challenges, managing resources, providing narrative exposition, trading, and many others depending on the game model. Based on this, NPC behavior can generally be classified into six parts: movement, in-game content, cooperation, combat, trading, and chat [18].

B. Intelligent Non-Player Characters
In general, NPCs are programmed with behavior that does not change; with artificial intelligence, NPCs have limited knowledge of the scope of the environment and social experience. As described in Figure 1, all of this is based on information obtained in the game being played.

Fig. 1. General Architecture of an Intelligent NPC [19]

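The contrast between the fixed-dialogue NPCs described in the introduction and a dynamic, chatbot-driven NPC can be sketched minimally. This toy example (hypothetical names and lines, not from the paper) shows why a fixed script repeats itself while even a trivial dynamic policy conditions the reply on the player's input; a real system would call a dialogue model instead of keyword rules:

```python
class FixedDialogueNPC:
    """Always returns the same scripted line: the 'fixed dialogue' problem."""
    def __init__(self, line):
        self.line = line

    def talk(self, player_input):
        return self.line

class DynamicDialogueNPC:
    """Toy stand-in for a chatbot-driven NPC: the reply depends on input."""
    def talk(self, player_input):
        text = player_input.lower()
        if "quest" in text:
            return "A merchant lost his ledger near the docks."
        if "price" in text:
            return "For you? Ten coins."
        return "Fine weather today, traveler."

guard = FixedDialogueNPC("Welcome to our village.")
print(guard.talk("Any quests?"))        # Welcome to our village.
print(guard.talk("What's the price?"))  # Welcome to our village.

merchant = DynamicDialogueNPC()
print(merchant.talk("Any quests?"))     # A merchant lost his ledger near the docks.
```

Swapping the keyword rules for a generative or retrieval model, as surveyed in the following sections, is what turns this sketch into the dynamic dialogue the paper envisions.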

Figure 1 explains that, based on the existing information, knowledge is formed; decisions are then made based on the goals, behavior, and social properties of the NPC, producing actions that are perceived by the player. This is becoming more common as technology grows more sophisticated, and these incremental changes bring an amazing experience. From the conversational side, many bots have been developed that can provide the desired answer. If an NPC is given control like a chatbot, the player's experience will be more immersive.

C. Chatbot
A chatbot is an example of a dialogue system, a computer program that supports spoken, text-based, or multimodal conversational interactions with humans [13]. Historically, there are two kinds of chatbot usage: task/goal-oriented and non-task-oriented. A task-oriented chatbot converses with humans to solve a task at hand, while a non-task-oriented chatbot engages with humans in general conversation.

There are multiple ways to build chatbot dialogue, such as rule-based/pattern-based, retrieval-based, and generation-based [13]. Pattern-based chatbots use the user's input pattern to determine the response. In the early days, many pattern-based chatbots were built using handcrafted patterns [20][21][22]. While pattern-based chatbots are handy, static rules cannot handle the high variance of user input, especially when users make typos or use synonyms and different terms to state their intention; developers need to build a huge database of patterns to accommodate this. This problem gave rise to machine-learning-based chatbots, which are mainly divided into two categories: retrieval-based and generative-based.

Retrieval-based chatbots try to retrieve the dialogue most relevant to the user's chat from a database of dialogues [22]. There are several ways to build a retrieval-based chatbot. In [22], a learning-to-rank approach with RankSVM was used to choose the most relevant replies for short-text, single-turn conversation data. In [23], a Deep Attention Matching network (DAM), a modification of the Transformer [24], was used to select the best response in multi-turn conversation. In [25], the authors use external knowledge from a document combined with the utterance as parameters to match with the responses; hence their proposed neural network is called the Document Grounded Matching Network (DGMN), which mimics how humans answer such questions based on external knowledge. The work in [26] uses a seq2seq approach, which by nature is a generative model. Seq2seq, or sequence-to-sequence, is usually used for machine translation or to generate an output sequence from an input sequence [27], as illustrated in Figure 2, so this approach is normally used for generative chatbots. In [26] the authors combine the retrieval-based and generative-based approaches for better responses. The drawback of the retrieval-based approach is that the answers form a fixed set, so developers need to build a corpus of replies.

Generative-based chatbots are another approach to answer generation. This approach differs from retrieval-based and rule-based chatbots in that the AI model builds terms into a complete answer [13]. Therefore, this approach gives more dynamic answers than retrieval or rule-based approaches. Most research on generative chatbots involves either sequence-to-sequence models or Generative Adversarial Networks (GANs). In [28] the authors used LSTM and RNN for text generation, combined with Reinforcement Learning and a GAN to enhance the generated answer. In [29] a sequence-to-sequence model is used to generate answers for the human-resources domain, using one LSTM as the encoder and another LSTM as the decoder. The drawback of generative-based chatbots is that the data needed is huge and the responses are sometimes nonsense [13]. Therefore, [30] tried to use a small dataset to tackle this problem: the authors did a comparative study of multiple encoder-decoder (seq2seq) architectures on domain-specific, small data (around 567 samples), in which a BiLSTM with a BERT vectorizer gave the highest BLEU score. Another line of generative chatbots uses personality when generating text responses [31][32]. In [31] and [32], the authors generate chatbot responses with a personality embedding as an additional input parameter, such that the chatbot's response exhibits the predefined personality.

Fig. 2. Sequence to Sequence Model [28]

D. Emotionally Aware Chatbots
Research on chatbots whose goal is to enhance the understanding of context and return an appropriate response was described in the previous section. Several researchers thought that the focus should not only be on understanding the context and mapping it into a response, but also on user experience. One aspect that has been explored is understanding the sentiment and emotion of the user's input and returning a response appropriate to both the user's utterance context and the user's emotion or sentiment. In [33] the authors developed a general chatbot with empathic and sentiment-aware capabilities called XiaoIce. This chatbot can increase the interaction's immersiveness by understanding the query sentiment and delivering the response with an interactive persona. In [34] the authors built an emotionally aware chatbot called CAiRE for open-domain/general conversation. They used GPT as their pretrained language model, fine-tuned on the PersonaChat and EmpatheticDialogues datasets to better give persona-based and empathic answers. Both of these chatbots use a generative approach. In [35], the authors built a sequence-to-sequence model named Emotional Chatting Machine (ECM) that uses the emotion category as an embedding, an internal memory module that captures emotion dynamics during decoding, and an external memory module that helps select emotion words or generic words. While the resulting text generation is promising, the drawback of [35] is the insufficient amount of text data with emotion labels.

E. Multimodal Chatbots
As explained in the previous section, to enhance the experience, immersiveness, and believability of a chatbot, understanding the context from text alone is not enough. One existing approach is understanding sentiment and emotion from the user's utterance. Another approach is to

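The retrieval-based approach discussed above can be illustrated with a toy scorer. Real systems use learned rankers such as RankSVM or attention networks [22][23]; this sketch simply ranks a small corpus of replies by word overlap with the user's utterance, and its data is invented for illustration:

```python
def score(utterance, candidate_context):
    """Toy relevance: Jaccard word overlap between the user utterance and
    the context a candidate reply was originally paired with."""
    a = set(utterance.lower().split())
    b = set(candidate_context.lower().split())
    return len(a & b) / max(len(a | b), 1)

# Hypothetical (context, reply) corpus a retrieval chatbot would search.
corpus = [
    ("where is the blacksmith", "Past the fountain, second door on the left."),
    ("how much for a sword", "A good blade runs twenty coins."),
    ("tell me about the weather", "Storms are coming from the north."),
]

def retrieve(utterance):
    """Return the reply whose paired context best matches the utterance."""
    return max(corpus, key=lambda pair: score(utterance, pair[0]))[1]

print(retrieve("how much is a sword"))  # A good blade runs twenty coins.
```

The fixed-set drawback noted in the text is visible here: the chatbot can only ever say one of the three canned replies, however large the space of player utterances.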

enhance the chatbot understanding using a multimodal approach. Here, the goal is to mimic how humans think and understand. Humans are by nature not bound to one modality: to understand someone in an interaction, we often see their gestures, hear the intonation in their voice, perceive their emotion, and so on. The human brain combines all these data into one representation it can understand, and gives actions based on those stimuli. Multimodal learning in artificial intelligence is inspired by how our brain understands such multi-domain data. Multimodal learning is an approach to represent data from two or more different domains [36]; for example, picture data is represented as pixel values, while text data is represented as terms.

Fig. 3. Example of multimodal chatbot which use image and text [15]

Multimodal chatbots have been implemented by multiple researchers, some of which are discussed here. In [15] the authors use images of fashion items such as clothes together with text data as input for their chatbot; Figure 3 explains the architecture of the multimodal chatbot in [15]. Another case, in [16] and [37], is translating one data domain into another similar data domain. In [16] they built a chatbot as customer service for a call-center service line. The system translates the speech input into text using speech recognition and performs intent classification on the translated text. After that, the system replies using a synthesized voice from the generated reply, or directs the user to the appropriate call-center channel. In [37] the authors built a text and speech chatbot for class immersion. Speech recognition is used to translate the user utterance into text. The multimodality here lies in the output: the user gets synthesized speech, a visual representation through chatbot animation, and generated text.

III. DYNAMIC NPC DIALOGUE

NPC dialogue is one of the components that define the video game experience. Traditionally, NPC dialogue is written by a team of designers and writers [38]. However, this approach is taxing for writer teams and yields a fixed set of NPC dialogue. Because of that, several researchers have studied how to create NPC dialogue dynamically to enhance the user experience while interacting with it.

Several studies have focused on increasing the user experience with dynamic NPC dialogue. In [39], the authors developed a political text game that casts the player as a king with his viziers. They created three NPCs, one for each vizier, that respond to player commands and the state of the game. They used a tool called Expressionist, a grammar-based tool that helps generate the NPC dialogue, so the authors could design the grammar for each NPC with its respective persona. The text generation in [39] is therefore controlled by several in-game variables. Another research effort that used Expressionist is [38], with their research game called Talk of the Town. The architecture is described in Figure 4. In [38] they built a module named Productionist whose main job is to generate text procedurally based on the game state, the conversation state, and a grammar.

Fig. 4. Architecture to produce a dialogue using Expressionist-produced grammar [38]

Dynamic NPC dialogue generation can also be developed using a seq2seq model, as in generative chatbots. In [40], the authors used OpenNMT, a seq2seq/encoder-decoder-based model, and modified the base model from an RNN into a Bidirectional Recurrent Neural Network (BiRNN). The original text is paraphrased into multiple sentences by the model; the generated text is then altered based on the game state. Terms and game states are manually tagged, and the altered sentence is displayed in the game.

Another interesting dynamic NPC dialogue approach is [41]. There, the authors argued that to increase the interaction between the player and the game, especially NPCs, interaction should happen not only between player and NPC, but also between NPCs. They built a group-based NPC conversation using Ceptre, a linear-logic-based modeling tool. In Ceptre, a model is an unordered collection of declarative if… then… rules that describe how the simulation can change components of its state based on the current properties of that state. Because this is an agent-based group conversation, they modelled the change in sentiment based on another agent's response, as well as the agents' emotional responses.

IV. CONCLUSION AND DISCUSSION

Looking at the current approaches to dynamic NPC dialogue generation, many still focus on producing the dialogue procedurally based on in-game variables such as the game state or the player character's state, which is reasonable: the game designer does not want the NPC to move or act freely outside the predefined scenarios. To integrate chatbot-based technology, we can either use a retrieval-based approach over the predefined dialogues, combined with the game state or game variables as another vector, or use a generative approach with the game state and the NPC's job/personality as additional vectors. Chatbot approaches that can be used for game dialogue include using emotion and sentiment as input, as in [35], combined with an emotional state as in [42], to generate believable NPC responses. Another approach for more immersive behavior is that of [41], which enables NPC-to-NPC conversations; the game world then becomes livelier with not only player-to-NPC interaction but also player-NPC-NPC-player interaction, which takes as input not only the user utterance but also another NPC's dialogue.

For an author-friendly approach that will not affect story progression, a first step may be to incorporate the chatbot approach in non-vital NPCs, such as a villager in a fantasy role-playing game, or to build an NPC as a plugin for several games that can help player progression. Natural language generation based on seq2seq learning for NPC dialogue itself can still be explored toward more controlled generation using multiple variables such as player coordinates or positions, player level, game state, story progression, etc.
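The first integration route suggested above (retrieval over predefined dialogues with the game state as an extra signal) can be sketched in a few lines of Python. This is a hypothetical toy, not code from any of the cited systems; the trigger corpus, the score bonus, and the quest_started flag are invented for illustration:

```python
import re

def score(query_tokens, entry, game_state):
    """Lexical overlap with the trigger, plus a bonus when the entry's
    required game state matches the current game state."""
    trigger_tokens = set(re.findall(r"[a-z]+", entry["trigger"].lower()))
    overlap = len(query_tokens & trigger_tokens)
    state_match = all(game_state.get(k) == v for k, v in entry["state"].items())
    return overlap + (2 if state_match else -2)

def npc_reply(utterance, dialogues, game_state):
    """Return the predefined reply that best matches utterance + game state."""
    tokens = set(re.findall(r"[a-z]+", utterance.lower()))
    return max(dialogues, key=lambda e: score(tokens, e, game_state))["reply"]

dialogues = [
    {"trigger": "where is the dragon", "state": {"quest_started": False},
     "reply": "Dragon? I know nothing of dragons."},
    {"trigger": "where is the dragon", "state": {"quest_started": True},
     "reply": "It was last seen near the northern peaks."},
]

print(npc_reply("Where is the dragon?", dialogues, {"quest_started": True}))
# -> It was last seen near the northern peaks.
```

The same player utterance yields a different canned reply depending on the game variables, which is the "game state as another vector" idea in miniature.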


REFERENCES
[1] Udjaja, Y., Tanuwijaya, K., & Wairooy, I. K. (2019). The use of role playing game for Japanese language learning. Procedia Computer Science, 157, 298-305.
[2] Kristiadi, D. P., Udjaja, Y., Supangat, B., Prameswara, R. Y., Warnars, H. L. H. S., Heryadi, Y., & Kusakunniran, W. (2017, November). The effect of UI, UX and GX on video games. In 2017 IEEE International Conference on Cybernetics and Computational Intelligence (CyberneticsCom) (pp. 158-163). IEEE.
[3] Udjaja, Y. (2018). Gamification assisted language learning for Japanese language using expert point cloud recognizer. International Journal of Computer Games Technology, 2018.
[4] Mittal, A., Gupta, M. P., Chaturvedi, M., Chansarkar, S. R., & Gupta, S. (2021). Cybersecurity Enhancement through Blockchain Training (CEBT)–A serious game approach. International Journal of Information Management Data Insights, 1(1), 100001.
[5] Udjaja, Y., Rumagit, R. Y., Gazali, W., & Deni, J. (2021). Healthy Elder: Brain Stimulation Game for the Elderly to Reduce the Risk of Dementia. Procedia Computer Science, 179, 95-102.
[6] Yanfi, Y., Udjaja, Y., & Juandi, A. V. The Effect of User Experience from Teksologi.
[7] Tjernberg, W. (2021). Tabletop game player experience in the age of digitization: Social and material aspects of play.
[8] Lau, S. Y. J., & Agius, H. (2021). A framework and immersive serious game for mild cognitive impairment. Multimedia Tools and Applications, 1-55.
[9] Sasmoko, Harsono, J., Udjaja, Y., Indrianti, Y., & Moniaga, J. (2019, March). The Effect of Game Experience from Counter-Strike: Global Offensive. In 2019 International Conference of Artificial Intelligence and Information Technology (ICAIIT) (pp. 374-378). IEEE.
[10] Sasmoko, Halim, S. A., Indrianti, Y., Udjaja, Y., Moniaga, J., & Makalew, B. A. (2019, March). The Repercussions of Game Multiplayer Online Battle Arena. In 2019 International Conference of Artificial Intelligence and Information Technology (ICAIIT) (pp. 443-447). IEEE.
[11] Smith, G. (2017). GLaDOS: Integrating Emotion-Based Behaviours into Non-Player Characters in Computer Role-Playing Games (Doctoral dissertation).
[12] Warpefelt, H., & Verhagen, H. (2017). A model of non-player character believability. Journal of Gaming & Virtual Worlds, 9(1), 39-53.
[13] Adamopoulou, E., & Moussiades, L. (2020). Chatbots: History, technology, and applications. Machine Learning with Applications, 2, 100006. doi:10.1016/j.mlwa.2020.100006.
[14] Lee, H., Ho, C., Lin, C., Chang, C., Lee, C., Wang, Y., Hsu, T., & Chen, K. (2020). Investigation of Sentiment Controllable Chatbot. ArXiv, abs/2007.07196.
[15] Liao, L., Zhou, Y., Ma, Y., Hong, R., & Chua, T. (2018). Knowledge-aware Multimodal Fashion Chatbot. Proceedings of the 26th ACM International Conference on Multimedia.
[16] Lockett, J., Wijesinghe, S., Phillips, J., Gross, I., Schoenfeld, M., Hiranpat, W. T., Marlow, P. J., Coarr, M., & Hu, Q. (2019). Intelligent Voice Agent and Service (iVAS) for Interactive and Multimodal Question and Answers. FQAS.
[17] Warpefelt, H., & Verhagen, H. (2015). Towards an updated typology of non-player character roles. In Proceedings of the International Conference on Game and Entertainment Technologies (pp. 1-9).
[18] Mo, Y. T., & Kim, S. K. Research on the Influence of Randomness of Non-Player Character Interaction Behavior on Game Experience.
[19] Yoo, K. S., & Lee, W. H. (2008, September). An Intelligent Non Player Character based on BDI Agent. In 2008 Fourth International Conference on Networked Computing and Advanced Information Management (Vol. 2, pp. 214-219). IEEE.
[20] Wallace, R. S. (2009). The Anatomy of A.L.I.C.E. Parsing the Turing Test (pp. 181–210).
[21] Weizenbaum, J. (1966). ELIZA---a computer program for the study of natural language communication between man and machine. Communications of the ACM, 9(1), 36-45.
[22] Ji, Y., Lu, Z., & Li, H. (2014). An Information Retrieval Approach to Short Text Conversation.
[23] Moore, K., Zhong, S., He, Z., Rudolf, T., Fisher, N., Victor, B., & Jindal, N. (2021). A comprehensive solution to retrieval-based chatbot construction. ArXiv, abs/2106.06139.
[24] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention is All you Need. In 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.
[25] Zhao, X., Tao, C., Wu, W., Xu, C., Zhao, D., & Yan, R. (2019). A Document-grounded Matching Network for Response Selection in Retrieval-based Chatbots. ArXiv, abs/1906.04362.
[26] Qiu, M., Li, F., Wang, S., Gao, X., Chen, Y., Zhao, W., Chen, H., Huang, J., & Chu, W. (2017). AliMe Chat: A Sequence to Sequence and Rerank based Chatbot Engine. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (pp. 498–503).
[27] Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems (pp. 3104-3112).
[28] Chou, T. L., & Hsueh, Y. L. (2019). A task-oriented chatbot based on LSTM and reinforcement learning. In Proceedings of the 2019 3rd International Conference on Natural Language Processing and Information Retrieval (pp. 87-91).
[29] Sheikh, S. A., Tiwari, V., & Singhal, S. (2019). Generative model chatbot for Human Resource using Deep Learning. In 2019 International Conference on Data Science and Engineering (ICDSE) (pp. 126-132). IEEE.
[30] Kapočiūtė-Dzikienė, J. (2020). A domain-specific generative chatbot trained from little data. Applied Sciences, 10(7), 2221.
[31] Li, J., Galley, M., Brockett, C., Spithourakis, G. P., Gao, J., & Dolan, B. (2016). A persona-based neural conversation model. arXiv preprint arXiv:1603.06155.
[32] Song, H., Zhang, W. N., Cui, Y., Wang, D., & Liu, T. (2019). Exploiting persona information for diverse generation of conversational responses. arXiv preprint arXiv:1905.12188.
[33] Zhou, L., Gao, J., Li, D., & Shum, H.-Y. (2020). The Design and Implementation of XiaoIce, an Empathetic Social Chatbot. Computational Linguistics, 1–62. doi:10.1162/coli_a_00368.
[34] Lin, Z., Xu, P., Winata, G. I., Siddique, F. B., Liu, Z., Shin, J., & Fung, P. (2020). CAiRE: An End-to-End Empathetic Chatbot. Proceedings of the AAAI Conference on Artificial Intelligence, 34(09), 13622-13623.
[35] Zhou, H., Huang, M., Zhang, T., Zhu, X., & Liu, B. (2018). Emotional chatting machine: Emotional conversation generation with internal and external memory. Thirty-Second AAAI Conference on Artificial Intelligence.
[36] Baltrušaitis, T., Ahuja, C., & Morency, L. P. (2018). Multimodal machine learning: A survey and taxonomy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(2), 423-443.
[37] Griol, D., Molina, J. M., & De Miguel, A. S. (2014). Developing multimodal conversational agents for an enhanced e-learning experience. ADCAIJ: Advances in Distributed Computing and Artificial Intelligence Journal, 3(1), 13-26.
[38] Ryan, J., Mateas, M., & Wardrip-Fruin, N. (2016). Characters who speak their minds: Dialogue generation in Talk of the Town. In Twelfth Artificial Intelligence and Interactive Digital Entertainment Conference.
[39] Lessard, J., Brunelle-Leclerc, E., Gottschalk, T., Jetté-Léger, M. A., Prouveur, O., & Tan, C. (2017). Striving for author-friendly procedural dialogue generation. In Proceedings of the 12th International Conference on the Foundations of Digital Games (pp. 1-6).
[40] Hämäläinen, M., & Alnajjar, K. (2019). Creative contextual dialog adaptation in an open world RPG. In Proceedings of the 14th International Conference on the Foundations of Digital Games (pp. 1-7).
[41] Morrison, H., & Martens, C. (2017). A generative model of group conversation. In Proceedings of the 12th International Conference on the Foundations of Digital Games (pp. 1-7).
[42] Chowanda, A., Blanchfield, P., Flintham, M., & Valstar, M. (2016). Computational models of emotion, personality, and social relationships for interactions in games. In Proceedings of the 2016 International Conference on Autonomous Agents & Multiagent Systems (pp. 1343-1344).


Explainable Supervised Method for Genetics Ancestry Estimation

Arif Budiarto
Computer Science Department
School of Computer Science
Bina Nusantara University
Jakarta, Indonesia
[email protected]

Bens Pardamean
Computer Science Department
BINUS Graduate Program - Master of Computer Science
Bina Nusantara University
Jakarta, Indonesia
[email protected]

Abstract—Ancestry estimation is one crucial stage in genomic research. It generates scores that represent the admixed genetics profile that results from human evolution. In previous research, we implemented multiple unsupervised methods to estimate these scores from large genomics data obtained from the 1000 Genome Project. These methods were limited to clustering the samples into the five global populations in the dataset. The main challenge arose when implementing these methods to cluster the samples into more specific sub-populations. In this paper, we propose a supervised approach to answer this challenge. Two state-of-the-art supervised machine learning methods, XGBoost and Deep Neural Network (DNN), were applied to the same dataset. These methods were aimed at classifying samples both into the five main populations and into 26 sub-populations. In the first classification task, both methods achieved results similar to our previous unsupervised approach. Interestingly, for the second classification task, which possesses a relatively higher difficulty, DNN yielded better performance on the train, validation, and test datasets, despite its overfitting problem. Furthermore, the feature importance scores from each model were calculated using the Shapley Additive Explanations (SHAP) method. Finally, 11 SNPs that overlapped across all models were evaluated based on the reported Minor Allele Frequency (MAF) from the 1000 Genome Project. Overall, using only these 11 SNPs, we could differentiate each population with regard to its average MAF.

Keywords—genomics, ancestry, genetic marker, supervised, explainable machine learning

I. INTRODUCTION

Ancestry estimation is one of the required and significant steps in genetics research, including Genome-Wide Association Studies (GWAS) [1], [2] and forensic genetics [3], [4]. The ultimate goal of this estimation is to detect a unique pattern within someone's genetics profile that represents the admixed features of their ancestry. The benefit of using this estimation score in GWAS is to lower the bias caused by the admixed genetics profile of the samples. This mixed genetics profile is part of the evolution process, which happens during meiosis [5], [6].

The human genome consists of DNA chains separated into 22 chromosomes. There are two admixture levels found in human genetics based on the scope of the mixture across chromosomes [7]. The first one is global genetic ancestry, which represents the general proportion of ancestral information from the whole genome [8]. In contrast, local genetic ancestry only looks at the variation in a specific genome region in a particular chromosome [6]. Global ancestry information is more commonly used in GWAS than local ancestry information.

Computational methods have been developed to help researchers extract this information from a relatively massive number of genetic variants, called Single Nucleotide Polymorphisms (SNPs). In the first wave of this research, basic statistical methods and Bayesian methods were the most popular approaches for estimating this information [9]–[11]. Almost all these methods are trained in an unsupervised learning approach to extract the hidden pattern within genetics data. Estimating the ancestry scores of each sample can be done using the clustering process as a proxy [12], [13]: the probability of a sample belonging to a certain cluster can be interpreted as its ancestry score. As in other domains, the way such an unsupervised method clusters the data can be controlled, meaning samples are grouped based on the shared characteristics of their data. In genomics data, this similarity can be derived from several factors other than population stratification itself, such as somatic mutation [5]. Previously, we successfully developed unsupervised models to estimate the ancestry information score using the K-Means [12] and Gaussian Mixture Model (GMM) [13] algorithms. In general, the GMM model can outperform the K-Means model since it carefully focuses on the mixture within the data, associated with the multiple data distributions used to model the data.

On the other hand, fewer studies have implemented supervised methods for this classification task, including Support Vector Machine (SVM) and Multi-Layer Perceptron (MLP) [6], [14], [15]. Unfortunately, these studies did not provide enough explanation of the mechanism of the models, that is, an elaboration of how certain genetic markers contribute to the different profiles among populations. In the current study, we aimed to build two more advanced supervised models, XGBoost and Deep Neural Network (DNN), as comparable models to our previous approaches. We also highlighted some specific genetic markers derived from the models to explain the different characteristics between populations.

II. RESEARCH METHODOLOGY

A. Data
As a continuation of our previous studies [12], [13], the current study used the same dataset, which consists of one public dataset from the 1000 Genome Project (1kGP) [16] and our private dataset from the Indonesia Colorectal Cancer Consortium (IC3) [17]. Following the preprocessing steps explained in our previous paper, we only used 2405 unrelated individuals from the 1kGP dataset. Originally, each sample in the 1kGP dataset has more than 40 million SNPs; however, we considered only 3,576 of them as predictors in this study. The selection of these polymorphisms was based on our published GWAS study [17]. Each SNP value is a single digit representing the number of genetic alterations at a particular base position (i.e., 0, 1, or 2).
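Under this 0/1/2 genotype coding, allele frequencies follow directly from the genotype counts. A small sketch using the standard allele-based definition (each diploid sample carries two alleles; the genotype vector below is invented for illustration):

```python
def allele_frequency(genotypes):
    """Alternate-allele frequency for one SNP from 0/1/2 genotype codes.
    Each diploid sample carries two alleles, so the denominator is 2*n."""
    return sum(genotypes) / (2 * len(genotypes))

def minor_allele_frequency(genotypes):
    """MAF is the frequency of the less common of the two alleles."""
    f = allele_frequency(genotypes)
    return min(f, 1 - f)

# 6 samples: two homozygous reference (0), three heterozygous (1),
# one homozygous alternate (2) -> 5 alternate alleles out of 12.
print(minor_allele_frequency([0, 0, 1, 1, 1, 2]))  # 5/12 ≈ 0.4167
```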

978-1-6654-4002-8/21/$31.00 ©2021 IEEE

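The 60/20/20 stratified train/validation/test split described in Section II-B can be sketched in plain Python. The study's own tooling is not specified here, so this fixed-seed toy is only an assumption of the mechanics:

```python
import random
from collections import defaultdict

def stratified_split(labels, fractions=(0.6, 0.2, 0.2), seed=0):
    """Split sample indices into train/val/test while preserving each
    population's proportion (stratified random sampling)."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for i, y in enumerate(labels):
        by_label[y].append(i)
    train, val, test = [], [], []
    for idx in by_label.values():
        rng.shuffle(idx)
        a = int(len(idx) * fractions[0])
        b = a + int(len(idx) * fractions[1])
        train += idx[:a]
        val += idx[a:b]
        test += idx[b:]
    return train, val, test

labels = ["AFR"] * 10 + ["EAS"] * 10
train, val, test = stratified_split(labels)
print(len(train), len(val), len(test))  # 12 4 4
```

Because the split is done per label, each subset keeps the same population proportions as the full dataset.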

The illustration of the dataset employed in this study is depicted in Figure 1.

Fig. 1. Example of Genetics Data from 1000 Genome Project

The samples in the 1kGP dataset are grouped into five major populations, namely African (AFR), American (AMR), East Asian (EAS), European (EUR), and South Asian (SAS). Furthermore, each sample is also categorized into one of 26 sub-populations, which represent specific ethnicities. These data are considered imbalanced, since the smallest group (AMR) is almost half of the biggest group (AFR), while the rest of the groups are almost equal.

The IC3 data was used in the evaluation step to measure the performance of the classification model we developed. This dataset consists of 173 samples from Indonesia. Given this location, all samples were considered part of the EAS population.

B. Machine Learning Methods
In our previous research, we successfully implemented an unsupervised learning approach to estimate ancestry scores based on genetics data. GMM has shown a promising capability to model complex genetics data and clustered the samples into their corresponding population groups. The nature of the mixed genetics profile across the samples is the reason why GMM outperformed the K-Means clustering model. However, when evaluating the model on our IC3 data, this model failed to predict one sample out of 173.

In the current research, we aimed to build a classification model using a supervised learning approach that can practically achieve better performance than the unsupervised method. Two objectives were set as the main tasks for this classification. The first is to classify the samples into their corresponding population, similar to the objective of our previous unsupervised models. Additionally, we also developed a supervised model that can classify each sample into the corresponding sub-population.

Two advanced supervised machine learning methods, XGBoost and DNN, were employed to build the models. XGBoost is commonly used to model tabular data in either regression or classification tasks. The basic concept of XGBoost is combining several small and weak decision tree models into one ensemble model that unifies the prediction power of all the decision trees [18], [19].

On the other hand, DNN offers more advanced non-linear data modeling that can extract hidden patterns within the data. It is popularly known as a powerful method for dealing with massive and complex data in computer vision [20]–[22] and natural language processing [23], [24] tasks. This method was inspired by the biological process by which the human brain processes information. A DNN model is comprised of several blocks or hidden layers that map the input to the output data. Each block has multiple neurons with corresponding weights, and each neuron handles a simple linear regression that processes the data by incorporating its weight. This method is also complemented by a backpropagation step, which gradually updates the neuron weights until the best-fit model is achieved [25].

Before implementing these two models, the 1kGP dataset was split into three subsets, namely train, validation, and test subsets, with stratified random sampling. The proportions were 60%, 20%, and 20%, respectively. In the global population classification task, each model is aimed at classifying the data into its corresponding population. The XGBoost model used the "gbtree" booster, the "multi:softprob" objective function, and 22 trees. It was fitted to the train data and validated using the validation data. These XGBoost models were developed using the XGBoost library in Python [18].

DNN models for both classification tasks were also fitted on the train data and validated on the validation data. In general, the architecture used for both tasks was a stack of three blocks of dense layers with a ReLU activation function. A dropout layer was also added to the architecture to keep only 70% of the neurons before passing them to the last dense layer, the classification layer with a "softmax" activation function. The model was trained for 100 epochs, optimized using a default "Adagrad" optimizer, and evaluated using a categorical cross-entropy loss function. A slightly different architecture was implemented for the sub-population classification task: the stack consists of 5 dense layers and a dropout layer that kept 70% of the neurons, and an "Adamax" optimizer with a 0.001 learning rate was used to optimize the network. Based on these architectures, the total trainable parameters for the global and sub-population classification models were 468,357 and 1,100,058, respectively. Both DNN models were built using the Keras library in Python [26].

To evaluate the performance of each model, the test subset was used. A confusion matrix and the accuracy score were used to compare the performance between models. Additionally, all the models were also evaluated using the IC3 dataset, to test whether they have enough generalization capability for totally different data in terms of the study and genotyping technology used.

C. Model Explainability
Both XGBoost and DNN are considered black-box models. Therefore, a special algorithm needs to be employed to explain the mechanism inside the model. In the logistic regression method, this explainability information is automatically provided by the model through the coefficient and p-value for each predictor, which represent its significance towards the output.

The XGBoost library in Python also provides a built-in feature importance method to generate the significance score for each predictor based on its information gain across the generated trees [27]. Yet, in this study, we used the Shapley Additive Explanations (SHAP) method [28] to compute the significance score of each predictor for both models, to allow a fair comparison. As its name suggests, the SHAP method is based on Shapley values, which are commonly used in game theory. This model-agnostic method performs 2^F model evaluations for each data point, where F is the total number of features available in the model. Each evaluated model has a different number of features, increasing from no features at all to all features at once. The difference in output between a model


that uses a particular feature and the one that does not use this values which can be interpreted as the correct predictions are
feature represents the significant score of it, or we can call it quite promising for each population. To further investigate
Shap value. Therefore, each feature has varied marginal this result, we generated two others performance scores,
effects toward the output across all data rows. similar to the performance evaluation in a popular computer
vision competition called ImageNet competition which also
In our case, to get the significance score from each SNP, has more than 10 predicted classes [29]. These scores are top2
we aggregated the Shap values across all data using the and top3 accuracy which were generated by looking at the 2
average function. So, in the end, we had a single Shap value and 3 highest predicted classes. Interestingly, our DNN model
that summarizes the effect of one SNP in all samples. successfully predicted the correct class in the top 2 and 3
Furthermore, this summarized Shap value also represents the highest classes in more than 80% of the test data. In contrast,
effect of each SNP in predicting the population for each the XGBoost model achieved only 64% and 78% for top2 and
sample. Eventually, we generated an N x M matrix of Shap top3 prediction, respectively. The complete comparison of
values, where N is the total number of samples, and M is the both two models can be found in Table 1.
total number of populations. By looking at this matrix then we
selected 25 SNPs with the highest Shap value from both Finally, both models were evaluated using IC3 data. DNN
models in both task classifications. Among 100 SNPs, we model achieved 100% accuracy for the first task by predicted
included only SNPs which was replicated in a minimum of all samples to be in the EAS population since all samples in
two models. Finally, we obtained the reported minor allele the IC3 dataset are from Indonesia. Whilst the XGBoost
frequency (MAF) of these selected SNPs for all populations model failed to classify one sample correctly. Similar to the
in the 10000 Genome Project. MAF can be interpreted as the evaluation using test subset data from the 1kGP dataset, both
proportion of samples with a mutation in a specific SNP. A models failed to achieve a good accuracy score. Despite this
good ancestry marker is a SNP with a distinct MAF across relatively low accuracy, the predicted class in the DNN model
different populations. were categorized in the EAS population. The proportion of the
predicted classes were 82.1%, 16.1%, 1.2%, and 0.6% for
III. RESULT AND DISCUSSION Kinh in Ho Chi Minh City (Vietnam), Chinese Dai in
A. Global Population Classification Xishuangbanna, Southern Han Chinese, and Han Chinese in
Beijing, respectively. This first classification task was done to be compared with our previous reports, which used unsupervised methodologies. Our previous GMM model successfully clustered the samples from the same population with 99% accuracy. Yet, this model only evaluated the same dataset that was used in the training process. Our proposed supervised models in the present study also yielded 99% classification accuracy in the training subset. However, only the DNN could yield the same accuracy for the validation and test subsets, while XGBoost only achieved 97% and 96% accuracy for the validation and test subset data, respectively. The full performance comparison for both models can be seen in Table 1.

B. Sub Population Classification

The second classification task posed a more challenging problem, since it has five times more classes to be predicted for each sample. In the training data, both XGBoost and DNN models achieved more than 90% accuracy. However, when evaluated on the validation and test subsets, the DNN outperformed XGBoost: XGBoost yielded less than 50% accuracy in both subsets, while the DNN achieved more than 50% accuracy for both.

On the other hand, the XGBoost model predicted 5% of the samples as Bengali from Bangladesh and Sri Lankan Tamil from the UK, both of which are categorized as the SAS population. This result indicates that the models failed to separate samples with very similar genetic profiles from the same global population. The reason is mixed marriage between ethnicities within the population, which makes them possess similar genetic profiles [5].

C. Model Explainability

The SHAP algorithm was implemented to obtain the significance score for each SNP in all models. The 25 top SNPs from each model were selected based on the SHAP values. Eleven of them overlapped in all models, as reported in Table 2. More than 35% of them are located on chromosome 2. Furthermore, Figure 3 clearly illustrates a distinction of average MAF for all reported populations in the 1000 Genomes Project. The EAS population has the most distinctive average MAF among all populations based on these 11 SNPs. This was supported by the statistical t-test performed on the MAF data, as can be seen in Table 3. The only significant difference found in the t-test was between the AFR and EAS populations, which have the highest and the lowest average MAF, respectively.
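The selection step described above (ranking SNPs by their SHAP values in each model and keeping the SNPs common to all top lists) can be sketched as follows. The aggregation rule (mean absolute SHAP value per SNP) and the toy arrays are assumptions for illustration; the paper only states that the top SNPs were selected based on the SHAP values.

```python
# Sketch: rank SNPs by mean |SHAP| per model, then intersect the top lists.
# The aggregation rule (mean absolute SHAP value) is an assumption.

def top_features(shap_values, names, k):
    """shap_values: list of per-sample rows, one SHAP value per feature."""
    n = len(names)
    mean_abs = [sum(abs(row[j]) for row in shap_values) / len(shap_values)
                for j in range(n)]
    ranked = sorted(range(n), key=lambda j: mean_abs[j], reverse=True)
    return [names[j] for j in ranked[:k]]

def overlapped(top_lists):
    """SNPs present in the top-k list of every model."""
    common = set(top_lists[0])
    for lst in top_lists[1:]:
        common &= set(lst)
    return sorted(common)

# Toy per-sample SHAP values for four SNPs (illustrative, not the paper's data).
snps = ["rs434504", "rs2814778", "rs6731972", "rs260687"]
xgb_shap = [[0.9, 0.1, 0.5, 0.01], [0.8, 0.2, 0.4, 0.02]]
dnn_shap = [[0.7, 0.6, 0.05, 0.01], [0.9, 0.5, 0.04, 0.03]]

top_xgb = top_features(xgb_shap, snps, k=2)
top_dnn = top_features(dnn_shap, snps, k=2)
shared = overlapped([top_xgb, top_dnn])
```

With real models, the per-sample SHAP matrix would come from an explainer such as the one in the `shap` package; the ranking and intersection steps stay the same.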
TABLE I. MODEL PERFORMANCE FOR EACH MODEL IN BOTH TASKS

                                         XGBoost   DNN
  Global Population   Train Accuracy     0.99      0.99
                      Val Accuracy       0.95      0.99
                      Test Accuracy      0.96      0.99
  Sub Population      Train Accuracy     0.96      0.99
                      Val Accuracy       0.42      0.63
                      Test Accuracy      0.39      0.61
                      Top2 Accuracy      0.64      0.85
                      Top3 Accuracy      0.78      0.95

Although the accuracy for both models seemed relatively low, the confusion matrix for both models, as depicted in Figure 2, shows an interesting result. The diagonal

IV. CONCLUSION

We proposed the implementation of two supervised methods to estimate an ancestry score from genetic data and to use this score to classify each sample into the appropriate population group. These two models achieved a better accuracy score in classifying the data into five main populations than our previous unsupervised approach. Furthermore, in the sub-population classification task, both models achieved a relatively low accuracy score on the validation and test subsets.
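The top2 and top3 accuracies reported in Table 1 count a sample as correct when its true class is among the k classes with the highest predicted probability. A minimal sketch (toy probabilities over five sub-populations, not the paper's outputs):

```python
def top_k_accuracy(prob_rows, true_labels, k):
    """Fraction of samples whose true class is among the k most probable."""
    hits = 0
    for probs, true in zip(prob_rows, true_labels):
        top_k = sorted(range(len(probs)), key=lambda c: probs[c],
                       reverse=True)[:k]
        hits += true in top_k
    return hits / len(true_labels)

# Toy predicted class probabilities for 4 samples over 5 classes.
probs = [
    [0.6, 0.2, 0.1, 0.05, 0.05],    # true 0: top-1 hit
    [0.3, 0.5, 0.1, 0.05, 0.05],    # true 0: top-2 hit
    [0.05, 0.1, 0.2, 0.3, 0.35],    # true 2: top-3 hit
    [0.4, 0.3, 0.15, 0.1, 0.05],    # true 4: miss at k <= 3
]
true = [0, 0, 2, 4]

acc1 = top_k_accuracy(probs, true, 1)
acc2 = top_k_accuracy(probs, true, 2)
acc3 = top_k_accuracy(probs, true, 3)
```

Ordinary accuracy is the k = 1 case, which is why top2 and top3 can only be higher than the plain test accuracy.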

424 28 October 2021, Jakarta - Indonesia


2021 1st International Conference on Computer Science and Artificial Intelligence (ICCSAI)

Fig. 2. Confusion Matrix for both Models in both Classification Tasks. (a) Confusion matrix of the XGBoost model in the global population classification task. (b) Confusion matrix of the XGBoost model in the sub-population classification task. (c) Confusion matrix of the DNN model in the global population classification task. (d) Confusion matrix of the DNN model in the sub-population classification task.
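The confusion matrices in Fig. 2 tabulate actual versus predicted classes. A minimal sketch of how such a matrix is built (toy integer-encoded labels for the five global populations, not the paper's predictions):

```python
def confusion_matrix(actual, predicted, n_classes):
    """m[i][j] = number of samples of actual class i predicted as class j."""
    m = [[0] * n_classes for _ in range(n_classes)]
    for a, p in zip(actual, predicted):
        m[a][p] += 1
    return m

# Toy labels, encoded 0..4 (e.g. AFR=0, AMR=1, EAS=2, EUR=3, SAS=4).
actual    = [0, 0, 1, 2, 3, 4, 4, 1]
predicted = [0, 0, 1, 2, 3, 4, 1, 1]

cm = confusion_matrix(actual, predicted, n_classes=5)
accuracy = sum(cm[i][i] for i in range(5)) / len(actual)
```

The diagonal entries are the correctly classified samples, so their sum divided by the sample count reproduces the plain accuracy; off-diagonal entries (here one SAS sample predicted as AMR) show exactly where a model confuses populations.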

TABLE II. OVERLAPPED SNPS IN ALL MODELS

  SNP ID       Chromosome   Position
  rs434504     1            4815477
  rs2814778    1            159174683
  rs6731972    2            26151206
  rs260687     2            109578855
  rs4953863    2            133672853
  rs1446585    2            136407479
  rs35391      5            33955673
  rs10962599   9            16795286
  rs1834640    15           48392165
  rs4411464    15           63995423
  rs6132532    20           2315543

However, the top2 and top3 accuracy scores show a promising result from the DNN model, since the predicted sub-populations were still in the same main population as the actual sub-population. The complex admixed genetic profile among the samples within the same main population seems to be the main obstacle for the models in learning the hidden pattern in the data. Both models achieved 100% accuracy in predicting the main population for all samples in the IC3 dataset. More specifically, the DNN model also successfully predicted the sub-population within the EAS population for each sample in the IC3 data. Our study was focused not only on the classification performance but also on the strategy to explain our models. We used the SHAP method to obtain a significance score for each SNP. We summarized 17 SNPs that belong to the top 5 predictors in each model. The statistical t-test shows that all these SNPs have significant differences across all populations. A more advanced neural network architecture, such as a convolutional neural network, recurrent neural network, or attention-based network, can be explored in further research, specifically for the sub-population classification task.
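The pairwise comparisons in Table 3 are two-sample t-tests on the per-SNP average MAF values of two populations. A pure-Python sketch of the pooled-variance t statistic follows; the exact t-test variant used by the paper and the MAF vectors below are illustrative assumptions, and turning the statistic into a p-value would additionally require the t distribution (e.g. `scipy.stats`).

```python
import math

def t_statistic(a, b):
    """Two-sample t statistic with pooled variance; df = len(a) + len(b) - 2."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    sp2 = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    return (ma - mb) / math.sqrt(sp2 * (1 / na + 1 / nb))

# Toy MAF values over 11 SNPs for two populations (not the paper's data):
afr = [0.45, 0.40, 0.38, 0.44, 0.42, 0.41, 0.39, 0.43, 0.40, 0.41, 0.44]
eas = [0.10, 0.12, 0.08, 0.11, 0.09, 0.13, 0.10, 0.12, 0.09, 0.11, 0.10]

t = t_statistic(afr, eas)
```

A large |t| relative to the t distribution with 20 degrees of freedom corresponds to the kind of significant AFR-vs-EAS difference reported in Table 3.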


Fig. 3. Boxplot Visualization for Average MAF in All Populations

TABLE III. T-TEST RESULT FOR MAF ACROSS POPULATIONS

         AFR   AMR     EAS      EUR     SAS
  AFR    1     0.081   0.015*   0.378   0.1
  AMR    -     1       0.373    0.716   0.691
  EAS    -     -       1        0.484   0.543
  EUR    -     -       -        1       0.556
  SAS    -     -       -        -       1

  * Significant at 95% confidence level (α=0.05).

REFERENCES
[1] C. Joyner, C. McMahan, J. Baurley, and B. Pardamean, "A two-phase Bayesian methodology for the analysis of binary phenotypes in genome-wide association studies," no. July, pp. 1–11, 2019, doi: 10.1002/bimj.201900050.
[2] P. M. Visscher et al., "10 Years of GWAS Discovery: Biology, Function, and Translation," American Journal of Human Genetics, 2017, doi: 10.1016/j.ajhg.2017.06.005.
[3] M. Arenas et al., "Forensic genetics and genomics: Much more than just a human affair," PLoS Genet., vol. 13, no. 9, Sep. 2017, doi: 10.1371/journal.pgen.1006960.
[4] C. Li, "Forensic genetics," Forensic Sci. Res., vol. 3, no. 2, p. 103, Apr. 2018, doi: 10.1080/20961790.2018.1489445.
[5] E. A. Thompson, "Identity by descent: variation in meiosis, across genomes, and in populations," Genetics, vol. 194, no. 2, pp. 301–326, 2013, doi: 10.1534/genetics.112.148825.
[6] D. M. Montserrat, C. Bustamante, and A. Ioannidis, "Lai-Net: Local-Ancestry Inference with Neural Networks," in ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, May 2020, vol. 2020-May, pp. 1314–1318, doi: 10.1109/ICASSP40776.2020.9053662.
[7] E. R. Martin, I. Tunc, Z. Liu, S. H. Slifer, A. H. Beecham, and G. W. Beecham, "Properties of global- and local-ancestry adjustments in genetic association tests in admixed populations," Genet. Epidemiol., vol. 42, no. 2, pp. 214–229, Mar. 2018, doi: 10.1002/gepi.22103.
[8] T. A. Thornton and J. L. Bermejo, "Local and global ancestry inference and applications to genetic association analysis for admixed populations," Genet. Epidemiol., vol. 38 Suppl 1, no. 0 1, 2014, doi: 10.1002/gepi.21819.
[9] J. Byun, Y. Han, I. P. Gorlov, J. A. Busam, M. F. Seldin, and C. I. Amos, "Ancestry inference using principal component analysis and spatial analysis: A distance-based analysis to account for population substructure," BMC Genomics, vol. 18, no. 1, pp. 1–12, Oct. 2017, doi: 10.1186/s12864-017-4166-8.
[10] S. Sankararaman, S. Sridhar, G. Kimmel, and E. Halperin, "Estimating Local Ancestry in Admixed Populations," Am. J. Hum. Genet., vol. 82, no. 2, pp. 290–303, Feb. 2008, doi: 10.1016/j.ajhg.2007.09.022.
[11] A. Raj, M. Stephens, and J. K. Pritchard, "fastSTRUCTURE: variational inference of population structure in large SNP data sets," Genetics, vol. 197, no. 2, pp. 573–89, Jun. 2014, doi: 10.1534/genetics.114.164350.
[12] A. Budiarto, B. Mahesworo, J. Baurley, T. Suparyanto, and B. Pardamean, "Fast and Effective Clustering Method for Ancestry Estimation," Procedia Comput. Sci., vol. 157, pp. 306–312, Jan. 2019, doi: 10.1016/j.procs.2019.08.171.
[13] A. Budiarto, B. Mahesworo, A. A. Hidayat, I. Nurlaila, and B. Pardamean, "Gaussian Mixture Model Implementation for Population Stratification Estimation from Genomics Data," Procedia Comput. Sci., vol. 179, pp. 202–210, Jan. 2021, doi: 10.1016/j.procs.2020.12.026.
[14] J. Meisner and A. Albrechtsen, "Haplotype and Population Structure Inference using Neural Networks in Whole-Genome Sequencing Data," bioRxiv, p. 2020.12.28.424587, Dec. 2020, doi: 10.1101/2020.12.28.424587.
[15] C. J. Battey, P. L. Ralph, and A. D. Kern, "Predicting geographic location from genetic variation with deep neural networks," Elife, vol. 9, pp. 1–22, Jun. 2020, doi: 10.7554/eLife.54507.
[16] R. A. Gibbs et al., "A global reference for human genetic variation," Nature, vol. 526, no. 7571, pp. 68–74, Oct. 2015, doi: 10.1038/nature15393.
[17] I. Yusuf et al., "Genetic risk factors for colorectal cancer in multiethnic Indonesians," Sci. Rep., vol. 11, no. 1, pp. 1–9, May 2021, doi: 10.1038/s41598-021-88805-4.
[18] T. Chen and C. Guestrin, "XGBoost: A scalable tree boosting system," in Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, vol. 13-17-Augu, pp. 785–794, doi: 10.1145/2939672.2939785.
[19] B. Mahesworo, T. W. Cenggoro, A. Budiarto, F. R. Lumbanraja, and B. Pardamean, "Phosphorylation Site Prediction using Gradient Tree Boosting," scik.org, 2020, doi: 10.28919/cmbn/4653.
[20] D. Shen, G. Wu, and H.-I. Suk, "Deep Learning in Medical Image Analysis," Annu. Rev. Biomed. Eng., vol. 19, no. 1, pp. 221–248, Jun. 2017, doi: 10.1146/annurev-bioeng-071516-044442.
[21] J. Donahue et al., "Long-Term Recurrent Convolutional Networks for Visual Recognition and Description," IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 4, pp. 677–691, 2017, doi: 10.1109/TPAMI.2016.2599174.
[22] B. Pardamean, T. W. Cenggoro, R. Rahutomo, A. Budiarto, and E. K. Karuppiah, "Transfer Learning from Chest X-Ray Pre-trained Convolutional Neural Network for Learning Mammogram Data," Procedia Comput. Sci., vol. 135, pp. 400–407, Jan. 2018, doi: 10.1016/j.procs.2018.08.190.
[23] P. Badjatiya, S. Gupta, M. Gupta, and V. Varma, "Deep Learning for Hate Speech Detection in Tweets," Proc. 26th Int. Conf. World Wide Web Companion, 2017, doi: 10.1145/3041021.3054223.
[24] J. Harer, C. Reale, and P. Chin, "Tree-Transformer: A Transformer-Based Method for Correction of Tree-Structured Data," 2019. [Online]. Available: http://arxiv.org/abs/1908.00449.
[25] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436–444, May 2015, doi: 10.1038/nature14539.
[26] F. Chollet and others, "Keras." GitHub, 2015.
[27] T. W. Cenggoro, B. Mahesworo, A. Budiarto, J. Baurley, T. Suparyanto, and B. Pardamean, "Features Importance in Classification Models for Colorectal Cancer Cases Phenotype in Indonesia," Procedia Comput. Sci., vol. 157, pp. 313–320, Jan. 2019, doi: 10.1016/j.procs.2019.08.172.
[28] S. Lundberg and S.-I. Lee, "A Unified Approach to Interpreting Model Predictions," Adv. Neural Inf. Process. Syst., vol. 2017-December, pp. 4766–4775, May 2017. Accessed: Jul. 30, 2021. [Online]. Available: https://arxiv.org/abs/1705.07874v2.
[29] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, "ImageNet: A large-scale hierarchical image database," 2009 IEEE Conf. Comput. Vis. Pattern Recognit., pp. 248–255, 2009, doi: 10.1109/CVPRW.2009.5206848.


Memorize COVID-19 Advertisement: Customer Neuroscience Data Collection Techniques by Using EEG and fMRI

Maria Seraphina Astriani
Computer Science Department, Faculty of Computing and Media
Bina Nusantara University
Jakarta, Indonesia 11480
[email protected]

Lee Huey Yi
Neuroscience Business School
Barcelona, Spain 08001

Andreas Kurniawan
Computer Science Department, Faculty of Computing and Media
Bina Nusantara University
Jakarta, Indonesia 11480
[email protected]

Raymond Bahana
Computer Science Department, Faculty of Computing and Media
Bina Nusantara University
Jakarta, Indonesia 11480
[email protected]

Abstract—Coronavirus Disease (COVID-19) confirmed cases in the world still occurred more than 1.5 years after the first outbreak in Wuhan, China. Education is a main key to dealing with this pandemic. Information on how to prevent COVID-19 continues to be delivered through a direct approach and through advertisements on television, radio, printed media, and the internet to raise people's awareness. Consumer neuroscience is needed and important for understanding consumer behavior. This research paper proposes techniques to collect visual data on COVID-19 advertisements by using electroencephalogram (EEG) and Functional Magnetic Resonance Imaging (fMRI) to understand brain activity. The results of this research can be useful to create a better COVID-19 advertisement that can attract people to memorize the health protocol.

Keywords—brain activity, data collection, memory, COVID-19, advertisement, consumer neuroscience, EEG, fMRI

I. INTRODUCTION

Countries around the world are currently at war to fight the Coronavirus Disease (COVID-19). This pandemic has been declared a public health emergency and has changed people's lives and the global economy [1]. Based on the World Health Organization Coronavirus Disease (COVID-19) Dashboard (covid19.who.int), confirmed cases still occur more than 1.5 years after the first cases of the COVID-19 outbreak in Wuhan, China [2].

Indonesia is the most populous country in Southeast Asia, and based on the data provided by the committee to handle COVID-19 and national economic recovery (https://covid19.go.id/peta-sebaran-covid19), there had been more than 4.1 million confirmed cases as of September 13th, 2021 [3]. The COVID-19 task force and volunteers are trying to push down the confirmed cases by encouraging people to follow strict health protocols. The Indonesian COVID-19 task force has launched the 6 M health protocol rules (5 + 1 M) to support government programs so that there will be no more COVID-19 [4]. Education is a main key to dealing with this pandemic. Information on how to prevent COVID-19 continues to be delivered to people through a direct approach and through advertisements on television, radio, printed media, and the internet. To increase public awareness of the danger of COVID-19, it is necessary to provide the public with information on how to protect themselves and on the severity if someone is infected with this virus [5].

An advertisement can be categorized as effective advertising if the purpose of the advertisement is achieved [6, 7]. Consumer behavior and marketing research has viewed the human body as a "black box" which cannot be physically or physiologically observed. Consumer neuroscience is needed and important for understanding consumer behavior in decision making [8, 9]. Although questionnaire data are easier to obtain, they may produce invalid results if the respondent is faking the answers. This is the reason why special tools are needed to help researchers obtain the data and understand how the brain works, especially while the respondents look at the COVID-19 advertisements. There are two data collection techniques proposed in this paper: electroencephalogram (EEG)-based and Functional Magnetic Resonance Imaging (fMRI)-based. These techniques are able to collect data from respondents while they view the visual COVID-19 advertisements. These guidelines can help researchers conduct research to understand how brain activity works while someone sees the advertisements. The results of this research can be useful to create a better COVID-19 advertisement that can attract people to memorize it.

II. LITERATURE REVIEW

A. Human Memory

The occipital lobe is our brain's visual processing center, located in the back of the skull [10]. Most of the information recorded and processed by the human eye is handled in this area of the brain.

According to neuroimaging studies [11, 12, 13, 14, 15, 16, 17], the primary brain regions involved in memory are the amygdala, hippocampus, cerebellum, and prefrontal cortex, as illustrated in Fig. 1.

978-1-6654-4002-8/21/$31.00 ©2021 IEEE


Fig. 1. Primary brain regions involved in memory [18]

The amygdala is actively involved in the process of fear memory formation. When an event is emotionally arousing, the amygdala's primary function is to regulate emotion and to facilitate memory encoding at a deeper level [12, 15].

The hippocampus is in the temporal lobe and is associated with declarative and episodic memory, more precisely normal recognition and spatial memory, which are both involved in recall tasks [14].

Researchers have learned that the cerebellum is in communication with the body's senses and uses that input to optimize motor activity [19, 20]. The cerebellum also plays a role in procedural memory processing, motor learning, and classical conditioning.

The prefrontal cortex was found to be heavily involved in the memory of semantic tasks; in particular, the dorsolateral prefrontal cortex (DLPFC) has been implicated in executive control tasks such as information integration, decision making, data maintenance and manipulation, and data updating [16, 21, 22].

B. Consumer Neuroscience

Consumer neuroscience investigates marketing and customer behavior issues through neuroscience. Conventional research conducted to determine consumer behavior and marketing needs to view the human organism as a "black box". This information cannot be assessed directly and needs a tool to access the data reflected from brain activity. Recently, modern techniques and methods in neuroscience have facilitated a much more direct look into the organism's "black box" as the basis for the sub-discipline of consumer neuroscience. Consumer neuroscience is very important for learning consumer behavior, especially for knowing and understanding human behavior in the decision-making process [9].

Consumer neuroscience explores the neural underpinnings and properties that influence consumer choices [23]. One of the catalytic components is found in human emotions and affective processes, which are important in helping people memorize the advertisement [24]. Marketers can use this information to help them understand the effect of an advertisement on consumers and identify why some advertisements may be more effective than others [23].

C. Customer Neuroscience Data Collection Technique

1) Electroencephalogram (EEG): EEG is especially well known for its extremely high temporal resolution, which records cognitive processes in the time period in which cognition occurs [25]. EEG is less expensive and more portable than other brain imaging techniques. EEG directly monitors the electrical activity of the brain while the brain is continually working, which makes it particularly strong: even if a person is only thinking or doing something specific in their head, the sensors on the scalp will pick up those signals [10]. The number of EEG electrodes can be up to 256 channels.

2) Functional Magnetic Resonance Imaging (fMRI): fMRI is an excellent tool for formulating data-driven intelligent hypotheses or for elucidating the detailed neural mechanisms underlying learned cognitive capacities [26]. fMRI signals are presumed to result from changes in the activity of the neuronal populations responsible for the functions in question (for example, stimulus- or task-selective neurons).

III. METHODS AND DATA COLLECTION TECHNIQUES

The inductive approach is used as the research method in this research. Researchers need to gather the data mined from the respondents, analyze it (look for patterns by using a machine learning approach), and develop the conclusion.

Visual data on COVID-19 advertisements are needed as the dataset of this research. The data should be in image or video media format and need to be separated into two folders: image and video. Based on our analysis of existing COVID-19 advertisements, each advertisement needs to be labeled according to its content or illustration style (Table I). Each advertisement can have more than one label. These labels can be useful at the analysis stage (after brain activity data have been obtained and analyzed using machine learning/deep learning) to determine the pattern of which COVID-19 advertisements can attract people.

TABLE I. LABELS OF COVID-19 ADVERTISEMENT

  Label        Description
  Camera       Shows the illustration taken from the camera
  Art          Illustration drawn by hand or created by digital technology
  Infographic  Uses graphics/illustrations to present the information
  Text         Uses more text (rather than illustration) to convey information
  Non-formal   Uses more casual words and is not bound by grammar
  Famous       Uses a celebrity or famous people
  Shock        Uses disturbing/scary text or images

A special system (named the AdShow system) needs to be developed to display images or videos of COVID-19 advertisements to respondents using EEG and fMRI. Respondents who use this system will see two advertisements for each question number, with the same type of media format but different labels between the first and the second advertisement. We recommend a minimum of twenty question numbers, so a minimum of 40 advertisements needs to be prepared in the AdShow system; the more the better, because more data will provide more reliable information (the pattern) on how the brain works when seeing the advertisements. Respondents will not be given any specific questions on each question number, but they need to choose one advertisement based on their personal


preference. Respondents could choose their preferred advertisement (one advertisement) by pressing the button on the keypad that is connected to the EEG or fMRI.

A. Electroencephalogram (EEG)

EEG is a device that can record brainwave activity and is widely used since it is less expensive than other brain imaging devices. The respondent may wear an EEG head cap with electrodes, and the data (activation) must be recorded exclusively at the particular locations of the electrodes (Fig. 2). O denotes occipital, F stands for frontal, FC is fronto-central, and C stands for central.

When respondents see the advertisement, they use their eyes to see it visually. The occipital lobe plays an important role while people see something, so O1, O2, and Oz data need to be mined. The frontal area is closely related to memory, and the positions of the amygdala, hippocampus, cerebellum, and prefrontal cortex correspond to the F7, F3, Fz, F4, F8, FC5, FC1, FC2, FC6, C3, Cz, and C4 areas, so these data should also be collected from the respondent's brain.

Fig. 2. The location of electrodes

A monitor can be used to display the advertisement questions to the respondents by using the AdShow system. One out of two COVID-19 advertisements must be chosen by the respondent by pushing the button on the given keypad. The illustration of the EEG environment setting can be found in Fig. 3.

Fig. 3. Environment Setting – EEG

The output of EEG data is represented as time series data, and the output from EEG usually uses text as the format. These data will be the input of the analysis step, and a machine learning approach (Long Short-Term Memory [LSTM]) can be used to help researchers find the pattern of the advertisements preferred by the respondents. If the researcher decides to convert the text data into images, then a Convolutional Neural Network (CNN) is the recommended approach to analyze this format of data.

B. Functional Magnetic Resonance Imaging (fMRI)

One of the advantages of using fMRI is that this device has the ability to find out in more detail how the brain functions. fMRI is not recommended if the respondent has metal implants.

If the fMRI device used in the research requires the respondent to lie down, the existing monitors are usually not big enough to display the advertisements properly, so additional accessories (a monitor or special goggles) are required for the respondents. If the respondent can sit up straight while using fMRI, no additional monitor is needed, because a monitor is usually already attached to the device and its size is big enough (Fig. 4). To help the respondents choose their preferred advertisement, they can use finger tapping or a keypad that is connected to the fMRI device.

Fig. 4. Environment Setting – fMRI

The output data produced by fMRI are very similar to EEG, because fMRI also produces time series data. However, fMRI data are usually in image format rather than text. CNN is generally used for fMRI data analysis. However, the use of a feedforward neural network (FNN) approach is very promising and has the opportunity to be explored further to find the pattern of COVID-19 advertisements.

IV. CONCLUSION AND RECOMMENDATION

EEG and fMRI devices can be used to collect data from the brain to figure out which advertisements draw people's attention so they can be memorized. The advertisement data (in image and video format) should be labeled according to their content or illustration style. If budget is the primary issue in the research, EEG may be the best option: the respondent can wear a full EEG cap, and LSTM/CNN must be used to evaluate the 15 selected channels. If the goal of the research is to learn more about how the brain works and how it relates to COVID-19 advertisements, then fMRI is the ideal device to be used. The respondent will need to use additional accessories to navigate or select the preferred COVID-19 advertisement.

Another method to collect the data from the respondents is using the combination of eye tracking (heat map), mouse tracking, and EEG. This technique can be applied in future research to help researchers determine which advertisements can attract people to memorize the health protocol. This hybrid combination also has the opportunity to give a better analysis result and is more affordable than fMRI.

ACKNOWLEDGMENT

This work is supported by the Research and Technology Transfer Office, Bina Nusantara University, as a part of Bina Nusantara University's International Research Grant entitled Consumer Neuroscience to Gain Health Awareness on COVID-19 Advertisement, with contract number No.017/VR.RTT/III/2021 and contract date 22 March 2021.

REFERENCES
[1] A. Hampshire, P. J. Hellyer, E. Soreq, M. A. Mehta, K. Ioannidis, W. Trender, J. E. Grant, and S. R. Chamberlain, "Associations between dimensions of behaviour, personality traits, and mental-health during the COVID-19 pandemic in the United Kingdom," Nature Communications, vol. 12, no. 1, pp. 1-15, 2021.
[2] World Health Organization, WHO Coronavirus (COVID-19) Dashboard. Accessed on: July 18, 2021. [Online]. Available: https://covid19.who.int/
[3] Komite Penanganan COVID-19 dan Pemulihan Ekonomi Nasional, Peta sebaran COVID-19, Indonesia. Accessed on: July 18, 2021. [Online]. Available: https://covid19.go.id/peta-sebaran-covid19
[4] M. Martini, G. N. W. Putra, K. Y. Aryawan, and G. B. Widiarta, "Sosialisasi pencegahan Covid-19 dengan pelaksanaan health education kepada para pedagang menggunakan media pembelajaran: leaflet dalam meningkatkan pengetahuan tentang pencegahan COVID-19, Di Pasar Benyuning Buleleng," in Proc. Senadimas Undiksha, pp. 677-682, 2020.
[5] R. Siregar, A. R. B. Gulo, and L. R. E. Sinurat, "Edukasi tentang upaya pencegahan COVID-19 pada masyarakat di Pasar Sukaramai Kecamatan Medan Area tahun 2020," Jurnal Abdimas Mutiara, vol. 1, no. 2, pp. 191-198, Sep. 2020.
[6] B. J. Ali, "Assessing (The impact) of advertisement on customer decision making: Evidence from an educational institution," Afak for Science Journal, vol. 6, no. 1, pp. 267-280, 2021.
[7] M. E. Ansari and S. Y. E. Joloundar, "An investigation of TV advertisement effects on customers' purchasing and their satisfaction," International Journal of Marketing Studies, vol. 3, no. 4, p. 175, 2011.
[8] L. Robaina-Calderín and J. D. Martín-Santana, "A review of research on neuromarketing using content analysis: key approaches and new avenues," Cognitive Neurodynamics, pp. 1-16, 2021.
[9] P. H. Kenning and M. Linzmajer, "Consumer neuroscience: An overview of an emerging discipline with implications for consumer policy," Journal für Verbraucherschutz und Lebensmittelsicherheit, vol. 6, no. 1, pp. 111-125, 2010.
[10] C. James, S. Morand, S. Barcellona-Lehmann, C. M. Michel, and A. Schnider, "Neural transition from short- to long-term memory and the medial temporal lobe: A human evoked-potential study," Hippocampus, vol. 19, no. 4, pp. 371-378, 2009.
[11] K. L. Anderson, R. Rajagovindan, G. A. Ghacibeh, K. J. Meador, and M. Ding, "Theta oscillations mediate interaction between prefrontal cortex and medial temporal lobe in human memory," Cerebral Cortex, vol. 20, no. 7, pp. 1604-1612, 2010.
[12] C. S. Inman, J. R. Manns, K. R. Bijanki, D. I. Bass, S. Hamann, D. L. Drane, R. E. Fasano, C. K. Kovach, R. E. Gross, and J. T. Willie, "Direct electrical stimulation of the amygdala enhances declarative memory in humans," Proceedings of the National Academy of Sciences, vol. 115, no. 1, pp. 98-103, 2018.
[13] M. J. Kane and R. W. Engle, "The role of prefrontal cortex in working-memory capacity, executive attention, and general fluid intelligence: An individual-differences perspective," Psychonomic Bulletin & Review, vol. 9, no. 4, pp. 637-671, 2002.
[14] W. Klimesch, "EEG alpha and theta oscillations reflect cognitive and memory performance: a review and analysis," Brain Research Reviews, vol. 29, no. 2-3, pp. 169-195, 1999.
[15] J. L. McGaugh, L. Cahill, and B. Roozendaal, "Involvement of the amygdala in memory storage: interaction with other brain systems," Proceedings of the National Academy of Sciences, vol. 93, no. 24, pp. 13508-13514, 1996.
[16] C. M. Tyng, H. U. Amin, M. N. Saad, and A. S. Malik, "The influences of emotion on learning and memory," Frontiers in Psychology, vol. 8, p. 1454, 2017.
[17] Y. Yu, C. Zeng, S. Shu, X. Liu, and C. Li, "Similar effects of substance P on learning and memory function between hippocampus and striatal marginal division," Neural Regeneration Research, vol. 9, no. 8, p. 857, 2014.
[18] Lumen, Parts of the Brain Involved with Memory. Accessed on: July 30, 2021. [Online]. Available: https://courses.lumenlearning.com/wsu-sandbox/chapter/parts-of-the-brain-involved-with-memory/
[19] R. Chai, Y. Tran, G. R. Naik, T. N. Nguyen, S. H. Ling, A. Craig, and H. T. Nguyen, "Classification of EEG based-mental fatigue using principal component analysis and Bayesian neural network," in 2016 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pp. 4654-4657, 2016.
[20] M. H. Rosenbloom, J. D. Schmahmann, and B. H. Price, "The functional neuroanatomy of decision-making," The Journal of Neuropsychiatry and Clinical Neurosciences, vol. 24, no. 3, pp. 266-277, 2012.
[21] M. J. Imburgio and J. M. Orr, "Effects of prefrontal tDCS on executive function: Methodological considerations revealed by meta-analysis," Neuropsychologia, vol. 117, pp. 156-166, 2018.
[22] L. E. Mancuso, I. P. Ilieva, R. H. Hamilton, and M. J. Farah, "Does transcranial direct current stimulation improve healthy working memory?: a meta-analytic review," Journal of Cognitive Neuroscience, vol. 28, no. 8, pp. 1063-1089, 2016.
[23] B. E. Blum, "Consumer neuroscience: A multi-disciplinary approach to marketing leveraging advances in neuroscience," Psychology and Economics, 2016.
[24] T. Ambler, A. Ioannides, and S. Rose, "Brands on the Brain: Neuro-Images of Advertising," Business Strategy Review, vol. 11, no. 3, pp. 17-30, 2000.
[25] W. C. Chou, J. R. Duann, H. C. She, L. Y. Huang, and T. P. Jung, "Explore the functional connectivity between brain regions during a chemistry working memory task," PLoS ONE, vol. 10, no. 6, p. e0129019, 2015.
[26] N. K. Logothetis, "What we can do and what we cannot do with fMRI," Nature, vol. 453, no. 7197, pp. 869-878, 2008.


Development of Stock Market Price Application to Predict Purchase and Sales Decisions Using Proximal Policy Optimization Method

Alexander A. S. Gunawan
Computer Science Department, School of Computer Science
Bina Nusantara University
Jakarta, Indonesia 11480
[email protected]

Bilqis Ashifa S
Computer Science Department, School of Computer Science
Bina Nusantara University
Jakarta, Indonesia 11480
[email protected]

Reinert Y. Rumagit
Computer Science Department, School of Computer Science
Bina Nusantara University
Jakarta, Indonesia 11480
[email protected]

Heri Ngarianto
Computer Science Department, School of Computer Science
Bina Nusantara University
Jakarta, Indonesia 11480
[email protected]

Abstract— Stocks are an investment that many investors choose because stocks are able to provide attractive returns. Stock market prices experience fluctuating changes from one time to another. Investors use indicators from technical analysis to evaluate stocks in order to predict stock market price movements. The fluctuation of stock market prices affects the decision to buy and sell shares, and these decisions do not occur every day. In this research, we used a deep reinforcement learning algorithm called the Proximal Policy Optimization method to predict stock buying and selling decisions. The decision to buy and sell shares affects profits. The data used are technical analysis indicators and historical data. The Proximal Policy Optimization method allows developing automatic buy and sell decisions via iterative policy optimization based on previous sample data to reduce sample complexity. The Proximal Policy Optimization method generates rewards to maximize profit. The results of this research indicate that the cumulative profit during the last one year from our deep reinforcement learning approach is an increase compared to the cumulative profit of a manual decision approach.

Keywords—stocks, indicators, Proximal Policy Optimization, buying and selling decisions, profit

I. INTRODUCTION
Currently, many people have an interest in investing in the capital market. The capital market facilitates various financial instruments or products that are traded, such as bonds, stocks, mutual funds, and others. One of the most popular financial instruments in the community is stocks. Stocks are an investment that many investors choose because stocks can provide attractive returns when the company makes a profit. Stock market prices tend to fluctuate and experience changes from one time to another. Stock market prices that experience fluctuating changes are one of the triggers for the risk experienced by investors.

Investors use technical analysis to evaluate shares in order to predict future price movements [10]. According to Menkhoff and Taylor [8], there are three important facts about technical analysis, namely that almost all professionals use technical analysis to some degree, most professionals use some combination of technical and fundamental analysis, and technical analysis becomes relatively more important for short-term horizons. Technical analysis uses indicators to see chart movements that record price movements and the number of transactions (volume). Indicators are divided into four categories, namely trend indicators, volatility indicators, momentum indicators, and volume indicators.

Trend indicators support trading in the direction of a strong trend, which reduces risk and increases profit potential [4]. A volatility indicator is a technical analysis tool that looks at changes in market prices over a certain period of time [4]. A momentum indicator compares current prices with previous prices from several periods ago [4]. A volume indicator is a mathematical formula that is visually represented in the most commonly used charting platforms. Indicators also affect trading on the stock market [4].

The fluctuation of stock market prices affects the decision to buy and sell shares, and these decisions do not occur every day. An investor must carry out an analysis of buying and selling at stock market prices, generating the expected or optimal profit at a minimum level of risk, before deciding to invest in the stock market.

We have chosen one of the deep reinforcement learning methods, Proximal Policy Optimization (PPO), to predict buying and selling at stock market prices. The Proximal Policy Optimization method is an application of the Reinforcement Learning paradigm, where machine learning occurs through interaction with the environment [11]. Reinforcement learning can be applied to trading because of the lack of a baseline, the quality and availability of data, the partial observability of some financial markets, and the dilemma of exploration and exploitation [3]. Usually only open, high, low, and close price data can be accessed freely, which may not be sufficient to produce a successful trading strategy.

Reinforcement learning uses rewards and punishments as signals for positive and negative behavior. The reward aims to give the agent feedback on how well it behaves. An agent is something that interacts with the environment by taking certain actions, making observations, and receiving rewards. The environment is everything outside the agent. Communication with the environment is limited by rewards (obtained from the

978-1-6654-4002-8/21/$31.00 ©2021 IEEE


environment), and observation or state (some information other than the reward received by the agent). Actions are things that agents can do in the environment. A policy defines how agents select actions. Policies can be categorized based on the criterion of stationary or non-stationary. Non-stationary policies depend on time steps and are useful for limited-horizon contexts, where the cumulative rewards that agents seek to optimize are limited to a number of time steps in the future [5]. The value function estimates how well agents use policy strategies to visit states and generalizes the expected reward idea [6].

The Proximal Policy Optimization method is a development of the Policy Gradient and Trust Region Policy Optimization (TRPO) methods [13]. The policy gradient method has poor data efficiency and robustness. Meanwhile, the TRPO method has disadvantages, namely that it is relatively complicated and does not fit architectures that share parameters between the policy and value functions [12]. The Proximal Policy Optimization method is much easier to implement and more efficient to operate than TRPO [13]. The Proximal Policy Optimization method allows iterative policy optimization based on previous sample data to reduce sample complexity [1]. The Proximal Policy Optimization method has an efficient and effective technique for policy learning: clipping. The clipped probability ratio, used directly in the objective function as the clipped surrogate objective, is the key element of the Proximal Policy Optimization method.

Our research is based on several relevant journals, namely John Schulman et al. [13], which describes the algorithm of the proximal policy optimization method. It is also based on another journal by Binoy B. Nair et al. [9], which describes a recommender system capable of finding patterns for the decision-making process of buying and selling shares based on temporal association rules. Our research objective is to develop and evaluate a prediction system for making decisions to buy or sell stocks based on the Proximal Policy Optimization method. Next, this paper discusses the related works in chapter 2, followed by the methodology in chapter 3. Results and analysis are described in chapter 4, and finally the conclusion is delivered in chapter 5.

II. RELATED WORKS
Deep reinforcement learning is a type of machine learning technique that supports agents to learn in an interactive environment through trial and error, using feedback from their own actions and experiences. Non-stationary policies work on finite time steps, implemented over a finite number of future steps [5]. A value function measures how well agents use strategic policies to generalize the idea of expected rewards [6]. The Proximal Policy Optimization method allows iterative policy optimization based on a previous sample of data to reduce the number of samples [1]. According to Sudharsan Ravichandran [11], there are two models, namely:
1. Model-based learning, where agents exploit previously learned information to complete a task.
2. Model-free learning, where agents only rely on trial-and-error experience to take the right action.

Several different approaches have been taken to reinforcement learning. The on-policy method tries to develop or improve the policy that is used to make decisions, i.e., the policy used to generate the data [14].

III. METHODOLOGY

Figure 1. Research Method Design

This research is split into the following steps:
A. Communication
B. Quick Plan
C. Modelling Quick Design
D. Construction
E. Delivery and Feedback

A. Communication
The communication stage consists of two stages, namely problem identification and literature study. The first stage is problem identification. The problem to be resolved through this research is how to predict the buying and selling of stock market prices using the Proximal Policy Optimization (PPO) method and how the results of these predictions will be evaluated by the user.

The second stage is conducting a literature study. Literature studies are conducted by reading books, articles, and journals related to stock theory, predicting purchases and sales, and the Proximal Policy Optimization method. After all the theories have been studied, the research can be carried out.

B. Quick Plan
The next steps in the Quick Plan are the stages of collecting stock data, adding indicators, and selecting data. Data was collected by downloading an existing dataset. The dataset comes from the Yahoo Finance website, which is legal to download. The downloaded data are for the companies COTY, MGM, and AMZN. The COTY dataset shows a decreasing trend. The AMZN dataset shows an increasing trend. The MGM dataset shows that the data is stable.

The dataset we use is from April 14, 2015 to April 14, 2020. The COTY and MGM datasets are 1260 data points each. The AMZN dataset is 1259 data points. The dataset contains date, open price, high


price, low price, close price, volume. After the dataset is


collected, we add several indicators. Stock indicators are
divided into several categories. These categories are trend
indicators, volatility indicators, momentum indicators, and
volume indicators. The used trend indicators are the Average
Directional Movement Index (ADX), Detrend Price Oscillator
(DPO), Mass Index (MI). The volatility indicator only uses one
indicator, namely Bollinger Bands (BB). The momentum
indicator uses two indicators, namely the Relative Strength
Index (RSI) and the Awesome Oscillator (AO). The volume indicators used are On-Balance Volume (OBV) and Chaikin
Money Flow (CMF).
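As an illustration of one of these momentum indicators, a minimal RSI computation can be sketched in a few lines of numpy. This is a simplified simple-moving-average form for illustration; the function name and the 14-period default are assumptions, not the exact implementation behind the paper's indicators:

```python
import numpy as np

def rsi(close, period=14):
    """Relative Strength Index of a closing-price series (simple-average form)."""
    deltas = np.diff(np.asarray(close, dtype=float))
    gains = np.where(deltas > 0, deltas, 0.0)
    losses = np.where(deltas < 0, -deltas, 0.0)
    avg_gain = gains[-period:].mean()
    avg_loss = losses[-period:].mean()
    if avg_loss == 0:  # no losses in the window -> maximum RSI
        return 100.0
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)
```

A steadily rising series yields an RSI of 100, while a series whose gains and losses balance yields an RSI of 50.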
The addition of indicator data to the dataset can be tested
using multicollinearity testing, to ascertain whether there is
intercorrelation or collinearity between independent variables.
Intercorrelation can be seen from the VIF (Variance Inflation Factor) value. If VIF
returns a value greater than 5, then one of the features, namely
the indicator, must be reduced. Multicollinearity testing on the
added indicators is presented in Table 1.
Table 1. Multicollinearity testing
VIF Factor Features
0 181.2 Intercept
1 1.2 ADX
2 1.6 DPO
3 1.6 MI
4 1.3 BB
5 3.2 RSI
6 2.6 AO
7 1.3 OBV
8 1.7 CMF
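The VIF test behind Table 1 can be reproduced in principle with plain numpy: each feature is regressed on the remaining features plus an intercept, and VIF_j = 1 / (1 − R_j²). This is a sketch for illustration, not necessarily the exact tool the authors used:

```python
import numpy as np

def vif(X):
    """Variance inflation factor per feature column: regress column j on the
    remaining columns plus an intercept, then VIF_j = 1 / (1 - R_j^2)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = []
    for j in range(k):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1.0 - (resid ** 2).sum() / ((y - y.mean()) ** 2).sum()
        out.append(1.0 / (1.0 - r2) if r2 < 1.0 else float("inf"))
    return out
```

Nearly uncorrelated columns give VIF values close to 1 (as for the indicators in Table 1), while an exactly collinear column gives a very large or infinite VIF and would be removed under the VIF > 5 rule.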

The dataset that has been collected, with the eight added indicator techniques, will be used in the Proximal Policy Optimization model, as shown in Figure 2.

Figure 2. Features on COTY, MGM, and AMZN data

C. Modelling Quick Design
This stage is divided into 3 stages, namely the stage of making the environment, the training stage for the Proximal Policy Optimization model, and the testing and training analysis phase. In this research, the design of the Proximal Policy Optimization method was carried out in several stages as follows:

a) Environment creation stage


In this research, we designed the environment. Each
environment is represented in the Gym by the Env class.
Environment consists of action space, observation space, reset()
function, step() function, and render() function. The type of
action space used is the continuous action space of the
environment. Action space uses a Box where continuous action
space is set to two dimensions as continuous. The independent
limit for each dimension consists of a minimum value and a
maximum value. The minimum value dimension is [0,0] and
the maximum value dimension is [3,1]. The first index of the
minimum and maximum values is used to determine whether to
buy, sell, or do nothing. The second index at the minimum and
maximum values is used to determine the number of shares to
be bought or sold.
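The action-space convention just described can be illustrated with a minimal environment sketch. This is a simplified stand-in under assumed rules (the second action index is treated as a fraction of the balance or holdings, and transaction costs are ignored); the class and method names are illustrative, and a real implementation would subclass gym.Env:

```python
import numpy as np

class TradingEnvSketch:
    """Minimal sketch of the described environment (illustrative, not the
    authors' code). action[0] in [0, 1) -> buy, [1, 2) -> sell, else hold;
    action[1] in [0, 1] is the fraction of balance/holdings to trade."""

    def __init__(self, prices, initial_balance=10_000.0):
        self.prices = np.asarray(prices, dtype=float)
        self.initial_balance = initial_balance
        self.reset()

    def reset(self):
        self.step_idx = 0
        self.balance = self.initial_balance
        self.shares = 0
        return self._observation()

    def _observation(self):
        return np.array([self.prices[self.step_idx], self.balance, self.shares])

    def step(self, action):
        price = self.prices[self.step_idx]
        if action[0] < 1:                       # buy, only if balance allows
            n = int(self.balance * action[1] // price)
            self.balance -= n * price
            self.shares += n
        elif action[0] < 2 and self.shares:     # sell, only if shares are held
            n = max(1, int(self.shares * action[1]))
            self.balance += n * price
            self.shares -= n
        self.step_idx += 1
        done = self.step_idx >= len(self.prices)
        # reward sketch: net worth change relative to the initial investment
        mark = self.prices[min(self.step_idx, len(self.prices) - 1)]
        reward = self.balance + self.shares * mark - self.initial_balance
        obs = self._observation() if not done else None
        return obs, reward, done, {}
```

A buy action with insufficient balance simply trades zero shares, and a sell action with no holdings falls through to a hold, mirroring the buy/sell criteria described in the results section.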


The observation space defines the structure of the observation in the environment. The observations used are the 13 features obtained from data collection. These features are the open price, high price, low price, close price, volume, ADX, DPO, MI, BB, RSI, AO, OBV, and CMF. Besides the 13 features, there are additional observations, namely balance, maximum net worth, shares owned, total shares sold, and total sales value. The features are adjusted because their values are measured on different scales. The reset() function is used to reset the environment. The step() function takes an action at each step; the action is specified as its parameter. The step() function returns four values, namely observation, rewards, done, and info.

The rewards calculation is divided into two kinds, namely real money rewards and ratio rewards. The calculation of rewards in the form of real money, or real rewards, can be written with the following formula:

(1)

(2)

Meanwhile, the calculation of rewards in the form of a ratio is obtained from

(3)

Done will be true when all datasets have been processed. The render() function is used to display the desired result of each step.

b) The training stage for the Proximal Policy Optimization model

Proximal Policy Optimization (PPO) belongs to the actor-critic family of methods. In the stock market, the type of information used is the stock history data and the eight indicators, put into the state with varying values. The actor-critic method in the stock market combines the position of the current agent into the state in addition to the stock history data and the eight indicators [2]. The actor-critic method aims to adjust the parameters of the policy (actor) by taking actions that maximize the total discounted future reward predicted by the critic [2].

The entire PPO algorithm is processed each time the episode stops. PPO obtains a limited sequential mini-batch of samples, for example tuples of experiences. In a mini-batch, the time step of the starting tuple is randomly selected, but all subsequent tuples in the mini-batch must be contiguous. PPO allows multiple iterations of optimization and empirically provides better sample efficiency.

In this research, after we have designed the environment, Proximal Policy Optimization is trained based on the environment. The training model uses train data from April 14, 2015 to April 14, 2019 for the three companies. The COTY and MGM training datasets are 1009 data points each, and the AMZN training dataset is 1008 data points. The Proximal Policy Optimization model has parameters such as policy, environment, gamma, n_steps, ent_coef, learning_rate, vf_coef, max_grad_norm, lam, nminibatches, noptepochs, and cliprange.

The policy, which implements actor-critic, uses a multi-layer perceptron (MLP) consisting of 2 layers of 64 units. The MLP accepts observations as input and produces hidden representations for the policy and value networks. Environment is the place where the previously created Proximal Policy Optimization model learns. Gamma is the discount factor for future rewards; it can be considered as how far into the future the agent should be concerned with possible rewards. The gamma is set to 0.99. N_steps is the number of steps to execute for each environment per update; n_steps is set to 512. The learning rate corresponds to the strength of each gradient descent update step; the learning rate is set to 0.00025.

The ent_coef, or entropy coefficient, ensures that the agent explores the action space during training; the entropy coefficient ent_coef is set to 0.0007. Vf_coef, or the value coefficient, multiplies the value function loss in the overall loss calculation; the value coefficient vf_coef is set to 0.5.

r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{old}}(a_t \mid s_t)}  (4)

L^{CLIP}(\theta) = \hat{\mathbb{E}}_t\left[\min\left(r_t(\theta)\hat{A}_t,\ \mathrm{clip}(r_t(\theta), 1-\epsilon, 1+\epsilon)\hat{A}_t\right)\right]  (5)

L_t^{CLIP+VF+S}(\theta) = \hat{\mathbb{E}}_t\left[L_t^{CLIP}(\theta) - c_1 L_t^{VF}(\theta) + c_2 S[\pi_\theta](s_t)\right]  (6)

Where
• r_t(\theta) is the probability ratio of the policy.
• \pi_\theta is the policy, a_t is the action in timestep t, and s_t is the state in timestep t.
• \epsilon is the cliprange parameter.
• c_1 is the value function coefficient and c_2 is the entropy coefficient.
• L_t^{VF} = (V_\theta(s_t) - V_t^{targ})^2 is the squared-error loss against the target value function, and S is the entropy bonus.

With the clipped surrogate objective, we have two probability ratios, one non-clipped and one clipped [13]. The minimum of the two is chosen, so the final result of the clipped surrogate objective is a lower bound of the non-clipped objective. The clipped objective function does not make PPO greedy in favoring actions with positive advantages, and it moves away much faster from actions with a negative advantage function estimated from a small number of samples [7].

The maximum value for gradient clipping, max_grad_norm, is set to 0.5. Lam corresponds to the lambda parameter, λ, which is used when calculating the Generalized Advantage Estimate (GAE, \hat{A}_t). Lam can be thought of as how much the agent relies on the current value estimate when calculating the updated value estimate. We initialize lam to 0.95.

\hat{A}_t = \delta_t + (\gamma\lambda)\delta_{t+1} + \cdots + (\gamma\lambda)^{T-t+1}\delta_{T-1}, \quad \text{where } \delta_t = r_t + \gamma V(s_{t+1}) - V(s_t)  (7)

Nminibatches divides the batch of samples into small groups for training; nminibatches is set to 16. Noptepochs is the number of epochs used when optimizing the surrogate; noptepochs is set to 8. Cliprange, or epsilon, corresponds to the acceptable divergence threshold between old and new policies during the gradient descent update; the cliprange is set to 0.1. The training process consists of 800 000 simulation steps.

c) The testing stage and analysis phase

The analysis of training results uses the average rewards from each episode. The training process is 800 000 steps and uses 796 episodes. One episode consists of 1004 steps. The rewards for each step are accumulated at every multiple of 1004 steps, and the accumulated rewards are used to generate the average rewards. The average reward yields for the three companies increased.
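The clipped surrogate objective and the Generalized Advantage Estimate used during training can be sketched in numpy with the cliprange (0.1), γ (0.99), and λ (0.95) values from the text. This is a didactic illustration, not the Stable-Baselines implementation:

```python
import numpy as np

def clipped_surrogate(ratio, adv, cliprange=0.1):
    """PPO clipped surrogate: mean of min(r*A, clip(r, 1-eps, 1+eps)*A)."""
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - cliprange, 1.0 + cliprange) * adv
    return np.minimum(unclipped, clipped).mean()

def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimate; `values` has one extra bootstrap entry."""
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)
    deltas = rewards + gamma * values[1:] - values[:-1]
    adv = np.zeros_like(deltas)
    running = 0.0
    for t in reversed(range(len(deltas))):
        running = deltas[t] + gamma * lam * running
        adv[t] = running
    return adv
```

Taking the elementwise minimum of the clipped and non-clipped terms is what makes the objective a lower bound of the unclipped objective, as discussed above.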
Testing uses the environment again, with the test data from April 14, 2019 to April 14, 2020. The COTY, MGM, and AMZN test datasets are 252 data points each. The resulting training model is loaded back using the testing environment. Prior to testing, the environment is reset, which results in initialization of the observation. Testing is done until done is True. The actions generated from the model are used in the step() part of the environment to generate a new observation, rewards, done, and info. The testing process is carried out continuously until the done value returned by step() is True. Done will generate True if all test data has been processed.

D. Construction
This stage carries out the making of the application by implementing the design determined in the previous stage. Application development uses a prototyping model with the Python programming language and the Tensorflow and Stable-Baselines libraries for the learning process based on the Proximal Policy Optimization model, while the application design uses Python and the Kivy library on the desktop application. The user interface is divided into several parts, namely the start page, the sign in page, the sign up page, the admin page, and the user page.

E. Delivery and Feedback
The delivery and feedback stage carries out the evaluation of the application by the user. User evaluation is done by distributing questionnaires. Questionnaires were distributed to 14 volunteers to evaluate the stock decision prediction application. The respondents of this questionnaire are people who have and who have never traded stocks. In addition, 13 respondents work as employees in different companies and 1 is a student.

IV. RESULTS AND ANALYSIS

A. Results of the Training Stage
Training on the Proximal Policy Optimization model was conducted three times, on the COTY, MGM, and AMZN company data. The training process uses a sequential (time-series) process. The training process takes ± 58 minutes on COTY data, ± 51 minutes on MGM data, and ± 54 minutes on AMZN data. The following are charts for each dataset from the results of the training that has been carried out.

Figure 3. Rewards and average rewards graph on AMZN data
Figure 4. Rewards and average rewards graph on MGM data
Figure 5. Rewards and average rewards graph on COTY data

Figures 3 to 5 show the rewards per step going up for each dataset. Figures 3 to 5 also show the average rewards per episode increasing for each dataset. Training on the Proximal Policy Optimization model therefore shows good performance on the data of the three companies.

B. Decision on Sale and Purchase of Shares
Every day, the decision made is not always to buy or sell. This decision depends on the action given by the PPO model. The buy criterion occurs when the action is worth 0 to 1. Apart from the action, a buying decision occurs only if there is enough money to buy a number of shares on that day. If there is insufficient money, a share purchase transaction does not occur even though the action is worth 0 to 1. The selling criterion occurs if the action is worth 1 to 2. A selling decision also occurs only if shares obtained from previous days' transactions are held on that day. If no shares are held, no sales transaction occurs. The following are the results of the decisions to sell and buy shares on the COTY test data from the results of the COTY training data, with the assumption of an initial investment of $10000, shown in Table 2.

Table 2. The results of predictive decisions on testing COTY data

Date        Open   Decision  Shares bought  Shares sold  Profit    Cumulative Profit
2019-05-21  13,56  buy       735            0            0,00      0,00
2019-05-22  13,4   buy       737            0            0,00      0,00
2019-05-23  13,03  sell      0              737          -390,29   -390,29
2019-06-03  12,38  buy       776            0            0,00      -390,29
2019-06-04  13,08  sell      0              776          152,91    -237,38
2019-07-19  11,1   buy       114            0            0,00      -237,38
2019-07-22  11,1   sell      0              114          152,91    -84,47
2019-09-16  10,59  buy       958            0            0,00      -84,47
2019-09-17  10,81  sell      0              958          363,67    279,20
2019-11-06  11,95  buy       867            0            0,00      279,20
2019-11-07  13,14  sell      0              867          1395,40   1674,60
2019-11-22  11,89  buy       958            0            0,00      1674,60
2019-11-25  11,6   sell      0              958          1117,58   2792,18
2019-11-26  11,54  buy       134            0            0,00      2792,18
2019-11-27  11,52  sell      0              134          1114,90   3907,08
2020-02-03  10,29  buy       1080           0            0,00      3907,08
2020-02-04  10,65  sell      0              134          1503,70   5410,78
2020-02-24  10,9   buy       1055           0            0,00      5410,78
2020-02-25  10,72  sell      0              1055         11313,8   16724,58

2020-02-26  10,48  buy       146            0            0,00      16724,58
2020-02-27  9,93   sell      0              146          1233,50   17958,08
2020-03-20  4,16   buy       489            0            0,00      17958,08
2020-03-23  4,46   sell      318            171          1380,20   19338,28
2020-03-27  6,35   sell      249            69           1981,22   21319,50
2020-04-01  4,96   sell      59             190          11635,1   32954,61
2020-04-03  4,48   sell      8              51           1606,79   34561,40
2020-04-06  5,06   buy       1629           0            0,00      34561,40
2020-04-07  5,5    sell      47             1582         2328,19   36889,59
2020-04-08  5,78   buy       1137           0            0,00      36889,59
2020-04-09  6,19   buy       1159           0            0,00      36889,59

Every sale of shares gives the investor a profit or loss, which is accumulated for a year to find out whether, by the end of the year, there is a gain or a loss on the initial investment. Figure 6 is a graph of the cumulative profit on the COTY test data.

Figure 6. Cumulative Profit on COTY data testing

Table 3 shows the results of the sale and purchase decisions on the MGM test data from the results of the COTY training data, with an initial investment assumption of $10000.

Table 3. The results of predictive decisions on testing MGM data

Date        Open       Decision  Shares bought  Shares sold  Profit   Cumulative Profit
2019-05-14  25,469999  buy       163            0            0,00     0,00
2019-05-15  25,690001  buy       303            0            0,00     0,00
2019-05-16  26,219999  sell      255            48           196,45   196,45
2019-05-20  25,469999  sell      45             210          5,20     201,65
2019-05-21  25,51      sell      6              39           6,99     208,64
2019-05-22  25,59      sell      2              4            7,48     216,12
2019-05-24  25,5       buy       383            0            0,00     216,12
2019-05-28  25,76      sell      0              383          106,88   323,00
2019-05-31  24,92      buy       309            0            0,00     323,00
2019-06-03  24,85      sell      0              309          85,25    408,25
2019-06-04  24,18      buy       149            0            0,00     408,25
2019-06-05  25,85      sell      79             70           334,08   742,33
2019-06-06  25,950001  sell      69             10           341,98   1084,31
2019-06-07  26,33      buy       342            0            0,00     1084,31
2019-06-10  26,629999  sell      0              342          470,80   1555,11
2019-06-12  27,870001  buy       132            0            0,00     1555,11
2019-06-13  27,879999  buy       297            0            0,00     1555,11
2019-06-14  27,68      buy       354            0            0,00     1555,11
2019-06-17  27,67      sell      0              354          409,18   1964,29
2019-06-20  28,18      buy       369            0            0,00     1964,29
2019-06-21  27,93      sell      0              369          316,93   2281,22
2019-06-24  27,719999  buy       215            0            0,00     2281,22
2019-06-25  27,91      sell      0              215          357,78   2639,00
2019-06-26  27,780001  buy       372            0            0,00     2639,00
2019-06-27  28         sell      0              372          439,62   3078,62
2019-08-13  28,459999  buy       156            0            0,00     3078,62
2019-08-14  28,33      sell      0              156          419,34   3497,96
2019-08-26  28,09      buy       338            0            0,00     3497,96
2019-08-27  28,450001  sell      0              338          541,02   4038,98
2019-11-13  30,959999  buy       340            0            0,00     4038,98
2019-11-14  31,1       sell      8              332          588,62   4627,60
2019-11-15  31,299999  sell      1              7            590,22   5217,82
2019-11-20  31,68      buy       333            0            0,00     5217,82
2019-11-21  31,969999  sell      0              333          687,17   5904,99
2019-11-25  32,09      buy       333            0            0,00     5904,99
2019-11-26  32,150002  sell      0              333          707,15   6612,14
2020-02-04  31,549999  buy       113            0            0,00     6612,14
2020-02-05  32,689999  buy       227            0            0,00     6612,14
2020-02-10  31,24      buy       321            0            0,00     6612,14
2020-02-11  32,369999  sell      72             249          869,55   7481,69
2020-02-12  33         sell      44             28           914,91   8396,60
2020-02-13  32,25      buy       267            0            0,00     8396,60
2020-02-14  31,790001  sell      20             247          759,09   9155,69
2020-02-18  31,540001  sell      10             10           754,09   9909,78
2020-02-19  32,25      sell      4              6            761,19   10670,97
2020-02-20  31,879999  sell      1              3            759,71   11430,68
2020-02-21  31,75      buy       338            0            0,00     11430,68
2020-02-24  30,09      sell      0              338          198,50   11629,18
2020-02-25  29,950001  buy       51             0            0,00     11629,18
2020-02-26  28,120001  sell      14             37           105,17   11734,35
2020-02-27  26         buy       238            0            0,00     11734,35
2020-02-28  24,780001  sell      0              238          -214,87  11519,48
2020-04-06  11,64      buy       70             0            0,00     11519,48
2020-04-07  15,17      buy       213            0            0,00     11519,48

Figure 7. Cumulative Profit on MGM data testing

Figure 7 shows a graph of cumulative profit on the MGM test data. The cumulative profit on the MGM data is lower than on the COTY data because the MGM data sells fewer shares than the COTY data. The total shares sold on the MGM data are 5380 shares, while on the COTY data it is 8918 shares. Figure 8 shows a graph of cumulative profit on the AMZN test data.

Figure 8. Cumulative Profit on AMZN data testing

The cumulative profit on the AMZN data is lower than the


COTY data is 8918 shares. In addition, the cumulative profit of the AMZN data is higher than that of the MGM data because it predicts more buying and selling decisions. Therefore, the cumulative profit on the three datasets has increased every day even though the profit for each dataset is different.

In the manual calculation, we assume that at the beginning of the year an investor buys all shares with an initial investment of $10 000 and that until the end of the year the shares are not sold. When compared with testing on the model, the cumulative profit results are better than the manual calculations based on this assumption. The following is a comparison of the cumulative profit results of the manual calculation and of testing on the PPO model, shown in Table 4.

Table 4. Comparison of cumulative profit on testing and manual calculation

            Testing     Counted manually
Data COTY   $36889,59   $-4623,36
Data MGM    $11519,48   $-4580,75
Data AMZN   $24482,96   $990

Figure 9 shows a graph of the manual count on the COTY, MGM, and AMZN data.

Figure 9. Cumulative Profit on manual calculation

V. CONCLUSION
Based on the results of the research and analysis that have been carried out, it can be concluded as follows:
a) The predictive application of stock decisions that has been developed can predict the sale and purchase of stock market prices, so it can help investors make decisions in selling or buying shares.
b) The results of the evaluation of the prediction of buying and selling decisions using the Proximal Policy Optimization method have been good, as evidenced by the increase in the rewards and average rewards over the three training runs. The resulting cumulative profit from evaluating the prediction decisions is $36889.59 for COTY, $11519.48 for MGM, and $24482.96 for AMZN, with an initial investment of $10000. The comparison of the cumulative profit results for the last one year between manual calculation and testing is much different, where the cumulative profit obtained from the model is more profitable than the manual calculation.

REFERENCES
[1] K. Daberius, E. Granat, and P. Karlsson, "Deep Execution - Value and Policy Based Reinforcement Learning for Trading and Beating Benchmarks," SSRN: 3374766, 2019.
[2] T. G. Fischer, "Reinforcement Learning in Financial Markets - A Survey," Econstor: 12, 2018.
[3] C. Y. Huang, "Financial Trading as a Game: A Deep Reinforcement Learning Approach," arXiv:1807.02787v1 [q-fin.TR], 2018.
[4] Investopedia, retrieved from https://www.investopedia.com/, 2016.
[5] V. Lavet, R. Islam, J. Pineau, P. Henderson, and M. G. Bellemare, "An Introduction to Deep Reinforcement Learning," arXiv:1811.12560v2 [cs.LG], 2018.
[6] Z. Liang, H. Chen, J. Zhu, K. Jiang, and Y. Li, "Adversarial Deep Reinforcement Learning in Portfolio Management," arXiv:1808.09940v3 [q-fin.PM], 2018.
[7] H. K. Lim, J. B. Kim, J. S. Heo, and Y. H. Han, "Federated Reinforcement Learning for Training Control Policies in Multiple IoT Devices," Sensors, MDPI, vol. 20, 1359, 2020.
[8] L. Menkhoff and M. P. Taylor, "The Obstinate Passion of Foreign Exchange Professionals: Technical Analysis," Journal of Economic Literature, vol. 45, pp. 936-972, 2007.
[9] B. B. Nair, V. P. Mohandas, N. Nayanar, E. S. R. Teja, S. Vigneshwari, and K. V. N. S. Teja, "A Stock Trading Recommender System Based on Temporal Association Rule Mining," SAGE Open, doi: 10.1177/2158244015579941, 2015.
[10] E. Ong, Technical Analysis for Mega Profit. Jakarta: PT Gramedia Pustaka Utama, 2017.
[11] S. Ravichandiran, Hands-On Reinforcement Learning with Python. Birmingham: Packt, 2018.
[12] J. Schulman, S. Levine, P. Moritz, M. Jordan, and P. Abbeel, "Trust Region Policy Optimization," arXiv:1502.05477v5 [cs.LG], 2017.
[13] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, "Proximal Policy Optimization Algorithms," arXiv:1707.06347v2 [cs.LG], 2017.
[14] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press, 2018.


Exploiting Facial Action Unit in Video for Recognizing Depression using Metaheuristic and Neural Networks

Habibullah Akbar, Computer Science, Esa Unggul University, Jakarta, Indonesia, [email protected]
Sintia Dewi, Computer Science, Esa Unggul University, Jakarta, Indonesia, [email protected]
Yuli Azmi Rozali, Psychology, Esa Unggul University, Jakarta, Indonesia, [email protected]
Lita Patricia Lunanta, Psychology, Esa Unggul University, Jakarta, Indonesia, [email protected]
Nizirwan Anwar, Computer Science, Esa Unggul University, Jakarta, Indonesia, [email protected]
Djasminar Anwar, English, Pamulang University, Tangerang Selatan, Indonesia, [email protected]

Abstract—The ubiquity of coronavirus cases around the world has been severe, and its impact affects not only the economy and physical health but also mental health, for example through depression. Unfortunately, the number of coronavirus cases may inhibit people from visiting general practitioners or hospitals. This study presents research on facial behaviour analysis for recognizing depression from facial action units extracted from images or videos. We aimed to find a reduced set of facial action unit features using a metaheuristic approach. We utilized particle swarm optimization to select the best predictors and fed them to optimized standard feedforward neural networks. We obtained 97.83% accuracy for depression detection on the Distress Analysis Interview Corpus Wizard-of-Oz (DAIC-WOZ) database, which contains 189 video sessions associated with Patient Health Questionnaire depression labels. Reaching this level of accuracy requires almost 9 minutes, but the accuracy is higher than that of other state-of-the-art methods.

Keywords—depression, PHQ-8, facial action unit, PSO

I. INTRODUCTION
The impact of COVID-19 on health has not been limited to physical symptoms; it also affects mental state due to several factors, including uncertainty, loss of jobs, financial problems, social isolation, and the fear of getting sick. These conditions, if not treated properly, may lead to depression. It is difficult to define the word depression precisely, as it can refer to clinical depression, bipolar disorder, or major depressive disorder. The above factors characterize some types of depression. However, it is generally difficult to distinguish normal sadness and the grief of loss from real depression.

Since 2017, it has been estimated that the prevalence of depression exceeds 300 million people, or 4.4% of the world's population [1]. The pandemic situation has worsened this further. There have been several efforts to measure depression more precisely. The eight-item Patient Health Questionnaire depression scale (PHQ-8) has been regarded as a valid diagnostic tool for measuring depressive disorder [2]. The PHQ-8 maps out feeling, cognitive mental state, and movement or how one speaks to others; more detail about this questionnaire can be found in [2]. Meanwhile, the advancement of artificial intelligence has spurred interest in recognizing depression based on facial expression [3], audio features [4], and their combination with text information [5]. This study, however, focuses on facial expressions only. In [6], local curvelet transform and binary pattern features were proposed to recognize depression within a video. The scheme achieved 97.6% accuracy, evaluated on 58 participants recorded in 200 videos. Besides binary patterns, facial features have also been emerging. In [7], facial features such as head pose, facial landmark motion, facial expressions, and eye gaze characterize facial behaviour and correspond to human behaviour. In [8], the authors proposed Facial Action Units as the underlying features for recognizing human emotion.

Facial action units (AUs) are the building blocks of individual muscles or groups of muscles within the face. They measure the muscle movements within the face that convey emotion, attitude, and mood. AU features track points that represent the lips, nose, brows, and eyes frame by frame. As these points change over time, facial behaviour can be determined. Theoretically, AUs can be used to recognize a wide range of basic emotions such as happiness, anger, contempt, disgust, fear, and surprise. In this study, however, we are only interested in depression.

In [9], AUs were used to detect depression from videos using a Support Vector Machine (SVM). The accuracy using 17 AUs reached only 88%, which seems to correspond to the inadequate number of AUs used. Recently, [10] proposed a significant study in which three layers were developed, consisting of an Active Appearance Model (AAM), SVM, facial matrix, and feedforward neural network (FFNN), to classify AU features. They obtained 87.2% accuracy for depression classification when using samples where happiness and sadness were introduced in the training. However, when using the top four AUs obtained from the facial matrix, the accuracy dropped to 84.51% in experiments based on the Depression Anxiety Stress Scale (DASS) questionnaire [11].

Given these previous studies, this paper questions whether those four AUs are the only top predictors for depression recognition. We introduce a metaheuristic-based feature selection scheme for recognizing depression using AUs and classification. Fig. 1 illustrates the depression recognition scheme developed in this study: it takes input from videos, extracts the facial action unit features frame by frame, performs an averaging operation, selects the best features, and then classifies whether a person in the video is depressed or not. The details of the scheme are given in the following section.
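The scheme just described (frame-by-frame AU extraction, per-session averaging, feature selection, and classification) can be sketched as follows. This is an illustrative sketch only, not the authors' implementation (which is in Matlab): the toy frame matrix, the feature mask, and the thresholded stand-in classifier are hypothetical.

```python
import numpy as np

def session_features(frame_aus):
    """Average frame-level AU intensities into one feature vector per session."""
    # frame_aus has shape (n_frames, n_aus); the paper uses 20 AUs per frame.
    return frame_aus.mean(axis=0)

def classify(features, mask, weights):
    """Toy stand-in for the trained FFNN: threshold a weighted sum of selected AUs."""
    selected = features[mask.astype(bool)]
    score = float(selected @ weights[:selected.size])
    return int(score > 0.5)  # 1 = depressed, 0 = not depressed

# Example: 3 frames of 5 AU intensities (real sessions have thousands of frames).
frames = np.array([[0.2, 0.4, 0.1, 0.0, 0.3],
                   [0.4, 0.6, 0.1, 0.2, 0.3],
                   [0.6, 0.8, 0.1, 0.4, 0.3]])
feat = session_features(frames)      # session-level mean of each AU
mask = np.array([1, 1, 0, 0, 1])     # hypothetical reduced AU set from feature selection
label = classify(feat, mask, np.ones(3))
```

The averaging step compresses a variable-length video into a fixed-length vector, which is what allows a fixed-input feedforward network to be used downstream.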

978-1-6654-4002-8/21/$31.00 ©2021 IEEE


II. METHODS

A. Dataset and Experimental Setup
The dataset used in this study is the Distress Analysis Interview Corpus Wizard-of-Oz (DAIC-WOZ) database [12], [13]. The dataset contains 189 sessions of 7-33 minute video and audio recordings, each associated with the questionnaire and with facial features extracted using Conditional Local Neural Fields (CLNF) and facial action units (AUs).

CLNF was proposed by [14] for detecting and tracking, originally, 68 facial landmarks. CLNF models the geometrical variations as a point distribution that is robust to, and models, the variations of local landmarks. An extension of this model was made by [7], who simplified the tracking process by adding a face validation stage utilizing a three-layer convolutional neural network trained on Labeled Face Parts in the Wild [15], [16]. The landmarks are tracked based on the CLNF model from the previous frame. For this study, however, we utilize only the facial features as the input for the depression recognition scheme. The AUs are then used to estimate the facial expression of a face, including its intensity, appearance, and geometry features based on CLNF. Some of the AUs are shown in Fig. 1, while the complete set of AUs used in this study is given in Table I. The AUs are provided with the dataset, used as features, and fed into the neural networks. We use the mean value over the frames of each video session.

Fig. 1 Depression recognition scheme used in this study (the predictor's image is adopted from [7]).

B. Feature Selection
The Facial Action Coding System (FACS) determines which sets of AUs correspond to basic human emotions. FACS can be divided into 46 categories or AUs [10]. On the other hand, it is not specified which of the AUs provided within the DAIC-WOZ dataset belong to depression. Therefore, we are interested in exploring whether it is possible to determine depression using a reduced set of AUs. To select the AUs, we utilized a metaheuristic approach using Particle Swarm Optimization (PSO) [17]. The purpose of using PSO is to improve the performance of the neural networks. We chose to test the metaheuristic approach because it has been used for a wide range of applications such as image segmentation [18], 3D model recognition [19], and gene expression [20].

In PSO, we have a population of collections of AU candidates. Each collection of AUs is represented as a position within mutually orthogonal coordinate axes. It has a direction or momentum of movement, awareness of its own best location, and is influenced by the other collections of AUs. The position of an AU collection is updated as it aims to find an optimum collection of AUs that can represent depression. The position update is given in (1).

x_{i+1} = x_i + v_{i+1}    (1)

The term v_{i+1} is the velocity, which combines the particle's direction, awareness, and social influence as shown in (2).

v_{i+1} = 0.5 v_i + a_1 r_1 (p_i - x_i) + a_2 r_2 (p_best - x_i)    (2)

Here p_i is the collection's own best position so far and p_best is the best position found among all collections. Variables a_1 and a_2 denote the weights of the awareness and social influence terms, and r_1 and r_2 are generated randomly within the range 0 to 1. In this study, we set both a_1 and a_2 to 2 and use iteration counts of 10, 20, 30, 40, and 50.

C. Depression Recognition
The recognition of the depression state is formulated as a classification problem that aims to minimize the mean square error in predicting depression labels based on the reduced collection of AUs obtained from the feature selection process. The error is derived from the ground truth collected from the associated PHQ-8 questionnaire as given in the DAIC-WOZ dataset.

The 20 AUs are fed as input into feedforward neural networks (FFNN). We used FFNNs because they are fast and effective, as explained in [10]. An FFNN classifier predicts the depression label by minimizing the mean square error based on a set of hyperparameters corresponding to its architecture. Besides the hyperparameters, the training algorithm that updates the weights and biases also needs to be addressed. In this study, we experimented with three kinds of backpropagation training algorithms, as explained in the following.

1) Levenberg-Marquardt (LM) Backpropagation Algorithm - This algorithm models the classification problem as a nonlinear least squares problem. It combines the gradient descent and Gauss-Newton methods, which solve the Hessian matrix at a minimum point.

2) Bayesian Regularization (BR) Backpropagation Algorithm - This algorithm is suitable for a large system of linear equations [21]. BR has been observed to perform better than LM in terms of the lowest sum of square errors, less overfitting, and a higher correlation coefficient [22].

3) Scaled Conjugate Gradient (SCG) Backpropagation Algorithm - This algorithm is also suitable for a large system of linear equations [21]. It has fewer steps and is thus able to


speed up the training process by avoiding the computation of the Hessian matrix, using a simple approximation approach instead.

D. Performance Evaluation
The performance of the FFNNs is evaluated based on the accuracy derived from the confusion matrix (see Table II). The accuracy is obtained by comparing the number of correct predictions against the total number of predictions. The total number of true predictions can be calculated using (3).

TTP_all = sum_{j=1}^{2} x_jj    (3)

The variable x_11 is the number of correctly predicted depressed cases and x_22 is the number of correctly predicted non-depressed cases. The overall accuracy A can be calculated using (4).

A = TTP_all / All    (4)

TABLE I. LIST OF AUS USED IN THIS STUDY

No  Feature  Full Name of Facial Feature
1   AU01_r   Inner brow raiser
2   AU02_r   Outer brow raiser
3   AU04_r   Brow lowerer
4   AU05_r   Upper lid raiser
5   AU06_r   Cheek raiser
6   AU09_r   Nose wrinkler
7   AU10_r   Upper lip raiser
8   AU12_r   Lip corner puller
9   AU14_r   Dimpler
10  AU15_r   Lip corner depressor
11  AU17_r   Chin raiser
12  AU20_r   Lip stretcher
13  AU25_r   Lips part
14  AU26_r   Jaw drop
15  AU04_c   Brow lowerer
16  AU12_c   Lip corner puller
17  AU15_c   Lip corner depressor
18  AU23_c   Lip tightener
19  AU28_c   Lip suck
20  AU45_c   Blink

All is the total of all elements in the confusion matrix. In addition to accuracy, we use CPU time to measure the length of the training process. Normally, the optimization process (the use of PSO) extends the training time. The experiments were carried out using the following specifications: an Intel(R) Core(TM) i7-6700HQ CPU @ 2.60 GHz, 12 GB RAM, 1 TB HDD, and an Nvidia GeForce GTX 950 GPU. As for software, we used Matlab 2020b on Windows 10 Pro 64-bit.

III. RESULTS AND DISCUSSION

A. Experimental Results for Depression Classification
In this section, we analyze the discriminative ability of the neural networks with the following objectives: i) to study the effect of the backpropagation training algorithms, and ii) to study the hyperparameters of the neural network's architecture, including the epoch, the number of nodes within the hidden layer, the learning rate, and the training, validation, and testing ratio. The effects of these hyperparameters are evaluated based on the network's accuracy and computational time. After experimenting with the hyperparameters, we study the effect of feature selection on network performance.

TABLE II. CONFUSION MATRIX

Depression state  Prediction: yes  Prediction: no
yes               x11              x12
no                x21              x22

A. Results on Levenberg-Marquardt Algorithm
For the LM backpropagation algorithm, the effects of the epoch and the number of hidden nodes were interesting. Generally, Table III shows that minimal and maximal epochs lead to better accuracy with similar computing time.

TABLE III. EXPERIMENTS OF EPOCH ON LEVENBERG-MARQUARDT BACKPROPAGATION ALGORITHM

No  Epoch  Accuracy  CPU time (seconds)
1   10     0.865     0.717
2   20     0.844     0.694
3   30     0.833     0.700
4   40     0.795     0.672
5   50     0.756     0.667
6   60     0.764     0.688
7   70     0.778     0.694
8   80     0.747     0.691
9   90     0.798     0.668
10  100    0.872     0.699

However, Table IV shows that LM is more sensitive to the number of hidden nodes. Most of the accuracies cannot exceed 90%, yet 90 nodes can do so.

TABLE IV. EXPERIMENTS OF NUMBER OF HIDDEN NODES ON LEVENBERG-MARQUARDT BACKPROPAGATION ALGORITHM

No  Hidden Nodes  Accuracy  CPU time (seconds)
1   20            0.806     0.749
2   30            0.860     0.909
3   40            0.652     0.989
4   50            0.814     1.590
5   60            0.804     1.663
6   70            0.780     2.336
7   80            0.833     3.098
8   90            0.932     7.426
9   100           0.873     6.890

For the ratio experiments, Table V shows that changing the ratio significantly decreases the network's performance. More specifically, lowering the amount of training data leads to decreased accuracy. Table V also shows that a higher learning rate leads to better accuracy and longer computational time. However, the best learning rate value of 0.1 can only produce 87.8% accuracy.


TABLE V. EXPERIMENTS OF TRAINING:VALIDATION:TESTING RATIO AND LEARNING RATE ON LEVENBERG-MARQUARDT ALGORITHM

No  Parameter      Value     Accuracy  CPU time (seconds)
1   ratio          50:25:25  0.678     2.480
2   ratio          60:20:20  0.708     2.913
3   ratio          80:10:10  0.768     3.864
4   learning rate  0.001     0.686     2.774
5   learning rate  0.01      0.802     4.203
6   learning rate  0.1       0.878     6.035

B. Results on Bayesian Regularization Algorithm
For the BR backpropagation algorithm, the effects of the epoch and the number of hidden nodes were prominent. Table VI shows that higher epochs generally improve accuracy. The best performance was obtained at 80 epochs, reaching 95.4% accuracy after a training process of about a second.

TABLE VI. EXPERIMENTS OF EPOCH ON BAYESIAN REGULARIZATION BACKPROPAGATION ALGORITHM

No  Epoch  Accuracy  CPU time (seconds)
1   10     0.824     0.716
2   20     0.838     0.733
3   30     0.856     0.778
4   40     0.880     0.846
5   50     0.920     0.889
6   60     0.943     0.948
7   70     0.927     1.010
8   80     0.954     1.068
9   90     0.937     1.149
10  100    0.932     1.217

In contrast, increasing the number of hidden nodes may deteriorate the network's performance (see Table VII). The best accuracy was obtained at 50 nodes, producing a result similar to the epoch experiment. A higher number of hidden nodes makes the backpropagation process take more time.

TABLE VII. EXPERIMENTS OF NUMBER OF HIDDEN NODES ON BAYESIAN REGULARIZATION BACKPROPAGATION ALGORITHM

No  Hidden Nodes  Accuracy  CPU time (seconds)
1   20            0.930     2.394
2   30            0.932     4.406
3   40            0.931     8.781
4   50            0.956     14.715
5   60            0.936     21.71
6   70            0.921     30.062
7   80            0.829     42.367
8   90            0.916     55.429
9   100           0.884     78.519

Surprisingly, Table VIII shows that the training, validation, and testing ratio did not significantly influence the network's performance. However, the learning rate value of 0.001 was able to reach 96.1% accuracy, which is the best performance so far.

TABLE VIII. EXPERIMENTS OF TRAINING:VALIDATION:TESTING RATIO AND LEARNING RATE ON BAYESIAN REGULARIZATION ALGORITHM

No  Parameter      Value     Accuracy  CPU time (seconds)
1   ratio          50:25:25  0.885     13.987
2   ratio          60:20:20  0.907     13.736
3   ratio          80:10:10  0.911     13.637
4   learning rate  0.001     0.961     13.948
5   learning rate  0.01      0.954     13.792
6   learning rate  0.1       0.940     14.169

C. Results on Scaled Conjugate Gradient Algorithm
For the SCG backpropagation algorithm, the required training time was short. However, the produced accuracies could not exceed 74.8%, which is reached at epoch 60 (see Table IX).

TABLE IX. EXPERIMENTS OF EPOCH ON SCALED CONJUGATE GRADIENT BACKPROPAGATION ALGORITHM

No  Epoch  Accuracy  CPU time (seconds)
1   10     0.711     0.641
2   20     0.668     0.614
3   30     0.712     0.620
4   40     0.727     0.626
5   50     0.682     0.614
6   60     0.748     0.625
7   70     0.693     0.617
8   80     0.721     0.623
9   90     0.706     0.622
10  100    0.734     0.643

As can be observed in Table X, increasing the number of hidden nodes for the scaled conjugate gradient learning algorithm also increases the accuracy. Nevertheless, it remains severely lower than that of the Bayesian regularization algorithm.

TABLE X. EXPERIMENTS OF NUMBER OF HIDDEN NODES ON SCALED CONJUGATE GRADIENT BACKPROPAGATION ALGORITHM

No  Hidden Nodes  Accuracy  CPU time (seconds)
1   20            0.718     0.618
2   30            0.713     0.642
3   40            0.733     0.636
4   50            0.723     0.646
5   60            0.746     0.658
6   70            0.698     0.645
7   80            0.708     0.646
8   90            0.729     0.669
9   100           0.757     0.651

Similarly, changing the training, validation, and testing ratio and the learning rate did not improve the performance


significantly. Only a learning rate value of 0.001 was able to increase it a little. However, across all the experiments, SCG was observed to have the lowest computational time.

TABLE XI. EXPERIMENTS OF TRAINING:VALIDATION:TESTING RATIO AND LEARNING RATE ON SCALED CONJUGATE GRADIENT BACKPROPAGATION ALGORITHM

No  Parameter      Value     Accuracy  CPU time (seconds)
1   ratio          50:25:25  0.644     0.652
2   ratio          60:20:20  0.740     0.667
3   ratio          80:10:10  0.659     0.653
4   learning rate  0.001     0.769     0.675
5   learning rate  0.01      0.678     0.658
6   learning rate  0.1       0.742     0.667

For the SCG backpropagation algorithm, the required training time was short, but the produced accuracies could not exceed 76.9% (see Table XI).

B. Experimental Results for Feature Selection
The results of applying the metaheuristic approach for selecting the best AUs are given in Table XII. The experiments were conducted using iteration counts from 10 to 50 in intervals of ten. The best classification accuracy on the DAIC-WOZ dataset was 97.83%, obtained with 20 iterations using the BR backpropagation training algorithm. However, the computational time required to reach this accuracy was almost 9 minutes. Without PSO, BR was only able to reach 96.1%. The worst result was 93.21%, with 50 iterations. For further comparison, 20 iterations were also used for the LM and SCG algorithms. PSO was observed to improve the LM performance by 1.22% and the SCG performance by 2.43%.

From these observations, it is hard to determine which AUs correspond most to depression. However, AU06_r, AU15_r, and AU26_r overlapped with the best BR performance, with the exception of AU12_r. Moreover, BR requires 9 additional features to reach 97.83%, compared to the benchmark method [10], which only reaches 84.51% accuracy.

IV. CONCLUSION
This study proposed a metaheuristic approach, i.e., particle swarm optimization, as the method for selecting the best combination of facial action units for recognizing depression in videos. The recognition procedure utilized a feedforward neural network and was tested on the Distress Analysis Interview Corpus Wizard-of-Oz (DAIC-WOZ) database. The experimental results show that particle swarm optimization was able to improve the performance of the backpropagation training algorithms used in this study (Levenberg-Marquardt, Bayesian Regularization, and Scaled Conjugate Gradient). The best result was obtained by Bayesian Regularization, which reaches an accuracy of 97.83% with 20 PSO iterations. This accuracy was evaluated against 10% of the total dataset. These results suggest that the metaheuristic approach can improve depression recognition performance. Further, other metaheuristic approaches such as the clonal selection algorithm or ant colony optimization should be studied for facial features. Similar studies are also needed on the impact of metaheuristics on text or audio expressions.

REFERENCES
[1] W. H. Organization and others, "Depression and other common mental disorders: global health estimates," 2017.
[2] K. Kroenke, T. W. Strine, R. L. Spitzer, J. B. W. Williams, J. T. Berry, and A. H. Mokdad, "The PHQ-8 as a measure of current depression in the general population," J. Affect. Disord., vol. 114, no. 1-3, pp. 163-173, 2009.
[3] M. F. Valstar et al., "FERA 2015 - second facial expression recognition and analysis challenge," in 2015 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG), 2015, vol. 6, pp. 1-8.
[4] Z. Wang, L. Chen, L. Wang, and G. Diao, "Recognition of audio depression based on convolutional neural network and generative antagonism network model," IEEE Access, vol. 8, pp. 101181-101191, 2020.
[5] L. Yang, H. Sahli, X. Xia, E. Pei, M. C. Oveneke, and D. Jiang, "Hybrid depression classification and estimation from audio video and text information," in Proceedings of the 7th Annual Workshop on Audio/Visual Emotion Challenge, 2017, pp. 45-51.
[6] A. Pampouchidou et al., "Video-based depression detection using local curvelet binary patterns in pairwise orthogonal planes," in 2016 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2016, pp. 3835-3838.
[7] T. Baltrušaitis, P. Robinson, and L.-P. Morency, "OpenFace: an open source facial behavior analysis toolkit," in 2016 IEEE Winter Conference on Applications of Computer Vision (WACV), 2016, pp. 1-10.
[8] P. Ekman and W. V. Friesen, Facial Action Coding System. Consulting Psychologists Press, 1978.
[9] J. F. Cohn et al., "Detecting depression from facial actions and vocal prosody," in 2009 3rd International Conference on Affective Computing and Intelligent Interaction and Workshops, 2009, pp. 1-7.
[10] M. Gavrilescu and N. Vizireanu, "Predicting depression, anxiety, and stress levels from videos using the facial action coding system," Sensors, vol. 19, no. 17, p. 3693, 2019.
[11] S. H. Lovibond and P. F. Lovibond, Manual for the Depression Anxiety Stress Scales. Psychology Foundation of Australia, 1996.
[12] J. Gratch et al., "The distress analysis interview corpus of human and computer interviews," in LREC, 2014, pp. 3123-3128.
[13] D. DeVault et al., "SimSensei Kiosk: A virtual human interviewer for healthcare decision support," in Proceedings of the 2014 International Conference on Autonomous Agents and Multi-Agent Systems, 2014, pp. 1061-1068.
[14] T. Baltrušaitis, P. Robinson, and L.-P. Morency, "Constrained local neural fields for robust facial landmark detection in the wild," in Proceedings of the IEEE International Conference on Computer Vision Workshops, 2013, pp. 354-361.
[15] P. N. Belhumeur, D. W. Jacobs, D. J. Kriegman, and N. Kumar, "Localizing parts of faces using a consensus of exemplars," IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 12, pp. 2930-2940, 2013.
[16] V. Le, J. Brandt, Z. Lin, L. Bourdev, and T. S. Huang, "Interactive facial feature localization," in European Conference on Computer Vision, 2012, pp. 679-692.


[17] J. Kennedy and R. Eberhart, "Particle swarm optimization," in Proceedings of ICNN'95 - International Conference on Neural Networks, 1995, vol. 4, pp. 1942-1948.
[18] H. Akbar, N. Suryana, and S. Sahib, "Chaotic clonal selection optimisation for multi-threshold segmentation," Int. J. Signal Imaging Syst. Eng., vol. 8, no. 5, pp. 298-315, 2015.
[19] H. Akbar, N. Suryana, and S. Sahib, "Training neural networks using Clonal Selection Algorithm and Particle Swarm Optimization: A comparison for 3D object recognition," in 2011 11th International Conference on Hybrid Intelligent Systems (HIS), 2011, pp. 692-697.
[20] L.-Y. Chuang, H.-W. Chang, C.-J. Tu, and C.-H. Yang, "Improved binary PSO for feature selection using gene expression data," Comput. Biol. Chem., vol. 32, no. 1, pp. 29-38, 2008.
[21] Y.-C. Du and A. Stephanus, "Levenberg-Marquardt neural network algorithm for degree of arteriovenous fistula stenosis classification using a dual optical photoplethysmography sensor," Sensors, vol. 18, no. 7, p. 2322, 2018.
[22] S. Gouravaraju, J. Narayan, R. A. Sauer, and S. S. Gautam, "A Bayesian regularization-backpropagation neural network model for peeling computations," arXiv preprint arXiv:2006.16409, 2020.

TABLE XII. COMPARISONS OF BACKPROPAGATION TRAINING ALGORITHMS BASED ON THE NUMBER OF PSO ITERATIONS

Training algorithm   BR(a)                               LM(b)   SCG(c)  AAM-SVM-FM-FFNN [10]
PSO iteration        10      20      30      40      50  20      20      -
Accuracy (%)         96.35   97.83   94.95   96.60   93.21  91.98   74.47   84.51
CPU time (seconds)   242.56  537.56  980.13  717.29  950.29  84.81   15.95

Features:
1  Participant_ID √ √ √ √ √
2  Gender √ √ √ √ √
3  AU01_r √ √ √
4  AU02_r √ √ √ √ √ √
5  AU04_r √ √ √ √ √ √
6  AU05_r √ √ √ √ √
7  AU06_r √ √ √ √ √
8  AU09_r √ √ √ √ √
9  AU10_r √ √ √ √
10 AU12_r √ √ √ √ √
11 AU14_r √ √
12 AU15_r √ √ √ √
13 AU17_r √ √ √
14 AU20_r √ √ √ √ √ √
15 AU25_r √ √ √ √ √
16 AU26_r √ √
17 AU04_c √ √ √
18 AU12_c √ √
19 AU15_c √ √ √
20 AU23_c √ √ √ √ √ √
21 AU28_c √ √ √ √
22 AU45_c √ √ √ √

a. Bayesian Regularization. b. Levenberg-Marquardt. c. Scaled Conjugate Gradient.
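The PSO-based selection compared above follows the position and velocity updates of (1) and (2). The sketch below is a minimal Python illustration, not the authors' Matlab implementation: it uses the paper's inertia weight 0.5 and a_1 = a_2 = 2, thresholds positions at 0.5 to obtain a binary AU mask, and replaces the FFNN validation accuracy with a hypothetical stand-in fitness function; particle positions are left unclamped for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def pso_select(fitness, n_features=20, n_particles=8, iters=20):
    """Binary PSO feature selection: each particle encodes a candidate AU subset."""
    x = rng.random((n_particles, n_features))            # particle positions
    v = np.zeros_like(x)                                 # particle velocities
    pbest = x.copy()                                     # per-particle best positions
    pbest_fit = np.array([fitness(xi > 0.5) for xi in x])
    gbest = pbest[pbest_fit.argmax()].copy()             # best position in the swarm
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        # eq. (2): inertia 0.5, awareness weight a1 = 2, social weight a2 = 2
        v = 0.5 * v + 2 * r1 * (pbest - x) + 2 * r2 * (gbest - x)
        x = x + v                                        # eq. (1)
        fit = np.array([fitness(xi > 0.5) for xi in x])
        improved = fit > pbest_fit
        pbest[improved], pbest_fit[improved] = x[improved], fit[improved]
        gbest = pbest[pbest_fit.argmax()].copy()
    return gbest > 0.5                                   # selected-feature mask

# Hypothetical fitness standing in for the FFNN validation accuracy: reward
# subsets containing features 0 and 3, with a small penalty on subset size.
target = {0, 3}
def fitness(mask):
    chosen = set(np.flatnonzero(mask))
    return len(chosen & target) - 0.01 * len(chosen)

selected = pso_select(fitness, n_features=6)
```

In the paper's setting, the fitness of a mask would be obtained by training and validating an FFNN on the encoded AU subset, which is why each PSO iteration is expensive and the full run takes several minutes.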


Review Literature Performance: Quality of Service from Internet of Things for Transportation System
Nizirwan Anwar, Computer Science, Esa Unggul University ([email protected])
Arief Kusuma Among Praja, Economics and Business, Esa Unggul University ([email protected])
Habibullah Akbar, Computer Science, Esa Unggul University ([email protected])
M. Fachruddin Arrozi Adhikara, Economics and Business, Esa Unggul University ([email protected])
Roesfiansjah Rasjidin, Engineering, Esa Unggul University ([email protected])
Dewanto Rosian Adhy, Network of Research, Innovation and Collaboration ([email protected])

Abstract—The Internet of Things (IoT) is one of the technologies that is becoming a trend nowadays, and its needs and implementations develop very fast. These conditions drive increased infrastructure requirements and the need for good data management. The number of points connected to the IoT increases rapidly, as do the quantity and quality of the data sent. The IoT is a technology that connects objects (things) or devices in digital communication; it is a development of the earlier internet connection that linked only computer devices. Scale-up in a transportation system can occur due to an increase in connected vehicles, in the amount of data sent, and in the increasingly widespread area or range of vehicle mobility. Data generated by the IoT system will be widely used by various parties according to their interests. Data growth can result in a decrease in data quality in terms of trust in data sources, data security against improper changes, and timely and targeted data distribution. The expected result of the research is an IoT system in transportation that is able to produce valid, accurate, and up-to-date data; the use of such data further improves the performance of the transportation system. Transportation is one area that has the potential to benefit from the IoT, but its unique and challenging characteristics require a guarantee of IoT quality. The flow of transportation information is needed to improve services for users. Utilization of the IoT in transportation will provide major benefits such as ease of obtaining vehicle position information, accuracy in obtaining information on engine conditions, and, in the future, a path toward transportation automation. If the quality of the data flow is not guaranteed, the implementation of the IoT will pose a potential hazard.

Keywords—IoT architecture, contribution, extraction mapping, focus area and contribution, object research and implementation, transportation efficiency

I. INTRODUCTION
The IoT is one of the technologies that is developing today. The dynamics of society and the emergence of technology in the fields of communication, electronics, and energy are driving the emergence and growth of the IoT. Applications are mostly found in the fields of health, agriculture, transportation, and others. The great benefit, especially in providing convenience in the activities of daily life, makes the use of the IoT inevitable. The growth of IoT-connected devices is very fast; according to Cisco's prediction, in 2020 there are around 25 billion devices connected in IoT systems, and the rate of growth is also very fast from year to year. One area that benefits from the IoT is transportation, or automotive: the transportation sector is one of the heaviest users of the IoT alongside the health sector. One example of the use of the IoT in transportation is the use of GPS sensors for tracking, so that the location of vehicles can be obtained. Other sensors can also be added, covering the condition of the engine (temperature, NVH levels, fuel consumption), braking conditions, and others. The next step is to add a car sensor to capture car condition information, such as the condition of the engine (temperature, sound, fuel consumption), braking conditions, and others. All information is sent to the data center or server. The information obtained subsequently becomes big data and can be used as a reference by various parties. The information that can be collected includes:

- Information on the history of vehicle movements
- Information on the condition of the vehicle unit
- Information on vehicle identity and ownership
- Technical information from the factory

The information can then be used for further analysis, such as benchmarking brands and types of vehicles, distribution of vehicle spare-part requirements, distribution of technician needs, and others. This information is greatly needed by and useful for many parties, such as the general public of vehicle users, spare-part and workshop vendors, car manufacturers and distributors, and the government as policymaker. The availability of this information provides many benefits, ranging from efficiency in vehicle maintenance and optimization of spare-part stock to analysis of load factors on the road and others. As time goes on, more information is collected and stored. The addition is due to data entry from the running system as well as additional vehicle units and system area coverage. Data grows not merely by percentages but rapidly, in multiples. Problems arise regarding the collected data, namely:

- Validation or correctness of the data is not guaranteed, due to untrusted sources of information or connection disruptions that cause a decrease in data quality.
- The importance of the information resulting from data processing can invite irresponsible data manipulation.
- The data is time-dependent, and there is no guarantee that it will be updated.
- The increasing coverage area and number of connected cars require setting up the communication network so that there is no traffic overload.



2021 1st International Conference on Computer Science and Artificial Intelligence (ICCSAI)

These problems will lead to a decline in the quality of the data produced by the IoT. This decline causes:
- loss of trust in the correctness of the data;
- centralized data that reduces network quality and processing speed and produces latency;
- data lost due to connection loss.
From several references and standards, the IoT architecture stack can be described as follows (Fig. 1).

Fig. 1. IoT Architecture Stack

The three layers that form the main part of IoT have their respective characteristics. The top layer (Cloud or Application) is relatively unproblematic; this layer has developed since the advent of computer technology. The lower layer is the source of problems, ranging from the increasing number of devices to diversity in quality and standards, and it is the part that most influences investment costs. Performance problems arise from failure to maintain data transfer from the lower layer to the upper layer, so the role of the second layer (networking/fog) becomes very important. Implementation in transportation has the same problem: the top layer is good, but in the bottom layer the number of devices and their condition cause problems. To maintain the desired performance, the role of the middle layer, in the form of a method or tool, will be the solution. These problems can be resolved using a new method, such as an algorithm or mechanism at the middle layer of IoT: a technology that emphasizes improving data quality through three characteristics, namely completeness, time awareness and correctness.

The problem raised is the quality of data from the IoT in transportation. Data quality degradation is caused by scale-up systems that encourage rapid growth in the amount of data, so that data is lost in delivery. This research focuses on implementing the IoT in transportation. The parameters observed are how and what data are taken from vehicle behavior and what can be done with the collected data; the next parameter is how to maintain the quality of the data.

II. LITERATURE REVIEW METHOD

A. Review Process

This chapter describes the process and the results of the literature review related to the research conducted. The literature review process is directed by referring to several models, one of which was used for the QoS mapping in IoT [91]. The stages of the literature review are:
- Selection of the scope of the journals to be used as reference material. The selection is made by determining search keywords tailored to the research question of the study, using the keywords Performance Analysis, QoS, IoT and Transportation, with the initial criteria:
  - journal published between 2015 and 2019;
  - journal in English;
  - journal available in full text;
  - journals in the fields of Computers, Electronics, Communication and Control.
- A second selection stage using a phased filtering approach related to the keywords (QoS and IoT). Filtering is done in sequence:
  - relevance of the journal title to the keywords;
  - relevance of the abstract content to the keywords;
  - relevance of the full paper to the keywords;
  - related references (snowballing).
- Data extraction, the process of summarizing a journal with reference to a particular extraction mapping. The extraction mapping uses the research classification of QoS in IoT [91][92] (see Table 1).
- The final stage, analysis of the results of the data extraction, which answers the research question and raises the gap between existing research and the issues raised in this research.

B. Journal Selection

The journal search is carried out by utilizing various sources, with the keywords Internet of Things, IoT, QoS and Performance Analysis. The search process, referring to point 2 above, yielded 369 journals. Selection is then done by looking at the journal titles: journals whose titles have no relevance to the keywords above are discarded, leaving 278 journals at this stage. The next filter is to open the abstract section of the journal; journals whose abstracts are relevant to the keywords are retained, leaving 165 journals. The final filter is a review of the contents of the journal, choosing those with relevance to the keywords; 98 journals remain after this process.

Fig. 2. Filtering and Selection Process

III. RESULT AND DISCUSSION

Furthermore, a total of 98 journals (Fig. 2.) are processed by data extraction. To complement the existing journals, 5 additional journals were added, which became references from several journals that were reviewed. These


five journals were chosen because they have strong relevance to the keywords.

A. Data Extraction
Data extraction uses the extraction mapping above (Table 1). The extraction results can be summarized graphically: the extracted data are displayed in the form of pie charts (Fig. 3.). A review of the 103 journals and a summary are given in the following explanation.

TABLE 1. EXTRACTION MAPPING
1. Focus Area. Items: Physical Sensor Deployment; Physical Layer; Link Layer; Network; Middleware; Cloud. Explanation: the layers of IoT that are the focus of the research.
2. Research Type. Items: Validated Research; Evaluated Research; Solution Proposal; Philosophical Papers; Application Papers; Experience Papers; Opinion Papers. Explanation: the type of research conducted, whether it evaluates or validates a method or product, proposes a solution to the problem, or describes the experience carried out.
3. Contribution Type. Items: Tool; Method; Process; Model; Metric. Explanation: the type of contribution of the research, i.e. software to improve QoS, an algorithm or schema, a QoS measurement, or a model.
4. Quality Factor. Items: Functional Suitability; Performance Efficiency; Compatibility; Usability; Reliability; Security; Portability; Maintainability. Explanation: the QoS criteria that are the objectives, whether the functional quality of the system, performance improvement or the security side.
5. Research Area. Items: General; Specific Area. Explanation: the areas or objects of the IoT implementation discussed, whether general, industrial, transportation or other.

TABLE 2. REFERENCES BY FOCUS AREA
Physical Sensor Deployment: [29][76][45]
Physical Layer: [34][6][27][99]
Link Layer: [50][23][67][3][15][7][23]
Network: [69][77][101][102][37][42][4][51][65][71][85][87][95][90][86][89][97][63][43][8][84]
Application: [5][12][39][44][45][11][51][53][58][60][61][59][63][64][70][74][80][96][104][17][30][88][32][26][1][11][38][82][24][91][33]
Middleware: [48]
Cloud: [21][39][62][98][103][61][20][28][94][31][35][58][68][75][81][93][73][100][9][53][13][18][19][44][49][72][78][79][83][52][25]

TABLE 3. REFERENCES BY RESEARCH TYPE
Validated Research: [16][4][46][81][104][9][30][52]
Evaluated Research: [14][21][37][42][10][39][47][51][54][57][65][66][68][67][76][77][70][74][80][87][95][101][17][3][15][86][26][13][23][44][46][63][72][79][94][84]
Solution Proposal: [5][91][33][48][12][32][35][36][40][50][22][29][58][59][60][62][69][64][71][75][85][93][96][98][102][103][100][61][6][88][32][20][28][7][97][1][11][38][19][27][49][78][82][83][99][43][25][8][24]
Philosophical Papers: -
Opinion Papers: [2][41][34][73]
Experience Papers: [56][55][92][53][89][18]

TABLE 4. REFERENCES BY CONTRIBUTION TYPE
Tool: [21][31][35][37][40][22][29][46][56][60][69][85][6][13][38][27][83][99][25][24]
Method: [12][36][59][64][98][61][88][32][53][20][7][1][11][43][8]
Process: [2][71][87][101][102][103][104][97][48]
Model: [41][50][34][4][10][47][51][57][58][55][62][65][68][67][75][81][93][95][96][73][90][17][100][3][15][86][28][89][18][19][44][49][63][72][78][79][82][92][52][84][33]
Metric: [14][16][42][39][54][66][76][77][70][74][80][9][30][26][23][45][5][91]

TABLE 5. REFERENCES BY QUALITY FACTOR
Functional Suitability: [12][36][37][40][41][42][46][47][51][55][66][71][76][70][74][85][101][102][103][3][88][32][89][11][38][18][93][40][24][48]
Performance Efficiency: [22][4][56][57][59][64][87][98][17][86][26][23][94][33]
Compatibility: [73][61][5]
Usability: -
Reliability: [16][34][39][90][30][53][28][19][45][49][63][8][84][91]
Security: [2][14][21][31][34][50][29][10][54][58][60][62][65][68][69][67][75][77][80][81][93][95][96][104][100][9][15][6][20][7][97][1][13][27][45][72][78][79][82][83][52][25]
Maintainability: -
Portability: -

TABLE 6. REFERENCES BY IMPLEMENTATION AREA
General: [3][12][14][16][21][31][35][36][37][40][41][42][50][22][34][4][29][10][39][46][47][51][54][56][57][58][55][59][60][62][65][66][68][69][64][67][71][75][76][77][70][74][80][81][85][87][93][95][96][98][101][102][103][104][73][90][17][100][9][3][61][15][30][6][88][32][86][84][24][5][91][33][48]
Health: [20][28][89]
Transportation: [18][19][27][44][45][49][63][72][76][79][82][83][94][99][43][52]
Environment: [53][26][7][25][8]
Smart Home: [97][1][13][11][23][38]
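The classification in Tables 2–6 is tallied to produce the pie-chart proportions of Fig. 3. As a sketch of that tally, the snippet below counts a small hand-picked subset of the Table 6 mapping; it is illustrative only and does not reproduce the study's full dataset of 103 journals:

```python
from collections import Counter

# A few (reference, implementation area) pairs taken from Table 6;
# the full study classifies 103 journals across five areas.
area_of_ref = {
    "[18]": "Transportation", "[19]": "Transportation", "[20]": "Health",
    "[28]": "Health", "[53]": "Environment", "[97]": "Smart Home",
    "[3]": "General", "[12]": "General", "[14]": "General",
}

counts = Counter(area_of_ref.values())          # journals per area
total = sum(counts.values())
shares = {a: round(100 * n / total, 1) for a, n in counts.items()}  # pie-chart %
print(counts.most_common(1), shares["Transportation"])  # prints: [('General', 3)] 22.2
```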


In terms of the focus area (Table 2 and Fig. 3a.) and the research area (Table 3 and Fig. 3b.), research is concentrated in the Cloud and Application sections and is least present on the sensor side. From these results it can be seen that the attention of researchers is mostly on the upper layer compared to the lower layer. Many studies are solution proposals and evaluations, showing that research on QoS in IoT either evaluates existing research or answers problems that arise. In terms of contributions (Table 4 and Fig. 3c.), the research is spread evenly, mostly in models, because most research proposes an alternative QoS to existing research; other forms, namely QoS measurement tools and software or algorithms that can improve QoS, also receive much research.

In terms of quality parameters (Table 5 and Fig. 3d.), research focuses on security: how data processed in IoT is kept safe from outside attacks, data confidentiality, data theft and other security disturbances. Another big focus is the functional side of IoT, where completeness and the correctness of the data sent are important concerns. Performance, including delay, receives some attention but not much. The QoS implementation area objects mostly do not indicate a specific place: most of the research is general in nature, meaning that it can be used in various places without specifically paying attention to the characteristics of the place of implementation (Table 6 and Fig. 3e.). There are only a few studies that specifically determine objects, one of which is the field of transportation.

The whole distribution of the research is given in Fig. 3.

Fig. 3. Distribution of the Research References

B. Discussion and Summary
Relevance of QoS to the layer. From the extraction results it can be seen that the layers most widely discussed are Cloud and Application (the upper and middle layers of the IoT structure). The Cloud layer mostly uses Security parameters in its discussion, while other parameters appear less often, whereas the Application layer uses QoS parameters for functional suitability. Middleware has not received much research.

Relevance of contributions to QoS. The contributions of the research are mostly models, tools, metrics and methods. Most of the models use QoS parameters on the security side, followed by the functional suitability side. Metric contributions, on the other hand, mostly use QoS on the functional side, whereas methods and tools are almost evenly split between QoS in security and functional suitability.

Relevance of QoS to the research type. The dominating research types are solution proposals and evaluated research. Across these types, most discuss QoS on the security side; functional suitability is spread across all types of research.

Relevance of the research object to the other aspects of the research-type distribution. Of the journals discussed, most do not focus on the field or object of IoT implementation. There are 16 journals that choose the transportation object as an IoT implementation. Most of these journals use QoS parameters on the security side, the type of research carried out is almost entirely solution proposals, and the layers discussed are mostly on the Cloud side.

IV. CONCLUSION
The conclusions of the literature review are the following points:
- The main problem in implementing IoT in a transportation system lies in the variance of equipment and environment, and there is very little research in transportation;
- Research in the general area does not show the characteristics of problems in the transportation environment. Transportation needs answers on how to overcome data overload, network connectivity, congestion conditions when traffic jams occur, and critical conditions that require data within a certain time limit. The research is more focused on the security side and on the top layer (cloud layer); and
- The main problem in the IoT system is blocked data flow from the sensor to the server. Data transfer is interrupted when problems occur at the bottom and middle layers; data is lost due to a lost connection or data overload. The middleware layer has the potential to improve this.

ACKNOWLEDGEMENT
This research is a decentralization grant funded by the Indonesian Ministry of Education, Culture, Research and Technology cq DRPM and LLDIKTI3, Contract Number 234/E4.1/AK.04.PT/2021, under the Higher Education Leading Applied Research (PTUPT) 2021 scheme. We thank the Rector of Esa Unggul University for motivation and support, and all other parties who cannot be named individually for their cooperation and assistance.

REFERENCES
[1] Adiono, T., Tandiawan, B., & Fuada, S. (2018). Device protocol design for security on internet of things based smart home. International Journal of Online Engineering, 14(7), 161–170. https://doi.org/10.3991/ijoe.v14i07.7306
[2] Ahmad, M., & Salah, K. (2018). IoT security: Review, blockchain solutions, and open challenges. Future Generation Computer Systems, 82, 395–411. https://doi.org/10.1016/j.future.2017.11.022
[3] Ali, B. A. (2018). Trust Based Scheme for IoT Enabled Wireless Sensor. 1061–1080.
[4] Amersho, B. L., Rodero, I., Parashar, M., Menaud, J.-M., Orgerie, A.-C., & Li, Y. (2017). End-to-end energy models for Edge Cloud-based IoT platforms: Application to data stream analysis in IoT. Future Generation Computer Systems, 87, 667–678. https://doi.org/10.1016/j.future.2017.12.048
[5] Larmo, A., Del Carpio, F., Arvidson, P., & Chirikov, R. (2018). Comparison of CoAP and MQTT Performance Over Capillary Radios. IEEE Xplore Digital Library.


[6] Bae, W. il, & Kwak, J. (2017). Smart card-based secure authentication protocol in multi-server IoT environment. Multimedia Tools and Applications, pp. 1–19. https://doi.org/10.1007/s11042-017-5548-2
[7] Cao, X. H., Du, X., & Ratazzi, E. P. (2018). A Light-Weight Authentication Scheme for Air Force Internet of Things. (88).
[8] Chen, Y., Chanet, J. P., Hou, K. M., Shi, H., & de Sousa, G. (2015). A scalable context-aware objective function (SCAOF) of routing protocol for agricultural low-power and lossy networks (RPAL). Sensors (Switzerland), 15(8), 19507–19540. https://doi.org/10.3390/s150819507
[9] Chen, Y. J., Wang, L. C., & Wang, S. (2018). Stochastic Blockchain for IoT Data Integrity. IEEE Transactions on Network Science and Engineering, PP(c), 1. https://doi.org/10.1109/TNSE.2018.2887236
[10] Cheng, P.-C., Garay, J. A., Herzberg, A., & Krawczyk, H. (2017). A security architecture for the Internet Protocol. IBM Systems Journal, 37(1), 42–60. https://doi.org/10.1147/sj.371.0042
[11] Cheon, H., Jisu, H., Jin, P., & Shon, G. (2016). Design and Implementation of a Reliable Message Transmission System Based on MQTT Protocol in IoT. 1765–1777. https://doi.org/10.1007/s11277-016-3398-2
[12] Chun, S. M., & Park, J. T. (2017). A mechanism for reliable mobility management for internet of things using CoAP. Sensors (Switzerland), 17(1). https://doi.org/10.3390/s17010136
[13] Dang, T. L. N., & Nguyen, M. S. (2018). An Approach to Data Privacy in Smart Home using Blockchain Technology. 2018 International Conference on Advanced Computing and Applications (ACOMP), 58–64. https://doi.org/10.1109/acomp.2018.00017
[14] Das, A. K., Zeadally, S., & He, D. (2018). Taxonomy and analysis of security protocols for Internet of Things. Future Generation Computer Systems, 89, 110–125. https://doi.org/10.1016/j.future.2018.06.027
[15] Deep, S., Zheng, X., & Hamey, L. (2018). A survey of security and privacy issues in the Internet of Things from the layered context. (April), 1–23.
[16] Deif, D., & Gadallah, Y. (2017). A comprehensive wireless sensor network reliability metric for critical Internet of Things applications. Eurasip Journal on Wireless Communications and Networking, 2017(1). https://doi.org/10.1186/s13638-017-0930-3
[17] Demir, A. K., & Abut, F. (2018). Comparison of CoAP and CoCoA Congestion Control Mechanisms in Grid Network Topologies. Gümüşhane Üniversitesi Fen Bilimleri Enstitüsü Dergisi, 0, 53–60. https://doi.org/10.17714/gumusfenbil.436056
[18] Dhall, R., & Solanki, V. K. (2017). An IoT Based Predictive Connected Car Maintenance Approach. International Journal of Interactive Multimedia and Artificial Intelligence, 4(3), 16. https://doi.org/10.9781/ijimai.2017.433
[19] Dhar, P., & Gupta, P. (2017). Intelligent parking Cloud services based on IoT using MQTT protocol. International Conference on Automatic Control and Dynamic Optimization Techniques, ICACDOT 2016, 5013(5), 30–34. https://doi.org/10.1109/ICACDOT.2016.7877546
[20] Dhas, Y. J., & Jeyanthi, P. (2019). A Review on Internet of Things Protocol and Service Oriented Middleware. 2019 International Conference on Communication and Signal Processing (ICCSP), 0104–0108. https://doi.org/10.1109/iccsp.2019.8698088
[21] Di Martino, B., Rak, M., Ficco, M., Esposito, A., Maisto, S. A., & Nacchia, S. (2018). Internet of things reference architectures, security and interoperability: A survey. Internet of Things, 1–2, 99–112. https://doi.org/10.1016/j.iot.2018.08.008
[22] Díaz, V., Martínez, J.-F., Martínez, N., & del Toro, R. (2015). Self-Adaptive Strategy Based on Fuzzy Control Systems for Improving Performance in Wireless Sensors Networks. Sensors, 15(9), 24125–24142. https://doi.org/10.3390/s150924125
[23] Dinh, N., & Lim, S. (2016). Performance Evaluations for IEEE 802.15.4-based IoT Smart Home Solutions. 274–283.
[24] Kandris, D., & Tselikis, G. (2017). COALA: A Protocol for the Avoidance and Alleviation of Congestion in Wireless Sensor Networks. Sensors, 17, 2502. https://doi.org/10.3390/s17112502
[25] Dogo, E. M., Salami, A. F., Nwulu, N. I., & Aigbavboa, C. O. (2019). Blockchain and Internet of Things-Based Technologies for Intelligent Water Management System. Transactions on Computational Science and Computational Intelligence, 129–150. https://doi.org/10.1007/978-3-030-04110-6_7
[26] Performance Evaluation of CoAP and MQTT-SN in an IoT Environment. (2019). 1–12. https://doi.org/10.3390/proceedings2019031049
[27] Fan, K., Zhang, C., Yang, K., Li, H., & Yang, Y. (2018). Lightweight NFC Protocol for Privacy Protection in Mobile IoT. Applied Sciences, 8(12), 2506. https://doi.org/10.3390/app8122506
[28] Fernández-Caramés, T., & Fraga-Lamas, P. (2018). Design of a Fog Computing, Blockchain and IoT-Based Continuous Glucose Monitoring System for Crowdsourcing mHealth. Proceedings, 4(1), 37. https://doi.org/10.3390/ecsa-5-05757
[29] Fu, K., & Xu, W. (2018). Inside Risks of Trusting the Physics of Sensors. Communications of the ACM, (2). https://doi.org/10.1145/3176402
[30] Guan, W. (2018). Reliability Analysis of the Internet of Things Based on Ordered Binary Decision Diagram. International Journal of Online Engineering, 14(8), 20–35.
[31] Hammi, M. T., Hammi, B., Bellot, P., & Serhrouchni, A. (2018). Bubbles of Trust: A decentralized blockchain-based authentication system for IoT. Computers and Security, 78, 126–142. https://doi.org/10.1016/j.cose.2018.06.004
[32] Hamrioui, S., & Lorenz, P. (2017). Efficient medium access protocol for Internet of things applications. International Journal of Communication Systems, 30(10), 1–10. https://doi.org/10.1002/dac.3227
[33] Alahari, H. P. (2017). A Survey on Network Routing Protocols in Internet of Things (IOT). International Journal of Computer Applications (0975 – 8887).
[34] Hortelano, D., Olivares, T., Ruiz, M. C., Garrido-Hidalgo, C., & López, V. (2017). From sensor networks to internet of things. Bluetooth low energy, a standard for this evolution. Sensors (Switzerland), 17(2), 1–31. https://doi.org/10.3390/s17020372
[35] Huang, J., Kong, L., Chen, G., Wu, M.-Y., Liu, X., & Zeng, P. (2019). Towards Secure Industrial IoT: Blockchain System with Credit-Based Consensus Mechanism. IEEE Transactions on Industrial Informatics, PP(c), 1–1. https://doi.org/10.1109/TII.2019.2903342
[36] Huang, M., Liu, A., Wang, T., & Huang, C. (2018). Green Data Gathering under Delay Differentiated Services Constraint for Internet of Things. Wireless Communications and Mobile Computing, 2018. https://doi.org/10.1155/2018/9715428
[37] Huang, Y., Huang, J., Cheng, B., He, S., & Chen, J. (2017). Time-aware service ranking prediction in the internet of things environment. Sensors (Switzerland), 17(5). https://doi.org/10.3390/s17050974
[38] Hwang, H. C., Park, J. S., Lee, B. R., & Shon, J. G. (2017). An enhanced reliable message transmission system based on MQTT protocol in IoT environment. Lecture Notes in Electrical Engineering, 421, 982–987. https://doi.org/10.1007/978-981-10-3023-9_153
[39] Jaloudi, S. (2019). Communication protocols of an industrial internet of things environment: A comparative study. Future Internet, 11(3). https://doi.org/10.3390/fi11030066
[40] Jung, S. Y., Lee, S. H., & Kim, J. H. (2019). Reliability Control Framework for Random Access of Massive IoT Devices. IEEE Access, 7, 49928–49937. https://doi.org/10.1109/ACCESS.2019.2911089
[41] Kamboj, P., & Pal, S. (2019). QoS in software defined IoT network using blockchain based smart contract. 430–431. https://doi.org/10.1145/3356250.3361954
[42] Khalifa, A., & Stanica, R. (2019). Performance Evaluation of Channel Access Methods for Dedicated IoT Networks. HAL Id: hal-02042688.
[43] Khanafer, M., Al-Anbagi, I., & Mouftah, H. T. (2017). An optimized cluster-based WSN design for latency-critical applications. 2017 13th International Wireless Communications and Mobile Computing Conference, IWCMC 2017, 2017, 969–973. https://doi.org/10.1109/IWCMC.2017.7986417
[44] Knieps, G. (2018). Internet of Things, big data and the economics of networked vehicles. Telecommunications Policy, (September), 1–11. https://doi.org/10.1016/j.telpol.2018.09.002
[45] Koteswara Rao, N., & Swain, G. (2018). A systematic study of security challenges and infrastructures for Internet of Things. International Journal of Engineering and Technology (UAE), 7(4.36 Special Issue 36), 700–706. https://doi.org/10.14419/ijet.v7i2.29.14001
[46] Kumar, P., & Dezfouli, B. (2019). Implementation and analysis of QUIC for MQTT. Computer Networks, 150, 28–45. https://doi.org/10.1016/j.comnet.2018.12.012
[47] Larmo, A., Ratilainen, A., & Saarinen, J. (2019). Impact of CoAP and MQTT on NB-IoT system performance. Sensors (Switzerland), 19(1). https://doi.org/10.3390/s19010007
[48] Dürkop, L., Czybik, B., & Jasperneite, J. (2019). Performance Evaluation of M2M Protocols Over Cellular Networks in a Lab Environment.
[49] Lei, A., Cruickshank, H., Cao, Y., Asuquo, P., Ogah, C. P. A., & Sun, Z. (2017). Blockchain-Based Dynamic Key Management for Heterogeneous Intelligent Transportation Systems. IEEE Internet of


Things Journal, 4(6), 1832–1843. https://doi.org/10.1109/JIOT.2017.2740569
[50] Li, Y., Chen, S., Ye, W., & Lin, F. (2018). A Joint Low-Power Cell Search and Frequency Tracking Scheme in NB-IoT Systems for Green Internet of Things. Sensors, 18, 1–22. https://doi.org/10.3390/s18103274
[51] Limbasiya, T., & Das, D. (2019). IoVCom: Reliable Comprehensive Communication System for Internet of Vehicles. IEEE Transactions on Dependable and Secure Computing, PP(X), 1. https://doi.org/10.1109/TDSC.2019.2963191
[52] Limbasiya, T., Das, D., & Sahay, S. K. (2019). Secure Communication Protocol for Smart Transportation Based on Vehicular Cloud. 1–10. Retrieved from http://arxiv.org/abs/1912.12884
[53] Liu, C., Xiao, Y., Javangula, V., Hu, Q., Wang, S., & Cheng, X. (2018). NormaChain: A Blockchain-based Normalized Autonomous Transaction Settlement System for IoT-based E-commerce. IEEE Internet of Things Journal, 4662(c), 1–14. https://doi.org/10.1109/JIOT.2018.2877634
[54] Lu, Y., & Xu, L. Da. (2018). Internet of Things (IoT) Cybersecurity Research: A Review of Current Research Topics. IEEE Internet of Things Journal, 4662(c). https://doi.org/10.1109/JIOT.2018.2869847
[55] Luzuriaga, J. E., Perez, M., Boronat, P., Cano, J. C., Calafate, C., & Manzoni, P. (2016). Improving MQTT Data Delivery in Mobile Scenarios: Results from a Realistic Testbed. Mobile Information Systems, 2016. https://doi.org/10.1155/2016/4015625
[56] Mardini, W., Yassein, M. B., AlRashdan, M., Alsmadi, A., & Amer, A. B. (2018). Application-based power saving approach for IoT CoAP protocol. ACM International Conference Proceeding Series. https://doi.org/10.1145/3279996.3280008
[57] Masek, P., Hosek, J., Zeman, K., Stusek, M., Kovac, D., Cika, P., … Kröpfl, F. (2019). Implementation of True IoT Vision: Survey on Enabling Protocols and Hands-On Experience. 2016. https://doi.org/10.1155/2016/8160282
[58] Minoli, D., & Occhiogrosso, B. (2018). Blockchain mechanisms for IoT security. Internet of Things, 1–2, 1–13. https://doi.org/10.1016/j.iot.2018.05.002
[59] Mohanty, S., Sharma, S., Vishal, V., & Tech, B. (2016). MQTT – Messaging Queue Telemetry Transport IoT based Messaging Protocol. 1369–1376.
[60] Moustafa, N., Turnbull, B., Raymond, K., & Member, S. (2018). An Ensemble Intrusion Detection Technique based on proposed Statistical Flow Features for Protecting Network Traffic of Internet of Things. IEEE Internet of Things Journal, PP(c), 1. https://doi.org/10.1109/JIOT.2018.2871719
[61] Moysiadis, V., Sarigiannidis, P., & Moscholios, I. (2018). Towards Distributed Data Management in Fog Computing. 2018(i).
[62] Mukherjee, B., Wang, S., Lu, W., Neupane, R. L., Dunn, D., Ren, Y., … Calyam, P. (2018). Flexible IoT security middleware for end-to-end cloud–fog communication. Future Generation Computer Systems, 87, 688–703. https://doi.org/10.1016/j.future.2017.12.031
[63] Ni, Y., Cai, L., He, J., Vinel, A., Li, Y., Mosavat, H., & Pan, J. (2019). Toward Reliable and Scalable Internet-of-Vehicles: Performance Analysis and Resource Management. 1–17. https://doi.org/10.1109/JPROC.2019.2950349
[64] Oh, H., & Lim, S. (2016). Light-weight routing protocol in IoT-based inter-device telecommunication wireless environment. International Journal of Electrical and Computer Engineering, 6(5), 2352–2361. https://doi.org/10.11591/ijece.v6i5.10504
[65] Perwej, Y., Parwej, F., Mohamed Hassan, M. M., & Akhtar, N. (2019). The Internet-of-Things (IoT) Security: A Technological Perspective and Review. International Journal of Scientific Research in Computer Science, Engineering and Information Technology, (February), 462–482. https://doi.org/10.32628/CSEIT195193
[66] Profanter, S., Tekat, A., Dorofeev, K., Rickert, M., & Knoll, A. (2019). OPC UA versus ROS, DDS, and MQTT: Performance evaluation of industry 4.0 protocols. Proceedings of the IEEE International Conference on Industrial Technology, 2019-February, 955–962. https://doi.org/10.1109/ICIT.2019.8755050
[69] Rana, S. M. S., Halim, M. A., & Kabir, M. H. (2018). Design and implementation of a security improvement framework of Zigbee network for intelligent monitoring in IoT Platform. Applied Sciences (Switzerland), 8(11). https://doi.org/10.3390/app8112305
[70] Ratnasih, R., Perdana, D., & Bisono, Y. G. (2018). Performance Analysis and Automatic Prototype Aquaponic of System Design Based on Internet of Things (IoT) using MQTT Protocol. Jurnal Infotel, 10(3), 130. https://doi.org/10.20895/infotel.v10i3.388
[71] Sahidi, T. T., Basuki, A., & Tolle, H. (2018). MIOT framework, general purpose internet of things gateway using smartphone. International Journal of Online Engineering, 14(2), 6–23. https://doi.org/10.3991/ijoe.v14i02.7326
[72] Samad, A., & Shuaib, M. (2018). Internet of Vehicles (IoV) Requirements, Attacks and Countermeasures. (March).
[73] Samaniego, M., Espana, C., & Deters, R. (2018). Smart virtualization for IoT. Proceedings - 3rd IEEE International Conference on Smart Cloud, SmartCloud 2018, 125–128. https://doi.org/10.1109/SmartCloud.2018.00028
[74] Sasaki, Y., & Yokotani, T. (2019). Performance Evaluation of MQTT as a Communication Protocol for IoT and Prototyping. Advances in Technology Innovation, 4(1), 21–29. Retrieved from https://doaj.org/article/e87340c16ad74f56bd56043dd7d6de76
[75] Shafi, Q., & Basit, A. (2019). DDoS Botnet Prevention using Blockchain in Software Defined Internet of Things. 2019 16th International Bhurban Conference on Applied Sciences and Technology (IBCAST), 624–628. https://doi.org/10.1109/IBCAST.2019.8667147
[76] Shambrook, P., Lander, P. J., & Maclaren, O. (2018). A Study into the Reliability of the Data Flow from GPS Enabled Portable Fitness Devices to the Internet. International Journal of Exercise Science, (15).
[77] Sial, M. F. K. (2019). Security Issues in Internet of Things: A Comprehensive Review. American Scientific Research Journal for Engineering, Technology, and Sciences (ASRJETS), 53(1), 207–214.
[78] Singh, M., & Kim, S. (2017). Blockchain Based Intelligent Vehicle Data sharing Framework. CoRR. Retrieved from http://arxiv.org/abs/1708.09721
[79] Singh, M., & Kim, S. (2018). Branch based blockchain technology in intelligent vehicle. Computer Networks, 145, 219–231. https://doi.org/10.1016/j.comnet.2018.08.016
[80] Singh, V. K., & Sharan, H. O. (2019). Security Analysis and Improvements to IoT Communication Protocols - CoAP. (July), 3041–3045.
[81] Sopilnyk, L., Shevchuk, A., Kopytko, V., Sopilnyk, R., & Yankovska, L. (2018). Cryptocurrency and Internet of Things: Problems of Implementation and Realization. Path of Science, 4(9), 2001–2006. https://doi.org/10.22178/pos.38-1
[82] Technologies, D. (2019). Application of the Reference Model for Security Risk Management in the Internet of Things Systems. 0. https://doi.org/10.3233/978-1-61499-941-6-65
[83] Teng, Y., Leung, V., Song, M., Yu, R., & Liu, M. (2019). Performance Optimization for Blockchain-Enabled Industrial Internet of Things (IIoT) Systems: A Deep Reinforcement Learning Approach. IEEE Transactions on Industrial Informatics, PP(c), 1–1. https://doi.org/10.1109/tii.2019.2897805
[84] Triantafyllou, A., Sarigiannidis, P., & Lagkas, T. D. (2018). Network Protocols, Schemes, and Mechanisms for Internet of Things (IoT): Features, Open Challenges, and Trends. 2018.
[85] Vangelista, L., & Centenaro, M. (2018). Performance Evaluation of HARQ Schemes for the Internet of Things. https://doi.org/10.3390/computers7040048
[86] Vázquez-Gallego, F., Tuset-Peiró, P., Alonso, L., & Alonso-Zarate, J. (2020). Delay and Energy Consumption Analysis of Frame Slotted ALOHA variants for Massive Data Collection in Internet-of-Things Scenarios. Applied Sciences, 1–16. https://doi.org/10.3390/app10010327
[87] Velinov, A., Mileva, A., & Stojanov, D. (2019). Power Consumption Analysis of the New Covert Channels in CoAP. 12(1), 42–52.
[88] Verma, L. P., & Kumar, M. (2019). An IoT Based Congestion Control Algorithm. Internet of Things, 100157.
[67] Pundir, S., Wazid, M., Singh, D. P., Das, A. K., Rodrigues, J. J. P. C., & https://doi.org/10.1016/j.iot.2019.100157
Park, Y. (2019). Intrusion Detection Protocols in Wireless Sensor [89] Wang, X., Chen, F., Ye, H., Yang, J., Zhu, J., Zhang, Z., & Huang, Y.
Networks integrated to Internet of Things Deployment : Survey and (2017). Data Transmission and Access Protection of Community
Future Challenges. IEEE Access, PP, 1. Medical Internet of Things. Journal of Sensors, 2017.
https://doi.org/10.1109/ACCESS.2019.2962829 https://doi.org/10.1155/2017/7862842
[68] Qian, Y., Jiang, Y., Chen, J., Zhang, Y., Song, J., Zhou, M., & Pustišek, [90] Weyns, D., Iftikhar, M. U., Hughes, D., & Matthys, N. (2018). Applying
M. (2018). Towards decentralized IoT security enhancement: A Architecture-Based Adaptation to Automate the Management of
blockchain approach. Computers and Electrical Engineering, 72, 266– Internet-of-Things. Ecsa, 2, 49–67. https://doi.org/10.1007/978-3-030-
273. https://doi.org/10.1016/j.compeleceng.2018.08.021 00761-4

449 28 October 2021, Jakarta - Indonesia


2021 1st International Conference on Computer Science and Artificial Intelligence (ICCSAI)



Auto-Tracking Camera System for Remote Learning Using Face Detection and Hand Gesture Recognition Based on Convolutional Neural Network
Daniel Imanuel Sutanto, Verine Maria, Vanessa Salim
Computer Science Department, School of Computer Science
Bina Nusantara University, Jakarta, Indonesia
[email protected], [email protected], [email protected]

Hanry Ham
Computer Science Department, School of Computer Science
Bina Nusantara University, Jakarta, Indonesia
[email protected]

Abstract— The purpose of this study is to develop an auto-tracking camera for remote learning with a direct-recording classroom video type, with a focus on reducing the model's size so that it can run on devices with low computational capability, such as a Central Processing Unit (CPU). This study uses the FaceBoxes face detector so that auto-tracking can be applied, together with the Unified Gesture and Fingertip Detection hand gesture recognizer, which allows specific commands to be given to the camera. The idea behind this multi-modal setup is to help lecturers achieve a seamless transition during the lecturing process, so that they do not need to worry that their movement could leave a static camera pointing in the wrong direction. This study conducts experiments on batch size, CNN architecture, and optimizer, as well as the IoT protocols. While model inference runs on the CPU, a Raspberry Pi controls the servo motor that keeps the face in the frame according to the defined protocols. The expected result is an auto-tracking camera that makes learning more effective, together with a study of model accuracy and inference time for face detection and hand gesture recognition in real time on CPU devices.

Index Terms— Face Detection, Hand Gesture Recognition, Real-time, Auto-Tracking Camera, Remote Learning.

I. INTRODUCTION

Technology is playing a more important role in education now than ever before. People with busy schedules outside school, such as work, people who live in different countries, or even those who are sick are helped by technology to receive an education without actually being present in class [1]. Due to the COVID-19 pandemic, which has resulted in school closures all across the world, online learning has become a standard at every level of education.

Online learning is delivered in many ways, such as video, quizzes, discussion boards, hypertext pages, and others [2]. Video is one of the most popular formats, in both synchronous and asynchronous learning [2]. Many institutions, including Binus University, use the live direct-recording classroom type of video for online learning. In general, the camera used for a live direct-recording classroom is a static camera, which has many disadvantages: the teacher must sit or stand in a limited area, students lose attention, and words on the (digital or white-) board are hard to read [3]. Wang, Quek, and Hu experimented with blended synchronous online classes [1] and found problems with the effectiveness of the video capture, caused by factors such as the camera being too far away and the difficulty of focusing on the presentation object with a static camera. A survey of lecturers and students at Binus University also found that 63% of lecturers and 77% of students experienced video technical issues, for example parts of the area not being captured.

Tan, Kuo, and Liu [4] built a system for automatic video recording named Intelligent Lecturer Tracking and Capturing (ITLC) in 2019. ITLC combined face detection with an infrared (IR) thermal sensor and capture modules using a servo motor and a microcontroller. This method is limited by the sensor, which sometimes fails to detect the object or teacher. There are also numerous commercial auto-tracking cameras for teaching, and for events such as music performances or church services, such as the PTZ professional camera by Aver, but they are too expensive.

Ever since the breakthrough of CNNs for computer vision problems in 2012 [5], which has achieved remarkable success in recent years ranging from image classification to object detection, there have been numerous CNN-based face detectors such as S3FD [6], PyramidBox [7], and DFSD [8], which became the state of the art for face detection. The same holds for hand gesture recognition, a problem that has been around for years; as CNNs became popular, many researchers applied them to it. Hand gesture recognition has been used in Human-Computer Interaction and plays a crucial role as an interaction medium and as a replacement for mouse and keyboard [9].

Even though these CNN-based face detection and hand gesture recognition methods are robust to large variations in facial appearance and demonstrate state-of-the-art performance, they are too time-consuming to
978-1-6654-4002-8/21/$31.00 ©2021 IEEE


achieve real-time speed, especially on CPU devices. Much research on face detection therefore focuses on decreasing model size so that the detector can run in real time on low-computation devices such as CPUs; the Multi-task Cascaded Convolutional Network (MTCNN) [10] and FaceBoxes [11] are examples of real-time state-of-the-art face detectors. There have also been many hand gesture recognition approaches in the last couple of years, one of which is fingertip-based detection, which uses fingertips to classify the gesture as well as locate the fingertip positions. The direct regression approach [12], YOLSE [13], and Unified Gesture and Fingertip Detection (UDF) [9] are the state of the art for fingertip-based hand gesture recognition; UDF has better accuracy and inference time than the other two.

The authors propose an auto-tracking camera system using face detection and hand gesture recognition for online learning. Face detection is used to detect the lecturer and follow them across the class. The system aims to help lecturers and students have a better online learning experience without needing to be involved in camera movement. In this study, face detection is combined with hand gesture recognition to realize real-time and stable lecturer localization. A microcontroller, providing the communication and control mechanism, and a low-power servo motor are then used to pan the camera.

In addition to optimizing the neural networks, the authors investigated model configurations that could improve the result, and found that configurations such as the CNN architecture, batch size, and optimizer are crucial parameters for performance. The authors therefore conducted experiments with the MobileNetV3-Large [16] and MobileNetV3-Small [16] CNN backbones for hand gesture recognition, batch sizes of 32, 64, and 128 for both face detection and hand gesture recognition, and the Adaptive Moment Estimation (ADAM) [22] and SGD with Momentum [23] optimizers.

In summary, the main contributions of this study are: (1) proposing an intelligent lecturer tracking system that is low-cost, real-time, stable, self-adjusting, and contactless; (2) preventing face detection failure by implementing hand gesture recognition for manual commands; (3) optimizing the network model for accuracy and real-time performance on a CPU.

II. PREVIOUS WORK

A. Face Detection

Recent years have witnessed the advance of CNN-based face detectors. Kaipeng, Zhanpeng, and Zhifeng introduced the Multi-task Cascaded Convolutional Network (MTCNN) [10], which combined face detection and face alignment in one cascaded CNN framework with a lightweight CNN architecture designed for real-time performance. The MTCNN algorithm is divided into three parts: the Proposal Network (P-Net), the Refine Network (R-Net), and the Output Network (O-Net). MTCNN starts by resizing the input to many different scales to build an image pyramid, which then becomes the input of the three-stage cascaded framework. The issue with MTCNN is that, even though it is lightweight, detection slows down as the number of faces in the picture increases: more faces means more detection time [10].

There has been much research on highly accurate CNN face detectors such as S3FD. This method uses VGG16 as a backbone, and its architecture consists of base convolutional layers, extra convolutional layers, detection convolutional layers, a normalization layer, prediction convolutional layers, and a multi-task loss layer [7]. DFSD also uses VGG16, cuts the classification layer, and adds additional structure: a Feature Enhance module that forwards the original features to make them discriminable and robust, and an improved anchor matching strategy for better initialization. These methods have high accuracy but sacrifice computational cost because of the complexity of their layers, so they clearly cannot run on CPUs or other devices with low computational power.

Zhang et al. [11] proposed a state-of-the-art face detection model that runs on CPU devices. FaceBoxes combines the RPN of Faster R-CNN with the multi-scale mechanism of SSD. FaceBoxes uses a single fully convolutional neural network consisting of Rapidly Digested Convolutional Layers (RDCL), designed to reach real-time speed, and Multiple Scale Convolutional Layers (MSCL), designed to detect faces at varying scales. The RDCL shrinks the input spatial size and reduces the output channels, which lets FaceBoxes achieve real-time speed on CPU devices.

B. Hand Gesture Recognition

Wu et al. [13] introduced YOLSE (You Only Look what you Should See) for fingertip detection and hand gesture recognition. YOLSE uses a heatmap-based fully convolutional network (FCN) with a fingermap approach: a probabilistic representation of each fingertip that contains the fingertip coordinate information. YOLSE starts by detecting the hand area, then estimates the probability of a fingertip at every pixel, represented in five individual channels of the output picture called fingermaps. For real-time performance, YOLSE is designed as a fully convolutional network with fewer and smaller kernels, reducing computation with an input size of 128 x 128.

Mishra et al. [12] proposed a direct regression-based hand gesture and fingertip detection algorithm in 2019. They employed the MobileNetV2 [14] architecture as a backbone model and produced a linear output using global average pooling. From that same linear output, they used three fully connected (FC) layers for gesture classification, finger identification, and estimation of finger position. The algorithm is referred to as the Direct Regression approach because the final positional output of the fingertips is directly regressed from the FC layers.

Unified Gesture and Fingertip Detection (UDF) [9] is a CNN-based algorithm that combines gesture classification and fingertip regression. Using a single CNN, both the probabilistic output for each finger and the positional output of the fingertips are produced in one forward propagation of the network; UDF unifies classification and regression into a single CNN using a low-order binary representation. For gesture recognition and fingertip detection, the relevant portion of the hand is cropped from the input image using a bounding box and resized to 128 × 128; the resized image is used as the input to the proposed network for learning. During detection, the real-time object detection algorithm 'you only


look once’ (YOLO) [15] is used for hand recognition in the webcam, but for hand gesture recognition only done for
first stage. Later, that hand portion can be cropped and every 2 frame.(3) Output from face detection model and
resized to feed to the proposed framework. For feature hand gesture recognition will be processed by CPU to
learning, 16-layers visual geometry group (VGG) determined if the camera should be moved or not. CPU will
configuration given in is employed. This output is utilized to send signal to Raspberry Pi to move the camera pan/tilt to
generate both the probabilistic output and positional output. the right, up, down, or left. CPU and Raspberry Pi need to
be on the same network (using wireless network for this
III. METHOD research). CPU will do SSH to local IP of Raspberry Pi to
A. Auto-Tracking Camera System Design send the command. Library Paramiko is used to implement
the SSH protocol Python. (4) Raspberry Pi get the signal
We proposed auto-tracking camera system by from the CPU and translate it so that the servomotor
implementing real time face detection and hand gesture connected to it will move. Raspberry Pi give control signal
recognition on CPU devices. CPU devices connected to the to GPIO Pin, so that the jumper cable move pan/tilt as the
webcam to get input images from the environment. Webcam signal given. Webcam is attached with the servomotor to
is connected using USB cable to the CPU and attached to a create the movement of the video.
servomotor for pan/tilt movement. Raspberry Pi used to get
signal from CPU that execute the face detection and The output of face detection were ((x1,y1), (x2,y2)).
recognition model, the resulting output is passed on to the From that coordinate, we calculate the middle point of the
servomotor. The environment used is specific for the face by ((x1-x2) / 2, (y1-y2) / 2) times 2. This coordinate
classroom. Fig 1 show the overall environment to will be used as a comparison with the defined threshold. The
implement the system. distance of vertical threshold (above and bottom) are 20%
from above and 20% from bottom, and the distance of
horizontal threshold defined with the same percentage as
well. The region of interest is defined as Fig. 3 below.

Fig. 1 : Online Learning environment illustration

Fig. 3 : Defined Region of Interest (RoI)

When the coordinate of face is exceeding the threshold


then the CPU will send the signal in the form of command
string using SSH. Sent signal to Raspberry Pi is “echo
[servochannel] = [position]” which servochannel is pan or
tilt channel and position is value of pan/tilt degree. If the
face coordinate is exceeding the left threshold, then CPU
will send signal to the Raspberry Pi by substract -1 to the
position on pan channel and makes the servo move by 1
degree to the left, whereas when exceeding the right
threshold, the CPU will send signal by adding +1 value to
the position on pan channel and makes servo move by 1
degree to the right.
Zoom in/out are also added to the system based on
gesture detection. When 1 finger up gesture is detected, the
camera will start to zoom in and when L shape gesture is
detected, the camera will start to zoom out. Gesture detected
Fig. 2 : Proposed Auto-tracking system camera
based on fingertip detection using low order binary
The proposed auto-tracking camera system works as representation (10 / 01), if only index finger detected, it will
follows : (1) webcam connected directly to the CPU devices detect it as 1 finger up and when thumb also detected, it will
using USB cable, and used to give input into the CPU, (2) detect as L shape gesture.
The input picture from the webcam then processed by face
detection model and hand gesture recognition model. The
input feed into the network for every frame took from
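The pan decision and command format described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the channel name "p0" is a hypothetical placeholder, and actually sending the command would use Paramiko's SSHClient (noted in the comments) rather than the print shown here.

```python
# Sketch of the pan decision described above: compute the face midpoint,
# compare it with the 20% region-of-interest margins, and build the
# "echo [servochannel] = [position]" command string for the Raspberry Pi.
# The channel name "p0" is an assumption; in the real system the string
# would be sent with paramiko.SSHClient().exec_command(...) over the LAN.

def face_midpoint(box):
    """box = ((x1, y1), (x2, y2)) from the face detector."""
    (x1, y1), (x2, y2) = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)

def pan_step(box, frame_width, margin=0.2):
    """Return -1 (pan left), +1 (pan right), or 0 (midpoint inside the RoI)."""
    cx, _ = face_midpoint(box)
    if cx < margin * frame_width:
        return -1
    if cx > (1 - margin) * frame_width:
        return 1
    return 0

def pan_command(position, step, channel="p0"):
    """Format the command string sent over SSH to the Raspberry Pi."""
    return f"echo {channel} = {position + step}"

# Example: a 640-pixel-wide frame with the face near the left edge.
box = ((40, 100), (120, 200))   # midpoint x = 80, left of the 20% margin (128)
step = pan_step(box, 640)       # -1: pan one degree to the left
print(pan_command(90, step))    # -> echo p0 = 89
```

The same `pan_step` logic applies unchanged to the tilt channel by swapping the x coordinates and frame width for y and frame height.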

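The gesture-to-zoom mapping can likewise be sketched. The gesture codes follow the 10/01 low-order binary representation described above; the centered crop box, however, is an assumption about how the digital zoom could be realized (the actual system resizes frames with the OpenCV library).

```python
# Sketch of the gesture-driven digital zoom described above.
# Low-order binary representation: index finger only -> one finger up
# (zoom in); index finger plus thumb -> L shape (zoom out).
# The centered-crop arithmetic is illustrative; the real system would
# resize the cropped region back to frame size (e.g. with cv2.resize).

def classify_gesture(thumb_detected, index_detected):
    if index_detected and not thumb_detected:
        return "one_finger_up"   # zoom in
    if index_detected and thumb_detected:
        return "l_shape"         # zoom out
    return "none"

def zoomed_crop(frame_w, frame_h, zoom_px):
    """Centered crop box (x1, y1, x2, y2) after zooming in by zoom_px per side."""
    zoom_px = max(0, min(zoom_px, frame_w // 2 - 1, frame_h // 2 - 1))
    return (zoom_px, zoom_px, frame_w - zoom_px, frame_h - zoom_px)

# Example: one "zoom in" step at the 5-pixel scale used by the system.
zoom = 0
if classify_gesture(thumb_detected=False, index_detected=True) == "one_finger_up":
    zoom += 5
print(zoomed_crop(640, 480, zoom))   # -> (5, 5, 635, 475)
```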

B. Face Detection Model

The face detection method used in this study is FaceBoxes [11].

1) Dataset: The dataset used to train face detection in this research is WIDERFACE [16], the benchmark for almost all state-of-the-art face detectors and the one used in the original work [11]. The dataset consists of 32,203 pictures and 393,703 faces with a high degree of variation in scale, pose, occlusion, expression, appearance, and illumination. The dataset is divided into three subsets: training, validation, and testing, at 40%, 10%, and 50% respectively. For evaluation we use the AFW [17] and PASCAL [18] datasets and the WIDERFACE [16] validation subset; the validation subset is used for evaluation because the authors do not have the annotations of the WIDERFACE testing subset.

2) Training Method: This subsection introduces the data augmentation, matching strategy, loss function, and other implementation details. Data augmentation was applied to each training image: photometric distortions; randomly cropping five square patches from the original image, one being the biggest square patch and the sizes of the others ranging between [0.3, 1] of the short side of the original image, after which one patch is arbitrarily selected for subsequent operations; scale transformation to 1,024 x 1,024 pixels; and finally horizontal flipping with a probability of 0.5. During training we need to determine which anchors correspond to a face bounding box: we first match each face to the anchor with the best Jaccard overlap, and then match anchors to any face with a Jaccard overlap higher than a threshold (i.e., 0.35). We adopt a 2-class softmax loss for classification and the smooth L1 loss for regression.

We experimented with Adam and SGD with momentum. The authors also enabled the torch.backends.cudnn.benchmark flag for computational and time efficiency during training and validation. The learning rate was set to 10⁻³ and decayed at epochs 100 and 150 to 10⁻⁴ and 10⁻⁵ respectively. With these parameters we also set a larger batch size than the previous authors to see whether a better configuration could be found.

3) Testing Method: The authors use one evaluation metric to measure overall model quality on three different datasets. The metric used in this research is Average Precision, as in the original work [11]. The WIDERFACE dataset is also used later to evaluate and compare the experiments. In total we use three datasets: the AFW [17] dataset with 205 pictures, the PASCAL [18] dataset with 851 pictures, and the WIDERFACE [16] dataset with 3,220 pictures. Testing was done on our CPU environment with 2 cores.

C. Hand Gesture Recognition Model

The hand gesture recognition method used in this study is Unified Gesture and Fingertip Detection [9].

1) Dataset: The dataset used to train the model is the SCUT-Ego-Gesture dataset [19], as in the original work. In this study the authors use only 2 gesture classes out of the many gestures in the dataset, because the full dataset was not available to us. Our dataset consists of 10,512 RGB hand pictures at 640x480 resolution. The two gestures are the IShape gesture (called "one finger up" in the previous subsection) and the JShape gesture (the L-shape gesture), which are equivalent to SingleOne and SingleEight in reference [19].

Fig. 4 : All gesture from reference [19]

The SCUT-Ego-Gesture dataset does not come with a predefined split into training, validation, and test sets. The authors divided the dataset into these three parts manually, mimicking the split used by the previous authors and enlarging the training and test portions, since the authors obtained more of the dataset than was used in the previous work.

TABLE I. DISTRIBUTION DETAIL OF SCUT-EGO-GESTURE [19] DATASET

Gesture Class    Training Set    Validation Set    Test Set    Total
SingleOne        3,885           267               1,204       5,356
SingleEight      3,738           257               1,161       5,156

2) Training Method: The authors train hand gesture recognition with deep learning, optimized using the mini-batch gradient descent optimization method. Two different feature-extractor architectures are used: MobileNetV3-Large and MobileNetV3-Small. The experiments also use the same three batch size values as the face detection experiments. The learning rate is set to 10⁻³ and decayed at epochs 100 and 150 to 10⁻⁴ and 10⁻⁵ respectively.

To reduce the risk of overfitting, data augmentation is applied during training by including new training images artificially generated from the existing images of the dataset. The augmentation techniques applied are random rotation, translation, shear transformation, illumination variation, scaling, cropping, additive Gaussian noise, and salt noise, applied to every training image. The authors use the Adaptive Moment Estimation (ADAM) optimizer to train the models, and again enable the torch.backends.cudnn.benchmark flag for computational and time efficiency during training and validation. A binary cross-entropy loss is used to optimize the probabilistic output, and a mean squared error (MSE) loss for the positional output.

3) Testing Method: The performance of hand gesture classification and that of fingertip position estimation are evaluated separately. Classification performance is assessed in terms of four measures, namely accuracy, precision, recall, and F1 score. The higher the accuracy or F1 score, and the closer the precision or recall to unity, the better the performance of the classification algorithm. In all of these evaluation metrics, unless otherwise stated, the confidence


threshold is set to 50%. To evaluate the performance of


estimation of fingertip position, the error in terms of mean
Euclidean distance between ground truth pixel coordinate
and regressed pixel coordinate.
IV. EXPERIMENTS AND RESULTS
A. Implementation Camera Results
The models we used to implement an auto-tracking
camera are FaceBoxes for face detection and Unified
Gesture and Fingertip Detection for Hand gesture
recognition. MobileNetV3 small chosen to be the
architecture for hand gesture as it performs real time on the
CPU environment. Hand detection used is YOLO. Even
though MobileNetV3 small has smaller accuracy on every Fig. 6 : Results how camera were moved after lecturer switch
metric compared to the large version, we still use it as positions
authors argue that the difference on performance metrics
doesn’t really have a big impact on real life use (small gap). Testing found that the hand detection is not really great on
detecting small hand, thus the classification gesture can only
Face is detected for every frame input from webcam. be done if lecturer distance to the camera is around 1 – 2
But, different with hand detection, we only do it every 2 meters. Face detection can detect faces up to 5 meters. The
frame as to increase the overall performance of the system. zoom in and zoom out is done by resizing frame using
Fingertip and classification gesture detection is only fed OpenCV library resulting in digital zoom. Zoom in and
when the hand is detected by YOLO. Fig. 5 shows
zoom out done with scale of 5 pixels. As future research is
illustration of how face and hand is detected.
done we can use hand gesture as many other features
(remain customizable).
Authors do a little experiment which uses Raspberry Pi as a
CPU device for model to detect. But the result is that
Raspberry Pi has poor performance on doing inference for
the model and resulting in 0.46s inference time. This is not
enough to reach real time performance, thus we keep on
using our CPU devices but it’s a great idea for future
research to use a more low computational devices such as
Raspberry Pi to lower the overall cost.

Fig. 5 : Face and Hand gesture detection B. Experiments Variables


Parameters can affect to the deep learning model
progress and performance. Some of the experiments authors
conduct by changing some of the important parameters
based on research that could potentially improve the model
performance. Those experiments variables are:
1) Architecture CNN: MobileNetV3 large and MobileNetV3 small on hand gesture recognition. MobileNet [20] is a lightweight neural network used as a feature extractor. MobileNet uses depthwise separable convolutions, which are a key building block of many efficient neural network architectures. The basic idea is to replace a full convolutional operator with a factorized version that splits the convolution into two separate layers, so the purpose of the backbone is clearly to obtain a more efficient architecture. In this study we use both the large and small variants to see whether we can keep the accuracy of the large model and still run in real time in our CPU environment, or whether we need the small version to reach real-time performance. Note that previous works used VGG16 as the feature extractor, and VGG16 has very complex layers, among other things, which prevent such a model from running in real time on CPU devices.

Fig. 6: The starting position of the lecturer and how the camera moves in the direction of the lecturer
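The parameter saving from factorizing a convolution into depthwise and pointwise steps can be seen with a little arithmetic. The layer sizes below are illustrative, not taken from MobileNetV3, and biases are ignored.

```python
def conv_params(k, c_in, c_out):
    # Standard convolution: c_out filters, each of size k x k x c_in.
    return k * k * c_in * c_out

def separable_params(k, c_in, c_out):
    # Depthwise step: one k x k filter per input channel,
    # followed by a 1 x 1 pointwise convolution that mixes channels.
    return k * k * c_in + c_in * c_out

full = conv_params(3, 112, 112)          # 112,896 weights
factored = separable_params(3, 112, 112) # 13,552 weights, roughly 8x fewer
```

For a 3x3 kernel the factorized form costs roughly 1/9 of the standard convolution once the channel count is large, which is why the backbone can stay accurate while running on a CPU.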
2) Optimizer: Adam and SGD with Momentum on face detection. The optimizer also plays an important role in deep learning model development, as it controls the value of the learning rate. The learning rate is a crucial factor that indicates how much information the model will learn in an iteration: a larger learning rate means the model receives more information per update, and vice versa. Research on optimizers finds that one of the most popular is Adam, which uses an adaptive learning rate for each parameter, so that the parameters contributing more to the loss can have a proper learning rate; Adam also uses momentum so that the gradient vector reaches a more exact direction. Even though many researchers use pure vanilla SGD in their work, in this study the authors want to see whether there is any improvement from using Adam on the FaceBoxes face detector.

455 28 October 2021, Jakarta - Indonesia

2021 1st International Conference on Computer Science and Artificial Intelligence (ICCSAI)

3) Batch Size: 32, 64, and 128 on both models. Batch size has a high impact on training progress and also on model performance, as mini-batch gradient descent uses the loss of a small batch to update the learnable parameters. As described in former research [21][22], there are trade-offs between the large and small batch sizes used in training. Another former study indicates that large-batch training can have better generalization [23]. So, to test how this affects the overall generalization of our models, we try batch sizes of 32, 64, and 128 for each of the face detection and hand gesture recognition models.

C. Experiment Results

The total number of experiments conducted for each of FaceBoxes and Unified Gesture and Fingertip Detection is 6. The best combination of parameters is then implemented in the auto-tracking camera system. This research tries to analyze the impact of CNN architecture, optimizer, and batch size on performance and inference time. All experiments use the same total of 200 epochs, but the best epoch is taken from each experiment for comparison.

1) Evaluation of FaceBoxes Optimizer: Table 2 shows that using Adam increases the overall performance of face detection in most of the experiments. However, the highest generalization performance comes from SGD with Momentum at batch size 64. The authors argue that the impact comes not only from the optimizer but also from the learning rate and the total number of epochs. Although Adam may converge faster, many researchers argue that SGD with momentum can do better over longer training, because Adam gets stuck in a local minimum at some point. Further experiments and research are needed to find an appropriate hyperparameter configuration and see which produces a better model.

TABLE II. OPTIMIZER EXPERIMENT RESULTS ON FACEBOXES, AVERAGE PRECISION (AP)

Batch Size | Optimizer | AFW Dataset (%) | PASCAL Dataset (%) | WiderFace Easy | WiderFace Medium | WiderFace Hard
32 | SGD with Momentum | 98.23 | 96.25 | 0.821285 | 0.729071 | 0.360041
32 | Adam | 98.69 | 96.67 | 0.825365 | 0.750797 | 0.381670
64 | SGD with Momentum | 98.21 | 96.65 | 0.827300 | 0.755244 | 0.386072
64 | Adam | 98.02 | 96.30 | 0.804660 | 0.731389 | 0.371237
128 | SGD with Momentum | 97.92 | 95.54 | 0.795851 | 0.710455 | 0.350537
128 | Adam | 97.61 | 96.83 | 0.816220 | 0.740320 | 0.374838

(WiderFace APs are measured on the validation set.)

2) Evaluation of Architecture CNN on UDF: Table 3 shows that MobileNetV3 small gives worse performance than MobileNetV3 large in all cases. This result is somewhat expected, because MobileNetV3 small is created to be a smaller model than the large one. The experiments show that the small version of MobileNetV3 does not differ much on the classification metrics, but on pixel error it has a fairly big margin. The authors argue that this can be caused by information lost at the fully connected layer, perhaps because the last layer before the fully connected layer does not have many parameters.

TABLE III. ARCHITECTURE EXPERIMENT RESULTS ON UNIFIED GESTURE AND FINGERTIP DETECTION (MEAN METRICS)

Architecture | Batch Size | Accuracy (%) | Precision (%) | Recall (%) | F1 Score | Pixel Error
MobileNetV3 Small | 32 | 99.3453 | 99.35275 | 99.33950 | 0.9934510 | 13.34625
MobileNetV3 Large | 32 | 99.5090 | 99.51040 | 99.50745 | 0.9950885 | 10.07090
MobileNetV3 Small | 64 | 99.2226 | 99.23115 | 99.216 | 0.9922225 | 12.5661
MobileNetV3 Large | 64 | 99.4681 | 99.4768 | 99.4616 | 0.9946785 | 10.52515
MobileNetV3 Small | 128 | 99.0589 | 99.073 | 99.04945 | 0.9905845 | 15.0092
MobileNetV3 Large | 128 | 99.4272 | 99.43235 | 99.4228 | 0.9942695 | 11.0594

3) Evaluation of Batch Size on FaceBoxes and Hand Gesture Recognition: As shown in Tables 2 and 3, decreasing the batch size does not instantly improve generalization performance. However, the UDF experiments consistently show that decreasing the batch size improves generalization. The authors argue that the inconsistent results on FaceBoxes are caused by other parameters, such as a learning rate that is not exactly the same as that of previous researchers [11]. Still, batch size 128 performs worse than the other batch sizes on both models, so the authors argue that batch sizes 32 and 64 should be experimented with further to see whether a better model can really be obtained.

4) Evaluation of Inference Time: Table 4 shows the impact of MobileNetV3 large and small on Unified Gesture and Fingertip Detection: the small architecture has a 50% lower inference time than the large one. This is because the small version of MobileNetV3 has fewer layers, parameters, and biases, so the complexity of the model decreases. The authors choose the small version rather than the large one because, comparing the trade-off between the performance metrics shown in Table 3 and the inference time in Table 4, the small version is the better one to implement.

TABLE IV. INFERENCE TIME DIFFERENCE BETWEEN THE LARGE AND SMALL VERSIONS OF MOBILENETV3 ON UDF

Method | Architecture | Inference Time (ms)
Unified Gesture and Fingertip Detection | MobileNetV3 Large | 165.6
Unified Gesture and Fingertip Detection | MobileNetV3 Small | 87.7
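The two optimizers compared in these experiments differ mainly in how they use past gradients. A minimal, pure-Python sketch of one update step of each follows; the hyperparameters are the common defaults, not necessarily those used in the paper.

```python
import math

def sgd_momentum_step(w, grad, velocity, lr=0.01, beta=0.9):
    """SGD with momentum: accumulate a decaying sum of past gradients."""
    velocity = beta * velocity + grad
    return w - lr * velocity, velocity

def adam_step(w, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """Adam: per-parameter adaptive step from first/second moment estimates."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)   # bias correction for the first moment
    v_hat = v / (1 - b2 ** t)   # bias correction for the second moment
    return w - lr * m_hat / (math.sqrt(v_hat) + eps), m, v

w_sgd, vel = sgd_momentum_step(1.0, 0.5, 0.0)     # 1.0 - 0.01*0.5 = 0.995
w_adam, m, v = adam_step(1.0, 0.5, 0.0, 0.0, t=1)
```

Note that on the very first Adam step `m_hat / sqrt(v_hat)` equals the sign of the gradient, so the update size is roughly the learning rate regardless of the gradient's scale; this is the per-parameter adaptive behaviour described above.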
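Inference times such as those in Table 4 are typically obtained by averaging repeated timed forward passes after a few warm-up runs. The measurement loop below is a generic sketch of that procedure, not the authors' benchmark; `time.sleep` stands in for a model's forward pass.

```python
import time

def mean_inference_ms(forward, n_warmup=3, n_runs=10):
    """Average wall-clock time of `forward()` in milliseconds,
    discarding warm-up runs (caches, lazy initialisation)."""
    for _ in range(n_warmup):
        forward()
    start = time.perf_counter()
    for _ in range(n_runs):
        forward()
    return (time.perf_counter() - start) / n_runs * 1000.0

# Stand-in for model inference: roughly 5 ms of work per call.
ms = mean_inference_ms(lambda: time.sleep(0.005))
```

Averaging over many runs matters because single-call timings on a CPU fluctuate with scheduling and cache state.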
5) Evaluation of Model Size: Model size is determined by how many learnable parameters (weights and biases) are needed to do inference. Table 5 shows that the model built with the small version of MobileNetV3 has a smaller model size: MobileNetV3 small has a simpler architecture, hence fewer learnable parameters than the large one in each cell. The authors found that UDF-MobileNetV3 small has 11,425,126 parameters, while UDF-MobileNetV3 large has 19,764,982. Compared with the UDF-VGG16 used by previous authors, which has 24,163,654 parameters, there is a fair gap between VGG16 and MobileNetV3 large and a large gap to the small version.

TABLE V. MODEL SIZE DIFFERENCE BETWEEN THE LARGE AND SMALL VERSIONS OF MOBILENETV3 ON UDF

Method | Architecture | Params Size (MB) | Forward/Backward Pass Size (MB) | Model Size (MB)
Unified Gesture and Fingertip Detection | MobileNetV3 Large | 38.13 | 75.40 | 113.71
Unified Gesture and Fingertip Detection | MobileNetV3 Small | 13.04 | 43.58 | 56.81

V. CONCLUSION

This research implements a real-time auto-tracking camera system for online learning, focused on the live-lecture, direct-recording classroom type of video. The results have not been tested in the field yet, and the model itself still needs improvement so that it can run faster, or users can employ more powerful devices if available. The auto-tracking camera can be used for online learning so that the lecturer does not need to adjust the camera position manually. Manual commands by hand gesture are used to give other commands, such as zoom in and zoom out, and can be changed in future research.

The experiments on the face detection and hand gesture models used in this research cover the CNN architecture (MobileNetV3 large and MobileNetV3 small), the optimizer (Adam and SGD with Momentum), and the batch size (32, 64, and 128). Overall, the MobileNetV3 large version outperforms the small one on accuracy by a small margin but loses on inference time by a big margin. The trade-off between accuracy and inference time makes the authors choose the smaller version to be implemented in the auto-tracking camera system. The authors also find that on face detection and hand gesture recognition, smaller batch sizes tend to have better generalization, as former researchers [21][22][23] also found; however, further research is needed on face detection, where batch size 64 had the best generalization, which the authors argue is caused by other parameters. Among the optimizers, Adam has the best overall generalization on the face detection model, but as with batch size this needs further research, since Adam is fast but SGD with Momentum may outperform Adam over longer training times.

REFERENCES

[1] Qiyun Wang, C. L. (2017). Designing and Improving a Blended Synchronous Learning Environment: An Educational Design Research. International Review of Research in Open and Distance Learning.
[2] Jose Miguel Santos Espino, A. S. (2016). Speakers and boards: A survey of instructional video styles in MOOCs. Society for Technical Communication.
[3] Hsien-Chou Liao, M.-H. P.-C. (2015). An Automatic Lecture Recording System Using Pan-Tilt-Zoom Camera to Track Lecturer and Handwritten Data. International Journal of Applied Science and Engineering.
[4] Tan-Hsu Tan, T.-Y. K. (2019). Intelligent Lecturer Tracking and Capturing System Based on Face Detection and Wireless Sensing Technology. MDPI Open Access Journals.
[5] Alex Krizhevsky, I. S. (2012). ImageNet Classification with Deep Convolutional Neural Networks.
[6] Shifeng Zhang, X. Z. (2017). S3FD: Single Shot Scale-invariant Face Detector. arXiv.
[7] Xu Tang, D. K. (2018). PyramidBox: A Context-assisted Single Shot Face Detector. arXiv.
[8] Jian Li, Y. W. (2018). DSFD: Dual Shot Face Detector. arXiv.
[9] Mohammad Mahmudul Alam, M. T. (2021). Unified Learning Approach for Egocentric Hand Gesture Recognition and Fingertip Detection. arXiv.
[10] Kaipeng Zhang, Z. Z. (2016). Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks. arXiv.
[11] Shifeng Zhang, X. Z. (2017). FaceBoxes: A CPU Real-time Face Detector with High Accuracy. arXiv.
[12] Purnendu Mishra, K. S. (2019). Fingertips Detection in Egocentric Video Frames using Deep Neural Networks. International Conference on Image and Vision Computing New Zealand (IVCNZ).
[13] Wenbin Wu, C. L. (2017). YOLSE: Egocentric Fingertip Detection from Single RGB Images. IEEE International Conference on Computer Vision Workshops (ICCVW).
[14] Mark Sandler, A. H.-C. (2018). MobileNetV2: Inverted Residuals and Linear Bottlenecks. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[15] Joseph Redmon, S. D. (2015). You Only Look Once: Unified, Real-Time Object Detection. arXiv.
[16] Shuo Yang, P. L. (2015). WIDER FACE: A Face Detection Benchmark. arXiv.
[17] Xiangxin Zhu, D. R. (2012). Face Detection, Pose Estimation, and Landmark Localization in the Wild. IEEE.
[18] M. Everingham, L. G. (2009). The Pascal Visual Object Classes (VOC) Challenge. International Journal of Computer Vision.
[19] Huang, Y. (2016, Oct 31). EgoFinger. SCUT HCII Lab.
[20] Andrew Howard, M. S.-C. (2019). Searching for MobileNetV3. arXiv.
[21] Nitish Shirish Keskar, D. M. (2017). On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima. arXiv.
[22] Dominic Masters, C. L. (2018). Revisiting Small Batch Training for Deep Neural Networks. arXiv.
[23] Ibrahem Kandel, M. C. (2020). The effect of batch size on the generalizability of the convolutional neural networks on a histopathology dataset. ICT Express.


2021 1st International Conference on Computer Science and Artificial Intelligence (ICCSAI)
ISBN: 978-1-7281-3333-1 | IEEE Part Number: CFP19H83-ART

Authors Index
Author | Title | Pages
Aditya Permana | Indonesia China Trade Relations, Social Media and Sentiment Analysis: Insight from Text Mining Technique | 334
Aditya Permana | Sinophobia in Indonesia and Its Impact on Indonesia-China Economic Cooperation with the SVM (Support Vector Machine) Approach | 340
Aditya Winata | Predicting Stock Market Prices using Time Series SARIMA | 92
Afiyati Reno | Building Natural Language Understanding System from User Manual to Execute Office Application Functions | 207
Agung Purnomo | Big Data For Smart City: An Advance Analytical Review | 307
Agung Purnomo | Application of Internet of Things in Smart City: A Systematic Literature Review | 324
Ahmad Ridhwan Naufal | Development of Smart Restaurant Application for Dine-In | 230
Ainur Rosyid | E-Learning Service Issues and Challenges: An Exploratory Study | 196
Alam Ahmad Hidayat | A Hydrodynamic Analysis of Water System in Dadahup Swamp Irrigation Area | 400
Albert Ivando Owen | Blockchain Technology behind Cryptocurrency and Bitcoin for Commercial Transactions | 115
Albert Verasius Dian Sano | Application of Internet of Things in Smart City: A Systematic Literature Review | 324
Alexander A S Gunawan | The Influence of UI UX Design to Number of Users Between ‘Line’ and ‘Whatsapp’ | 27
Alexander A S Gunawan | Blockchain Technology behind Cryptocurrency and Bitcoin for Commercial Transactions | 115
Alexander A S Gunawan | Factors that Affect Data Gathered Using Interviews for Requirements Gathering | 161
Alexander A S Gunawan | Utilization Big Data and GPS to Help E-TLE System in The Cities of Indonesia | 236
Alexander A.S Gunawan | Development of Stock Market Price Application to Predict Purchase and Sales Decisions Using Proximal Policy Optimization Method | 431
Alexander Agung Santoso Gunawan | The Effect of UI/UX Design on User Satisfaction in Online Art Gallery | 120
Alexander Agung Santoso Gunawan | The Impact of E-Transport Platforms’ Gojek and Grab UI/UX Design to User Preference in Indonesia | 167
Alexander Agung Santoso Gunawan | Line Follower Smart Trolley System V2 using RFID | 17
Alexander Agung Santoso Gunawan | Development of Portable Temperature and Air Quality Detector for Preventing Covid-19 | 47
Ali Abdurrab | Development of Smart Restaurant Application for Dine-In | 230
Alicia | Line Follower Smart Trolley System V2 using RFID | 17
Almira Rahma Saphyra | A Systematic Literature Review of Fintech Investment and Relationship with Bank in Developed Countries | 184
Alvin Wijaya | The Effect of UI/UX Design on User Satisfaction in Online Art Gallery | 120
Ana Umul Fadilah | Design of Cadets Administration System for Nusantara Cilacap Maritime Academy Based On Website | 142
Anderies | RTR AR PHOTO BOOTH: THE REAL-TIME RENDERING AUGMENTED REALITY PHOTO BOOTH | 289
Anderies | Finetunning IndoBERT to Understand Indonesian Stock Trader Slang Language | 42
Andini Artika Dewi | Utilization Big Data and GPS to Help E-TLE System in The Cities of Indonesia | 236
Andreas Kurniawan | Memorize COVID-19 Advertisement: Customer Neuroscience Data Collection Techniques by Using EEG and fMRI | 427
Andreas Wahyu Krisdiarto | Design of Water Information Management System in Palm Oil Plantation | 395
Andrian Hartanto | Coronary Artery Disease Prediction Model using CART and SVM: A Comparative Study | 72
Andriatama Bagaskara | Development of Smart Restaurant Application for Dine-In | 230
Andry Chowanda | Spatiotemporal Features Learning from Song for Emotions Recognition with Time Distributed CNN | 407
Angelica Nadia | The Influence of UI UX Design to Number of Users Between ‘Line’ and ‘Whatsapp’ | 27
Anik Hanifatul Azizah | Exploration of React Native Framework in designing a Rule-Based Application for healthy lifestyle education | 391
Anis Cherid | Building Natural Language Understanding System from User Manual to Execute Office Application Functions | 207
Anita Rahayu | AR-Mart: The Implementation of Augmented Reality as a Smart Self-Service Cashier in the Pandemic Era | 413
Anthony Steven T | Smart Electricity Meter as An Advisor for Office Power Consumption | 202
Antoni Wibowo | Street View Object Detection for Autonomous Car Steering Angle Prediction Using Convolutional Neural Network | 367
Antonio Josef | Development of Robot to Clean Garbage in River Streams with Deep Learning | 51
Antonius Rianto | Analysis of Big Data in Healthcare Using Decision Tree Algorithm | 313
Arief Agus Sukmandhani | Enhancement Design for Smart Parking System Using IoT and A-Star Algorithm | 190
Arief Darvin | Usability Evaluation of Learning Management System | 269
Arief Kusuma Among Praja | Review Literature Performance: Quality of Service from Internet of Things for Transportation System | 444
Arif Budiarto | Explainable Supervised Method for Genetics Ancestry Estimation | 422
Armando Rilentuah Parhusip | E-Learning Service Issues and Challenges: An Exploratory Study | 196
Artha Bastanta | Image Data Encryption Using DES Method | 130
Asnan Habib Munassar | Big Data For Smart City: An Advance Analytical Review | 307
Aziza Chakir | Analysis of Big Data in Healthcare Using Decision Tree Algorithm | 313
Bambang Dwi Wijanarko | Identify High-Priority Barriers to Effective Digital Transformation in Higher Education: A Case Study at Private University in Indonesia | 76
Bayu Rima Aditya | Identify High-Priority Barriers to Effective Digital Transformation in Higher Education: A Case Study at Private University in Indonesia | 76
Bening Insaniyah Al-Abdillah | AR-Mart: The Implementation of Augmented Reality as a Smart Self-Service Cashier in the Pandemic Era | 413
Bens Pardamean | Finetunning IndoBERT to Understand Indonesian Stock Trader Slang Language | 42
Bens Pardamean | Systematic Literature Review: An Intelligent Pulmonary TB Detection from Chest X-Rays | 136
Bens Pardamean | IoT Sensors Integration for Water Quality Analysis | 361
Bens Pardamean | Design of Water Information Management System in Palm Oil Plantation | 395
Bens Pardamean | A Hydrodynamic Analysis of Water System in Dadahup Swamp Irrigation Area | 400
Bens Pardamean | Explainable Supervised Method for Genetics Ancestry Estimation | 422
Betley Heru Susanto | Detrimental Factors of the Development of Smart City and Digital City | 318
Bilqis Ashifa S | Development of Stock Market Price Application to Predict Purchase and Sales Decisions Using Proximal Policy Optimization Method | 431
Bima Bagaskarta Ridwanto | The Impact of E-Transport Platforms’ Gojek and Grab UI/UX Design to User Preference in Indonesia | 167
Boby Siswanto | Smart Electricity Meter as An Advisor for Office Power Consumption | 202
Brian Haessel | An Efficient System to Collect Data for AI Training on Multi-Category Object Counting Task | 22
Briant Stevanus | Enhancement Design for Smart Parking System Using IoT and A-Star Algorithm | 190
Brilyan Nathanael Rumahorbo | Development of Robot to Clean Garbage in River Streams with Deep Learning | 51
Bryan Gavriell | Implementation of Face Recognition Method for Attendance in Class | 148
Budi Yulianto | Web Based Application for Ordering Food Raw Materials | 1
Charleen | Impact of Computer Vision With Deep Learning Approach in Medical Imaging Diagnosis | 37
Charles Bernando | Intelligent Computational Model for Early Heart Disease Prediction using Logistic Regression and Stochastic Gradient Descent (A Preliminary Study) | 11
Charles Bernando | Coronary Artery Disease Prediction Model using CART and SVM: A Comparative Study | 72
Chasandra Puspitasari | AR-Mart: The Implementation of Augmented Reality as a Smart Self-Service Cashier in the Pandemic Era | 413
Cheryl Angelica | Impact of Computer Vision With Deep Learning Approach in Medical Imaging Diagnosis | 37
Christian Aditya Nugroho | Image Data Encryption Using DES Method | 130
Clement Neonardi | Aspect Based Sentiment Analysis: Restaurant Online Review Platform in Indonesia with Unsupervised Scraped Corpus in Indonesian Language | 213
Corinthias P.M. Sianipar | Application of Internet of Things in Smart City: A Systematic Literature Review | 324
Cuk Tho | A Comparison of Lexicon-based and Transformer-based Sentiment Analysis on Code-mixed of Low-Resource Languages | 81
Daniel Alexander | Covid-19 Vaccine Tweets - Sentiment Analysis | 126
Daniel Imanuel Sutanto | Auto-Tracking Camera System for Remote Learning Using Face Detection and Hand Gesture Recognition Based on Convolutional Neural Network | 451
Danu Widhyatmoko | Development of Portable Temperature and Air Quality Detector for Preventing Covid-19 | 47
Darian Handoro | Detrimental Factors of the Development of Smart City and Digital City | 318
Darren Anando Leone | A Survey: Crowds Detection Method on Public Transportation | 258
Daryl | Predicting Stock Market Prices using Time Series SARIMA | 92
David Yu | Utilization Big Data and GPS to Help E-TLE System in The Cities of Indonesia | 236
Davin William Pratama | Comparative of Advanced Sorting Algorithms (Quick Sort, Heap Sort, Merge Sort, Intro Sort, Radix Sort) Based on Time and Memory Usage | 154
Dea Asya Ashilla | Study on Face Recognition Techniques | 301
Debri Pristinella | Smart Tourism Services: A Systematic Literature Review | 329
Derwin Suhartono | Predicting Stock Market Prices using Time Series SARIMA | 92
Derwin Suhartono | Sentiment Analysis using SVM and Naïve Bayes Classifiers on Restaurant Review Dataset | 100
Derwin Suhartono | Towards Classification of Personality Prediction Model: A Combination of BERT Word Embedding and MLSMOTE | 346
Derwin Suhartono | Spread of COVID-19 Deaths in Jakarta: Cluster and Regression Analysis | 379
Derwin Suhartono | Indonesian Banking Stock Price Prediction with LSTM and Random Walk Method | 385
Devriady Pratama | Comparison of Gaussian Hidden Markov Model and Convolutional Neural Network in Sign Language Recognition System | 5
Dewanto Rosian Adhy | Review Literature Performance: Quality of Service from Internet of Things for Transportation System | 444
Dewi Mutiara | Effective Methods for Fake News Detection: A Systematic Literature Review | 278
Diamanta | A Comparison of Artificial Intelligence-Based Methods in Traffic Prediction | 32
Digdo Sudigyo | Design of Water Information Management System in Palm Oil Plantation | 395
Dimas Rizki Haikal | Application of Internet of Things in Smart City: A Systematic Literature Review | 324
Dimas Sekti Adji | Performance Analysis Between Cloud Storage and NAS to Improve Company’s Performance: A Literature Review | 263
Dina Fitria Murad | Identify High-Priority Barriers to Effective Digital Transformation in Higher Education: A Case Study at Private University in Indonesia | 76
Dina Fityria Murad | Student Performance Based on Student Final Exam Prediction | 224
Diyah Anggraeny | Expert System to Predict Acute Inflammation of Urinary Bladder and Nephritis Using Naïve Bayes Method | 243
Djasminar Anwar | Exploiting Facial Action Unit in Video for Recognizing Depression using Metaheuristic and Neural Networks | 438
Dominikus Sutrisno Ariyanto | IoT Sensors Integration for Water Quality Analysis | 361
Eddy Julianto | Design of Water Information Management System in Palm Oil Plantation | 395
Edi Winarko | Building Natural Language Understanding System from User Manual to Execute Office Application Functions | 207
Eduard Pangestu Wonohardjo | Design of Cadets Administration System for Nusantara Cilacap Maritime Academy Based On Website | 142
Edward Rezzky | Factors that Affect Data Gathered Using Interviews for Requirements Gathering | 161
Edy Irwansyah | A Comparison of Artificial Intelligence-Based Methods in Traffic Prediction | 32
Edy Irwansyah | A Review of Signature Recognition Using Machine Learning | 219
Edy Irwansyah | The Search for the Best Real-Time Face Recognition Method for Finding Potential COVID Patients | 249
Edy Irwansyah | Development of Portable Temperature and Air Quality Detector for Preventing Covid-19 | 47
Eka Miranda | Intelligent Computational Model for Early Heart Disease Prediction using Logistic Regression and Stochastic Gradient Descent (A Preliminary Study) | 11
Eka Miranda | Coronary Artery Disease Prediction Model using CART and SVM: A Comparative Study | 72
Eka Miranda | Indonesia China Trade Relations, Social Media and Sentiment Analysis: Insight from Text Mining Technique | 334
Eka Miranda | Sinophobia in Indonesia and Its Impact on Indonesia-China Economic Cooperation with the SVM (Support Vector Machine) Approach | 340
Eko Setyo Purwanto | RTR AR PHOTO BOOTH: THE REAL-TIME RENDERING AUGMENTED REALITY PHOTO BOOTH | 289
Elizabeth Ann Soelistio | A Review of Signature Recognition Using Machine Learning | 219
Elok Fitriani Rafikasari | Estimation of Technology Acceptance Model (TAM) on the Adoption of Technology in the Learning Process Using Structural Equation Modeling (SEM) with Bayesian Approach | 86
Emny Harna Yossy | Design of Cadets Administration System for Nusantara Cilacap Maritime Academy Based On Website | 142
Eugene Reginald Patrick | The Search for the Best Real-Time Face Recognition Method for Finding Potential COVID Patients | 249
Evaristus Didik Madyatmadja | Big Data For Smart City: An Advance Analytical Review | 307
Evaristus Didik Madyatmadja | Analysis of Big Data in Healthcare Using Decision Tree Algorithm | 313
Evaristus Didik Madyatmadja | Detrimental Factors of the Development of Smart City and Digital City | 318
Evaristus Didik Madyatmadja | Application of Internet of Things in Smart City: A Systematic Literature Review | 324
Evaristus Didik Madyatmadja | Smart Tourism Services: A Systematic Literature Review | 329
Fachrurrozi Maulana | Self-Checkout System Using RFID (Radio Frequency Identification) Technology: A Survey | 273
Faqir M Bhatti | Intelligent Computational Model for Early Heart Disease Prediction using Logistic Regression and Stochastic Gradient Descent (A Preliminary Study) | 11
Felix Fauzan | Implementation of Face Recognition Method for Attendance in Class | 148
Ferdinand Nathaniel E | Smart Electricity Meter as An Advisor for Office Power Consumption | 202
Fernando Chai | Health Chatbot Using Natural Language Processing for Disease Prediction and Treatment | 62
Ferry Agustius Wong | Sentiment Analysis of E-commerce Review using Lexicon Sentiment Method | 68
Franseda Ricardo | Blockchain Technology behind Cryptocurrency and Bitcoin for Commercial Transactions | 115
Frederik Arnold Cahyadi | Blockchain Technology behind Cryptocurrency and Bitcoin for Commercial Transactions | 115
Fredy Purnomo | Impact of Computer Vision With Deep Learning Approach in Medical Imaging Diagnosis | 37
Gabriel Eduardus | Performance Analysis Between Cloud Storage and NAS to Improve Company’s Performance: A Literature Review | 263
Gerry Firmansyah | E-Learning Service Issues and Challenges: An Exploratory Study | 196
Gian Avila | A Comparison of Artificial Intelligence-Based Methods in Traffic Prediction | 32
Gusti Pangestu | AR-Mart: The Implementation of Augmented Reality as a Smart Self-Service Cashier in the Pandemic Era | 413
Habibullah Akbar | Exploiting Facial Action Unit in Video for Recognizing Depression using Metaheuristic and Neural Networks | 438
Habibullah Akbar | Review Literature Performance: Quality of Service from Internet of Things for Transportation System | 444
Handry Novianto | A Survey: Crowds Detection Method on Public Transportation | 258
Handy Pratama | Development of Robot to Clean Garbage in River Streams with Deep Learning | 51
Hanry Ham | Auto-Tracking Camera System for Remote Learning Using Face Detection and Hand Gesture Recognition Based on Convolutional Neural Network | 451
Harjanto Prabowo | Cultural Tourism Technology Used and Themes: A Literature Review | 355
Hendra | Factors that Affect Data Gathered Using Interviews for Requirements Gathering | 161
Hendrik Purnama | Impact of Computer Vision With Deep Learning Approach in Medical Imaging Diagnosis | 37
Hendro Nindito | Application of Internet of Things in Smart City: A Systematic Literature Review | 324
Hendro Nindito | Cultural Tourism Technology Used and Themes: A Literature Review | 355
Hendy Tannady | Analysis of Big Data in Healthcare Using Decision Tree Algorithm | 313
Henry Hamilton Prasetya | The Impact of E-Transport Platforms’ Gojek and Grab UI/UX Design to User Preference in Indonesia | 167
Henry Lucky | Towards Classification of Personality Prediction Model: A Combination of BERT Word Embedding and MLSMOTE | 346
Heri Ngarianto | Development of Stock Market Price Application to Predict Purchase and Sales Decisions Using Proximal Policy Optimization Method | 431
Heri Ngarianto | Line Follower Smart Trolley System V2 using RFID | 17
Herman Gunawan | Comparison of Gaussian Hidden Markov Model and Convolutional Neural Network in Sign Language Recognition System | 5
Herman Tolle | Line Follower Smart Trolley System V2 using RFID | 17
Hermantoro | IoT Sensors Integration for Water Quality Analysis | 361
I Putu Gede Prama Duta | Effectiveness of LMS in Online Learning by Analyzing Its Usability and Features | 56
Ignasius Kenny Bagus | Student Performance Based on Student Final Exam Prediction | 224
Ignatius Hansen | Determining the best Delivery Service in Jakarta using Tsukamoto Fuzzy Algorithm | 284
Ilvico Sonata | Street View Object Detection for Autonomous Car Steering Angle Prediction Using Convolutional Neural Network | 367
Imam Herwidiana Kartowisastro | A Comparison of Lexicon-based and Transformer-based Sentiment Analysis on Code-mixed of Low-Resource Languages | 81
Imam Sutanto | E-Learning Service Issues and Challenges: An Exploratory Study | 196
Indira Mannuela | Level of Password Vulnerability | 351
Indriani Noor Hapsari | E-Learning Service Issues and Challenges: An Exploratory Study | 196
Intan Saskia | Spread of COVID-19 Deaths in Jakarta: Cluster and Regression Analysis | 379
Irya Wisnubhadra | Design of Water Information Management System in Palm Oil Plantation | 395
Ivano Ekasetia Dhojopatmo | Development of Smart Restaurant Application for Dine-In | 230
Ivyna Johansen | Study on Face Recognition Techniques | 301
Jarot Soeroso Sembodo | Development of Portable Temperature and Air Quality Detector for Preventing Covid-19 | 47
Jason Cornelius Sugitomo | Sentiment Analysis using SVM and Naïve Bayes Classifiers on Restaurant Review Dataset | 100
Jeffrey Clay S | Smart Electricity Meter as An Advisor for Office Power Consumption | 202
Jeffry Kosasih | Usability Evaluation of Learning Management System | 269
Jessy Putri | Level of Password Vulnerability | 351
Jimmy | Systematic Literature Review: An Intelligent Pulmonary TB Detection from Chest X-Rays | 136
Johanes Fernandes Andry | Analysis of Big Data in Healthcare Using Decision Tree Algorithm | 313
Junaedi Dede | Line Follower Smart Trolley System V2 using RFID | 17
Kefry | The Effect of UI/UX Design on User Satisfaction in Online Art Gallery | 120
Kevin | The Search for the Best Real-Time Face Recognition Method for Finding Potential COVID Patients | 249
Kristien Margi Suryaningrum | Implementation of Face Recognition Method for Attendance in Class | 148
Kristien Margi Suryaningrum | Comparative of Advanced Sorting Algorithms (Quick Sort, Heap Sort, Merge Sort, Intro Sort, Radix Sort) Based on Time and Memory Usage | 154
Kristien Margi Suryaningrum | Compare the Path Finding Algorithms that are Applied for Route Searching in Maps | 178
Lee Huey Yi | Memorize COVID-19 Advertisement: Customer Neuroscience Data Collection Techniques by Using EEG and fMRI | 427
Lita Patricia Lunanta | Exploiting Facial Action Unit in Video for Recognizing Depression using Metaheuristic and Neural Networks | 438
Luwita | Student Performance Based on Student Final Exam Prediction | 224
M Brian Aqacha Handoko | Compare the Path Finding Algorithms that are Applied for Route Searching in Maps | 178
M. Fachruddin Arrozi Adhikara | Review Literature Performance: Quality of Service from Internet of Things for Transportation System | 444
M. Ilham Hudaya | A Comparison of Artificial Intelligence-Based Methods in Traffic Prediction | 32
Mahatmaputra | Developing An Automated Face Mask Detection Using Computer Vision and Artificial Intelligence | 109
Marcellino Marcellino | Comparative of Advanced Sorting Algorithms (Quick Sort, Heap Sort, Merge Sort, Intro Sort, Radix Sort) Based on Time and Memory Usage | 154
Maria Seraphina Astriani | Memorize COVID-19 Advertisement: Customer Neuroscience Data Collection Techniques by Using EEG and fMRI | 427
Maria Susan Anggreainy | Effectiveness of LMS in Online Learning by Analyzing Its Usability and Features | 56
Maria Susan Anggreainy | Covid-19 Vaccine Tweets - Sentiment Analysis | 126
Maria Susan Anggreainy | Level of Password Vulnerability | 351
Maria Susan Anggreainy | Sentiment Analysis of E-commerce Review using Lexicon Sentiment Method | 68
Maria Vanessa Salim | Auto-Tracking Camera System for Remote Learning Using Face Detection and Hand Gesture Recognition Based on Convolutional Neural Network | 451
Mediana Aryuni | Intelligent Computational Model for Early Heart Disease Prediction using Logistic Regression and Stochastic Gradient Descent (A Preliminary Study) | 11
Mediana Aryuni | Coronary Artery Disease Prediction Model using CART and SVM: A Comparative Study | 72
Michael | Performance Analysis Between Cloud Storage and NAS to Improve Company’s Performance: A Literature Review | 263
Michael | Level of Password Vulnerability | 351
Michael Hakkinen | Sentiment Analysis of E-commerce Review using Lexicon Sentiment Method | 68
Mike Christ Heru | Indonesian Banking Stock Price Prediction with LSTM and Random Walk Method | 385
Minawati | Performance Analysis Between Cloud Storage and NAS to Improve Company’s Performance: A Literature Review | 263
Mochamad Rizky Febriansyah | Effectiveness of LMS in Online Learning by Analyzing Its Usability and Features | 56
Muhamad Daffa Mennawi | Line Follower Smart Trolley System V2 using RFID | 17
RTR AR PHOTO BOOTH: THE REAL-TIME RENDERING AUGMENTED
Muhamad Fajar 289
REALITY PHOTO BOOTH
Muhamad Firman M Smart Electricity Meter as An Advisor for Office Power Consumption 202
The Impact of E-Transport Platforms’ Gojek and Grab UI/UX Design to
Muhammad Ashraf Rahman 167
User Preference in Indonesia
Muhammad Attamimi Line Follower Smart Trolley System V2 using RFID 17
Exploration of React Native Framework in designing a Rule-Based
Muhammad Bahrul Ulum 391
Application for healthy lifestyle education
Muhammad Fachri Akbar A Systematic Literature Review: Database Optimization Techniques 295


Muhammad Farrel Revikasha Waste Classification Using EfficientNet-B0 253


Muhammad Fikri Hasani Immersive Experience with Non-Player Characters Dynamic Dialogue 418
Development of Robot to Clean Garbage in River Streams with Deep
Muhammad Hafizh Ramadhansyah 51
Learning
Muhammad Nooryoku R Smart Electricity Meter as An Advisor for Office Power Consumption 202
Muhammad Siraz Hafizh Covid-19 Vaccine Tweets - Sentiment Analysis 126
Building Natural Language Understanding System from User Manual
Mujiono Sadikin 207
to Execute Office Application Functions
Extract Transform Loading (ETL) Based Data Quality for Data
Munawar 373
Warehouse Development
An Efficient System to Collect Data for AI Training on Multi-Category
Munif Faisol Abdul Rahman 22
Object Counting Task
Nasrullah Student Performance Based on Student Final Exam Prediction 224
Sentiment Analysis using SVM and Naïve Bayes Classifiers on
Nathaniel Kevin 100
Restaurant Review Dataset
Naufal Rifki Fauzan Covid-19 Vaccine Tweets - Sentiment Analysis 126
Sentiment Analysis using SVM and Naïve Bayes Classifiers on
Nayra Jannatri 100
Restaurant Review Dataset
Nelsen Ardian Implementation of Face Recognition Method for Attendance in Class 148
Nicklaus Rahardja Smart Tourism Services: A Systematic Literature Review 329
Self-Checkout System Using RFID (Radio Frequency Identification)
Nixon 273
Technology: A Survey
Exploiting Facial Action Unit in Video for Recognizing Depression
Nizirwan Anwar 438
using Metaheuristic and Neural Networks
Review Literature Performance: Quality of Service from Internet of
Nizirwan Anwar 444
Things for Transportation System
A Systematic Literature Review of Fintech Investment and
Noerlina 184
Relationship with Bank in Developed Countries.
Novita Hanafiah Waste Classification Using EfficientNet-B0 253
Novita Hanafiah A Survey: Crowds Detection Method on Public Transportation 258
Novita Hanafiah Usability Evaluation of Learning Management System 269
Self-Checkout System Using RFID (Radio Frequency Identification)
Novita Hanafiah 273
Technology: A Survey
Effective Methods for Fake News Detection: A Systematic Literature
Novita Hanafiah 278
Review
Determining the best Delivery Service in Jakarta using Tsukamoto
Novita Hanafiah 284
Fuzzy Algorithm
Novita Hanafiah A Systematic Literature Review: Database Optimization Techniques 295
Novita Hanafiah Study on Face Recognition Techniques 301
Estimation of Technology Acceptance Model (TAM) on the Adoption
Nur Iriawan of Technology in the Learning Process Using Structural Equation 86
Modeling (SEM) with Bayesian Approach
Health Chatbot Using Natural Language Processing for Disease
Philip Indra Prayitno 62
Prediction and Treatment
Determining the best Delivery Service in Jakarta using Tsukamoto
Phillips Tionathan 284
Fuzzy Algorithm
A Comparison of Artificial Intelligence-Based Methods in Traffic
Priscilla 32
Prediction
Exploration of React Native Framework in designing a Rule-Based
Putri Handayani 391
Application for healthy lifestyle education
Expert System to Predict Acute Inflammation of Urinary Bladder and
Rachel Haryawan 243
Nephritis Using Naïve Bayes Method


A Systematic Literature Review of Fintech Investment and
Raesita Zahra 184
Relationship with Bank in Developed Countries.
Rafael Edwin Hananto Kusumo A Review of Signature Recognition Using Machine Learning 219
Raheliya Br Ginting Smart Tourism Services: A Systematic Literature Review 329
Ramadhany Nuryansyah Image Data Encryption Using DES Method 130
Sinophobia in Indonesia and Its Impact on Indonesia-China Economic
Rangga Aditya 340
Cooperation with the SVM (Support Vector Machine) Approach
Indonesia China Trade Relations, Social Media and Sentiment
Rangga Aditya Elias 334
Analysis: Insight from Text Mining Technique
Memorize COVID-19 Advertisement: Customer Neuroscience Data
Raymond Bahana 427
Collection Techniques by Using EEG and fMRI
Development of Stock Market Price Application to Predict Purchase
Reinert Y. Rumagit 431
and Sales Decisions Using Proximal Policy Optimization Method
Health Chatbot Using Natural Language Processing for Disease
Reinhart Perbowo Pujo Leksono 62
Prediction and Treatment
Development of Portable Temperature and Air Quality Detector for
Retno Dewanti 47
Preventing Covid-19
Finetuning IndoBERT to Understand Indonesian Stock Trader Slang
Reza Rahutomo 42
Language
Reza Rahutomo IoT Sensors Integration for Water Quality Analysis 361
Expert System to Predict Acute Inflammation of Urinary Bladder and
Ria Arafiyah 243
Nephritis Using Naïve Bayes Method
Health Chatbot Using Natural Language Processing for Disease
Richard Aldy 62
Prediction and Treatment
Identify High-Priority Barriers to Effective Digital Transformation in
Ridi Ferdiana 76
Higher Education: A Case Study at Private University in Indonesia
Effective Methods for Fake News Detection: A Systematic Literature
Rifdah Defrina Abdiansyah 278
Review
Effectiveness of LMS in Online Learning by Analyzing Its Usability and
Rio 56
Features
Rita Layona Web Based Application for Ordering Food Raw Materials 1
Rivandi Waste Classification Using EfficientNet-B0 253
Rizki Ashari A Systematic Literature Review: Database Optimization Techniques 295
Self-Checkout System Using RFID (Radio Frequency Identification)
Rizky Prawira Putra 273
Technology: A Survey
Spread of COVID-19 Deaths in Jakarta: Cluster and Regression
Ro’fah Nur Rachmawati 379
Analysis
Indonesian Banking Stock Price Prediction with LSTM and Random
Ro’fah Nur Rachmawati 385
Walk Method
Review Literature Performance: Quality of Service from Internet of
Roesfiansjah Rasjidin 444
Things for Transportation System
Towards Classification of Personality Prediction Model: A
Roslynlia 346
Combination of BERT Word Embedding and MLSMOTE
Ruben Setiawan A Survey: Crowds Detection Method on Public Transportation 258
Factors that Affect Data Gathered Using Interviews for Requirements
Russell Otniel Tjakra 161
Gathering
Developing An Automated Face Mask Detection Using Computer
Samuel 109
Vision and Artificial Intelligence
Compare the Path Finding Algorithms that are Applied for Route
Samuel Dennis 178
Searching in Maps
Aspect Based Sentiment Analysis: Restaurant Online Review Platform
Samuel Mahatmaputra Tedjojuwono in Indonesia with Unsupervised Scraped Corpus in Indonesian 213
Language


The Search for the Best Real-Time Face Recognition Method for
Samuel Wijaya 249
Finding Potential COVID Patients
The Influence of UI UX Design to Number of Users Between ‘Line’
Sartika Devina 27
and ‘Whatsapp’
Sawali Wahyu E-Learning Service Issues and Challenges: An Exploratory Study 196
Sena Kumara Predicting Stock Market Prices using Time Series SARIMA 92
A Hydrodynamic Analysis of Water System in Dadahup Swamp
Sentot Purboseno 400
Irrigation Area
Sfenrianto Cultural Tourism Technology Used and Themes: A Literature Review 355
Developing An Automated Face Mask Detection Using Computer
Sheryl Livia Sulaiman 109
Vision and Artificial Intelligence
Effective Methods for Fake News Detection: A Systematic Literature
Shevila Pannadhika Sumedha 278
Review
Utilization Big Data and GPS to Help E-TLE System in The Cities of
Sindy Nikita Wijaya 236
Indonesia
Exploiting Facial Action Unit in Video for Recognizing Depression
Sintia Dewi 438
using Metaheuristic and Neural Networks
Exploration of React Native Framework in designing a Rule-Based
Siti Zuliatul Faidah 391
Application for healthy lifestyle education
Spits Warnars Harco Leslie Hendric Cultural Tourism Technology Used and Themes: A Literature Review 355
Identify High-Priority Barriers to Effective Digital Transformation in
Sri Suning Kusumawardani 76
Higher Education: A Case Study at Private University in Indonesia
Stefanus Usability Evaluation of Learning Management System 269
An Efficient System to Collect Data for AI Training on Multi-Category
Steven Andry 22
Object Counting Task
Comparative of Advanced Sorting Algorithms (Quick Sort, Heap Sort,
Steven Santoso Suntiarko Merge Sort, Intro Sort, Radix Sort) Based on Time and Memory 154
Usage
Comparison of Gaussian Hidden Markov Model and Convolutional
Suharjito 5
Neural Network in Sign Language Recognition System
Enhancement Design for Smart Parking System Using IoT and A-Star
Suharjito 190
Algorithm
Sumarlin Big Data For Smart City: An Advance Analytical Review 307
Suparman IoT Sensors Integration for Water Quality Analysis 361
Teddy Suparyanto IoT Sensors Integration for Water Quality Analysis 361
Design of Water Information Management System in Palm Oil
Teddy Suparyanto 395
Plantation
A Hydrodynamic Analysis of Water System in Dadahup Swamp
Teddy Suparyanto 400
Irrigation Area
Developing An Automated Face Mask Detection Using Computer
Tedjojuwono 109
Vision and Artificial Intelligence
Indonesia China Trade Relations, Social Media and Sentiment
Tia Mariatul Kibtiah 334
Analysis: Insight from Text Mining Technique
Sinophobia in Indonesia and Its Impact on Indonesia-China Economic
Tia Mariatul Kibtiah 340
Cooperation with the SVM (Support Vector Machine) Approach
Timothy Gilbert A Survey: Crowds Detection Method on Public Transportation 258
Design of Cadets Administration System for Nusantara Cilacap
Tisnanto Adisatyo Widcaksono 142
Maritime Academy Based On Website
An Efficient System to Collect Data for AI Training on Multi-Category
Tjeng Wawan Cenggoro 22
Object Counting Task
Systematic Literature Review: An Intelligent Pulmonary TB Detection
Tjeng Wawan Cenggoro 136
from Chest X-Rays
Vanessa Giovani Study on Face Recognition Techniques 301


Auto-Tracking Camera System for Remote Learning Using Face
Verine Detection and Hand Gesture Recognition Based on Convolutional 451
Neural Network
The Influence of UI UX Design to Number of Users Between ‘Line’
Vicky Chen 27
and ‘Whatsapp’
Sentiment Analysis of E-commerce Review using Lexicon Sentiment
Wahyu Raihan Hidayat 68
Method
Compare the Path Finding Algorithms that are Applied for Route
Wendy Susanto 178
Searching in Maps
Wendy Wihalim The Effect of UI/UX Design on User Satisfaction in Online Art Gallery 120
Development of Robot to Clean Garbage in River Streams with Deep
Widodo Budiharto 51
Learning
Health Chatbot Using Natural Language Processing for Disease
Widodo Budiharto 62
Prediction and Treatment
A Comparison of Lexicon-based and Transformer-based Sentiment
Widodo Budiharto 81
Analysis on Code-mixed of Low-Resource Languages
Widodo Budiharto Image Data Encryption Using DES Method 130
Widodo Budiharto Development of Smart Restaurant Application for Dine-In 230
Performance Analysis Between Cloud Storage and NAS to Improve
Widodo Budiharto 263
Company’s Performance: A Literature Review
Widodo Budiharto Line Follower Smart Trolley System V2 using RFID 17
Street View Object Detection for Autonomous Car Steering Angle
Widodo Budiharto 367
Prediction Using Convolutional Neural Network
Development of Portable Temperature and Air Quality Detector for
Widodo Budiharto 47
Preventing Covid-19
William Mulim Waste Classification Using EfficientNet-B0 253
Winata Dharmawan Thamrin A Systematic Literature Review: Database Optimization Techniques 295
A Comparison of Lexicon-based and Transformer-based Sentiment
Yaya Heryadi 81
Analysis on Code-mixed of Low-Resource Languages
Street View Object Detection for Autonomous Car Steering Angle
Yaya Heryadi 367
Prediction Using Convolutional Neural Network
Yogi Udjaja Immersive Experience with Non-Player Characters Dynamic Dialogue 418
RTR AR PHOTO BOOTH: THE REAL-TIME RENDERING AUGMENTED
Yogi Udjaja 289
REALITY PHOTO BOOTH
Determining the best Delivery Service in Jakarta using Tsukamoto
Yohanes Raditya Janarto 284
Fuzzy Algorithm
Yovita Tunardi Web Based Application for Ordering Food Raw Materials 1
Exploiting Facial Action Unit in Video for Recognizing Depression
Yuli Azmi Rozali 438
using Metaheuristic and Neural Networks
Expert System to Predict Acute Inflammation of Urinary Bladder and
Zakiyah Hamidah 243
Nephritis Using Naïve Bayes Method
Zevira Varies Martan A Review of Signature Recognition Using Machine Learning 219
