Performance Analysis and Comparison of Machine and Deep Learning Algorithms for IoT Data Classification
common machine and deep learning [5] algorithms. To achieve this goal, we will use several IoT datasets that come from various domains, each with different dimensions and different features. For better analysis of the results, different performance evaluation metrics will be employed and comparative tables and diagrams will be presented. We specifically concentrate on the classification problem, which is of utmost importance in a large number of IoT applications.

The rest of the paper is organized as follows. Section 2 will discuss the related work in the area of using machine learning algorithms in IoT domains. In Section 3, the methodology will be proposed and detailed information about the used algorithms, datasets and performance evaluation metrics will be provided. Section 4 will investigate and compare the performance of the algorithms and finally, in Section 5, concluding points will be presented.

2. Related Work

In this section, we briefly review some studies that are most relevant to our work. At the end of this section, Table 1 summarizes all reviewed articles in terms of their main topics, used algorithms and evaluation metrics. As mentioned in the previous section, the main goal of most papers presented here is leveraging machine learning methods to solve a particular problem. According to our exhaustive studies, not only are there very few research works concentrating on the performance analysis of machine and deep learning algorithms on different types of IoT problems, but the existing ones are also not really deep and thorough.

Paper [6] investigated the performance of five popular supervised machine learning algorithms on five different IoT datasets. Among all of the algorithms, Decision Trees achieved the highest accuracy of 99% for all datasets, whereas Logistic Regression and Naïve Bayesian showed the weakest performances. Authors in [7] exploited four machine learning algorithms with the aim of predicting the air-conditioning load in a shopping mall. Moreover, they studied hyperparameter optimization for the SVR model. The obtained results showed that Chaos-SVR and WD-SVR, which are hybrid models, outperform single ones; however, the complexity of hybrid models can increase considerably. The main objective of [8] is to compare different classification algorithms for an activity recognition problem. The data was produced by mobile phones' accelerometer sensors and the Weka workbench was used for data analysis. Based on the results, the IB1 and IBk algorithms from the lazy classifier category had the best performance in terms of overall accuracy rate for the hand palm's position.

The purpose of Ever et al. in [9] is to determine the optimum method for prediction problems using nine datasets and four machine learning algorithms. They concluded that an increase or decrease in the number of dataset samples does not influence the performance of the algorithms directly, while the features of the dataset do. Furthermore, neural network models were able to produce better and more stable results. [10] leveraged a new stochastic algorithm for time series forecasting, called the Conditional Restricted Boltzmann Machine (CRBM), in order to predict the energy usage in an office building. The conducted experiments indicated that CRBMs performed better than Artificial Neural Networks and Hidden Markov Models. Research done in [11] compared the performance of a group of machine learning techniques to predict human actions in a smart home using some specific experiments. According to the results, Support Vector Machine gained the best accuracy rates over all combinations of input sensors, and when the network was equipped with all provided sensors, the most accurate outcomes were achieved.

Authors in [12] evaluated the performance of a wide range of common machine learning algorithms for the anomaly detection problem using smart building datasets. They proposed a recommendation framework that can be useful for choosing the most appropriate learning models according to the collected data. It is noticeable that auROC was used as the evaluation metric in this work. Paper [13] employed Linear Regression, Artificial Neural Networks and Support Vector Machines for vehicle classification. Measurements produced by road side sensors, including an accelerometer and a magnetometer, were used in this study. The final results showed that Logistic Regression had the best performance and reached 93.4% for the overall classification rate. Wireless-based indoor location classification with the help of machine learning algorithms was studied in [14]. This research classified four indoor rooms according to wireless signal strengths from seven various sources. Standard score and feature scaling are the normalization methods which were used in this work. According to the experiments, K-Nearest Neighbors had the best performance with around a 98% accuracy rate.

Baldominos et al. in [15] compared several well-known machine and deep learning models for human activity recognition by means of data generated from two smart devices: one worn on the wrist and another one in the pocket. The Extremely Randomized Trees and Random Forest algorithms showed the best results, using just the sensors
positioned on the wrist. Moreover, deep learning models were not able to produce competitive outcomes in the conducted experiments. The main contribution of [16] is the comparison of common machine learning algorithms using an activity recognition Android application. This paper performed its experiments in both offline and real-time situations. It has been shown that ANN achieved a 97% recognition rate, which was better than all previous experiments without using parameter search. The results also indicated that using just one sensor can lead to generating the same data for different activities; therefore, a single sensor is not always sufficient for getting high recognition rates. The goal of the authors in [17] is the analysis of eight data mining and deep learning algorithms on three Internet of Things datasets. According to the experiments, not only did C4.5 and C5.0 obtain better accuracy, but they also had higher processing speeds and consumed memory more efficiently. In [18] a comparative analysis of three supervised algorithms was proposed. The Case-Based Reasoner (CBR) reached an accuracy of 92%, which was the highest among the others. In addition, with respect to the results, when datasets are small the performance of K-Nearest Neighbors is better, while Naïve Bayesian performance stays almost constant even if the size of the dataset and the number of features increase.

3. Research Methodology

In this section we are going to describe our approach, especially our selected algorithms, the datasets and their characteristics, and the evaluation metrics that we will use in Section 4 for performance comparison.

3.1. Machine and Deep Learning Algorithms

Up until now, many machine learning algorithms have been proposed by researchers. Each of these algorithms possesses its own strengths and weaknesses, which make it suitable for specific domains. More novel versions of these algorithms, which have recently gained tremendous popularity, are known as deep learning algorithms. However, the advent of these methods stems from biological neural networks, whose introduction to computer science dates back many years. In our study, not only do we use machine learning algorithms, but some of the most common deep learning techniques are also chosen to be evaluated on IoT datasets. The performance analysis and comparison of this noticeable number of algorithms on datasets from various IoT domains and with unique characteristics, and the results produced in this thorough investigation, will help researchers and practitioners of related fields to achieve better knowledge and insight for choosing the right algorithms for their problems. The machine learning algorithms used in this paper are Logistic Regression (LR) [19], K-Nearest Neighbors (KNN) [20], Gaussian Naïve Bayesian (GNB) [21], Decision Trees (DT) [22], Random Forests (RF) [23], Support Vector Machine (SVM) [24], Stochastic Gradient Descent Classifier (SGDC) [25] and Adaboost [26]. Moreover, among deep learning algorithms, we analyze the performance of Artificial Neural Networks (ANN) [27], Convolutional Neural Networks (CNN) [5] and Long Short-Term Memory (LSTM) [28] as the most renowned type of recurrent neural network model.
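To make this algorithm set concrete, the following is a minimal sketch of how the eight classical classifiers could be instantiated with scikit-learn, the library used for the experiments in Section 4; the hyperparameter values shown here are illustrative assumptions, not the exact configurations tuned in this study.

# A minimal sketch of the eight classical classifiers with scikit-learn.
# Hyperparameter values are illustrative assumptions, not the settings
# actually tuned for the experiments in this paper.
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.svm import SVC

classifiers = {
    "LR": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "GNB": GaussianNB(),
    "DT": DecisionTreeClassifier(random_state=0),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": SVC(kernel="rbf"),
    "SGDC": SGDClassifier(max_iter=1000, tol=1e-3),
    "Adaboost": AdaBoostClassifier(n_estimators=50),
}

The deep models (ANN, CNN and LSTM) are built with TensorFlow and Keras instead; a sketch of one such network is given in Section 4.3.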
3.2. Datasets

The type of data we work with is an influential and determinative issue in selecting the algorithms that best fit our problems. As mentioned earlier, most of the studies conducted so far on the performance evaluation of machine and deep learning methods have focused on a certain type of problem. In addition, just a few of them have worked on IoT-related datasets. A valuable contribution of this paper is using different IoT datasets coming from various scopes and with distinct features. All of these datasets belong to smart environments and have been generated by means of IoT-compatible devices and equipment. Six datasets are used in this work, named DS1 to DS6 for simplicity. In the following, a brief description of all datasets is given and a summary of their features is provided in Table 2.

DS1 [29]: This is a transportation mode detection dataset collected using smartphones. Transportation mode detection/recognition can be considered a Human Activity Recognition (HAR) [30] task, but it aims to identify the type of transportation individuals are using. Three different types of sensors have been used for data collection in this dataset, namely accelerometer, gyroscope and sound. There are five target classes: car, bus, train, walking and still.

DS2 [31]: This dataset is related to occupancy detection of an office room with the help of four environmental features, namely light, temperature, humidity and CO2. Real-time occupancy detection and estimation are crucial issues in making buildings and indoor places smarter when it comes to energy efficiency. The dataset only focuses on binary classification, which means there are just two target classes or labels (0 and 1) that identify whether the room is occupied by people or not.

DS3 [32]: This is an activity recognition dataset containing data from a wearable accelerometer mounted on the chest. The data were collected from 15 participants performing 7 activities. Each class is made up of one activity or a combination of several activities. The classes are: 1) working at computer, 2) standing up, walking and going up/down stairs, 3) standing, 4) walking, 5) going up/down stairs, 6) walking and talking with someone, and 7) talking while standing.

DS4 [33]: Activities of daily living (ADL) information related to 14 participants, along with their medical records, has been collected in this dataset. It has been designed for a fall detection system and has been gathered by means of wearable motion sensors positioned on the participants' bodies. Target classes of the dataset include standing, walking, sitting, falling, cramps and running.

DS5 [34]: The fifth dataset contains daily weather observations from a large number of weather stations in Australia. It is a huge dataset with values gathered from different weather sensors and measuring equipment. The collected data are about temperature, rainfall, evaporation, sunshine, direction and speed of wind, humidity, pressure and clouds. This dataset is also for a binary classification
problem, in which it can be used to predict whether it will be rainy tomorrow or not.

DS6 [35]: This dataset consists of pump sensor data for predictive maintenance. It includes data from 52 sensors which monitor the status of a water pump in a small area, in order to predict the failure time of the pump. The aim of this dataset is to discover whether the water pump is working fine or not. Therefore, this is also related to a binary classification problem.
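Since all six datasets are tabular, the preparation pipeline can be kept uniform. The sketch below shows one hedged way to load and split such a dataset with pandas and scikit-learn; the file name, label column and split ratio are hypothetical placeholders, not the actual files or settings used in this work.

# Hedged sketch of preparing one of the tabular IoT datasets.
# "ds2_occupancy.csv" and the "label" column are hypothetical placeholders.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("ds2_occupancy.csv")
X = df.drop(columns=["label"]).values
y = df["label"].values

# Hold out a test set and standardize the features (z-score scaling);
# the 80/20 split is an assumption for illustration.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
scaler = StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)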
3.3. Performance Evaluation Metrics

After applying machine learning algorithms, we need some tools to find out how well they performed their jobs. These tools are called performance evaluation metrics. A significant number of metrics have been introduced in the literature, where each one considers certain aspects of an algorithm's performance. Thus, for each machine learning problem we require an appropriate set of metrics for performance evaluation. In this paper, we use several common metrics for classification problems to obtain valuable information about the performance of the algorithms and to run a comparative analysis. These metrics are precision, recall [36], f1-score [37], accuracy, confusion matrix and ROC-AUC score [38, 39].

1) Precision: It simply shows "what number of selected data items are relevant". In other words, out of the observations that an algorithm has predicted to be positive, how many are actually positive. According to formula (1), the precision equals the number of true positives divided by the sum of true positives and false positives:

Precision = TP / (TP + FP)    (1)

2) Recall: It presents "what number of relevant data items are selected". In fact, out of the observations that are actually positive, how many have been predicted by the algorithm. According to formula (2), the recall equals the number of true positives divided by the sum of true positives and false negatives:

Recall = TP / (TP + FN)    (2)

3) F1-Score: It is the harmonic mean of precision and recall, combining the two metrics into a single value, as shown in formula (3):

F1-Score = 2 × (Precision × Recall) / (Precision + Recall)    (3)

4) Accuracy: It is the most used and maybe the first choice for evaluating an algorithm's performance in classification problems. It can be defined as the ratio of accurately classified data items to the total number of observations (formula (4)). Despite its widespread usability, accuracy is not the most appropriate performance metric in some situations, especially in cases where the target variable classes in the dataset are unbalanced.

Accuracy = number of correctly classified items / total number of observations    (4)

5) Confusion Matrix: This matrix is one of the most intuitive and descriptive metrics used to find the accuracy and correctness of a machine learning algorithm. Its main usage is in classification problems where the output can contain two or more types of classes. For more information see [40].

6) ROC-AUC score: This metric is calculated using the ROC curve (receiver operating characteristic curve), which represents the relation between the true positive rate (aka sensitivity or recall) and the false positive rate (1 - specificity). The Area Under the ROC Curve, or ROC-AUC, is used for binary classification and demonstrates how good a model is at discriminating positive and negative target classes. Especially if the importance of the positive and negative classes is equal for us, the ROC-AUC score can be a useful performance metric.
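All of these metrics are available in scikit-learn; a small sketch of how they could be computed for one trained model is shown below. The weighted averaging used for the multi-class case is an assumption of the sketch, not a choice documented in this paper.

# Sketch of computing the metrics of Section 3.3 with scikit-learn.
# y_true are ground-truth labels, y_pred are model predictions and y_score
# are positive-class scores (only meaningful for binary problems).
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix, roc_auc_score)

def evaluate(y_true, y_pred, y_score=None):
    report = {
        "precision": precision_score(y_true, y_pred, average="weighted"),
        "recall": recall_score(y_true, y_pred, average="weighted"),
        "f1_score": f1_score(y_true, y_pred, average="weighted"),
        "accuracy": accuracy_score(y_true, y_pred),
        "confusion_matrix": confusion_matrix(y_true, y_pred),
    }
    if y_score is not None:  # ROC-AUC is reported for binary datasets only
        report["roc_auc"] = roc_auc_score(y_true, y_score)
    return report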
4. Experiments and Results

In this section, we evaluate and compare the performance of the aforementioned algorithms on the 6 datasets. The experiments are divided into two parts. In the first part, the algorithms will be analyzed using evaluation metrics including precision, recall, f1-score, accuracy (for both training and test sets) and execution time. For binary classification problems, we also benefit from the ROC-AUC score. In the second part of our experiments, we will conduct a more challenging analysis and will compare the proposed algorithms in terms of convergence speed on each dataset.

In developing the models, we tried to keep a proper balance between the values of the performance metrics, especially accuracy, and the models' execution times. In fact, for some models, deep models to be exact, we obtained high accuracies at the expense of higher execution time. However, when we work on IoT problems, processing time is of utmost importance in the evaluation of an IoT system's performance. As a consequence, for IoT systems and devices, we are not allowed to develop models which are extremely complex and need a lot of time-consuming computations. On the other hand, we should not forget that most IoT devices are resource and energy-constrained. Thus, the models developed in this study do not have high execution times, but at the same time are able to produce good results.

Although the implemented machine and deep learning algorithms for each dataset are compared in one table, we will discuss some interesting points about the deep models in particular. We should note that for datasets number 3 and 5, we omitted SVM from our tables, in that its execution time exceeded our threshold immensely. In the tables, the highest figures for each evaluation metric appear in boldface.

All the experiments were conducted on a system with an Intel Core i7 CPU at 2.4 GHz, 8 GB of RAM, 1 TB of secondary storage, running Windows 10 and Python 3.7. We used the Spyder IDE for model development and leveraged machine and deep learning libraries and packages in scikit-learn, TensorFlow and Keras.
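The first experiment then reduces to a simple loop over the models: fit, time and score each one. The sketch below illustrates this idea, assuming the classifiers dictionary and the train/test splits sketched earlier are available; it is not the exact harness used to produce the reported numbers.

# Illustrative evaluation loop for Experiment 1: fit each model, measure its
# execution time and record training/test accuracy. Assumes `classifiers`,
# X_train, y_train, X_test and y_test from the earlier sketches.
import time
from sklearn.metrics import accuracy_score

results = {}
for name, model in classifiers.items():
    start = time.time()
    model.fit(X_train, y_train)
    elapsed = time.time() - start
    results[name] = {
        "train_accuracy": accuracy_score(y_train, model.predict(X_train)),
        "test_accuracy": accuracy_score(y_test, model.predict(X_test)),
        "execution_time_sec": elapsed,
    }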
4.1. Experiment 1: Performance Comparison using Evaluation Metrics

1) DS1: This is a completely balanced dataset, meaning that for each target class there is an equal number of labels. Therefore, the accuracy rate can be a quite reliable performance metric for evaluating the learning models. According to table 3, in general, RF had the best performance and obtained better results for almost all metrics. The average accuracy rate of RF on the test set is 85% and its highest accuracy is 87%. However, the comparison of training and test set accuracies indicates that RF and especially DT suffer from overfitting. In terms of execution time, GNB was the fastest, followed by KNN, but it is clear that GNB had the worst performance among the other algorithms. As was predictable, the deep learning algorithms, namely ANN, CNN and LSTM, considering their structures, had the highest execution times, and LSTM was the slowest. One outstanding point is that CNN performed very well and was much faster than ANN and LSTM.

2) DS2: This dataset is related to a binary classification problem. We should mention that it has a feature named "light" which can highly bias the results. In other words, by just using this feature we can achieve 99% accuracy. Owing to this, we eliminated this feature from the original dataset so as to make our analysis much more interesting and challenging. As we can see in table 4, there are several algorithms which demonstrated outstanding performance on this dataset. For precision and f1-score, KNN and RF had the highest values with 98%. For recall, SVM together with these two algorithms obtained the best results and peaked at 98%.
Table 4. Performance comparison of the algorithms on DS2

Algorithm | Precision (%) | Recall (%) | F1-Score (%) | Training Set Accuracy (%) | Test Set Accuracy (%) Avg/Highest | ROC-AUC Score | Execution Time (Sec)
LR | 79 | 85 | 81 | 84 | 84/85 | 0.85 | 0.11
GNB | 78 | 78 | 78 | 83 | 83/84 | 0.78 | 0.005
KNN | 98 | 98 | 98 | 99 | 98/99 | 0.98 | 0.01
DT | 97 | 97 | 97 | 100 | 98/98 | 0.97 | 0.06
RF | 98 | 98 | 98 | 100 | 98/99 | 0.98 | 1.1
SVM | 96 | 98 | 97 | 91 | 97/98 | 0.98 | 2.6
SGDC | 80 | 83 | 81 | 85 | 84/86 | 0.83 | 0.08
Adaboost | 89 | 85 | 87 | 92 | 91/92 | 0.85 | 1.05
ANN | 94 | 95 | 94 | 96 | 96/96 | 0.95 | 24.52
CNN | 94 | 95 | 95 | 96 | 94/96 | 0.95 | 36.06
LSTM | 87 | 85 | 86 | 90 | 89/90 | 0.85 | 52.22
The highest figures for test set accuracy belong to KNN and RF, whereas DT and RF gained complete accuracy on the training set. When it comes to the ROC-AUC score, KNN, RF and SVM were the leading algorithms with 0.98 out of 1. Again, GNB had the lowest execution time. Among the deep learning algorithms, the performance of ANN and CNN was better, but their execution times are not comparable with those of the machine learning models.

3) DS3: This dataset possesses the greatest number of samples among the datasets in this study. With respect to table 5, RF and KNN reached 92% accuracy on the training set, which was the highest number. The best results for recall and f1-score belong to KNN, while RF overtook KNN a little bit in the precision metric. It is noticeable that GNB is the only algorithm which processed all the data in less than a second. Another remarkable point is that, for this dataset, LR produced the worst results for almost all metrics and, among the machine learning models, it was the slowest with 17.11 seconds. The deep models obtained quite similar results and their test set accuracies were around 75%, but ANN's execution time was much lower than CNN and LSTM.

4) DS4: Looking at table 6, it can be seen that RF was the top-performing algorithm in most cases except in execution time. This algorithm achieved an accuracy of 79% on the test set, which was more than the accuracy rates gained by the other algorithms. Moreover, RF processed all the data in 1.2 seconds, which is an acceptable amount of time. The training set accuracy for RF and DT was 100%, which means these algorithms are overfitted. GNB was the most inefficient algorithm on this dataset and its accuracy rate never exceeded 16%. CNN came first among the deep learning models, and its execution time of 33.5 seconds was better than ANN and LSTM.
Table 6. Performance comparison of the algorithms on DS4

Algorithm | Precision (%) | Recall (%) | F1-Score (%) | Training Set Accuracy (%) | Test Set Accuracy (%) Avg/Highest | Execution Time (Sec)
LR | 25 | 28 | 26 | 40 | 40/41 | 3.04
GNB | 21 | 23 | 11 | 13 | 14/16 | 0.005
KNN | 67 | 69 | 68 | 76 | 66/68 | 0.009
DT | 70 | 71 | 71 | 100 | 70/73 | 0.07
RF | 77 | 76 | 76 | 100 | 76/79 | 1.2
SVM | 63 | 69 | 65 | 73 | 67/68 | 18.12
SGDC | 26 | 27 | 25 | 32 | 33/39 | 0.22
Adaboost | 47 | 41 | 42 | 45 | 43/47 | 1.2
ANN | 50 | 50 | 48 | 51 | 58/60 | 43.17
CNN | 61 | 60 | 58 | 61 | 60/61 | 33.5
LSTM | 21 | 24 | 22 | 31 | 31/33 | 69.42
Table 8. Performance comparison of the algorithms on DS6 (execution time and confusion matrix; matrix rows are the actual classes 0 and 1, columns are the predicted classes 0 and 1)

Algorithm | Execution Time (Sec) | Actual 0: predicted 0, 1 | Actual 1: predicted 0, 1
LR | 5.74 | 11598, 10 | 12, 291
GNB | 0.18 | 11429, 179 | 2, 301
KNN | 0.73 | - | -
DT | 3.38 | 11607, 1 | 3, 300
RF | 1.8 | - | -
SVM | 11.65 | - | -
SGDC | 0.71 | 11589, 19 | 10, 293
Adaboost | 34.01 | - | -
ANN | 20.3 | - | -
CNN | 43.91 | - | -
LSTM | 141.9 | 11603, 5 | 44, 259
predicted 1 for the target class, whereas the actual value was 0. This competition becomes more interesting when we observe that RF processed this dataset in only 1.8 seconds, while the figure for Adaboost was 34 seconds. Therefore, the final winner was RF.

4.2. Experiment 2: Convergence Speed Comparison

In this experiment, we first train the algorithms with 5 percent of the data, then increase the volume of input data to 10, 30 and 50 percent and repeat the experiment. For DS5 these proportions are 1, 5, 10 and 30 percent, and for DS3 these figures are 0.1, 0.5, 1 and 5 percent. This experiment is only carried out on DS1 to DS5.
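One way to realize this convergence-speed experiment is to retrain a fresh copy of each model on growing random subsets of the training data and record the test accuracy after each step, as sketched below; the sampling strategy and the random seed are assumptions of the sketch rather than the exact procedure used here.

# Hedged sketch of the convergence-speed experiment: train on increasing
# fractions of the training data and record test accuracy for each fraction.
import numpy as np
from sklearn.base import clone
from sklearn.metrics import accuracy_score

def convergence_curve(model, X_train, y_train, X_test, y_test, fractions):
    rng = np.random.default_rng(42)
    n = len(X_train)
    curve = {}
    for frac in fractions:
        idx = rng.choice(n, size=max(1, int(frac * n)), replace=False)
        fitted = clone(model).fit(X_train[idx], y_train[idx])
        curve[frac] = accuracy_score(y_test, fitted.predict(X_test))
    return curve

# Example schedule matching the DS1 rows of Table 9:
# convergence_curve(classifiers["RF"], X_train, y_train, X_test, y_test,
#                   [0.05, 0.10, 0.30, 0.50])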
Table 9. Convergence speed comparison of the algorithms on different datasets (test set accuracy in %)

DS | Training Data Volume | LR | GNB | KNN | DT | RF | SVM | SGDC | Adaboost | ANN | CNN | LSTM
DS1 | 5% | 53 | 47 | 57 | 57 | 77 | 70 | 37 | 43 | 50 | 57 | 50
DS1 | 10% | 61 | 56 | 73 | 73 | 75 | 63 | 47 | 53 | 64 | 66 | 59
DS1 | 30% | 66 | 62 | 76 | 73 | 82 | 75 | 59 | 64 | 73 | 63 | 62
DS1 | 50% | 68 | 56 | 81 | 76 | 88 | 80 | 61 | 71 | 79 | 79 | 71
DS2 | 5% | 81 | 84 | 94 | 94 | 94 | 93 | 85 | 91 | 91 | 92 | 85
DS2 | 10% | 84 | 85 | 90 | 92 | 93 | 93 | 83 | 87 | 89 | 88 | 84
DS2 | 30% | 86 | 83 | 99 | 97 | 98 | 97 | 87 | 92 | 93 | 95 | 85
DS2 | 50% | 85 | 83 | 98 | 98 | 99 | 97 | 85 | 92 | 93 | 91 | 88
DS3 | 0.1% | 53 | 59 | 77 | 75 | 73 | 73 | 62 | 57 | 51 | 64 | 23
DS3 | 0.5% | 51 | 62 | 76 | 71 | 74 | 69 | 49 | 60 | 64 | 68 | 42
DS3 | 1% | 52 | 61 | 74 | 72 | 75 | 71 | 51 | 52 | 65 | 70 | 56
DS3 | 5% | 51 | 61 | 76 | 75 | 79 | 70 | 50 | 55 | 69 | 73 | 59
DS4 | 5% | 43 | 18 | 52 | 55 | 61 | 45 | 29 | 46 | 52 | 46 | 21
DS4 | 10% | 43 | 20 | 63 | 66 | 72 | 66 | 39 | 34 | 43 | 46 | 25
DS4 | 30% | 39 | 17 | 68 | 66 | 74 | 67 | 32 | 37 | 52 | 53 | 28
DS4 | 50% | 32 | 12 | 67 | 67 | 74 | 65 | 31 | 36 | 47 | 53 | 31
DS5 | 1% | 78 | 71 | 77 | 77 | 79 | 85 | 81 | 83 | 84 | 75 | 79
DS5 | 5% | 85 | 72 | 80 | 78 | 85 | 87 | 81 | 85 | 84 | 81 | 80
DS5 | 10% | 85 | 75 | 78 | 78 | 84 | 85 | 81 | 85 | 84 | 82 | 79
DS5 | 30% | 84 | 72 | 79 | 78 | 84 | 84 | 81 | 84 | 85 | 84 | 78
the end. Adaboost, ANN and CNN are other successful algorithms for this dataset.

Fig. 3 shows that, by using just 0.1% of the total data, KNN became the fastest learner with an accuracy of 77%. Its outstanding performance was followed by DT, RF, SVM and CNN. It is interesting to note that when the volume of input data reached 5%, it was RF which came first with 79% accuracy.

With respect to Fig. 4, RF overtook the other algorithms for every volume of input data. Its accuracy started at 61% for 5% of the input data and peaked at 74% when the training data increased to 50% of the whole dataset. DT, KNN and SVM also produced good results, but they were not able to gain an accuracy rate higher than 67%. One surprising point is that none of the deep learning models could achieve acceptable results on this dataset, and the best accuracy they could obtain was only 53%.

Looking at table 9, it can be seen that most of the algorithms performed quite well on DS5. Maybe the main reason is that this dataset, like DS2, is related to a binary classification problem. According to Fig. 5, when our algorithms were trained with 1% of the input data, SVM, ANN and Adaboost, with 85%, 84% and 83% respectively, had the highest test set accuracy rates. However, once the volume of training data increased to 30% of the total data, LR, RF and CNN progressed very well and reached accuracy rates between 84% and 85%. Overall, our models showed small fluctuations in accuracy rates for this dataset, and GNB was the weakest algorithm in comparison with the others.
Fig. 1: Convergence speed comparison of top algorithms on DS1 (test set accuracy (%) versus ratio of input data to the entire dataset; curves for KNN, DT, RF, SVM and CNN).

Fig. 2: Convergence speed comparison of top algorithms on DS2 (test set accuracy (%) versus ratio of input data to the entire dataset; curves for KNN, DT, RF and SVM).
Fig. 3: Convergence speed comparison of top algorithms on DS3 (test set accuracy (%) versus ratio of input data to the entire dataset; curves for KNN, DT, RF, SVM and CNN).

Fig. 4: Convergence speed comparison of top algorithms on DS4 (test set accuracy (%) versus ratio of input data to the entire dataset; curves for KNN, DT, RF and SVM).
Fig. 5: Convergence speed comparison of top algorithms on DS5 (test set accuracy (%) versus ratio of input data to the entire dataset; curves for LR, RF, SVM, Adaboost, ANN and CNN).

4.3. Discussion

Finally, we close this section by giving some concluding points on the achieved results.

1) LR performed much better on binary classification than on multi-class classification problems.

2) GNB and SGDC showed the weakest performance in most cases.

3) In almost all problems, GNB was the fastest algorithm in terms of execution time.

4) Because of the recurrent structure of LSTM, it took the highest execution time for the processing of most datasets. However, for datasets DS3 and DS5, SVM was disappointingly slow and, as a consequence, we eliminated its results from our tables.

5) RF and DT were the two algorithms which showed a higher probability of overfitting.

6) Among the deep learning models, CNN produced surprising results. Many research studies use this algorithm mostly for image processing and computer vision problems, but in this study, by conducting different experiments and analyses, we saw that CNN is able to perform very well on typical classification problems. In fact, not only did CNN achieve interesting values for our evaluation metrics, but it also appeared faster in many cases in comparison with ANN and LSTM (a minimal sketch of such a network is given after this list).

7) In relation to the convergence speed experiment, KNN, DT, RF and SVM were the best among the machine learning models, while among the deep models, ANN and CNN performed better than LSTM.

8) According to our comprehensive studies and experiments, deep models are able to produce extremely good outcomes if they benefit from a well-designed architecture and near-optimal hyperparameter tuning. But the key point is that with a complex architecture, execution time will grow dramatically. Due to this, it may seem that using deep learning algorithms for time-critical IoT problems is not a good idea. On the other hand, we cannot easily ignore the power of these algorithms for processing sophisticated problems. Thus, it is suggested that by leveraging state-of-the-art techniques, including distributed and federated learning [41, 42], which spread workloads among processing nodes, we would be able to implement deep learning models on resource and energy-constrained IoT devices.
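As referenced in point 6, the following is a minimal Keras sketch of a 1D convolutional classifier of the kind discussed there; the layer sizes, optimizer and loss are illustrative assumptions, not the architecture actually tuned in this study.

# Minimal sketch of a 1D CNN classifier for tabular IoT data with Keras.
# Layer sizes and training settings are illustrative assumptions only.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, MaxPooling1D, Flatten, Dense

def build_cnn(n_features, n_classes):
    model = Sequential([
        # treat each sample as a length-n_features sequence with one channel
        Conv1D(32, kernel_size=3, padding="same", activation="relu",
               input_shape=(n_features, 1)),
        MaxPooling1D(pool_size=2),
        Conv1D(64, kernel_size=3, padding="same", activation="relu"),
        Flatten(),
        Dense(64, activation="relu"),
        Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Inputs must be reshaped to (samples, n_features, 1), e.g. with
# X_train.reshape(-1, X_train.shape[1], 1), before calling model.fit().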
5. Conclusion

In this paper, we carried out comprehensive experiments to evaluate the performance of several machine and deep learning algorithms, namely LR, GNB, KNN, DT, RF, SVM, SGD Classifier, Adaboost, ANN, CNN and LSTM. The distinction between this study and many other studies is that it targets Internet of Things environments and concentrates on IoT-related datasets for classification problems. We conducted two separate experiments to assess our models: one for investigating the performance of the models in terms of several evaluation metrics, and another for measuring how fast these models can learn. According to the results, RF overtook the other machine learning algorithms for most metrics, while the execution time of GNB was lower than the others. With respect to the deep learning algorithms, ANN and CNN were the ones that obtained the best results. When it comes to convergence speed, it was again RF which could learn fastest when the volume of input training data was remarkably small.

References

[1] L. Atzori, A. Iera, and G. Morabito, "The internet of things: A survey," Computer Networks, vol. 54, 2010, pp. 2787-2805.

[2] ERC: European Research Cluster of the Internet of Things, Accessed on: Dec. 4, 2019. [Online]. Available: http://www.internet-of-things-research.eu/about_iot.htm.

[3] M. Chen, S. Mao, and Y. Liu, "Big data: A survey," Mobile Networks and Applications, vol. 19, no. 2, 2014, pp. 171-209.

[4] Z. Ghahramani, "Probabilistic machine learning and artificial intelligence," Nature, vol. 521, no. 7553, 2015, pp. 452-459.

[5] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, 2015, pp. 436-444.

[6] V. Khadse, P. N. Mahalle, and S. V. Biraris, "An Empirical Comparison of Supervised Machine Learning Algorithms for Internet of Things Data," in Proceedings of the Fourth International Conference on Computing Communication Control and Automation (ICCUBEA), 2018, pp. 1-6.

[7] Z. Xuan, F. Zhubing, L. Liequan, Y. Junwei, and P. Dongmei, "Comparison of four algorithms based on machine learning for cooling load forecasting of large-scale shopping mall," Energy Procedia, vol. 142, 2017, pp. 1799-1804.

[8] M. A. Ayu, S. A. Ismail, A. F. A. Matin, and T. Mantoro, "A comparison study of classifier algorithms for mobile-phone's accelerometer based activity recognition," Procedia Engineering, vol. 41, 2012, pp. 224-229.

[9] Y. K. Ever, K. Dimililer, and B. Sekeroglu, "Comparison of Machine Learning Techniques for Prediction Problems," in Workshops of the International Conference on Advanced Information Networking and Applications, 2019, pp. 713-723.

[10] E. Mocanu, P. H. Nguyen, M. Gibescu, and W. L. Kling, "Comparison of machine learning methods for estimating energy consumption in buildings," in Proceedings of the International Conference on Probabilistic Methods Applied to Power Systems (PMAPS), 2014, pp. 1-6.

[11] B. M. H. Alhafidh and W. H. Allen, "Comparison and performance analysis of machine learning algorithms for the prediction of human actions in a smart home environment," in Proceedings of the International Conference on Compute and Data Analysis, 2017, pp. 54-59.

[12] F. Almaguer-Angeles, J. Murphy, L. Murphy, and A. O. Portillo-Dominguez, "Choosing Machine Learning Algorithms for Anomaly Detection in Smart Building IoT Scenarios," in IEEE 5th World Forum on Internet of Things (WF-IoT), 2019, pp. 491-495.

[13] D. Kleyko, R. Hostettler, W. Birk, and E. Osipov, "Comparison of machine learning techniques for vehicle classification using road side sensors," in Proceedings of the IEEE 18th International Conference on Intelligent Transportation Systems, 2015, pp. 572-577.

[14] K. Sabanci, E. Yigit, D. Ustun, A. Toktas, and M. F. Aslan, "WiFi Based Indoor Localization: Application and Comparison of Machine Learning Algorithms," in 23rd International Seminar/Workshop on Direct and Inverse Problems of Electromagnetic and Acoustic Wave Theory (DIPED), 2018, pp. 246-251.

[15] A. Baldominos, A. Cervantes, Y. Saez, and P. Isasi, "A Comparison of Machine Learning and Deep Learning Techniques for Activity Recognition using Mobile Devices," Sensors, vol. 19, no. 3, 2019, p. 521.

[16] J. Suto, S. Oniga, C. Lung, and I. Orha, "Comparison of offline and real-time human activity recognition results using machine learning techniques," Neural Computing and Applications, 2018, pp. 1-14.

[17] F. Alam, R. Mehmood, I. Katib, and A. Albeshri, "Analysis of eight data mining algorithms for smarter Internet of Things (IoT)," Procedia Computer Science, vol. 98, 2016, pp. 437-442.

[18] R. Chettri, S. Pradhan, and L. Chettri, "Internet of things: comparative study on classification algorithms (k-nn, naive bayes and case based reasoning)," International Journal of Computer Applications, vol. 130, no. 12, 2015, pp. 7-9.

[19] C. M. Bishop, Pattern Recognition and Machine Learning: Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2006.

[20] N. S. Altman, "An introduction to kernel and nearest-neighbor nonparametric regression," The American Statistician, vol. 46, no. 3, 1992, pp. 175-185.

[21] I. Rish, "An empirical study of the naive Bayes classifier," in IJCAI Workshop on Empirical Methods in Artificial Intelligence, 2001, pp. 41-46.
[22] W. Y. Loh, "Classification and regression trees," Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol. 1, no. 1, 2011, pp. 14-23.

[23] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, 2001, pp. 5-32.

[24] A. Ben-Hur, D. Horn, H. T. Siegelmann, and V. Vapnik, "Support vector clustering," Journal of Machine Learning Research, vol. 2, 2001, pp. 125-137.

[25] T. Zhang, "Solving large scale linear prediction problems using stochastic gradient descent algorithms," in Proceedings of the Twenty-First International Conference on Machine Learning, 2004, p. 116.

[26] M. Collins, R. E. Schapire, and Y. Singer, "Logistic regression, AdaBoost and Bregman distances," Machine Learning, vol. 48, 2002, pp. 253-285.

[27] I. N. Da Silva, D. H. Spatti, R. A. Flauzino, L. H. B. Liboni, and S. F. dos Reis Alves, "Artificial neural networks," Springer, 2017.

[28] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, 1997, pp. 1735-1780.

[29] C. Carpineti, V. Lomonaco, L. Bedogni, M. Di Felice, and L. Bononi, "Custom Dual Transportation Mode Detection by Smartphone Devices Exploiting Sensor Diversity," in Proceedings of the IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops), 2018, pp. 367-372.

[30] O. D. Lara and M. A. Labrador, "A survey on human activity recognition using wearable sensors," IEEE Communications Surveys & Tutorials, vol. 15, no. 3, 2012, pp. 1192-1209.

[31] L. M. Candanedo and V. Feldheim, "Accurate occupancy detection of an office room from light, temperature, humidity and CO2 measurements using statistical learning models," Energy and Buildings, vol. 112, 2016, pp. 28-39.

[32] P. Casale, O. Pujol, and P. Radeva, "Personalization and user verification in wearable systems using biometric walking patterns," Personal and Ubiquitous Computing, vol. 16, no. 5, 2012, pp. 563-580.

[33] A. Özdemir and B. Barshan, "Detecting falls with wearable sensors using machine learning techniques," Sensors, vol. 14, no. 6, 2014, pp. 10691-10708.

[34] Rain in Australia dataset, Accessed on: Dec. 4, 2019. [Online]. Available: https://www.kaggle.com/jsphyg/weather-dataset-rattle-package.

[35] Pump sensor dataset, Accessed on: Dec. 4, 2019. [Online]. Available: https://www.kaggle.com/nphantawee/pump-sensor-data.

[36] D. M. Powers, "Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation," Journal of Machine Learning Research, vol. 2, no. 1, 2011, pp. 37-63.

[37] Y. Sasaki, "The truth of the F-measure," Teach Tutor Mater, vol. 1, no. 5, 2007, pp. 1-5.

[38] T. Fawcett, "An introduction to ROC analysis," Pattern Recognition Letters, vol. 27, no. 8, 2006, pp. 861-874.

[39] C. D. Brown and H. T. Davis, "Receiver operating characteristics curves and related decision measures: A tutorial," Chemometrics and Intelligent Laboratory Systems, vol. 80, no. 1, 2006, pp. 24-38.

[40] C. Sammut and G. I. Webb, Encyclopedia of Machine Learning: Springer Science & Business Media, 2011.

[41] Z. Zhou, X. Chen, E. Li, L. Zeng, K. Luo, and J. Zhang, "Edge Intelligence: Paving the Last Mile of Artificial Intelligence with Edge Computing," arXiv preprint arXiv:1905.10083, 2019.

[42] T. Li, A. K. Sahu, A. Talwalkar, and V. Smith, "Federated learning: Challenges, methods, and future directions," arXiv preprint arXiv:1908.07873, 2019.