Driver Identification Using Only the CAN-Bus Vehicle Data Through an RCN Deep Learning Approach
PII: S0921-8890(20)30547-9
DOI: https://doi.org/10.1016/j.robot.2020.103707
Reference: ROBOT 103707
Please cite this article as: N. Abdennour, T. Ouni and N.B. Amor, Driver identification using only
the CAN-Bus vehicle data through an RCN deep learning approach, Robotics and Autonomous
Systems (2020), doi: https://doi.org/10.1016/j.robot.2020.103707.
Received 15 January 2020
Abstract
In recent years, many studies have claimed that humans have a unique driving behavior style that could be used as a fingerprint in recognizing the identity of the driver. With the rising evolution of Machine Learning (ML), the research efforts aiming to take advantage of these human driving style identifiers have been increasing exponentially. For Advanced Driver Assistance Systems (ADAS), this attribute can be an efficient factor to ensure the security and protection of the vehicle. Additionally, it extends the ADAS capabilities by creating different profiles for the drivers, which helps every driver according to his own driving style and improves the ADAS fidelity. Nonetheless, certain problems in the unpredictability of human behavior and the effectiveness of capturing the temporal features of the signal represent an ongoing challenge to accomplishing driver identification. In this paper, we propose a novel deep learning approach to driver identification based on a Residual Convolutional Network (RCN). This approach outperforms the existing state of the art methods in less than two hours of training, while simultaneously achieving 99.3% accuracy. The data used are provided exclusively by the Controller Area Network (CAN-Bus) of the vehicle, which eliminates any privacy-invading concerns for the user.
Keywords: Driver behavior; identification; machine learning; CNN; residual neural network; classification.
Sensors and their generated data can also be useful in ensuring the security of the vehicle by recognizing the identity of the driver. This is beneficial to the ADAS by ensuring the protection of the vehicle through alerting the owner in case of an unknown driver or an event of theft, and by simultaneously creating a driver profiles table that can customize the type of assistance according to the identity of the driver. Since ADAS are mostly designed for a single use case, driver profiles can boost the ADAS performances and improve their consistency by eliminating errors created by the changes of drivers.

2. Related work

Several approaches were explored in driver behavior recognition, with a persisting focus on using the latest advancements in machine learning algorithms [1]. We list chronologically the most distinguishable state of the art efforts using non-invasive sensors.

Wakita et al. (2005) [2] used a Gaussian Mixture Model (GMM) in a simulated environment, achieving 81% recognition accuracy. Miyajima et al. (2007) [3] also used a GMM for driver modeling based on driving behavior signals.
1 Corresponding author. E-mail: [email protected]
2 Corresponding author. E-mail: [email protected]
3 Corresponding author. E-mail: [email protected]
Hallac et al. (2016) [6] used a Random Forest (RF) classifier for an event-based detection on CAN-bus data, scoring a 76.9% accuracy. Enev et al. (2016) [7] also used Random Forest while counting on a sliding window pre-processing for time series analysis of CAN log signals, achieving an accuracy between 87% and 91% (depending on the number of drivers). Martínez et al. (2016) [8] used an Extreme Learning Machine (ELM) with several feature extraction methods, including temporal, frequency domain and cepstral features from CAN-bus data, with 96.95% accuracy for 2 drivers' identification and 84.36% for 11 drivers. Fung et al. (2017) [9] worked with Linear Discriminant Analysis (LDA) for event-based detection of acceleration and deceleration from GPS and On-Board Diagnostics 2 (OBD-II), producing 60.5% accuracy for 14 drivers. Moreira-Matias et al. (2017) [10] relied on stacked generalization of multiple ML methods, including SVM, RF, Decision Tree and K-Nearest Neighbors (KNN), using Floating Car Data (FCD) from a smartphone, with 88% identification accuracy. Jafarnejad et al. (2017) [11] utilized majority vote and maximum score while also exploiting various ML algorithms such as AdaBoost, Gradient Boosting, Random Forest, SVM and Extra Tree, with cepstral and statistical features from OBD-II and an inertial measurement unit (IMU), scoring between 89% and 95% identification accuracy depending on the number of drivers. Wang et al. (2017) [12] used Random Forest with 1000 estimators while exploiting a sliding window segmentation, standardization and statistical features from the CAN-bus signals, with 100% identification success for a training set of 4 hours and a test set of 1 hour. Marchegiani et al. (2018) [13] proposed a Universal Background Model (UBM) that benefits from both SVM and GMM, also with sliding window partitioning and cepstral features from CAN-bus and GPS, producing 83% driver recognition accuracy. Tahmasbi et al. (2018) [14] adopted a Gradient Boost Tree (GBT) with raw accelerometer and gyroscope axes as the main features after passing through a low pass filter, reaching an accuracy between 83% and 92% depending on the size of the train/test split, for 4 drivers' identification. Luo et al. (2018) [15] applied Random Forest using an overlapping sliding window segmentation. Zhang et al. (2019) [21] combined Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) in hybrid architectures such as DeepConvGRU-Attention, DeepConvLSTM-Attention and DeepConvGRU, after subjecting CAN-bus data to standardization and overlapping sliding window segmentation, to achieve an identification accuracy varying between 97.72% and 98.36%.

Most of these state of the art works managed to achieve a successful identification accuracy while sacrificing the latency and training time aspects in the process, or did the opposite by sacrificing the accuracy in favor of latency and training time. In our attempt to tackle this problem, we searched for the optimum calibration between these variables to manifest the best of our approach.

3. Methodology

3.1. Dataset description

To validate and evaluate our models, we used the Ocslab driving dataset [22], collected in South Korea back in 2015. The data consists of 51 different OBD-II signals extracted from the car's Electronic Control Unit (ECU) with a 1 Hz sampling rate. For a total of 23 hours, the car was driven by 10 different participants in two round trips along a 46 km path, creating 94,401 records overall. The experiment was performed in very different road conditions, including a city way, a motorway and a parking space that required cautious driving. Figure 1 shows the driving path map of the study.
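For illustration, the records can be loaded as a single table before any segmentation. The following minimal Python sketch assumes the dataset was exported to one CSV file with a column per OBD-II signal and a driver-label column named "Class"; the file name and column names are assumptions, not part of the archive description above.

import pandas as pd

# Load the Ocslab driving log: 94,401 rows at 1 Hz, 51 signal columns plus a
# driver label (10 participants). File/column names are assumed for illustration.
df = pd.read_csv("driving_dataset.csv")
labels = df["Class"]                      # driver identity, e.g. "A" ... "J"
signals = df.drop(columns=["Class"])      # the 51 CAN-bus/OBD-II signals
print(signals.shape, labels.nunique())    # expected: (94401, 51) and 10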
The feature values in one single second cannot provide enough information for the model to make a decision about the identity of the driver. Accordingly, we applied an overlapping sliding window segmentation to our data, with a window size of 1 minute and a step size of 6 seconds, as shown in Figure 2.

Fig. 2. Overlapping sliding windows for 15 normalized features of driver A.
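As a sketch, this segmentation can be implemented with a simple loop over the normalized signal matrix; the variable names below are illustrative, and labeling each window by any of its samples is our assumption (each recording session belongs to a single driver).

import numpy as np

def segment(series, labels, window=60, step=6):
    # series: (n_samples, n_features) normalized signals at 1 Hz
    # labels: (n_samples,) driver identity per record
    windows, targets = [], []
    for start in range(0, len(series) - window + 1, step):
        windows.append(series[start:start + window])
        targets.append(labels[start])     # one driver per session
    return np.stack(windows), np.array(targets)

# 1-minute windows advanced by a 6-second step, as described above.
X, y = segment(signals_norm, driver_ids)  # X: (n_windows, 60, n_features)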
The choice of the window size and step size was based on the approach of [21] for the same dataset, after our own individual data manipulation experimentation followed by a reflection and assessment stage.

3.3. ML based models

In the first development section, we focus on feature extraction and dimensionality reduction for machine learning models. As confirmed by the state of the art, machine learning models have proven their efficiency in driver identification. However, the classification techniques and the feature manipulation done before the classification can make the difference between good and bad identification models. The selection of classifiers in our study relied on the state of the art works reviewed above.

The first feature reduction technique we used is the feature importance evaluation of our data using forests of trees. As we go through the results section, we notice that the tree based classifiers are the most efficient for our dataset. Therefore, it is logical to use the Forest of Trees feature importance evaluation as the first method for feature reduction. This technique is also used by the works of [10], [12], [15], [19] and [21]. Feature importance evaluation is the process of testing the weight and significance of a unique feature on the classification in order to rank the overall features by importance. Forest of Trees importance evaluation reduces the criterion used in the split point selection to achieve this. To demonstrate, we present by order of importance the list of the 15 most important features.
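A minimal scikit-learn sketch of this ranking follows, assuming a record-level matrix X_rec of the 51 signals and driver labels y_rec (names are ours); the cumulative shares it prints correspond to the expression rates quoted in the next paragraph.

import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

forest = ExtraTreesClassifier(n_estimators=250, random_state=0)
forest.fit(X_rec, y_rec)
importances = forest.feature_importances_   # normalized: sums to 1.0
ranking = np.argsort(importances)[::-1]     # feature indices, best first

for k in (10, 15, 20):
    share = importances[ranking[:k]].sum()  # "expression rate" of the top k
    print(f"top {k} features cover {share:.2%} of total importance")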
In order to improve the classification accuracy, this technique helped reduce the number of features, leading to more emphasis and focus on the significant features according to our classifiers. We changed the number of important features selected three times to observe their effect on the accuracy of our models. According to our feature reduction technique, the usage of the 10 most important features represented an expression rate of 81.06% of the total feature importance, while the 15 most important features depicted an 87.05% expression rate and the 20 most important features represented 90.74% of our total feature importance.

Other techniques for feature extraction and dimensionality reduction that we used included Principal Component Analysis (PCA), a statistical procedure that helps reduce the dimensions of the data. The deep learning based models are the object of the next part of our study.

3.4. Deep learning based models

In order to classify our drivers, we created a deep learning model capable of extracting the main characteristics from the driving style. This model enables us to have the required information for the identification process and performs the classification. To create a model that manages to fit the problem and the data suitably, we generated two convolutional based DL models. The first one has a standard architecture influenced by the AlexNet model [23], with distinctive components that target reducing the training time of our model and the time series aspect of the data. We founded our second DL model on the basis of our first model, with an additional ResNet block [24].
Fig. 3. The architecture flowchart of our DeepCNN model and DeepRCN model.
As shown in figure 3, the proposed DeepCNN model consists of an input layer that introduces the shape of the data to the model, 4 convolutional layers coupled with batch normalization, 3 max pooling layers with dropout regularization, 1 max pooling layer, 1 flatten layer, 1 fully connected layer with batch normalization and 1 fully connected layer as the output layer. We composed our DeepRCN model with the same structure as the DeepCNN model, except for the additional residual block. This residual block comes after the first three layers and includes 1 additional convolutional layer with batch normalization, 1 convolutional layer with batch normalization and dropout regularization, and a new add function combined with an activation function that links the new block to the rest of the structure.

The input layer is responsible for introducing the shape of the data to the model. As the model includes convolutional layers, we define our input shape by equation (2):

Shape = (samples_nb) × (S_width) × (S_height) × (S_depth), (2)

where samples_nb is the number of data samples, S_width is the width of one sample, which correlates to the window size for our time series data, S_height is the height of one sample, represented by the number of features, and S_depth is the depth of one sample, depicted as the number of input data dimensions.

The hidden layers are mainly:

• Dropout regularization, which randomly drops units and connections in the training process to avoid units' co-adaptations [30].

• Fully connected layers, which are feedforward perceptrons accountable for the classification of the features extracted by the convolutional layers.

• Flatten function, which reshapes the multi-dimensional data outputted from the convolutional layers into the one-dimensional linear vector inputted to the fully connected layers.

• Add function, which sums up the features extracted from different blocks to pass them forward to the next layer, conjoined with an activation function.

We also note that we activated all the convolutional and fully connected layers, excluding the output layer, through a ReLU (Rectified Linear Unit) activation function [31]. We define the ReLU function by equation (3):

f(x) = max(0, x), (3)

The usage of ReLU is imposed by its computational efficiency and effectiveness, especially with convolutional layers [32].

The output layer produces a value between 0 and 1 for every class of our 10 drivers, which describes the driver prediction certitude concluded from the input data. This is possible via a fully connected layer activated by the Softmax function [33], defined by equation (4):

f(x_i) = e^(x_i) / Σ_j e^(x_j), (4)

Table 1 details the layers hyperparameters of our DeepRCN model; the DeepCNN model uses the same hyperparameters, considering it contains the same architecture except for the additional layers of the residual block.
Table 1
Layers hyperparameters of our DeepRCN model using 15 features.

Layer | Output shape | Unique parameters | Activation
Input layer | (samples_nb, 60, 15, 1) | - | -
Conv + batch norm | (samples_nb, 60, 15, 60) | Filters: 60, kernel size: (4,4), padding: same | ReLU
Max pool + dropout | (samples_nb, 30, 8, 60) | Pool size: (2,2), padding: same, dropout rate: 0.2 | -
Conv + batch norm | (samples_nb, 30, 8, 60) | Filters: 60, kernel size: (4,4), padding: same | ReLU
Conv + batch norm + dropout | (samples_nb, 30, 8, 60) | Filters: 60, kernel size: (4,4), padding: same, dropout rate: 0.2 | ReLU
Add + activation | (samples_nb, 30, 8, 60) | - | ReLU
Conv + batch norm | (samples_nb, 30, 8, 30) | Filters: 30, kernel size: (4,4), padding: same | ReLU
Max pool + dropout | (samples_nb, 15, 4, 30) | Pool size: (2,2), padding: same, dropout rate: 0.2 | -
Conv + batch norm | (samples_nb, 15, 4, 15) | Filters: 15, kernel size: (4,4), padding: same | ReLU
Max pool + dropout | (samples_nb, 8, 2, 15) | Pool size: (2,2), padding: same, dropout rate: 0.2 | -
Conv + batch norm | (samples_nb, 8, 2, 8) | Filters: 8, kernel size: (4,4), padding: same | ReLU
Max pool | (samples_nb, 8, 2, 8) | Pool size: (1,1), padding: same | -
Flatten | (samples_nb, 128) | - | -
Fully connected + batch norm | (samples_nb, 224) | Input: 448 | ReLU
Output layer (fully connected) | (samples_nb, 10) | Input: 224 | Softmax
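The topology of Table 1 can be reproduced in a few lines of Keras. The sketch below is a minimal reading of the table (layer order, filter counts, kernel size, dropout rate and output shapes), not the authors' original code.

import tensorflow as tf
from tensorflow.keras import layers, models

def conv_bn(x, filters):
    x = layers.Conv2D(filters, (4, 4), padding="same")(x)
    x = layers.BatchNormalization()(x)
    return layers.Activation("relu")(x)

inputs = layers.Input(shape=(60, 15, 1))             # window x features x depth
x = conv_bn(inputs, 60)
x = layers.MaxPooling2D((2, 2), padding="same")(x)
x = layers.Dropout(0.2)(x)

shortcut = x                                         # residual (ResNet) block
y = conv_bn(x, 60)
y = layers.Conv2D(60, (4, 4), padding="same")(y)
y = layers.BatchNormalization()(y)
y = layers.Dropout(0.2)(y)
x = layers.Activation("relu")(layers.Add()([shortcut, y]))

for filters in (30, 15):
    x = conv_bn(x, filters)
    x = layers.MaxPooling2D((2, 2), padding="same")(x)
    x = layers.Dropout(0.2)(x)
x = conv_bn(x, 8)
x = layers.MaxPooling2D((1, 1), padding="same")(x)

x = layers.Flatten()(x)                              # 8 x 2 x 8 = 128 values
x = layers.Activation("relu")(layers.BatchNormalization()(layers.Dense(224)(x)))
outputs = layers.Dense(10, activation="softmax")(x)  # certitude per driver
model = models.Model(inputs, outputs)

Removing the shortcut/Add lines yields the plain DeepCNN variant.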
To summarize our design choices and the reason behind them, we need to take into consideration that our main objective is to have the best possible accuracy with a realistically reasonable latency time. This objective was fulfilled when we used convolutional layers to take advantage of their capabilities to filter the inputs and simplify the work for the fully connected layers that classify the data. We also exploited batch normalization to help accelerate the training, dropout regularization to avoid overfitting, and max pooling to downsample the data. However, even with a reasonable number of layers, our DeepCNN model plateaued and was dependent on the number of features in the input, which can be interpreted as the inability of the model to generalize. This problem was the seed for the additional ResNet block and the creation of the DeepRCN model. Using a ResNet block enabled us to expand our model deeper without overfitting it. The new DeepRCN model was able to pass low level features through the ResNet block, while also maintaining a copy of high level features passed through the add function. We chose to add the ResNet block only after the first downsampling step to maintain a latency similar to the one achieved with the DeepCNN model while improving the accuracy. This architecture allowed us to improve the outcome on both fronts: the latency and faster training time on the one hand, and the accuracy and performance of the model on the other.

We also adopted the Adam optimization algorithm [36] while training our model, with a learning rate scheduler for faster convergence [37]. The learning rate was initially set to lr = 0.001, then slowed down to lr = 0.0001 when we reach 120 epochs; the training then continues up to 500 epochs. The batch size was set to 60 and our dataset train/test split ratio was 70:30. We selected all these settings and parameters after various trial and error experimentation.
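A hedged Keras sketch of this training setup follows; only the optimizer, schedule, epoch count, batch size and split ratio come from the text, while the loss and the callback implementation are our assumptions.

import tensorflow as tf

def schedule(epoch, lr):
    return 1e-3 if epoch < 120 else 1e-4   # lr = 0.001, slowed to 0.0001 at epoch 120

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss="categorical_crossentropy",   # assumed loss for the softmax output
              metrics=["accuracy"])
model.fit(X_train, y_train,                      # 70:30 train/test split assumed done
          validation_data=(X_test, y_test),
          epochs=500, batch_size=60,
          callbacks=[tf.keras.callbacks.LearningRateScheduler(schedule)])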
We conducted the training for all our deep learning tests on a Google Colaboratory cloud machine [38]. An Intel Xeon CPU (dual core, 2.3 GHz) with a 56 MB L3 cache and 13.3 GB of RAM characterizes this dedicated machine. The training exploited the Tesla P100 GPU with 17.1 GB of RAM to accelerate the computationally expensive calculations.

The testing for the classical machine learning algorithms, however, was done on a late 2013 MacBook Pro with a 4th generation Intel Core i5 (dual core, 2.6 GHz) with a 3 MB cache, a 256 KB L2 cache per core and 8 GB of RAM.

We based the evaluation of our models on various performance metrics and on execution time. This allowed us to compare the performance and speed of execution of our architectures alongside other state of the art methods. The performance metrics included accuracy, precision, recall, F1 score, Cohen kappa score and AUC score, and we also used the k-fold cross-validation test as an evaluation technique, while the speed of execution metrics fixated on the training and prediction time. Equations (7), (8), (9), (10) and (11) calculate our performance metrics [39]:

Accuracy = (TP + TN) / (TP + TN + FP + FN), (7)

Precision = TP / (TP + FP), (8)

Recall (True Positive Rate) = TP / (TP + FN), (9)

F1 score = 2 × (Precision × Recall) / (Precision + Recall), (10)

Cohen kappa = (p_o − p_e) / (1 − p_e), (11)

where TP, TN, FP and FN denote true positives, true negatives, false positives and false negatives, and p_o and p_e are the observed and expected agreement between predictions and labels. The AUC score is derived from the ROC curve, which plots the true positive rate (12) against the false positive rate (13):

TPR = TP / (TP + FN), (12)

FPR = FP / (FP + TN), (13)

The k-fold cross-validation test is our chosen technique for performance validation. This test is a segmentation of the adopted dataset into k folds, iterating between the parts used for training and the parts used for testing. This rotation over the dataset segments allows us to examine the model's ability to perform with new, unknown
data that we did not use in training. We used this test to guarantee the performance, accuracy and robustness of our models [40].
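In scikit-learn terms, this protocol can be sketched as follows for any classifier exposing predict_proba (clf, X and y are assumed to be defined; macro averaging and one-vs-rest AUC are our choices, since the text does not state them):

from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, cohen_kappa_score, roc_auc_score)

for train_idx, test_idx in StratifiedKFold(n_splits=5, shuffle=True,
                                           random_state=0).split(X, y):
    clf.fit(X[train_idx], y[train_idx])
    pred = clf.predict(X[test_idx])
    proba = clf.predict_proba(X[test_idx])
    print(accuracy_score(y[test_idx], pred),
          precision_score(y[test_idx], pred, average="macro"),
          recall_score(y[test_idx], pred, average="macro"),
          f1_score(y[test_idx], pred, average="macro"),
          cohen_kappa_score(y[test_idx], pred),
          roc_auc_score(y[test_idx], proba, multi_class="ovr"))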
4. Results

To evaluate our models, we compared them to various other methods with the same evaluation metrics. We divide these methods into classical machine learning classifiers and deep learning classifiers.

4.1. Classical machine learning classifiers

We started by testing classical machine learning classifiers to identify their performance and efficiency on our data. These tests allowed us to conceptualize a basic idea about their advantages and limitations in our specific case study application.

In table 2, we present the initial accuracy evaluations of 7 different classical machine learning methods.

Table 2
Evaluation of 7 different classical machine learning classifiers.

Method | Accuracy
Decision Tree [10] | 88.13%
K-Nearest Neighbors [10] | 33%
Random Forest [15] | 92.74%
Naïve Bayes | 26.40%
Support Vector Machine (SVM) [4] | 20%
Logistic Regression | 29%
Multi-Layer Perceptron (MLP) [17] | 49.28%

These accuracy values illustrate the variation in performance from one classifier to another, where the lowest evaluation score is only 20%, achieved by SVM, and the highest score reaches 92.74%, by Random Forest.

In order to improve these primary results, we refined the hyper-parameters of these classifiers and gathered the best possible results and the tweaked hyper-parameters in table 3.

Table 3
Evaluation of the 7 different classical machine learning methods after hyperparameters refinements.

Method | Refined hyper-parameter | Accuracy
Decision Tree [10] | - | 88.13%
K-Nearest Neighbors (KNN) | K = 7 | 36.18%
Random Forest [12] | Number of estimators = 500 | 97.20%
Naïve Bayes | - | 26.40%
Support Vector Machine (SVM) | Kernel coefficient gamma* = 1/(nb_features × X) | 37.36%
Logistic Regression | Optimiser** = L-BFGS | 31.51%
Multi-Layer Perceptron (MLP) [17] | Max iterations = 2000 | 56.97%

*The gamma value γ is used in the Radial Basis Function (RBF) kernel: k(x, y) = e^(−γ‖x−y‖²).
**Limited-memory Broyden–Fletcher–Goldfarb–Shanno algorithm (L-BFGS).

Hyperparameters manipulation managed to improve the performance of the majority of these classifiers significantly; we indicate that the Naïve Bayes classifier produced the lowest accuracy, with only a 26.40% identification rate.

After hyperparameters refinement, we proceeded to feature manipulation, evaluating its effects on the final result. Feature manipulation and feature selection are important methods to test on our data, especially when classical machine learning classifiers are known for their limitations with a large number of features.

We restrain our search to only include the classifiers that achieved an accuracy score of 50% or higher, for a realistic and more focused approach. As a consequence, we conduct our feature manipulation with an implementation on Decision Tree, Random Forest and MLP. We also reviewed the Extra Tree algorithm praised and pitched by [11] [18] [19], while introducing the 5-fold cross-validation accuracy test into the evaluation process to confirm these results.

In tables 4 and 5, we present the 5-fold cross-validation and the speed evaluation results of our selected machine learning methods in contrast with feature selection of the 10, 15 and 20 most important features extracted by the Forest of Trees feature importance evaluation.
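The evaluation protocol behind tables 4 and 5 can be sketched as follows; the top-k column indices come from the importance ranking above, X is the record-level feature matrix, and the classifier settings are illustrative rather than the exact ones used.

from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import cross_validate

for k in (10, 15, 20):
    Xk = X[:, ranking[:k]]            # keep only the k most important features
    scores = cross_validate(ExtraTreesClassifier(n_estimators=100, random_state=0),
                            Xk, y, cv=5)
    print(k,
          f"{scores['test_score'].mean():.4f} (+/- {scores['test_score'].std():.4f})",
          f"fit: {scores['fit_time'].sum():.1f}s",
          f"predict: {scores['score_time'].sum():.3f}s")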
Table 4
Cross-validation accuracy results of the selected classifiers in contrast with feature selection of the 10 most important features from the Forest of Trees importance evaluation.

Features: Original 51 features
Classifier | Accuracy | Training Time | Prediction Time
Decision Tree | 87.76% (+/- 1.03) | 14.2 seconds | 64 milliseconds
Random Forest | 96.99% (+/- 0.38) | 1 minute and 39.5 seconds | 896 milliseconds
MLP (200 iterations) | 47.95% (+/- 9.49) | 1 minute and 36.3 seconds | 155 milliseconds
Extra Tree | 97.94% (+/- 0.29) | 48 seconds | 1.09 seconds

Features: 10 most important features [17]
Classifier | Accuracy | Training Time | Prediction Time
Decision Tree | 91.18% (+/- 0.32) | 2.71 seconds | 19 milliseconds
Random Forest | 97.57% (+/- 0.23) | 38.4 seconds | 749 milliseconds
MLP (200 iterations) | 78.09% (+/- 8.32) | 24.4 seconds | 50 milliseconds
Extra Tree | 98.32% (+/- 0.21) | 18.3 seconds | 896 milliseconds
Table 5
Cross-validation accuracy results of the selected classifiers in contrast with feature selection of the 15 and 20 most important features from the Forest of Trees feature importance list.

Features: 15 most important features [15] [19]
Classifier | Accuracy | Training Time | Prediction Time
Decision Tree | 89.27% (+/- 0.94) | 4.99 seconds | 24 milliseconds
Random Forest | 97.52% (+/- 0.34) | 56.2 seconds | 793 milliseconds
MLP (200 iterations) | 68.30% (+/- 7.19) | 38.7 seconds | 58 milliseconds
Extra Tree | 98.47% (+/- 0.21) | 26.7 seconds | 933 milliseconds

Features: 20 most important features
Classifier | Accuracy | Training Time | Prediction Time
Decision Tree | 90.28% (+/- 0.48) | 7.87 seconds | 31 milliseconds
Random Forest | 97.86% (+/- 0.24) | 1 minute and 14.2 seconds | 801 milliseconds
MLP (200 iterations) | 68.73% (+/- 2.99) | 56.2 seconds | 73 milliseconds
Extra Tree | 98.47% (+/- 0.15) | 26.3 seconds | 949 milliseconds
As shown in these two tables, the speed and time factors for classical ML classifiers are considerably low and negligible for our case. All the training sessions were less than 2 minutes, while the prediction time did not surpass the 1 second threshold.

While analyzing the feature selection performance in tables 4 and 5, we conclude that reducing the number of features to a certain extent has a positive impact on the performance of almost all the chosen classifiers. We can attribute this effect to the overfitting problem of classical ML classifiers when using too complex data descriptions and a higher number of features. We can also conclude that the Extra Tree classifier produces the best overall performance, with 98.47% identification accuracy using the 15 most important features previously presented.

Tables 6 and 7 present the 5-fold cross-validation and the speed evaluation results of our selected machine learning methods in contrast with our various feature manipulation and extraction algorithms, including PCA, FFT and the statistical extracted features.
Table 6
Cross-validation accuracy results of the selected classifiers in contrast with feature extraction using PCA (15 components) and FFT.

Features: PCA with 15 components
Classifier | Accuracy | Training Time | Prediction Time
Decision Tree | 42.05% (+/- 1.51) | 18.6 seconds | 29 milliseconds
Random Forest | 67.21% (+/- 1.91) | 3 minutes and 17 seconds | 1.44 seconds
MLP (200 iterations) | 62.55% (+/- 1.68) | 36.3 seconds | 63 milliseconds
Extra Tree | 67.62% (+/- 1.77) | 47.5 seconds | 1.47 seconds

Features: FFT features
Classifier | Accuracy | Training Time | Prediction Time
Decision Tree | 42.64% (+/- 1.64) | 18.2 seconds | 27 milliseconds
Random Forest | 65.51% (+/- 1.41) | 3 minutes and 9 seconds | 1.26 seconds
MLP (200 iterations) | 46.19% (+/- 3.34) | 17.7 seconds | 60 milliseconds
Extra Tree | 66.27% (+/- 0.87) | 47.7 seconds | 1.54 seconds
Table 7
Cross-validation accuracy results of the selected classifiers in contrast with feature extraction and manipulation using statistical features and feature selection from the Forest of Trees feature importance list.

Features: Statistical features of the 15 most important features extracted by the Forest of Trees feature importance [7] [11] [20]
Classifier | Accuracy | Training Time | Prediction Time
Decision Tree | 49.21% (+/- 1.56) | 7.54 seconds | 21 milliseconds
Random Forest | 72.26% (+/- 1.35) | 1 minute and 41.7 seconds | 1.20 seconds
MLP (200 iterations) | 28.96% (+/- 2.61) | 13.2 seconds | 52 milliseconds
Extra Tree | 74.91% (+/- 1.51) | 33.1 seconds | 1.45 seconds

Features: 23 features representing the 15 most important features and 8 statistical features extracted from the 15 most important features
Classifier | Accuracy | Training Time | Prediction Time
Decision Tree | 88.49% (+/- 1.33) | 10.7 seconds | 32 milliseconds
Random Forest | 97.19% (+/- 0.76) | 1 minute and 33.6 seconds | 921 milliseconds
MLP (200 iterations) | 61.54% (+/- 5.15) | 32.7 seconds | 76 milliseconds
Extra Tree | 98.39% (+/- 0.39) | 31.9 seconds | 1.02 seconds
As manifested in tables 6 and 7, the training sessions were less than 4 minutes, while the prediction time did not exceed 1.54 seconds.

While using our feature extraction algorithms and dimensionality reduction techniques, we notice that the Extra Tree classifier once more produced the best accuracy results in all our tests. We also notice that these feature manipulations did not have an enhancing effect on the performance of all our classical ML classifiers. First, we used the Principal Component Analysis (PCA) algorithm with 15 components for feature reduction, producing an accuracy score of 67.62% as the best result, with Extra Tree. Then, we used the Fast Fourier Transform (FFT) application on the features, achieving 66.27% accuracy, also as the best score, with Extra Tree. Additionally, we exploited the extraction of 8 different statistical features from the 15 most important raw features of the Forest of Trees importance evaluation, which managed to raise the accuracy to 74.91% with Extra Tree.
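A sketch of these transformations, and of the 23-feature combination discussed next, follows; windows15 is assumed to hold the 15 most important raw features per 1-minute window, the exact set of 8 statistics is not listed in the text (the ones below are illustrative), and the way the 15 raw features and 8 statistics are combined into 23 inputs is our reading.

import numpy as np
from sklearn.decomposition import PCA

flat = windows15.reshape(len(windows15), -1)      # (n_windows, 60*15)
X_pca = PCA(n_components=15).fit_transform(flat)  # PCA reduction to 15 components
X_fft = np.abs(np.fft.rfft(flat, axis=1))         # FFT magnitude features

def eight_stats(w):                               # w: one (60, 15) window
    # 8 illustrative statistics; the paper's exact set is not specified.
    return np.array([w.mean(), w.std(), w.min(), w.max(), np.median(w),
                     np.ptp(w), np.percentile(w, 25), np.percentile(w, 75)])

X_stat = np.array([eight_stats(w) for w in windows15])   # (n_windows, 8)
X_23 = np.hstack([windows15.mean(axis=1), X_stat])       # 15 + 8 = 23 features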
Finally, we introduced 23 features representing a mixture between the previously extracted 8 statistical features and the 15 most important features, achieving the best score with the Extra Tree classifier with 98.39% accuracy. This accuracy score is better than the one achieved with the raw 51 features, but lower than the results produced by only the 15 most important features.
4.2. Deep learning classifiers and our developed models

To validate and evaluate our deep learning models, we compare their performance and speed metrics to other state of the art work while exploring a variety of hyperparameters refinements. We also examine the classical machine learning implementations conducted previously and include them in the overall comparison.

The comparison involves deep learning approaches that proved their efficiency in dealing with this type of data. Thus, we also propose a Recurrent Neural Network (RNN) model based on two successive Long Short-Term Memory (LSTM) layers [21]. RNNs are notoriously recognized for their capabilities in solving complex problems in time series analysis.

We also included successful state of the art works that proved their capabilities, such as DeepConvGRU-Attention, DeepConvLSTM-Attention and DeepConvGRU [21]. These DL models based their approach on hybridized architectures of CNN and RNN, to exploit the feature extraction abilities of CNNs and the competent management of RNNs in time series data.
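The LSTM baseline can be sketched in a few lines of Keras; the text specifies only "two successive LSTM layers", so the hidden sizes here are assumptions.

import tensorflow as tf
from tensorflow.keras import layers, models

lstm = models.Sequential([
    layers.Input(shape=(60, 15)),             # 60 time steps, 15 features
    layers.LSTM(64, return_sequences=True),   # first LSTM layer
    layers.LSTM(64),                          # second LSTM layer
    layers.Dense(10, activation="softmax"),   # one output per driver
])
lstm.compile(optimizer="adam", loss="categorical_crossentropy",
             metrics=["accuracy"])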
Table 8 presents the overall result comparison of these ML and DL classifiers with our models DeepRCN and DeepCNN, using all our evaluation metrics with the 5-fold cross-validation accuracy technique.

Table 8
Results comparison between our developed models and the state of the art methods with the 5-fold cross-validation and speed of execution evaluation. All metric values are cross-validation means in % (+/- std %).

Model | Number of features | Accuracy | Precision | Recall | F1 score | Cohen kappa | AUC | Training time (Google Colab) | Prediction time of one sample**
DeepRCN | 51 | 99.31 (+/- 0.08) | 99.36 (+/- 0.08) | 99.30 (+/- 0.05) | 99.33 (+/- 0.06) | 99.20 (+/- 0.09) | 99.99 (+/- 0.01) | 108 minutes | 568 ms
DeepRCN | 15 | 99.28 (+/- 0.17) | 99.28 (+/- 0.17) | 99.29 (+/- 0.16) | 99.28 (+/- 0.16) | 99.18 (+/- 0.18) | 99.99 (+/- 0.00) | 116 minutes | 229 ms
LSTM | 15 | 99.27 (+/- 0.16) | NA* | NA* | 99.29 (+/- 0.15) | NA* | 99.96 (+/- 0.01) | 433 minutes | 654 ms
DeepCNN | 51 | 99.17 (+/- 0.11) | 99.22 (+/- 0.10) | 99.19 (+/- 0.11) | 99.20 (+/- 0.10) | 99.07 (+/- 0.12) | 99.99 (+/- 0.01) | 133 minutes | 207 ms
DeepCNN | 15 | 99.14 (+/- 0.18) | 99.17 (+/- 0.21) | 99.15 (+/- 0.15) | 99.16 (+/- 0.18) | 99.04 (+/- 0.20) | 99.99 (+/- 0.01) | 83 minutes | 189 ms
Extra Tree [18] [19] | 15 | 98.47 (+/- 0.21) | 98.59 (+/- 0.17) | 98.45 (+/- 0.18) | 98.52 (+/- 0.17) | 98.29 (+/- 0.24) | 99.99 (+/- 0.00) | 19.6 seconds | 48 ms
DeepConvGRU-Attention [21] | 51 | 98.36 (+/- 0.15) | NA* | NA* | 98.36 (+/- 0.15) | NA* | 98.78 (+/- 0.01) | 308 minutes | 651 ms
DeepConvLSTM-Attention [21] | 51 | 97.86 (+/- 0.68) | NA* | NA* | 97.87 (+/- 0.68) | NA* | 99.78 (+/- 0.06) | 350 minutes | 631 ms
Random Forest [15] | 20 | 97.86 (+/- 0.24) | 97.89 (+/- 0.25) | 97.85 (+/- 0.24) | 97.85 (+/- 0.25) | 97.61 (+/- 0.27) | 99.97 (+/- 0.01) | 56 seconds | 44 ms
DeepConvGRU [21] | 51 | 97.72 (+/- 0.62) | NA* | NA* | 97.72 (+/- 0.62) | NA* | 99.68 (+/- 0.08) | 833 minutes | 624 ms
Decision Tree | 10 | 91.18 (+/- 0.32) | 91.39 (+/- 0.41) | 91.26 (+/- 0.34) | 91.31 (+/- 0.37) | 90.14 (+/- 0.37) | 95.15 (+/- 0.23) | 4.81 seconds | 6 ms

*NA: Not Available.
**Prediction time for one single sample, in milliseconds, measured on the MacBook Pro.
Table 8 illustrates that our DeepRCN model produced the highest accuracy results, with a cross-validation accuracy of 99.31% with the raw 51 features, followed by the same version of the DL model using the 15 most important features with 99.28% identification accuracy. The LSTM based model scored 99.27% accuracy, also using the 15 most important features. Then, we find our DeepCNN model with both the 51-raw-feature version and the 15 most important features, respectively with 99.17% and 99.14% accuracy. Next, we have the Extra Tree classifier [18] [19], also with the selected 15 most important features, scoring 98.47% accuracy; afterwards we find two of the state of the art deep learning models, DeepConvGRU-Attention and DeepConvLSTM-Attention [21], respectively with 98.36% and 97.86%. Finally, we have the Random Forest with 97.86% accuracy using 20 features, DeepConvGRU [21] scoring 97.72%, and Decision Tree producing the lowest score with 91.18% accuracy using 10 features.

As for the execution speed tests, they revealed that the fastest classifier to train is Decision Tree with the 10 most important features, lasting only 4.81 seconds, followed respectively by the other classical machine learning classifiers: Extra Tree with the 15 features and Random Forest with the 20 features trained respectively during 19.6 seconds and 56 seconds. The fastest deep learning model to train is our DeepCNN model with the 15 most important features, after an 83 minutes training session, as shown in the accuracy learning curve of figure 5, followed by the 51-raw-features method of our DeepRCN model, with a training session of 108 minutes, presented in figure 4. The longest method to train was the state of the art model DeepConvGRU [21], with an 833 minutes long session.
Fig. 4. Accuracy learning curve of the DeepRCN with the 51-raw features.
Fig. 5. Accuracy learning curve of the DeepCNN with the 15 most important features.
The prediction execution time in all presented methods, including our models, ranged between 1 and 3 seconds. These predictions were made on the test set representing 30% of the entire data, using the Google Colaboratory cloud machine with the GPU accelerator that parallelizes the computations. On our local machine, the MacBook Pro, the prediction execution time for one unique sample from the dataset did not exceed the 1 second threshold for any of the classical ML methods and deep learning classifiers.
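Single-sample latency can be measured directly, as sketched below ('model' and 'X_test' come from the experiments above; numbers will of course vary with hardware):

import time

sample = X_test[:1]                                  # one 60 x 15 x 1 window
start = time.perf_counter()
model.predict(sample, verbose=0)
print(f"{(time.perf_counter() - start) * 1e3:.1f} ms for one sample")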
5. Discussion

To reflect on the reason behind the superiority of our DeepRCN model compared to the other state of the art approaches, we need to analyze the nature and structure of these models. Our DeepRCN model is based on CNNs, while the other state of the art deep learning models are based on RNNs and hybrid CNN/RNN models. Theoretically, RNNs are powerful models, especially when dealing with time series data.

Note also that the training and the feature selection are one-time processes that do not need to be repeated every time we need to identify the driver. Reflecting on our results, we can conclude that for most cases, having 15 features is the most suitable number of features as input for both classical machine learning and deep learning classifiers. This fact is the reason behind our emphasis on this number of features in our study. However, our results proved that our DeepRCN model is capable of achieving similar performance results despite the differences in the number of features at the input. This attribute is unique to our DeepRCN model, which further proves the robustness and generalization capabilities of our approach.

In a real world application and deployment of a driver identification model as a security attribute, the latency of result delivery becomes very important, especially in the case of theft detection, where every second counts. However, it is not worth sacrificing the accuracy of the application and the reliability of its results for the sake of a faster delivery.
6. Conclusion

In this paper, we proposed a novel RCN deep learning approach to driver identification that outperformed the other state of the art deep learning methods while using the Ocslab driving dataset. Our second model, DeepCNN, also achieved an important cross-validation accuracy of 99.14%, with the lowest deep learning training time (1 hour and 23 minutes). The results also revealed the generalization abilities and robustness of our DeepRCN model, which managed to maintain approximately the same accuracy when moving from 51 features to 15 features. We also point out the privacy conservation efficiency of our model, which eliminates any threats that other sensors, such as a GPS or a camera, may introduce while providing that kind of data.

As a future work perspective, we could improve our model by reducing the training time, and we could predict different classes that identify various driving behaviors and differentiate the dangerous drivers from the careful ones, in order to build a full ADAS system.

References

[1] Alom, Md Zahangir, et al. "A state-of-the-art survey on deep learning theory and architectures." Electronics 8.3 (2019): 292.
[2] T. Wakita, K. Ozawa, C. Miyajima, K. Igarashi, K. Itou, K. Takeda, and F. Itakura, "Driver identification using driving behavior signals," in IEEE Conference on Intelligent Transportation Systems, Proceedings, ITSC, vol. 2005, 2005, pp. 907-912.
[3] Miyajima, Chiyomi, et al. "Driver modeling based on driving behavior and its evaluation in driver identification." Proceedings of the IEEE 95.2 (2007): 427-437.
[4] Burton, Angela, et al. "Driver identification and authentication with active behavior modeling." 2016 12th International Conference on Network and Service Management (CNSM). IEEE, 2016.
[5] I. Del Campo, R. Finker, M. Martinez, J. Echanobe, and F. Doctor, "A real-time driver identification system based on artificial neural networks and cepstral analysis," in International Joint Conference on Neural Networks, IJCNN 2014. IEEE, 2014, pp. 1848-1855.
[6] Hallac, David, et al. "Driver identification using automobile sensor data from a single turn." 2016 IEEE 19th International Conference on Intelligent Transportation Systems (ITSC). IEEE, 2016.
[7] Enev, Miro, et al. "Automobile driver fingerprinting." Proceedings on Privacy Enhancing Technologies 2016.1 (2016): 34-50.
[14] Tahmasbi, Fatemeh, et al. "Poster: Your Phone Tells Us The Truth: Driver Identification Using Smartphone on One Turn." Proceedings of the 24th Annual International Conference on Mobile Computing and Networking. ACM, 2018.
[15] D. Luo, J. Lu, and G. Guo, "Driver identification using multivariate in-vehicle time series data," in WCX World Congress Experience. SAE International, April 2018. [Online]. Available: https://doi.org/10.4271/2018-01-1198.
[16] D. Jeong, M. Kim, K. Kim, T. Kim, J. Jin, C. Lee, and S. Lim, "Real-time driver identification using vehicular big data and deep learning," in 2018 21st International Conference on Intelligent Transportation Systems (ITSC), Nov 2018, pp. 123-130.
[17] M. L. Bernardi, M. Cimitile, F. Martinelli, and F. Mercaldo, "Driver identification: a time series classification approach," in 2018 International Joint Conference on Neural Networks (IJCNN), July 2018, pp. 1-7.
[18] P. H. L. Rettore, A. B. Campolina, A. Souza, G. Maia, L. A. Villas, and A. A. F. Loureiro, "Driver authentication in vanets based on intra-vehicular sensor data," in 2018 IEEE Symposium on Computers and Communications (ISCC), June 2018, pp. 00078-00083.
[19] Ezzini, Saad, Ismail Berrada, and Mounir Ghogho. "Who is behind the wheel? Driver identification and fingerprinting." Journal of Big Data 5.1 (2018): 9.
[20] Lestyan, Szilvia, et al. "Extracting vehicle sensor signals from CAN logs for driver re-identification." arXiv preprint arXiv:1902.08956 (2019).
[21] Zhang, Jun, et al. "A deep learning framework for driving behavior identification on in-vehicle CAN-BUS sensor data." Sensors 19.6 (2019): 1356.
[22] Kwak, B. I., Woo, J. Y., and Kim, H. K. Driving dataset. PST 2016. http://ocslab.hksecurity.net/Datasets/driving-dataset (accessed 29 October 2019).
[23] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet classification with deep convolutional neural networks." Advances in Neural Information Processing Systems. 2012.
[24] He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
[25] Yang, Jianbo, et al. "Deep convolutional neural networks on multichannel time series for human activity recognition." Twenty-Fourth International Joint Conference on Artificial Intelligence. 2015.
[26] Strigl, Daniel, Klaus Kofler, and Stefan Podlipnig. "Performance and scalability of GPU-based convolutional neural networks." 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing. IEEE, 2010.
[33] Bridle, John S. "Probabilistic interpretation of feedforward classification network outputs, with relationships to statistical pattern recognition." Neurocomputing. Springer, Berlin, Heidelberg, 1990. 227-236.
[34] De Boer, Pieter-Tjerk, et al. "A tutorial on the cross-entropy method." Annals of Operations Research 134.1 (2005): 19-67.
[35] Bottou, Léon, Frank E. Curtis, and Jorge Nocedal. "Optimization methods for large-scale machine learning." SIAM Review 60.2 (2018): 223-311.
[40] "... validation." Machine Learning 13.1 (1993): 135-143.
[41] Cho, Kyunghyun, et al. "On the properties of neural machine translation: Encoder-decoder approaches." arXiv preprint arXiv:1409.1259 (2014).
Najmeddine Abdennour received his bachelor degree in electronics, electro-technics and automation from the Faculty of Sciences, University of Monastir, Tunisia, in 2015. He then completed his master's degree in electronics and telecommunications and graduated from the Higher Institute of Computer Science and Multimedia, University of Gabes, Tunisia, in 2018. He is currently pursuing his PhD in Science and Technology of Information and Communications at the National School of Electronics and Telecommunications of Sfax (ENET'Com), University of Sfax, Tunisia. His research interests include signal processing, data analysis, machine learning and deep learning.
Tarek Ouni was born in Sidi Bouzid, Tunisia, in 1979. He has been an electronic engineer from the National School of Engineering of Sfax, University of Sfax, Tunisia, since 2004. He obtained his master's degree in New Technologies in 2006. He received his PhD, with a new approach in image and video compression based on scanning methods, from the University of Sfax, Tunisia, in 2012. He was appointed assistant professor at the National School of Electronics and Telecommunications of Sfax (ENET'Com), University of Sfax, Tunisia. His research interests focus on human re-identification in camera networks, face recognition and data processing.
Nader Ben Amor is an associate professor in electrical engineering at the National School of Engineering of Sfax, University of Sfax, Tunisia. He received his PhD in Computer System Engineering from the University of Sfax in 2005. His research interests include electrical engineering, embedded systems, robotics, image processing and machine learning.
Declaration of interests
☒ The authors declare that they have no known competing financial interests or
personal relationships that could have appeared to influence the work reported in
this paper.