IAES International Journal of Artificial Intelligence (IJ-AI)
Vol. 12, No. 1, March 2023, pp. 137~145
ISSN: 2252-8938, DOI: 10.11591/ijai.v12.i1.pp137-145
Journal homepage: http://ijai.iaescore.com
Vehicle make and model recognition using mixed sample data
augmentation techniques
Talha Anwar¹, Seemab Zakir²
¹Center of Chiropractic Research, New Zealand College of Chiropractic, Auckland 1149, New Zealand
²Department of Engineering Technology, Foundation University, Rawalpindi, Pakistan
Article Info
Article history: Received Sep 28, 2021; Revised Jul 7, 2022; Accepted Aug 5, 2022
Keywords: Deep learning; Mixed data augmentation; Vehicle identification system
ABSTRACT
Vehicle identification based on make and model is an integral part of an
intelligent transport system that helps traffic monitoring and crime control.
Much research has been performed in this regard, but most studies used
manual feature extraction or ensembles of convolutional neural networks (CNNs)
that result in increased execution time during inference. This paper
compares three deep learning models and utilizes different augmentation
techniques to achieve state-of-the-art performance without ensembling or
fusing the models. Experiments are performed without any augmentation,
with standard augmentation, and with mixed sample data augmentation
techniques. Gradient accumulation and stochastic weight averaging with
mixed precision are used to allow a large batch size, which helps to reduce
training time. The dataset comprises 48 vehicle models running on the
roads of Pakistan. The highest accuracy and F1 score of 97% and 95%, obtained using
the FMix augmentation technique with the EfficientNetV2-S architecture, give
confidence that the proposed solution can be implemented in production.
This is an open access article under the CC BY-SA license.
Corresponding Author:
Talha Anwar
Center of Chiropractic Research, New Zealand College of Chiropractic
Auckland 1149, New Zealand
Email: chtalhaanwar@gmail.com
1. INTRODUCTION
Vehicle identification system (VIS), an integral component of the intelligent transport system (ITS),
brings ease to the traffic management system and helps against criminal activities. VIS is widely used in road
violation detection, traffic congestion alarm, and unmanned driving. Millions of vehicles are on the road in
big cities, making it challenging to track a particular vehicle. The vehicles' number plate is mostly used to
track them [1], but number plates can be changed easily, leading to false identification. VIS also helps
automate tax collection at toll plazas based on vehicle type.
With the advent of artificial intelligence (AI), deep learning has been widely used in transportation
[2]. Some recent studies used traditional imaging techniques such as Haar-like features with an AdaBoost
classifier [3] and pattern descriptors with a support vector classifier [4]. The pattern descriptors study used
local binary patterns, median binary patterns, directional gradient patterns, and local arc patterns as features.
Kiran et al. also studied different colour spaces for descriptor extraction: red, green and blue (RGB);
luma (Y), blue-difference (Cb) and red-difference (Cr) (YCbCr); and hue, saturation, value (HSV) [4].
The Haar-like features-based study first removed shadows using the HSV colour space to reduce the chances
of false detection. Different single-feature methods, such as colour moment, local binary pattern (LBP)
features, Hu moment features, angle features, and circularity, were also used. Using AdaBoost, 85.8%
accuracy was achieved [3]. Qiu et al. [5] compared the performance of Haar features with that of a
convolutional neural network (CNN). Using Haar-like
features, 86.72% and 91.86% precision and recall are achieved, which increased by 5.63% and 0.2% with
CNN [5]. Gholamalinejad and Khosravi proposed a novel CNN architecture composed of CNN layers with
squeeze-and-excitation (SE) modules. Instead of using classic max pooling or average pooling, they used
haar wavelet as a pooling layer [6]. The data is composed of 5 classes, including bus, heavy truck, medium
truck and pickup. They achieved an accuracy of 95.1% [6]. Ajitha et al. proposed a shallow CNN model with
traditional augmentation techniques such as flip, rotation, shear, crop and zoom, resulting in an accuracy of
92.3% [7]. Mansor et al. [8] achieved an accuracy of 95% with 4 class classification problems. Their work is
based on emergency vehicle type classification and had images of fire trucks, police cars, ambulances and
standard cars [8]. Hassan et al. compared different classifiers with cyclic learning rate and used the MixUp
image augmentation technique to achieve an accuracy of 93.96% through ensembling homogeneous models
of DenseNet201 [9]. Although CNN-based models have gained much attention in recent years, manual
feature-based classification is still being studied. Chen et al. detected multiple features from the vehicle,
such as taillight features, shadow area features and other descriptors. A radial basis function (RBF) artificial
neural network was then used for classification and achieved 97% accuracy [10]. Another manual
feature-based study used histogram of oriented gradients (HOG) and ant colony optimization (ACO) to
classify vehicles and achieved an accuracy of 90% [11].
Existing studies either deal with only a few vehicle models, rely on manual feature extraction, or use
ensembles in which multiple models are run during inference, which increases prediction time.
As a VIS operates in real time, it needs to be robust. In view of these limitations, we propose a
single-network approach that yields state-of-the-art performance. Three different models and five
augmentation settings are compared. All experiments are seeded for reproducibility.
The main contributions of this paper are:
− Different deep learning architectures are compared without using any augmentation technique, with
commonly used and mixed sample data augmentation techniques (MSDA).
− Ensemble and fusion of different models increase the inference time, so the approach used a single model
that performed better than the existing ensembled models.
− The proposed approach achieved state-of-the-art performance with 97% and 95% accuracy and F1 score,
respectively.
The paper is organized as follows. The introduction, motivation, and literature review on vehicle
classification are presented in section 1. Section 2 describes the methodology in detail. Section 3 deals
with results and discussion. The conclusion is given in section 4. The implementation is publicly available
on GitHub [12].
2. METHOD
2.1. Dataset
We used images of common cars running on the roads of Pakistan [13]. There are 3,103 training images and
752 test images divided into 48 car models/classes. Figure 1 shows a sample image from each class. Table 1
lists each vehicle model and the number of training images available for it.
2.2. Transformation
Transformation is a technique to produce variation in the data. It helps the model generalize to
test data and avoid over-fitting. The Albumentations [14] library is used for this purpose. The
following standard augmentations are applied:
− Resize: all images are resized to 256×256.
− Center crop: all images are centre cropped to 224×224.
− Horizontal flip: fifty per cent of images are flipped horizontally.
− Vertical flip: fifty per cent of images are flipped vertically.
− Shift scale rotate: fifty per cent of images are randomly shifted, rotated, and scaled in height and width.
− CLAHE: contrast limited adaptive histogram equalization (CLAHE) is a modified form of adaptive
histogram equalization. In histogram equalization, the intensity range of the image is stretched between 0
and 255 to improve the contrast of the image. However, this can make the picture either too dark or too
bright. Adaptive histogram equalization handles this issue by dividing the image into small patches and
applying histogram equalization to each patch. This sometimes over-amplifies contrast if the image is
noisy. CLAHE limits this amplification by clipping each patch's histogram, and uses bi-linear
interpolation at the patch borders to remove the artificial boundaries.
− Cutout: cutout is one way to handle over-fitting. In this technique, black boxes are introduced into
images, which makes image classification harder and reduces the chances of over-fitting.
− Normalization: normalization leads to faster convergence and speeds up the training process.
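Several of the transformations listed above can be illustrated with a few lines of NumPy. The following is a minimal sketch for intuition only: it does not use the Albumentations API, the helper names (center_crop, random_flip, cutout, normalize) are hypothetical, and the 32-pixel box size and the ImageNet-style mean and standard deviation are assumptions that are not stated in the paper.

```python
import numpy as np

def center_crop(img, size):
    """Crop a (H, W, C) image to a centred size x size window."""
    h, w = img.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]

def random_flip(img, rng, p=0.5):
    """Flip horizontally and vertically, each with probability p."""
    if rng.random() < p:
        img = img[:, ::-1]
    if rng.random() < p:
        img = img[::-1, :]
    return img

def cutout(img, rng, box=32):
    """Zero out one randomly placed box x box square (a 'black box')."""
    h, w = img.shape[:2]
    top = rng.integers(0, h - box + 1)
    left = rng.integers(0, w - box + 1)
    out = img.copy()
    out[top:top + box, left:left + box] = 0
    return out

def normalize(img, mean, std):
    """Scale pixels to [0, 1], then standardize each channel."""
    return (img.astype(np.float32) / 255.0 - mean) / std

rng = np.random.default_rng(0)
# Stand-in for an image that has already been resized to 256x256.
image = rng.integers(1, 256, size=(256, 256, 3), dtype=np.uint8)

x = center_crop(image, 224)
x = random_flip(x, rng)
x = cutout(x, rng, box=32)
# Assumed ImageNet statistics; the paper does not state the values used.
x = normalize(x, mean=np.array([0.485, 0.456, 0.406]),
              std=np.array([0.229, 0.224, 0.225]))
```

The random steps (flip, cutout) would be applied only to training images; resize, centre crop and normalization are deterministic.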
Figure 1. Sample vehicle image from each class label; the number on each image corresponds to the vehicle
ID in Table 1
Table 1. Vehicle models and the number of training images for each model. The ID column corresponds to
the labels in Figure 1; No. is the number of training examples for that model
ID Vehicle model No.
1 Daiatsu Core 80
2 Daiatsu Hijet 44
3 Daiatsu Mira 81
4 FAW V2 29
5 FAW XPV 26
6 Honda BRV 27
7 Honda city 1994 32
8 Honda city 2000 69
9 Honda City aspire 105
10 Honda civic 1994 16
11 Honda civic 2005 34
12 Honda civic 2007 74
13 Honda civic 2015 31
14 Honda civic 2018 82
15 Honda Grace 21
16 Honda Vezell 38
17 KIA Sportage 25
18 Suzuki alto 2007 132
19 Suzuki alto 2019 56
20 Suzuki alto japan 2010 27
21 Suzuki carry 13
22 Suzuki cultus 2018 269
23 Suzuki cultus 2019 108
24 Suzuki Every 20
25 Suzuki highroof 63
26 Suzuki kyber 52
27 Suzuki liana 33
28 Suzuki margala 16
29 Suzuki Mehran 195
30 Suzuki swift 118
31 Suzuki wagonR 2015 112
32 Toyota hiace 2000 23
33 Toyota Aqua 77
34 Toyota axio 20
35 Toyota corolla 2000 39
36 Toyota corolla 2007 82
37 Toyota corolla 2011 127
38 Toyota corolla 2016 270
39 Toyota fortuner 43
40 Toyota Hiace 2012 72
41 Toyota Landcruser 17
42 Toyota Passo 61
43 Toyota pirus 23
44 Toyota Prado 21
45 Toyota premio 18
46 Toyota Vigo 53
47 Toyota Vitz 81
48 Toyota Vitz 2010 48
2.3. Mixed sample data augmentation
Large neural networks are notorious for memorizing data instead of learning from it, even under strong
regularization, and then fail during inference. Although standard data augmentation helps generalization,
it is data-dependent and requires domain knowledge. Anwar and Zakir [15] found that standard
augmentation sometimes leads to poor results. They explored different image augmentation techniques on
electrocardiogram (ECG) graphs and found that the best results were obtained without applying any
augmentation. A CNN tends to focus on the most discriminative part of the image instead of the whole
image, leading to poor generalization. Regional dropout techniques such as Cutout help the CNN attend to
the wider image, but they reduce the proportion of informative pixels in the training data [16]. Mixed
sample data augmentation (MSDA) techniques were introduced to overcome these issues of standard
augmentation and generalization. MSDA mixes existing training samples to produce new samples that stay
close to the distribution of the existing data. It is categorized into two policies, interpolation and
masking. MixUp is an example of interpolation, whereas CutMix and FMix are examples of masking MSDA.
2.3.1. Mixup
MixUp mixes two randomly chosen training images by linear interpolation to produce a new
image. It interpolates not only the input images but also the corresponding targets [17].
The working principle of MixUp is shown in (1) and (2),
$$\tilde{x} = \lambda x_i + (1 - \lambda) x_j \tag{1}$$
$$\tilde{y} = \lambda y_i + (1 - \lambda) y_j \tag{2}$$
where $x_i$ and $x_j$ are raw images in (1), $y_i$ and $y_j$ are their one-hot encoded labels in (2), and
the mixing coefficient $\lambda \in [0, 1]$ is drawn from a beta distribution. MixUp increases the ability
of deep learning architectures to learn from corrupted labels and improves generalization. Linear
interpolation of input images reduces memorization by large deep learning models [18].
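Equations (1) and (2) translate almost directly into code. The sketch below is illustrative only; the beta parameter alpha = 1.0, the class indices, and the random stand-in images are assumptions, not values taken from the paper.

```python
import numpy as np

def mixup(x_i, y_i, x_j, y_j, lam):
    """Interpolate two images and their one-hot labels, as in (1) and (2)."""
    x = lam * x_i + (1.0 - lam) * x_j
    y = lam * y_i + (1.0 - lam) * y_j
    return x, y

rng = np.random.default_rng(0)
alpha = 1.0                       # assumed beta-distribution parameter
lam = rng.beta(alpha, alpha)      # mixing coefficient in [0, 1]

# Random stand-ins for two 224x224 RGB training images.
x_i, x_j = rng.random((224, 224, 3)), rng.random((224, 224, 3))
# One-hot labels over 48 classes (class indices chosen arbitrarily).
y_i, y_j = np.eye(48)[3], np.eye(48)[17]

x_mix, y_mix = mixup(x_i, y_i, x_j, y_j, lam)
```

The mixed label is a soft target: it places weight lam on the first class and 1 - lam on the second, so it still sums to one.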
2.3.2. CutMix
CutMix was inspired by Cutout and MixUp and claims to resolve the issues in MixUp. Although
MixUp improves classification performance, the resulting sample is unnatural. CutMix replaces an image
patch with a patch from another random picture in the training data [16]. It combines Cutout, where a
patch is replaced with zeros, and MixUp, where two images are mixed.
$$\tilde{x} = M \odot x_i + (1 - M) \odot x_j \tag{3}$$
Patch mixing in training images is shown in (3), where $\odot$ denotes element-wise multiplication. $M$ is
a binary mask that is 0 inside the rectangular dropout region and 1 elsewhere, so this rectangular region
of $x_i$ is replaced by the corresponding patch of $x_j$. The one-hot encoded labels are mixed in the same
way as in MixUp, with $\lambda$ equal to the fraction of pixels kept from $x_i$. CutMix lets the model
focus on the less discriminative parts of the object, whereas MixUp uses the entire image but produces
unnatural artefacts.
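The masking operation in (3) can be sketched as follows. This is a simplified illustration rather than the reference CutMix implementation: the patch is always placed fully inside the image, whereas the original method samples a centre and clips the box at the border. The function name and the choice lam = 0.75 are assumptions.

```python
import numpy as np

def cutmix(x_i, y_i, x_j, y_j, lam, rng):
    """Paste a rectangular patch of x_j into x_i and mix the labels by area."""
    h, w = x_i.shape[:2]
    # The pasted patch covers roughly (1 - lam) of the image area.
    cut_h = int(h * np.sqrt(1.0 - lam))
    cut_w = int(w * np.sqrt(1.0 - lam))
    top = rng.integers(0, h - cut_h + 1)
    left = rng.integers(0, w - cut_w + 1)

    mask = np.ones((h, w, 1))                             # M = 1: keep x_i
    mask[top:top + cut_h, left:left + cut_w] = 0.0        # M = 0: take x_j
    x = mask * x_i + (1.0 - mask) * x_j

    lam_area = mask.mean()      # exact fraction of pixels kept from x_i
    y = lam_area * y_i + (1.0 - lam_area) * y_j
    return x, y, mask

rng = np.random.default_rng(0)
x_i, x_j = rng.random((224, 224, 3)), rng.random((224, 224, 3))
y_i, y_j = np.eye(48)[3], np.eye(48)[17]
x_mix, y_mix, mask = cutmix(x_i, y_i, x_j, y_j, lam=0.75, rng=rng)
```

Recomputing the label weight from the mask area keeps the soft label consistent with the pixels actually shown, even after the patch size is rounded to whole pixels.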
2.3.3. FMix
CutMix reduces overfitting by increasing the number of observable data points without changing the data
distribution. However, CutMix uses only rectangular patches, which is a limitation and leads to distortion.
FMix claims to resolve this issue by using binary masks obtained by applying a threshold to
low-frequency images sampled from Fourier space. The authors first sample low-frequency grayscale masks
from Fourier space and then convert them to binary masks using a threshold. Once a binary mask is
obtained, two images are overlaid such that pixels where the binary mask is 0 come from one image and
pixels where the mask is 1 come from the other. FMix, unlike CutMix, produces patches of arbitrary
shape, which greatly increases the number of possible masks [19].
Overall, when data is limited and learning from individual examples is easier, MixUp is a good
candidate, and FMix is a better choice when data is abundant. In Figure 2, the MixUp panel shows two
images mixed together in an overlay fashion, the CutMix panel shows a square patch of one image replaced
by a square patch of another, and the FMix panel shows a randomly shaped region of an image replaced by
another image from the training data.
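The mask sampling described for FMix can be approximated in a few lines. The sketch below is a simplified illustration under stated assumptions, not the authors' implementation: it filters complex Gaussian noise with a 1/f^decay envelope using a full FFT, and the decay value of 3 and lam = 0.6 are assumed for the example.

```python
import numpy as np

def fmix_mask(shape, lam, rng, decay=3.0):
    """Binary mask with a fraction lam of ones, thresholded from low-frequency noise."""
    h, w = shape
    # Magnitude of the spatial frequency at every FFT bin.
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    freq = np.sqrt(fx ** 2 + fy ** 2)
    freq[0, 0] = 1.0 / max(h, w)          # avoid dividing by zero at the DC bin

    # Complex Gaussian noise, attenuated at high frequencies.
    noise = rng.standard_normal((h, w)) + 1j * rng.standard_normal((h, w))
    gray = np.real(np.fft.ifft2(noise / freq ** decay))   # smooth grayscale image

    # Threshold: the lam * h * w brightest pixels become 1, the rest 0.
    k = int(round(lam * h * w))
    mask = np.zeros(h * w)
    mask[np.argsort(gray.ravel())[::-1][:k]] = 1.0
    return mask.reshape(h, w)

rng = np.random.default_rng(0)
x_i, x_j = rng.random((224, 224, 3)), rng.random((224, 224, 3))
mask = fmix_mask((224, 224), lam=0.6, rng=rng)

# Pixels with mask 1 come from one image, pixels with mask 0 from the other.
x_mix = mask[..., None] * x_i + (1.0 - mask[..., None]) * x_j
```

Because the threshold is chosen by rank, the fraction of ones matches lam up to rounding, so the labels can be mixed with the same lam as in MixUp.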
2.4. Deep learning architecture
Deep learning is a subset of artificial intelligence that takes the complex raw data as input,
automatically extracts valuable features, and performs task-relevant work such as classification or regression.
In image classification, deep learning boomed in 2014 after VGGNet came out. Although AlexNet preceded
VGG, VGG16 outperformed it by 10%. At that time, it was believed that adding layers always improved
the performance of the model, until the ResNet paper, released in December 2015, showed that
adding layers helps only up to a point, beyond which performance starts to degrade [20]. To date,
ResNet and its variants are among the most widely used architectures; therefore, we chose ResNet as our
baseline.
Figure 2. Mixed sample data augmented images of two cars
2.4.1. ResNet
Ideally, a deeper neural network is preferable as it yields better results. Nevertheless, depth comes
at the cost of vanishing gradients and degradation. As the depth of the network increases, the
gradients become very small during back-propagation and approach zero; this phenomenon is known as the
vanishing gradient problem. Although it can be mitigated using the rectified linear unit (ReLU)
activation function, skip connections also play a role. A skip connection lets a gradient of larger
magnitude flow backwards by bypassing some of the layers in between.
The ResNet paper explained that further deepening a neural network leads to a higher error rate,
a problem termed degradation. Adding layers saturates the model, and the error rate starts increasing.
One would expect that if a shallow network works well, additional layers should do no harm, yet in
practice deeper plain networks perform worse. ResNet therefore adds an identity mapping from a shallower
layer to a deeper layer, so that the stacked layers only need to learn a residual on top of the identity.
This identity mapping ensures that the deep network can reproduce the output of the shallow network.
The ResNet paper calls these identity mappings skip connections: they skip some layers and pass
information directly to later layers. In the worst case, the performance of a deeper network will be no
worse than that of a shallow network, and in the best case, it can be better [20]. Multiple ResNet
variants exist, distinguished by network size and the number of layers skipped by the skip connections.
We used ResNet-50 as it is neither so small that it underfits nor so large that it overfits.
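The identity argument can be made concrete with a toy residual block. This is a minimal sketch that uses fully connected layers instead of convolutions and omits batch normalization; the function names and sizes are hypothetical.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x, w1, w2):
    """y = ReLU(F(x) + x), where F is two linear layers with a ReLU in between."""
    f = relu(x @ w1) @ w2        # residual branch F(x)
    return relu(f + x)           # skip connection adds the input back

rng = np.random.default_rng(0)
x = rng.random((4, 16))          # non-negative features, e.g. after a previous ReLU

# If the residual branch contributes nothing, the block reduces to the identity.
w_zero = np.zeros((16, 16))
y = residual_block(x, w_zero, w_zero)
```

With the residual branch at zero, the block passes its (non-negative) input through unchanged, which is why stacking such blocks should not make a network worse than its shallower counterpart.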
2.4.2. DenseNet
DenseNet was proposed by Huang et al. [21] in 2017. It is based on the observation that if there are
shorter connections between layers close to the input and layers close to the output, the model can be
deeper, more accurate, and more efficient to train. DenseNet is built from dense blocks and transition
layers. In a dense block, each layer receives collective information from all previous layers.
Similarly, in back-propagation, the error signal flows directly to all layers. For each layer, the
feature maps of all previous layers are used as input, and the output of that layer is used as input for
all subsequent layers. To downsample and reduce network size, a transition layer is placed between two
dense blocks. This layer is composed of batch normalization, a 1×1 convolution, and an average pooling
layer. We used DenseNet-121 in this study.
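The dense connectivity pattern amounts to concatenating all earlier feature maps before each layer. The following toy sketch uses linear layers in place of convolutions; the function name, channel count, growth rate and number of layers are illustrative assumptions.

```python
import numpy as np

def dense_block(x, weights):
    """Each layer sees the concatenation of the input and all earlier layer outputs."""
    features = [x]
    for w in weights:
        inp = np.concatenate(features, axis=-1)   # all previous feature maps
        out = np.maximum(inp @ w, 0.0)            # new features (growth-rate channels)
        features.append(out)
    return np.concatenate(features, axis=-1)

rng = np.random.default_rng(0)
c0, growth, n_layers = 8, 4, 3    # input channels, growth rate, layers (assumed)

# Layer i takes c0 + i * growth input channels and emits `growth` new ones.
weights = [rng.standard_normal((c0 + i * growth, growth)) for i in range(n_layers)]
x = rng.random((2, c0))
out = dense_block(x, weights)
```

The channel count grows linearly with depth (here from 8 to 20), which is why transition layers are needed between blocks to keep the network size under control.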
2.4.3. EfficientNetV2
Most deep learning architectures scale either depth, by increasing the number of layers as in ResNet,
or width, by adding more neurons/filters to each layer, as in wide ResNet [22].
Wider networks learn more detailed features and are easier to train because they are usually shallower.
However, shallow and wide networks have difficulty learning high-level features. Some networks use
higher-resolution images, such as InceptionV3, which uses a 299×299 image size [23]. Scaling a single
dimension such as depth, width, or resolution increases accuracy only up to a limit. EfficientNet in 2019
claimed that depth, width and resolution should be scaled proportionally to make a deeper network more
effective. So the authors proposed a compound scaling method to scale width, depth and resolution
proportionally [24].
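For reference, the compound scaling rule of the EfficientNet paper [24] can be written as below, where $\phi$ is a user-chosen compound coefficient and $\alpha$, $\beta$, $\gamma$ are constants found by a small grid search on the baseline network. The notation follows that paper and is not otherwise used in this study.

```latex
\begin{aligned}
\text{depth: } d = \alpha^{\phi}, \qquad
\text{width: } w = \beta^{\phi}, \qquad
\text{resolution: } r = \gamma^{\phi} \\
\text{subject to } \alpha \cdot \beta^{2} \cdot \gamma^{2} \approx 2,
\qquad \alpha \ge 1,\ \beta \ge 1,\ \gamma \ge 1
\end{aligned}
```

Since the cost of a convolutional network grows roughly in proportion to $d \cdot w^{2} \cdot r^{2}$, this constraint means that increasing $\phi$ by one approximately doubles the number of floating-point operations.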
EfficientNetV2, released in June 2021, is one of the latest proposed models and is known for its faster
training speed [25]. This model is based on training-aware neural architecture search (NAS) and
progressive scaling. It is observed that small image sizes require less regularization than large image
sizes. So the authors started with a small image size and increased the size progressively. They used
EfficientNet as their backbone architecture and applied the NAS strategy, though they removed
unnecessary search options to reduce the search space. EfficientNetV2 uses a small kernel size of 3×3 and
adds more layers to compensate for the reduced receptive field. Other tweaks reduce the memory access
overhead of EfficientNet, such as removing the last stride-1 stage. In our study, EfficientNetV2-S is used.
2.5. Explainability of MSDA techniques
To understand the impact of MSDA techniques, we used gradient-weighted class activation
mapping (Grad-CAM), which shows which area of an image a network focuses on when deciding the
class label. Grad-CAM produces a localization heatmap for the target class by using the gradients of the
class score with respect to the feature maps of the last convolution layer, and highlights the essential
regions of the image [26]. The PyTorch library for CAM methods [27] is used to generate the Grad-CAM
heatmaps.
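The core Grad-CAM computation is small once the activations and gradients of the last convolution layer are available. The sketch below shows only this arithmetic on hand-made arrays; it does not call the PyTorch CAM library [27], and the function name and toy inputs are assumptions for illustration.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Heatmap in [0, 1] from (K, H, W) feature maps and their class-score gradients."""
    # Channel importance: global average of the gradients over the spatial axes.
    weights = gradients.mean(axis=(1, 2))
    # Weighted sum of feature maps, keeping only positive evidence (ReLU).
    cam = np.maximum((weights[:, None, None] * activations).sum(axis=0), 0.0)
    if cam.max() > 0:
        cam = cam / cam.max()
    return cam

# Toy example with two 7x7 feature maps.
activations = np.zeros((2, 7, 7))
activations[0, 2, 3] = 1.0        # channel 0 fires at position (2, 3)
activations[1, 5, 5] = 1.0        # channel 1 fires at position (5, 5)
gradients = np.stack([np.ones((7, 7)), -np.ones((7, 7))])  # channel 1 argues against the class

cam = grad_cam(activations, gradients)
```

In this toy case the heatmap peaks where the positively weighted channel fires and is zero where the negatively weighted channel fires. In practice the low-resolution map is upsampled to the input size and overlaid on the image.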
2.6. Additional information
Each model is trained for fifty epochs with a learning rate of 0.001 and a batch size of 48. The AdamW
optimizer is used instead of Adam as it provides better results [15]. The PyTorch Lightning framework is
used for implementation. Accuracy, macro F1 score, precision and recall are used for evaluation. Mixed
precision, gradient accumulation, and stochastic weight averaging (SWA) techniques are used to speed up
training. Gradient accumulation is a technique to train the model with a larger effective batch size by
updating weights after several batches instead of after every batch. SWA helps the model generalize,
whereas mixed precision reduces training time by up to 8x [28] by allowing a larger batch size.
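Gradient accumulation and SWA can both be demonstrated on a toy linear model. This is a sketch for intuition only: it does not use PyTorch Lightning, it uses plain gradient descent instead of AdamW, and the number of accumulation steps, the learning rate and the averaging window are assumed values.

```python
import numpy as np

def mse_grad(w, X, y):
    """Gradient of the mean squared error of the linear model X @ w."""
    return 2.0 * X.T @ (X @ w - y) / len(y)

rng = np.random.default_rng(0)
X, y = rng.standard_normal((48, 5)), rng.standard_normal(48)
w = np.zeros(5)

# Gradient of one large batch of 48 samples.
full_grad = mse_grad(w, X, y)

# Gradient accumulation: four micro-batches of 12, averaged before one update.
accum_steps = 4
accum = np.zeros_like(w)
for Xb, yb in zip(np.split(X, accum_steps), np.split(y, accum_steps)):
    accum += mse_grad(w, Xb, yb) / accum_steps

# SWA (simplified): average the weights visited in the later training steps.
lr, snapshots = 0.05, []
for step in range(20):
    w = w - lr * mse_grad(w, X, y)
    if step >= 10:
        snapshots.append(w.copy())
w_swa = np.mean(snapshots, axis=0)
```

The accumulated gradient equals the large-batch gradient, which is why accumulation gives the effect of a larger batch without the extra memory.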
3. RESULTS AND DISCUSSION
This paper deals with the identification of commonly used vehicles in Pakistan. Table 2 shows the
performance of different augmentation techniques with three deep learning architectures. Without any
augmentation, F1 scores of 88%, 91%, and 90% are achieved using ResNet-50, DenseNet121 and
EfficientNetV2-S, respectively. When standard augmentations are applied, the F1 score increases in all
three models, which shows the impact of data augmentation. With MixUp augmentation, in which two
images are mixed together in an overlay fashion, the F1 scores of the deep learning models differ little
from those obtained with standard augmentation. When CutMix is applied, there is a 1%
increment in accuracy with EfficientNet and ResNet. The FMix augmentation technique achieved the
highest accuracy and F1 score in all deep learning models. EfficientNetV2 with FMix-augmented input
resulted in an accuracy and F1 score of 97% and 95%, respectively. With EfficientNetV2 this is a 2%
increment in F1 score compared to the MixUp and CutMix augmentation techniques. Without augmentation,
the macro F1 score is 90%, which increases by 5% with the FMix augmentation technique. The MSDA
techniques are applied without standard augmentation to study the impact of MSDA alone.
Figure 3 shows the validation loss for the five augmentation settings. The lowest validation loss is
achieved using the FMix augmentation technique with the EfficientNetV2-S model. EfficientNetV2-S also
showed the second-lowest curve with the CutMix MSDA technique. CutMix and MixUp produced results similar
to standard augmentation, but FMix outperformed them in all three deep learning architectures.
Figure 4 shows the heatmaps generated by the Grad-CAM technique. With MixUp, the model paid
attention to most parts of the car's front, but its focus is diffuse. With CutMix, on the other hand, it
focused on the right front headlight, but its span of coverage is smaller. FMix covers both aspects: its
heatmap is more focused and yet spread over the front area. It helped the model attend to a broader
region while making a decision, which led to better results.
Existing studies are based either on manual feature extraction [3] or on ensembles of multiple
models [9], which slows inference. The proposed solution is robust during
inference but has some limitations during training. The more augmentation is applied, the more time a
model needs to train, because each image undergoes a series of transformations before being fed to the
neural network. We observed that MSDA takes additional time to compute the image mixing.
However, no augmentations are applied at test time, so inference is not slowed down.
A limitation of CNNs trained with standard augmentation and of feature-based classifiers is their
vulnerability to adversarial image attacks. Manipulating certain car parts can fool a CNN so that it
fails to predict the vehicle. On the other hand, MSDA techniques heavily alter the image by placing other
pictures on it; thus, the chances of a successful adversarial attack should be smaller. FMix resolved
the issues of CutMix, which is inspired by MixUp, so
theoretically, FMix should have better performance [19]. In practice this is confirmed, as FMix improved
accuracy by 1%, 2% and 2% in EfficientNetV2-S, DenseNet121 and ResNet50, respectively, compared to
CutMix.
Table 2. Model performance using different augmentation techniques
            ResNet-50               DenseNet121             EfficientNetV2-S
Technique   F1    Prec  Rec   Acc   F1    Prec  Rec   Acc   F1    Prec  Rec   Acc
None        88%   90%   87%   92%   91%   94%   91%   94%   90%   92%   88%   94%
Standard    90%   91%   90%   93%   92%   93%   91%   94%   93%   95%   92%   95%
MixUp       90%   94%   89%   94%   91%   94%   90%   94%   93%   96%   92%   95%
CutMix      91%   94%   90%   95%   91%   94%   90%   95%   93%   96%   92%   96%
FMix        93%   94%   92%   95%   94%   95%   94%   97%   95%   96%   95%   97%
Prec: precision, Rec: recall, F1: F1 score, Acc: accuracy
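The four metrics reported in Table 2 can be computed from predicted and true class indices as sketched below. The paper states that the F1 score is macro-averaged; treating precision and recall as macro-averaged too is an assumption, and the function name and tiny label arrays are made up for illustration.

```python
import numpy as np

def macro_scores(y_true, y_pred, n_classes):
    """Return accuracy and macro-averaged precision, recall and F1."""
    prec, rec, f1 = [], [], []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        prec.append(p)
        rec.append(r)
        f1.append(2 * p * r / (p + r) if p + r else 0.0)
    # Macro averaging: every class counts equally, regardless of its size.
    return {
        "accuracy": float(np.mean(y_true == y_pred)),
        "precision": float(np.mean(prec)),
        "recall": float(np.mean(rec)),
        "f1": float(np.mean(f1)),
    }

# Imbalanced toy labels: class 0 is twice as frequent as classes 1 and 2.
y_true = np.array([0, 0, 0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 0, 0, 0, 1, 0, 2, 1])
scores = macro_scores(y_true, y_pred, n_classes=3)
```

In this toy case the accuracy is 0.75 while the macro F1 is about 0.69, because errors on the rarer classes weigh as much as the majority class. The same effect can explain why macro F1 is lower than accuracy on an imbalanced dataset such as the one in Table 1.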
Figure 3. Validation loss using different architectures and augmentation techniques. Three different subplots
with a common axis show three deep learning architectures. Five different patterns show five different
augmentation methods
Figure 4. Grad-CAM heatmap for MSDA augmentation techniques
4. CONCLUSION
In this paper, different augmentation techniques are studied to achieve state-of-the-art results. Unlike
other studies that used manual feature extraction such as edge detection or Haar features, this study
used an end-to-end CNN to extract and classify features automatically. Ensemble models are not used
because their time complexity and inference time make them infeasible for deployment. Five augmentation
scenarios are considered: no augmentation, standard augmentation, and three mixed sample data
augmentation techniques. Three deep learning architectures, ResNet, DenseNet and EfficientNet, are
used. All five augmentation scenarios and three CNN architectures are compared. Mixed sample data
augmentation techniques helped to achieve state-of-the-art performance using an EfficientNetV2-S model on
a dataset comprising 48 models of vehicles running on the roads of Pakistan. Further, the heatmaps of the
MSDA techniques are compared to understand what the deep learning model learns. FMix image
augmentation with EfficientNetV2 resulted in the highest F1 score of 95%, which is 5% better than with no
augmentation and 2% better than with commonly used standard augmentation techniques.
REFERENCES
[1] P. N. Huu and C. V. Quoc, “Proposing WPOD-NET combining SVM system for detecting car number plate,” IAES International
Journal of Artificial Intelligence (IJ-AI), vol. 10, no. 3, p. 657, Sep. 2021, doi: 10.11591/ijai.v10.i3.pp657-665.
[2] Y. Wang, D. Zhang, Y. Liu, B. Dai, and L. H. Lee, “Enhancing transportation systems via deep learning: a survey,”
Transportation Research Part C: Emerging Technologies, vol. 99, pp. 144–163, 2019, doi: 10.1016/j.trc.2018.12.004.
[3] L. Zhang, J. Wang, and Z. An, “Vehicle recognition algorithm based on Haar-like features and improved Adaboost classifier,”
Journal of Ambient Intelligence and Humanized Computing, 2021, doi: 10.1007/s12652-021-03332-4.
[4] V. Keerthi Kiran, S. Dash, and P. Parida, “Vehicle recognition using extensions of pattern descriptors,” in IOP Conference Series:
Materials Science and Engineering, 2021, vol. 1166, no. 1, p. 12046, doi: 10.1088/1757-899x/1166/1/012046.
[5] L. Qiu, D. Zhang, Y. Tian, and N. Al-Nabhan, “Deep learning-based algorithm for vehicle detection in intelligent transportation
systems,” Journal of Supercomputing, vol. 77, no. 10, pp. 11083–11098, 2021, doi: 10.1007/s11227-021-03712-9.
[6] H. Gholamalinejad and H. Khosravi, “Vehicle classification using a real-time convolutional structure based on DWT pooling
layer and SE blocks,” Expert Systems with Applications, vol. 183, 2021, doi: 10.1016/j.eswa.2021.115420.
[7] P. Ajitha, S. Jeyakumar, Y. N. Krishna K, and A. Sivasangari, “Vehicle model classification using deep learning,” in Proceedings
of the 5th International Conference on Trends in Electronics and Informatics, ICOEI 2021, 2021, pp. 1544–1548,
doi: 10.1109/ICOEI51242.2021.9452842.
[8] M. A. Hakim Bin Che Mansor, N. A. Mohamad Kamal, M. H. Bin Baharom, and M. Adib Bin Zainol, “Emergency vehicle type
classification using convolutional neural network,” in 2021 IEEE International Conference on Automatic Control and Intelligent
Systems, I2CACIS 2021 - Proceedings, 2021, pp. 126–129, doi: 10.1109/I2CACIS52118.2021.9495899.
[9] A. Hassan, M. Ali, N. M. Durrani, and M. A. Tahir, “An empirical analysis of deep learning architectures for vehicle make and
model recognition,” IEEE Access, vol. 9, pp. 91487–91499, 2021, doi: 10.1109/ACCESS.2021.3090766.
[10] X. Chen, H. Chen, and H. Xu, “Vehicle detection based on multifeature extraction and recognition adopting RBF neural network
on ADAS system,” Complexity, vol. 2020, 2020, doi: 10.1155/2020/8842297.
[11] R. S. El-Sayed and M. N. El-Sayed, “Classification of vehicles’ types using histogram oriented gradients: comparative study and
modification,” IAES International Journal of Artificial Intelligence, vol. 9, no. 4, pp. 700–712, 2020,
doi: 10.11591/ijai.v9.i4.pp700-712.
[12] T. Anwar, “Pak vehicle classification,” GitHub repository. 2021.
[13] M. Ali, M. A. Tahir, and M. N. Durrani, “Vehicle images dataset for make and model recognition,” Data in Brief, vol. 42,
p. 108107, Jun. 2022, doi: 10.1016/j.dib.2022.108107.
[14] A. Buslaev, V. I. Iglovikov, E. Khvedchenya, A. Parinov, M. Druzhinin, and A. A. Kalinin, “Albumentations: fast and flexible
image augmentations,” Information (Switzerland), vol. 11, no. 2, 2020, doi: 10.3390/info11020125.
[15] T. Anwar and S. Zakir, “Effect of image augmentation on ECG image classification using deep learning,” in 2021 International
Conference on Artificial Intelligence, ICAI 2021, 2021, pp. 182–186, doi: 10.1109/ICAI52203.2021.9445258.
[16] S. Yun, D. Han, S. Chun, S. J. Oh, J. Choe, and Y. Yoo, “CutMix: regularization strategy to train strong classifiers with
localizable features,” in Proceedings of the IEEE International Conference on Computer Vision, 2019, vol. 2019-Octob,
pp. 6022–6031, doi: 10.1109/ICCV.2019.00612.
[17] H. Zhang, M. Cisse, Y. N. Dauphin, and D. Lopez-Paz, “Mixup: beyond empirical risk minimization,” Apr. 2018,
doi: 10.48550/arXiv.1710.09412.
[18] D. Liang, F. Yang, T. Zhang, and P. Yang, “Understanding mixup training methods,” IEEE Access, vol. 6, pp. 58774–58783,
2018, doi: 10.1109/ACCESS.2018.2872698.
[19] E. Harris, A. Marcu, M. Painter, M. Niranjan, A. Prügel-Bennett, and J. Hare, “FMix: Enhancing Mixed Sample Data
Augmentation,” arXiv preprint, Feb. 2020.
[20] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in 2016 IEEE Conference on Computer
Vision and Pattern Recognition (CVPR), Jun. 2016, pp. 770–778, doi: 10.1109/cvpr.2016.90.
[21] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” in 2017 IEEE
Conference on Computer Vision and Pattern Recognition (CVPR), Jul. 2017, pp. 4700–4708, doi: 10.1109/cvpr.2017.243.
[22] S. Zagoruyko and N. Komodakis, “Wide residual networks,” in British Machine Vision Conference 2016, BMVC 2016, 2016,
vol. 2016-Septe, pp. 87.1--87.12, doi: 10.5244/C.30.87.
[23] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the inception architecture for computer vision,” in
Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2016, vol. 2016-Decem,
pp. 2818–2826, doi: 10.1109/CVPR.2016.308.
[24] M. Tan and Q. V Le, “EfficientNet: rethinking model scaling for convolutional neural networks,” arXiv preprint, May 2019,
doi: 10.48550/arXiv.1905.11946.
[25] M. Tan and Q. V Le, “EfficientNetV2: smaller models and faster training,” arXiv preprint, 2021.
[26] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, “Grad-CAM: visual explanations from deep
networks via gradient-based localization,” in Proceedings of the IEEE International Conference on Computer Vision, 2017,
vol. 2017-Octob, pp. 618–626, doi: 10.1109/ICCV.2017.74.
[27] J. Gildenblat et al., “PyTorch library for CAM methods,” GitHub, 2021.
[28] S. Narang et al., “Mixed precision training,” in 6th International Conference on Learning Representations, ICLR 2018 -
Conference Track Proceedings, 2018, pp. 1–12.
BIOGRAPHIES OF AUTHORS
Talha Anwar is an AI researcher with a Master's degree in Data Science from
FAST, National University, Pakistan. He obtained a Bachelor's degree in Biomedical
Engineering from Riphah International University in 2018. His research is in biomedical
image analysis and biosignal analysis, particularly in the area of brain-computer interfaces. He has
a special interest in social text analysis in the field of NLP. He is equally interested in machine
learning and deep learning and has several publications in this domain. Talha is actively
involved in research and works with the Centre for Chiropractic Research, New Zealand
College of Chiropractic, Auckland 1060, New Zealand. All of his research is available at
github.com/talhaanwarch. He can be contacted at email: chtalhaanwar@gmail.com.
Seemab Zakir has Bachelor's and Master's degrees in biomedical engineering
from Riphah International University, Pakistan. She has experience in conducting labs on
biomedical engineering subjects, particularly programming, machine learning, and
instrumentation. She has also served as a biomedical engineer at Pak-Austria Fachhochschule:
Institute of Applied Sciences. She was a lecturer at Foundation University School of Science
and Technology, Pakistan. Currently, she is a Ph.D. scholar at Scuola Superiore Sant'Anna,
Pisa, Italy. Her areas of interest are biomedical instrumentation and artificial intelligence. She
can be contacted at email: seemabzakir2@gmail.com.

Vehicle make and model recognition using mixed sample data augmentation techniques

Corresponding Author:
Talha Anwar
Center of Chiropractic Research, New Zealand College of Chiropractic
Auckland 1149, New Zealand
Email: [email protected]

1. INTRODUCTION
The vehicle identification system (VIS), an integral component of the intelligent transport system (ITS), eases traffic management and helps counter criminal activity. VIS is widely used in road violation detection, traffic congestion alarms, and unmanned driving. Millions of vehicles are on the road in big cities, making it challenging to track a particular vehicle. Number plates are mostly used to track vehicles [1], but they can be changed easily, leading to false identification. VIS also helps automate tax collection at toll plazas based on vehicle type. With the advent of artificial intelligence (AI), deep learning has been widely used in transportation [2].

Some recent studies used traditional imaging techniques such as Haar-like features with an AdaBoost classifier [3] and pattern descriptors with a support vector classifier [4]. The pattern descriptor study used local binary patterns, median binary patterns, directional gradient patterns, and local arc patterns as features. Kiran et al. also studied different colour spaces for descriptor extraction, namely red, green and blue (RGB); luma, blue-difference chroma and red-difference chroma (YCbCr); and hue, saturation, value (HSV) [4]. The Haar-like features-based study first removed shadows using the HSV colour space to reduce the chance of false detection. It also used single-feature methods such as colour moment, local binary pattern (LBP) features, Hu moment features, angle features, and circularity, and achieved 85.8% accuracy with AdaBoost [3]. Qiu et al. [5] compared the performance of Haar-like features with that of a convolutional neural network (CNN). Using Haar-like features, 86.72% precision and 91.86% recall were achieved, which increased by 5.63% and 0.2%, respectively, with the CNN [5].

Gholamalinejad and Khosravi proposed a novel CNN architecture composed of CNN layers with squeeze-and-excitation (SE) modules. Instead of classic max pooling or average pooling, they used the Haar wavelet as a pooling layer [6]. The data comprised 5 classes, including bus, heavy truck, medium truck and pickup, and they achieved an accuracy of 95.1% [6]. Ajitha et al. proposed a shallow CNN model with traditional augmentation techniques such as flip, rotation, shear, crop and zoom, resulting in an accuracy of 92.3% [7]. Mansor et al. [8] achieved an accuracy of 95% on a four-class classification problem. Their work addressed emergency vehicle type classification and used images of fire trucks, police cars, ambulances and standard cars [8]. Hassan et al. compared different classifiers with a cyclic learning rate and used the MixUp image augmentation technique to achieve an accuracy of 93.96% by ensembling homogeneous DenseNet201 models [9].

Though CNN-based models have gained much attention in recent years, manual feature-based classification is still being studied. Chen detected multiple features from the vehicle, such as taillight features, shadow area features and other descriptors. A radial basis function (RBF) artificial neural network was then used for classification and achieved 97% accuracy [10]. Another manual feature-based study used histogram of oriented gradients (HOG) and ant colony optimization (ACO) to classify vehicles and achieved an accuracy of 90% [11].

The existing studies either deal with only a few vehicle models, rely on manual feature extraction, or use ensembles in which multiple models are run during inference, which increases prediction time. As the VIS operates in real time, it needs to be robust.
Keeping these limitations in view, we propose a single-network approach that yields state-of-the-art performance. Three different models and five augmentation settings are compared. All experiments are seeded for reproducibility. The main contributions of this paper are:
− Different deep learning architectures are compared without any augmentation, with commonly used augmentation, and with mixed sample data augmentation (MSDA) techniques.
− Ensembling and fusing different models increases inference time, so the approach uses a single model that performs better than the existing ensembled models.
− The proposed approach achieves state-of-the-art performance with 97% accuracy and 95% F1 score.

The paper is organized as follows. Section 1 presents the introduction, motivation, and literature review on vehicle classification. Section 2 describes the methodology in detail. Section 3 deals with results and discussion. The conclusion is given in section 4. The implementation is publicly available on GitHub [12].

2. METHOD

2.1. Dataset

We used images of common cars running on the roads of Pakistan [13]. There are 3,103 training and 752 test images divided into 48 car models/classes. Figure 1 shows a sample image from each class. Table 1 shows the vehicle name and the number of training images available for each vehicle.

2.2. Transformation

Transformation is a technique to produce variation in the data. It helps predictions generalize to test data and avoids over-fitting the model. The Albumentations [14] library is used for this purpose.
The following standard augmentations are applied:
− Resize: all images are resized to 256×256.
− Center crop: all images are centre cropped to 224×224.
− Horizontal flip: fifty per cent of images are flipped horizontally.
− Vertical flip: fifty per cent of images are flipped vertically.
− Shift scale rotate: fifty per cent of images are randomly shifted, rotated, and scaled in height and width.
− CLAHE: contrast limited adaptive histogram equalization (CLAHE) is a modified form of adaptive histogram equalization. In histogram equalization, the intensity range of the image is stretched between 0 and 255 to improve the contrast of the image. However, this can make the picture either too dark or too bright. Adaptive histogram equalization handles this issue by dividing the image into small patches and applying histogram equalization to each patch. This sometimes over-amplifies contrast if the image is noisy. CLAHE limits this amplification by clipping the histogram of each patch before equalization, and applies bilinear interpolation across the edges of neighbouring patches to remove the artificial boundaries.
− Cutout: cutout is one of the ways to handle over-fitting. In this technique, black boxes are introduced into images, which makes classification harder and reduces the chance of over-fitting.
− Normalization: normalization leads to faster convergence and speeds up the training process.
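The geometric and intensity steps in the list above can be illustrated with a small NumPy sketch. This is not the Albumentations API and not the authors' pipeline: the function names, the hole size, and the use of ImageNet channel statistics for normalization are assumptions made only for illustration, and the random image stands in for a photo already resized to 256×256.

```python
import numpy as np

def center_crop(img, size):
    """Crop the central size x size window from an H x W x C image."""
    h, w = img.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]

def random_hflip(img, rng, p=0.5):
    """Mirror the image left-to-right with probability p."""
    return img[:, ::-1] if rng.random() < p else img

def cutout(img, rng, hole=32):
    """Overwrite one randomly placed hole x hole square with zeros (a black box)."""
    h, w = img.shape[:2]
    top = int(rng.integers(0, h - hole + 1))
    left = int(rng.integers(0, w - hole + 1))
    out = img.copy()
    out[top:top + hole, left:left + hole] = 0
    return out

def normalize(img, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)):
    """Scale uint8 pixels to [0, 1], then standardize each channel.

    The default mean/std are the usual ImageNet statistics (an assumption here).
    """
    x = img.astype(np.float32) / 255.0
    return (x - np.array(mean, dtype=np.float32)) / np.array(std, dtype=np.float32)

rng = np.random.default_rng(0)
# Stand-in for a photo that has already been resized to 256 x 256.
image = rng.integers(0, 256, size=(256, 256, 3), dtype=np.uint8)
augmented = normalize(cutout(random_hflip(center_crop(image, 224), rng), rng))
```

Applying normalization last keeps the cutout boxes black in pixel space; a library pipeline may order or parameterize these steps differently.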
Figure 1. Sample vehicle image from each class label; the number on each image corresponds to the vehicle ID in Table 1

Table 1. Vehicle models and the number of images for each model. The ID column relates to Figure 1. No. is the number of training examples for that model

ID  Vehicle model            No.
1   Daiatsu Core             80
2   Daiatsu Hijet            44
3   Daiatsu Mira             81
4   FAW V2                   29
5   FAW XPV                  26
6   Honda BRV                27
7   Honda city 1994          32
8   Honda city 2000          69
9   Honda City aspire        105
10  Honda civic 1994         16
11  Honda civic 2005         34
12  Honda civic 2007         74
13  Honda civic 2015         31
14  Honda civic 2018         82
15  Honda Grace              21
16  Honda Vezell             38
17  KIA Sportage             25
18  Suzuki alto 2007         132
19  Suzuki alto 2019         56
20  Suzuki alto japan 2010   27
21  Suzuki carry             13
22  Suzuki cultus 2018       269
23  Suzuki cultus 2019       108
24  Suzuki Every             20
25  Suzuki highroof          63
26  Suzuki kyber             52
27  Suzuki liana             33
28  Suzuki margala           16
29  Suzuki Mehran            195
30  Suzuki swift             118
31  Suzuki wagonR 2015       112
32  Toyota hiace 2000        23
33  Toyota Aqua              77
34  Toyota axio              20
35  Toyota corolla 2000      39
36  Toyota corolla 2007      82
37  Toyota corolla 2011      127
38  Toyota corolla 2016      270
39  Toyota fortuner          43
40  Toyota Hiace 2012        72
41  Toyota Landcruser        17
42  Toyota Passo             61
43  Toyota pirus             23
44  Toyota Prado             21
45  Toyota premio            18
46  Toyota Vigo              53
47  Toyota Vitz              81
48  Toyota Vitz 2010         48
2.3. Mixed sample data augmentation

Large neural networks are notorious for memorizing the training data instead of learning from it, even under strong regularization, and then fail during inference. Although standard data augmentation helps generalization, it is data-dependent and requires domain knowledge. Anwar and Zakir [15] reported that standard augmentation sometimes leads to poor results: they explored different image augmentation techniques on electrocardiogram (ECG) graphs and obtained the best results without any augmentation. A CNN tends to focus on the most discriminative part of the image instead of the whole image, leading to poor generalization. Regional dropout techniques such as Cutout help the CNN attend to a larger part of the image, but they reduce the proportion of informative pixels in the training data [16].

Mixed sample data augmentation (MSDA) techniques were introduced to overcome these augmentation and generalization issues. MSDA mixes existing training samples to produce new samples intended to follow the same distribution as the existing data. MSDA techniques fall into two policies, interpolation and masking. MixUp is an example of interpolation, whereas CutMix and FMix are examples of masking MSDA.

2.3.1. MixUp

MixUp linearly interpolates two randomly chosen training images, usually from different classes, to produce a new image. It interpolates not only the input images but also the corresponding targets [17]. The working principle of MixUp is shown in (1) and (2),

$$\tilde{x} = \lambda x_i + (1 - \lambda) x_j \tag{1}$$

$$\tilde{y} = \lambda y_i + (1 - \lambda) y_j \tag{2}$$

where $x_i$ and $x_j$ are raw images in (1), $y_i$ and $y_j$ are their one-hot encoded labels in (2), and the mixing coefficient $\lambda \in [0, 1]$ is drawn from a beta distribution. MixUp increases the robustness of deep learning architectures to corrupted labels and improves generalization.
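Equations (1) and (2) translate directly into a few lines of NumPy. The sketch below is only an illustration: the function name, the random stand-in images, the class indices, and the Beta parameter alpha = 1.0 are assumptions, since the text does not state the value used in the experiments.

```python
import numpy as np

def mixup(x_i, y_i, x_j, y_j, lam):
    """Blend two images and their one-hot labels with the same weight lam."""
    x_mix = lam * x_i + (1.0 - lam) * x_j   # equation (1)
    y_mix = lam * y_i + (1.0 - lam) * y_j   # equation (2)
    return x_mix, y_mix

rng = np.random.default_rng(0)
num_classes = 48                                    # number of vehicle classes in the dataset
x_i, x_j = rng.random((224, 224, 3)), rng.random((224, 224, 3))   # stand-in images
y_i, y_j = np.eye(num_classes)[3], np.eye(num_classes)[17]        # one-hot labels

alpha = 1.0                    # assumed Beta parameter; not given in the text
lam = rng.beta(alpha, alpha)   # mixing weight in [0, 1]
x_mix, y_mix = mixup(x_i, y_i, x_j, y_j, lam)
```

The mixed label is a soft target whose entries still sum to one, so it can be used with a cross-entropy loss that accepts class probabilities.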
Linear interpolation of the input images reduces memorization by large deep learning models [18].

2.3.2. CutMix

CutMix was inspired by Cutout and MixUp and claimed to resolve the issues in MixUp: although MixUp improves classification performance, the resulting samples are unnatural. CutMix replaces an image patch with a patch from another random image in the training data [16]. It resembles Cutout, where a patch is replaced with zeros, and MixUp, where two images are mixed.

$$\tilde{x} = M \odot x_i + (1 - M) \odot x_j \tag{3}$$

Patch mixing of training images is shown in (3), where $\odot$ denotes element-wise multiplication. $M$ is a binary mask that is 0 inside the rectangular dropout region and 1 elsewhere, so that this region of $x_i$ is replaced by the corresponding patch of $x_j$. The one-hot encoded labels are mixed as in MixUp, with $\lambda$ equal to the fraction of pixels retained from $x_i$. CutMix pushes the network to use less discriminative parts of the object, whereas MixUp uses the entire image but produces unnatural artefacts.

2.3.3. FMix

CutMix reduces overfitting by increasing the number of observable data points without changing the data distribution. However, CutMix uses rectangular patches, which is a limitation and leads to distortion. FMix claimed to resolve this issue by using binary masks obtained by thresholding low-frequency images sampled from Fourier space. The authors first sample low-frequency greyscale masks from Fourier space and then convert them to binary masks using a threshold. Once a binary mask is obtained, two images are overlaid such that pixels where the mask is 0 come from one image and pixels where the mask is 1 come from the other. Unlike CutMix, FMix produces patches of arbitrary shape, which maximizes the number of possible masks [19]. Overall, when data is limited and learning from individual examples is easier, MixUp is a good candidate, and FMix is a better choice when data is abundant.

In Figure 2, MixUp shows that two images are mixed together in an overlay fashion.
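The two masking policies of sections 2.3.2 and 2.3.3 differ only in how the binary mask is generated, which the following NumPy sketch tries to make concrete. It is a simplified illustration under stated assumptions, not the reference implementation of either paper: the function names are hypothetical, the rectangle size follows the square-root rule described for CutMix, and the FMix mask uses an assumed frequency decay power of 3 with a top-λ threshold.

```python
import numpy as np

def cutmix_mask(h, w, lam, rng):
    """Binary mask that is 0 inside one random rectangle and 1 elsewhere."""
    cut_h, cut_w = int(h * np.sqrt(1.0 - lam)), int(w * np.sqrt(1.0 - lam))
    cy, cx = int(rng.integers(0, h)), int(rng.integers(0, w))
    top, bottom = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    left, right = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)
    mask = np.ones((h, w))
    mask[top:bottom, left:right] = 0.0   # the box may be clipped at the border
    return mask

def fmix_mask(h, w, lam, rng, decay=3.0):
    """Binary mask from thresholded low-frequency noise sampled in Fourier space."""
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    freq = np.sqrt(fx ** 2 + fy ** 2)
    freq[0, 0] = 1.0 / max(h, w)          # avoid division by zero at the DC term
    noise = rng.standard_normal((h, w)) + 1j * rng.standard_normal((h, w))
    grey = np.real(np.fft.ifft2(noise / freq ** decay))   # smooth greyscale image
    k = int(round(lam * h * w))           # number of pixels set to 1
    mask = np.zeros(h * w)
    mask[np.argsort(grey.ravel())[::-1][:k]] = 1.0
    return mask.reshape(h, w)

def mix_with_mask(x_i, y_i, x_j, y_j, mask):
    """Apply equation (3) and mix labels by the fraction of pixels kept from x_i."""
    m = mask[..., None]                    # broadcast the mask over colour channels
    x_mix = m * x_i + (1.0 - m) * x_j
    lam = mask.mean()
    return x_mix, lam * y_i + (1.0 - lam) * y_j

rng = np.random.default_rng(0)
x_i, x_j = rng.random((64, 64, 3)), rng.random((64, 64, 3))   # stand-in images
y_i, y_j = np.eye(48)[3], np.eye(48)[17]
x_cut, y_cut = mix_with_mask(x_i, y_i, x_j, y_j, cutmix_mask(64, 64, 0.7, rng))
x_fmx, y_fmx = mix_with_mask(x_i, y_i, x_j, y_j, fmix_mask(64, 64, 0.7, rng))
```

Recomputing λ from the realized mask keeps the label weights consistent with the pixels actually shown to the network, even when the CutMix box is clipped at the image border.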
The CutMix panel of Figure 2 shows that a rectangular patch of one image is replaced by a patch of another image. The FMix panel shows that an irregularly shaped region of an image is replaced by the corresponding region of another image from the training data.

2.4. Deep learning architecture

Deep learning is a subset of artificial intelligence that takes complex raw data as input, automatically extracts valuable features, and performs task-relevant work such as classification or regression.
In image classification, deep learning boomed in 2014 after VGGNet came out. Although AlexNet preceded VGG, VGG16 outperformed it by 10%. At that time it was believed that adding layers always improved performance, until the ResNet paper, released in December 2015, showed that adding layers helps only up to a point and degrades performance beyond it [20]. To date, ResNet and its variants are among the most used architectures; therefore, we used ResNet as our baseline.

Figure 2. Mixed sample data augmented images of two cars

2.4.1. ResNet

Ideally, a deeper neural network is preferable as it yields better results. Nevertheless, depth comes at the cost of vanishing gradients and degradation. As the depth of a neural network increases, the gradients become very small during back-propagation and approach zero; this phenomenon is known as the vanishing gradient. Although this problem can be mitigated with the rectified linear unit (ReLU) activation function, skip connections also play a role: a skip connection back-propagates a gradient of larger magnitude by skipping some layers in between. The ResNet paper explained that deepening a neural network further leads to a higher error rate, a problem called degradation. Adding layers saturates the model, and the error rate starts increasing. In principle, if a shallow network works well, a deeper network built by adding layers should perform at least as well, because the extra layers could simply learn the identity mapping; in practice this does not happen, and deeper plain networks perform worse. So an identity function is added from a shallower layer to a deeper layer, and the intervening layers only have to learn the residual with respect to that identity. In ResNet, this identity function ensures that the deep network can reproduce the output of the shallow network.
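The role of the identity function can be seen in a toy residual block. The sketch below is an assumption-laden simplification: it uses two dense layers instead of the convolutions and batch normalization of a real ResNet block, and the function names are illustrative. When the added layers contribute nothing, the block passes its input through unchanged.

```python
import numpy as np

def relu(x):
    """Rectified linear unit."""
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """Residual mapping F(x) = relu(x @ w1) @ w2 added to an identity skip connection."""
    f_x = relu(x @ w1) @ w2
    return relu(f_x + x)

rng = np.random.default_rng(0)
x = relu(rng.standard_normal((4, 16)))    # non-negative activations from an earlier layer

# With all-zero weights the residual branch outputs 0, so the block is the identity.
zeros = np.zeros((16, 16))
identity_out = residual_block(x, zeros, zeros)

# With trained (here: random) weights the block adds a correction on top of x.
w1, w2 = rng.standard_normal((16, 16)) * 0.1, rng.standard_normal((16, 16)) * 0.1
out = residual_block(x, w1, w2)
```

Because the skip path carries x directly to the output, the gradient with respect to x always contains an identity term, which is what keeps it from vanishing as blocks are stacked.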
The ResNet paper calls these identity functions skip connections: they skip some layers and pass information directly to later layers. In the worst case, the performance of a deeper network will not be worse than that of a shallow network, and in the best case it can be better [20]. ResNet variants differ in network size and in the number of layers skipped by the skip connections. We used ResNet-50 as it is neither so small that it underfits nor so large that it overfits.

2.4.2. DenseNet

DenseNet was proposed in 2017 by Huang et al. [21], based on the observation that a network can be deeper, more accurate, and more efficient to train if it contains shorter connections between layers close to the input and layers close to the output. DenseNet is built from dense blocks and transition layers. In a dense block, each layer receives collective information from all previous layers, both directly and indirectly. Similarly, in back-propagation, the error signal flows collectively to all layers. Each layer takes the feature maps of all previous layers as input, and its own feature maps are used as input for all subsequent layers. To downsample and reduce network size, a transition layer is placed between two dense blocks. This layer is composed of a 1×1 convolution preceded by batch normalization and followed by an average pooling layer. We used DenseNet121 in this study.

2.4.3. EfficientNetV2

Most deep learning architectures scale either the depth, as ResNet does by increasing the number of layers, or the width, by adding more neurons/filters to each layer, as in wide ResNet [22]. Wider networks learn more detailed features and are easier to train because they are usually shallower. However, shallow and wide networks have difficulty learning high-level features. Some networks use high-resolution images, such as InceptionV3, which uses a 299×299 image size [23].
Scaling a single dimension such as depth, width, or resolution increases accuracy only up to a limit. EfficientNet, proposed in 2019, claimed that depth, width and resolution should be scaled together to make a larger network more effective. The authors therefore proposed a compound scaling method that scales width, depth and resolution proportionally [24].
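A short worked example may make compound scaling concrete. In the EfficientNet paper [24], depth, width and resolution are multiplied by α^φ, β^φ and γ^φ for a single compound coefficient φ, with α = 1.2, β = 1.1 and γ = 1.15 found by grid search on the baseline network. The function name and the base resolution of 224 pixels below are illustrative assumptions, and the released EfficientNet variants use rounded, hand-adjusted values rather than these exact products.

```python
# Base multipliers reported for EfficientNet: depth, width, resolution.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15

def compound_scale(phi, base_depth=1.0, base_width=1.0, base_resolution=224):
    """Scale depth, width and input resolution together with one coefficient phi."""
    depth = base_depth * ALPHA ** phi
    width = base_width * BETA ** phi
    resolution = base_resolution * GAMMA ** phi
    return depth, width, resolution

# FLOPs grow roughly with depth * width^2 * resolution^2, so one unit of phi
# multiplies the cost by about ALPHA * BETA^2 * GAMMA^2 (close to 2).
flops_factor = ALPHA * BETA ** 2 * GAMMA ** 2

for phi in range(4):
    d, w, r = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution {r:.0f} px")
```

The constraint that the product stays near 2 is what lets φ be read as "how many times to double the compute budget".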
EfficientNetV2, released in June 2021, is one of the latest proposed models and is known for its faster training speed [25]. The model is based on training-aware neural architecture search (NAS) and scaling, combined with progressive learning. It is observed that small image sizes require less regularization than large image sizes, so the authors start training with a small image size and increase it progressively. They used EfficientNet as their backbone architecture and applied the NAS strategy, removing unnecessary search options to reduce the search space. EfficientNetV2 uses a small kernel size of 3×3 and adds more layers to compensate for the reduced receptive field. Other changes reduce the memory access overhead of EfficientNet, such as removing the last stride-1 stage. In our study, EfficientNetV2-S is used.

2.5. Explainability of MSDA techniques

To understand the impact of MSDA techniques, we used gradient-weighted class activation mapping (Grad-CAM), which shows which area of an image a network focuses on to decide the class label. Grad-CAM produces a localization heatmap of the target class from its gradients with respect to the last convolutional layer and highlights the essential regions of the image [26]. The PyTorch library for CAM methods [27] is used to generate the Grad-CAM heatmaps.

2.6. Additional information

Each model is trained for fifty epochs with a learning rate of 0.001 and a batch size of 48. The AdamW optimizer is used instead of Adam as it provides better results [15]. The PyTorch Lightning framework is used for implementation. Accuracy, macro F1 score, precision and recall are used for evaluation. Mixed precision, gradient accumulation, and stochastic weight averaging (SWA) are used to reduce training time. Gradient accumulation is a technique to train the model with a larger effective batch size by updating the weights after several batches instead of after every batch.
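Gradient accumulation can be checked numerically on a toy problem. The sketch below is only an illustration under assumptions: it uses a linear model with a mean squared error loss and a plain gradient step rather than the CNNs and AdamW optimizer of the study, and the split into 4 micro-batches of 12 is an assumed setting. It shows that averaging micro-batch gradients before a single update reproduces the gradient of one large batch.

```python
import numpy as np

def mse_grad(w, x, y):
    """Gradient of the mean squared error of a linear model x @ w with respect to w."""
    return 2.0 * x.T @ (x @ w - y) / len(x)

rng = np.random.default_rng(0)
x, y = rng.standard_normal((48, 5)), rng.standard_normal(48)
w = np.zeros(5)

full_grad = mse_grad(w, x, y)          # gradient from one large batch of 48 samples

accum_steps = 4                        # assumed: 4 micro-batches of 12 samples each
accumulated = np.zeros_like(w)
for xb, yb in zip(np.split(x, accum_steps), np.split(y, accum_steps)):
    # Divide by accum_steps so that the running sum is a mean over micro-batches.
    accumulated += mse_grad(w, xb, yb) / accum_steps

lr = 0.001
w_new = w - lr * accumulated           # one weight update after all micro-batches
```

Only one micro-batch has to fit in memory at a time, which is why the technique allows a larger effective batch size on the same hardware.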
SWA helps the model generalize, whereas mixed precision reduces training time by up to 8x [28] and allows a larger batch size.

3. RESULTS AND DISCUSSION

This paper deals with the identification of commonly used vehicles in Pakistan. Table 2 shows the performance of the different augmentation techniques with the three deep learning architectures. Without any augmentation, F1 scores of 88%, 91%, and 90% are achieved using ResNet-50, DenseNet121 and EfficientNetV2-S, respectively. When standard augmentations are applied, the F1 score increases in all three models, which shows the impact of data augmentation. With MixUp, in which two images are mixed together in an overlay fashion, the F1 scores of the three models differ little from those obtained with standard augmentation. When CutMix is applied, accuracy increases by 1% over MixUp in all three models (Table 2). FMix achieved the highest accuracy and F1 score in all deep learning models. EfficientNetV2 with FMix-augmented input resulted in an accuracy of 97% and an F1 score of 95%. For EfficientNetV2 this is a 2% increase in F1 score over MixUp and CutMix. Without augmentation, the macro F1 score is 90%, which FMix increases by 5%. The MSDA techniques are applied without standard augmentation to study the impact of MSDA alone.

Figure 3 shows the validation loss for the five augmentation settings. The lowest validation loss is achieved with FMix when the EfficientNetV2-S model is used. EfficientNetV2-S also shows the second-lowest curve, with the CutMix technique. CutMix and MixUp produced results similar to standard augmentation, but FMix outperformed them in all three deep learning architectures.

Figure 4 shows the heatmap generated by the Grad-CAM technique.
With MixUp, the model attends to most parts of the car's front, but its focus is diffuse. With CutMix, the model focuses on the right front headlight, but the covered area is small. FMix combines both aspects: its heatmap is more focused yet spread over the front area. This helps the model attend to a broader region while making a decision and yields better results.

The existing studies are based either on manual feature extraction [3] or on ensembles of multiple models [9], which increases inference time. The proposed solution is robust during inference but has some limitations during training. The more augmentation is applied, the more time a model needs to train, because each image undergoes a series of transformations before it is fed to the neural network. We observed that MSDA takes additional time for the mathematical calculation of image mixing. However, no augmentations are applied at test time, so inference is unaffected. A limitation of CNNs trained with standard augmentation and of feature-based classifiers is their vulnerability to adversarial image attacks: manipulating certain car parts can fool a CNN so that it fails to predict the vehicle. MSDA techniques, on the other hand, heavily alter the training images by placing other pictures on them, which may make the trained model less susceptible to such attacks. FMix resolved the issues of CutMix, which is itself inspired by MixUp, so theoretically FMix should perform better [19]. Table 2 supports this in practice: compared with CutMix, FMix improves accuracy by 1% and 2% with EfficientNetV2-S and DenseNet121, respectively, and improves the F1 score by 2% with ResNet-50.

Table 2. Model performance using different augmentation techniques

            ResNet-50               DenseNet121             EfficientNetV2-S
Technique   F1    Prec  Rec   Acc   F1    Prec  Rec   Acc   F1    Prec  Rec   Acc
None        88%   90%   87%   92%   91%   94%   91%   94%   90%   92%   88%   94%
Standard    90%   91%   90%   93%   92%   93%   91%   94%   93%   95%   92%   95%
MixUp       90%   94%   89%   94%   91%   94%   90%   94%   93%   96%   92%   95%
CutMix      91%   94%   90%   95%   91%   94%   90%   95%   93%   96%   92%   96%
FMix        93%   94%   92%   95%   94%   95%   94%   97%   95%   96%   95%   97%

Prec: precision, Rec: recall, F1: F1 score, Acc: accuracy

Figure 3. Validation loss using different architectures and augmentation techniques. Three subplots with a common axis show the three deep learning architectures. Five different patterns show the five augmentation methods

Figure 4. Grad-CAM heatmap for MSDA augmentation techniques
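Table 2 reports accuracy values that are consistently higher than the macro F1 scores. One plausible reason, given the class imbalance visible in Table 1, is that macro F1 averages the per-class F1 scores with equal weight, so errors on rare classes cost more than they do in accuracy. The sketch below illustrates this on made-up predictions; the function names, labels and counts are hypothetical and are not taken from the experiments.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)   # F1 = 2TP / (2TP + FP + FN)
    return sum(scores) / len(scores)

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy imbalanced example: 18 images of a frequent class and 2 of a rare class,
# with one of the rare images misclassified as the frequent class.
y_true = ["cultus"] * 18 + ["carry"] * 2
y_pred = ["cultus"] * 18 + ["cultus", "carry"]

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
print(f"accuracy={acc:.3f}, macro F1={f1:.3f}")
```

A single mistake on the rare class leaves accuracy high while pulling the macro F1 down noticeably, which is why both metrics are worth reporting for a dataset with 13 to 270 images per class.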
4. CONCLUSION

In this paper, different augmentation techniques are studied to achieve state-of-the-art results. Unlike other studies that used manual feature extraction such as edge detection or Haar features, this study used an end-to-end CNN to extract and classify features automatically. Ensemble models are not used because their time complexity and inference time make them infeasible for deployment. Five augmentation scenarios are used: no augmentation, standard augmentation, and three mixed sample data augmentation techniques. Three deep learning architectures, ResNet, DenseNet and EfficientNet, are used. All five augmentation scenarios and three CNN architectures are compared. Mixed sample data augmentation techniques helped to achieve state-of-the-art performance using an EfficientNetV2-S model on a dataset comprising 48 models of vehicles running on the roads of Pakistan. Further, the heatmaps of the MSDA techniques are compared to understand what the deep learning model learns. FMix image augmentation with EfficientNetV2 resulted in the highest F1 score of 95%, which is 5% higher than without augmentation and 2% higher than with commonly used standard augmentation techniques.

REFERENCES

[1] P. N. Huu and C. V. Quoc, “Proposing WPOD-NET combining SVM system for detecting car number plate,” IAES International Journal of Artificial Intelligence (IJ-AI), vol. 10, no. 3, p. 657, Sep. 2021, doi: 10.11591/ijai.v10.i3.pp657-665.
[2] Y. Wang, D. Zhang, Y. Liu, B. Dai, and L. H. Lee, “Enhancing transportation systems via deep learning: a survey,” Transportation Research Part C: Emerging Technologies, vol. 99, pp. 144–163, 2019, doi: 10.1016/j.trc.2018.12.004.
[3] L. Zhang, J. Wang, and Z. An, “Vehicle recognition algorithm based on Haar-like features and improved Adaboost classifier,” Journal of Ambient Intelligence and Humanized Computing, 2021, doi: 10.1007/s12652-021-03332-4.
[4] V. Keerthi Kiran, S. Dash, and P. Parida, “Vehicle recognition using extensions of pattern descriptors,” in IOP Conference Series: Materials Science and Engineering, 2021, vol. 1166, no. 1, p. 012046, doi: 10.1088/1757-899x/1166/1/012046.
[5] L. Qiu, D. Zhang, Y. Tian, and N. Al-Nabhan, “Deep learning-based algorithm for vehicle detection in intelligent transportation systems,” Journal of Supercomputing, vol. 77, no. 10, pp. 11083–11098, 2021, doi: 10.1007/s11227-021-03712-9.
[6] H. Gholamalinejad and H. Khosravi, “Vehicle classification using a real-time convolutional structure based on DWT pooling layer and SE blocks,” Expert Systems with Applications, vol. 183, 2021, doi: 10.1016/j.eswa.2021.115420.
[7] P. Ajitha, S. Jeyakumar, Y. N. Krishna K, and A. Sivasangari, “Vehicle model classification using deep learning,” in Proceedings of the 5th International Conference on Trends in Electronics and Informatics, ICOEI 2021, 2021, pp. 1544–1548, doi: 10.1109/ICOEI51242.2021.9452842.
[8] M. A. Hakim Bin Che Mansor, N. A. Mohamad Kamal, M. H. Bin Baharom, and M. Adib Bin Zainol, “Emergency vehicle type classification using convolutional neural network,” in 2021 IEEE International Conference on Automatic Control and Intelligent Systems, I2CACIS 2021 - Proceedings, 2021, pp. 126–129, doi: 10.1109/I2CACIS52118.2021.9495899.
[9] A. Hassan, M. Ali, N. M. Durrani, and M. A. Tahir, “An empirical analysis of deep learning architectures for vehicle make and model recognition,” IEEE Access, vol. 9, pp. 91487–91499, 2021, doi: 10.1109/ACCESS.2021.3090766.
[10] X. Chen, H. Chen, and H. Xu, “Vehicle detection based on multifeature extraction and recognition adopting RBF neural network on ADAS system,” Complexity, vol. 2020, 2020, doi: 10.1155/2020/8842297.
[11] R. S. El-Sayed and M. N. El-Sayed, “Classification of vehicles’ types using histogram oriented gradients: comparative study and modification,” IAES International Journal of Artificial Intelligence, vol. 9, no. 4, pp. 700–712, 2020, doi: 10.11591/ijai.v9.i4.pp700-712.
[12] T. Anwar, “Pak vehicle classification,” GitHub repository, 2021.
[13] M. Ali, M. A. Tahir, and M. N. Durrani, “Vehicle images dataset for make and model recognition,” Data in Brief, vol. 42, p. 108107, Jun. 2022, doi: 10.1016/j.dib.2022.108107.
[14] A. Buslaev, V. I. Iglovikov, E. Khvedchenya, A. Parinov, M. Druzhinin, and A. A. Kalinin, “Albumentations: fast and flexible image augmentations,” Information (Switzerland), vol. 11, no. 2, 2020, doi: 10.3390/info11020125.
[15] T. Anwar and S. Zakir, “Effect of image augmentation on ECG image classification using deep learning,” in 2021 International Conference on Artificial Intelligence, ICAI 2021, 2021, pp. 182–186, doi: 10.1109/ICAI52203.2021.9445258.
[16] S. Yun, D. Han, S. Chun, S. J. Oh, J. Choe, and Y. Yoo, “CutMix: regularization strategy to train strong classifiers with localizable features,” in Proceedings of the IEEE International Conference on Computer Vision, 2019, vol. 2019-October, pp. 6022–6031, doi: 10.1109/ICCV.2019.00612.
[17] H. Zhang, M. Cisse, Y. N. Dauphin, and D. Lopez-Paz, “Mixup: beyond empirical risk minimization,” Apr. 2018, doi: 10.48550/arXiv.1710.09412.
[18] D. Liang, F. Yang, T. Zhang, and P. Yang, “Understanding mixup training methods,” IEEE Access, vol. 6, pp. 58774–58783, 2018, doi: 10.1109/ACCESS.2018.2872698.
[19] E. Harris, A. Marcu, M. Painter, M. Niranjan, A. Prügel-Bennett, and J. Hare, “FMix: enhancing mixed sample data augmentation,” arXiv preprint, Feb. 2020.
[20] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2016, pp. 770–778, doi: 10.1109/cvpr.2016.90.
[21] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jul. 2017, pp. 4700–4708, doi: 10.1109/cvpr.2017.243.
[22] S. Zagoruyko and N. Komodakis, “Wide residual networks,” in British Machine Vision Conference 2016, BMVC 2016, 2016, vol. 2016-September, pp. 87.1–87.12, doi: 10.5244/C.30.87.
[23] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the inception architecture for computer vision,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2016, vol. 2016-December, pp. 2818–2826, doi: 10.1109/CVPR.2016.308.
[24] M. Tan and Q. V. Le, “EfficientNet: rethinking model scaling for convolutional neural networks,” arXiv preprint, May 2019, doi: 10.48550/arXiv.1905.11946.
[25] M. Tan and Q. V. Le, “EfficientNetV2: smaller models and faster training,” arXiv preprint, 2021.
[26] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, “Grad-CAM: visual explanations from deep networks via gradient-based localization,” in Proceedings of the IEEE International Conference on Computer Vision, 2017, vol. 2017-October, pp. 618–626, doi: 10.1109/ICCV.2017.74.
[27] J. Gildenblat et al., “PyTorch library for CAM methods,” GitHub, 2021.
[28] S. Narang et al., “Mixed precision training,” in 6th International Conference on Learning Representations, ICLR 2018 - Conference Track Proceedings, 2018, pp. 1–12.

BIOGRAPHIES OF AUTHORS

Talha Anwar is an AI researcher with a Master's degree in Data Science from FAST, National University, Pakistan. He obtained his Bachelor's degree in Biomedical Engineering from Riphah International University in 2018. His research is in biomedical image analysis and biosignal analysis, particularly in the area of brain-computer interfaces. He has a special interest in social text analysis in the field of NLP. He is equally interested in machine learning and deep learning and has several publications in this domain. Talha is actively involved in research and works with the Centre for Chiropractic Research, New Zealand College of Chiropractic, Auckland 1060, New Zealand. All of his research is available at github.com/talhaanwarch. He can be contacted at email: [email protected].

Seemab Zakir has Bachelor's and Master's degrees in biomedical engineering from Riphah International University, Pakistan. She has experience in conducting labs on biomedical engineering subjects, particularly programming, machine learning, and instrumentation. She has also served as a biomedical engineer at Pak-Austria Fachhochschule: Institute of Applied Sciences. She was a lecturer at Foundation University School of Science and Technology, Pakistan. Currently, she is a Ph.D. scholar at Scuola Superiore Sant'Anna Pisa, Italy. Her areas of interest are biomedical instrumentation and artificial intelligence. She can be contacted at email: [email protected].