Transfer learning for image classification using VGG19: Caltech-101 image data set

Monika Bansal · Munish Kumar · Monika Sachdeva · Ajay Mittal

https://doi.org/10.1007/s12652-021-03488-z
ORIGINAL RESEARCH
Received: 7 April 2021 / Accepted: 31 August 2021 / Published online: 17 September 2021
© The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2021
Abstract
Image classification is receiving increasing attention in the area of computer vision. During the past few years, a great deal of research has been done on image classification using classical machine learning and deep learning techniques, and deep learning-based techniques have recently produced outstanding results. The performance of a classification system depends on the quality of the features extracted from an image: the better the quality of the extracted features, the higher the accuracy. Although numerous deep learning-based methods have shown strong performance in image classification, various challenges still prevent deep learning methods from extracting all the important information from an image, which reduces the overall classification accuracy. The goal of the present research is to improve image classification performance by combining the deep features extracted using a popular deep convolutional neural network, VGG19, with various handcrafted feature extraction methods, i.e., SIFT, SURF, ORB, and the Shi-Tomasi corner detector algorithm. The extracted features are then classified using various machine learning classification methods, i.e., Gaussian Naïve Bayes, Decision Tree, Random Forest, and the eXtreme Gradient Boosting (XGBClassifier) classifier. The experiment is carried out on the benchmark Caltech-101 dataset. The experimental results indicate that Random Forest using the combined features gives 93.73% accuracy and outperforms the other classifiers as well as methods proposed by other authors. The paper concludes that a single feature extractor, whether shallow or deep, is not enough to achieve satisfactory results, so a combined approach using deep learning features and traditional handcrafted features is better for image classification.
1 Introduction

Image classification is considered a central research topic in the areas of computer vision and artificial intelligence. Image classification is the task of correctly identifying the object in an image. Earlier, various machine learning algorithms were used to solve this problem: handcrafted feature extraction methods were adopted to acquire features from the image, which may be local, global, or both, and single or ensemble machine learning classification algorithms were then employed to classify the images based on color, shape, texture, or some other feature. In the current era, deep learning has given outstanding results in all the applications of computer vision, such as image classification, object detection, security, and image processing. Deep learning is a subset of machine learning. In the deep learning approach, both feature extraction and classification are done automatically to classify images containing similar objects.
There is no need for the researchers to perform both tasks manually, as in classical machine learning. Deep learning uses a stack of neural layers to process a huge amount of data; such a model is commonly realized as a deep convolutional neural network (DeepCNN).

This system is modeled on the architecture of the human brain. Just like the human brain, which functions on a mesh of neurons, deep learning processes the data through a network of neural layers, filters outliers, spots familiar entities, and produces the final output, i.e., the label of the object. The functioning of classical machine learning and deep learning for image classification is depicted in Fig. 1.

However, recognizing the correct class of a given object is very challenging due to low image resolution, inadequate extraction of local and/or global features, geometric variation, etc. Considering these issues, the paper shows that no single feature extraction algorithm can classify a wide range of images accurately. The Caltech-101 dataset is used for the experiment, as it is one of the most challenging datasets: it contains 101 object classes plus one background scene class, each class has 40–800 images, and the dataset has a total of 9146 images. In the paper, a supervised learning approach is adopted in which features are extracted with a pre-trained deep learning model (VGG19) and further classified using various state-of-the-art classifiers (Gaussian Naïve Bayes, Decision Tree, Random Forest, XGBClassifier). The VGG19 model has shown good performance for image classification, but the experimental results show that even this model alone is not enough for accurate image classification. So, a fusion of deep features (pre-trained VGG19) and various state-of-the-art handcrafted feature extraction algorithms (SIFT, SURF, ORB, and the Shi-Tomasi corner detector) is investigated in the paper to provide very high recognition rates on the Caltech-101 dataset. The combined feature vector is then classified using the machine learning classification algorithms listed above. A standard data-partitioning strategy is followed in which 70% of the images of each class are placed in the training dataset and the remaining 30% in the testing dataset. The experiment proved that the fusion of the above five feature extraction methods with the Random Forest classifier performs best, with high recognition accuracy, precision, recall, and area under the curve (AUC), and a low false-positive rate, root mean squared error, and CPU time. The proposed fusion feature extraction system achieves 93.73% recognition accuracy, which outperforms the approaches given by many researchers. The complete set of performance measures for the proposed approach is: precision 93.70%, recall 93.73%, F1-score 93.22%, area under curve 96.79%, false positive rate 0.15%, root mean square error 20.05%, and average CPU time 0.39 min.

The rest of the paper is organized as follows. Section 2 describes the problems that arise when a deep neural network is used for image classification. Section 3 lists the related work done by various authors. Sections 4 and 5 describe the feature extraction and feature reduction algorithms used in the experiment. Section 6 presents the machine learning classification algorithms used in the experiment. In Sect. 7, the techniques used in the proposed system are explained. The experimental results are presented in Sect. 8, and the paper is concluded in Sect. 9.

2 Challenges

Over the past few years, deep Convolutional Neural Networks (CNNs) have produced tremendous results in the area of computer vision. Still, researchers face many challenges in putting CNN models to work, and the proposed system is implemented to address the issues that arise when a deep neural network is used for image classification. The first challenge is designing the network model. A CNN consists of many layers, so millions of parameters must be learned during the training phase. Designing a CNN model from scratch demands substantial resources, such as a large memory capacity, a fast processor, a huge dataset, enormous power consumption, etc. Deep learning needs an extremely large memory capacity because it extracts a huge amount of data during the feature extraction phase; essentially, it evaluates a value for each pixel of the image using various mathematical operations.
Deep learning also takes a lot of time for computation (many hours or even days), depending on the computational capabilities of the hardware, so power backup is required to keep the process running continuously. Deep learning algorithms cannot be run on a general CPU system; rather, they need GPU- or TPU-enabled systems, which are very expensive and not easily affordable. Deep learning works well with a large collection of data, and the accuracy depends on the size of the data, which is very difficult to assemble in the real world. Even though data augmentation is used to cover various aspects of the image and to increase the size of the dataset, it still does not suffice to achieve satisfactory results.

In image classification, deep learning shows adequate results on high-resolution images. It applies various pre-processing steps before feature extraction, but it is still not able to extract accurate global features of the image. Several state-of-the-art deep learning methods are highly sensitive to translation, scaling, and rotation. Data augmentation resolves this issue to some extent, but it increases the size of the dataset, which again requires more storage capacity and computation time. Keeping these issues in view, there is still a demand for handcrafted feature extraction methods. Deep learning extracts low-level features that help to acquire good results, but these are not enough for image classification. Therefore, the proposed system uses a fusion of features extracted using a pre-trained deep learning model, i.e., VGG19, and various handcrafted feature extraction algorithms, i.e., SIFT, SURF, ORB, and the Shi-Tomasi corner detector, for image classification.

3 Related work

Kataoka et al. (2015) presented an evaluation of the features learned by various deep networks for object recognition and detection. Their experiments showed that the VGGNet architecture outperformed the AlexNet architecture. Further, they carried out feature tuning by concatenating some layers of both architectures and transforming them using Principal Component Analysis (PCA). The Caltech-101 and Daimler Pedestrian Benchmark datasets were used for the experiment, achieving 91.8% accuracy. Mahmood et al. (2017) presented a hybrid approach for image classification in which the ResNet model is used for feature extraction and the extracted features are fine-tuned using PCA-SVM for classification. Four datasets were used for the experiments: MIT-67, MLC, Caltech-101, and Caltech-256. The model was trained using 30 images from each class and outperformed other methods. Ren et al. (2017) implemented a combined approach for image classification in which features are acquired using a Convolutional Neural Network (CNN) architecture and an eXtreme Gradient Boosting (XGBClassifier) classifier recognizes the image. The experiment was carried out on the MNIST and CIFAR-10 datasets and produced the best results.

Srivastava et al. (2017) proposed an ensemble of local and deep features for image classification. They compared various pre-trained convolutional neural networks for feature extraction and followed a combined feature extraction approach using SIFT together with various pre-trained neural networks. The proposed model is trained using an SVM classifier, followed by a majority voting scheme to recognize the image. The model was evaluated on the CIFAR-10 dataset and achieved 91.8% accuracy. Shaha and Pawar (2018) proposed a fusion of a deep learning model (VGG19) for feature extraction and a support vector machine (SVM) for image classification. They compared different neural models, i.e., AlexNet, VGG16, and VGG19, for feature extraction and fine-tuned these models over the GHIM10K and Caltech-256 datasets. The VGG19 architecture showed better performance than AlexNet and VGG16, as measured by three evaluation parameters: precision, recall, and F-score. Mingyuan and Wang (2019) used a CNN model for feature extraction and presented a comparative analysis among various classification algorithms (CNN, SVM, RF, DT, KNN, NB, and GBDT) for image classification. Pandey et al. (2018) proposed Common Sense Knowledge (CSK) by embedding three deep learning models using CNN, R-CNN, and R-FCN for object detection; the experiment was conducted to aid smart mobility. Singh and Singh (2019) presented a fusion of various handcrafted features for image classification. They made a comparative analysis of the proposed work against a deep neural network (DNN), i.e., AlexNet, and achieved high accuracy; they also exhibited various challenges of image classification that cannot be solved with the AlexNet model. The experiments were conducted on five datasets: PASCAL VOC2005, Soccer, SIMPLIcity, Flower, and Caltech-101. Yadav and Jadhav (2019) evaluated the performance of CNN-based models using VGG16 and Inception against a traditional image classification model using ORB and SVM. The experiments were conducted on various medical images; transfer learning was used to improve the classification accuracy and achieved the best results on chest X-ray images.

Garg et al. (2020) proposed an object detection system, named CK-SNIFFER, to automatically identify a large number of errors based on common sense knowledge. Karthikeyan et al. (2020) investigated the transfer learning approach on a huge dataset of X-ray images from patients with common bacterial pneumonia, confirmed COVID-19 cases, and healthy cases, using three pre-trained models (VGG16, VGG19, and ResNet101), and achieved the best results with the proposed approach. Talaat et al. (2020) proposed an improved hybrid approach for image classification using CNN for feature extraction and the swarm-based fractional-order marine predators' algorithm.
Fig. 2 a Architecture of the VGG19 model. b Ensemble of deep feature extraction using the VGG19 model and machine learning classification
An image is fed into this model, and the model outputs the label of the object in the image. In this paper, features are extracted through a pre-trained VGG19 model, but for classification various machine learning approaches are followed. As the CNN model computes a huge number of parameters after feature extraction, dimensionality reduction is needed to minimize the size of the feature vector, as shown in Fig. 2b. The dimensionality reduction is done with Locality Preserving Projection, which is followed by a classification method.
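As a minimal sketch of this step (not the authors' exact code; the choice of global average pooling and the 224x224 input size are assumptions), deep features can be extracted with the pre-trained VGG19 model in Keras by dropping its classification head:

import numpy as np
from tensorflow.keras.applications.vgg19 import VGG19, preprocess_input
from tensorflow.keras.preprocessing import image

# Pre-trained VGG19 without its final softmax layers; global average pooling
# collapses the last convolutional feature maps into a single 512-d vector.
model = VGG19(weights="imagenet", include_top=False, pooling="avg")

def vgg19_features(img_path):
    img = image.load_img(img_path, target_size=(224, 224))  # VGG input size
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    return model.predict(x, verbose=0)[0]                   # shape: (512,)

feat = vgg19_features("example.jpg")  # hypothetical image path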
4.2 Scale invariant feature transform (SIFT)

SIFT is one of the most widely used shape feature extraction algorithms. It is a keypoint detector and descriptor algorithm proposed by Lowe (2004) to extract key interest points from the image. It is highly robust to the orientation and scaling of an image, is invariant to illumination changes, and extracts many interest points (features) even from low-resolution images. SIFT computes a 128-dimensional descriptor for each keypoint through a filtering approach that functions in four stages. The first stage detects the important locations in the image using the Difference-of-Gaussians (DoG) operator. Then localization is performed to determine the important features. This is followed by the computation of gradient directions, which makes the algorithm invariant to rotation. In the last stage, each computed keypoint is converted into a feature vector of size 128.

4.3 Speeded up robust features (SURF)

SURF is a variant of the SIFT algorithm that is used as a keypoint detector and descriptor. It was developed by Bay et al. (2006). The interest points of an image are detected by approximating the Laplacian-of-Gaussian (LoG) with a box filter, and the detected keypoints are represented with the Hessian matrix. SURF is more invariant to geometric and photometric deformation, and it creates a feature vector of 64 or 128 dimensions.
4.4 Oriented FAST and rotated BRIEF (ORB)

The ORB algorithm is a local feature extraction algorithm presented by Rublee et al. (2011). ORB uses a pyramid scheme with a FAST keypoint detector and a BRIEF keypoint descriptor, followed by a Harris corner measure (Harris and Stephens 1988). This algorithm is faster than SIFT and SURF, and it is also robust to noise, scale, rotation, and translation.

4.5 Shi-Tomasi corner detector

This algorithm is a variation of the Harris corner detector in which a slight change is made in the selection criteria. Shi and Tomasi (1994) presented this approach, which detects corners better than Harris' algorithm and achieves better accuracy. Corners are used as global features of the image that identify the shape of the object, which aids the object recognition task.
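A compact sketch of Sects. 4.2-4.5 using OpenCV follows (SURF sits in the opencv-contrib package and is patented, so its availability depends on the build; the thresholds and feature counts here are assumptions):

import cv2

img = cv2.imread("example.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical image

# SIFT: keypoints plus 128-d descriptors (Lowe 2004).
sift = cv2.SIFT_create()
kp_sift, des_sift = sift.detectAndCompute(img, None)

# SURF: 64-d (or 128-d with extended=True) descriptors (Bay et al. 2006);
# requires an opencv-contrib build with the non-free modules enabled.
surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
kp_surf, des_surf = surf.detectAndCompute(img, None)

# ORB: FAST detector + BRIEF descriptor on an image pyramid (Rublee et al. 2011).
orb = cv2.ORB_create(nfeatures=500)
kp_orb, des_orb = orb.detectAndCompute(img, None)

# Shi-Tomasi: corner locations only (Shi and Tomasi 1994); the corners can be
# described afterwards, e.g., by computing SIFT descriptors at those points.
corners = cv2.goodFeaturesToTrack(img, maxCorners=100,
                                  qualityLevel=0.01, minDistance=10)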
5 Feature dimension reduction techniques

The following techniques are employed to select the important features from the large set of features (Varde et al. 2007) and to reduce the dimensions of the feature vectors obtained with the above-mentioned feature extraction methods, since a large feature vector would cause overfitting.

5.1 k-means clustering

k-means clustering is a distance-based algorithm in which the distance between the centroid of a cluster and the key descriptors of the object is evaluated using the Euclidean distance or the max–min method. k-means clustering proceeds in the following steps (a sketch follows the list):

1. k, the number of clusters, is chosen.
2. k descriptors are selected randomly from the set of n descriptors of an object as the initial centroids.
3. All key descriptors are assigned to the closest cluster centroid.
4. Each cluster centroid is recomputed from the newly formed clusters.
5. The process of updating the cluster centroids continues until the centroids no longer change.
6. Finally, k clusters are obtained according to the closest points and the mean of each cluster is computed. The resulting k values are used as a reduced feature vector.
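A sketch of a bag-of-features reading of these steps is given below (scikit-learn's KMeans is assumed here in place of the OpenCV call named in Sect. 7; k = 64 matches the 64-dimensional vector used later):

import numpy as np
from sklearn.cluster import KMeans

def bag_of_features(descriptors_per_image, k=64, seed=0):
    """Quantize the local descriptors of each image into a k-bin histogram."""
    all_des = np.vstack(descriptors_per_image)       # pool descriptors (steps 1-2)
    km = KMeans(n_clusters=k, random_state=seed).fit(all_des)  # steps 3-5
    histograms = []
    for des in descriptors_per_image:
        words = km.predict(des)                      # nearest centroid per descriptor
        hist, _ = np.histogram(words, bins=np.arange(k + 1))
        histograms.append(hist / max(len(des), 1))   # normalized k-d vector (step 6)
    return np.array(histograms)                      # shape: (n_images, k)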
5.2 Locality preserving projection (LPP)

Locality preserving projection (LPP) is a linear dimensionality reduction algorithm. LPP retains the local neighborhood information of the data set while discarding undesired data. LPP operates in three steps, as follows (a sketch is given after the list):

1. An adjacency graph is constructed by placing an edge between nodes i and j when the distance between the two nodes is very small.
2. A weight is chosen for each edge using one of two variants: the heat kernel or the simple-minded (binary) scheme.
3. Finally, an eigenmap is designed by computing eigenvectors and eigenvalues.

The outcome of the LPP method is a smaller feature vector that keeps the important data points and discards the unused ones.
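The three steps can be sketched as follows (a sketch, not the authors' implementation; the neighbourhood size and heat-kernel width are assumptions, and the generalized eigenproblem assumes X^T D X is non-singular, so a PCA pre-step is often added in practice):

import numpy as np
from scipy.linalg import eigh
from sklearn.neighbors import kneighbors_graph

def lpp(X, n_components=8, n_neighbors=5, t=1.0):
    """Locality Preserving Projection: rows of X are samples."""
    # Step 1: adjacency graph over k nearest neighbours.
    W = kneighbors_graph(X, n_neighbors, mode="distance").toarray()
    W = np.maximum(W, W.T)                       # symmetrize the graph
    # Step 2: heat-kernel weights on the edges.
    W[W > 0] = np.exp(-W[W > 0] ** 2 / t)
    D = np.diag(W.sum(axis=1))
    L = D - W                                    # graph Laplacian
    # Step 3: solve X^T L X a = lambda X^T D X a; eigenvectors with the
    # smallest eigenvalues give the locality-preserving projection.
    vals, vecs = eigh(X.T @ L @ X, X.T @ D @ X)
    return X @ vecs[:, :n_components]            # reduced features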
6 Machine learning classification methods

Various state-of-the-art classification methods are used for image classification. Each method has its own merits and demerits: some methods work very fast, while others offer more accuracy. In the paper, we have analyzed the results of image classification using four well-known classification methods: Decision Tree, Gaussian Naïve Bayes, Random Forest, and XGB Classifier. These methods are described as follows, and a combined usage sketch is given at the end of this section.

6.1 Gaussian Naïve Bayes

Gaussian Naïve Bayes is an extended version of the Naïve Bayes algorithm, which adopts a probabilistic approach. In Naïve Bayes, the prediction for test data is computed using the distribution of the data, whereas in Gaussian Naïve Bayes the prediction for test data given a class is obtained from a Gaussian distribution parameterized by the mean and standard deviation of the data. Gaussian Naïve Bayes is the simplest and most popular probability-based approach used for image classification.

6.2 Decision Tree

The decision tree algorithm was proposed by Quinlan in 1986. It is a tree-based approach used for classification in which all the features considered are placed at the root. Similar features are grouped into one category and taken as nodes. The decision tree is recursive in nature, as these nodes are further subdivided into nodes representing similar features. The process of splitting the training data into nodes continues until no further division is possible; the classes are represented by the tree leaves. This algorithm has many advantages: it is very simple to understand and interpret, it is very fast to execute, and it shows good accuracy for image classification. But decision trees suffer from overfitting: they can create over-complex trees that do not generalize the data well.

6.3 Random Forest

The Random Forest algorithm (developed by Kleinberg in 1996) is an ensemble that comprises many decision trees. In this algorithm, decision trees are built on subsamples of the training data, and their results are averaged to obtain higher predictive accuracy. Random Forest thereby also mitigates the problem of overfitting. It has shown better outcomes than a single decision tree, but it still has some drawbacks: it is more difficult to interpret, and it takes more time than a decision tree.

6.4 XGB Classifier

XGB Classifier stands for eXtreme Gradient Boosting classifier, a boosting algorithm based on the Gradient Boosting classifier. The method was proposed by Chen and Guestrin (2016). XGB Classifier is an ensemble classifier that uses regularization to reduce the problem of overfitting. It outperforms the plain Gradient Boosting algorithm, but compared with the above-mentioned classifiers it takes more time to classify the data. Recently, this method has gained great popularity. It boosts the performance of an ensemble classification algorithm by building a stronger model from numerous weaker models using an iterative approach.
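A sketch of how the four classifiers named above can be compared on the same features (scikit-learn and the xgboost package; hyper-parameters are left at their defaults, which is an assumption):

from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

classifiers = {
    "Gaussian Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "XGB Classifier": XGBClassifier(),
}

def compare(X_train, y_train, X_test, y_test):
    # Fit every model on the same training split and score on the same test split.
    for name, clf in classifiers.items():
        clf.fit(X_train, y_train)
        print(name, accuracy_score(y_test, clf.predict(X_test)))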
7 Proposed methodology

The architecture of the proposed system is depicted in Fig. 3. The proposed method is based on a combination of deep learning features and traditional handcrafted feature extraction algorithms. The experiment analyzed the performance of the image classification system with deep learning features alone and with an ensemble of deep features and various traditional handcrafted feature extraction methods. The proposed system demonstrates that, although deep learning has gained worldwide popularity, it still does not fully solve image classification on the Caltech-101 dataset. The proposed model works in two phases: feature extraction and image classification.

The first phase, feature extraction, consists of three components that together create a feature vector (an end-to-end sketch of both phases is given at the end of this section):

1. The pre-trained VGG19 model in Keras and various handcrafted methods in OpenCV, i.e., SIFT, SURF, ORB, and the Shi-Tomasi corner detector, are used to extract the features of the images.
2. k-means clustering in OpenCV is used to select the important features and obtain a 64-dimensional feature vector for every descriptor.
3. Locality Preserving Projection is used to reduce each 64-dimensional feature vector to 8 components.

During the first phase, a combined feature vector with a total of 40 features is computed, which is then passed to the classification task.

The second phase is image classification, in which the performance of the recognition system is evaluated after applying various machine learning classification algorithms, i.e., Gaussian Naïve Bayes, Decision Tree, Random Forest, and XGB Classifier. During this process, a model is built using a standard data partitioning strategy (70:30), where 70% of the images of each class are used for training and the remaining 30% for testing. The performance of the model is predicted on the test dataset.
8 Experimental results

This section discusses the evaluation results on the Caltech-101 dataset. Caltech-101 is one of the most challenging multiclass datasets for the image classification problem. It comprises a total of 9146 images in 102 categories. One of the 102 categories is a background scene category, which is not used in the experiment; so the experiment is done on the 101 object categories, containing 8678 images. The dataset is unbalanced, as each category contains a different number of images (roughly 40–800), and the images have low resolution and are noisy.

Table 1 shows the labels used for the various feature extraction algorithms.

Table 1 Labels for the feature extraction methods

Label   Feature extraction method
F1      VGG19
F2      SIFT
F3      SURF
F4      ORB
F5      Shi-Tomasi

Various performance measures are used to compare these feature extraction methods for image classification. Because the dataset is multi-class, macro-averaging has been adopted in the experiment: macro-averaging estimates the performance by averaging the predictive results over the classes. The experiments are analyzed using eight parameters, i.e., accuracy, precision, recall, F1-score, false positive rate (FPR), area under curve (AUC), root mean square error (RMSE), and CPU execution time.
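The macro-averaged measures can be obtained directly from scikit-learn; a sketch (each metric is computed per class and then averaged, so every class counts equally in this unbalanced dataset):

from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

def macro_report(y_true, y_pred):
    # Macro-averaging: per-class metric values averaged over the 101 classes.
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, average="macro", zero_division=0),
        "recall": recall_score(y_true, y_pred, average="macro", zero_division=0),
        "f1": f1_score(y_true, y_pred, average="macro", zero_division=0),
    }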
A standard data partitioning methodology is adopted for the experiment in which 70% of the images from each class are used as training data and the remaining 30% of the images from all classes are used as test data for the analysis of the proposed system. The experiment also provides a comparative analysis among various state-of-the-art classifiers, i.e., Gaussian Naïve Bayes, Decision Tree, Random Forest, and XGB Classifier.

In Table 2, the feature extraction methods are compared using recognition accuracy; these comparative results are also presented graphically in Fig. 4, which clearly shows the performance of the proposed system. Table 3 shows the comparison using precision, Table 4 presents the false positive rate (FPR), Table 5 presents the root mean square error (RMSE), and Table 6 presents the average CPU time, which shows that the average execution time of object recognition increases with the number of features. All the tables witness the improvement in all performance measures due to the ensemble of VGG19 and all the handcrafted methods. Table 7 gives a detailed comparison among the classifiers on all the performance parameters. This shows that the proposed combined feature vector is more advantageous than a single feature extraction method. All the experiments have been performed on a machine with the Microsoft Windows 10 operating system and an Intel Core i3 processor with 4 GB RAM.

Recently, various researchers have analyzed ensemble approaches for image classification because of the improvement in accuracy. Table 8 shows a comparative analysis of the proposed system with some recent experiments on the Caltech-101 dataset. Through the comparison, it is observed that the accuracy achieved by Mahmood et al. (2017) is higher, but Singh and Singh (2019) demonstrated that the recognition accuracy of convolutional neural networks decreases under rotation and scaling. Following the methodology of Singh and Singh (2019), the authors will examine the accuracy of the proposed system and of Mahmood et al. (2017) under rotation and scaling conditions in a future experiment.
Table 2 Quantitative comparison among deep learning, handcrafted, and ensemble feature extraction methods (classifier-wise recognition accuracy, in %)

Features            Gaussian Naïve Bayes   Decision Tree   Random Forest   XGB Classifier
F1                  55.37                  54.49           57.47           63.13
F1+F2               67.65                  68.51           70.31           73.02
F1+F3               64.69                  63.52           66.79           71.59
F1+F4               71.84                  72.62           75.08           78.29
F1+F5               70.37                  70.93           72.88           76.64
F1+F2+F3            75.04                  77.29           76.63           80.38
F1+F2+F4            80.71                  80.93           82.24           83.74
F1+F2+F5            78.01                  78.07           79.48           82.76
F1+F3+F4            78.46                  80.58           81.01           84.16
F1+F3+F5            75.03                  77.05           77.31           83.00
F1+F4+F5            81.77                  83.68           84.88           84.55
F1+F2+F3+F4         86.32                  88.47           89.65           88.86
F1+F2+F3+F5         82.69                  83.90           84.00           87.87
F1+F2+F4+F5         89.16                  89.29           90.23           90.35
F1+F3+F4+F5         85.63                  88.33           89.42           88.84
F1+F2+F3+F4+F5      92.05                  92.67           93.73           93.02

Bold face in the original marks the best value in the table.
Table 3 Quantitative comparison among deep learning, handcrafted, and ensemble feature extraction methods (classifier-wise precision, in %)

Features            Gaussian Naïve Bayes   Decision Tree   Random Forest   XGB Classifier
F1                  53.47                  53.67           57.51           61.62
F1+F2               66.13                  67.86           68.96           71.40
F1+F3               63.15                  62.91           64.85           69.63
F1+F4               70.68                  72.04           73.35           77.09
F1+F5               68.75                  70.69           72.28           75.59
F1+F2+F3            74.46                  76.64           76.63           79.70
F1+F2+F4            80.37                  80.97           81.78           83.58
F1+F2+F5            77.05                  77.56           78.18           82.22
F1+F3+F4            77.56                  79.70           79.93           83.63
F1+F3+F5            73.85                  76.54           76.80           82.57
F1+F4+F5            80.92                  83.58           84.17           83.61
F1+F2+F3+F4         86.16                  88.63           90.13           88.99
F1+F2+F3+F5         81.89                  83.36           83.24           87.96
F1+F2+F4+F5         89.42                  89.50           90.38           90.47
F1+F3+F4+F5         85.17                  87.86           89.16           88.77
F1+F2+F3+F4+F5      91.90                  92.66           93.70           93.10

Bold face in the original marks the best value in the table.
Table 4 Quantitative comparison among deep learning, handcrafted, and ensemble feature extraction methods (classifier-wise false positive rate, in %; lower is better)

Features            Gaussian Naïve Bayes   Decision Tree   Random Forest   XGB Classifier
F1                  0.56                   0.55            0.50            0.50
F1+F2               0.46                   0.43            0.41            0.41
F1+F3               0.43                   0.43            0.39            0.38
F1+F4               0.36                   0.33            0.30            0.29
F1+F5               0.37                   0.37            0.34            0.32
F1+F2+F3            0.34                   0.31            0.31            0.29
F1+F2+F4            0.29                   0.28            0.26            0.24
F1+F2+F5            0.29                   0.29            0.28            0.26
F1+F3+F4            0.31                   0.28            0.27            0.24
F1+F3+F5            0.33                   0.31            0.30            0.26
F1+F4+F5            0.27                   0.25            0.24            0.23
F1+F2+F3+F4         0.24                   0.21            0.20            0.20
F1+F2+F3+F5         0.25                   0.24            0.24            0.21
F1+F2+F4+F5         0.20                   0.20            0.19            0.17
F1+F3+F4+F5         0.23                   0.21            0.19            0.20
F1+F2+F3+F4+F5      0.18                   0.17            0.15            0.15

Bold face in the original marks the best value in the table.
Table 5 Quantitative comparison among deep learning, handcrafted, and ensemble feature extraction methods (classifier-wise root mean square error, in %; lower is better)

Features            Gaussian Naïve Bayes   Decision Tree   Random Forest   XGB Classifier
F1                  31.21                  31.24           29.38           30.49
F1+F2               27.11                  26.76           25.63           25.96
F1+F3               29.96                  29.84           28.38           28.64
F1+F4               25.78                  24.75           23.35           22.71
F1+F5               25.87                  26.62           26.04           24.87
F1+F2+F3            25.99                  24.95           24.61           24.72
F1+F2+F4            24.26                  23.67           22.88           21.88
F1+F2+F5            24.14                  23.88           23.66           23.66
F1+F3+F4            25.77                  24.47           23.88           22.92
F1+F3+F5            25.93                  25.66           25.12           24.41
F1+F4+F5            22.78                  22.38           21.42           21.20
F1+F2+F3+F4         22.84                  21.63           21.76           21.32
F1+F2+F3+F5         23.43                  22.65           22.87           22.45
F1+F2+F4+F5         21.32                  21.41           20.51           19.60
F1+F3+F4+F5         22.14                  21.56           20.72           21.15
F1+F2+F3+F4+F5      21.01                  20.92           20.05           20.01

Bold face in the original marks the best value in the table.
Table 6 Quantitative comparison among numbers of features (classifier-wise average execution (CPU) time, in minutes)

Number of features   Gaussian Naïve Bayes   Decision Tree   Random Forest   XGB Classifier
1                    0.00                   0.01            0.22            2.39
2                    0.00                   0.01            0.29            3.84
3                    0.00                   0.02            0.30            5.32
4                    0.00                   0.02            0.34            6.54
5                    0.00                   0.02            0.39            7.66

Table 7 Comparison of all performance measures across the classifiers

Performance measure      Gaussian Naïve Bayes   Decision Tree   Random Forest   XGB Classifier
Accuracy                 92.05%                 92.67%          93.73%          93.02%
Precision                91.90%                 92.66%          93.70%          93.10%
Recall                   92.05%                 92.67%          93.73%          93.02%
False positive rate      0.18%                  0.17%           0.15%           0.15%
F1-score                 91.85%                 92.54%          93.22%          92.84%
AUC                      95.94%                 96.25%          96.79%          96.43%
RMSE                     21.01%                 20.92%          20.05%          20.01%
Average execution time   0.00 min               0.02 min        0.39 min        7.66 min

Table 8 Comparative analysis of the proposed system with recent experiments on the Caltech-101 dataset (columns: Author, Year, Features, Technique used, Number of classes, Accuracy (%), Time (min))
9 Conclusion

In this article, an analysis of various feature extraction techniques is presented, covering a deep learning model (VGG19) and various handcrafted feature extraction methods, i.e., SIFT, SURF, ORB, and the Shi-Tomasi corner detector algorithm. A survey of various classification methods, i.e., Gaussian Naïve Bayes, Decision Tree, Random Forest, and XGB Classifier, is also conducted in the paper. The investigation indicates that an ensemble method for feature extraction performs better than a single feature extraction method. The results show that feature extraction using the popular VGG19 model alone is still not enough for image classification.
The experiment confirms that the proposed method is very powerful and consistently outperforms the methods proposed by other researchers. The paper has also highlighted various challenges that occur in the image classification task. This article will help other researchers to explore further combined approaches for image classification, including ones using various newer deep learning models.

Declarations

Conflict of interest The authors declare that they have no conflict of interest in this work. The authors have employed a public dataset, namely Caltech-101, for performing the experiments in the considered work.

References

Bay H, Tuytelaars T, Van Gool L (2006) SURF: speeded up robust features. In: Proceedings of the European conference on computer vision, pp 404-417
Chen T, Guestrin C (2016) XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pp 785-794
Garg A, Tandon N, Varde A (2020) I am guessing you can't recognize this: generating adversarial images for object detection using spatial commonsense (student abstract). Proc AAAI Conf Artif Intell 34(10):13789-13790. https://doi.org/10.1609/aaai.v34i10.7166
Harris C, Stephens M (1988) A combined corner and edge detector. In: Proceedings of the fourth Alvey vision conference, pp 147-151
Karthikeyan D, Varde AS, Wang W (2020) Transfer learning for decision support in Covid-19 detection from a few images in big data. In: 2020 IEEE international conference on big data (Big Data), pp 4873-4881. https://doi.org/10.1109/BigData50022.2020.9377886
Kataoka H, Iwata K, Satoh Y (2015) Feature evaluation of deep convolutional neural networks for object recognition and detection. https://arxiv.org/abs/1509.07627
Kleinberg EM (1996) An overtraining-resistant stochastic modeling method for pattern recognition. Ann Stat 24(6):2319-2349
Kumar V, Recupero DR, Riboni D, Helaoui R (2021) Ensembling classical machine learning and deep learning approaches for morbidity identification from clinical notes. IEEE Access 9:7107-7126. https://doi.org/10.1109/ACCESS.2020.3043221
Liu L, Xie C, Wang R, Yang P, Sudirman S, Zhang J, Li R, Wang F (2020) Deep learning based automatic multi-class wild pest monitoring approach using hybrid global and local activated features. IEEE Trans Ind Inf 17(11):7589-7598
Lowe DG (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vis 60(2):91-110
Mahmood A, Bennamoun M, An S, Sohel F (2017) ResFeats: residual network based features for image classification. In: 2017 IEEE international conference on image processing (ICIP). https://doi.org/10.1109/icip.2017.8296551
Mingyuan X, Wang Y (2019) Research on image classification model based on deep convolution neural network. EURASIP J Image Video Process 2019(1):1-11
Pandey A, Puri M, Varde A (2018) Object detection with neural models, deep learning and common sense to aid smart mobility. In: 2018 IEEE 30th international conference on tools with artificial intelligence (ICTAI), pp 859-863. https://doi.org/10.1109/ICTAI.2018.00134
Ren X, Guo H, Li S, Wang S, Li J (2017) A novel image classification method with CNN-XGBoost model. Lect Notes Comput Sci. https://doi.org/10.1007/978-3-319-64185-0_28
Rublee E, Rabaud V, Konolige K, Bradski GR (2011) ORB: an efficient alternative to SIFT or SURF. Int Conf Comput Vis 11(1):2
Seemendra A, Singh R, Singh S (2021) Breast cancer classification using transfer learning. In: Evolving technologies for computing, communication and smart world. Springer, pp 425-436
Shaha M, Pawar M (2018) Transfer learning for image classification. In: 2018 second international conference on electronics, communication and aerospace technology (ICECA). https://doi.org/10.1109/iceca.2018.8474802
Shi J, Tomasi C (1994) Good features to track. In: Proceedings of IEEE conference on computer vision and pattern recognition, pp 593-600
Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. http://arxiv.org/abs/1409.1556
Singh C, Singh J (2019) Geometrically invariant color, shape and texture features for object recognition using multiple kernel learning classification approach. Inf Sci. https://doi.org/10.1016/j.ins.2019.01.058
Srivastava S, Mukherjee P, Lall B, Jaiswal K (2017) Object classification using ensemble of local and deep features. In: 2017 ninth international conference on advances in pattern recognition (ICAPR), pp 1-6. IEEE
Talaat A, Yousri D, Ewees A, Al-qaness MAA, Damasevicius R, Elaziz MEA (2020) COVID-19 image classification using deep features and fractional-order marine predators' algorithm. Sci Rep 10(1):15364. https://doi.org/10.1038/s41598-020-71294-2
Varde A, Rundensteiner E, Javidi G, Sheybani E, Liang J (2007) Learning the relative importance of features in image data. In: 2007 IEEE 23rd international conference on data engineering workshop, pp 237-244. https://doi.org/10.1109/ICDEW.2007.4400998
Yadav SS, Jadhav SM (2019) Deep convolutional neural network based medical image classification for disease diagnosis. J Big Data 6:113. https://doi.org/10.1186/s40537-019-0276-2

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.