Imp.-Image Category Classification Using Deep Learning-MATLAB
https://www.mathworks.com/help/vision/examples/image-category-classification-using-deep-learning.html
Overview
A Convolutional Neural Network (CNN) is a powerful machine learning technique from the field of deep
learning. CNNs are trained using large collections of diverse images. From these large collections, CNNs
can learn rich feature representations for a wide range of images. These feature representations often
outperform hand-crafted features such as HOG, LBP, or SURF. An easy way to leverage the power of
CNNs, without investing time and effort into training, is to use a pre-trained CNN as a feature extractor.
In this example, images from Caltech 101 are classified into categories using a multiclass linear SVM
trained with CNN features extracted from the images. This approach to image category classification
follows the standard practice of training an off-the-shelf classifier using features extracted from images.
For example, the Image Category Classification Using Bag Of Features example uses SURF features
within a bag of features framework to train a multiclass SVM. The difference here is that instead of using
image features such as HOG or SURF, features are extracted using a CNN. And, as this example will
show, the classifier trained using CNN features provides close to 100% accuracy, which is higher than the
accuracy achieved using bag of features and SURF.
Note: This example requires Neural Network Toolbox, Statistics and Machine Learning Toolbox, and Neural Network Toolbox Model for AlexNet Network.
Using a CUDA-capable NVIDIA GPU with compute capability 3.0 or higher is highly recommended for
running this example. Use of a GPU requires the Parallel Computing Toolbox.
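A quick way to check for a compatible device before running the example (a minimal sketch, assuming the Parallel Computing Toolbox is installed):
% Display properties of the default GPU, if one is detected
if gpuDeviceCount > 0
    gpuDevice
end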
function DeepLearningImageClassificationExample
Load Images
Instead of operating on the full dataset, which is time consuming, use four categories: diabet, normal, proliferativdiabet, and retinitis. The image category classifier will be trained to distinguish amongst these four categories.
% outputFolder is assumed to point at the downloaded dataset location
rootFolder = fullfile(outputFolder, '101_ObjectCategories');
categories = {'diabet', 'normal', 'proliferativdiabet', 'retinitis'};
Create an ImageDatastore to help you manage the data. Because ImageDatastore operates on
image file locations, images are not loaded into memory until read, making it efficient for use with large
image collections.
imds = imageDatastore(fullfile(rootFolder, categories), ...
    'LabelSource', 'foldernames');
The imds variable now contains the images and the category labels associated with each image. The
labels are automatically assigned from the folder names of the image files. Use countEachLabel to
summarize the number of images per category.
tbl = countEachLabel(imds)
tbl =
Label Count
__________________ _____
diabet 61
normal 40
proliferativdiabet 21
retinitis 18
Because imds above contains an unequal number of images per category, let's first adjust it, so that the
number of images in the training set is balanced.
% Determine the smallest number of images in a category
minSetCount = min(tbl{:,2});

% Trim the set so that each category contains the same number of images.
imds = splitEachLabel(imds, minSetCount, 'randomize');

% Notice that each set now has exactly the same number of images.
countEachLabel(imds)
ans =
Label Count
__________________ _____
diabet 18
normal 18
proliferativdiabet 18
retinitis 18
Below, you can see example images from three of the categories included in the dataset.
% Find the first instance of an image for each category
diabet = find(imds.Labels == 'diabet', 1);
normal = find(imds.Labels == 'normal', 1);
diabet_proliferativ = find(imds.Labels == 'proliferativdiabet', 1);
figure
subplot(1,3,1);
imshow(readimage(imds,diabet))
title('diabet')
subplot(1,3,2);
imshow(readimage(imds,normal))
title('normal')
subplot(1,3,3);
imshow(readimage(imds, diabet_proliferativ))
title('proliferativdiabet')
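Load Pre-trained Network
Load a pre-trained AlexNet network [2]; a minimal sketch following the original MathWorks example, assuming the Neural Network Toolbox Model for AlexNet Network support package is installed:
% Load pre-trained AlexNet
net = alexnet;
% View the CNN architecture
net.Layers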
Other popular networks trained on ImageNet include VGG-16 and VGG-19 [3], which can be loaded
using vgg16 and vgg19 from the Neural Network Toolbox.
The first layer defines the input dimensions. AlexNet requires image input that is 227-by-227-by-3.
% Inspect the first layer
net.Layers(1)
ans =
Name: 'data'
InputSize: [227 227 3]
Hyperparameters
DataAugmentation: 'none'
Normalization: 'zerocenter'
The intermediate layers make up the bulk of the CNN. These are a series of convolutional layers, interspersed with rectified linear units (ReLU) and max-pooling layers [2]. Following these layers are 3 fully connected layers.
The final layer is the classification layer, and its properties depend on the classification task. In this example, the CNN model that was loaded was trained to solve a 1000-way classification problem. Thus the classification layer has 1000 classes from the ImageNet dataset [1].
% Inspect the last layer
net.Layers(end)
ans =
Name: 'output'
ClassNames: {1×1000 cell}
OutputSize: 1000
Hyperparameters
LossFunction: 'crossentropyex'
% Number of class names for the ImageNet classification task
numel(net.Layers(end).ClassNames)
ans =
1000
Note that the CNN model is not going to be used for the original classification task. It is going to be re-purposed to solve a different classification task on the retinal image dataset.
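The network can only process RGB images that are 227-by-227, so each image must be resized before it is passed to the CNN. Rather than re-saving all the images, set the datastore's ReadFcn so preprocessing happens on read; a minimal sketch, assuming the readAndPreprocessImage helper defined below:
% Set the ImageDatastore ReadFcn
imds.ReadFcn = @(filename)readAndPreprocessImage(filename);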
function Iout = readAndPreprocessImage(filename)
    I = imread(filename);
    % Some images may be grayscale. Replicate the image 3 times to
    % create an RGB image.
    if ismatrix(I)
        I = cat(3, I, I, I);
    end
    % Resize the image as required for the CNN.
    Iout = imresize(I, [227 227]);
    % Note that the aspect ratio is not preserved. In Caltech 101, the
    % object of interest is centered in the image and occupies a
    % majority of the image scene. Therefore, preserving the aspect
    % ratio is not critical. However, for other data sets, it may prove
    % beneficial to preserve the aspect ratio of the original image
    % when resizing.
end
Prepare Training and Test Image Sets
Split the sets into training and validation data. Pick 30% of images from each set for the training data and
the remainder, 70%, for the validation data. Randomize the split to avoid biasing the results. The training
and test sets will be processed by the CNN model.
[trainingSet, testSet] = splitEachLabel(imds, 0.3, 'randomize');
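Before extracting features, it is instructive to visualize the filters learned by the network's first convolutional layer; a minimal sketch following the original MathWorks example, assuming net.Layers(2) is the first convolutional layer ('conv1'):
% Get the network weights for the second layer (the first convolutional layer)
w1 = net.Layers(2).Weights;
% Scale and resize the weights for visualization
w1 = mat2gray(w1);
w1 = imresize(w1, 5);
% Display a montage of network weights
figure
montage(w1)
title('First convolutional layer weights')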
Notice how the first layer of the network has learned filters for capturing blob and edge features. These
"primitive" features are then processed by deeper network layers, which combine the early features to
form higher level image features. These higher level features are better suited for recognition tasks
because they combine all the primitive features into a richer image representation [4].
You can easily extract features from one of the deeper layers using the activations method. Which of the deep layers to use is a design choice, but typically the layer right before the classification layer is a good place to start. In net, this layer is named 'fc7'. Let's extract training features using that layer.
featureLayer = 'fc7';
trainingFeatures = activations(net, trainingSet, featureLayer, ...
'MiniBatchSize', 32, 'OutputAs', 'columns');
Note that the activations function automatically uses a GPU for processing if one is available; otherwise, a CPU is used. Because of the number of layers in AlexNet, using a GPU is highly recommended. Running the network on a CPU will greatly increase the time it takes to extract features.
In the code above, 'MiniBatchSize' is set to 32 to ensure that the CNN and image data fit into GPU memory. You may need to lower the 'MiniBatchSize' if your GPU runs out of memory. Also, the activations output is arranged as columns. This helps speed up the multiclass linear SVM training that follows.
% Get training labels from the trainingSet
trainingLabels = trainingSet.Labels;

% Train multiclass SVM classifier using a fast linear solver, and set
% 'ObservationsIn' to 'columns' to match the arrangement used for training
% features.
classifier = fitcecoc(trainingFeatures, trainingLabels, ...
    'Learners', 'Linear', 'Coding', 'onevsall', 'ObservationsIn', 'columns');
Evaluate Classifier
Repeat the procedure used earlier to extract image features from testSet. The test features can then be
passed to the classifier to measure the accuracy of the trained classifier.
% Extract test features using the CNN
testFeatures = activations(net, testSet, featureLayer, 'MiniBatchSize',32);
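Pass the CNN features for the test set to the trained classifier and tabulate the results; a minimal sketch following the original MathWorks example:
% Pass CNN image features to trained classifier
predictedLabels = predict(classifier, testFeatures);

% Get the known labels
testLabels = testSet.Labels;

% Tabulate the results using a confusion matrix
confMat = confusionmat(testLabels, predictedLabels);

% Convert confusion matrix into percentage form
confMat = bsxfun(@rdivide, confMat, sum(confMat,2))

% Display the mean accuracy over all categories
mean(diag(confMat))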
References
[1] Deng, Jia, et al. "Imagenet: A large-scale hierarchical image database." Computer Vision and Pattern
Recognition, 2009. CVPR 2009. IEEE Conference on. IEEE, 2009.
[2] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep
convolutional neural networks." Advances in neural information processing systems. 2012.
[3] Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image
recognition." arXiv preprint arXiv:1409.1556 (2014).
[4] Donahue, Jeff, et al. "Decaf: A deep convolutional activation feature for generic visual recognition."
arXiv preprint arXiv:1310.1531 (2013).
end