DL Lab-III-II
Implement the multilayer perceptron algorithm for MNIST handwritten digit classification.
The human visual system is a marvel of the world. People can readily recognize digits, but the task is not as simple as it looks. The human brain has billions of neurons with trillions of connections between them, which makes this exceptionally complex image-processing task seem effortless. For computers, however, recognizing digits is a challenging task: simple intuitions about how we recognize digits are difficult to express algorithmically. Moreover, handwriting varies significantly from person to person, which makes the problem even more complex.
A handwritten digit recognition system is a machine that trains itself to recognize digits from different sources such as emails, bank cheques, papers and images.
Google Colab
Google Colab has been used to implement the network. It is a free cloud service for developing deep learning applications with popular libraries such as Keras, TensorFlow, PyTorch and OpenCV. The most important feature that distinguishes Colab from other free cloud services is that it provides a GPU free of charge. Thus, if your PC does not meet the hardware requirements or does not have a GPU, Colab is a good option, because a stable internet connection is the only requirement.
MNIST Datasets
MNIST stands for “Modified National Institute of Standards and Technology”. It is a dataset of 70,000 handwritten images. Each image is 28x28 pixels, i.e. 784 features. Each feature represents one pixel’s intensity, from 0 (white) to 255 (black). The dataset is divided into 60,000 training images and 10,000 testing images.
Phases of Implementation
The MNIST dataset is also part of Keras, so we imported it from keras.datasets and loaded it into the variable “objects”. The objects.load_data() method returns the training data (train_img) and its labels (train_lab), as well as the testing data (test_img) and its labels (test_lab). Of the 70,000 images in the dataset, 60,000 are used for training and 10,000 for testing.
Before preprocessing the data, we first displayed the first 20 training images using a for loop.
subplot() adds a subplot to the current figure in a grid-like layout. The first argument is the number of rows, the second is the number of columns, and the third is the position index in the grid.
Suppose we have to plot 10 images in a 4x5 grid, starting from the second position in the grid. The images then occupy positions 2 to 11, filled row by row from the top-left corner.
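To make the position index concrete, here is a small pure-Python sketch of the row-major numbering that subplot() uses: index 1 is the top-left cell, and the index increases to the right and then down. The helper subplot_cell() is hypothetical and is not part of Matplotlib.

```python
def subplot_cell(nrows, ncols, index):
    """Hypothetical helper: map a 1-based subplot index to (row, column)."""
    if not 1 <= index <= nrows * ncols:
        raise ValueError("index must be between 1 and nrows * ncols")
    # Row-major order: fill the first row left to right, then the next row
    row, col = divmod(index - 1, ncols)
    return row, col


# 10 images in a 4x5 grid, starting at position 2
positions = list(range(2, 2 + 10))
cells = [subplot_cell(4, 5, p) for p in positions]
print(positions[0], positions[-1])  # 2 11
print(cells[0], cells[-1])          # (0, 1) (2, 0)
```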
imshow() displays data as an image, here a training image (train_img[i]), and cmap selects the colour map. cmap is an optional parameter: if the image is an array of shape (M, N), cmap controls the colour map used to display its values. cmap='gray' displays the image in grayscale, while cmap='gray_r' displays it in inverse grayscale.
title() sets the title of each image. We have set “Digit: train_lab[i]” as the title of each image in the subplot.
subplots_adjust() tunes the subplot layout. To change the space between two rows, we used hspace. To change the space between two columns, you can use wspace.
In recent Matplotlib versions, the default subplot layout parameters are left=0.125, right=0.9, bottom=0.11, top=0.88, wspace=0.2 and hspace=0.2.
To hide the axes of the image, plt.axis('off') has been used.
After that, we displayed the shapes of the training and testing sets.
(60000, 28, 28) means that there are 60,000 images in the training set and each image is 28x28 pixels. Similarly, there are 10,000 images of the same size in the testing set.
So each image has 28x28 = 784 features, and each feature represents the intensity of one pixel, from 0 to 255.
You can use print(train_img[0]) to print the first training image as a 28x28 matrix.
hist() is used to plot the histogram of the first training image, train_img[0], after reshaping it into a 1-D array of 784 values. facecolor is an optional parameter that specifies the colour of the histogram. The title, Y-axis and X-axis of the histogram are named “Pixel vs its intensity”, “PIXEL” and “Intensity”, respectively.
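The sketch below shows in plain Python what the reshape and the histogram compute. It flattens a synthetic 28x28 image into 784 values and counts the pixels in each intensity bin. The synthetic image, the helper names and the choice of 8 equal-width bins are assumptions for illustration. plt.hist() chooses its own bins (10 by default).

```python
def flatten(image):
    # Turn a list of 28 rows (each 28 pixels) into one list of 784 values
    return [pixel for row in image for pixel in row]


def histogram(values, n_bins=8, max_value=256):
    # Equal-width bins: with 8 bins each one covers 32 intensity levels
    width = max_value // n_bins
    counts = [0] * n_bins
    for v in values:
        counts[v // width] += 1
    return counts


# Synthetic "digit": black background (0) with a bright vertical stroke (255)
image = [[255 if 12 <= c <= 15 and 4 <= r <= 23 else 0 for c in range(28)]
         for r in range(28)]
pixels = flatten(image)
print(len(pixels))        # 784
print(histogram(pixels))  # [704, 0, 0, 0, 0, 0, 0, 80]
```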
Pre-process the data
Before feeding the data to the network, we normalize it. Normalizing the input data helps speed up training. It also reduces the chance of getting stuck in local optima, since we use stochastic gradient descent to find the optimal weights of the network.
The pixel values lie between 0 and 255. Scaling input values is good practice for neural network models. Because the scale here is well known and well behaved, we can quickly normalize the pixel values to the range 0 to 1 by dividing each value by the maximum intensity of 255.
After normalization, every pixel value lies between 0 and 1.
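Here is a minimal sketch of this scaling on a single row of pixels, in plain Python. With NumPy arrays the same operation is a single expression such as train_img / 255.0. The function name normalize() is hypothetical.

```python
def normalize(pixels, max_intensity=255.0):
    # Divide every 8-bit intensity by the maximum value, giving floats in [0, 1]
    return [p / max_intensity for p in pixels]


row = [0, 51, 128, 255]
scaled = normalize(row)
print([round(x, 3) for x in scaled])  # [0.0, 0.2, 0.502, 1.0]
```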
Creating the model
The output layer has 10 neurons, one for each class from 0 to 9. A softmax activation function is used on the output layer to turn the outputs into probability-like values.
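The softmax function takes only a few lines to sketch. This is an illustrative implementation, not Keras' own, and the logits below are made-up values for one image.

```python
import math


def softmax(logits):
    # Subtract the largest logit first for numerical stability; softmax is
    # unchanged by adding the same constant to every input
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up raw outputs of the 10 output neurons for one image
logits = [1.0, 0.5, 0.2, 3.0, 0.1, 0.0, 0.3, 0.4, 2.0, 0.6]
probs = softmax(logits)
print(round(sum(probs), 6))     # 1.0
print(probs.index(max(probs)))  # 3, the most likely digit
```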
Note: You can add more neurons to the hidden layers, or even increase the number of hidden layers, to try to improve accuracy. However, training will take more time.
Next, we compile the model. Compiling the model takes three parameters: optimizer, loss and metrics. The optimizer controls how the weights are updated, including the learning rate. We use 'adam' as our optimizer. It is generally a good choice in many cases, and it adjusts the learning rate throughout training.
We use 'sparse_categorical_crossentropy' as our loss function because it saves memory as well as computation: it represents each class with a single integer rather than a whole one-hot vector. A lower loss indicates that the model is performing better.
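The sketch below shows why the sparse variant needs less. It uses hypothetical helper functions and made-up probabilities. For the same prediction, the loss computed from an integer label equals the loss computed from the full one-hot vector, because only the true class's term is non-zero.

```python
import math


def categorical_crossentropy(one_hot, probs):
    # Needs the full 10-element one-hot label for every example
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))


def sparse_categorical_crossentropy(label, probs):
    # Needs only the integer class index
    return -math.log(probs[label])


probs = [0.05, 0.05, 0.70, 0.02, 0.03, 0.05, 0.02, 0.03, 0.03, 0.02]
label = 2
one_hot = [1.0 if k == label else 0.0 for k in range(10)]
print(round(sparse_categorical_crossentropy(label, probs), 4))  # 0.3567
print(round(categorical_crossentropy(one_hot, probs), 4))       # 0.3567
```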
To measure performance, we use the 'accuracy' metric. We then train the model on the training images and their labels (train_lab) for a chosen number of epochs. The number of epochs is the number of times the model cycles through the data. The more epochs we run, the more the model improves, up to a certain point. After that point, the model stops improving from one epoch to the next.
To evaluate the model, we compute the accuracy on the 10,000 testing examples using the network weights of the saved model.
The verbose argument controls how much output is printed:
verbose = 0 prints nothing.
verbose = 1 shows both a progress bar and one line per epoch.
verbose = 2 prints one line per epoch (epoch no./total no. of epochs).
After evaluating the model, we now use it on the testing set.
model.predict() is used to do prediction on the testing set.
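model.predict() returns 10 probabilities per image, so the predicted digit is the index of the largest one. The plain-Python sketch below uses two made-up prediction rows, and predicted_digits() is a hypothetical helper. On a NumPy array the same result comes from np.argmax(predictions, axis=1).

```python
def predicted_digits(prediction_rows):
    # Each row holds 10 class probabilities; pick the index of the largest
    return [max(range(len(row)), key=row.__getitem__) for row in prediction_rows]


# Two made-up rows of model.predict() output
predictions = [
    [0.01, 0.00, 0.02, 0.01, 0.00, 0.01, 0.00, 0.93, 0.01, 0.01],
    [0.00, 0.97, 0.01, 0.00, 0.00, 0.00, 0.01, 0.01, 0.00, 0.00],
]
print(predicted_digits(predictions))  # [7, 1]
```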
Now, to make a prediction for a new image that is not part of the MNIST dataset, we first create a function that converts the image into an array of pixels, which is fed to the model as input.
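The function itself is not shown here. Below is a minimal sketch of what such a helper might do, under three assumptions. First, the image has already been loaded as a 28x28 grayscale grid. Second, the model expects a flattened 784-value vector; if its first layer is Flatten(input_shape=(28, 28)), keep the 2-D layout instead. Third, the photo shows dark ink on light paper. image_to_input() is a hypothetical name.

```python
def image_to_input(gray_rows, invert=True):
    """Hypothetical helper: turn a 28x28 grayscale image (rows of 0-255 ints)
    into a batch holding one flattened, normalized 784-value vector."""
    if len(gray_rows) != 28 or any(len(r) != 28 for r in gray_rows):
        raise ValueError("resize the image to 28x28 pixels first")
    pixels = [p for row in gray_rows for p in row]
    if invert:
        # Assumption: dark ink on light paper. MNIST digits are bright strokes
        # (high values) on a dark background (0), so flip the intensities.
        pixels = [255 - p for p in pixels]
    return [[p / 255.0 for p in pixels]]  # batch of size 1


# White page (255) with a dark (0) horizontal stroke on row 14
scan = [[0 if r == 14 else 255 for c in range(28)] for r in range(28)]
batch = image_to_input(scan)
print(len(batch), len(batch[0]))  # 1 784
print(sum(batch[0]))              # 28.0, only the 28 stroke pixels are 1.0
```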
To upload a file from the local drive, we used the code:
from google.colab import files
uploaded = files.upload()
This prompts you to select a file. Click “Choose Files”, select the file, and wait until it is 100% uploaded. You will see the name of the file once Colab has uploaded it.
Now, if we want to run the model after a few days, we would have to run the whole code again, which is time-consuming.
In that case, you can use the saved model, project.h5.
So, before closing the Colab notebook, download the model file from the Files pane, which opens from the folder icon in the left sidebar.
So, when you want to run the model again, all you have to do is upload the project.h5 file from your computer with:
from google.colab import files
uploaded = files.upload()
When the file is 100% uploaded, run the following code. After that, you can predict the digit for new images.
import tensorflow as tf
model = tf.keras.models.load_model('project.h5')
Experiment 2:
Design a neural network for classifying movie reviews (binary classification) using the IMDB dataset.
Package requirements
library(keras) # for deep learning
library(tidyverse) # for dplyr, ggplot2, etc.
library(testthat) # unit testing
library(glue) # easy print statements
length(reviews_test) # 25K reviews in our test data
[1] 25000
The integers 0, 1 and 2 are reserved for special tokens:
0: padding
1: start of a sequence
2: unknown words
We can map the integer values back to the original words using the word index (dataset_imdb_word_index()). Each integer corresponds to a word's position in the word-frequency list, and the name of each vector element is the actual word.
word_index <- dataset_imdb_word_index() %>%
  unlist() %>%
  sort() %>%
  names()
# The indices are offset by 3 since 0, 1, and 2 are reserved for "padding",
# "start of sequence", and "unknown"
reviews_train[[1]] %>%
  map_chr(~ ifelse(.x >= 3, word_index[.x - 3], "<UNK>")) %>%
  cat()
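Here is the same offset logic in Python, for illustration only; the R code is the actual workflow. The miniature word index below is made up, and the real one from dataset_imdb_word_index() is much larger.

```python
# Made-up miniature word index: word -> rank (1 = most frequent word)
word_index = {"the": 1, "and": 2, "a": 3, "film": 4, "brilliant": 5}
index_to_word = {rank: word for word, rank in word_index.items()}


def decode(sequence, offset=3):
    # 0, 1 and 2 are reserved (padding, start of sequence, unknown), so an
    # encoded value i refers to the word whose rank is i - offset
    return " ".join(
        index_to_word.get(i - offset, "<UNK>") if i >= offset else "<UNK>"
        for i in sequence
    )


print(decode([1, 4, 7, 8, 2]))  # <UNK> the film brilliant <UNK>
```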
<UNK> this film was just brilliant casting location scenery story direction everyone's really suited the
part they played and you could just imagine being there robert <UNK> is an amazing actor and now the
same being director <UNK> father came from the same scottish island as myself so i loved the fact there
was a real connection with this film the witty remarks throughout the film were great it was just brilliant
so much that i bought the film as soon as it was released for <UNK> and would recommend it to
everyone to watch and the fly fishing was amazing really cried at the end it was so sad and you know
what they say if you cry at a film it must have been good and this definitely was also <UNK> to the two
little boy's that played the <UNK> of norman and paul they were just brilliant children are often left out
of the <UNK> list i think because the stars that play them all grown up are such a big profile for the
whole film but these children are amazing and should be praised for what they have done don't you think
the whole story was so lovely because it was true and was someone's life after all that was shared with us
all
Our response variable is just a vector of 1s (positive reviews) and 0s (negative reviews).
str(y_train)
int [1:25000] 1 0 0 1 0 0 1 0 1 0 ...
# our labels are equally balanced between positive (1s) and negative (0s)
# reviews
table(y_train)
y_train
0 1
12500 12500
Our transformed feature set is now just a matrix (2D tensor) with 25K rows and roughly 10K columns (features).
dim(x_train)
[1] 25000 9999
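The vectorization code that produced x_train is not shown. A common approach is multi-hot encoding, sketched here in Python with a hypothetical vectorize_sequences() helper. Each review becomes a row of 0s and 1s, with a 1 in the column of every word index that appears in the review. Mapping word index j to column j - 1 is an assumption of this sketch.

```python
def vectorize_sequences(sequences, dimension):
    """Hypothetical helper: multi-hot encode integer word-index sequences."""
    matrix = [[0] * dimension for _ in sequences]
    for row, seq in zip(matrix, sequences):
        for idx in seq:
            if 1 <= idx <= dimension:  # word index j -> column j - 1
                row[idx - 1] = 1       # repeated words still give a single 1
    return matrix


reviews = [[1, 3, 3], [2, 5]]
print(vectorize_sequences(reviews, dimension=5))
# [[1, 0, 1, 0, 0], [0, 1, 0, 0, 1]]
```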
Initial model
Since we are performing binary classification, our output activation function will be the sigmoid
activation function ℹ️. Recall hat the sigmoid activation is used to predict the probability of the output
being positive. This will constrain our output to be values ranging from 0-100%.
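Here is a quick plain-Python check of how the sigmoid squashes any real-valued input into the range (0, 1). An output above 0.5 would typically be read as a positive review.

```python
import math


def sigmoid(z):
    # 1 / (1 + e^-z): large negative z -> near 0, large positive z -> near 1
    return 1.0 / (1.0 + math.exp(-z))


for z in (-4.0, 0.0, 4.0):
    print(z, round(sigmoid(z), 3))
# -4.0 0.018
# 0.0 0.5
# 4.0 0.982
```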
network <- keras_model_sequential() %>%
  layer_dense(units = 16, activation = "relu", input_shape = n_features) %>%
  layer_dense(units = 16, activation = "relu") %>%
  layer_dense(units = 1, activation = "sigmoid")
summary(network)
Model: "sequential"
________________________________________________________________
Layer (type)              Output Shape              Param #
================================================================
dense (Dense)             (None, 16)                160000
________________________________________________________________
dense_1 (Dense)           (None, 16)                272
________________________________________________________________
dense_2 (Dense)           (None, 1)                 17
================================================================
Total params: 160,289
Trainable params: 160,289
Non-trainable params: 0
________________________________________________________________
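The Param # column follows directly from the layer sizes: each dense layer has one weight per input-unit pair plus one bias per unit. Here is a short arithmetic check, taking n_features = 9999 from dim(x_train) above.

```python
def dense_params(n_inputs, units):
    # weights (one per input-unit pair) plus one bias per unit
    return n_inputs * units + units


n_features = 9999
layers = [(n_features, 16), (16, 16), (16, 1)]
counts = [dense_params(i, u) for i, u in layers]
print(counts)       # [160000, 272, 17]
print(sum(counts))  # 160289
```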
We’re going to use binary crossentropy since we only have two possible classes.
network %>% compile(
  optimizer = "rmsprop",
  loss = "binary_crossentropy",
  metrics = "accuracy"
)
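For reference, binary crossentropy averages -[y*log(p) + (1 - y)*log(1 - p)] over the examples, where y is the 0/1 label and p is the predicted probability. The plain-Python sketch below is illustrative. The clipping constant eps is an assumption made to keep log() finite.

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    # Mean of -[y*log(p) + (1-y)*log(1-p)]; p is clipped away from 0 and 1
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


print(round(binary_crossentropy([1, 0], [0.9, 0.1]), 4))  # 0.1054 (confident and right)
print(round(binary_crossentropy([1, 0], [0.1, 0.9]), 4))  # 2.3026 (confident and wrong)
```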
Now let’s train our network for 20 epochs with a batch size of 512 because, as you’ll find out, this model overfits very quickly. Remember that large batch sizes compute more accurate gradient estimates, which traverse the loss surface more slowly.
history <- network %>% fit(
  x_train,
  y_train,
  epochs = 20,
  batch_size = 512,
  validation_split = 0.2
)
In the previous module, we had the problem of underfitting; however, the learning curve for this model makes it obvious that we have an overfitting problem.
plot(history)
network <- keras_model_sequential() %>%
  layer_dense(units = ____, activation = "relu", input_shape = n_features) %>%
  layer_dense(units = ____, activation = "relu") %>%
  layer_dense(units = 1, activation = "sigmoid")
Regardless of what you tried above, you likely had results that consistently overfit. Our quest is to see if
we can control this overfitting. Often, when we control the overfitting we improve model performance
and generalizability. To reduce overfitting we are going to look at a few common ways to regularize our
model.
When tuning the learning rate, we often try factors of $10^{-s}$ where s ranges between 1 and 6 (0.1, 0.01, …, 0.000001).
Add callback_reduce_lr_on_plateau() to automatically adjust the learning rate during training.
As you reduce the learning rate, reduce the batch size:
o Adds a stochastic element that reduces the chance of getting stuck in a local minimum
o Speeds up training (small learning rate + large batch size = SLOW!)
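The sketch below illustrates both ideas in Python: the $10^{-s}$ grid of candidate learning rates, and a simplified version of the reduce-on-plateau logic. PlateauScheduler is a hypothetical class. The real callback_reduce_lr_on_plateau() has more options (such as min_delta and cooldown) and its own defaults.

```python
# Candidate learning rates 10^-s for s = 1..6: 0.1, 0.01, ..., 0.000001
learning_rates = [10.0 ** -s for s in range(1, 7)]


class PlateauScheduler:
    """Hypothetical, simplified reduce-on-plateau logic: multiply the learning
    rate by `factor` once the monitored loss has failed to improve for
    `patience` consecutive epochs."""

    def __init__(self, lr, factor=0.1, patience=2, min_lr=1e-6):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.min_lr = min_lr
        self.best = float("inf")
        self.wait = 0

    def step(self, val_loss):
        if val_loss < self.best:
            self.best, self.wait = val_loss, 0  # improvement: reset the counter
        else:
            self.wait += 1
            if self.wait >= self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.wait = 0
        return self.lr


# Made-up validation losses: no improvement at epochs 4 and 5
scheduler = PlateauScheduler(lr=0.001)
val_losses = [0.40, 0.32, 0.30, 0.31, 0.305, 0.29]
lrs = [scheduler.step(loss) for loss in val_losses]
print(lrs)  # the rate drops from 0.001 to about 0.0001 at epoch 5
```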
network <- keras_model_sequential() %>%
  layer_dense(units = 16, activation = "relu", input_shape = n_features) %>%
  layer_dense(units = 16, activation = "relu") %>%
  layer_dense(units = 1, activation = "sigmoid")
Our results show a decrease in overfitting and an improvement in our loss score and (possibly) accuracy.
best_epoch <- which.min(history$metrics$val_loss)
best_loss <- history$metrics$val_loss[best_epoch] %>% round(3)
best_acc <- history$metrics$val_accuracy[best_epoch] %>% round(3)
plot(history) +
  scale_x_continuous(limits = c(0, length(history$metrics$val_loss)))
return(output)
}
Let’s also define a helper function that simply pulls out the minimum loss score from the above output
(this is not necessary, just informational):
get_min_loss <- function(output) {
  output %>%
    filter(data == "validation", metric == "loss") %>%
    summarize(min_loss = min(value, na.rm = TRUE)) %>%
    pull(min_loss) %>%
    round(3)
}
for (i in powerto_range) {
  cat("Running model with", 2^i, "neurons per hidden layer: ")
  m <- dl_model(i)
  results <- rbind(results, m)
  loss <- get_min_loss(m)
  cat(loss, "\n", append = TRUE)
}
Running model with 4 neurons per hidden layer: 0.271
Running model with 8 neurons per hidden layer: 0.271
Running model with 16 neurons per hidden layer: 0.282
Running model with 32 neurons per hidden layer: 0.301
Running model with 64 neurons per hidden layer: 0.268
Running model with 128 neurons per hidden layer: 0.293
Running model with 256 neurons per hidden layer: 0.277
The results above suggest that constraining the size of our hidden layers may even slightly improve our optimal loss score. The plot below shows that it clearly reduces overfitting.
min_loss <- results %>%
  filter(metric == "loss" & data == "validation") %>%
  summarize(min_loss = min(value, na.rm = TRUE)) %>%
  pull()
results %>%
  filter(metric == "loss") %>%
  ggplot(aes(epoch, value, color = data)) +
  geom_line() +
  geom_hline(yintercept = min_loss, lty = "dashed") +
  facet_wrap(~ neurons) +
  theme_bw()
  # Train model
  history <- network %>%
    fit(
      x_train,
      y_train,
      epochs = 25,
      batch_size = 512,
      validation_split = 0.2,
      verbose = FALSE,
      callbacks = callback_early_stopping(patience = 5)
    )
  return(output)
}
Now we can iterate over a range of layers and neurons in each layer to assess the impact on performance. To save time, we’ll use hidden layers with 16 nodes (powerto = 4, i.e. 2^4) and just assess the impact of adding more layers:
# so that we can store results
results <- data.frame()
nlayers <- 1:6
for (i in nlayers) {
  cat("Running model with", i, "hidden layer(s) and 16 neurons per layer: ")
  m <- dl_model(nlayers = i, powerto = 4)
  results <- rbind(results, m)
  loss <- get_min_loss(m)
  cat(loss, "\n", append = TRUE)
}
Running model with 1 hidden layer(s) and 16 neurons per layer: 0.27
Running model with 2 hidden layer(s) and 16 neurons per layer: 0.274
Running model with 3 hidden layer(s) and 16 neurons per layer: 0.27
Running model with 4 hidden layer(s) and 16 neurons per layer: 0.278
Running model with 5 hidden layer(s) and 16 neurons per layer: 0.279
Running model with 6 hidden layer(s) and 16 neurons per layer: 0.274
It’s unclear how much the minimum loss score improves based on the above results; however, the plot below illustrates that our 1-2 layer models overfit less than the deeper models.
min_loss <- results %>%
  filter(metric == "loss" & data == "validation") %>%
  summarize(min_loss = min(value, na.rm = TRUE)) %>%
  pull()
results %>%
  filter(metric == "loss") %>%
  ggplot(aes(epoch, value, color = data)) +
  geom_line() +
  geom_hline(yintercept = min_loss, lty = "dashed") +
  facet_wrap(~ nlayers, ncol = 3) +
  theme_bw()
Although you can use L1, L2 or a combination, L2 is by far the most common and is known as weight decay in the context of neural nets.
Optimal values vary, but when tuning we typically start with factors of $10^{-s}$ where s ranges between 1 and 4 (0.1, 0.01, …, 0.0001).
The larger the weight regularizer, the more epochs are generally required to reach a minimum loss.
Weight decay can cause a noisier learning curve, so it’s often beneficial to increase the patience parameter for early stopping.
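In Keras, regularizer_l2(l) adds l times the sum of the layer's squared weights to the loss. Here is a plain-Python sketch of that arithmetic with made-up weights; the helper names are hypothetical.

```python
def l2_penalty(weights, l=0.01):
    # L2 weight decay term: l * (sum of squared weights)
    return l * sum(w * w for w in weights)


def regularized_loss(data_loss, weight_groups, l=0.01):
    # Each regularized layer contributes its own penalty to the total loss
    return data_loss + sum(l2_penalty(ws, l) for ws in weight_groups)


layer1 = [0.5, -1.0, 2.0]  # made-up kernel weights
layer2 = [0.1, 0.3]
print(round(l2_penalty(layer1), 4))                        # 0.0525
print(round(regularized_loss(0.30, [layer1, layer2]), 4))  # 0.3535
```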
network <- keras_model_sequential() %>%
  layer_dense(
    units = 16, activation = "relu", input_shape = n_features,
    kernel_regularizer = regularizer_l2(l = 0.01) # regularization parameter
  ) %>%
  layer_dense(
    units = 16, activation = "relu",
    kernel_regularizer = regularizer_l2(l = 0.01) # regularization parameter
  ) %>%
  layer_dense(units = 1, activation = "sigmoid")
Unfortunately, in this example, weight decay negatively impacts performance. The impact of weight
decay is largely problem and data specific.
best_epoch <- which.min(history$metrics$val_loss)
best_loss <- history$metrics$val_loss[best_epoch] %>% round(3)
best_acc <- history$metrics$val_accuracy[best_epoch] %>% round(3)
plot(history) +
  scale_x_continuous(limits = c(0, length(history$metrics$val_loss)))
Regularizing happenstance patterns
Dropout is one of the most effective and commonly used regularization techniques for neural networks. Dropout applied to a layer randomly drops out (sets to zero) a certain percentage of that layer's output features. By randomly dropping some of a layer’s outputs, we minimize the chance of fitting patterns to noise in the data, a common cause of overfitting.
Best practice:
Dropout rates typically range between 0.2 and 0.5. Sometimes higher rates are necessary, but note that you will get a warning when supplying rate > 0.5.
The higher the dropout rate, the slower the convergence, so you may need to increase the number of epochs.
It’s common to apply dropout after each hidden layer with the same rate; however, this is not necessary.
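To make the mechanics concrete, here is a plain-Python sketch of inverted dropout. This is the scheme Keras' Dropout layer (which layer_dropout() wraps) uses: surviving values are scaled up by 1 / (1 - rate) during training, and nothing is dropped at inference time. The dropout() function here is a hypothetical illustration, not the Keras implementation.

```python
import random


def dropout(values, rate, training=True, rng=random):
    """Hypothetical inverted-dropout sketch."""
    if not training or rate == 0:
        return list(values)  # inference: no units are dropped
    keep = 1.0 - rate
    # Keep each value with probability `keep`, scaling survivors by 1 / keep
    return [v / keep if rng.random() < keep else 0.0 for v in values]


rng = random.Random(0)
out = dropout([1.0] * 10000, rate=0.5, rng=rng)
dropped = sum(1 for v in out if v == 0.0)
print(dropped / len(out))   # fraction dropped, close to 0.5
print(sum(out) / len(out))  # mean activation, close to 1.0
```

Scaling the survivors keeps the expected activation unchanged, so no rescaling is needed when the model is used for prediction.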
network <- keras_model_sequential() %>%
  layer_dense(units = 16, activation = "relu", input_shape = n_features) %>%
  layer_dropout(0.6) %>% # regularization parameter
  layer_dense(units = 16, activation = "relu") %>%
  layer_dropout(0.6) %>% # regularization parameter
  layer_dense(units = 1, activation = "sigmoid")
Similar to weight regularization, the impact of dropout is largely problem and data specific. In this
example we do not see significant improvement.
best_epoch <- which.min(history$metrics$val_loss)
best_loss <- history$metrics$val_loss[best_epoch] %>% round(3)
best_acc <- history$metrics$val_accuracy[best_epoch] %>% round(3)
plot(history) +
  scale_x_continuous(limits = c(0, length(history$metrics$val_loss)))
So which is best?
There is no definitive best approach for minimizing overfitting. However, typically you want to focus first
on finding the optimal learning rate and model capacity that optimizes the loss score. Then move on to
fighting overfitting with dropout or weight decay.
Unfortunately, many of these hyperparameters interact, so changing one can impact the performance of another. Performing a grid search can help you identify the optimal combination. However, as your data gets larger or as you start using more complex models such as CNNs and LSTMs, you are often too constrained by compute to execute a sizable grid search. Here is a great paper on how to practically approach hyperparameter tuning for neural networks (https://arxiv.org/abs/1803.09820).
To see the performance of a grid search on this data set and the parameters discussed here, check out this
notebook.
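The sketch below shows how quickly a grid grows. It enumerates every combination of a small, made-up search space with itertools.product. Each configuration would then need its own training run, or several with cross-validation.

```python
from itertools import product

# Made-up search space using hyperparameters discussed in this section
space = {
    "learning_rate": [1e-2, 1e-3, 1e-4],
    "units": [16, 64],
    "dropout": [0.0, 0.3],
    "l2": [0.0, 1e-3],
}


def grid_configs(space):
    # Yield one dict per combination of values (the Cartesian product)
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


configs = list(grid_configs(space))
print(len(configs))  # 24 = 3 * 2 * 2 * 2 training runs
print(configs[0])    # {'learning_rate': 0.01, 'units': 16, 'dropout': 0.0, 'l2': 0.0}
```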
Key takeaways
Experiment 5:
MNIST is an acronym for the Modified National Institute of Standards and Technology dataset.
It is a dataset of 70,000 small square 28×28 pixel grayscale images of handwritten single digits between 0 and 9 (60,000 for training and 10,000 for testing).
The task is to classify a given image of a handwritten digit into one of 10 classes representing integer
values from 0 to 9, inclusively.
It is a widely used and deeply understood dataset and, for the most part, is “solved.” Top-performing models are deep learning convolutional neural networks that achieve a classification accuracy above 99%, with an error rate between 0.4% and 0.2% on the hold-out test dataset.
The example below loads the MNIST dataset using the Keras API and creates a plot of the first nine
images in the training dataset.
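from tensorflow.keras.datasets import mnist
from matplotlib import pyplot as plt
# load dataset and summarize its shape
(trainX, trainy), (testX, testy) = mnist.load_data()
print('Train: X=%s, y=%s' % (trainX.shape, trainy.shape))
print('Test: X=%s, y=%s' % (testX.shape, testy.shape))
for i in range(9):
    plt.subplot(330 + 1 + i)  # define subplot: 3x3 grid, position i + 1
    plt.imshow(trainX[i], cmap='gray')  # plot raw pixel data
plt.show()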
Running the example loads the MNIST train and test dataset and prints their shape.
We can see that there are 60,000 examples in the training dataset and 10,000 in the test dataset and that
images are indeed square with 28×28 pixels.
A plot of the first nine images in the dataset is also created showing the natural handwritten nature of the
images to be classified.
Although the MNIST dataset is effectively solved, it can be a useful starting point for developing and
practicing a methodology for solving image classification tasks using convolutional neural networks.
Instead of reviewing the literature on well-performing models on the dataset, we can develop a new
model from scratch.
The dataset already has a well-defined train and test dataset that we can use.
In order to estimate the performance of a model for a given training run, we can further split the training
set into a train and validation dataset. Performance on the train and validation dataset over each run can
then be plotted to provide learning curves and insight into how well a model is learning the problem.
The Keras API supports this through the “validation_data” argument of the model.fit() function. fit() then returns an object that describes model performance for the chosen loss and metrics on each training epoch.
# record model performance on a validation dataset during training
history = model.fit(..., validation_data=(valX, valY))
To estimate the performance of a model on the problem in general, we can use k-fold cross-validation, perhaps five-fold cross-validation. This gives some account of the model's variance, both with respect to differences in the training and test datasets and in terms of the stochastic nature of the learning algorithm. The performance of a model can be taken as the mean performance across the k folds, and the standard deviation can be used to estimate a confidence interval if desired.
We can use the KFold class from the scikit-learn API to implement the k-fold cross-validation evaluation of a given neural network model. There are many ways to achieve this. We choose a flexible approach where the KFold class is only used to specify the row indexes used for each split.
# example of k-fold cv for a neural net
from sklearn.model_selection import KFold
data = ...
# prepare cross validation
kfold = KFold(5, shuffle=True, random_state=1)
# enumerate splits
for train_ix, test_ix in kfold.split(data):
    model = ...
    ...
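To show what "row indexes for each split" means, here is a plain-Python stand-in for KFold.split(). kfold_indices() is hypothetical. It distributes fold sizes the same way KFold does (the first n_samples % n_splits folds get one extra row), but its shuffled order will not match scikit-learn's. For the 60,000 MNIST training images and k = 5, each test fold has 12,000 rows.

```python
import random


def kfold_indices(n_samples, n_splits=5, shuffle=True, seed=1):
    """Hypothetical stand-in for KFold.split(): yields (train, test) lists of
    row indexes, using every row exactly once as test data."""
    indices = list(range(n_samples))
    if shuffle:
        random.Random(seed).shuffle(indices)
    # The first n_samples % n_splits folds receive one extra row
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


splits = list(kfold_indices(12, n_splits=5))
print([len(test) for _, test in splits])  # [3, 3, 2, 2, 2]
```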
We will hold back the actual test dataset and use it to evaluate our final model.
Developing a baseline model is critical, for two reasons. It involves building the infrastructure for the test harness, so that any model we design can be evaluated on the dataset. It also establishes a baseline in model performance on the problem, against which all improvements can be compared.
The design of the test harness is modular, and we can develop a separate function for each piece. This allows a given aspect of the test harness to be modified or interchanged, if we desire, separately from the rest.
We can develop this test harness with five key elements. They are the loading of the dataset, the
preparation of the dataset, the definition of the model, the evaluation of the model, and the presentation of
results.
Load Dataset
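We already know a few things about the dataset. For example, the images are all pre-aligned (e.g. each image contains only a hand-drawn digit), they all have the same square size of 28×28 pixels, and they are grayscale.
Therefore, we can load the images and reshape the data arrays to have a single color channel.
(trainX, trainY), (testX, testY) = mnist.load_data()  # load dataset
trainX, testX = trainX.reshape((-1, 28, 28, 1)), testX.reshape((-1, 28, 28, 1))  # single color channel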
We also know that there are 10 classes and that classes are represented as unique integers.
We can, therefore, use a one hot encoding for the class element of each sample, transforming the integer
into a 10 element binary vector with a 1 for the index of the class value, and 0 values for all other classes.
We can achieve this with the to_categorical() utility function.
from tensorflow.keras.utils import to_categorical
# one hot encode target values
trainY = to_categorical(trainY)
testY = to_categorical(testY)
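Here is a plain-Python sketch of the same transformation. one_hot() is a hypothetical helper that mirrors to_categorical()'s behaviour of inferring the number of classes as the largest label plus one.

```python
def one_hot(labels, num_classes=None):
    # Integer class k becomes a vector with 1.0 at index k and 0.0 elsewhere
    if num_classes is None:
        num_classes = max(labels) + 1
    return [[1.0 if k == label else 0.0 for k in range(num_classes)]
            for label in labels]


encoded = one_hot([5, 0, 9])
print(encoded[0])  # [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
```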
The load_dataset() function implements these behaviors and can be used to load the dataset.
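from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical
# load train and test dataset
def load_dataset():
    (trainX, trainY), (testX, testY) = mnist.load_data()
    trainX = trainX.reshape((trainX.shape[0], 28, 28, 1))  # single color channel
    testX = testX.reshape((testX.shape[0], 28, 28, 1))
    trainY = to_categorical(trainY)  # one hot encode target values
    testY = to_categorical(testY)
    return trainX, trainY, testX, testY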
We do not know the best way to scale the pixel values for modeling, but we know that some scaling will
be required.
A good starting point is to normalize the pixel values of grayscale images, e.g. rescale them to the range
[0,1]. This involves first converting the data type from unsigned integers to floats, then dividing the pixel
values by the maximum value.
# convert from integers to floats
train_norm = train.astype('float32')
test_norm = test.astype('float32')
# normalize to range 0-1
train_norm = train_norm / 255.0
test_norm = test_norm / 255.0
The prep_pixels() function below implements these behaviors and is provided with the pixel values for
both the train and test datasets that will need to be scaled.
# scale pixels to the range 0-1
def prep_pixels(train, test):
    train_norm = train.astype('float32') / 255.0
    test_norm = test.astype('float32') / 255.0
    return train_norm, test_norm
This function must be called to prepare the pixel values prior to any modeling.
Define Model
Next, we need to define a baseline convolutional neural network model for the problem.
The model has two main aspects: the feature extraction front end comprised of convolutional and pooling
layers, and the classifier backend that will make a prediction.
For the convolutional front-end, we can start with a single convolutional layer with a small filter size (3,3)
and a modest number of filters (32) followed by a max pooling layer. The filter maps can then be
flattened to provide features to the classifier.
Given that the problem is a multi-class classification task, we know that we will require an output layer
with 10 nodes in order to predict the probability distribution of an image belonging to each of the 10
classes. This will also require the use of a softmax activation function. Between the feature extractor and
the output layer, we can add a dense layer to interpret the features, in this case with 100 nodes.
All layers except the softmax output layer will use the ReLU activation function and the He weight initialization scheme, both best practices.
We will use a conservative configuration for the stochastic gradient descent optimizer with a learning
rate of 0.01 and a momentum of 0.9. The categorical cross-entropy loss function will be optimized,
suitable for multi-class classification, and we will monitor the classification accuracy metric, which is
appropriate given we have the same number of examples in each of the 10 classes.
The define_model() function below will define and return this model.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from tensorflow.keras.optimizers import SGD
# define cnn model
def define_model():
    model = Sequential()
    model.add(Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_uniform',
                     input_shape=(28, 28, 1)))
    model.add(MaxPooling2D((2, 2)))
    model.add(Flatten())
    model.add(Dense(100, activation='relu', kernel_initializer='he_uniform'))
    model.add(Dense(10, activation='softmax'))
    # compile model
    opt = SGD(learning_rate=0.01, momentum=0.9)
    model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])
    return model
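The shapes and parameter counts of this model follow from simple arithmetic, assuming Keras' default 'valid' padding and a pooling stride equal to the pool size. This sketch (with hypothetical helper names) reproduces the numbers that model.summary() would be expected to report for define_model().

```python
def conv2d_output(size, kernel, stride=1):
    # 'valid' padding (Keras' default): the kernel must fit inside the image
    return (size - kernel) // stride + 1


def pool_output(size, pool):
    # MaxPooling2D with 'valid' padding and stride equal to the pool size
    return (size - pool) // pool + 1


conv = conv2d_output(28, 3)    # 26 -> 26x26x32 feature maps
pooled = pool_output(conv, 2)  # 13 -> 13x13x32 after pooling
flat = pooled * pooled * 32    # 5408 inputs to the Dense(100) layer

params = {
    "conv2d": 32 * (3 * 3 * 1) + 32,  # 32 filters of 3x3x1 weights + 32 biases
    "dense_100": flat * 100 + 100,
    "dense_10": 100 * 10 + 10,
}
print(conv, pooled, flat)    # 26 13 5408
print(params)                # {'conv2d': 320, 'dense_100': 540900, 'dense_10': 1010}
print(sum(params.values()))  # 542230
```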
Evaluate Model
The model will be evaluated using five-fold cross-validation. The value of k=5 was chosen to provide a
baseline for both repeated evaluation and to not be so large as to require a long running time. Each test set
will be 20% of the training dataset, or about 12,000 examples, close to the size of the actual test set for
this problem.
The training dataset is shuffled before being split, with the same fixed random seed each time. As a result, every model we evaluate sees the same train and test datasets in each fold, providing an apples-to-apples comparison between models.
We will train the baseline model for a modest 10 training epochs with a default batch size of 32 examples.
The test set for each fold will be used to evaluate the model both during each epoch of the training run, so
that we can later create learning curves, and at the end of the run, so that we can estimate the performance
of the model. As such, we will keep track of the resulting history from each run, as well as the
classification accuracy of the fold.
The evaluate_model() function below implements these behaviors, taking the training dataset as
arguments and returning a list of accuracy scores and training histories that can be later summarized.
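from sklearn.model_selection import KFold
# evaluate a model using k-fold cross-validation
def evaluate_model(dataX, dataY, n_folds=5):
    scores, histories = list(), list()
    # prepare cross validation
    kfold = KFold(n_folds, shuffle=True, random_state=1)
    # enumerate splits
    for train_ix, test_ix in kfold.split(dataX):
        # define model
        model = define_model()
        # select rows for train and test
        trainX, trainY = dataX[train_ix], dataY[train_ix]
        testX, testY = dataX[test_ix], dataY[test_ix]
        # fit model
        history = model.fit(trainX, trainY, epochs=10, batch_size=32,
                            validation_data=(testX, testY), verbose=0)
        # evaluate model
        _, acc = model.evaluate(testX, testY, verbose=0)
        print('> %.3f' % (acc * 100.0))
        # stores scores
        scores.append(acc)
        histories.append(history)
    return scores, histories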
Present Results
Once the model has been evaluated, we can present the results.
There are two key aspects to present: the diagnostics of the learning behavior of the model during training
and the estimation of the model performance. These can be implemented using separate functions.
First, the diagnostics involve creating a line plot showing model performance on the train and test set
during each fold of the k-fold cross-validation. These plots are valuable for getting an idea of whether a
model is overfitting, underfitting, or has a good fit for the dataset.
We will create a single figure with two subplots, one for loss and one for accuracy. Blue lines will
indicate model performance on the training dataset and orange lines will indicate performance on the hold
out test dataset. The summarize_diagnostics() function below creates and shows this plot given the
collected training histories.
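from matplotlib import pyplot as plt
# plot diagnostic learning curves
def summarize_diagnostics(histories):
    for i in range(len(histories)):
        # plot loss
        plt.subplot(2, 1, 1)
        plt.title('Cross Entropy Loss')
        plt.plot(histories[i].history['loss'], color='blue', label='train')
        plt.plot(histories[i].history['val_loss'], color='orange', label='test')
        # plot accuracy
        plt.subplot(2, 1, 2)
        plt.title('Classification Accuracy')
        plt.plot(histories[i].history['accuracy'], color='blue', label='train')
        plt.plot(histories[i].history['val_accuracy'], color='orange', label='test')
    plt.show()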
Next, the classification accuracy scores collected during each fold can be summarized by calculating the
mean and standard deviation. This provides an estimate of the average expected performance of the
model trained on this dataset, with an estimate of the average variance in the mean. We will also
summarize the distribution of scores by creating and showing a box and whisker plot.
The summarize_performance() function below implements this for a given list of scores collected during
model evaluation.
from numpy import mean, std
from matplotlib import pyplot as plt
# summarize model performance
def summarize_performance(scores):
    # print summary
    print('Accuracy: mean=%.3f std=%.3f, n=%d' % (mean(scores)*100, std(scores)*100, len(scores)))
    # box and whisker plots of results
    plt.boxplot(scores)
    plt.show()
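As a worked example, here is the same summary applied to the five fold accuracies reported for the baseline model at the end of this section. statistics.pstdev() is used because, like numpy.std() with its default ddof=0, it computes the population standard deviation.

```python
from statistics import mean, pstdev

# Per-fold accuracies (%) reported for the baseline CNN in this section
scores = [98.550, 98.600, 98.642, 98.850, 98.742]
m, s = mean(scores), pstdev(scores)
print('Accuracy: mean=%.3f std=%.3f, n=%d' % (m, s, len(scores)))
# Accuracy: mean=98.677 std=0.107, n=5
```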
Complete Example
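We need a function that drives the test harness. The run_test_harness() function below loads and prepares the data, evaluates the model, and presents the results:
# run the test harness for evaluating a model
def run_test_harness():
    trainX, trainY, testX, testY = load_dataset()  # load dataset
    trainX, testX = prep_pixels(trainX, testX)  # prepare pixel data
    scores, histories = evaluate_model(trainX, trainY)  # evaluate model
    summarize_diagnostics(histories)  # learning curves
    summarize_performance(scores)  # summarize estimated performance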
We now have everything we need; the complete code example for a baseline convolutional neural
network model on the MNIST dataset is listed below.
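# baseline cnn model for mnist
from numpy import mean, std
from matplotlib import pyplot as plt
from sklearn.model_selection import KFold
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dense, Flatten
from tensorflow.keras.optimizers import SGD

# load train and test dataset
def load_dataset():
    # load dataset
    (trainX, trainY), (testX, testY) = mnist.load_data()
    # reshape dataset to have a single channel
    trainX = trainX.reshape((trainX.shape[0], 28, 28, 1))
    testX = testX.reshape((testX.shape[0], 28, 28, 1))
    # one hot encode target values
    trainY = to_categorical(trainY)
    testY = to_categorical(testY)
    return trainX, trainY, testX, testY

# scale pixels
def prep_pixels(train, test):
    # convert from integers to floats
    train_norm = train.astype('float32')
    test_norm = test.astype('float32')
    # normalize to range 0-1
    train_norm = train_norm / 255.0
    test_norm = test_norm / 255.0
    # return normalized images
    return train_norm, test_norm

# define cnn model
def define_model():
    model = Sequential()
    model.add(Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_uniform',
                     input_shape=(28, 28, 1)))
    model.add(MaxPooling2D((2, 2)))
    model.add(Flatten())
    model.add(Dense(100, activation='relu', kernel_initializer='he_uniform'))
    model.add(Dense(10, activation='softmax'))
    # compile model
    opt = SGD(learning_rate=0.01, momentum=0.9)
    model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])
    return model

# evaluate a model using k-fold cross-validation
def evaluate_model(dataX, dataY, n_folds=5):
    scores, histories = list(), list()
    # prepare cross validation
    kfold = KFold(n_folds, shuffle=True, random_state=1)
    # enumerate splits
    for train_ix, test_ix in kfold.split(dataX):
        # define model
        model = define_model()
        # select rows for train and test
        trainX, trainY = dataX[train_ix], dataY[train_ix]
        testX, testY = dataX[test_ix], dataY[test_ix]
        # fit model
        history = model.fit(trainX, trainY, epochs=10, batch_size=32,
                            validation_data=(testX, testY), verbose=0)
        # evaluate model
        _, acc = model.evaluate(testX, testY, verbose=0)
        print('> %.3f' % (acc * 100.0))
        # stores scores
        scores.append(acc)
        histories.append(history)
    return scores, histories

# plot diagnostic learning curves
def summarize_diagnostics(histories):
    for i in range(len(histories)):
        # plot loss
        plt.subplot(2, 1, 1)
        plt.title('Cross Entropy Loss')
        plt.plot(histories[i].history['loss'], color='blue', label='train')
        plt.plot(histories[i].history['val_loss'], color='orange', label='test')
        # plot accuracy
        plt.subplot(2, 1, 2)
        plt.title('Classification Accuracy')
        plt.plot(histories[i].history['accuracy'], color='blue', label='train')
        plt.plot(histories[i].history['val_accuracy'], color='orange', label='test')
    plt.show()

# summarize model performance
def summarize_performance(scores):
    # print summary
    print('Accuracy: mean=%.3f std=%.3f, n=%d' % (mean(scores)*100, std(scores)*100, len(scores)))
    # box and whisker plots of results
    plt.boxplot(scores)
    plt.show()

# run the test harness for evaluating a model
def run_test_harness():
    # load dataset
    trainX, trainY, testX, testY = load_dataset()
    # prepare pixel data
    trainX, testX = prep_pixels(trainX, testX)
    # evaluate model
    scores, histories = evaluate_model(trainX, trainY)
    # learning curves
    summarize_diagnostics(histories)
    # summarize estimated performance
    summarize_performance(scores)

# entry point, run the test harness
run_test_harness()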
Running the example prints the classification accuracy for each fold of the cross-validation process. This
helps confirm that the model evaluation is progressing.
Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or
differences in numerical precision. Consider running the example a few times and comparing the average
outcome.
In this case, the per-fold accuracies range from about 98.55% to 98.85%, so every fold scores above 98%.
These are good results.
> 98.550
> 98.600
> 98.642
> 98.850
> 98.742
Next, a diagnostic plot is shown, giving insight into the learning behavior of the model across each fold.
In this case, we can see that the model generally achieves a good fit, with train and test learning curves
converging. There is no obvious sign of over- or underfitting.
Loss and Accuracy Learning Curves for the Baseline Model During k-Fold Cross-Validation
We can see in this case that the model has an estimated skill of about 98.7% (a mean accuracy of 98.677%), which is reasonable.
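As a quick check, the mean can be recomputed from the five per-fold accuracies printed above. Note that these values are already percentages, whereas summarize_performance() receives fractions and multiplies by 100. Because the printed scores are rounded to three decimals, the standard deviation computed here is only approximate.

```python
from numpy import mean, std

# Per-fold accuracies (%) printed by the baseline run above
baseline_scores = [98.550, 98.600, 98.642, 98.850, 98.742]

# NumPy's std() defaults to the population standard deviation (ddof=0)
print('mean=%.3f std=%.3f' % (mean(baseline_scores), std(baseline_scores)))
```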
Finally, a box and whisker plot is created to summarize the distribution of accuracy scores.
Box and Whisker Plot of Accuracy Scores for the Baseline Model Evaluated Using k-Fold Cross-Validation
There are many ways that we might explore improvements to the baseline model.
We will look at areas of model configuration that often result in an improvement, so-called low-hanging
fruit. The first is a change to the learning algorithm, and the second is an increase in the depth of the
model.
Improvement to Learning
There are many aspects of the learning algorithm that can be explored for improvement.
Perhaps the point of biggest leverage is the learning rate, such as evaluating the impact that smaller or
larger values of the learning rate may have, as well as schedules that change the learning rate during
training.
Another approach that can rapidly accelerate the learning of a model and can result in large performance
improvements is batch normalization. We will evaluate the effect that batch normalization has on our
baseline model.
Batch normalization can be used after convolutional and fully connected layers. It has the effect of
changing the distribution of the output of the layer, specifically by standardizing the outputs. This has the
effect of stabilizing and accelerating the learning process.
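The standardizing step can be sketched in NumPy as a training-time forward pass. This is a simplified illustration and not the Keras layer itself. Each feature is shifted to zero mean and scaled to unit variance over the batch, then rescaled by a learnable gamma and shifted by beta. The eps value of 1e-3 mirrors the Keras BatchNormalization default. At inference time, Keras uses moving averages of the mean and variance instead of batch statistics, which this sketch omits.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-3):
    """Training-mode batch normalization for a (batch_size, num_features) array."""
    batch_mean = x.mean(axis=0)   # per-feature mean over the batch
    batch_var = x.var(axis=0)     # per-feature variance over the batch
    x_hat = (x - batch_mean) / np.sqrt(batch_var + eps)
    return gamma * x_hat + beta   # learnable scale and shift

rng = np.random.default_rng(0)
# Hypothetical layer outputs: 64 samples, 100 features, far from zero mean / unit variance
activations = rng.normal(loc=5.0, scale=3.0, size=(64, 100))
out = batch_norm_forward(activations, gamma=np.ones(100), beta=np.zeros(100))

# Each feature is now roughly zero-mean and unit-variance
print(out.mean(axis=0)[:3], out.std(axis=0)[:3])
```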
We can update the model definition to use batch normalization after the activation function for the
convolutional and dense layers of our baseline model. The updated version of define_model() function
with batch normalization is listed below.
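# define cnn model
def define_model():
    model = Sequential()
    model.add(Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_uniform', input_shape=(28, 28, 1)))
    model.add(BatchNormalization())
    model.add(MaxPooling2D((2, 2)))
    model.add(Flatten())
    model.add(Dense(100, activation='relu', kernel_initializer='he_uniform'))
    model.add(BatchNormalization())
    model.add(Dense(10, activation='softmax'))
    # compile model
    opt = SGD(learning_rate=0.01, momentum=0.9)
    model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])
    return model
The complete code listing, including this change, is provided below.
# cnn model with batch normalization for mnist
from numpy import mean, std
import matplotlib.pyplot as plt
from sklearn.model_selection import KFold
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dense, Flatten, BatchNormalization
from tensorflow.keras.optimizers import SGD

# load train and test dataset
def load_dataset():
    # load dataset
    (trainX, trainY), (testX, testY) = mnist.load_data()
    # reshape dataset to have a single channel
    trainX = trainX.reshape((trainX.shape[0], 28, 28, 1))
    testX = testX.reshape((testX.shape[0], 28, 28, 1))
    # one hot encode target values
    trainY = to_categorical(trainY)
    testY = to_categorical(testY)
    return trainX, trainY, testX, testY

# scale pixels
def prep_pixels(train, test):
    # convert from integers to floats
    train_norm = train.astype('float32')
    test_norm = test.astype('float32')
    # normalize to range 0-1
    train_norm = train_norm / 255.0
    test_norm = test_norm / 255.0
    # return normalized images
    return train_norm, test_norm

# define cnn model
def define_model():
    model = Sequential()
    model.add(Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_uniform', input_shape=(28, 28, 1)))
    model.add(BatchNormalization())
    model.add(MaxPooling2D((2, 2)))
    model.add(Flatten())
    model.add(Dense(100, activation='relu', kernel_initializer='he_uniform'))
    model.add(BatchNormalization())
    model.add(Dense(10, activation='softmax'))
    # compile model
    opt = SGD(learning_rate=0.01, momentum=0.9)
    model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])
    return model

# evaluate a model using k-fold cross-validation
def evaluate_model(dataX, dataY, n_folds=5):
    scores, histories = list(), list()
    # prepare cross validation
    kfold = KFold(n_folds, shuffle=True, random_state=1)
    # enumerate splits
    for train_ix, test_ix in kfold.split(dataX):
        # define model
        model = define_model()
        # select rows for train and test
        trainX, trainY, testX, testY = dataX[train_ix], dataY[train_ix], dataX[test_ix], dataY[test_ix]
        # fit model
        history = model.fit(trainX, trainY, epochs=10, batch_size=32, validation_data=(testX, testY), verbose=0)
        # evaluate model
        _, acc = model.evaluate(testX, testY, verbose=0)
        print('> %.3f' % (acc * 100.0))
        # stores scores
        scores.append(acc)
        histories.append(history)
    return scores, histories

# plot diagnostic learning curves
def summarize_diagnostics(histories):
    for i in range(len(histories)):
        # plot loss
        plt.subplot(2, 1, 1)
        plt.title('Cross Entropy Loss')
        plt.plot(histories[i].history['loss'], color='blue', label='train')
        plt.plot(histories[i].history['val_loss'], color='orange', label='test')
        # plot accuracy
        plt.subplot(2, 1, 2)
        plt.title('Classification Accuracy')
        plt.plot(histories[i].history['accuracy'], color='blue', label='train')
        plt.plot(histories[i].history['val_accuracy'], color='orange', label='test')
    plt.show()

# summarize model performance
def summarize_performance(scores):
    # print summary
    print('Accuracy: mean=%.3f std=%.3f, n=%d' % (mean(scores)*100, std(scores)*100, len(scores)))
    # box and whisker plots of results
    plt.boxplot(scores)
    plt.show()

# run the test harness for evaluating a model
def run_test_harness():
    # load dataset
    trainX, trainY, testX, testY = load_dataset()
    # prepare pixel data
    trainX, testX = prep_pixels(trainX, testX)
    # evaluate model
    scores, histories = evaluate_model(trainX, trainY)
    # learning curves
    summarize_diagnostics(histories)
    # summarize estimated performance
    summarize_performance(scores)

# entry point, run the test harness
run_test_harness()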
Running the example again reports model performance for each fold of the cross-validation process.
We can see perhaps a small drop in model performance as compared to the baseline across the cross-
validation folds.
> 98.475
> 98.608
> 98.683
> 98.783
> 98.667
A plot of the learning curves is created, in this case showing that the speed of learning (improvement over
epochs) does not appear to be different from the baseline model.
The plots suggest that batch normalization, at least as implemented in this case, does not offer any
benefit.
Loss and Accuracy Learning Curves for the BatchNormalization Model During k-Fold Cross-Validation
Next, the estimated performance of the model is presented, showing performance with a slight decrease in
the mean accuracy of the model: 98.643 as compared to 98.677 with the baseline model.
There are many ways to change the model configuration in order to explore improvements over the
baseline model.
Two common approaches involve changing the capacity of the feature extraction part of the model or
changing the capacity or function of the classifier part of the model. Perhaps the point of biggest
influence is a change to the feature extractor.
We can increase the depth of the feature extractor part of the model, following a VGG-like pattern of
adding more convolutional and pooling layers with the same sized filter, while increasing the number of
filters. In this case, we will add a double convolutional layer with 64 filters each, followed by another
max pooling layer.
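The following plain-Python calculation traces the feature-map sizes and trainable-parameter counts of the two architectures. It assumes Keras defaults: 3×3 'valid' convolutions with stride 1, and 2×2 max pooling with stride 2. It also assumes the baseline is the same network without the two 64-filter convolutions and the second pooling layer, as the paragraph above implies. The helper names are hypothetical. One consequence worth noting: the deeper model has fewer parameters than the baseline, because the extra pooling shrinks the flattened vector that feeds the 100-unit Dense layer.

```python
def conv_out(size, kernel=3):
    # 'valid' padding, stride 1: the output shrinks by kernel - 1
    return size - kernel + 1

def pool_out(size, pool=2):
    # 2x2 max pooling, stride 2, 'valid' padding: leftover rows/cols are dropped
    return size // pool

def conv_params(in_ch, filters, kernel=3):
    # one kernel x kernel x in_ch weight block plus one bias per filter
    return kernel * kernel * in_ch * filters + filters

def dense_params(n_in, n_out):
    return n_in * n_out + n_out

# Baseline: Conv(32) -> Pool -> Flatten -> Dense(100) -> Dense(10)
s = pool_out(conv_out(28))                                   # 28 -> 26 -> 13
baseline_flat = s * s * 32
baseline_total = conv_params(1, 32) + dense_params(baseline_flat, 100) + dense_params(100, 10)

# Deeper: Conv(32) -> Pool -> Conv(64) -> Conv(64) -> Pool -> Flatten -> Dense(100) -> Dense(10)
s = pool_out(conv_out(conv_out(pool_out(conv_out(28)))))     # 28 -> 26 -> 13 -> 11 -> 9 -> 4
deeper_flat = s * s * 64
deeper_total = (conv_params(1, 32) + conv_params(32, 64) + conv_params(64, 64)
                + dense_params(deeper_flat, 100) + dense_params(100, 10))

print(baseline_flat, baseline_total)   # 5408 542230
print(deeper_flat, deeper_total)       # 1024 159254
```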
The updated version of the define_model() function with this change is listed below.
# define cnn model
def define_model():
    model = Sequential()
    model.add(Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_uniform', input_shape=(28, 28, 1)))
    model.add(MaxPooling2D((2, 2)))
    model.add(Conv2D(64, (3, 3), activation='relu', kernel_initializer='he_uniform'))
    model.add(Conv2D(64, (3, 3), activation='relu', kernel_initializer='he_uniform'))
    model.add(MaxPooling2D((2, 2)))
    model.add(Flatten())
    model.add(Dense(100, activation='relu', kernel_initializer='he_uniform'))
    model.add(Dense(10, activation='softmax'))
    # compile model
    opt = SGD(learning_rate=0.01, momentum=0.9)
    model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])
    return model
For completeness, the entire code listing, including this change, is provided below.
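# deeper cnn model for mnist
from numpy import mean, std
import matplotlib.pyplot as plt
from sklearn.model_selection import KFold
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dense, Flatten
from tensorflow.keras.optimizers import SGD

# load train and test dataset
def load_dataset():
    # load dataset
    (trainX, trainY), (testX, testY) = mnist.load_data()
    # reshape dataset to have a single channel
    trainX = trainX.reshape((trainX.shape[0], 28, 28, 1))
    testX = testX.reshape((testX.shape[0], 28, 28, 1))
    # one hot encode target values
    trainY = to_categorical(trainY)
    testY = to_categorical(testY)
    return trainX, trainY, testX, testY

# scale pixels
def prep_pixels(train, test):
    # convert from integers to floats
    train_norm = train.astype('float32')
    test_norm = test.astype('float32')
    # normalize to range 0-1
    train_norm = train_norm / 255.0
    test_norm = test_norm / 255.0
    # return normalized images
    return train_norm, test_norm

# define cnn model
def define_model():
    model = Sequential()
    model.add(Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_uniform', input_shape=(28, 28, 1)))
    model.add(MaxPooling2D((2, 2)))
    model.add(Conv2D(64, (3, 3), activation='relu', kernel_initializer='he_uniform'))
    model.add(Conv2D(64, (3, 3), activation='relu', kernel_initializer='he_uniform'))
    model.add(MaxPooling2D((2, 2)))
    model.add(Flatten())
    model.add(Dense(100, activation='relu', kernel_initializer='he_uniform'))
    model.add(Dense(10, activation='softmax'))
    # compile model
    opt = SGD(learning_rate=0.01, momentum=0.9)
    model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])
    return model

# evaluate a model using k-fold cross-validation
def evaluate_model(dataX, dataY, n_folds=5):
    scores, histories = list(), list()
    # prepare cross validation
    kfold = KFold(n_folds, shuffle=True, random_state=1)
    # enumerate splits
    for train_ix, test_ix in kfold.split(dataX):
        # define model
        model = define_model()
        # select rows for train and test
        trainX, trainY, testX, testY = dataX[train_ix], dataY[train_ix], dataX[test_ix], dataY[test_ix]
        # fit model
        history = model.fit(trainX, trainY, epochs=10, batch_size=32, validation_data=(testX, testY), verbose=0)
        # evaluate model
        _, acc = model.evaluate(testX, testY, verbose=0)
        print('> %.3f' % (acc * 100.0))
        # stores scores
        scores.append(acc)
        histories.append(history)
    return scores, histories

# plot diagnostic learning curves
def summarize_diagnostics(histories):
    for i in range(len(histories)):
        # plot loss
        plt.subplot(2, 1, 1)
        plt.title('Cross Entropy Loss')
        plt.plot(histories[i].history['loss'], color='blue', label='train')
        plt.plot(histories[i].history['val_loss'], color='orange', label='test')
        # plot accuracy
        plt.subplot(2, 1, 2)
        plt.title('Classification Accuracy')
        plt.plot(histories[i].history['accuracy'], color='blue', label='train')
        plt.plot(histories[i].history['val_accuracy'], color='orange', label='test')
    plt.show()

# summarize model performance
def summarize_performance(scores):
    # print summary
    print('Accuracy: mean=%.3f std=%.3f, n=%d' % (mean(scores)*100, std(scores)*100, len(scores)))
    # box and whisker plots of results
    plt.boxplot(scores)
    plt.show()

# run the test harness for evaluating a model
def run_test_harness():
    # load dataset
    trainX, trainY, testX, testY = load_dataset()
    # prepare pixel data
    trainX, testX = prep_pixels(trainX, testX)
    # evaluate model
    scores, histories = evaluate_model(trainX, trainY)
    # learning curves
    summarize_diagnostics(histories)
    # summarize estimated performance
    summarize_performance(scores)

# entry point, run the test harness
run_test_harness()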
Running the example reports model performance for each fold of the cross-validation process.
The per-fold scores may suggest some improvement over the baseline.
> 99.058
> 99.042
> 98.883
> 99.192
> 99.133
A plot of the learning curves is created, in this case showing that the models still have a good fit on the
problem, with no clear signs of overfitting. The plots may even suggest that further training epochs could
be helpful.
Loss and Accuracy Learning Curves for the Deeper Model During k-Fold Cross-Validation
Next, the estimated performance of the model is presented, showing a small improvement in performance
as compared to the baseline from 98.677 to 99.062, with a small drop in the standard deviation as well.
The process of model improvement may continue for as long as we have ideas and the time and resources
to test them out.
At some point, a final model configuration must be chosen and adopted. In this case, we will choose the
deeper model as our final model.
First, we will finalize our model by fitting a model on the entire training dataset and saving the model to
file for later use. We will then load the model and evaluate its performance on the hold out test dataset to
get an idea of how well the chosen model actually performs in practice. Finally, we will use the saved
model to make a prediction on a single image.
A final model is typically fit on all available data, such as the combination of the train and test datasets.
In this tutorial, we are intentionally holding back a test dataset so that we can estimate the performance of
the final model, which can be a good idea in practice. As such, we will fit our model on the training
dataset only.
# fit model
model.fit(trainX, trainY, epochs=10, batch_size=32, verbose=0)
Once fit, we can save the final model to an H5 file by calling the save() function on the model and passing in
the chosen filename.
# save model
model.save('final_model.h5')
Note, saving and loading a Keras model requires that the h5py library is installed on your workstation.
The complete example of fitting the final deep model on the training dataset and saving it to file is listed
below.
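# save the final model to file
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dense, Flatten
from tensorflow.keras.optimizers import SGD

# load train and test dataset
def load_dataset():
    # load dataset
    (trainX, trainY), (testX, testY) = mnist.load_data()
    # reshape dataset to have a single channel
    trainX = trainX.reshape((trainX.shape[0], 28, 28, 1))
    testX = testX.reshape((testX.shape[0], 28, 28, 1))
    # one hot encode target values
    trainY = to_categorical(trainY)
    testY = to_categorical(testY)
    return trainX, trainY, testX, testY

# scale pixels
def prep_pixels(train, test):
    # convert from integers to floats
    train_norm = train.astype('float32')
    test_norm = test.astype('float32')
    # normalize to range 0-1
    train_norm = train_norm / 255.0
    test_norm = test_norm / 255.0
    return train_norm, test_norm

# define cnn model
def define_model():
    model = Sequential()
    model.add(Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_uniform', input_shape=(28, 28, 1)))
    model.add(MaxPooling2D((2, 2)))
    model.add(Conv2D(64, (3, 3), activation='relu', kernel_initializer='he_uniform'))
    model.add(Conv2D(64, (3, 3), activation='relu', kernel_initializer='he_uniform'))
    model.add(MaxPooling2D((2, 2)))
    model.add(Flatten())
    model.add(Dense(100, activation='relu', kernel_initializer='he_uniform'))
    model.add(Dense(10, activation='softmax'))
    # compile model
    opt = SGD(learning_rate=0.01, momentum=0.9)
    model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])
    return model

# run the test harness for evaluating a model
def run_test_harness():
    # load dataset
    trainX, trainY, testX, testY = load_dataset()
    # prepare pixel data
    trainX, testX = prep_pixels(trainX, testX)
    # define model
    model = define_model()
    # fit model
    model.fit(trainX, trainY, epochs=10, batch_size=32, verbose=0)
    # save model
    model.save('final_model.h5')

# entry point, run the test harness
run_test_harness()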
After running this example, you will now have a 1.2-megabyte file with the name ‘final_model.h5’ in
your current working directory.
Evaluate Final Model
We can now load the final model and evaluate it on the hold out test dataset.
This is something we might do if we were interested in presenting the performance of the chosen model
to project stakeholders.
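# evaluate the deep model on the test dataset
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import load_model
from tensorflow.keras.utils import to_categorical

# load train and test dataset, reshaped to one channel with one-hot labels
def load_dataset():
    (trainX, trainY), (testX, testY) = mnist.load_data()
    trainX = trainX.reshape((trainX.shape[0], 28, 28, 1))
    testX = testX.reshape((testX.shape[0], 28, 28, 1))
    return trainX, to_categorical(trainY), testX, to_categorical(testY)

# scale pixels from integers in [0, 255] to floats in [0, 1]
def prep_pixels(train, test):
    return train.astype('float32') / 255.0, test.astype('float32') / 255.0

# run the test harness for evaluating a model
def run_test_harness():
    trainX, trainY, testX, testY = load_dataset()
    trainX, testX = prep_pixels(trainX, testX)
    # load the saved model and evaluate it on the hold-out test dataset
    model = load_model('final_model.h5')
    _, acc = model.evaluate(testX, testY, verbose=0)
    print('> %.3f' % (acc * 100.0))

# entry point, run the test harness
run_test_harness()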
Running the example loads the saved model and evaluates the model on the hold out test dataset.
The classification accuracy for the model on the test dataset is calculated and printed. In this case, we can
see that the model achieved an accuracy of 99.090%, an error rate of just under 1%. This is not bad at all
and is reasonably close to the cross-validation estimate of 99.062%.
> 99.090
Make Prediction
The model assumes that new images are grayscale, that they have been aligned so that one image contains
one centered handwritten digit, and that the size of the image is square with the size 28×28 pixels.
Below is an image extracted from the MNIST test dataset. You can save it in your current working
directory with the filename ‘sample_image.png’.
Sample Handwritten Digit
We will pretend this is an entirely new and unseen image, prepared in the required way, and see how we
might use our saved model to predict the integer that the image represents (here we expect “7”).
First, we can load the image, force it to be in grayscale format, and force the size to be 28×28 pixels. The
loaded image can then be reshaped to have a single channel and represent a single sample in a dataset.
The load_image() function implements this and will return the loaded image ready for classification.
Importantly, the pixel values are prepared in the same way as the pixel values were prepared for the
training dataset when fitting the final model, in this case, normalized.
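from tensorflow.keras.utils import load_img, img_to_array

# load and prepare the image
def load_image(filename):
    # load as grayscale and force the size to 28x28 pixels
    img = load_img(filename, color_mode='grayscale', target_size=(28, 28))
    # convert to array and reshape into a single sample with 1 channel
    img = img_to_array(img).reshape(1, 28, 28, 1)
    # prepare pixel data exactly as for training: float32 in [0, 1]
    img = img.astype('float32') / 255.0
    return img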
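The array-preparation part of load_image() can be checked without an image file on disk. The sketch below applies the same reshape and scaling to a synthetic 28×28 uint8 array, a hypothetical stand-in for a loaded digit. It confirms that the result has the (samples, rows, cols, channels) shape and [0, 1] range the model was trained on. prepare_pixels is a hypothetical helper name, not part of the tutorial.

```python
import numpy as np

def prepare_pixels(gray_28x28):
    """Same steps as load_image() after loading: single-sample, single-channel, scaled."""
    img = np.asarray(gray_28x28).reshape(1, 28, 28, 1)  # (samples, rows, cols, channels)
    img = img.astype('float32')
    return img / 255.0

# Hypothetical input: black background with a bright vertical stroke
raw = np.zeros((28, 28), dtype=np.uint8)
raw[4:24, 13:15] = 255
x = prepare_pixels(raw)
print(x.shape, x.dtype, x.min(), x.max())
```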
Next, we can load the model as in the previous section and call the predict() function to get the predicted
score, and then use argmax() to obtain the digit that the image represents.
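# predict the class
predict_value = model.predict(img)
digit = argmax(predict_value)
The complete example is listed below.
# make a prediction for a new image
from numpy import argmax
from tensorflow.keras.utils import load_img, img_to_array
from tensorflow.keras.models import load_model

# load and prepare the image
def load_image(filename):
    img = load_img(filename, color_mode='grayscale', target_size=(28, 28))
    img = img_to_array(img).reshape(1, 28, 28, 1)
    img = img.astype('float32') / 255.0
    return img

# load an image and predict the class
def run_example():
    # load the image
    img = load_image('sample_image.png')
    # load model
    model = load_model('final_model.h5')
    # predict the class: index of the highest softmax score
    predict_value = model.predict(img)
    digit = argmax(predict_value)
    print(digit)

# entry point, run the example
run_example()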
Running the example first loads and prepares the image, loads the model, and then correctly predicts that
the loaded image represents the digit ‘7’.
Extensions
This section lists some ideas for extending the tutorial that you may wish to explore.
Tune Pixel Scaling. Explore how alternate pixel scaling methods impact model performance as
compared to the baseline model, including centering and standardization.
Tune the Learning Rate. Explore how different learning rates impact the model performance as
compared to the baseline model, such as 0.001 and 0.0001.
Tune Model Depth. Explore how adding more layers to the model impacts the model performance as
compared to the baseline model, such as another block of convolutional and pooling layers or another
dense layer in the classifier part of the model.
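For the pixel-scaling extension, here is a minimal NumPy sketch of the three options: normalization (the baseline), centering, and standardization. The random arrays are hypothetical stand-ins for trainX/testX after astype('float32'). One design point matters: centering and standardization use statistics computed on the training set only, and those same values are applied to the test set, so no information leaks from the test data.

```python
import numpy as np

def normalize(train, test):
    # Baseline: map [0, 255] to [0, 1]
    return train / 255.0, test / 255.0

def center(train, test):
    # Subtract the mean pixel value of the training set
    mu = train.mean()
    return train - mu, test - mu

def standardize(train, test):
    # Zero mean, unit variance, using training-set statistics only
    mu, sigma = train.mean(), train.std()
    return (train - mu) / sigma, (test - mu) / sigma

rng = np.random.default_rng(1)
# Hypothetical stand-ins for trainX / testX after astype('float32')
train = rng.integers(0, 256, size=(100, 28, 28, 1)).astype('float32')
test = rng.integers(0, 256, size=(20, 28, 28, 1)).astype('float32')

tr_n, te_n = normalize(train, test)
tr_c, te_c = center(train, test)
tr_s, te_s = standardize(train, test)
print(float(tr_n.max()), float(tr_c.mean()), float(tr_s.std()))
```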
Classification of Handwritten Digits Using CNN
Introduction
In this blog, we will understand how to create and train a simple Convolutional Neural Network (CNN) to classify handwritten digits.
Pre-requisite
Although each step will be thoroughly explained in this tutorial, it will certainly benefit someone
who already has some theoretical knowledge of the working of CNN. Also, some knowledge
For those of you new to this concept, CNN is a deep learning technique to classify the input
automatically (well, after you provide the right data). Over the years, CNNs have become well established
for classifying images in computer vision, and they are now being used in healthcare domains too.
This indicates that CNN is a reliable deep learning algorithm for an automated end-to-end
prediction. CNN essentially extracts ‘useful’ features from the given input automatically making it
A CNN model consists of three primary types of layers: convolutional layers, pooling layers, and fully
connected layers.
(1) Convolutional Layer: This layer extracts high-level input features from input data and passes
(2) Pooling Layer: It is used to reduce the dimensions of data by applying pooling on the feature
map to generate new feature maps with reduced dimensions. The pooling layer (PL) takes either the maximum or the average in
(3) Fully-Connected Layer: Finally, the task of classification is done by the FC layer. Probability
scores are calculated for each class label by a popular activation function called the softmax
function.
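A short NumPy sketch of the softmax function shows how the 10 raw scores from the last layer become class probabilities. The logits below are hypothetical. Subtracting the maximum before exponentiating is a standard numerical-stability trick that leaves the result unchanged.

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability; the output is mathematically identical
    z = logits - np.max(logits, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

# Hypothetical raw scores (logits) for one image over the 10 digit classes
logits = np.array([1.0, 0.5, -1.0, 0.0, 2.0, 0.1, -0.5, 4.0, 0.3, 1.5])
probs = softmax(logits)
print(probs.round(3))
print(np.argmax(probs))  # 7, the class with the largest logit
```

The probabilities are all positive and sum to 1 (up to floating-point rounding), and the largest logit always gets the largest probability.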
For more details, I highly recommend you check this awesome tutorial on Analytics Vidhya.
Dataset
The dataset that is being used here is the MNIST digits classification dataset. Keras is a deep
learning API written in Python and MNIST is a dataset provided by this API. This dataset consists of
60,000 training images and 10,000 testing images. It is a decent dataset for individuals who need to
When the Keras API is called, there are four values returned, namely x_train, y_train, x_test, and y_test.
The language used here is Python. I am going to use Google Colab for writing and executing the
Python code. You may choose a Jupyter Notebook as well. I chose Google Colab because it provides
easy access to notebooks anytime and anywhere. It is also possible to connect a Colab notebook to a
GitHub repository.
Also, the code used in this tutorial is available on this Github repository. So if you find yourself
stuck someplace, do check that repository. To keep this tutorial relevant for all, we will understand
2. After loading the necessary libraries, load the MNIST dataset as shown below:
from tensorflow import keras
(X_train, y_train), (X_test, y_test) = keras.datasets.mnist.load_data()
As we discussed previously, load_data() returns four values in the order mentioned above: X_train and
y_train are the training images and labels, and X_test and y_test are the test images and labels. To
get how a dataset is divided into training and test, check out the picture below which I used during a
Voilà! You just loaded your dataset and are ready to move to the next step, which is to process the
data.
a dataset that does not contain any null values, has all numeric data, and is scaled. So, here we will
perform some steps to ensure that our dataset is perfectly suitable for a CNN model to learn
from. From here onwards, until we create the CNN model, we will work only on the training dataset.
If you write X_train[0], you get the 0th image as a 2-dimensional 28×28 matrix with values between 0 and
255 (0 means black and 255 means white). Of course, the matrix alone does not tell us which
handwritten digit X_train[0] represents. To know this, write y_train[0] and you will get 5 as output.
This means that the 0th image of this training dataset represents the number 5.
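So, let’s scale the training and test datasets by dividing every pixel value by 255, which maps them to the range 0 to 1.
After scaling, we reshape each 28×28 image into a 28×28×1 array so that it has the single channel axis that the first Conv2D layer (input_shape=(28,28,1)) expects.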
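The two preprocessing steps can be sketched as one helper. preprocess is a hypothetical name, and the blog's own code for this step is not shown here. To keep the sketch self-contained, it runs on a synthetic uint8 array with MNIST's per-image shape instead of downloading the dataset.

```python
import numpy as np

def preprocess(images):
    """Scale 0-255 pixels to [0, 1] and add a trailing channel axis.

    images: array of shape (n, 28, 28), as returned by keras.datasets.mnist.load_data()
    """
    images = images.astype('float32') / 255.0
    return images.reshape(-1, 28, 28, 1)   # -1 lets NumPy infer n

# Synthetic stand-in with MNIST's dtype and image shape (hypothetical data)
rng = np.random.default_rng(0)
fake_train = rng.integers(0, 256, size=(6, 28, 28), dtype=np.uint8)
X_train = preprocess(fake_train)
print(X_train.shape)   # (6, 28, 28, 1)
```

With the real data, the same call would be applied to both sets: X_train = preprocess(X_train) and X_test = preprocess(X_test).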
Now that the dataset is looking good, it is high time that we create a Convolutional Neural Network.
Let’s create a CNN model using the TensorFlow library. The model is created as follows:
from tensorflow.keras import layers, models
convolutional_neural_network = models.Sequential([
layers.Conv2D(filters=25, kernel_size=(3, 3), activation='relu', input_shape=(28,28,1)),
layers.MaxPooling2D((2, 2)),
layers.Conv2D(filters=64, kernel_size=(3, 3), activation='relu'),
layers.MaxPooling2D((2, 2)),
layers.Conv2D(filters=64, kernel_size=(3, 3), activation='relu'),
layers.MaxPooling2D((2, 2)),
layers.Flatten(),
layers.Dense(64, activation='relu'),
layers.Dense(10, activation='softmax')
])
Take some time to let this entire code sink in. It is important that you understand every bit of it. The
CNN model created above stacks three convolution and max-pooling blocks, flattens their output, and
passes it through one hidden Dense layer of 64 neurons to an output layer of 10 neurons, one per digit.
In the simplest terms, an activation function decides how strongly each neuron passes its signal on to the
next layer, and it introduces the non-linearity that lets the network learn complex patterns. If you do not
understand much about activation functions, use ‘relu’ for the hidden layers, as it is the most popular
choice. The output layer uses softmax to turn its 10 scores into probabilities.
Once the model has been created, it is time to compile and fit it. During fitting, the model goes through
the training dataset and learns the relationships between the images and their labels. Each full pass over
the training data is one epoch, and the model repeats this as many times as we define. In our example,
we have defined 10 epochs. During the process, the CNN model will learn and also make mistakes. For
every mistake (i.e., wrong prediction) the model makes, there is a penalty, and that penalty is reported as
the loss value for each epoch. In short, the model should generate as little loss and as high accuracy as
possible.
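The penalty described above is the loss function. The blog's compile() call is not shown here, so the following is an assumption. Because y_train holds integer labels 0 to 9 rather than one-hot vectors, a natural choice is sparse categorical cross-entropy. This NumPy sketch, with hypothetical 3-class predictions, shows how that loss punishes a confident wrong prediction far more than a confident correct one.

```python
import numpy as np

def sparse_categorical_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -log(probability assigned to the correct class).

    y_true: integer labels, shape (n,)
    y_prob: softmax outputs, shape (n, num_classes)
    """
    p_correct = y_prob[np.arange(len(y_true)), y_true]
    return float(np.mean(-np.log(np.clip(p_correct, eps, 1.0))))

# Two hypothetical 3-class predictions; the true labels are 0 and 2
y_true = np.array([0, 2])
confident_right = np.array([[0.9, 0.05, 0.05], [0.1, 0.1, 0.8]])
confident_wrong = np.array([[0.1, 0.8, 0.1], [0.7, 0.2, 0.1]])

loss_right = sparse_categorical_crossentropy(y_true, confident_right)  # about 0.164
loss_wrong = sparse_categorical_crossentropy(y_true, confident_wrong)  # about 2.303
print(loss_right, loss_wrong)
```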
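Making Predictions
It is time to use our test dataset to see how well the CNN model will perform. The evaluate() method
reports the loss and accuracy on the test set, and predict() returns the model's output for every test image.
convolutional_neural_network.evaluate(X_test, y_test)
y_predicted_by_model = convolutional_neural_network.predict(X_test)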
The above code uses the convolutional_neural_network model to make predictions for the test
dataset and stores them in the y_predicted_by_model NumPy array. For each of the 10 possible digits, a
probability score is calculated, and the class with the highest probability score is the prediction
made by the model. For example, to see the scores for the first image in the test set:
y_predicted_by_model[0]
Since it is difficult to spot the class label with the highest probability score by eye, let’s use np.argmax() to pick it out:
np.argmax(y_predicted_by_model[0])
And with this, you will get one of the ten digits as output (0 to 9).
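The same idea extends to the whole test set. Taking argmax along the class axis gives one predicted digit per image, and comparing those digits with y_test gives the accuracy. The probability array and labels below are hypothetical values chosen so the result is easy to check by hand.

```python
import numpy as np

# Hypothetical softmax outputs for 4 test images: each row puts 0.82 on one digit
y_predicted_by_model = np.full((4, 10), 0.02)
for row, digit in enumerate([3, 7, 0, 5]):
    y_predicted_by_model[row, digit] = 0.82
y_test = np.array([3, 7, 0, 1])   # hypothetical true labels

# One predicted digit per row: index of the largest probability along the class axis
predicted_digits = np.argmax(y_predicted_by_model, axis=1)
accuracy = np.mean(predicted_digits == y_test)
print(predicted_digits, accuracy)   # [3 7 0 5] 0.75
```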
Conclusion
In this blog, we began by discussing the Convolutional Neural Network and its importance. The
tutorial also covered how a dataset is divided into training and test datasets. As an example, a popular
dataset called MNIST was taken to make predictions of handwritten digits from 0 to 9. The dataset
was cleaned, scaled, and reshaped. Using TensorFlow, a CNN model was created and was eventually
trained on the training dataset. Finally, predictions were made using the trained model.
Experiment 6:
operations like conv2d to convolve learned filters (kernels) with input images. These filters assign
weights and biases to different aspects of the image, aiding in feature extraction. During training,
batches of labeled images are fed into the network. We take the argmax of the network's output to find
the class with the highest probability and compare it with the ground-truth label. We apply batch
normalization to stabilize and speed up learning by normalizing each layer's inputs over every mini-batch.
The network parameters are adjusted iteratively to minimize the loss, which measures the difference
between predictions and labels. This process repeats
This tutorial aims to create a system capable of recognizing cat and dog images. It analyzes input
images of cats and images of dogs to make predictions. The implemented model is adaptable for
websites or mobile devices. The Dogs vs Cats dataset, available on Kaggle, comprises images for the
model to learn distinctive features. After training, the classification model distinguishes between cat
TensorFlow Keras layers: every neural network is built from layers, and a CNN needs a few specific ones.
import pandas as pd
import numpy as np
import os
import matplotlib.pyplot as plt
from os import listdir
from sklearn import metrics
from keras.models import Sequential
from keras.layers import Convolution2D
from keras.layers import MaxPooling2D
from keras.layers import Dense
from keras.layers import Flatten
A CNN processes images with the help of matrices of weights known as filters. They
detect low-level features like vertical and horizontal edges etc. Through each layer, the filters
Adaptive Moment Estimation (Adam) is a method used for computing individual learning rates for
each parameter. For the loss function, we use binary cross-entropy, which compares each predicted
probability with the actual class label (0 or 1). Then it calculates the penalization score based on the total
resulting in multiple transformed copies of the same image. The images are different from each other
in certain aspects because of shifting, rotating, and flipping. So, we are using the Keras
We need a way to turn our images into batches of data arrays in memory so that they can be fed to
the network during training. ImageDataGenerator can readily be used for this purpose. So, we import
this class and create an instance of the generator. We are using Keras to retrieve images from the
Convolution
Convolution involves linearly multiplying weights with the input. This multiplication occurs
between an array of input data and a 2D array of weights called a filter or kernel. The filter is
always smaller than the input data. It slides across the input, and at each position a dot product is
taken between the filter and the patch of input it currently covers.
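Here is a small NumPy sketch of that sliding dot product, using stride 1 and no padding. Deep learning libraries call this operation convolution, although strictly speaking it is cross-correlation, because the kernel is not flipped. The 5×5 image containing a vertical edge and the hand-made edge filter are hypothetical. A trained network learns its own filter values, which often end up as edge detectors like this one.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding), taking a dot product at each position."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)   # dot product of the patch and the filter
    return out

# Hypothetical 5x5 image: dark on the left, bright on the right (a vertical edge)
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)
# Hand-made vertical-edge filter
kernel = np.array([[-1, 0, 1]] * 3, dtype=float)

edge_response = conv2d_valid(image, kernel)
print(edge_response)   # strong response (3) where the window straddles the edge, 0 on flat bright area
```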
Activation
We add the activation function to assist the Artificial Neural Network (ANN) in learning complex
patterns within the data. The primary purpose of the activation function is to introduce non-linearity
The pooling operation provides spatial invariance, making the system capable of recognizing an object
even when its appearance varies somewhat. It involves sliding a 2D filter over each channel of the feature
map and summarizing the features within the region covered by the filter.
So, pooling basically helps reduce the number of parameters and computations in the network. It
progressively reduces the spatial size of the representation and thus helps control overfitting. There
are two types of operations in this layer: average pooling and max pooling. Here, we are using
max pooling, which, as its name suggests, takes only the maximum from each pool. As the filter slides
through the input, the maximum value in each window is kept and the rest are dropped.
Unlike the convolution layer, the pooling layer does not change the depth (number of channels) of its input.
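This NumPy sketch implements 2×2 max pooling with stride 2, the behavior of MaxPooling2D((2, 2)) with its default settings, on one channel. In a real layer the same operation runs on every channel independently, which is why the depth is unchanged. The 4×4 feature map is hypothetical.

```python
import numpy as np

def max_pool2d(feature_map, pool=2):
    """2x2 max pooling, stride 2; rows/cols that do not fill a whole window are dropped."""
    h, w = feature_map.shape
    h2, w2 = h // pool, w // pool
    trimmed = feature_map[:h2 * pool, :w2 * pool]
    # Group into (h2, pool, w2, pool) blocks and keep the maximum of each block
    return trimmed.reshape(h2, pool, w2, pool).max(axis=(1, 3))

fm = np.array([[1, 3, 2, 0],
               [4, 2, 1, 1],
               [0, 1, 5, 6],
               [2, 2, 7, 8]])
print(max_pool2d(fm))
# [[4 2]
#  [2 8]]
```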
Fully Connected
The fully connected layer receives the flattened output from the final pooling layer.
The neurons in the fully connected layer each detect a certain feature and preserve its value, then
communicate that value to both the dog and cat classes, which check the feature and decide if
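We are fitting our model to the training set. It will take some time for this to finish. In current Keras
versions, fit_generator() and the samples_per_epoch, nb_epoch, and nb_val_samples arguments are no longer
available, so the generator is passed directly to fit(). By default, fit() makes one pass over all training
images per epoch, which matches the original samples_per_epoch=8000 if the training folder holds 8,000 images.
classifier.fit(training_set, epochs=25, validation_data=test_set)
It is seen that we have an accuracy of 0.8115 (81.15%) on our training set.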
We can predict new images with our model using the predict_image function, where we provide the
path of the new image as the image path and call the predict method. If the probability is more than 0.5 then
Features Provided
We can test our own images and verify the accuracy of the model.
We can integrate the code directly into our other project and extend it into a website or
We can extend the project to different entities by just finding a suitable dataset, changing the
Conclusion
In this exhilarating journey through the realm of image classification, we delved into the marvels of
Convolutional Neural Networks (CNN). From discerning between cats and dogs to installing
essential Python packages, we’ve left no stone unturned. This beginner-friendly project provides
invaluable insights and sets the stage for exploring diverse applications. With a solid understanding
of CNN fundamentals, you’re now ready to embark on your own image classification escapades!
Don’t forget to leverage techniques like softmax activation and model.predict to further enhance
your models, and monitor key metrics like validation loss (val_loss) to assess model
performance accurately.
Key Takeaways
CNNs are essential deep learning models for image classification, capable of automatically
Preprocessing and augmenting image data are crucial steps in CNN training, enhancing
Practical applications of CNNs extend beyond cat and dog classification, encompassing
various domains like medical imaging, object detection, and natural language processing.
A. Adam is popular in deep learning due to its adaptive learning rate and momentum features,
A. Cat and Dog Classification using CNN involves training a convolutional neural network on
labeled cat and dog image data to differentiate between the two classes.
Q4. Do you have any tutorial that I can follow step by step to generate the Class activation
map?
A. Generating Class Activation Maps involves visualizing which parts of an image are important for
classification, often done by appending a global average pooling layer and visualizing activations.
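The core computation of a class activation map can be sketched in NumPy. It assumes a network whose last convolutional layer feeds a GlobalAveragePooling2D layer and then a single Dense output layer, as the answer above describes. The CAM for a class is the sum of the final feature maps, each weighted by that channel's Dense weight for the class, rescaled here to [0, 1] for display. The feature maps and weights below are hypothetical. With a real Keras model, they would be the last conv layer's output for one image and the final Dense layer's kernel.

```python
import numpy as np

def class_activation_map(feature_maps, dense_weights, class_idx):
    """CAM for a network ending in GlobalAveragePooling2D -> Dense.

    feature_maps: last conv layer output for one image, shape (H, W, C)
    dense_weights: final Dense kernel, shape (C, num_classes)
    """
    weights = dense_weights[:, class_idx]                         # (C,)
    cam = np.tensordot(feature_maps, weights, axes=([2], [0]))    # weighted sum over channels -> (H, W)
    cam = cam - cam.min()
    if cam.max() > 0:
        cam = cam / cam.max()                                     # rescale to [0, 1] for display
    return cam

# Hypothetical 2x2 feature maps with 2 channels
fmap = np.zeros((2, 2, 2))
fmap[0, 0, 0] = 1.0   # channel 0 fires at the top-left
fmap[1, 1, 1] = 1.0   # channel 1 fires at the bottom-right
W = np.array([[0.9, 0.1],    # channel 0 is mostly evidence for class 0
              [0.2, 0.8]])   # channel 1 is mostly evidence for class 1

cam = class_activation_map(fmap, W, class_idx=1)
print(cam)   # brightest at the bottom-right, where class-1 evidence is
```

In practice the small CAM is upsampled to the input image size and overlaid as a heatmap.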
Q5. How would I predict the images in the test1 data set?
A. To predict images in the test1 dataset, use a trained model on test data, typically resizing images
to match training image size, then generating predictions, often with libraries like PyTorch. Detailed