CNN with TensorFlow and Keras
We’ll first look at the concept of a classifier, then at CNNs themselves and their components. We then continue with a real Keras / Python implementation for classifying handwritten digits using the MNIST dataset.
What is a classifier?
Suppose that your job is to separate non-ripe tomatoes from ripe ones. It’s an important job, one can argue, because we don’t want to sell customers tomatoes they can’t process into dinner. It’s the perfect job to illustrate what a human classifier would do.
Humans have a keen eye for spotting tomatoes that are not ripe or that have some other defect, such as being rotten. They derive certain characteristics for those tomatoes, e.g. based on color, smell and shape: the color is off (e.g. still green rather than red), the smell is off (e.g. the tomato smells rotten), or the shape is off (e.g. the tomato is deformed or damaged).
If none of those occur, it’s likely that the tomato can be sold.
We now have two classes: sellable tomatoes and non-sellable tomatoes.
Human classifiers decide which class an object (a tomato) belongs to.
The same principle recurs in machine learning and deep learning, except that we replace the human with a machine learning model. We’re then using machine learning for classification: deciding which class some “model input” belongs to.
Especially when we deal with image-like inputs, Convolutional Neural Networks
can be very good at doing that.
Suppose that you have an image. In the case of the humans classifying tomatoes above, this would be the continuous stream of image-like data that is perceived with our eyes and processed by our brain. In the case of artificial classification with machine learning models, it would likely be input generated by a camera such as a webcam.
You wish to detect certain characteristics of the object in order to classify it. This means that you’ll have to make a summary of those characteristics that gets more abstract over time. For example, with the tomatoes above, humans translate their continuous stream of observation into a fixed set of intuitive rules about when to classify a tomato as non-sellable; i.e., the three rules specified above.
Please note that we left out common layers like max pooling and batch normalization from the description above.
This is what a CNN does: it summarizes the input into increasingly abstract characteristics and eventually assigns it to a class. Just like humans do. We’ll show it next with a simple and clear example using the MNIST dataset in Keras.
Dataset
For the model that we’ll create, we’re going to use the MNIST dataset. The dataset, short for the Modified National Institute of Standards and Technology database, contains tens of thousands of 28×28 pixel images of handwritten digits.
It’s a very fine dataset for practicing with CNNs in Keras, since the dataset is already pretty normalized, there is not much noise, and the digits are relatively easy to tell apart. Additionally, much data is available.
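If you’d like to inspect a few samples yourself, here is a minimal sketch that uses Matplotlib (one of the dependencies listed below) to plot the first six digits with their labels:
import matplotlib.pyplot as plt
from tensorflow.keras.datasets import mnist

# Load the dataset and plot the first six digits with their labels
(input_train, target_train), _ = mnist.load_data()
fig, axes = plt.subplots(1, 6, figsize=(10, 2))
for ax, image, label in zip(axes, input_train, target_train):
    ax.imshow(image, cmap='gray')
    ax.set_title(label)
    ax.axis('off')
plt.show()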
We always start with listing the dependencies that you’ll need to install before you can run the model on your machine:
Python, TensorFlow, NumPy and Matplotlib.
Model dependencies
Here, we’ll first import the dependencies that we require later on:
import tensorflow
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Flatten
from tensorflow.keras.layers import Conv2D, MaxPooling2D
Obviously, we need Keras, since it’s the framework we’re working with; it comes bundled with TensorFlow, which we import. We import the mnist dataset and benefit from the fact that it comes with Keras by default.
With respect to the layers, we will primarily use the Conv2D and Dense layers; we would say that these constitute the core of your deep learning model. The Conv2D layers provide the magnifier operation in two dimensions: each slides a small 2D box over a larger 2D box, being the image. It goes without saying that one can also apply 3D convolutional layers (for analyzing videos, with small boxes sliding over a larger box) and 1D convolutional layers (for analyzing e.g. timeseries, with ‘pixels’ / points sliding over a line). We use the Dense layers later on for generating predictions (classifications), as that’s the structure used for doing so.
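To make the dimensionality difference concrete, here is a small sketch; the filter counts and kernel sizes are arbitrary illustrations, not values prescribed by this tutorial:
from tensorflow.keras.layers import Conv1D, Conv2D, Conv3D

# 1D: slides along a line, e.g. a timeseries of 100 steps with 1 feature each
conv1d = Conv1D(16, kernel_size=3, input_shape=(100, 1))

# 2D: slides a small box over an image, e.g. a 28x28 grayscale image
conv2d = Conv2D(16, kernel_size=(3, 3), input_shape=(28, 28, 1))

# 3D: slides a small cube over a volume, e.g. 16 video frames of 28x28 pixels
conv3d = Conv3D(16, kernel_size=(3, 3, 3), input_shape=(16, 28, 28, 1))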
However, we’ll also use Dropout, Flatten and MaxPooling2D. A max pooling layer is often added after a Conv2D layer and provides a magnifier operation as well, although a different one. In the 2D case, it also slides a box over the image (or, in that case, over the ‘convolutional maps’ generated by the first convolutional layer, i.e. the summarized image) and for every slide picks the maximum value for further propagation. In short, it generates an even stronger summary and can be used to induce sparsity when data is large.
Flatten connects the convolutional part of the network with the Dense part. Those latter layers can only handle flat, i.e. one-dimensional, data, but convolutional outputs are anything but one-dimensional. Flatten simply takes all dimensions and concatenates them after each other.
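As a quick illustration of what Flatten does, assuming one sample of 32 convolutional maps of 13×13 pixels (illustrative numbers, not taken from the model below):
import tensorflow as tf
from tensorflow.keras.layers import Flatten

# One sample consisting of 32 'convolutional maps' of 13x13 pixels each...
maps = tf.zeros((1, 13, 13, 32))
# ...becomes a single flat vector of 13 * 13 * 32 = 5408 values
print(Flatten()(maps).shape)  # (1, 5408)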
With Dropout, we’re essentially breaking tiny bits of the magnifier directly in front
of it. This way, a little bit of noise is introduced into the summary during training.
Since we’re breaking the magnifiers randomly, the noise is somewhat random as
well and hence cannot be predicted in advance. Perhaps counterintuitively, it tends
to improve model performance and reduce overfitting: the variance between training
images increases without becoming too large. This way, a ‘weird’ slice of e.g. a
tomato can perhaps still be classified correctly.
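A minimal sketch of this behaviour, assuming a toy input of ones:
import tensorflow as tf
from tensorflow.keras.layers import Dropout

layer = Dropout(0.25)
x = tf.ones((1, 8))
# With training=True, roughly 25% of the values are zeroed out; the remaining
# values are scaled by 1 / (1 - 0.25) to preserve the expected sum
print(layer(x, training=True))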
Model configuration
Since the MNIST images are 28×28 pixels, we define img_width and img_height to
be 28. We use a batch size of 250 samples, which means that 250 samples are fed
forward every time before a model improvement is calculated. We’ll do 25 epochs,
or passing all data 25 times (in batches of 250 samples, many batches per epoch),
and have 10 classes: the numbers 0-9. We also use 20% of the training data, or 0.2,
for validation during optimization. Finally, we wish to see as much output as possible, and thus configure the training process to be verbose.
In code, the configuration looks as follows:
# Model configuration
img_width, img_height = 28, 28
batch_size = 250
no_epochs = 25
no_classes = 10
validation_split = 0.2
verbosity = 1
We next load and prepare the data:
# Load MNIST data
(input_train, target_train), (input_test, target_test) = mnist.load_data()

# Reshape data
input_train = input_train.reshape(input_train.shape[0], img_width, img_height, 1)
input_test = input_test.reshape(input_test.shape[0], img_width, img_height, 1)
input_shape = (img_width, img_height, 1)

# Parse numbers as floats
input_train = input_train.astype('float32')
input_test = input_test.astype('float32')
We first reshape our input data (the feature vectors). As you can see from input_shape, this is the way your data must be structured to be handled correctly by the framework.
We then parse the numbers as floats, specifically 32-bit floats. This optimizes the trade-off between memory and numeric precision compared to e.g. integers or 64-bit floats.
We next convert our target vectors, which are integers (0-9), into categorical data:
# Convert target vectors to categorical targets
target_train = tensorflow.keras.utils.to_categorical(target_train, no_classes)
target_test = tensorflow.keras.utils.to_categorical(target_test, no_classes)
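What to_categorical produces is a one-hot encoding. For instance:
from tensorflow.keras.utils import to_categorical

# The digit 3 becomes a vector with a 1 at index 3 and 0s elsewhere
print(to_categorical([3], 10))
# [[0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]]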
Model architecture
We can then create the model. Note that the filter counts and kernel sizes below are common choices for MNIST rather than the only possible ones:
# Create the model
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dense(no_classes, activation='softmax'))
We first define the model itself to use the Sequential API: a stack of layers that together compose the Convolutional Neural Network. The Conv2D layers summarize the image, and each MaxPooling2D layer summarizes the resulting convolutional maps even further. Before repeating the convolutional layers, and again before flattening, we add Dropout. Dropout, as said, essentially breaks the magnifiers, so that a little bit of random noise is introduced during training. This greatly reduces the odds of overfitting. It does so by randomly converting certain inputs to 0. The parameter 0.25 is the dropout rate, or the fraction of input neurons to drop (in this case, 25% of the inputs is converted to 0).
It’s then likely that the summary is general enough to compare new images and assign them one of the classes 0-9. We must, however, convert the many filters learnt and processed into a flat structure before they can be processed by the part that actually generates the predictions. Hence, we use the Flatten layer. Subsequently, we let the data pass through two Dense layers, of which the first is ReLU-activated and the second one is Softmax-activated. Softmax activation essentially generates a multiclass probability distribution: for each of the classes 0-9 it computes the probability that the item belongs to that class, with all probabilities summing to 1. This is also why we must have categorical targets: it’s going to be difficult to add an 11th class on the fly.
Note that the number of output neurons in the final layer is no_classes for the same reason: since no_classes probabilities must be computed, we must have no_classes different outputs, so that a unique output exists for every class.
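For intuition, here is how a softmax turns arbitrary scores into a probability distribution; the scores below are made up for illustration:
import numpy as np

scores = np.array([2.0, 1.0, 0.1])  # made-up class scores
probs = np.exp(scores) / np.sum(np.exp(scores))
print(probs)        # approximately [0.659 0.242 0.099]
print(probs.sum())  # 1.0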
We then compile the model and start the training by fitting the data:
# Compile the model
model.compile(loss=tensorflow.keras.losses.categorical_crossentropy,
              optimizer=tensorflow.keras.optimizers.Adam(),
              metrics=['accuracy'])
We next fit the data to the model, or, in plain English, start the training process. We do so by feeding the training data (both inputs and targets) and specifying the batch size, number of epochs, verbosity and validation split configured before.
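Using the settings configured earlier, the fit call looks like this:
# Fit data to model
model.fit(input_train, target_train,
          batch_size=batch_size,
          epochs=no_epochs,
          verbose=verbosity,
          validation_split=validation_split)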
Here you’ll need to add metrics for testing as well. After training with the training
and validation data, which essentially tells you something about the model’s
predictive performance, you also wish to test it for generalization – or, whether it
works well when data is used that the model has never seen before. That’s why you
created the train / test split in the first place. Now is the time to add a test, or an
evaluation step, to the model – which executes just after the training process ends:
# Generate generalization metrics
score = model.evaluate(input_test, target_test, verbose=0)
print(f'Test loss: {score[0]} / Test accuracy: {score[1]}')
Final model
# Model configuration
img_width, img_height = 28, 28
batch_size = 250
no_epochs = 25
no_classes = 10
validation_split = 0.2
verbosity = 1
# Load MNIST data
(input_train, target_train), (input_test, target_test) = mnist.load_data()

# Reshape data
input_train = input_train.reshape(input_train.shape[0], img_width, img_height, 1)
input_test = input_test.reshape(input_test.shape[0], img_width, img_height, 1)
input_shape = (img_width, img_height, 1)
# Parse numbers as floats
input_train = input_train.astype('float32')
input_test = input_test.astype('float32')

# Normalize data
input_train = input_train / 255
input_test = input_test / 255
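The remainder of the listing repeats the code from the sections above; as noted there, the convolutional layer settings are common choices rather than the only possible ones:
# Convert target vectors to categorical targets
target_train = tensorflow.keras.utils.to_categorical(target_train, no_classes)
target_test = tensorflow.keras.utils.to_categorical(target_test, no_classes)

# Create the model
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dense(no_classes, activation='softmax'))

# Compile the model
model.compile(loss=tensorflow.keras.losses.categorical_crossentropy,
              optimizer=tensorflow.keras.optimizers.Adam(),
              metrics=['accuracy'])

# Fit data to model
model.fit(input_train, target_train,
          batch_size=batch_size,
          epochs=no_epochs,
          verbose=verbosity,
          validation_split=validation_split)

# Generate generalization metrics
score = model.evaluate(input_test, target_test, verbose=0)
print(f'Test loss: {score[0]} / Test accuracy: {score[1]}')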
This is a complete Keras model that can now be run to measure its performance and to see whether it works.
Open your terminal, preferably in an Anaconda environment, ensure that all the necessary dependencies are installed and in working order, and run the Python file containing the model. You should see Keras starting up, running the training process in TensorFlow, and displaying the results per epoch.