Module 4
Model Building Using Keras
• Keras prioritizes developer experience
• Keras is broadly adopted in the industry and among the research community
• Keras supports multiple backend engines and does not lock you into one ecosystem
• Keras development is backed by key companies in the Deep Learning ecosystem
Keras is a high-level neural networks API. It is written in Python and can run on top of Theano, TensorFlow, or CNTK. It is designed to be modular, fast, and easy to use
‘Being able to go from idea to result with the least possible delay is key to doing good research’
• So, Keras is a high-level API wrapper over these low-level backend APIs
You can create two types of models available in Keras, i.e., the Sequential model and the Functional model
▪ You can create a Sequential model by passing a list of layer instances to the constructor, as in the sketch below
▪ Stacking convolutional layers one above the other is an example of a Sequential model
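A minimal sketch of this (the layer sizes are illustrative, not from the slide):

from keras.models import Sequential
from keras.layers import Dense

# a Sequential model built by passing a list of layer instances to the constructor
model = Sequential([
    Dense(32, input_shape=(784,), activation='relu'),
    Dense(10, activation='softmax'),
])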
The Keras functional API is used for defining complex models, such as multi-output models, directed acyclic graphs, or models with shared layers
Models are defined by creating instances of layers and connecting them directly to each other in pairs, then specifying the layers that act as the input and the output of the model. The steps are as follows:
• Defining the Input: unlike the Sequential model, you must create a standalone Input layer that specifies the shape of the input data. In the case of one-dimensional input data, such as for a multilayer perceptron, the shape must explicitly leave room for the shape of the mini-batch size used when splitting the data:

from keras.layers import Input
visible = Input(shape=(2,))

• Connecting Layers: a bracket notation is used to specify the layer from which the current layer receives its input, after the layer is created
• Example:

from keras.layers import Input
from keras.layers import Dense
visible = Input(shape=(2,))
hidden = Dense(2)(visible)
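To round this off, a minimal sketch of wrapping the connected layers into a trainable model; the output layer and the Model call below are standard functional-API usage, not taken from the slide:

from keras.layers import Input, Dense
from keras.models import Model

# connect layers pairwise with the bracket notation
visible = Input(shape=(2,))
hidden = Dense(2)(visible)
output = Dense(1, activation='sigmoid')(hidden)  # illustrative output layer

# specify which layers act as the model's input and output
model = Model(inputs=visible, outputs=output)
model.summary()  # prints the resulting layer graph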
• As we move toward the right in this graph, our model tries to learn the details and the noise from the training data too well, which results in poor performance on unseen data
• In other words, while going toward the right, the complexity of the model increases such that the training error reduces but the testing error doesn’t. This is known as overfitting
• Have you come across a situation where your model performed exceptionally well on train data but was not able to predict test data?
• Or, were you ever on the top of a competition in public leaderboard only to fall hundreds of places in the final ranking?
• Do you know how complex neural networks are and how that makes them prone to overfitting? This is one of the most common problems Data Science practitioners face
Regularization is a technique which makes slight modifications to the learning algorithm such that the model generalizes better. This in turn improves the model’s performance on unseen data
Let’s consider a neural network which is overfitting on the training data as shown in the above image
Assume that our regularization coefficient is so high that some of the weight matrices are nearly equal to zero
This will result in a much simpler linear network and slight underfitting of the training data
• We need to optimize the value of the regularization coefficient in order to obtain a well-fitted model as shown in the image below
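In Keras, this coefficient is set per layer; a minimal sketch, assuming L2 weight regularization via kernel_regularizer (the 0.01 coefficient and layer size are illustrative):

from keras import regularizers
from keras.layers import Dense

# kernel_regularizer adds an L2 penalty on this layer's weight matrix;
# a very large coefficient would push the weights toward zero (underfitting)
layer = Dense(64, activation='relu',
              kernel_regularizer=regularizers.l2(0.01))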
• Dropout produces very good results and is consequently the most frequently used Regularization technique in the field of Deep Learning
• Let’s say our neural network structure is akin to the one shown below:
• At every iteration, it randomly selects some nodes and removes them, along with all of their incoming and outgoing connections as shown below:
• So, each iteration has a different set of nodes, and this results in a different set of outputs. It can also be thought of as an ensemble technique in Machine Learning
• Ensemble models usually perform better than a single model as they capture more randomness. Similarly, dropout also performs better than a normal neural network model
• The probability with which nodes are dropped is the hyperparameter of the dropout function. As seen in the image below, dropout can be applied to both the Hidden Layers as well as the Input Layers
• Due to these reasons, dropout is usually preferred when we have a large neural network structure in order to introduce more randomness
from keras.models import Sequential
from keras.layers import Dense, Dropout
model = Sequential([
    Dense(hidden1_num_units, input_dim=input_num_units, activation='relu'),
    Dropout(0.25),  # drop 25% of this layer's units at each training update
])
• The simplest way to reduce overfitting is to increase the size of the training data
• There are a few ways of increasing the size of the training data—rotating the image, flipping, scaling, shifting, etc.
• In the below image, some transformation has been done on the handwritten digits dataset
• This usually provides a big leap in improving the accuracy of the model
• In Keras, we can perform all of these transformations using ImageDataGenerator. It has a big list of arguments which you can use to pre-process your training data
• Example:
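A minimal sketch, assuming the training images sit in a NumPy array (X_train here is an illustrative name):

from keras.preprocessing.image import ImageDataGenerator

# randomly flip images horizontally while generating training batches
datagen = ImageDataGenerator(horizontal_flip=True)
datagen.fit(X_train)  # X_train: NumPy array of images (assumed)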
• When the data is fed through a deep neural network, and weights and parameters adjust those values, the data can sometimes become too big or too small, which becomes a problem. By normalizing the data in each mini-batch, this problem is largely avoided
• Batch Normalization normalizes each batch by both its mean and variance
• It is just another layer that you can use to create your desired network architecture
• It is generally used between the linear and the non-linear layers in your network, because it normalizes the input to your activation function so that you’re centered in the linear section of the activation function (such as Sigmoid)
A normal Dense fully connected layer looks like this:

model.add(layers.Dense(64, activation='relu'))

To make it Batch Normalization enabled, we have to tell the Dense layer not to use bias, since it is not needed, and thus it can save some calculation. Also, put the Activation layer after the BatchNormalization() layer:

model.add(layers.Dense(64, use_bias=False))
model.add(layers.BatchNormalization())
model.add(layers.Activation("relu"))
Let us see the 4-step workflow in developing neural networks with Keras
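A minimal sketch, assuming the usual sequence of defining the training data, defining the model, compiling, and fitting (x_train/y_train and all sizes are illustrative):

from keras.models import Sequential
from keras.layers import Dense

# Step 1: define your training data (x_train, y_train assumed already loaded)
# Step 2: define the model as a network of layers
model = Sequential([
    Dense(32, input_shape=(784,), activation='relu'),
    Dense(10, activation='softmax'),
])

# Step 3: configure the learning process with an optimizer, a loss, and metrics
model.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy'])

# Step 4: iterate on the training data in mini-batches
model.fit(x_train, y_train, epochs=10, batch_size=128)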
Here, we will try to build a Sequential Network of Dense layers, and the dataset used is MNIST. MNIST is a classic dataset of handwritten digit images, released in 1999, that has served as the basis for benchmarking classification algorithms
Here, we will be using Keras to create a simple neural network to predict, as accurately as we can, digits from handwritten images. In particular, we will be calling the Functional Model API of Keras and creating 4-layered and 5-layered neural networks.
Also, we will be experimenting with various optimizers: the plain vanilla Stochastic Gradient Descent (SGD) optimizer and the Adam optimizer.
We will also introduce dropout, a regularization technique, in our neural networks to prevent overfitting
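A minimal sketch of what such a network might look like; the layer sizes, dropout rate, and epoch count are illustrative choices, not the exact configuration used here:

from keras.datasets import mnist
from keras.layers import Input, Dense, Dropout
from keras.models import Model
from keras.utils import to_categorical

# load MNIST and flatten the 28x28 images into 784-dim vectors scaled to [0, 1]
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(-1, 784).astype('float32') / 255
x_test = x_test.reshape(-1, 784).astype('float32') / 255
y_train, y_test = to_categorical(y_train), to_categorical(y_test)

# a small functional model with dropout between the Dense layers
inputs = Input(shape=(784,))
h = Dense(512, activation='relu')(inputs)
h = Dropout(0.25)(h)
h = Dense(256, activation='relu')(h)
outputs = Dense(10, activation='softmax')(h)

model = Model(inputs=inputs, outputs=outputs)
model.compile(optimizer='adam',  # swap in 'sgd' for plain vanilla SGD
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5, batch_size=128,
          validation_data=(x_test, y_test))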