9-MLP-EXAMPLE-08-08-2024

The document discusses deep learning concepts, focusing on deep neural networks and the multilayer perceptron (MLP) architecture. It explains the perceptron as a linear classifier, the limitations of single-layer perceptrons in solving non-linearly separable problems like XOR, and introduces backpropagation for weight updates in MLPs. Additionally, it covers practical applications of MLPs in sentiment analysis and image classification using the CIFAR-10 dataset, highlighting the importance of model architecture and preprocessing steps.

CSE4006 - Deep Learning
Dr. D. Sumathi
Deep Neural Networks
Multilayer Perceptron - Gradient-Based Learning - Backpropagation Algorithm -
Regularization for Deep Learning - Optimization for Training Deep Models
RECAP
• The perceptron is a classification algorithm. Specifically, it works as
a linear binary classifier. It was invented in the late 1950s by Frank
Rosenblatt.
• The perceptron basically works as a threshold function: inputs whose weighted sum is non-negative are put into one class, while the rest are put into the other class.
Perceptron – Components
• Input nodes
• Output node
• An activation function
• Weights and biases
• Error function
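• A minimal sketch of these components in Python (the function names step, predict, and error are illustrative, not from the slides):

import numpy as np

def step(z):
    # Activation function: threshold at zero
    return 1 if z >= 0 else 0

def predict(x, w, b):
    # Output node: weighted sum of the input nodes plus the bias,
    # passed through the activation function
    return step(np.dot(w, x) + b)

def error(target, prediction):
    # Error function: difference between the target and the prediction
    return target - prediction

# Two input nodes with hand-picked weights and bias (this happens to implement AND)
w = np.array([0.5, 0.5])
b = -0.7
print(predict(np.array([1, 1]), w, b))   # 1
print(predict(np.array([1, 0]), w, b))   # 0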
Classification
Implement linear classification in terms of AND, OR gates.
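• A sketch of how a perceptron can learn the AND gate with the classic perceptron learning rule (OR works the same way with targets [0, 1, 1, 1]); the variable names and hyperparameters are illustrative:

import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_and = np.array([0, 0, 0, 1])        # AND targets

w, b, lr = np.zeros(2), 0.0, 0.1
for _ in range(20):                   # a few passes over the data
    for x, t in zip(X, y_and):
        pred = 1 if np.dot(w, x) + b >= 0 else 0
        w += lr * (t - pred) * x      # perceptron weight update
        b += lr * (t - pred)          # bias update

print([1 if np.dot(w, x) + b >= 0 else 0 for x in X])   # expected [0, 0, 0, 1]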
Model to mimic the XOR problem
Attempt #1: The Single-Layer Perceptron
• A perceptron can only converge on linearly separable data, so it is not capable of imitating the XOR function.
• A single perceptron draws one linear decision boundary that must correctly classify the entire training data in one go.
• Non-linearity allows for more complex decision boundaries; for the XOR data, a potential decision boundary is a curve that separates (0, 1) and (1, 0) from (0, 0) and (1, 1).
The 2D XOR problem - Attempt #2
• Imitating the XOR function would require a non-linear decision
boundary.
• The XOR problem with neural networks can be solved by using Multi-
Layer Perceptrons or a neural network architecture with an input
layer, hidden layer, and output layer.
• During the forward pass, the inputs flow through the hidden layer to the output; during backpropagation, the weights of each layer are updated, and the trained network reproduces the XOR logic.
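• A minimal sketch of an MLP learning XOR with scikit-learn's MLPClassifier (the hyperparameters below are illustrative, not from the slides):

from sklearn.neural_network import MLPClassifier

X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]                      # XOR targets

# One small hidden layer is enough to bend the decision boundary
mlp = MLPClassifier(hidden_layer_sizes=(4,), activation='tanh',
                    solver='lbfgs', max_iter=2000, random_state=1)
mlp.fit(X, y)
print(mlp.predict(X))                 # typically [0 1 1 0]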
Perceptron to solve the XOR problem
• Out of all the 2-input logic gates, the XOR and XNOR gates are the only ones that are not linearly separable.
• A potential decision boundary therefore has to be non-linear to classify the XOR outputs correctly.
The Multi-layered Perceptron
• Components are:
• input and output nodes,
• activation function,
• weights and
• biases.
• An MLP can have several hidden layers; by definition it must have at least one hidden layer.
• Activation functions should be differentiable, so that a network’s
parameters can be updated using backpropagation.
• Though the output generation process is a direct extension of that of the
perceptron, updating weights isn’t so straightforward.
• Backpropagation is an algorithm for updating the weights and biases of a model based on their gradients with respect to the error function, starting from the output layer and working back to the first layer.
MLP Architecture
• The architecture of a network refers to its general structure: the number of hidden layers, the number of nodes in each layer, and how these nodes are interconnected.
Example 1
For w1 (with respect to E1), the gradient is obtained by the chain rule and used to compute the new value of w1.
Process
• update all the old weights with these new weights.
• Once the weights are updated, one backpropagation cycle is finished.
• Now the forward pass is done and the total new error is computed.
• And based on this newly computed total error the weights are again
updated.
• This goes on until the loss value converges to a minimum.
• This way a neural network starts with random values for its weights and finally converges to optimal values.
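• A minimal NumPy sketch of this cycle: forward pass, error, backpropagated gradients, weight update, repeated until the loss settles (the network size, learning rate, and the use of XOR as the training data are assumptions for illustration):

import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 4)), np.zeros((1, 4))   # random starting weights
W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
lr = 0.5

for epoch in range(10000):
    # forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # backward pass: gradients of the squared error w.r.t. each layer
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # update the old weights with the new ones
    W2 -= lr * (h.T @ d_out); b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * (X.T @ d_h);   b1 -= lr * d_h.sum(axis=0, keepdims=True)

print(out.round(2).ravel())   # should approach [0, 1, 1, 0]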
Key takeaways
• #1 Adding more layers or nodes: this gives increasingly complex decision boundaries, which can also lead to overfitting, where a model achieves very high accuracy on the training data but fails to generalize.
• #2 Choosing a loss function: the chosen loss makes assumptions about the data (such as it being Gaussian) and is not always convex for a classification problem.
Using Perceptron for Sentiment Analysis
• With the final labels assigned to the entire corpus, you decided to fit the data to a Perceptron, the simplest neural network of all.
• Encode the text from the guestbooks as a vector using Term Frequency-Inverse Document Frequency (TF-IDF). This method encodes any kind of text as a statistic of how frequent each word, or term, is in each sentence and in the entire document.
• In Python, use the TfidfVectorizer class from scikit-learn.
• Remove English stop-words and apply L1 normalization:

from sklearn.feature_extraction.text import TfidfVectorizer
TfidfVectorizer(stop_words='english', lowercase=True, norm='l1')
Step 1: Corpus is initialized along with targets
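• The corpus itself is not reproduced on the slide; a hypothetical stand-in, showing how the targets and TF-IDF features would be produced:

from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical guestbook sentences and sentiment targets (1 = positive, 0 = negative)
corpus = ["the stay was wonderful and the staff were friendly",
          "terrible room, noisy and dirty",
          "lovely view, great breakfast",
          "worst hotel experience ever"]
targets = [1, 0, 1, 0]

vectorizer = TfidfVectorizer(stop_words='english', lowercase=True, norm='l1')
features = vectorizer.fit_transform(corpus)
print(features.shape)   # (number of sentences, vocabulary size)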
How would MultiLayer Perceptron perform
in this case?
• Activation function: ReLU, specified with the parameter activation=’relu’
• Optimization function: Stochastic Gradient Descent, specified with the
parameter solver=’sgd’
• Learning rate: Inverse Scaling, specified with the
parameter learning_rate=’invscaling’
• Number of iterations: 20, specified with the
parameter max_iter=20
• In this example, the Multilayer Perceptron is set up with three hidden layers, but you want to see how the number of neurons in each layer impacts performance.
• Here the code started with 2 neurons per hidden layer, setting the parameter num_neurons=2.
• Finally, to see the value of the loss function at each iteration, you also
added the parameter verbose=True.
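• The body of buildMLPerceptron is not shown in the slides; a plausible sketch using scikit-learn's MLPClassifier with the parameters listed above (the three hidden layers of num_neurons units each and the accuracy reporting are assumptions):

from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

def buildMLPerceptron(train_features, test_features, train_targets,
                      test_targets, num_neurons=2):
    model = MLPClassifier(hidden_layer_sizes=(num_neurons,) * 3,
                          activation='relu', solver='sgd',
                          learning_rate='invscaling', max_iter=20,
                          verbose=True)
    model.fit(train_features, train_targets)
    predictions = model.predict(test_features)
    print("Mean accuracy:", accuracy_score(test_targets, predictions))
    return model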
• What about if you added more capacity to the neural network? What
happens when each hidden layer has more neurons to learn the patterns of
the dataset?
• Simply change the num_neurons parameter and set it, for instance, to 5:

buildMLPerceptron(train_features, test_features, train_targets,
                  test_targets, num_neurons=5)
• Adding more neurons to the hidden layers definitely improved model accuracy!
Inferences
• Same neural network structure, 3 hidden layers, but with the increased computational power of the 5 neurons, the model got better at understanding the patterns in the data.
• It converged much faster and mean accuracy doubled!
• In the end, for this specific case and dataset, the Multilayer Perceptron
performs as well as a simple Perceptron. But it was definitely a great
exercise to see how changing the number of neurons in each hidden layer impacts model performance.
Example 2: CIFAR-10 dataset - deploy MLP
• Step 1: Preparing the Data for Training the MLP Network
• Pixel scaling is an important preprocessing step that is often applied to
the input data
• In image classification tasks, the pixel values in the images can range
from 0 to 255 (for 8-bit images).
• Scaling the pixel values to be between 0 and 1 can make the model
training process more stable and efficient. This can be done by dividing
each pixel value by 255.
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.utils import to_categorical

# Load the CIFAR-10 dataset
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

# Scale the pixel values to between 0 and 1
x_train = x_train / 255.0
x_test = x_test / 255.0

# Convert the labels to one-hot encoding so the predicted probabilities
# can easily be compared to the true labels
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
Defining the MLP Model Architecture
• Using the Sequential model
• Using the Functional API
• In the next slide, let us see the Keras Sequential model structure for the MLP.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten

# Create a Sequential model
model = Sequential()

# Add a Flatten layer to flatten the input image
model.add(Flatten(input_shape=(32, 32, 3)))

# Add two dense layers with 200 and 150 units and 'relu' activation
model.add(Dense(200, activation='relu'))
model.add(Dense(150, activation='relu'))

# Add a softmax output layer with 10 units
model.add(Dense(10, activation='softmax'))

# Print the model summary
model.summary()
Model summary
Keras Functional API Model Structure for MLP

from tensorflow.keras.layers import Input, Dense, Flatten
from tensorflow.keras.models import Model

# Define the input layer
inputs = Input(shape=(32, 32, 3))

# Flatten the input image
x = Flatten()(inputs)

# Add two dense layers with 200 and 150 units and 'relu' activation
x = Dense(200, activation='relu')(x)
x = Dense(150, activation='relu')(x)

# Add a softmax output layer with 10 units
outputs = Dense(10, activation='softmax')(x)

# Create the model
model = Model(inputs=inputs, outputs=outputs)

# Print the model summary
model.summary()
Compile and Train the MLP Model

from tensorflow.keras import optimizers

opt = optimizers.Adam(learning_rate=0.0005)
model.compile(loss='categorical_crossentropy', optimizer=opt,
              metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=32, epochs=10, shuffle=True)
Evaluate the MLP Model
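• The evaluation code is not shown on the slide; a minimal sketch, assuming the compiled model and the scaled, one-hot-encoded test data from the earlier steps:

test_loss, test_acc = model.evaluate(x_test, y_test, batch_size=32)
print('Test accuracy:', test_acc)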
Functional API vs Sequential
• The Functional API method in Keras is recommended for creating
more complex models that have multiple inputs, multiple outputs, or
require layers to share connections.
• The Sequential model, on the other hand, is suitable for creating
simple models where the layers are stacked linearly.
