Lecture 2-4
Deep Feed Forward Neural Networks
Artificial Neural Network (ANN)
Perceptron
• The simplest neural network is the perceptron, which consists of a single
neuron
• Conceptually, the perceptron functions in a manner similar to a biological neuron
Perceptron
• The artificial neuron performs two consecutive functions
• It calculates the weighted sum of the inputs to represent the total strength
of the input signals
• Then it applies a step function to the result to determine whether to fire:
the output is 1 if the signal exceeds a certain threshold, or 0 if it doesn't
• Not all input features are equally useful or important
• To represent this, each input node is assigned a weight value, called its connection
weight, to reflect its importance (see the sketch after this list)
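A minimal sketch of this computation in Python with NumPy (the inputs, weights,
and bias below are illustrative values, not taken from the lecture):

import numpy as np

def perceptron(x, w, b):
    # Weighted sum of the inputs (plus a bias term), then a step function
    z = np.dot(w, x) + b
    return 1 if z > 0 else 0   # fire (1) if the signal exceeds the threshold

x = np.array([1.0, 0.0, 1.0])    # input vector (x1, x2, x3)
w = np.array([0.5, -0.2, 0.8])   # connection weights reflect each input's importance
b = -1.0                         # bias shifts the firing threshold
print(perceptron(x, w, b))       # -> 1, since 0.5 + 0.8 - 1.0 = 0.3 > 0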
Perceptron
• Input vector: the feature vector that is fed to the neuron. It is usually
denoted with an uppercase X to represent a vector of inputs (x1, x2, . . ., xn)
Forward Calculation
The weights breakdown
Feedforward calculation
Feedforward process (2)
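In equation form, the feedforward calculation for a layer l can be summarized as
follows (a standard formulation; the bracketed layer superscripts and the
activation-function symbol g are notational assumptions):

z^{[l]} = W^{[l]} a^{[l-1]} + b^{[l]},   a^{[l]} = g(z^{[l]}),   with a^{[0]} = x

Each layer takes the previous layer's activations, computes a weighted sum plus
a bias, and passes the result through an activation function.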
Feature learning
• The nodes in the hidden layers (ai) are the new features that are learned after
each layer
• For example, in the network of two slides earlier, we see that we have three
feature inputs (x1, x2, and x3)
• After computing the forward pass in the first layer, the network learns patterns,
and these features are transformed into three new features with different values
(a_1^{[1]}, a_2^{[1]}, a_3^{[1]})
• Then, in the next layer, the network learns patterns within the patterns and
produces new features (a_1^{[2]}, a_2^{[2]}, a_3^{[2]}, and a_4^{[2]}, and so forth)
• The produced features after each layer are not totally understood, and we don’t
see them, nor do we have much control over them
• That’s why they are called hidden layers
• What we do is this: we look at the final output prediction and keep tuning some
parameters until we are satisfied with the network's performance (a forward pass
is sketched below)
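A minimal sketch of this layer-by-layer feature learning, assuming a network with
three inputs, a three-unit first hidden layer, a four-unit second hidden layer,
and sigmoid activations (all sizes and values here are illustrative):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

x = np.array([0.5, -1.2, 3.0])                 # input features (x1, x2, x3)
W1, b1 = rng.normal(size=(3, 3)), np.zeros(3)  # layer 1: 3 inputs -> 3 units
W2, b2 = rng.normal(size=(4, 3)), np.zeros(4)  # layer 2: 3 units -> 4 units

a1 = sigmoid(W1 @ x + b1)   # learned features a_1^{[1]}, a_2^{[1]}, a_3^{[1]}
a2 = sigmoid(W2 @ a1 + b2)  # patterns within patterns: a_1^{[2]} ... a_4^{[2]}

print(a1)   # hidden features after layer 1 -- we don't directly control these
print(a2)   # hidden features after layer 2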
Feature Learning Process
Cost/Loss/Error Function
• The error function is a measure of how “wrong” the neural network
prediction is with respect to the expected output (the label)
• It quantifies how far we are from the correct solution
• For example, if we have a high loss, then our model is not doing a good
job
• The smaller the loss, the better the job the model is doing
• The larger the loss, the more our model needs to be trained to increase its
accuracy
Mean Squared Error Loss Function
• Mean squared error (MSE) is commonly used in regression problems that require the output to
be a real value (like house prices)
• Instead of just comparing the prediction output with the label (ŷ_i − y_i), the error is squared and
averaged over the number of data points
• MSE is a good choice for a few reasons
• The square ensures the error is always positive, and
• Larger errors are penalized more than smaller errors
• MSE is quite sensitive to outliers, since it squares the error value
• A variant of MSE called mean absolute error (MAE) is more robust to outliers
• It averages the absolute error over the entire dataset without squaring the error
(both losses are sketched below)
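A minimal sketch of both losses with NumPy (the labels and predictions are
illustrative values):

import numpy as np

def mse(y_hat, y):
    # Mean squared error: average of squared differences
    return np.mean((y_hat - y) ** 2)

def mae(y_hat, y):
    # Mean absolute error: average of absolute differences
    return np.mean(np.abs(y_hat - y))

y     = np.array([300.0, 450.0, 500.0])   # true house prices (labels)
y_hat = np.array([320.0, 440.0, 900.0])   # predictions; the last is an outlier

print(mse(y_hat, y))   # 53500.0  -- dominated by the squared outlier error
print(mae(y_hat, y))   # ~143.33  -- much less affected by the single outlier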
Cross-entropy loss function
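• Cross-entropy is commonly used in classification problems, where the network's
output is interpreted as a probability distribution over classes
• In its standard form (stated here for reference), the loss for one example with
m classes, true labels y, and predicted probabilities ŷ is

E(y, ŷ) = −Σ_{i=1}^{m} y_i · log(ŷ_i)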
• If all the weights are initialized to the same value, each element of the
activation vector a^{[1]} will be the same (because W^{[1]} contains all the
same values)
• This behavior will occur at all layers of the neural network (see the sketch
below)
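A minimal sketch of this symmetry problem, assuming constant weight
initialization (the sizes and values are illustrative):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 3.0])   # any input vector

W1 = np.full((4, 3), 0.5)        # every weight in W^{[1]} is the same value
b1 = np.zeros(4)

a1 = sigmoid(W1 @ x + b1)
print(a1)   # all four elements are identical: each hidden unit computes the
            # same function of the input, so no unit learns anything distinct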