This document provides legal notices and disclaimers for an informational presentation by Intel. It states that the presentation is for informational purposes only and that Intel makes no warranties. It also notes that Intel technologies' features and benefits depend on system configuration. Finally, it specifies that the sample source code in the presentation is released under the Intel Sample Source Code License Agreement and that Intel and its logo are trademarks.
NEURAL NETWORK IN MACHINE LEARNING FOR STUDENTS (hemasubbu08)
- Artificial neural networks are computational models inspired by the human brain that use algorithms to mimic brain functions. They are made up of simple processing units (neurons) connected in a massively parallel distributed system. Knowledge is acquired through a learning process that adjusts synaptic connection strengths.
- Neural networks can be used for pattern recognition, function approximation, and associative memory in domains like speech recognition, image classification, and financial prediction. They offer flexible inputs, resistance to errors, and fast evaluation, though interpretation is difficult.
- The document discusses multi-layer perceptrons (MLPs), a type of artificial neural network. MLPs have multiple layers of nodes and can classify non-linearly separable data using backpropagation.
- It describes the basic components and working of perceptrons, the simplest type of neural network, and how they led to the development of MLPs. MLPs use backpropagation to calculate error gradients and update weights between layers.
- Various concepts are explained like activation functions, forward and backward propagation, biases, and error functions used for training MLPs. Applications mentioned include speech recognition, image recognition and machine translation.
Deep Learning Interview Questions And Answers | AI & Deep Learning Interview ... (Simplilearn)
- TensorFlow is a popular deep learning library that provides both C++ and Python APIs to make working with deep learning models easier. It supports both CPU and GPU computing and has a faster compilation time than other libraries like Keras and Torch.
- Tensors are multidimensional arrays that represent inputs, outputs, and parameters of deep learning models in TensorFlow. They are the fundamental data structure that flows through graphs in TensorFlow.
- The main programming elements in TensorFlow include constants, variables, placeholders, and sessions. Constants are parameters whose values do not change, variables allow adding trainable parameters, placeholders feed data from outside the graph, and sessions run the graph to evaluate nodes.
The field of artificial neural networks is often just called neural networks or multi-layer perceptrons after perhaps the most useful type of neural network. A perceptron is a single neuron model that was a precursor to larger neural networks.
It is a field that investigates how simple models of biological brains can be used to solve difficult computational tasks like the predictive modeling tasks we see in machine learning. The goal is not to create realistic models of the brain but instead to develop robust algorithms and data structures that we can use to model difficult problems.
The power of neural networks comes from their ability to learn the representation in your training data and how best to relate it to the output variable you want to predict. In this sense, neural networks learn a mapping. Mathematically, they are capable of learning any mapping function and have been shown to be universal approximators.
The predictive capability of neural networks comes from the hierarchical or multi-layered structure of the networks. The data structure can pick out (learn to represent) features at different scales or resolutions and combine them into higher-order features, for example, from lines to collections of lines to shapes.
This document provides an overview of artificial intelligence and machine learning techniques, including:
1. It defines artificial intelligence and lists some common applications such as gaming, natural language processing, and robotics.
2. It describes different machine learning algorithms like supervised learning, unsupervised learning, and reinforcement learning, and their applications in areas such as healthcare, finance, and retail.
3. It explains deep learning concepts such as neural networks, activation functions, loss functions, and architectures like convolutional neural networks and recurrent neural networks.
The document provides an introduction to the back-propagation algorithm, which is commonly used to train artificial neural networks. It discusses how back-propagation calculates the gradient of a loss function with respect to the network's weights in order to minimize the loss through methods like gradient descent. The document outlines the history of neural networks and perceptrons, describes the limitations of single-layer networks, and explains how back-propagation allows multi-layer networks to learn complex patterns through error propagation during training.
This document provides an overview of artificial neural networks. It describes how biological neurons work and how artificial neurons are modeled after them. An artificial neuron receives weighted inputs, sums them, and outputs the result through an activation function. A neural network consists of interconnected artificial neurons arranged in layers. The network is trained using a learning algorithm called backpropagation that adjusts the weights to minimize error between the network's output and target output. Neural networks can be used for applications like handwritten digit recognition.
Artificial Neural Networks have been very successfully used in several machine learning applications. They are often the building blocks when building deep learning systems. We discuss the hypothesis, training with backpropagation, update methods, regularization techniques.
The document provides an overview of backpropagation, a common algorithm used to train multi-layer neural networks. It discusses:
- How backpropagation works by calculating error terms for output nodes and propagating these errors back through the network to adjust weights.
- The stages of feedforward activation and backpropagation of errors to update weights.
- Options like initial random weights, number of training cycles and hidden nodes.
- An example of using backpropagation to train a network to learn the XOR function over multiple training passes of forward passing and backward error propagation and weight updating.
This is a single-day course that lets the learner gain experience with the basic details of deep learning. The first half builds a network using Python/NumPy only, and in the second half we build a more advanced network using TensorFlow/Keras.
At the end you will find a list of useful pointers for further study.
course git: https://gitlab.com/eshlomo/EazyDnn
Recurrent Neural Networks have been shown to be very powerful models as they can propagate context over several time steps. Due to this they can be applied effectively to several problems in Natural Language Processing, such as Language Modelling, Tagging problems, Speech Recognition, etc. In this presentation we introduce the basic RNN model and discuss the vanishing gradient problem. We describe LSTM (Long Short Term Memory) and Gated Recurrent Units (GRU). We also discuss Bidirectional RNN with an example. RNN architectures can be considered as deep learning systems where the number of time steps can be considered as the depth of the network. It is also possible to build the RNN with multiple hidden layers, each having recurrent connections from the previous time steps that represent the abstraction both in time and space.
This document provides an overview of neural networks. It discusses how neural networks were inspired by biological neural systems and attempt to model their massive parallelism and distributed representations. It covers the perceptron algorithm for learning basic neural networks and the development of backpropagation for learning in multi-layer networks. The document discusses concepts like hidden units, representational power of neural networks, and successful applications of neural networks.
This document provides an overview of deep learning including:
1. Why deep learning performs better than traditional machine learning for tasks like image and speech recognition.
2. Common deep learning applications such as image recognition, speech recognition, and healthcare.
3. Challenges of deep learning like the need for large datasets and lack of interpretability.
This document provides an overview of deep learning including why it is used, common applications, strengths and challenges, common algorithms, and techniques for developing deep learning models. Deep learning methods like neural networks can learn complex patterns in large, unlabeled datasets and are better than traditional machine learning for tasks like image recognition. Popular deep learning algorithms include convolutional neural networks for image data and recurrent neural networks for sequential data. Effective deep learning requires techniques like regularization, dropout, data augmentation, and hyperparameter optimization to prevent overfitting on training data.
Neural networks are inspired by biological neurons and are used to learn relationships in data. The document defines an artificial neural network as a large number of interconnected processing elements called neurons that learn from examples. It outlines the key components of artificial neurons including weights, inputs, summation, and activation functions. Examples of neural network architectures include single-layer perceptrons, multi-layer perceptrons, convolutional neural networks, and recurrent neural networks. Common applications of neural networks include pattern recognition, data classification, and processing sequences.
Principles of soft computing - Associative memory networks (Sivagowry Shathesh)
The document discusses various types of associative memory networks including auto-associative, hetero-associative, bidirectional associative memory (BAM), and Hopfield networks. It describes the architecture, training algorithms, and testing procedures for each type of network. The key points are: Auto-associative networks store and recall patterns using the same input and output vectors, while hetero-associative networks use different input and output vectors. BAM networks perform bidirectional retrieval of patterns. Hopfield networks are auto-associative single-layer recurrent networks that can converge to stable states representing stored patterns. Hebbian learning and energy functions are important concepts in analyzing the storage and recall capabilities of these associative memory networks.
Cerebellar Model Articulation Controller (Zahra Sadeghi)
The document provides an overview of the Cerebellar Model Articulation Controller (CMAC) neural network model. Some key points:
- CMAC is a 3-layer feedforward neural network that mimics the functionality of the mammalian cerebellum. It uses coarse coding to store weights in a localized associative memory.
- The input layer uses threshold units to activate a fixed number of neurons. The second layer performs logic AND operations. The third layer computes the weighted sum to produce the output.
- Learning involves comparing the actual output to the desired output and adjusting weights using methods like least mean square. Generalization occurs due to overlapping receptive fields between neurons.
- Applications include robot control,
AI Competitor Analysis: How to Monitor and Outperform Your Competitors (Contify)
AI competitor analysis helps businesses watch and understand what their competitors are doing. Using smart competitor intelligence tools, you can track their moves, learn from their strategies, and find ways to do better. Stay smart, act fast, and grow your business with the power of AI insights.
For more information please visit here https://www.contify.com/
2. Functions Can be Made Linear
• Data is not linearly separable in one dimension
• Not separable if you insist on using a specific class of functions
[Figure: data points along the 𝑥 axis]
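To make the bullet concrete, here is a minimal NumPy sketch (not from the slides; the toy data and the feature map x → (x, x²) are illustrative assumptions): in one dimension no threshold separates the classes, but the same data becomes linearly separable after a simple feature map.

```python
import numpy as np

# Toy 1-D data (illustrative): positives sit near 0, negatives lie on both sides,
# so no single threshold on x separates the two classes.
x = np.array([-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0])
y = np.array([-1, -1, 1, 1, 1, -1, -1])

thresholds = [t for t in np.linspace(-4.0, 4.0, 81)
              if np.all(y * np.sign(x - t) > 0) or np.all(y * np.sign(t - x) > 0)]
print("separating thresholds on x alone:", thresholds)   # -> []

# Map x -> (x, x^2): the same data is now linearly separable,
# e.g. by the hyperplane 0*x - 1*x^2 + 1 = 0 (i.e. the rule x^2 < 1).
phi = np.stack([x, x ** 2], axis=1)
w, b = np.array([0.0, -1.0]), 1.0
print("all correct in feature space:", bool(np.all(np.sign(phi @ w + b) == y)))
```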
4. Neural Networks
• Multi-layer networks were designed to overcome the computational (expressivity) limitation of a single threshold element.
[Figure: a linear threshold unit and an Input-Hidden-Output network]
𝑦 = 𝑠𝑖𝑔𝑛(∑w𝑖𝑥𝑖 − 𝑇)
5. History: Neural Computation
• McCulloch and Pitts (1943) showed how linear threshold units can be used to compute logical functions
𝑦 = 𝑠𝑖𝑔𝑛(∑w𝑖𝐼𝑖 − 𝑇)
7. Neural Networks
• Multi-layer networks were designed to overcome the computational (expressivity) limitation of a single threshold element.
[Figure: a linear threshold unit and an Input-Hidden-Output network]
8. Neural Networks
• Multi-layer networks were designed to overcome the computational (expressivity) limitation of a single threshold element.
• The idea is to stack several layers of threshold elements, each layer using the output of the previous layer as input.
• Multi-layer networks can represent arbitrary functions, but building effective learning methods for such networks was [thought to be] difficult.
[Figure: Input-Hidden-Output network]
10. Neural Networks
• Neural Networks are functions: 𝑁𝑁: 𝑿 → 𝑌, where 𝑿 = {0,1}ⁿ or ℝⁿ and 𝑌 = [0,1] or {0,1}
• Robust approach to approximating real-valued, discrete-valued and vector-valued target functions.
• Among the most effective general-purpose supervised learning methods currently known.
• Effective especially for complex and hard-to-interpret input data, such as real-world sensory data, where a lot of supervision is available.
• Learning: the Backpropagation algorithm for neural networks has been shown successful in many practical problems.
11. Motivation for Neural Networks
• Inspired by biological neural network systems
• But are not identical to them
• We are currently on the rising part of a wave of interest in NN architectures, after a long downtime from the mid-1990s.
• Better computer architecture (parallelism on GPUs & TPUs)
• A lot more data than before; in many domains, supervision is available.
12. Motivation for Neural Networks
• One potentially interesting perspective:
• We used to think about NNs only as function approximators.
• Geoffrey Hinton introduced “Restricted Boltzmann Machines” (RBMs) in the mid-2000s, a method to learn high-level representations of the input
• Many other ideas focus on the intermediate representations of NNs
• Ideas are being developed on the value of these intermediate representations for transfer learning, for the meaning they represent, etc.
• In the next two lectures we will present a few of the basic architectures and learning algorithms, and provide some examples of applications
13. Basic Unit in Multi-Layer Neural Network
• Threshold units: 𝑜𝑗 = sgn(𝒘 ⋅ 𝒙 − 𝑇) introduce non-linearity
• But are not differentiable,
• Hence unsuitable for learning via Gradient Descent
[Figure: Input-Hidden-Output network with a threshold activation]
14. Logistic Neuron / Sigmoid Activation
• Neuron is modeled by a unit 𝑗 connected by weighted links 𝑤𝑖𝑗 to other units 𝑖.
• Use a non-linear, differentiable output function such as the sigmoid or logistic function
• Net input to a unit is defined as: $\text{net}_j = \sum_i w_{ij} \, x_i$
• Output of a unit is defined as: $o_j = \sigma(\text{net}_j) = \dfrac{1}{1 + \exp(-(\text{net}_j - T_j))}$
[Figure: unit 𝑗 with inputs 𝑥1 … 𝑥6, weights 𝑤1𝑗 … 𝑤6𝑗, and output 𝑜𝑗]
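A minimal NumPy sketch of this unit (not from the slides; the input values, weights, and threshold below are made-up illustrative numbers):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical inputs x_1..x_6, weights w_1j..w_6j, and threshold T_j.
x = np.array([0.5, -1.0, 0.25, 0.0, 1.0, -0.5])
w = np.array([0.1,  0.4, -0.2, 0.3, 0.7,  0.05])
T = 0.2

net_j = w @ x              # net_j = sum_i w_ij * x_i
o_j = sigmoid(net_j - T)   # o_j = 1 / (1 + exp(-(net_j - T_j)))
print(net_j, o_j)
```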
15. Representational Power
• Any Boolean function can be represented by a two-layer network (simulate a two-layer AND-OR network)
• Any bounded continuous function can be approximated with arbitrarily small error by a two-layer network.
• Sigmoid functions provide a set of basis functions from which arbitrary functions can be composed.
• Any function can be approximated to arbitrary accuracy by a three-layer network.
16. Quiz Time!
• Given a neural network, how can we make predictions?
• Given the input, calculate the output of each layer (starting from the first layer) until you get to the output.
• What is required to fully specify a neural network?
• The weights.
• Why can NN predictions be quick?
• Because many of the computations can be parallelized.
• What makes a neural network expressive?
• The non-linear units.
18. History: Learning Rules
• Hebb (1949) suggested that if two units are both active (firing) then the weights between them should increase:
𝑤𝑖𝑗 = 𝑤𝑖𝑗 + 𝑅𝑜𝑖𝑜𝑗
• 𝑅 is a constant called the learning rate
• Supported by physiological evidence
• Rosenblatt (1959) suggested that when a target output value is provided for a single neuron with fixed input, it can incrementally change weights and learn to produce the output using the Perceptron learning rule.
• Assumes binary output units; a single linear threshold unit
• Led to the Perceptron Algorithm
• See: http://people.idsia.ch/~juergen/who-invented-backpropagation.html
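To make the Perceptron learning rule concrete, here is a minimal NumPy sketch on a toy, linearly separable dataset (the data, learning rate, and epoch count are illustrative assumptions, not from the slides):

```python
import numpy as np

# Toy linearly separable data: label is +1 iff x1 + x2 > 0 (illustrative only).
X = np.array([[ 1.0,  1.0], [ 2.0, -0.5], [-1.0, -1.0], [-2.0,  0.5]])
y = np.array([ 1,  1, -1, -1])

w, T, R = np.zeros(2), 0.0, 1.0           # weights, threshold, learning rate

for epoch in range(10):
    for x_i, t in zip(X, y):
        o = 1 if w @ x_i - T > 0 else -1  # o = sign(w . x - T)
        if o != t:                        # Perceptron rule: update only on mistakes
            w += R * t * x_i
            T -= R * t
print(w, T)
```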
20. Gradient Descent
• We use gradient descent to determine the weight vector that minimizes some scalar-valued loss function 𝐸𝑟𝑟(𝒘);
• Fixing the set 𝐷 of examples, 𝐸𝑟𝑟 is a function of 𝒘
• At each step, the weight vector is modified in the direction that produces the steepest descent along the error surface.
[Figure: error surface 𝐸𝑟𝑟(𝒘) over 𝒘, with successive iterates 𝒘0, 𝒘1, 𝒘2, 𝒘3 descending toward the minimum]
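A minimal sketch of one gradient-descent loop (not from the slides; the quadratic loss, its known gradient, and the step size are illustrative assumptions):

```python
import numpy as np

# Illustrative loss Err(w) = ||w - w_star||^2 with gradient 2 (w - w_star).
w_star = np.array([1.0, -2.0])
def err(w):  return float(np.sum((w - w_star) ** 2))
def grad(w): return 2.0 * (w - w_star)

w = np.zeros(2)
R = 0.1                       # learning rate
for step in range(100):
    w = w - R * grad(w)       # move along the steepest-descent direction
print(w, err(w))              # w approaches w_star, Err(w) approaches 0
```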
21. Backpropagation Learning Rule
• Since there could be multiple output units, we define the error as the sum over all the network output units.
• $Err(\boldsymbol{w}) = \frac{1}{2} \sum_{d \in D} \sum_{k \in K} (t_{kd} - o_{kd})^2$
• where 𝐷 is the set of training examples,
• 𝐾 is the set of output units
• This is used to derive the (global) learning rule which performs gradient descent in the weight space in an attempt to minimize the error function:
$\Delta w_{ij} = -R \, \dfrac{\partial E}{\partial w_{ij}}$
[Figure: network with outputs 𝑜1 … 𝑜𝑘 and target vector (1, 0, 1, 0, 0)]
22. Learning with a Multi-Layer Perceptron
• It’s easy to learn the top layer – it’s just a linear unit.
• Given feedback (truth) at the top layer, and the activation at the layer below it, you can use the Perceptron update rule (more generally, gradient descent) to update these weights.
• The problem is what to do with the other set of weights – we do not get feedback in the intermediate layer(s).
[Figure: Input-Hidden-Output network with weight matrices 𝑤1𝑖𝑗 (input to hidden) and 𝑤2𝑖𝑗 (hidden to output)]
23. Learning with a Multi-Layer Perceptron
• The problem is what to do with the other set of weights – we do not get feedback in the intermediate layer(s).
• Solution: If all the activation functions are differentiable, then the output of the network is also a differentiable function of the input and weights in the network.
• Define an error function (e.g., sum of squares) that is a differentiable function of the output, i.e. this error function is also a differentiable function of the weights.
• We can then evaluate the derivatives of the error with respect to the weights, and use these derivatives to find weight values that minimize this error function, using gradient descent (or other optimization methods).
• This results in an algorithm called back-propagation.
[Figure: Input-Hidden-Output network with weight matrices 𝑤1𝑖𝑗 and 𝑤2𝑖𝑗]
26. Some facts from real analysis
• First let’s get the notation right:
• The arrow shows functional dependence of 𝑧 on 𝑦
• i.e. given 𝑦, we can calculate 𝑧.
• e.g.: 𝑧(𝑦) = 2𝑦²
• $\frac{\partial z}{\partial y}$ : the derivative of 𝑧 with respect to 𝑦.
27. Some facts from real analysis
• Simple chain rule
• If 𝑧 is a function of 𝑦, and 𝑦 is a function of 𝑥
• Then 𝑧 is a function of 𝑥 as well.
• Question: how do we find $\frac{\partial z}{\partial x}$?
$\dfrac{\partial z}{\partial x} = \dfrac{\partial z}{\partial y} \, \dfrac{\partial y}{\partial x}$
We will use these facts to derive the details of the Backpropagation algorithm.
- 𝑧 will be the error (loss) function; we need to know how to differentiate 𝑧.
- Intermediate nodes use a logistic function (or another differentiable step function); we need to know how to differentiate it.
28. Some facts from real analysis
• Multiple path chain rule
$\dfrac{\partial z}{\partial x} = \dfrac{\partial z}{\partial y_1} \dfrac{\partial y_1}{\partial x} + \dfrac{\partial z}{\partial y_2} \dfrac{\partial y_2}{\partial x}$
Slide Credit: Richard Socher
29. Some facts from real analysis
• Multiple path chain rule: general
$\dfrac{\partial z}{\partial x} = \sum_{i=1}^{n} \dfrac{\partial z}{\partial y_i} \dfrac{\partial y_i}{\partial x}$
Slide Credit: Richard Socher
30. Key Intuitions Required for BP
• Gradient Descent
• Change the weights in the direction of the gradient to minimize the error function.
• Chain Rule
• Use the chain rule to calculate the gradients with respect to the intermediate weights.
• Dynamic Programming (Memoization)
• Memoize the weight updates to make the updates faster.
[Figure: network from input to output, with the gradient $\frac{\partial E}{\partial w_{ij}}$]
31. Backpropagation: the big picture
• Loop over instances:
1. The forward step
• Given the input, make predictions layer-by-layer, starting from the first layer
2. The backward step
• Calculate the error in the output
• Update the weights layer-by-layer, starting from the final layer
[Figure: network from input to output, with the gradient $\frac{\partial E}{\partial w_{ij}}$]
32. Quiz time!
• What is the purpose of the forward step?
• To make predictions, given an input.
• What is the purpose of the backward step?
• To update the weights, given an output error.
• Why do we use the chain rule?
• To calculate gradients in the intermediate layers.
• Why could backpropagation be efficient?
• Because it can be parallelized.
34. Reminder: Model Neuron (Logistic)
• Neuron is modeled by a unit 𝑗 connected by weighted links 𝑤𝑖𝑗 to other units 𝑖.
• Use a non-linear, differentiable output function such as the sigmoid or logistic function
• Net input to a unit is defined as: $\text{net}_j = \sum_i w_{ij} \, x_i$
• Output of a unit is defined as: $o_j = \dfrac{1}{1 + \exp(-(\text{net}_j - T_j))}$
[Figure: unit 7 with inputs 𝑥1 … 𝑥6, weights 𝑤17 … 𝑤67, and output 𝑜𝑗]
Note:
Other gates, beyond Sigmoid, can be used (TanH, ReLU).
Other loss functions, beyond LMS, can be used.
35. Derivation of Learning Rule
• The weights are updated incrementally; the error is computed for each example and the weight update is then derived.
$E_d(\boldsymbol{w}) = \frac{1}{2} \sum_{k \in K} (t_k - o_k)^2$
• 𝑤𝑖𝑗 influences the output 𝑜𝑗 only through net𝑗
• Therefore:
$\dfrac{\partial E_d}{\partial w_{ij}} = \dfrac{\partial E_d}{\partial o_j} \, \dfrac{\partial o_j}{\partial \text{net}_j} \, \dfrac{\partial \text{net}_j}{\partial w_{ij}}$
where $o_j = \dfrac{1}{1 + \exp(-(\text{net}_j - T))}$ and $\text{net}_j = \sum_i w_{ij} \, x_i$
[Figure: network with outputs 𝑜1, …, 𝑜𝑗, …, 𝑜𝑘; weight 𝑤𝑖𝑗 connects input 𝑥𝑖 to unit 𝑗]
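For completeness, a short worked expansion of these three factors (standard calculus for the squared error and the sigmoid, not spelled out on the slide):

$\dfrac{\partial E_d}{\partial o_j} = -(t_j - o_j), \qquad \dfrac{\partial o_j}{\partial \text{net}_j} = o_j (1 - o_j), \qquad \dfrac{\partial \text{net}_j}{\partial w_{ij}} = x_i$

$\Longrightarrow \quad \dfrac{\partial E_d}{\partial w_{ij}} = -(t_j - o_j) \, o_j (1 - o_j) \, x_i$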
38. Derivation of Learning Rule (3)
• Weights of output units:
• 𝑤𝑖𝑗 is changed by:
$\Delta w_{ij} = R \, (t_j - o_j) \, o_j (1 - o_j) \, x_i = R \, \delta_j \, x_i$
where we defined:
$\delta_j = -\dfrac{\partial E_d}{\partial \text{net}_j} = (t_j - o_j) \, o_j (1 - o_j)$
[Figure: output unit 𝑗 with input 𝑥𝑖 from unit 𝑖 through weight 𝑤𝑖𝑗, producing 𝑜𝑗]
39. Derivation of Learning Rule (4)
• Weights of hidden units: we want $\frac{\partial E_d}{\partial w_{ij}}$
• 𝑤𝑖𝑗 influences the output only through all the units whose direct input includes 𝑗
[Figure: hidden unit 𝑗 receives 𝑥𝑖 through weight 𝑤𝑖𝑗 and feeds units 𝑘 with outputs 𝑜1, …, 𝑜𝑘, which determine 𝐸𝑑]
41. Derivation of Learning Rule (5)
• Weights of hidden units:
• 𝑤𝑖𝑗 influences the output only through all the units whose direct input includes 𝑗
$\dfrac{\partial E_d}{\partial w_{ij}} = \sum_{k \in \mathrm{parent}(j)} -\delta_k \, \dfrac{\partial \text{net}_k}{\partial \text{net}_j} \, x_i = \sum_{k \in \mathrm{parent}(j)} -\delta_k \, \dfrac{\partial \text{net}_k}{\partial o_j} \, \dfrac{\partial o_j}{\partial \text{net}_j} \, x_i = \sum_{k \in \mathrm{parent}(j)} -\delta_k \, w_{jk} \, o_j (1 - o_j) \, x_i$
[Figure: hidden unit 𝑗 receives 𝑥𝑖 through weight 𝑤𝑖𝑗 and feeds units 𝑘 with outputs 𝑜𝑘]
42. Derivation of Learning Rule (6)
• Weights of hidden units:
• 𝑤𝑖𝑗 is changed by:
$\Delta w_{ij} = -R \, \dfrac{\partial E_d}{\partial w_{ij}} = R \, o_j (1 - o_j) \left( \sum_{k \in \mathrm{parent}(j)} \delta_k \, w_{jk} \right) x_i = R \, \delta_j \, x_i$
• where $\delta_j = o_j (1 - o_j) \sum_{k \in \mathrm{parent}(j)} \delta_k \, w_{jk}$
• First determine the error for the output units.
• Then, backpropagate this error layer by layer through the network, changing weights appropriately in each layer.
[Figure: hidden unit 𝑗 feeding units 𝑘 with outputs 𝑜𝑘; weight 𝑤𝑖𝑗 connects unit 𝑖 to unit 𝑗]
43. The Backpropagation Algorithm
• Create a fully connected three-layer network. Initialize weights.
• Until all examples produce the correct output within 𝜖 (or other criteria)
For each example in the training set do:
1. Compute the network output for this example
2. For each output unit 𝑘, compute the error term between the output and target value:
$\delta_k = (t_k - o_k) \, o_k (1 - o_k)$
3. For each hidden unit 𝑗, compute the error term:
$\delta_j = o_j (1 - o_j) \sum_{k \in \mathrm{downstream}(j)} \delta_k \, w_{jk}$
4. Compute the weight updates: $\Delta w_{ij} = R \, \delta_j \, x_i$
5. Update network weights with Δ𝑤𝑖𝑗
End epoch
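A minimal NumPy sketch of this algorithm on a tiny 2-3-1 sigmoid network (not from the slides; the XOR-style data, layer sizes, learning rate, and epoch count are illustrative assumptions, and the biases play the role of the thresholds 𝑇):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])   # toy inputs (XOR-style)
t = np.array([0., 1., 1., 0.])                           # targets
W1, b1 = rng.normal(size=(2, 3)), np.zeros(3)            # input -> hidden weights, biases
W2, b2 = rng.normal(size=3), 0.0                         # hidden -> output
R = 0.5                                                  # learning rate

for epoch in range(10000):
    for x, tk in zip(X, t):
        # 1. Forward: compute the network output layer by layer.
        h = sigmoid(x @ W1 + b1)
        o = sigmoid(h @ W2 + b2)
        # 2. Error term for the output unit: delta_k = (t_k - o_k) o_k (1 - o_k).
        delta_o = (tk - o) * o * (1 - o)
        # 3. Error terms for hidden units: delta_j = o_j (1 - o_j) sum_k delta_k w_jk.
        delta_h = h * (1 - h) * (delta_o * W2)
        # 4./5. Weight updates Delta w_ij = R * delta_j * x_i, applied immediately.
        W2 += R * delta_o * h
        b2 += R * delta_o
        W1 += R * np.outer(x, delta_h)
        b1 += R * delta_h

# After training, the predictions typically end up close to the targets [0, 1, 1, 0].
print([float(round(sigmoid(sigmoid(x @ W1 + b1) @ W2 + b2), 2)) for x in X])
```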
50. The Backpropagation Algorithm
• Create a fully connected network. Initialize weights.
• Until all examples produce the correct output within 𝜖 (or other criteria)
For each example (𝑥𝑖, 𝑡𝑖) in the training set do:
1. Compute the network output 𝑦𝑖 for this example
2. Compute the error between the output and target value: $E = \sum_k (t_i^k - o_i^k)^2$
3. Compute the gradient for all weight values, Δ𝑤𝑖𝑗
4. Update network weights with 𝑤𝑖𝑗 = 𝑤𝑖𝑗 − R ∗ Δ𝑤𝑖𝑗
End epoch
Auto-differentiation packages such as TensorFlow, Torch, etc. help!
Quick example in code:
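The slide only gestures at a code example, so here is a minimal PyTorch sketch of the same loop, where loss.backward() computes all the gradients Δ𝑤𝑖𝑗 by automatic differentiation (the layer sizes, toy data, learning rate, and epoch count are illustrative assumptions):

```python
import torch

# Toy data: 4 examples, 2 inputs, 1 target each (illustrative only).
X = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
t = torch.tensor([[0.], [1.], [1.], [0.]])

# Fully connected 2-3-1 network with sigmoid units.
model = torch.nn.Sequential(
    torch.nn.Linear(2, 3), torch.nn.Sigmoid(),
    torch.nn.Linear(3, 1), torch.nn.Sigmoid(),
)
opt = torch.optim.SGD(model.parameters(), lr=0.5)
loss_fn = torch.nn.MSELoss()

for epoch in range(5000):
    opt.zero_grad()
    loss = loss_fn(model(X), t)  # squared error between outputs and targets (mean form)
    loss.backward()              # autodiff computes dE/dw_ij for every weight
    opt.step()                   # w_ij <- w_ij - R * dE/dw_ij

# Predictions typically approach the targets after training.
print(model(X).detach().round())
```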
51. Comments on Training
• No guarantee of convergence; neural networks form non-convex functions with multiple local minima
• In practice, many large networks can be trained on large amounts of data for realistic problems.
• Many epochs (tens of thousands) may be needed for adequate training. Large data sets may require many hours of CPU time.
• Termination criteria: number of epochs; threshold on training-set error; no decrease in error; increased error on a validation set.
• To avoid local minima: several trials with different random initial weights, combined with majority or voting techniques
52. Over-training Prevention
• Running too many epochs and/or a NN with many hidden layers may lead to an overfit network
• Keep a held-out validation set and test accuracy after every epoch
• Early stopping: maintain the weights of the best-performing network on the validation set and return it when performance decreases significantly beyond that.
• To avoid losing training data to validation:
• Use 10-fold cross-validation to determine the average number of epochs that optimizes validation performance
• Train on the full data set using this many epochs to produce the final results
53. Over-fitting prevention
• Too few hidden units prevent the system from adequately fitting the data and learning the concept.
• Using too many hidden units leads to over-fitting.
• A similar cross-validation method can be used to determine an appropriate number of hidden units. (general)
• Another approach to preventing over-fitting is weight decay: all weights are multiplied by some fraction in (0,1) after every epoch.
• Encourages smaller weights and a less complex hypothesis
• Equivalently: change the error function to include a term for the sum of the squares of the weights in the network. (general)
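A minimal sketch of both views of weight decay (the decay fraction, penalty coefficient, and stand-in gradient below are illustrative assumptions, not from the slides):

```python
import numpy as np

W = np.random.default_rng(0).normal(size=(3, 3))  # some weight matrix
R, decay, lam = 0.1, 0.99, 0.01                   # learning rate, decay fraction, L2 coefficient
grad_E = np.ones((3, 3))                          # stand-in for dE/dW from backprop

# View 1: multiply all weights by a fraction in (0, 1) after every update.
W_decayed = decay * (W - R * grad_E)

# View 2: add (lam/2) * sum(W**2) to the error, so the gradient gains a lam * W term.
# The effect closely matches view 1 when decay is about 1 - R * lam.
W_penalized = W - R * (grad_E + lam * W)
```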
54. Dropout training
• Proposed by (Hinton et al., 2012)
• Each time, decide whether to delete each hidden unit with some probability 𝑝
56. Dropout training
• Model averaging effect
• Among 2^𝐻 models, with shared parameters
• 𝐻: number of units in the network
• Only a few get trained
• Much stronger than the known regularizer
• What about the input space?
• Do the same thing!
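A minimal sketch of a dropout mask applied to one layer's activations during training (the layer size, the value of 𝑝, and the "inverted" scaling choice are illustrative assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
h = rng.uniform(size=8)   # activations of a hidden layer (illustrative)
p = 0.5                   # probability of deleting a unit

# Training: delete each hidden unit independently with probability p,
# scaling the survivors so the expected activation is unchanged ("inverted" dropout).
mask = (rng.uniform(size=h.shape) >= p).astype(h.dtype)
h_train = h * mask / (1.0 - p)

# Prediction: use all units (no mask needed with inverted scaling).
h_test = h
print(mask, h_train)
```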
57. Recap: Multi-Layer Perceptrons
• Multi-layer network
• A global approximator
• Different rules for training it
• The Back-propagation
• Forward step
• Back propagation of errors
• Congrats! Now you know the most important algorithm in neural networks!
• Next Time:
• Convolutional Neural Networks
• Recurrent Neural Networks
[Figure: Input-Hidden-Output network with activations]