08 neural networks
Legal Notices and Disclaimers
This presentation is for informational purposes only. INTEL MAKES NO WARRANTIES,
EXPRESS OR IMPLIED, IN THIS SUMMARY.
Intel technologies’ features and benefits depend on system configuration and may require
enabled hardware, software or service activation. Performance varies depending on system
configuration. Check with your system manufacturer or retailer or learn more at intel.com.
This sample source code is released under the Intel Sample Source Code License
Agreement.
Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.
*Other names and brands may be claimed as the property of others.
Copyright © 2018, Intel Corporation. All rights reserved.
08 neural networks
Neural Networks
A fancy, tunable way to get an f when given data and a target.
• That is, f(data) → target
Neural Network Example: OR Logic
A logic gate takes in two Boolean (true/false or 1/0) inputs.
Returns either a 0 or 1, depending on its rule.
The truth table for a logic gate shows the outputs for each combination of inputs.
Truth Table
For example, let's look at the truth table for an Or-gate:
OR as a Neuron
A neuron that uses the sigmoid activation function outputs a value in (0, 1).
This naturally leads us to think about Boolean values.
Imagine a neuron that takes in two inputs, x1 and x2, and a bias term:
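To make this concrete, here is a minimal NumPy sketch of a sigmoid neuron acting as an OR gate; the weights and bias below are illustrative assumptions (not values from the slides), chosen so the output matches the OR truth table.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hand-picked (assumed) weights and bias so the neuron behaves like OR:
# z is strongly negative only when both inputs are 0.
w = np.array([20.0, 20.0])
b = -10.0

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    z = w @ np.array([x1, x2]) + b
    print(x1, x2, round(float(sigmoid(z))))  # 0, 1, 1, 1 -- the OR truth table
```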
08 neural networks
Nodes
Nodes are the primitive elements.
out = activation(f(in) + bias)
z = a(b + Σ_{i=1}^{m} W_i · x_i) = a(W^T x + b)
[Diagram: inputs x1, x2 and a bias input +1 feed into the sum Z, which passes through the activation a to give out = a(Z)]
Classic Visualization of Neurons
[Diagram: inputs x1, x2 and a bias neuron (constant 1) connect by arrows to the sum Z, followed by the activation function a, giving out = a(Z); the weights are shown as the arrows in classical visualizations of NNs]
Classic Visualization of Neurons
[Diagram: the same neuron with the arrows labeled: weight W1 on x1, weight W2 on x2, and b on the bias neuron (constant 1); Z then a, giving out = a(Z)]
Training
z is a dot product between the inputs and the weights of the node.
• The training cost is typically a sum-of-squares error.
We initialize the weights with constants and/or random values.
Learning is the process of finding good weights.
Activation Function: Sigmoid
Model inspired by biological neurons.
Biological neurons either pass no signal, full signal, or something in between.
Want a function that is like this and has an easy derivative.
Activation Function: Sigmoid
σ(z) = 1 / (1 + e^(−z))
• Value at z ≪ 0? ≈ 0
• Value at z = 0? = 0.5
• Value at z ≫ 0? ≈ 1
Activation Function: Sigmoid
[Diagram: inputs x1 = 0.5 and x2 = 1.0 with weights -40 and 5, plus a bias input +1 with weight 5; z = 0.5·(-40) + 1.0·5 + 1·5 = -10, so σ(z) = σ(-10) ≈ 0.0 and the neuron outputs ≈ 0.0]
Activation Function: ReLU
Many modern networks use rectified linear units (ReLU)
ReLU(z) = max(0, z), i.e., 0 for z < 0 and z for z ≥ 0
• Value at z ≪ 0? = 0
• Value at z = 0? = 0
• Value at z ≫ 0? = z
Activation Function: ReLU
ReLU(z) = max(0, z)
[Plot: ReLU is flat at 0 for z < 0 and the identity line for z ≥ 0]
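As a quick sketch, both activations can be written in a few lines of NumPy and checked at the three regimes asked about above:

```python
import numpy as np

def sigmoid(z):
    # sigma(z) = 1 / (1 + e^(-z)): ~0 for z << 0, 0.5 at z = 0, ~1 for z >> 0
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    # ReLU(z) = max(0, z): 0 for z < 0, z for z >= 0
    return np.maximum(0.0, z)

z = np.array([-10.0, 0.0, 10.0])
print(sigmoid(z))  # [~0.0, 0.5, ~1.0]
print(relu(z))     # [ 0.0, 0.0, 10.0]
```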
Layers and Networks
Inputs don’t need to be limited to passing data into a single neuron.
They can pass data to as many neurons as we like.
[Diagram: inputs x1, x2 and bias +1 feeding two activation neurons]
Layers and Networks
Typically, neurons are grouped into layers.
Each neuron in the layer receives input from the same neurons.
Weights are different for each neuron
All neurons in this layer output to the same neurons in a subsequent layer.
Layers and Networks: Input/Output Layers
Input layer depends on:
• Form of raw data
• First level of our internal network architecture
Output layer depends on:
• Last layer of our internal network architecture
• Type of prediction we want to make
• Regression versus classification
Layers and Networks: Input/Output Layers
[Diagram: inputs x1, x2 and bias +1 feeding two neurons a1 and a2]
a1 and a2 receive the same x1 value, but having different weights means a1 and a2 respond differently.
Feed Forward Neural Network
[Diagram: inputs x1, x2, x3 connected by weights to two layers of activation neurons and then to outputs y1, y2, y3]
Feed Forward Neural Network
[Diagram: the same network with layers labeled: the Input Layer (x1, x2, x3), two Hidden Layers of activation neurons, and the Output Layer (y1, y2, y3)]
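As a rough illustration of the forward pass through such a network, here is a NumPy sketch; the layer sizes, sigmoid activations, and random weights are assumptions for demonstration, not the network pictured:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dense(x, W, b):
    # One fully connected layer: activation(W^T x + b), applied element-wise
    return sigmoid(W.T @ x + b)

# 3 inputs -> 4 hidden -> 4 hidden -> 3 outputs (sizes are illustrative)
x = rng.normal(size=3)
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 4)), np.zeros(4)
W3, b3 = rng.normal(size=(4, 3)), np.zeros(3)

h1 = dense(x, W1, b1)
h2 = dense(h1, W2, b2)
y = dense(h2, W3, b3)
print(y.shape)  # (3,)
```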
08 neural networks
Optimization and Loss: Gradient Descent
We will start with the cost function: J(x) = x²
• Cost is what we pay for an error
• For example, an error of -3 gives a cost of 9
The gradient of x² is 2x.
Select data points to generate a gradient slope line.
Plot x² with the gradient slope and annotations.
We want the lowest cost.
Gradient Descent: Starting From Left Side
[Figures: successive gradient descent steps on J(x) = x², starting to the left of the minimum]
Gradient Descent: Starting From Right Side
[Figure: the same descent starting to the right of the minimum]
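What those plots show can be sketched in a few lines of Python, assuming the cost J(x) = x² and gradient 2x from the previous slide; the starting points and learning rate are illustrative:

```python
def gradient_descent(x, lr=0.1, steps=20):
    # J(x) = x**2, dJ/dx = 2*x; repeatedly step against the gradient
    for _ in range(steps):
        x = x - lr * (2 * x)
    return x

print(gradient_descent(-4.0))  # starting from the left side  -> close to 0
print(gradient_descent(+4.0))  # starting from the right side -> close to 0
```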
Process of Gradient Descent: Math
1. Find the gradient with respect to weights over training data.
• Plug data into our derivative function and sum up over data points:
ΔW = Σ_{i=1}^{n} ∂J/∂W(x_i, y_i)
∂J/∂W (the derivative of MSE) = (1/n) Σ_{i=1}^{n} x_i (ŷ_i − y_i)
ΔW is the number we’ll use to adjust the weight.
Process of Gradient Descent: Math
2. Adjust the weight by subtracting some amount of ΔW:
W := W − α · ΔW
• α (alpha) is known as the learning rate, a hyper-parameter we choose.
• The minus sign adjusts W in the correct direction.
3. Repeat until the model is done training.
• We can also adjust the learning rate as we train.
Adjusting the Learning Rate
[Figure: cost J plotted against weight W; at a point where ∂J/∂W < 0, the step α · ΔW moves W toward the minimum]
Adjusting the Learning Rate: Bigger α
[Figure: the same cost curve; a bigger α gives a larger step α · ΔW]
Adjusting the Learning Rate: Smaller α
[Figure: the same cost curve; a smaller α gives a smaller step α · ΔW]
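Putting steps 1-3 together, here is a sketch of the update rule for a single weight with a linear prediction ŷ = W·x; the synthetic data, learning rate, and number of steps are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 3.0 * x + 0.1 * rng.normal(size=100)   # made-up data with true W close to 3

W = 0.0          # initial weight
alpha = 0.1      # learning rate, a hyper-parameter we choose

for _ in range(50):
    y_hat = W * x
    delta_W = np.mean(x * (y_hat - y))     # derivative of MSE with respect to W
    W = W - alpha * delta_W                # minus moves W in the correct direction

print(W)  # close to 3
```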
Batches
How much data do we use for one training step?
• One training step takes us from old network weights to new network weights
We could use ALL of the examples at one time.
• Terrible performance, if it is even possible
• We'd constantly be swapping memory to slow disks
We could use one example at a time.
• Also terrible performance
• It doesn't take advantage of caching, vectorized operations, and so on
• We want a batch size that is well suited to vectorized operations
Batching
How much data do we use for one training step?
• One training step takes us from old network weights to new network weights
Options
• Full batch
• Update weights after considering all data in batch
• Mini-batch
• Update weights after considering part of batch, repeat
• Approximating the gradient
• Can help with local minima
Batching
Options continued…
• Stochastic gradient descent (SGD)
• A mini-batch of size 1
• Also called online training
• Very noisy updates, but each one is very easy to compute
• With a big network, performance comes from many weights
Comparing Full Batch, Mini Batch, and SGD
• Stochastic (SGD): batch size 1; uses a single example per step.
• Mini batch: batch size between 1 and N; uses a small portion of the training data per step.
• Full batch: batch size N; uses all of the training data per step.
Epoch
One epoch is one pass through the entire dataset.
• Generally, the dataset is too big for system memory.
• Can't do this all in one go
General measure of the amount of training.
• How many epochs did I perform?
Shuffling Datasets for Epochs
After each epoch, shuffle the training data.
Prevents resampling in the exact same way.
• Different epochs sample data in different ways.
So…
Shuffle, make batches, repeat.
Splitting Data Up Into Batches
[Diagram: the full dataset (FULL BATCH) split into Batch 1 through Batch 5; a full-batch step uses all of them at once in Step 1]
Splitting Data Up Into Batches
[Diagram: Batch 1 through Batch 5 processed one per step, Steps 1 through 5; after Step 5 the first epoch is completed]
Shuffle Data
[Diagram: after shuffling, the data is re-split into new batches and training continues with Step 6]
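A sketch of this shuffle-then-batch loop in NumPy; the dataset, batch size, and the placeholder training step are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))   # placeholder dataset: 1000 examples, 10 features
batch_size = 200                  # 5 mini-batches per epoch, as in the diagrams

for epoch in range(3):
    order = rng.permutation(len(X))          # shuffle the data for each epoch
    for start in range(0, len(X), batch_size):
        batch = X[order[start:start + batch_size]]
        # ... one training step: compute gradients on `batch`, update weights ...
    print(f"epoch {epoch + 1} completed")
```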
08 neural networks
Special Issues With Overfitting
Very simple neural network architectures can approximate arbitrarily complex functions
very well.
• Consequence of the universal approximation theorem
• Three layers, a finite number of nodes → arbitrarily good approximation
• Although better approximations may require the number of nodes to be very large
Even simple neural networks are, in some sense, too powerful.
Special Issues With Overfitting
Many architectures easily overfit data.
• Simply chugging through the data over and over leads to overfitting.
• The network memorizes the data but doesn't learn the generality.
• It is easily misled by noise.
Traditionally, we control this by monitoring the performance on a test set.
• As long as it improves, we're good.
• When it starts going the wrong way, we stop.
Special Issues With Overfitting
Modern method uses a technique called dropout:
• Here we randomly have nodes disappear from the network.
• Everyone else still has to perform.
The overall network has to be more robust.
• Single nodes can't be too important.
• The nodes can't all be highly correlated with one another.
• Different nodes must respond to different stimuli
Dropout Model
Knocking Out and Rescaling Neurons
• During training, we randomly drop each neuron with probability 1 − p.
• When running the model, we scale the outputs of the neuron by p.
• This ensures that the expected value of the weights stays the same at run time.
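A NumPy sketch of this knock-out-and-rescale scheme; note that many libraries implement the equivalent "inverted" dropout, which rescales during training instead of at run time:

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.8  # probability of KEEPING a neuron; each neuron is dropped with prob 1 - p

def dropout_train(activations):
    # During training: randomly zero each neuron with probability 1 - p
    mask = rng.random(activations.shape) < p
    return activations * mask

def dropout_infer(activations):
    # At run time: keep every neuron but scale outputs by p,
    # so the expected value matches training
    return activations * p

a = np.ones(10)
print(dropout_train(a))  # some entries knocked out to 0
print(dropout_infer(a))  # every entry scaled to 0.8
```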
Concept of a “Pseudo-Ensemble”
An Example Model
08 neural networks
Multilayer Perceptron (MLP)
[Diagram: a fully connected network with inputs x1, x2, x3, two hidden layers of activation neurons, and outputs y1, y2, y3]
MLP: General Process
1. Shuffle the data and split between train and test sets
2. Flatten the data
3. Convert class vectors to binary class matrices
4. Generate network architecture
5. Display network architecture
6. Define learning procedure
7. Fit model
8. Evaluate
MLP
Trains a simple MLP with dropout on the MNIST* dataset.
Gets to 98.40 percent test accuracy after 20 epochs.
• There is a lot of margin for parameter tuning
• 0.2 seconds per epoch on a K520 GPU
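A minimal Keras sketch of such an MLP, following the general process above; the layer sizes, dropout rates, and optimizer are assumptions in the spirit of the standard Keras MNIST MLP example, not necessarily the exact model behind the numbers quoted:

```python
from tensorflow import keras

# 1-3: load, flatten, and one-hot encode MNIST
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_test = x_test.reshape(-1, 784).astype("float32") / 255.0
y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)

# 4-5: network architecture (sizes and dropout rates are assumptions)
model = keras.Sequential([
    keras.layers.Dense(512, activation="relu", input_shape=(784,)),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(512, activation="relu"),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(10, activation="softmax"),
])
model.summary()

# 6-8: define learning procedure, fit, evaluate
model.compile(loss="categorical_crossentropy", optimizer="rmsprop", metrics=["accuracy"])
model.fit(x_train, y_train, batch_size=128, epochs=20, validation_data=(x_test, y_test))
print(model.evaluate(x_test, y_test))
```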
Convolution Neural Networks (CNN)
Good to use when you have:
• Translational variance (the same feature can appear at different positions in the input)
• A huge number of parameters
We need to train models on translated data.
CNN: General Process
Trains a simple convnet on the MNIST* dataset.
Gets to 99.25 percent test accuracy after 12 epochs.
• There is still a lot of margin for parameter tuning
• 0.16 seconds per epoch on a GRID K520 GPU
CNN
1. Shuffle dataset and split between train and test sets
2. Maintain grid structure of data
• Add a dimension to account for the single-channel images
3. Convert class vectors to binary class matrices
4. Define architecture
5. Define learning procedure
6. Fit model
7. Evaluate
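A minimal Keras sketch of those steps; the filter counts, kernel sizes, and optimizer are assumptions modeled on the standard Keras MNIST convnet example:

```python
from tensorflow import keras

# 1-3: load MNIST, keep the 2D grid, add a channel dimension, one-hot encode labels
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 28, 28, 1).astype("float32") / 255.0
x_test = x_test.reshape(-1, 28, 28, 1).astype("float32") / 255.0
y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)

# 4: architecture (filter counts and sizes are assumptions)
model = keras.Sequential([
    keras.layers.Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),
    keras.layers.Conv2D(64, (3, 3), activation="relu"),
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Dropout(0.25),
    keras.layers.Flatten(),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(10, activation="softmax"),
])

# 5-7: learning procedure, fit, evaluate
model.compile(loss="categorical_crossentropy", optimizer="adadelta", metrics=["accuracy"])
model.fit(x_train, y_train, batch_size=128, epochs=12, validation_data=(x_test, y_test))
print(model.evaluate(x_test, y_test))
```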
CNN: Kernels
Like our image processing kernels, but we learn their weightings
• Instead of assuming Gaussian, we let the data determine the weights.
Example: 3 x 3
Input:
3 2 1
1 2 3
1 1 1
Kernel:
-1 0 1
-2 0 2
-1 0 1
Output: computed on the next slides
Kernel Math
Input:
3 2 1
1 2 3
1 1 1
Kernel:
-1 0 1
-2 0 2
-1 0 1
Output:
= (3 * -1) + (2 * 0) + (1 * 1) + (1 * -2) … and so on.
Kernel Math
Input:
3 2 1
1 2 3
1 1 1
Kernel:
-1 0 1
-2 0 2
-1 0 1
Output:
= (3 · -1) + (2 · 0) + (1 · 1) + (1 · -2) + (2 · 0) + (3 · 2) + (1 · -1) + (1 · 0) + (1 · 1)
= -3 + 1 - 2 + 6 - 1 + 1
= 2
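The same arithmetic, checked in NumPy (a sketch of one output position; a framework's convolution layer would slide this over the whole image):

```python
import numpy as np

image_patch = np.array([[3, 2, 1],
                        [1, 2, 3],
                        [1, 1, 1]])
kernel = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]])

# Element-wise multiply the patch by the kernel and sum: one output value
print(np.sum(image_patch * kernel))  # 2
```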
Same Process, Larger Dataset
CNN: Pooling Layers
Combine neighboring pixels (for example, by taking their max or average).
Reduce the dimensions of the inputs (height and width).
No parameters!
CNN: Pooling Layers
[Figures: worked pooling examples; the last one shows an average pool over the whole layer]
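A NumPy sketch of 2 x 2 max pooling with stride 2 on a made-up 4 x 4 input; note there are no learned parameters:

```python
import numpy as np

x = np.array([[1, 3, 2, 4],
              [5, 6, 1, 0],
              [7, 2, 9, 8],
              [3, 4, 6, 5]])

# 2 x 2 max pooling with stride 2: halves both height and width
pooled = x.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)
# [[6 4]
#  [7 9]]
```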
LeNet*: Example CNN Architecture
Use convolutions to learn features on image data.
• Used on the MNIST* dataset
Input: 28 x 28, with two pixels of padding (on all sides)
Convolution size: 5 x 5
LeNet*
C1 layer depth: 6
S2 pooling: 2 x 2
Convolution size: 5 x 5
C3 layer depth: 16
S4 pooling: 2 x 2
Flatten from 5 x 5 x 16 to 400 x 1
Fully connected layer: from 400 to 120
Fully connected layer: from 120 to 84
Fully connected layer: from 84 to 10
Softmax
Table Description of LeNet*-5
Layer | Name | Parameters
1 | Convolution | 5 x 5, stride 1, padding 2 (‘SAME’)
2 | Max pool | 2 x 2, stride 2
3 | Convolution | 5 x 5, stride 1, padding 2 (‘SAME’)
4 | Max pool | 2 x 2, stride 2
5 | Fully connected (ReLU) | Depth: 120
6 | Fully connected (ReLU) | Depth: 84
7 | Output (fully connected ReLU) | Depth: 10
What’s the Point? Count Parameters
Conv1: 1*6*5*5 + 6 = 156
Pool2: 0
Conv3: 6*16*5*5 + 16 = 2416
Pool4: 0
FC1: 400*120 + 120 = 48120
FC2: 120*84 + 84 = 10164
FC3: 84*10 + 10 = 850
Total: 61706
Less than a single FC layer with [1200 x 1200] weights!
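The same counts can be reproduced with a few lines of Python, using the rule that a convolution has in_channels · out_channels · k · k weights plus out_channels biases, and a fully connected layer has in · out weights plus out biases:

```python
def conv_params(c_in, c_out, k):
    return c_in * c_out * k * k + c_out

def fc_params(n_in, n_out):
    return n_in * n_out + n_out

layers = [
    conv_params(1, 6, 5),     # Conv1: 156
    0,                        # Pool2
    conv_params(6, 16, 5),    # Conv3: 2416
    0,                        # Pool4
    fc_params(400, 120),      # FC1: 48120
    fc_params(120, 84),       # FC2: 10164
    fc_params(84, 10),        # FC3: 850
]
print(sum(layers))            # 61706
```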
What’s the Point? CNN Learns Features!
Layers replace manual image processing, transforming, and feature extraction!
• For example, a slightly different architecture called AlexNet has a layer that essentially performs Sobel filtering.
• Edge detection as a layer
• See: http://cs231n.github.io/assets/cnnvis/filt1.jpeg
Nodes
[Diagram: a computational graph where the Inputs and a variable W feed a MATMUL node, a variable b feeds an Add node, and the result flows into an Activation node]
The Activation node represents the activation function a = f(z).
Nodes
[Diagram: the same computational graph, annotated]
• X: [m x 1] vector of inputs
• W: [m x 1] vector of weights
• The result of MATMUL is a scalar
• The bias is a scalar
• The add operation outputs z
• The activation function applies a non-linear transformation and passes the result along to the next layer
Batched Nodes
• X: [n x m] matrix of inputs (batched)
• W: [m x 1] vector of weights
[Diagram: the same computational graph with batched inputs]
• The result of MATMUL is a vector (one entry for each example)
• The bias is a scalar (each prediction gets the same bias added)
• The add operation outputs z as a vector, one entry for each example
• The activation is a vector, one entry for each input example
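A NumPy sketch of this batched node; the batch size n, feature count m, and sigmoid activation are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 32, 10                      # n examples in the batch, m features each

X = rng.normal(size=(n, m))        # [n x m] matrix of batched inputs
W = rng.normal(size=(m, 1))        # [m x 1] vector of weights
b = 0.5                            # scalar bias, added to every example

z = X @ W + b                      # MATMUL then Add: one z per example, shape (n, 1)
activation = 1.0 / (1.0 + np.exp(-z))  # sigmoid applied element-wise, shape (n, 1)
print(z.shape, activation.shape)   # (32, 1) (32, 1)
```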
Editor's Notes
  • #21: A single neuronal layer.
  • #22: A single neuronal layer.
  • #49: [ref: tf/wk4 70-81]
  • #50: Note: The right network is during training; a final network is mathematically, “not computationally”, reconstructed from all of the partial networks.
  • #58: Note: Having many parameters (in the model) can occur because you have a large input space (like pixels in images) and/or the network architecture has many connections (fully connected layers, for example).
  • #61: ref tf/wk5: 8-15 + notebook/gif
  • #65: ref tf/wk5: 31-35
  • #66: ref tf/wk5: 31-35
  • #67: ref tf/wk5: 31-35
  • #68: ref tf/wk5: 31-35
  • #70: Note: In the paper, the model uses a more complex parameter-based pooling operation. Max/average pooling turns out to work better in practice.
  • #72: Two important points: Way fewer parameters (weights) and the convolution layers maintain 2D structure of the images.
  • #73: Two important points: Way fewer parameters (weights) and the convolution layers maintain 2D structure of the images.