Artificial Neural Networks
An Introduction
Why study ANNs?
• To understand how the brain actually works
• To understand a type of parallel computation
• IBM’s “The Brain Chip” (1 million neurons and 256 million synapses)
• To solve practical problems
• Artificial Neural Networks should be good at things brains are good at
(vision, speech recognition)
and bad at things brains are bad at
(e.g., 32 * 71 = ???)
What neurons look like
Neuron – an electrically excitable cell that transmits information
 Dendrites receive signals from many other neurons
 These signals can be either excitatory or inhibitory
 The soma (cell body) processes this information
 Above a certain threshold, an electrical signal is fired down the axon
Our model will be simplified (a minimal sketch follows):
 Synapses -> Weighted inputs
 Soma -> An activation function
 Axon -> Outputs
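To make the mapping concrete, here is a minimal sketch (not part of the original slides) of a single artificial neuron in Python: weighted inputs stand in for the synapses, an activation function for the soma, and the returned value for the axon. The weights and inputs are made up for illustration.

```python
import math

def neuron(inputs, weights, bias, use_threshold=True):
    """One artificial neuron: weighted inputs -> activation -> output."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias  # the "soma" sums its inputs
    if use_threshold:
        return 1 if total >= 0 else 0            # fire only above the threshold
    return 1.0 / (1.0 + math.exp(-total))        # or squash with a sigmoid instead

print(neuron([1, 0, 1], [0.5, -1.0, 0.8], bias=-0.2))  # -> 1
```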
A Feed-Forward Neural Net (weighted connections)
What can NNs do?
• Image recognition
 MNIST handwritten digits
 Read reCAPTCHA better than humans do
• Speech recognition and NLP
• Answer the meaning of life
Using a recurrent neural net to predict the next character
• In 2011, Ilya Sutskever trained on 5 million strings of 100 characters each, taken from Wikipedia
• Training took one month on a GPU
• Once trained, the neural net predicts the next character in a sequence of characters
• He fed it the phrase “The meaning of life is” _______________
Ilya Sutskever, 2011
A Brief History of ANNs
• 1943 – McCulloch and Pitts show NN models can represent any Boolean function
• 1949 – Donald Hebb describes how learning might take place: “cells that fire together, wire together”
• 1959 – Rosenblatt’s perceptron can learn linearly separable data
• 1969 – Minsky & Papert criticize the perceptron
• 1970–1986 – The dark ages of neural networks (no funding)
• 1986 – Rumelhart, Hinton & Williams describe the backpropagation algorithm for training neural networks of arbitrary depth (derived earlier by Paul Werbos, 1974)
• 1997 – A. K. Dewdney: "Although neural nets do solve a few toy problems, their powers of computation are so limited that I am surprised anyone takes them seriously as a general problem-solving tool."
• Other techniques (Random Forests [1995] and Support Vector Machines [1995]) are considered state-of-the-art ML for classification problems
• 2006 – Second renaissance of neural networks, with new methods for training deep and recurrent NNs
[Figure: ANN scholarly publications per year (ln-normalized), 1930–2020, with the 1969 and 1986 turning points marked]
Frank Rosenblatt’s Perceptron, 1958
“the embryo of an electronic computer that [the Navy] expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence.”
The Perceptron
Weight space
Consider all the different sets of weights that will output the correct value for a 2-D input vector. Here, threshold = 0.
[Figure: an input vector with output value = 1 plotted in weight space, together with an example of a good weight vector and a bad weight vector]
NAND example
Input Data:
x1 x2 | y
 0  0 | 1
 1  0 | 1
 0  1 | 1
 1  1 | 0
One of many possible solutions:
NAND Decision Boundary
x1 x2 | y
 0  0 | 1
 1  0 | 1
 0  1 | 1
 1  1 | 0
One possible solution (a code sketch follows):
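As a concrete illustration of one possible solution (the weights below are hand-picked and only one of many that work), a single threshold unit computes NAND:

```python
def perceptron(x1, x2, w1=-2, w2=-2, b=3):
    """Binary threshold unit; these hand-picked weights realize NAND."""
    return 1 if w1 * x1 + w2 * x2 + b >= 0 else 0

for x1, x2 in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(x1, x2, "->", perceptron(x1, x2))   # 1, 1, 1, 0: the NAND truth table
```

The decision boundary is the line -2*x1 - 2*x2 + 3 = 0; only the input (1, 1) falls on the negative side.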
XOR Problem
Training Data:
x1 x2 | y
 0  0 | 0
 1  0 | 1
 0  1 | 1
 1  1 | 0
A single perceptron can only solve linearly separable problems
XOR Problem
(same training data as above)
Multiple layers of perceptrons solve the XOR problem, but Rosenblatt did not have a learning algorithm to set the weights
XOR Problem
(same training data as above)
[Figures: the points (1, 0), (1, 1), (1, 1), (0, 1), shown first with 2 weight planes and then separated by 1 weight plane]
A worked two-layer solution is sketched below.
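One standard way to wire this up (a sketch with hand-set weights, since no learning algorithm is involved) is a hidden layer computing NAND and OR, followed by an output unit computing AND of the two hidden units:

```python
def threshold_unit(x, w, b):
    """A perceptron-style binary threshold unit."""
    return 1 if sum(xi * wi for xi, wi in zip(x, w)) + b >= 0 else 0

def xor_net(x1, x2):
    h1 = threshold_unit((x1, x2), (-2, -2), 3)   # hidden unit 1: NAND
    h2 = threshold_unit((x1, x2), (1, 1), -1)    # hidden unit 2: OR
    return threshold_unit((h1, h2), (1, 1), -2)  # output unit: AND(h1, h2) = XOR(x1, x2)

for x in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(x, "->", xor_net(*x))   # 0, 1, 1, 0
```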
Sigmoid (Logistic Function)
• The sigmoid function is similar to the binary threshold function, but it is continuous
• It “squashes” its input to a value between 0 and 1
• Its derivative has a nice property – it is computationally inexpensive: σ'(z) = σ(z)(1 − σ(z)) (a sketch follows)
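A minimal sketch of the function and its inexpensive derivative:

```python
import math

def sigmoid(z):
    """Squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    """The cheap derivative: sigma'(z) = sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1 - s)

print(sigmoid(0.0), sigmoid_prime(0.0))   # 0.5 0.25
```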
Sigmoid Neurons
• We can “bake in” the bias by augmenting the input with an element, x0, that we set to a constant value (say, 1) for every sample
• The matching weight, w0, now represents the bias value
• With the bias “baked in,” the model has a simpler notation and is more computationally efficient (a small sketch follows)
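A small sketch of the trick (the symbols x0 and w0, and appending the constant at the end, are illustrative assumptions):

```python
import numpy as np

x = np.array([0.2, 0.7])      # original 2-D input
w = np.array([1.5, -0.4])     # weights
b = 0.3                       # separate bias term

x_aug = np.append(x, 1.0)     # augment every sample with a constant element x0 = 1
w_aug = np.append(w, b)       # the matching weight w0 now holds the bias

# Same pre-activation, but with a single dot product and no separate bias term
assert np.isclose(w @ x + b, w_aug @ x_aug)
```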
Forward propagation
Matrix notation is easier to read, and is used in production code (see the sketch below)
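A minimal sketch of forward propagation written with matrices, assuming a two-layer net with sigmoid activations (the layer sizes and random weights are made up for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    """One forward pass: each layer is a matrix-vector product plus a squashing."""
    h = sigmoid(W1 @ x + b1)   # hidden activations
    y = sigmoid(W2 @ h + b2)   # output activations
    return h, y

rng = np.random.default_rng(0)
x = rng.normal(size=3)                          # 3 inputs
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # 4 hidden units
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)   # 2 outputs
print(forward(x, W1, b1, W2, b2)[1])
```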
How to train a feed-forward net?
• There are several cost functions (cross-entropy, classification error, squared error)
• To measure the error on a sample we will use the squared error (a worked example follows)
• Major difficulty: we know what the output target is, but nobody tells us directly what the hidden units should be
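The slide's formula is an image, so here is the usual form of the squared error for a single training sample (notation assumed): E = 1/2 * sum_j (t_j - y_j)^2, where t is the target and y the network output.

```python
import numpy as np

def squared_error(y, t):
    """E = 1/2 * sum_j (t_j - y_j)^2 for one training sample."""
    return 0.5 * np.sum((t - y) ** 2)

print(squared_error(np.array([0.8, 0.1]), np.array([1.0, 0.0])))  # 0.025
```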
How to train a feed-forward net?
• Try: randomly perturb one weight and see if it improves performance
• But this is very, very slow
Backpropagation, 1986*
• The “backward propagation of errors” after forward propagation
• The cost for a single training sample is the squared error defined above
• If we calculate the error derivatives w.r.t. each weight, we can update the weights with gradient descent
Backpropagating errors
Step 0. Feed forward through the network
Step 1. Backpropagate the error derivative to each node
Step 2. Use the node deltas to compute the incoming weight derivatives
Backpropagation error derivatives
[Figure: the feed-forward and back-propagation passes for a network with a linear output neuron]
A code sketch of the full procedure follows.
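Putting the three steps together, here is a minimal sketch of one backpropagation update for a net with a sigmoid hidden layer and a linear output neuron trained on the squared error (layer sizes, learning rate, and variable names are illustrative, not the slides' notation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, t, W1, b1, W2, b2, lr=0.1):
    # Step 0: feed forward
    h = sigmoid(W1 @ x + b1)        # hidden activations
    y = W2 @ h + b2                 # linear output neuron

    # Step 1: backpropagate the error derivative to each node
    delta_out = y - t                                  # dE/dy for E = 1/2 (y - t)^2
    delta_hidden = (W2.T @ delta_out) * h * (1 - h)    # chain rule through the sigmoid

    # Step 2: node deltas -> derivatives of the incoming weights
    dW2, db2 = np.outer(delta_out, h), delta_out
    dW1, db1 = np.outer(delta_hidden, x), delta_hidden

    # Gradient-descent update
    return W1 - lr * dW1, b1 - lr * db1, W2 - lr * dW2, b2 - lr * db2

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)
W1, b1, W2, b2 = backprop_step(np.array([0.5, -1.0]), np.array([1.0]), W1, b1, W2, b2)
```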
Backpropagation can be used to train a neural net with which of the following activation functions?
• Logistic (sigmoid)
• Linear
• Binary threshold neurons (perceptron)
• Hyperbolic tangent ( tanh(z) = (e^z − e^−z) / (e^z + e^−z) )
Review Gradient Descent
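As a reminder, the update gradient descent applies to every weight is w ← w − η · ∂E/∂w, where η is the learning rate; a one-line sketch:

```python
def gradient_descent_step(weights, gradients, learning_rate=0.1):
    """Move each weight a small step against its error derivative."""
    return [w - learning_rate * g for w, g in zip(weights, gradients)]
```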
Selecting Hyper-Parameters
Generally, we use trial and error (with cross-validation) to select hyperparameters (a grid-search sketch follows):
• What learning rate?
• Momentum?
• How many layers?
• How many nodes per layer?
• Regularization coefficient?
• Activation function(s)?
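A hedged sketch of what "trial and error with cross-validation" often looks like in code: exhaustively try a (hypothetical) grid of settings and keep the one with the best cross-validated score. The parameter names and the score_fn callback are illustrative assumptions, not any specific library's API.

```python
from itertools import product

grid = {
    "learning_rate": [0.01, 0.1],
    "hidden_layers": [1, 2],
    "nodes_per_layer": [16, 64],
}

def grid_search(score_fn):
    """score_fn(params) should return, e.g., mean k-fold validation accuracy."""
    best_score, best_params = None, None
    for values in product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        score = score_fn(params)
        if best_score is None or score > best_score:
            best_score, best_params = score, params
    return best_score, best_params
```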
Regularization
• Without some form of regularization, large ANNs are prone to overfitting
• ANNs can approximate any function; they can fit the noise in the training data set
• One traditional solution is L2 regularization: we modify our error function by adding a penalty term (λ/2)·w² for every weight in the matrix (see the sketch below)
• L2 regularization drives the weights towards 0
• As the weights approach zero, the sigmoid function becomes more linear
• Recently, new forms of regularization have improved ANN learning
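A minimal sketch of the L2 penalty (the coefficient name lam is an illustrative choice): the error becomes E + (λ/2) Σ w², so during gradient descent each weight also feels a pull of λ·w back toward zero.

```python
import numpy as np

def l2_penalized_error(error, weight_matrices, lam=0.01):
    """Original error plus (lam / 2) * w^2 summed over every weight."""
    return error + 0.5 * lam * sum(np.sum(W ** 2) for W in weight_matrices)

def l2_gradient_term(W, lam=0.01):
    """Extra gradient contribution that shrinks each weight toward 0."""
    return lam * W
```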
New regularizers
• Force neurons to share weights
Learning Curves – Overfitting – Regularization
NN libraries
• Theano (Python)
• PyLearn2 (Python)
• Torch (Lua)
• Deep Learning Toolbox (MATLAB)
• Numenta (Python)
• nnet (R)
Walkthrough with ConvNetJS
Google Trends “Neural Network” searches
Google Trends “Random Forests” searches
Google Trends “Deep Learning” searches
Addressing ANNs’ weaknesses: Averaging many models
• Unlike random forests (which average many decision trees), training and averaging many neural network models has not been feasible
• Averaging models is important because it helps prevent overfitting
• Dropout (2012) provides a way to average many models without having to train them separately (a sketch follows)
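A minimal sketch of the idea during training (this uses the common "inverted dropout" rescaling, a detail not specified on the slide): each hidden unit is silenced with some probability, so every update effectively trains a different thinned network, and the rescaling keeps the expected activation unchanged.

```python
import numpy as np

def dropout(activations, p_drop=0.5, rng=np.random.default_rng()):
    """Randomly silence hidden units during training; rescale the survivors."""
    mask = rng.random(activations.shape) >= p_drop   # keep each unit with prob. 1 - p_drop
    return activations * mask / (1.0 - p_drop)

h = np.array([0.9, 0.1, 0.7, 0.4])
print(dropout(h))   # e.g. [1.8, 0.0, 1.4, 0.0] for one random mask
```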
Motivation for Deep Learning