Deep Learning With TensorFlow
Cat/Dog Classifier
[Figure sequence: example images fed into a deep-learning cat/dog classifier. Each slide shows the network's two confidence scores, e.g. CAT 1% / DOG 99% (confident dog), CAT 99% / DOG 1% (confident cat), CAT 45% / DOG 55% and CAT 55% / DOG 45% (uncertain), CAT 75% / DOG 25%.]
Topic Ordering (Logistics)
• Artificial neural networks have many components
• They all come together to make something useful
• It is hard to figure out the best ordering of topics
• If something is not clear, please ask!
Neuron (Inspiration)
[Figure: diagram of a biological neuron (dendrites, cell body/nucleus, axon).]
Source: http://webspace.ship.edu/cgboer/neuron.gif
Number of Neurons
Animal              Number of Neurons
Common Jellyfish    5,600
Ant                 250,000
Frog                16,000,000
Cat                 760,000,000
Human               86,000,000,000
African Elephant    257,000,000,000
Source: https://en.wikipedia.org/wiki/List_of_animals_by_number_of_neurons
Current Network Architectures
• Blob Size = Number of Parameters (connections between neurons)
ResNet-152 has 167,552 neurons
[Figure: bubble chart of current architectures; blob size = number of parameters.]
Combining Neurons Into Layers
Backpropagation
(How the network learns)
• How the network learns useful weights
• We will not go into depth on how it works
• You don't need to know it to use off-the-shelf components in TensorFlow (see the sketch below)
• You do need to know it if you want to implement custom layers
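To see how little code backpropagation requires in practice, here is a minimal sketch (TensorFlow 1.x), assuming the "loss" variable defined later in these slides:

# A sketch: the optimizer computes all gradients by backpropagation
# and updates the weights; no manual gradient code is needed.
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(loss)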
Training Cat/Dog Classifier
[Figure sequence: the cat/dog classifier at increasing "% trained". Early on, confidences hover near chance (e.g. CAT 48% / DOG 52%); as training progresses they shift toward confident, correct answers (e.g. CAT 8% / DOG 92% on a dog image).]
DO THIS TENS/HUNDREDS OF THOUSANDS OF TIMES
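Before turning to TensorFlow, here is that loop as a conceptual sketch in plain Python. The names next_training_batch, network, compare, and adjust_weights are hypothetical placeholders, not real APIs:

# Conceptual sketch of training: repeat predict / measure error /
# nudge weights, tens or hundreds of thousands of times.
for step in range(100000):
    images, labels = next_training_batch()   # hypothetical helper
    predictions = network(images)            # forward pass
    error = compare(predictions, labels)     # how wrong were we?
    adjust_weights(error)                    # backpropagation step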
GPUs! (Aside/Optional)
• Each neuron can be computed in parallel (independently)
• GPUs have hundreds (even thousands) of relatively weak cores
• Nvidia has a virtual monopoly (thanks to the CUDA Toolkit)
• Nvidia supplies AWS, Azure, etc.
• Google signed a deal with AMD (but also uses Nvidia)
TensorFlow!
• Google’s publicly available deep learning library
• In competition with Caffe, Torch, etc.
• Becoming more and more popular in industry
Source: https://ml4a.github.io/ml4a/looking_inside_neural_nets/
Define Our Model
• Each pixel will have 10 associated weights (1 for each class)
• There are 784 pixels in each image
• Our weight matrix will have dimensions 784 by 10
[Figure: the 10 output classes, digits 0-9.]
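A minimal data-loading sketch, assuming the classic TensorFlow 1.x MNIST tutorial loader (these are the 28 x 28 = 784-pixel digit images that motivate the dimensions above):

# Each image is 28x28 = 784 pixels, flattened into a vector;
# labels are one-hot over the 10 digit classes.
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
print(mnist.train.images.shape)  # (55000, 784)
print(mnist.train.labels.shape)  # (55000, 10)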
Artificial Neuron (Simplified)
[Figure: inputs x1 ... x784 (scalar values) are each multiplied by a weight w1 ... w784 (scalar values, initialized randomly); the products are summed and passed through a softmax, giving an output value < 1: a percentage for one class, e.g. "six". There are 10 of these neurons, one per class.]
Input Variable
x = tf.placeholder(tf.float32, [None, 784])
• Creates a placeholder variable “x”
• “x” doesn’t have a specific value yet
• It’s just a variable, like in math
• A placeholder for our input images
• It is of type “TensorFlow Float 32”
• It has shape “None” by 784
• None means the first dimension can have any length
• 784 is the size of one image
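A sketch of how a placeholder gets its value: nothing happens until the graph is run with a feed_dict (the batch here is a zero-filled stand-in, not real data):

import numpy as np

batch = np.zeros((100, 784), dtype=np.float32)  # stand-in batch of 100 images
with tf.Session() as sess:
    # x computes nothing on its own; feeding it supplies the value:
    result = sess.run(x, feed_dict={x: batch})
    print(result.shape)  # (100, 784) -- None became 100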
Network Variables (Weights)
W = tf.Variable(tf.zeros([784,10]))
• Creates a variable W (for “weight”) of size 784 by 10
• All elements of W are set to 0
• Unlike “placeholder”, Variable contains determined values
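Note that the diagram said the weights are initialized randomly, while this code uses zeros; zeros happen to work for a single softmax layer. A random alternative would look like this (a sketch, not the slide's code):

# Random initialization, as in the diagram; deeper networks need this,
# since all-zero weights would make every neuron learn the same thing.
W = tf.Variable(tf.truncated_normal([784, 10], stddev=0.1))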
Artificial Neuron (Simplified)
[Figure: the same neuron as before, now with a bias weight b (scalar) attached to a constant input that is literally the value one.]
Network Variables (Biases)
b = tf.Variable(tf.zeros([10]))
• Creates a variable b (for “bias”) of size 10 (by 1)
• All elements of b are set to 0
• Unlike “placeholder”, Variable contains determined values
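One step the snippets leave implicit: Variables hold state, and that state must be initialized inside a session before anything reads it. A minimal sketch:

# Run this once before using W or b; until then they have no values.
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    print(sess.run(b))  # [0. 0. 0. ... 0.] -- ten zeros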
Network Output Variables
y = tf.nn.softmax(tf.matmul(x, W) + b)
• tf.matmul(x, W) performs a matrix multiplication between the input variable “x” and the weight variable W
• tf.matmul(x, W) + b adds the bias variable
• tf.nn.softmax(tf.matmul(x, W) + b) performs the softmax operation
• y will have dimension None by 10
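A shape-checking sketch (assuming x, W, b, and y as defined above): [None, 784] times [784, 10] gives [None, 10], the bias broadcasts across the batch, and softmax turns each row into a probability distribution:

import numpy as np

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    out = sess.run(y, feed_dict={x: np.ones((3, 784), dtype=np.float32)})
    print(out.shape)        # (3, 10) -- None became 3
    print(out.sum(axis=1))  # [1. 1. 1.] -- each row sums to 1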
Ground Truth Output Variables
yTruth = tf.placeholder(tf.float32, [None, 10])
• Creates a placeholder variable “yTruth”
• “yTruth” doesn’t have a specific value yet
• It’s just a variable, like in math
• A placeholder for the ground-truth one-hot label outputs
• It is of type “TensorFlow Float 32”
• It has shape “None” by 10
• None means the first dimension can have any length
• 10 is the number of classes
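For concreteness, a one-hot label sketch: the digit 6 becomes a 10-element vector that is all zeros except at index 6:

import numpy as np

label = np.eye(10, dtype=np.float32)[6]
print(label)  # [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]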
Loss Variable
loss = tf.reduce_mean(-tf.reduce_sum(yTruth * tf.log(y), reduction_indices=1))
• tf.log(y) maps values close to 1 to values close to 0, and values close to 0 to values close to –infinity
• yTruth * tf.log(y) keeps only the value for the actual class
• -tf.reduce_sum(yTruth * tf.log(y), reduction_indices=1) sums along the class dimension (mostly 0’s) and fixes the sign
• tf.reduce_mean( … ) averages the vector into a scalar
Loss Variable Example
[Figure: worked example showing a batch of predictions y and one-hot ground truth yTruth; log(y) is taken element-wise, multiplied by yTruth, summed across labels with the sign flipped, then averaged into a scalar loss.]
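A worked numeric sketch of this loss for a single example (the probabilities are made up): the true class is the digit 2, and the network assigns it probability 0.70:

import numpy as np

y_pred = np.array([0.05, 0.05, 0.70, 0.02, 0.03,
                   0.05, 0.02, 0.03, 0.03, 0.02])
y_truth = np.array([0, 0, 1, 0, 0, 0, 0, 0, 0, 0])  # one-hot: digit 2

# yTruth * log(y) keeps only the log-probability of the true class;
# the minus sign turns a confident, correct prediction into a small loss.
loss = -np.sum(y_truth * np.log(y_pred))
print(loss)  # ~0.357, i.e. -log(0.70)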