
FUNDAMENTALS OF DEEP LEARNING
Part 1: An Introduction to Deep Learning

WELCOME!
THE GOALS OF THIS COURSE

• Get you up and on your feet quickly
• Build a foundation to tackle a deep learning project right away
• We won’t cover the whole field, but we’ll get a great head start
• A foundation from which to read articles, follow tutorials, and take further classes
AGENDA

Part 1: An Introduction to Deep Learning
Part 2: How a Neural Network Trains
Part 3: Convolutional Neural Networks
Part 4: Data Augmentation and Deployment
Part 5: Pre-trained Models
Part 6: Advanced Architectures

AGENDA – PART 1

• History of AI
• The Deep Learning Revolution
• What is Deep Learning?
• How Deep Learning is Transforming the World
• Overview of the Course
• First Exercise

HAVE FUN!
HUMAN VS MACHINE LEARNING
Relaxed Alertness

Human              Machine
Rest and Digest    Training
Fight-or-flight    Prediction
LET’S GET STARTED

HISTORY OF AI
BEGINNING OF ARTIFICIAL INTELLIGENCE

COMPUTERS ARE MADE IN PART TO COMPLETE TASKS
EARLY ON, GENERALIZED HUMAN INTELLIGENCE LOOKED POSSIBLE
IT TURNED OUT TO BE HARDER THAN EXPECTED
EARLY NEURAL NETWORKS

Inspired by biology

Created in the 1950s

Outclassed by the Von Neumann architecture
EXPERT SYSTEMS

Highly complex

Programmed by hundreds of engineers

Rigorous programming of many rules


EXPERT SYSTEMS - LIMITATIONS

What are these three images?

HOW DO CHILDREN LEARN?

• Expose them to lots of data
• Give them the “correct answer”
• They will pick up the important patterns on their own

THE DEEP LEARNING REVOLUTION
DATA

- Networks need a lot of information to learn from
- The digital era and the internet have supplied that data
COMPUTING POWER
Need a way for our artificial “brain” to observe lots of data
within a practical amount of time.

THE IMPORTANCE OF THE GPU

(Side-by-side images: a rendered image and a neural network.)

WHAT IS DEEP LEARNING?
DEEP LEARNING FLIPS TRADITIONAL PROGRAMMING ON ITS HEAD
TRADITIONAL PROGRAMMING
Building a Classifier

1. Define a set of rules for classification
2. Program those rules into the computer
3. Feed it examples, and the program uses the rules to classify
MACHINE LEARNING
Building a Classifier

1. Show the model examples with the answer of how to classify
2. The model takes guesses, and we tell it if it’s right or not
3. The model learns to correctly categorize as it trains; the system learns the rules on its own
THIS IS A FUNDAMENTAL SHIFT
WHEN TO CHOOSE DEEP LEARNING

Classic Programming: if the rules are clear and straightforward, it is often better to just program them.

Deep Learning: if the rules are nuanced, complex, or difficult to discern, use deep learning.
DEEP LEARNING COMPARED TO OTHER AI

Depth and complexity of networks

Up to billions of parameters (and growing)

Many layers in a model

Important for learning complex rules


HOW DEEP LEARNING IS TRANSFORMING THE WORLD
COMPUTER VISION

ROBOTICS AND MANUFACTURING · OBJECT DETECTION · SELF-DRIVING CARS
NATURAL LANGUAGE PROCESSING

REAL-TIME TRANSLATION · VOICE RECOGNITION · VIRTUAL ASSISTANTS
RECOMMENDER SYSTEMS

CONTENT CURATION · TARGETED ADVERTISING · SHOPPING RECOMMENDATIONS
REINFORCEMENT LEARNING

ALPHAGO BEATS WORLD CHAMPION IN GO · AI BOTS BEAT PROFESSIONAL VIDEO GAMERS · STOCK-TRADING ROBOTS
OVERVIEW OF THE COURSE
HANDS-ON EXERCISES

• Get comfortable with the process of deep learning
• Exposure to different models and data types
• Get a jump-start to tackle your own projects
STRUCTURE OF THE COURSE
“Hello World” of Deep Learning

Train a more complicated model

New architectures and techniques to improve performance

Pre-trained models

Transfer learning
PLATFORM OF THE COURSE

GPU powered cloud server

JupyterLab platform

Jupyter notebooks for interactive coding


SOFTWARE OF THE COURSE

• Major deep learning platforms:
  • TensorFlow + Keras (Google)
  • PyTorch (Facebook)
  • MXNet (Apache)
• We’ll be using TensorFlow and Keras
• It’s a good idea to gain exposure to others moving forward
FIRST EXERCISE: CLASSIFY HANDWRITTEN DIGITS
HELLO NEURAL NETWORKS

Train a network to correctly classify handwritten digits
• A historically important and difficult task for computers

Try learning like a neural network
• Get exposed to the example, and try to figure out the rules for how it works
LET’S GO!

FUNDAMENTALS OF DEEP LEARNING
Part 2: How a Neural Network Trains
AGENDA

Part 1: An Introduction to Deep Learning
Part 2: How a Neural Network Trains
Part 3: Convolutional Neural Networks
Part 4: Data Augmentation and Deployment
Part 5: Pre-trained Models
Part 6: Advanced Architectures
AGENDA – PART 2
• Recap

• A Simpler Model

• From Neuron to Network

• Activation Functions

• Overfitting

• From Neuron to Classification

RECAP OF THE EXERCISE
What just happened?

Loaded and visualized our data

Edited our data (reshaped, normalized, to categorical)

Created our model

Compiled our model

Trained the model on our data

DATA PREPARATION
Input as an array

A 28 × 28 image is flattened into a single array of pixel values:
[0,0,0,24,75,184,185,78,32,55,0,0,0…]
DATA PREPARATION
Targets as categories

0 → [1,0,0,0,0,0,0,0,0,0]
1 → [0,1,0,0,0,0,0,0,0,0]
2 → [0,0,1,0,0,0,0,0,0,0]
3 → [0,0,0,1,0,0,0,0,0,0]
…
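As a minimal sketch, the two preparation steps above look like this in Keras (MNIST loaded the way the exercise does; variable names are illustrative):

```python
# Flatten, normalize, and one-hot encode the MNIST data.
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

(x_train, y_train), (x_valid, y_valid) = mnist.load_data()

# Flatten each 28 x 28 image into a 784-element vector
x_train = x_train.reshape(-1, 784)
x_valid = x_valid.reshape(-1, 784)

# Normalize pixel values from [0, 255] to [0, 1]
x_train = x_train / 255.0
x_valid = x_valid / 255.0

# One-hot encode the targets: 3 -> [0,0,0,1,0,0,0,0,0,0]
y_train = to_categorical(y_train, num_classes=10)
y_valid = to_categorical(y_valid, num_classes=10)
```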
AN UNTRAINED MODEL

Layer sizes:
Input   (784,)
Dense   (512,)
Dense   (512,)
Output  (10,)
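As a sketch, the untrained model above can be written as a Keras Sequential network (no activation functions yet — those come later in this part):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential([
    Dense(512, input_shape=(784,)),  # first hidden layer
    Dense(512),                      # second hidden layer
    Dense(10),                       # one output per digit class
])
model.summary()
```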
A SIMPLER MODEL
48
A SIMPLER MODEL

ŷ = mx + b

Data points:
x | y
1 | 3
2 | 5

(Plot: the two points, with slope m = ? and intercept b = ? to be learned.)
A SIMPLER MODEL

ŷ = mx + b

Start with random parameters: m = −1, b = 5

x | y | ŷ
1 | 3 | 4
2 | 5 | 3
A SIMPLER MODEL

ŷ = mx + b

x | y | ŷ | err²
1 | 3 | 4 | 1
2 | 5 | 3 | 4

MSE = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2 = 2.5

RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2} \approx 1.6
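A quick NumPy check of the numbers above (the predictions come from the random start m = −1, b = 5):

```python
import numpy as np

y = np.array([3, 5])       # targets
y_hat = np.array([4, 3])   # predictions: -1 * x + 5 for x = 1, 2

mse = np.mean((y - y_hat) ** 2)  # (1 + 4) / 2 = 2.5
rmse = np.sqrt(mse)              # ~1.58, which rounds to 1.6
print(mse, rmse)
```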
THE LOSS CURVE

(Plot: the loss surface, with MSE on the vertical axis.)
THE LOSS CURVE

(Plots: the line with m = −1, b = 5 against the data, and the current MSE marked on the loss curve, far from the target.)
THE LOSS CURVE

(Moving b from 5 to 4 with m = −1: the current MSE slides from the old point toward the target.)
THE LOSS CURVE

(Moving m from −1 to 0 with b = 4: the current MSE moves further down the loss curve toward the target.)
THE LOSS CURVE

The Gradient: the direction in which loss decreases the most
λ (the learning rate): how far to travel
Epoch: a model update with the full dataset
Batch: a sample of the full dataset
Step: an update to the weight parameters
OPTIMIZERS

(Plot: loss with a momentum optimizer.)

• Adam
• Adagrad
• RMSprop
• SGD
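As a sketch, picking an optimizer in Keras is a one-line decision at compile time, reusing the model from the earlier sketch (the learning-rate values here are illustrative, not from the slides):

```python
from tensorflow.keras.optimizers import Adam, SGD

optimizer = Adam(learning_rate=0.001)
# optimizer = SGD(learning_rate=0.01, momentum=0.9)  # SGD with momentum

model.compile(optimizer=optimizer, loss='mse')
```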
FROM NEURON TO NETWORK
BUILDING A NETWORK

• Scales to more inputs

(Diagram: two weighted inputs, w1 and w2, feeding a single neuron that outputs ŷ.)
BUILDING A NETWORK

• Scales to more inputs
• Can chain neurons

(Diagram: inputs x1, x2 connect through weights w1–w4 to two neurons, whose outputs feed a final neuron through w5, w6 to produce ŷ.)
BUILDING A NETWORK

• Scales to more inputs
• Can chain neurons
• If all the regressions are linear, the output will also be a linear regression

(Same diagram as above.)
ACTIVATION FUNCTIONS
ACTIVATION FUNCTIONS

Linear:  \hat{y} = wx + b

ReLU:    \hat{y} = \begin{cases} wx + b & \text{if } wx + b > 0 \\ 0 & \text{otherwise} \end{cases}

Sigmoid: \hat{y} = \frac{1}{1 + e^{-(wx + b)}}

(Plots of each function over the range −10 to 10.)
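The three activation functions read directly into code; a minimal NumPy sketch:

```python
import numpy as np

def linear(z):
    return z

def relu(z):
    return np.maximum(0, z)      # 0 below zero, identity above

def sigmoid(z):
    return 1 / (1 + np.exp(-z))  # squashes any input into (0, 1)

z = np.linspace(-10, 10, 5)
print(linear(z), relu(z), sigmoid(z), sep="\n")
```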
ACTIVATION FUNCTIONS

Linear · ReLU · Sigmoid
ACTIVATION FUNCTIONS

(Diagram: a small network where each neuron applies its own activation; combining the curved outputs through weights lets ŷ take a more complex, non-linear shape.)
OVERFITTING

Why not have a super large neural network?
OVERFITTING
Which Trendline is Better?

(Two fits of the same points on a 0–1 scale: a wiggly curve that hits every point with MSE = .0000, and a straight line with MSE = .0113.)
OVERFITTING
Which Trendline is Better?

(The same two trendlines scored on additional points: the wiggly curve’s MSE rises to .0308, while the straight line’s falls to .0062.)
TRAINING VS VALIDATION DATA
Avoid memorization

Training data
• Core dataset for the model to learn on

Validation data
• New data, used to see if the model truly understands (can generalize)

Overfitting
• When the model performs well on the training data, but not the validation data (evidence of memorization)
• Ideally, accuracy and loss should be similar between both datasets

(Chart: MSE per epoch — training MSE falls steadily, while validation MSE either tracks it (expected) or climbs again (overfitting).)
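In Keras, passing validation data to fit() is what produces the two curves in the chart; a sketch, assuming the arrays and compiled model from the earlier sketches:

```python
history = model.fit(
    x_train, y_train,
    validation_data=(x_valid, y_valid),
    epochs=10,
)

# Diverging curves signal overfitting:
print(history.history['loss'])      # training loss per epoch
print(history.history['val_loss'])  # validation loss per epoch
```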
FROM REGRESSION TO CLASSIFICATION
AN MNIST MODEL

Input   (784,)
Dense   (512,)
Dense   (512,)
Output  (10,)
AN MNIST MODEL

Input            (784,)
Dense + ReLU     (512,)
Dense + ReLU     (512,)
Output + Sigmoid (10,)
AN MNIST MODEL

Input            (784,)
Dense + ReLU     (512,)
Dense + ReLU     (512,)
Output + Softmax (10,)
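A sketch of the final MNIST model above in Keras, with ReLU hidden layers and a softmax output:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential([
    Dense(512, activation='relu', input_shape=(784,)),
    Dense(512, activation='relu'),
    Dense(10, activation='softmax'),  # probabilities across the 10 digits
])
```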
RMSE FOR PROBABILITIES?

(Plot: classification targets on a 0–4 number line — squared error gives little useful signal for probability outputs, motivating a different loss.)
CROSS ENTROPY

(Chart: loss vs. the probability assigned to the blue point. “Loss if True” falls toward 0 as the assigned probability approaches 1; “Loss if False” grows without bound.)
CROSS ENTROPY

Loss = -\left[\, t(x)\log(p(x)) + (1 - t(x))\log(1 - p(x)) \,\right]

t(x) = target (0 if False, 1 if True)
p(x) = probability prediction for point x

(Same chart as above.)
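The formula reads directly into code; a NumPy sketch:

```python
import numpy as np

def cross_entropy(t, p, eps=1e-12):
    """t: target (0 if False, 1 if True); p: predicted probability."""
    p = np.clip(p, eps, 1 - eps)  # avoid log(0)
    return -(t * np.log(p) + (1 - t) * np.log(1 - p))

print(cross_entropy(1, 0.9))  # ~0.105: confident and correct -> small loss
print(cross_entropy(1, 0.1))  # ~2.303: confident and wrong -> large loss
```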
BRINGING IT TOGETHER
THE NEXT EXERCISE
The American Sign Language Alphabet

LET’S GO!
APPENDIX: GRADIENT DESCENT
HELPING THE COMPUTER CHEAT CALCULUS
LEARNING FROM ERROR

MSE = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2 = \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - (mx_i + b)\bigr)^2

MSE = \frac{1}{2}\Bigl(\bigl(3 - (m \cdot 1 + b)\bigr)^2 + \bigl(5 - (m \cdot 2 + b)\bigr)^2\Bigr)

\frac{\partial MSE}{\partial m} = 5m + 3b - 13 \qquad \frac{\partial MSE}{\partial b} = 3m + 2b - 8

At m = -1, b = 5:

\frac{\partial MSE}{\partial m} = -3 \qquad \frac{\partial MSE}{\partial b} = -1
THE LOSS CURVE

(Loss surface with the current position and the target marked.)
THE LOSS CURVE

\frac{\partial MSE}{\partial m} = -7 \qquad \frac{\partial MSE}{\partial b} = -3
THE LOSS CURVE

\frac{\partial MSE}{\partial m} = -7 \qquad \frac{\partial MSE}{\partial b} = -3

m := m - \lambda \frac{\partial MSE}{\partial m} \qquad b := b - \lambda \frac{\partial MSE}{\partial b}
THE LOSS CURVE

With λ = .6, the update steps are large and can overshoot the target.
THE LOSS CURVE

With λ = .005, the update steps are tiny and progress toward the target is slow.
THE LOSS CURVE

With λ = .1:

m := -1 + 7λ = -0.3
b := 5 + 3λ = 4.7
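The full loop, as a sketch in plain Python on the two data points from this appendix:

```python
xs, ys = [1, 2], [3, 5]
m, b = -1.0, 5.0   # the random start from the slides
lr = 0.1           # lambda, the learning rate
n = len(xs)

for step in range(1000):
    # Gradients of MSE = (1/n) * sum((y - (m*x + b))^2)
    dm = sum(-2 * x * (y - (m * x + b)) for x, y in zip(xs, ys)) / n
    db = sum(-2 * (y - (m * x + b)) for x, y in zip(xs, ys)) / n
    m -= lr * dm
    b -= lr * db

print(m, b)  # converges toward m = 2, b = 1, which fits both points exactly
```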
FUNDAMENTALS OF DEEP LEARNING
Part 3: Convolutional Neural Networks
AGENDA

Part 1: An Introduction to Deep Learning
Part 2: How a Neural Network Trains
Part 3: Convolutional Neural Networks
Part 4: Data Augmentation and Deployment
Part 5: Pre-trained Models
Part 6: Advanced Architectures
AGENDA – PART 3
• Kernels and Convolution

• Kernels and Neural Networks

• Other Layers in the Model

RECAP OF THE EXERCISE

Trained a dense neural network model

Training accuracy was high

Validation accuracy was low

Evidence of overfitting

KERNELS AND CONVOLUTION
KERNELS AND CONVOLUTION

(Image panels: the original image shown with blur, sharpen, brighten, and darken kernels applied.)
KERNELS AND CONVOLUTION

Blur            Sharpen
.06 .13 .06      0 -1  0
.13 .25 .13     -1  5 -1
.06 .13 .06      0 -1  0

Brighten        Darken
0  0   0        0  0   0
0  1.5 0        0  0.5 0
0  0   0        0  0   0
KERNELS AND CONVOLUTION

Blur Kernel          Original Image

.06 .13 .06          1 0 1 1 0 1
.13 .25 .13          0 1 0 0 1 0
.06 .13 .06          0 1 1 1 1 0
                     0 1 1 1 1 0
                     1 0 1 1 0 1
                     1 1 0 0 1 1

The kernel slides across the image. At each position, the pixels under the kernel are multiplied element-wise by the kernel weights and totaled to produce one value of the convolved image (the top-left position totals .56, the next .57, and so on).

Convolved Image

.56 .57 .57 .56
.70 .82 .82 .70
.69 .95 .95 .69
.64 .69 .69 .64
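A sketch of that sliding-window computation (stride 1, no padding), which reproduces the convolved values:

```python
import numpy as np

kernel = np.array([[.06, .13, .06],
                   [.13, .25, .13],
                   [.06, .13, .06]])

image = np.array([[1, 0, 1, 1, 0, 1],
                  [0, 1, 0, 0, 1, 0],
                  [0, 1, 1, 1, 1, 0],
                  [0, 1, 1, 1, 1, 0],
                  [1, 0, 1, 1, 0, 1],
                  [1, 1, 0, 0, 1, 1]])

out_h = image.shape[0] - kernel.shape[0] + 1
out_w = image.shape[1] - kernel.shape[1] + 1
output = np.zeros((out_h, out_w))

for i in range(out_h):
    for j in range(out_w):
        window = image[i:i + 3, j:j + 3]        # 3x3 patch under the kernel
        output[i, j] = np.sum(window * kernel)  # multiply and total

print(output.round(2))  # top-left value is .56, as in the walkthrough
```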
STRIDE

The stride is how far the kernel moves between output values. For the first output row of the example image:

Stride 1: .56 .57 .57 .56
Stride 2: .56 .57
Stride 3: .56 .56
PADDING

Original Image     Zero Padding

                   0 0 0 0 0 0 0 0
1 0 1 1 0 1        0 1 0 1 1 0 1 0
0 1 0 0 1 0        0 0 1 0 0 1 0 0
0 1 1 1 1 0        0 0 1 1 1 1 0 0
0 1 1 1 1 0        0 0 1 1 1 1 0 0
1 0 1 1 0 1        0 1 0 1 1 0 1 0
1 1 0 0 1 1        0 1 1 0 0 1 1 0
                   0 0 0 0 0 0 0 0
PADDING

Original Image     Mirror Padding

                   1 1 0 1 1 0 1 1
1 0 1 1 0 1        1 1 0 1 1 0 1 1
0 1 0 0 1 0        0 0 1 0 0 1 0 0
0 1 1 1 1 0        0 0 1 1 1 1 0 0
0 1 1 1 1 0        0 0 1 1 1 1 0 0
1 0 1 1 0 1        1 1 0 1 1 0 1 1
1 1 0 0 1 1        1 1 1 0 0 1 1 1
                   1 1 1 0 0 1 1 1
KERNELS AND NEURAL NETWORKS

A kernel is a grid of learnable weights:

w1 w2 w3
w4 w5 w6
w7 w8 w9
KERNELS AND NEURAL NETWORKS

(Diagram: the kernel’s weights w1–w9 act like a neuron’s weights — input pixels are multiplied by the weights and summed to produce ŷ.)
KERNELS AND NEURAL NETWORKS

Image Input (28, 28, 1)
→ Kernels (3, 3, 1, 2) → Stacked Images (28, 28, 2)
→ Kernels (3, 3, 2, 2) → Stacked Images (28, 28, 2)
→ Flattened Image Vector (1568)
→ Dense (512) → Dense (512)
→ Output Prediction (24)
FINDING EDGES

Vertical Edges     Horizontal Edges
1 0 -1              1  2  1
2 0 -2              0  0  0
1 0 -1             -1 -2 -1

(Each kernel, applied to the original image, highlights edges in its direction.)
NEURAL NETWORK PERCEPTION

Input → Convolution (edges) → Convolution (textures) → Convolution (objects) → Dense → Dense → Output
OTHER LAYERS IN THE MODEL
MAX POOLING

Input                  Output (2×2 pool)
110 256 153  67
 12  89  88  43        256 153
 10  15  50  55         23  55
 23   9  49  23
DROPOUT

(Diagrams: the same network with dropout rate = 0, .2, and .4 — more neurons are randomly disabled each step as the rate increases.)
WHOLE ARCHITECTURE

Input
→ Convolution → Max Pooling
→ Convolution → Dropout → Max Pooling
→ Convolution → Max Pooling
→ Dense → Dense → Output
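A sketch of the whole architecture above in Keras (filter counts and kernel sizes are illustrative assumptions, not specified on the slide):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPool2D, Dropout, Flatten, Dense

model = Sequential([
    Conv2D(32, (3, 3), activation='relu', padding='same',
           input_shape=(28, 28, 1)),
    MaxPool2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu', padding='same'),
    Dropout(0.2),
    MaxPool2D((2, 2)),
    Conv2D(128, (3, 3), activation='relu', padding='same'),
    MaxPool2D((2, 2)),
    Flatten(),
    Dense(512, activation='relu'),
    Dense(24, activation='softmax'),  # 24 ASL letter classes
])
```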
LET’S GO!

FUNDAMENTALS OF DEEP LEARNING
Part 4: Data Augmentation and Deployment
AGENDA

Part 1: An Introduction to Deep Learning
Part 2: How a Neural Network Trains
Part 3: Convolutional Neural Networks
Part 4: Data Augmentation and Deployment
Part 5: Pre-trained Models
Part 6: Advanced Architectures
AGENDA – PART 4
• Data Augmentation

• Model Deployment

RECAP OF THE EXERCISE

Analysis
• The CNN increased validation accuracy
• Training accuracy is still higher than validation accuracy

Solution
• Clean data provides better examples
• Dataset variety helps the model generalize
DATA AUGMENTATION
IMAGE FLIPPING
Horizontal Flip

Vertical Flip

ROTATION

(Example: the same image rotated to 0°, 45°, 90°, 180°, and 270°.)
ZOOMING

WIDTH AND HEIGHT SHIFTING
HOMOGRAPHY

BRIGHTNESS

CHANNEL SHIFTING
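Most of the augmentations above map onto Keras' ImageDataGenerator; a sketch (parameter values are illustrative assumptions, and homography is not covered by this utility):

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    horizontal_flip=True,            # image flipping
    rotation_range=10,               # rotation, in degrees
    zoom_range=0.1,                  # zooming
    width_shift_range=0.1,           # width shifting
    height_shift_range=0.1,          # height shifting
    brightness_range=(0.8, 1.2),     # brightness
    channel_shift_range=20.0,        # channel shifting
)

# Train on augmented batches drawn from the generator:
# model.fit(datagen.flow(x_train, y_train, batch_size=32), epochs=10)
```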
MODEL DEPLOYMENT
MODEL DEPLOYMENT

Image Input (28, 28, 1)
→ Kernels (3, 3, 1, 2) → Stacked Images (28, 28, 2)
→ Kernels (3, 3, 2, 2) → Stacked Images (28, 28, 2)
→ Flattened Image Vector (1568)
→ Dense (512) → Dense (512)
→ Output Prediction (24)
MODEL DEPLOYMENT

(During training, the model consumes batches: Batch Input → Convolution → Max Pooling → … A deployed model must receive new data in the same batched shape.)
MODEL DEPLOYMENT

(287, 433, 3) → Resize → (220, 155, 3) → Greyscale → (220, 155, 1) → “Batch” of one → (1, 220, 155, 1)
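A sketch of those steps for a single image at prediction time (the filename is hypothetical, and this targets the 28 × 28 greyscale shape the course model expects rather than the slide's example dimensions):

```python
import numpy as np
from tensorflow.keras.preprocessing import image as image_utils

img = image_utils.load_img('my_sign.png', color_mode='grayscale',
                           target_size=(28, 28))  # resize + greyscale
arr = image_utils.img_to_array(img) / 255.0       # shape (28, 28, 1)
batch = np.expand_dims(arr, axis=0)               # shape (1, 28, 28, 1): a "batch" of one

prediction = model.predict(batch)
```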
LET’S TRY IT OUT!
FUNDAMENTALS OF DEEP LEARNING
Part 5: Pre-trained Models
AGENDA

Part 1: An Introduction to Deep Learning
Part 2: How a Neural Network Trains
Part 3: Convolutional Neural Networks
Part 4: Data Augmentation and Deployment
Part 5: Pre-trained Models
Part 6: Advanced Architectures
AGENDA – PART 5
• Review so far

• Pre-trained Models

• Transfer Learning

REVIEW SO FAR

• Learning Rate
• Number of Layers
• Neurons per Layer
• Activation Functions
• Dropout
• Data
PRE-TRAINED MODELS

PyTorch Hub
PRE-TRAINED MODELS

ImageNet
THE NEXT CHALLENGE
An Automated Doggy Door

TRANSFER LEARNING
THE CHALLENGE AFTER
An Automated Presidential Doggy Door

TRANSFER LEARNING
TRANSFER LEARNING

Image Input (28, 28, 1)
→ Kernels (3, 3, 1, 2) → Stacked Images (28, 28, 2)
→ Kernels (3, 3, 2, 2) → Stacked Images (28, 28, 2)
→ Flattened Image Vector (1568)
→ Dense (512) → Dense (512)
→ Output Prediction (10)
TRANSFER LEARNING

Input                                   ← more generalized
→ Convolution → Max Pooling
→ Convolution → Dropout → Max Pooling
→ Convolution → Max Pooling
→ Dense → Dense → Output                ← more specialized

Early layers learn general features; later layers are more specialized to the original task.
TRANSFER LEARNING
Freezing the Model?
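Freezing in Keras is a one-line flag on the pre-trained base; a sketch (VGG16 is an illustrative choice of base, not necessarily the one the course uses):

```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import GlobalAveragePooling2D, Dense

base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze the generalized early layers

model = Sequential([
    base,
    GlobalAveragePooling2D(),
    Dense(1, activation='sigmoid'),  # e.g., "this specific dog" vs. everything else
])
```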
LET’S GET STARTED!
FUNDAMENTALS OF DEEP LEARNING
Part 6: Advanced Architectures
AGENDA

Part 1: An Introduction to Deep Learning
Part 2: How a Neural Network Trains
Part 3: Convolutional Neural Networks
Part 4: Data Augmentation and Deployment
Part 5: Pre-trained Models
Part 6: Advanced Architectures
AGENDA – PART 6
• Moving Forward

• Natural Language Processing

• Recurrent Neural Networks

• Other Architectures

• Closing Thoughts

MOVING FORWARD
FIELDS OF AI

Computer Vision
• Optometry

Natural Language Processing
• Linguistics

Reinforcement Learning
• Game Theory
• Psychology

Anomaly Detection
• Security
• Medicine
NATURAL LANGUAGE PROCESSING
FROM WORDS TO NUMBERS

“A dog barked at a cat.” → [1, 10, 7, 4, 1, 8]

Dictionary:
1. A      7. Barked
2. An     8. Cat
3. And    9. Cats
4. At     10. Dog
5. Ate    11. Dogs
6. Bark   12. Eat
FROM WORDS TO NUMBERS

Inputs: the dictionary words (A, An, And, At, Ate, Bark, Barked, Cat, Cats, Dog, Dogs, Eat)
Outputs: the same word list
FROM WORDS TO NUMBERS

A one-hot input for “Dog” (a 1 in position 10, 0s elsewhere) yields output probabilities over the dictionary:
And 10% · At 5% · Ate 35% · Barked 50% · all others 0%
FROM WORDS TO NUMBERS

Big
Giraffe
(.9, .9)

Llama
Bigger Dictionary
(-.9, .1)
1. A 31. Ate 61. Cats
2. An 32. Bark 62. Dog
3. And 33. Barked 63. Dogs
4. At 34. Cat 64. Eat
5. Ate 35. Cats 65. Eaten
6. Bark 36. Dog 66. A
7. Barked 37. Dogs 67. An
8. Cat 38. Eat 68. And
Domestic Wild 9.
10.
Cats
Dog
39.
40.
Eaten
A
69.
70.
At
Ate
11. Dogs 41. An 71. Bark
12. Eat 42. And 72. Barked
13. Eaten 43. At 73. Cat
14. A 44. Ate 74. Cats
15. An 45. Bark 75. Dog
16. And 46. Barked 76. Dogs

Falcon 17.
18.
At
Ate
47.
48.
Cat
Cats
77.
78.
Eat
Eaten
19. Bark 49. Dog 79. …
(.15, -.4) 20.
21.
Barked
Cat
50.
51.
Dogs
Eat
80.
81.


22. Cats 52. Eaten 82. …

Puffin
23. Dog 53. A

Kitty 24.
25.
Dogs
Eat
54.
55.
An
And

(-.75, -.8) (.85, -.65) 26.


27.
Eaten
A
56.
57.
At
Ate

Small 28.
29.
An
And
58.
59.
Bark
Barked
30. At 60. Cat

173
FROM WORDS TO NUMBERS

(The input mapping from words to vectors is technically an embedding layer, sitting between the inputs and the rest of the network.)
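A sketch of that embedding layer in Keras, turning each word index into a learned coordinate vector (the vocabulary size and dimensions are illustrative):

```python
import numpy as np
from tensorflow.keras.layers import Embedding

embedding = Embedding(input_dim=13, output_dim=2)  # 12 words (+ padding index) -> 2-D points
sentence = np.array([[1, 10, 7, 4, 1, 8]])         # "A dog barked at a cat."
vectors = embedding(sentence)                      # shape: (1, 6, 2)
```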
RECURRENT NEURAL NETWORKS
RECURRENT NEURAL NETWORKS

“Cats say ___.”   “Dogs say ___.”

Dictionary: 1. Cats  2. Dogs  3. Meow  4. Say  5. Woof

Each word is one-hot encoded, embedded, and fed to the RNN, which outputs probabilities for the next word. Given only the input “Say”, the model cannot tell which animal is speaking, so it splits its prediction 50% Meow, 50% Woof.

The RNN therefore keeps a hidden state — e.g., (.1, −.5, .6) after reading “Cats” — and feeds it back in alongside the next input. With that state plus “Say”, the output becomes 100% Meow.
RECURRENT NEURAL NETWORKS

(Diagrams: an RNN cell and an LSTM cell, each feeding its output back into its input.)
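A sketch of the next-word model above using a Keras LSTM (layer sizes are illustrative assumptions):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

model = Sequential([
    Embedding(input_dim=6, output_dim=3),  # 5 dictionary words (+ padding) -> 3-D embeddings
    LSTM(16),                              # hidden state carries earlier words forward
    Dense(5, activation='softmax'),        # probability of each possible next word
])
```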
OTHER ARCHITECTURES
AUTOENCODERS

Inputs → Encoder → compact code (e.g., (−.3, .6)) → Decoder → Outputs

(The network is trained to reconstruct its inputs, forcing the information through the narrow code in the middle.)
GENERATIVE ADVERSARIAL NETWORKS (GANS)

Real Images ─────────────────→ Discriminator → Prediction: Real / Fake
Noise → Generator → Fake Images ──↗
REINFORCEMENT LEARNING

(Diagram: an Agent interacting with an Environment in a feedback loop.)
NEXT STEPS
ENABLING PORTABILITY WITH NGC CONTAINERS

Extensive
- NGC Deep Learning Containers
- Diverse range of workloads and industry-specific use cases

Optimized
- DL containers updated monthly
- Packed with latest features and superior performance

Secure & Reliable
- Scanned for vulnerabilities and crypto
- Tested on workstations, servers, & cloud instances

Scalable
- Supports multi-GPU & multi-node systems

Designed for Enterprise & HPC
- Supports Docker, Singularity & other runtimes

Run Anywhere
- Bare metal, VMs, Kubernetes
- x86, ARM, POWER
- Multi-cloud, on-prem, hybrid, edge

Learn more about NGC Containers


NEXT STEPS FOR THIS CLASS

Step 1: Sign up for NGC
https://docs.nvidia.com/dgx/ngc-registry-for-dgx-user-guide/index.html

Step 2: Visit the NGC Catalog
https://catalog.ngc.nvidia.com/orgs/nvidia/containers/dli-dl-fundamentals

Step 3: Pull and Run the Container
Visit localhost:8888 to check out a JupyterLab environment with a Next Steps Project
CLOSING THOUGHTS
COPYING ROCKET SCIENCE

LET’S GET STARTED!