
Natural Language Processing
Lecture 14: Machine Learning: Feed-forward Neural Networks, Autoencoders/embeddings, Dense networks

12/7/2019

COMS W4705
Yassine Benajiba
Perceptron Expressiveness
• The simple perceptron learning algorithm starts with an arbitrary hyperplane and adjusts it using the training data.

• The step function is not differentiable, so there is no closed-form solution.

• The perceptron produces a linear separator.

• It can only learn linearly separable patterns.

• It can represent boolean functions like and, or, and not, but not the xor function.
The problem with xor
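The xor function is not linearly separable, but it becomes representable once a hidden layer is allowed. A minimal sketch (not from the slides), with hand-set weights, showing that xor can be written as a composition of the linearly separable or and and functions:

# A minimal sketch: xor as a composition of linearly separable functions,
# using hand-set perceptron weights (values chosen for illustration).
import numpy as np

def step(z):
    return (z > 0).astype(int)

def perceptron(x, w, b):
    return step(x @ w + b)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# Single units can represent OR and AND (both linearly separable) ...
h_or  = perceptron(X, np.array([1, 1]), -0.5)   # x1 OR x2
h_and = perceptron(X, np.array([1, 1]), -1.5)   # x1 AND x2

# ... and a second layer combines them into XOR = OR AND (NOT AND).
H = np.stack([h_or, h_and], axis=1)
xor = perceptron(H, np.array([1, -1]), -0.5)

print(xor)  # [0 1 1 0] -- xor needs the hidden layer; no single unit suffices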
Multi-Layer Neural Networks

[Figure: network with an input layer, a hidden layer, and an output layer]

• Basic idea: represent any (non-linear) function as a composition of soft-threshold functions. This is a form of non-linear regression.

• Lippmann 1987: Two hidden layers suffice to represent any arbitrary region (provided enough neurons), even discontinuous functions!
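As a concrete illustration (a minimal sketch, not from the slides; the layer sizes are arbitrary), the forward pass of such a network is just matrix multiplications interleaved with a soft-threshold nonlinearity:

# A minimal feed-forward pass (layer sizes chosen arbitrarily).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(3)   # input layer (4) -> hidden layer (3)
W2, b2 = rng.normal(size=(3, 2)), np.zeros(2)   # hidden layer (3) -> output layer (2)

x = np.array([1.0, 0.0, 2.0, -1.0])             # input vector
h = sigmoid(x @ W1 + b1)                        # hidden activations (soft threshold)
y = sigmoid(h @ W2 + b2)                        # network output h_w(x)
print(y)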
Activation Functions
• One problem with perceptrons is that the threshold function (step function) is not differentiable.

• It is therefore unsuitable for gradient descent.

• One alternative is the sigmoid (logistic) function g(z) = 1 / (1 + e^(-z)):

g(z) → 0 as z → -∞
g(z) → 1 as z → ∞
Activation Functions
• Two other popular activation functions:
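The slide's plots are not reproduced in this extract; the alternatives most commonly shown in this context are tanh and the rectified linear unit (ReLU). A minimal sketch of these activation functions and their derivatives (the derivatives are used later in backpropagation):

# Common differentiable activation functions and their derivatives.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def d_sigmoid(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def tanh(z):
    return np.tanh(z)

def d_tanh(z):
    return 1.0 - np.tanh(z) ** 2

def relu(z):
    return np.maximum(0.0, z)

def d_relu(z):
    return (z > 0).astype(float)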
Output Representation
• Many NLP problems are multi-class classification problems.

• Each output neuron represents one class. Predict the class with the highest activation.

Example output activations: y0 = 0.9, y1 = 0.1, y2 = 0.7, y3 = 0.4 (predict class 0).
Softmax
• We often want the activations at the output layer to represent probabilities.

• Exponentiate the activation of each output unit and normalize by the sum over all outputs (as in log-linear models): softmax(z)_i = e^(z_i) / Σ_j e^(z_j).

• Example: raw activations z0 = 0.9, z1 = 0.1, z2 = 0.7, z3 = 0.4 become 0.35, 0.16, 0.28, 0.21 after the softmax. The network computes a probability distribution over the classes.
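A minimal sketch reproducing the numbers above (a numerically stable softmax, assuming the usual max-subtraction trick):

# Numerically stable softmax over a vector of output activations.
import numpy as np

def softmax(z):
    z = z - np.max(z)          # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

z = np.array([0.9, 0.1, 0.7, 0.4])
print(softmax(z).round(2))     # [0.35 0.16 0.28 0.21]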
Learning in Multi-Layer Neural Networks
• The network structure is fixed, but we want to train the weights. Assume feed-forward neural networks: no connections that form loops.

• Backpropagation Algorithm:

• Given the current weights, compute the network output and the loss function (assume multiple outputs / a vector of outputs).

• Use gradient descent to update the weights and minimize the loss.

• Problem: We only know how to do this for the last layer!

• Idea: Propagate the error backwards through the network.


Backpropagation
[Figure: feed-forward computation of the network outputs. The input vector x = (x1, ..., x4) passes through the input layer, hidden layer, and output layer to produce the output vector h_w(x) = (a1, a2), which is compared against the target vector y by the error function E_train(w). The error gradients are then propagated backwards through the network.]
Negative Log-Likelihood
(also known as cross-entropy)

• Assume the target output is a one-hot vector and c(y) is the target class for target y.

• Compute the negative log-likelihood for a single example.

• Empirical error for the entire training data:
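The formulas themselves are not reproduced in this extract; a sketch of the standard definitions they correspond to, using the softmax output P(· | x) from the previous slides:

L(x, y) = -log P(c(y) | x)

E_train(w) = 1/N Σ_{i=1..N} -log P(c(y_i) | x_i)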


Stochastic Gradient Descent
(for a single unit)
• Goal: Learn parameters that minimize the empirical error.

Randomly initialize w
for a set number of iterations T:
    shuffle training data
    for j = 1...N:
        for each wi (all weights in the network):
            wi ← wi - α ∂L(x_j, y_j)/∂wi

• α is the learning rate.

• It often makes sense to compute the gradient over batches of examples instead of just one ("mini-batch").
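A minimal sketch of this loop for a single sigmoid unit with cross-entropy loss (not from the slides; the toy data, learning rate, and batch size are placeholders), including the mini-batch variant:

# Mini-batch SGD for a single sigmoid unit with cross-entropy loss.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                    # toy data: 100 examples, 3 features
y = (X @ np.array([1.0, -2.0, 0.5]) > 0).astype(float)

w = rng.normal(size=3)                           # randomly initialize w
alpha, T, batch_size = 0.1, 50, 10               # learning rate, iterations, mini-batch size

for t in range(T):
    order = rng.permutation(len(X))              # shuffle training data
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        xb, yb = X[idx], y[idx]
        y_hat = sigmoid(xb @ w)
        grad = xb.T @ (y_hat - yb) / len(idx)    # gradient of cross-entropy w.r.t. w
        w = w - alpha * grad                     # SGD update
print(w)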
Backpropagation
• Simplified multi-layer case (a single unit per layer):

[Figure: x → g (weight w1) → g(x) → f (weight w2) → f(g(x)) → Loss]

• Stochastic Gradient Descent should perform the following update: wi ← wi - α ∂Loss/∂wi for i = 1, 2.

• Problem: How do we compute the gradient for parameters w1 and w2?
Chain Rule of Calculus

• To compute gradients for hidden units, we need to apply the chain rule of calculus:

The derivative of f(g(x)) with respect to x is f'(g(x)) · g'(x).
Backpropagation

[Figure: x → f (weight w1) → f(x) → g (weight w2) → g(f(x)) → Loss]
Backpropagation

[Figure: in the forward pass, x flows through a unit f (with weight w) to produce f(x), which feeds into the rest of the network and eventually the Loss. In the backward pass, the error gradient flows back through f.]

Assume we know ∂Loss/∂f(x).

We want to compute ∂Loss/∂x to propagate it back,

and ∂Loss/∂w (for the weight update).
Backpropagation

[Figure: the same forward/backward diagram as above.]

To compute these gradients, we have to know the derivative of the function f.
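A minimal sketch of these updates for the two-layer, single-unit case above (not from the slides; sigmoid activations and a squared-error loss are assumed for concreteness):

# Manual backpropagation through two single-unit layers
# (sigmoid activations and squared-error loss assumed).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x, y = 0.5, 1.0                      # single input and target
w1, w2, alpha = 0.3, -0.8, 0.1       # weights and learning rate

# Forward pass.
a1 = sigmoid(w1 * x)                 # first unit: g(x)
a2 = sigmoid(w2 * a1)                # second unit: f(g(x))
loss = 0.5 * (a2 - y) ** 2

# Backward pass: apply the chain rule layer by layer.
d_a2 = a2 - y                        # dLoss/da2
d_z2 = d_a2 * a2 * (1 - a2)          # dLoss/dz2 (sigmoid derivative)
d_w2 = d_z2 * a1                     # dLoss/dw2 (for the weight update)
d_a1 = d_z2 * w2                     # dLoss/da1 (propagated back)
d_z1 = d_a1 * a1 * (1 - a1)
d_w1 = d_z1 * x                      # dLoss/dw1

# SGD updates.
w1 -= alpha * d_w1
w2 -= alpha * d_w2
print(loss, w1, w2)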
Autoencoders
Embeddings
(Word-level semantics)
Skip-Gram Model
• Input: a single word in one-hot representation.

• Output: the probability of seeing any single word as a context word.

[Figure: the one-hot input for "eat" (|V| input neurons) feeds into d hidden neurons, which connect to |V| output neurons with softmax activation; the outputs give context-word probabilities, e.g. a: 0.02, thought: 0.0, cheese: 0.04, place: 0.03, run: 0.0.]

• The softmax function normalizes the activations of the output neurons to sum up to 1.0.
Skip-Gram Model
• Compute the error with respect to each context word.

[Figure: for the sentence "...a place to eat delicious cheese.", the target word wt = "eat" is paired with its context words wt-c, ..., wt-1, wt+1, ..., wt+c, yielding the training pairs (eat, place), (eat, to), (eat, delicious), (eat, cheese).]

• Combine the errors for each context word, then use the combined error to update the weights using back-propagation.
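A minimal sketch of generating such (target, context) training pairs from a tokenized sentence (not from the slides; the window size c is a placeholder):

# Generate skip-gram (target, context) training pairs from a tokenized sentence.
def skipgram_pairs(tokens, c=2):
    """Pair each target word with every word within a window of size c."""
    pairs = []
    for i, target in enumerate(tokens):
        for j in range(max(0, i - c), min(len(tokens), i + c + 1)):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

sentence = "a place to eat delicious cheese".split()
print(skipgram_pairs(sentence, c=2))
# includes ('eat', 'place'), ('eat', 'to'), ('eat', 'delicious'), ('eat', 'cheese'), ...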
Continuous Bag-of-Words Model (CBOW)

[Figure: the context words wt-c, ..., wt-1, wt+1, ..., wt+c are summed and averaged in the hidden layer to predict the target word wt.]

• Input: context words, averaged in the hidden layer (sketched below).

• Output: the probability that each word is the target word.
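A minimal sketch of the CBOW forward pass (not from the slides; the embedding matrices and vocabulary size are placeholders), showing the averaging of context-word vectors followed by a softmax over the vocabulary:

# CBOW forward pass: average the context-word embeddings, then softmax over |V|.
import numpy as np

rng = np.random.default_rng(0)
V, d = 6, 4                                   # toy vocabulary size and embedding dimension
W_in = rng.normal(size=(V, d))                # input (context) embeddings
W_out = rng.normal(size=(d, V))               # output (target) weights

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def cbow_forward(context_ids):
    h = W_in[context_ids].mean(axis=0)        # average context embeddings in the hidden layer
    return softmax(h @ W_out)                 # probability of each word being the target

context = [0, 2, 4, 5]                        # indices of wt-c, ..., wt+c
print(cbow_forward(context))                  # distribution over the V candidate target words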


Embeddings are Magic
(Mikolov 2016)

vector(‘king’) - vector(‘man’) + vector(‘woman’) ≈ vector(‘queen’)


Application: Word Pair Relationships
Using Word Embeddings
• Word2Vec:

• https://code.google.com/archive/p/word2vec/

• GloVe: Global Vectors for Word Representation

• https://nlp.stanford.edu/projects/glove/

• Can either use pre-trained word embeddings or train them on a large corpus.
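A minimal sketch of loading pre-trained vectors and reproducing the king/queen analogy, assuming the gensim library (not mentioned on the slides) and one of the pre-trained GloVe models from its downloader:

# A sketch using gensim with pre-trained GloVe vectors.
import gensim.downloader as api

kv = api.load("glove-wiki-gigaword-100")      # downloads pre-trained GloVe word vectors

# vector('king') - vector('man') + vector('woman') is closest to vector('queen')
print(kv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))

# Cosine similarity as a simple measure of word relatedness.
print(kv.similarity("cheese", "food"))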
Word embeddings

[Figure: the same skip-gram network as above, with a |V|-dimensional one-hot input (e.g. "eat"), d hidden neurons, and a |V|-dimensional softmax output layer.]
Word embeddings
Pros
- Groups semantically similar words together
- A simple way to measure similarity
- A great approach to better deal with words unseen in training

Cons
- Doesn't distinguish between function and content words
- Only one representation for polysemous words
- Non-interpretable semantic dimensions

How can we build a sentence representation using word-level distributional representations?
Acknowledgments
• Some slides by Chris Kedzie
