NLP & Deep Learning for non-experts
Sanghamitra Deb
Staff Data Scientist
Chegg Inc
How to start projects in machine learning?
• Kaggle competitions ---
• Make sure to solve the ML problems for concept development before competing
How to start projects in machine learning?
• Self-guided workshops/projects --- let's say you have data from Zomato
• Restaurant recommendation -- user-based, content-similarity based
• Restaurant tags from reviews
• Sentiment analysis from reviews
Outline
• What is NLP
• Bag of Words model for sentiment analysis using scikit-learn
• Deep dive into deep learning
• Solve the sentiment analysis problem using Keras
• A short intro to Convolutional Neural Networks (CNNs)
What is Natural Language Processing?
• Giving structure to unstructured data
• Learn properties of the data that make decision making simple
• Provide concise information to drive the intelligence of different systems.
Why?
• Unstructured data cannot be consumed directly
• Automate simple and complex functionalities
• Inferences from text data become queryable, which can help with regular business unit (BU) reports
• Understand customers better and take necessary actions for a better experience.
Applications
• Categorization of text
• Building domain-specific Knowledge Graphs
• Recommendations
• Web --- Search
• HR --- people analytics
• Medical --- drug discovery, automated diagnosis
• …
What are the underlying tasks?
• Syntactic Parsing of sentences --- parsing based on structure
• Part of Speech Tagging
• Semantic Parsing -- mapping text directly into formal query language, e.g. SQL queries for a pre-determined database schema.
• Dialogue state tracking --- chatbots
• Machine Translation
• Language modeling
• Text extraction
• Classification
Text Classification
Offline pipeline (with SME input): Text Pre-processing → Collecting Training Data → Model Building
• Text Pre-processing: reduces noise, ensures quality, improves overall performance
• Collecting Training Data: examples of the classes that we are trying to model; model performance is directly correlated with the quality of the training data
• Model Building: model selection, architecture, parameter tuning
Online (with users): Model Evaluation
Text Data
Data Source -- https://archive.ics.uci.edu/ml/datasets/Sentiment+Labelled+Sentences
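For readers following along, a minimal sketch of loading this dataset with pandas; the local file paths are placeholders, and the files are tab-separated sentence/label pairs as described on the UCI page.

```python
# A minimal sketch of loading the UCI "Sentiment Labelled Sentences" data,
# assuming the files have been downloaded locally (paths are placeholders).
import pandas as pd

filepath_dict = {
    "yelp": "data/yelp_labelled.txt",
    "amazon": "data/amazon_cells_labelled.txt",
    "imdb": "data/imdb_labelled.txt",
}

frames = []
for source, filepath in filepath_dict.items():
    # Each file is tab-separated: sentence <tab> label (0 = negative, 1 = positive)
    part = pd.read_csv(filepath, names=["sentence", "label"], sep="\t")
    part["source"] = source
    frames.append(part)

df = pd.concat(frames, ignore_index=True)
print(df.head())
```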
Model Building: a simple Bag of Words (BOW) model
https://realpython.com/python-keras-text-classification/
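A hedged sketch of such a BOW baseline with scikit-learn, along the lines of the linked tutorial; the 75/25 split and the LogisticRegression choice are illustrative.

```python
# Bag-of-words baseline: count word occurrences, then fit a linear classifier.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

sentences = df["sentence"].values   # DataFrame loaded in the previous sketch
labels = df["label"].values

sentences_train, sentences_test, y_train, y_test = train_test_split(
    sentences, labels, test_size=0.25, random_state=1000)

# Build a vocabulary on the training sentences and turn each sentence
# into a sparse vector of word counts.
vectorizer = CountVectorizer()
vectorizer.fit(sentences_train)
X_train = vectorizer.transform(sentences_train)
X_test = vectorizer.transform(sentences_test)

classifier = LogisticRegression()
classifier.fit(X_train, y_train)
print("Test accuracy:", classifier.score(X_test, y_test))
```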
Deep Learning
"Deep learning algorithms seek to exploit the unknown structure in the input distribution in order to discover good representations, often at multiple levels, with higher-level learned features defined in terms of lower-level features." --- Yoshua Bengio
"A kind of learning where the representation you form have several levels of abstraction, rather than a direct input to output." --- Peter Norvig
"When you hear the term deep learning, just think of a large deep neural net. Deep refers to the number of layers typically and so this kind of the popular term that's been adopted in the press. I think of them as deep neural networks generally." --- Andrew Ng
Why now?
• Explosion in labelled data
• Exponential growth in computation power with cloud computing and availability of GPUs
• Improvements in setting initial conditions and activation functions
Neural Network
Can we simulate the brain by densely interconnecting neurons in a computer, such that it can learn things, recognize patterns and take decisions?
What is a neuron?
(Figure: a single neuron with inputs a1, a2, a3, and neurons wired together into a network; source: https://www.slideshare.net/tw_dsconf/ss-62245351)
Neural Network
• Each node is a function with input and output vectors
• Every network structure is defined by a set of functions
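A toy numpy version of a single node as a function, to make the "weighted sum plus activation" idea concrete; all numbers are illustrative.

```python
# A single "neuron": weighted sum of inputs plus a bias, passed through an
# activation function (sigmoid here).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

a = np.array([0.5, -1.2, 3.0])   # inputs a1, a2, a3
w = np.array([0.8, 0.1, -0.4])   # one weight per input
b = 0.2                          # bias

z = np.dot(w, a) + b             # weighted sum
output = sigmoid(z)              # neuron output
print(z, output)
```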
Output Layer
• Loss is minimized using Gradient Descent
• Find network parameters such that the loss is minimized
• This is done by taking derivatives of the loss w.r.t. the parameters
• Next the parameters are updated by subtracting the learning rate times the derivative
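A minimal numpy sketch of this loop for a one-parameter model with a mean-squared-error loss; the data and learning rate are illustrative.

```python
# Gradient descent on an MSE loss for a one-parameter linear model y = w * x.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])   # true relationship: y = 2x

w = 0.0      # initial parameter
lr = 0.01    # learning rate

for step in range(200):
    y_pred = w * x
    loss = np.mean((y_pred - y) ** 2)        # MSE loss
    grad = np.mean(2 * (y_pred - y) * x)     # derivative of the loss w.r.t. w
    w = w - lr * grad                        # update: subtract lr * derivative

print(w, loss)   # w approaches 2.0 as the loss shrinks
```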
Commonly used loss functions
Regression Loss Functions
• Mean Squared Error Loss
• Mean Squared Logarithmic Error Loss
• Mean Absolute Error Loss
Binary Classification Loss Functions
• Binary Cross-Entropy
• Hinge Loss
• Squared Hinge Loss
Multi-Class Classification Loss Functions
• Multi-Class Cross-Entropy Loss
• Sparse Multiclass Cross-Entropy Loss
• Kullback-Leibler Divergence Loss
Cost Function --- Cross Entropy
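For reference, binary cross-entropy written out in numpy; this is what Keras' 'binary_crossentropy' loss averages over a batch, and the numbers are illustrative.

```python
# Binary cross-entropy: penalizes confident wrong predictions heavily.
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    y_pred = np.clip(y_pred, eps, 1 - eps)   # avoid log(0)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1, 0, 1, 1])
y_pred = np.array([0.9, 0.2, 0.7, 0.4])      # predicted probabilities
print(binary_cross_entropy(y_true, y_pred))
```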
Dropout -- avoid overfitting
• Large weights in a neural network are a sign of a more complex network that has overfit the training data.
• Probabilistically dropping out nodes in the network is a simple and effective regularization method.
• A large network with more training and the use of a weight constraint are suggested when using dropout.
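A hedged Keras sketch of dropout as a layer between dense layers; the layer sizes and the 0.5 rate are illustrative, not from the slides.

```python
# Dropout randomly zeroes a fraction of the previous layer's outputs during
# training, which discourages over-reliance on any single node.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

input_dim = 2000   # placeholder feature count (e.g., vocabulary size)

model = Sequential([
    Dense(64, activation="relu", input_dim=input_dim),
    Dropout(0.5),                      # drop 50% of activations during training
    Dense(1, activation="sigmoid"),
])
```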
Optimization Techniques
Gradient Descent
Adagrad
RMSprop
Adam
…
Adam Optimization
• Adam = adaptive moment estimation
• The method computes individual adaptive learning rates for different parameters from estimates of the first and second moments of the gradients.
• It calculates an exponential moving average of the gradient and the squared gradient; parameters control the decay rates of these moving averages.
https://machinelearningmastery.com/adam-optimization-algorithm-for-deep-learning/
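Using Adam in Keras is a one-liner at compile time; the hyperparameters shown are Keras' documented defaults.

```python
# Compile a previously defined Keras model (e.g., the dropout sketch above)
# with the Adam optimizer.
from tensorflow.keras.optimizers import Adam

optimizer = Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999)
model.compile(optimizer=optimizer,
              loss="binary_crossentropy",
              metrics=["accuracy"])
```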
Activation Functions
• Sigmoid / Softmax
• Tanh
• ReLU --- a = max(0, z)
• Leaky ReLU
• Swish --- https://arxiv.org/abs/1710.05941v1
(Figures: each activation function and its derivative.)
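The listed activations written out in numpy for reference; Swish is as defined in the linked paper, x multiplied by sigmoid(x).

```python
# Common activation functions in plain numpy.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    return np.tanh(z)

def relu(z):
    return np.maximum(0, z)              # a = max(0, z)

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)

def swish(z):
    return z * sigmoid(z)

def softmax(z):
    e = np.exp(z - np.max(z))            # subtract max for numerical stability
    return e / e.sum()
```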
Text Classification Reminder!
https://realpython.com/python-keras-text-classification/
Text Classification using a feed-forward NN
https://realpython.com/python-keras-text-classification/
Fit & measure accuracy!
plot_history(history)
Clearly overfits the data!
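A hedged sketch of this feed-forward classifier, following the linked tutorial: a small dense network on top of the BOW features built earlier. plot_history is the tutorial's plotting helper, and the gap between training and validation accuracy is the overfitting noted above.

```python
# Feed-forward network on bag-of-words features (X_train/X_test from the
# scikit-learn sketch earlier).
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

input_dim = X_train.shape[1]                 # number of BOW features

model = Sequential([
    Dense(10, input_dim=input_dim, activation="relu"),
    Dense(1, activation="sigmoid"),
])
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])

history = model.fit(X_train, y_train,
                    epochs=20, batch_size=10, verbose=False,
                    validation_data=(X_test, y_test))
# plot_history(history) would then show training accuracy far above
# validation accuracy, i.e. overfitting.
```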
Can we do better? Word Embeddings
• Words are represented as dense vectors
• These vectors are either learned during the training task by the neural network, or pre-trained, learned from Language Models
• They encode the semantic meaning of the word.
Text Pre-processing with Keras
Tokenizing and Padding
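A sketch of both steps with the Keras preprocessing utilities; num_words and maxlen are illustrative choices.

```python
# Tokenizer maps words to integer ids; pad_sequences makes every sequence
# the same length so it can feed an Embedding layer.
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

tokenizer = Tokenizer(num_words=5000)          # keep the 5000 most frequent words
tokenizer.fit_on_texts(sentences_train)        # sentences from the earlier split

X_train_seq = tokenizer.texts_to_sequences(sentences_train)
X_test_seq = tokenizer.texts_to_sequences(sentences_test)

vocab_size = len(tokenizer.word_index) + 1     # +1 for the reserved 0 index

maxlen = 100
X_train_pad = pad_sequences(X_train_seq, padding="post", maxlen=maxlen)
X_test_pad = pad_sequences(X_test_seq, padding="post", maxlen=maxlen)
```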
Start with an Embedding Layer
• The Embedding layer of Keras takes the previously calculated integers and maps them to a dense vector of the embedding.
o Parameters
Ø input_dim: the size of the vocabulary
Ø output_dim: the size of the dense vector
Ø input_length: the length of the sequence
Example sentences: "Hope to see you soon", "Nice to see you again" (figure: their embedded vectors after training)
https://stats.stackexchange.com/questions/270546/how-does-keras-embedding-layer-work
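A tiny runnable version of the layer with the toy sentences above; output_dim = 2 is chosen only so the vectors are easy to inspect.

```python
# Each integer token id is mapped to a learned dense vector.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

docs = ["Hope to see you soon", "Nice to see you again"]
tok = Tokenizer()
tok.fit_on_texts(docs)
seqs = pad_sequences(tok.texts_to_sequences(docs), maxlen=5)

model = Sequential([
    Embedding(input_dim=len(tok.word_index) + 1,  # size of the vocabulary
              output_dim=2,                       # size of the dense vector
              input_length=5)                     # length of each sequence
])
model.compile("rmsprop", "mse")
print(model.predict(seqs).shape)   # (2, 5, 2): one 2-d vector per token
```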
Add a pooling layer
• MaxPooling1D/AveragePooling1D or a GlobalMaxPooling1D/GlobalAveragePooling1D layer
• A way to downsample (reduce the size of) the incoming feature vectors
• Global max/average pooling takes the maximum/average of all features, whereas in the other case you have to define the pool size.
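For example, appended to the Embedding-only model from the previous sketch:

```python
# GlobalMaxPooling1D needs no pool size: it keeps one maximum per embedding
# dimension, collapsing (batch, seq_len, dim) to (batch, dim).
from tensorflow.keras.layers import GlobalMaxPooling1D, MaxPooling1D

model.add(GlobalMaxPooling1D())

# Alternative with an explicit window, which only halves the sequence length:
# model.add(MaxPooling1D(pool_size=2))
```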
Definition of the entire model
Training
Using pre-trained word embeddings will lead to an accuracy of 0.82. This is a case of transfer learning.
https://realpython.com/python-keras-text-classification
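A hedged sketch of the full model: an Embedding layer initialized from pre-trained vectors (e.g. GloVe), global max pooling, then dense layers. Building embedding_matrix from a pre-trained embedding file is assumed to have happened already; vocab_size, maxlen and the padded data come from the tokenizing sketch above.

```python
# Embedding initialized with pre-trained vectors (transfer learning),
# then pooling and a small dense classifier.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, GlobalMaxPooling1D, Dense

embedding_dim = 50
model = Sequential([
    Embedding(vocab_size, embedding_dim,
              weights=[embedding_matrix],   # pre-trained vectors (assumed built from GloVe)
              input_length=maxlen,
              trainable=True),              # allow fine-tuning on our task
    GlobalMaxPooling1D(),
    Dense(10, activation="relu"),
    Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

history = model.fit(X_train_pad, y_train,
                    epochs=20, batch_size=10, verbose=False,
                    validation_data=(X_test_pad, y_test))
```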
Embeddings + Maxpooling -- Benefits
• Power of generalization --- embeddings are able to share information across similar features.
• Fewer nodes with zero values.
Convolutional Neural Network
Detect features! Downsample.
What is a CNN?
In a traditional feed-forward neural network we connect each input neuron to each output neuron in the next layer. That's also called a fully connected layer, or affine layer.
• We use convolutions over the input layer to compute the output. This results in local connections, where each region of the input is connected to a neuron in the output. Each layer applies different filters and combines the results.
• During the training phase, a CNN automatically learns the values of its filters based on the task you want to perform.
Tricky --- dimensions keep changing as we go from one layer to another
Model definition
embedding_dim = 50
maxlen = 10
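A hedged sketch of the CNN text classifier with those dimensions; the 128 filters of width 5 follow the linked tutorial and are otherwise illustrative.

```python
# Conv1D slides filters over the sequence of word embeddings to detect
# local features, then pooling downsamples them.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Conv1D, GlobalMaxPooling1D, Dense

embedding_dim = 50
maxlen = 10

model = Sequential([
    Embedding(vocab_size, embedding_dim, input_length=maxlen),  # vocab_size from the tokenizer sketch
    Conv1D(128, 5, activation="relu"),   # 128 filters sliding over 5 tokens at a time
    GlobalMaxPooling1D(),
    Dense(10, activation="relu"),
    Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()   # dimensions change layer to layer, as noted above
```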
Advantages of CNN
• Character-based CNN
• Has the ability to deal with out-of-vocabulary words; this makes it particularly suitable for user-generated raw text.
• Works for multiple languages.
• Model size is small since the tokens are limited to the number of characters (~70). This makes real-life deployments easier and faster.
• Networks with convolutional and pooling layers are useful for classification tasks in which we expect to find strong local clues regarding class membership.
Takeaways!
• If you have text data, you need to use NLP
• Try a simple bag-of-words model for your data
• Having a high-level understanding of deep learning will help with better judgement in architecture design and choice of parameters.
• Deep Learning has the potential to give high performance, but you need a large amount of training data to reap the benefits.
Thank You
@sangha_deb
sangha123@gmail.com
Visualization of the architecture
(Figure: input sequence of length 10, embedding dimension 50 → Conv1D → GlobalMaxPool1D → Dense layer → Sigmoid output)
Some helpful courses
https://www.coursera.org/learn/classification-vector-spaces-in-nlp
Appendix
Transfer Learning
Character-Based CNNs
https://papers.nips.cc/paper/5782-character-level-convolutional-networks-for-text-classification.pdf
• Embedding Layer
• Six convolutional layers, three of which are followed by a max-pooling layer
• Two fully connected layers (Dense layers in Keras) with 1024 units each
• Output layer (Dense layer); the number of units depends on the number of classes. In this task we set it to 4.
Pre-processing
Setting Embedding Weights
Model
https://towardsdatascience.com/character-level-cnn-with-keras-50391c3adf33
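A compact, hedged sketch of that architecture in Keras; the filter counts, kernel sizes and the 1014-character input length follow Zhang et al. and are assumptions here, not details from the slides.

```python
# Character-level CNN: six Conv1D layers, max pooling after three of them,
# two 1024-unit dense layers, and a 4-class output.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Embedding, Conv1D, MaxPooling1D,
                                     Flatten, Dense, Dropout)

num_chars = 70       # size of the character vocabulary (~70, as noted above)
input_len = 1014     # fixed number of characters per document (assumption)

model = Sequential([
    Embedding(num_chars + 1, 16, input_length=input_len),
    Conv1D(256, 7, activation="relu"), MaxPooling1D(3),
    Conv1D(256, 7, activation="relu"), MaxPooling1D(3),
    Conv1D(256, 3, activation="relu"),
    Conv1D(256, 3, activation="relu"),
    Conv1D(256, 3, activation="relu"),
    Conv1D(256, 3, activation="relu"), MaxPooling1D(3),
    Flatten(),
    Dense(1024, activation="relu"), Dropout(0.5),
    Dense(1024, activation="relu"), Dropout(0.5),
    Dense(4, activation="softmax"),      # 4 classes, as in the appendix slide
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```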
