SEQUENCE-TO-SEQUENCE LEARNING USING
DEEP LEARNING FOR OPTICAL CHARACTER
RECOGNITION
Advisor
Dr. Devinder Kaur
Presented
By
Vishal Vijay Shankar Mishra
AGENDA
• Problem Statement
 Converting Mathematical Equations into LaTeX Representation
• Approach (Deep Learning Techniques)
 Convolutional Neural Network (CNN)
 Recurrent Neural Network (RNN)
 Long Short-Term Memory (LSTM)
 Attention Model
• Introduction to CNN
 Gist of Neural Network
 Architecture of CNN
• CNN layers
 Convolution Layer
 Non-Linear Activation Layer (ReLu)
 Pooling Layer
• Hyper-Parameters.
• Introduction to RNN
 Architecture of RNN
 Working of RNN
 RNN Example
• Drawback of RNN
• LSTM
 Architecture of LSTM
 Working of LSTM
 LSTM Example
• Proposed Model
• Results and Future work
• Conclusion
PROBLEM STATEMENT
• In this thesis, I have implemented sequence-to-sequence learning using deep learning for optical character recognition.
• I have used images of mathematical equations and converted them into their LaTeX representation.
APPROACH (DEEP LEARNING TECHNIQUES)
• To accomplish this research work, I have used the following deep
learning techniques.
Convolutional Neural Network (CNN)
Recurrent Neural Network (RNN)
Long Short-Term Memory (LSTM)
Attention model.
• In the subsequent slides, I’ll try to give the gist of the techniques.
WHAT IS DEEP NEURAL NETWORK?
• Deep neural networks are networks that have more than two layers to perform the task.
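For instance, a minimal PyTorch sketch of a network with more than two layers; the layer sizes are illustrative assumptions, not the thesis model:

```python
import torch.nn as nn

# A tiny "deep" network in the sense above: more than two layers stacked.
deep_net = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),   # hidden layer 1
    nn.Linear(256, 64),  nn.ReLU(),   # hidden layer 2
    nn.Linear(64, 10),                # output layer
)
```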
WHY DO WE NEED DEEP NEURAL NETWORK?
• Neural nets tend to be computationally expensive for data with simple patterns; in such cases you should use a model like Logistic Regression or an SVM.
• As the pattern complexity increases, neural nets start to outperform other machine learning methods.
• At the highest levels of pattern complexity (for example, high-resolution images), neural nets with a small number of layers would require a number of nodes that grows exponentially with the number of unique patterns. Even then, the net would likely take excessive time to train, or would simply fail to converge.
WHY CONVOLUTIONAL NEURAL NETWORK?
INTRODUCTION TO CNN
• Architecture of CNN
WORKING OF CNN
LAYERS IN CNN
CONVOLUTION LAYER
NON-LINEAR ACTIVATION LAYER (RELU)
• ReLU is a non-linear activation function, which is used to apply elementwise non-linearity.
• The ReLU layer applies an activation function to each element, such as max(0, x), thresholding activations at zero (a short NumPy sketch follows).
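As a small illustration of the elementwise thresholding described above (not code from the thesis), ReLU can be applied to a feature map in a single NumPy call; the array values are arbitrary:

```python
import numpy as np

# A small feature map, e.g. the output of a convolution layer (values are arbitrary).
feature_map = np.array([[ 1.5, -0.3,  0.0],
                        [-2.1,  0.7, -0.4]])

# ReLU applies max(0, x) to every element, zeroing out all negative activations.
activated = np.maximum(0.0, feature_map)
print(activated)
# [[1.5 0.  0. ]
#  [0.  0.7 0. ]]
```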
HYPER-PARAMETERS
• Convolution
 Filter Size
 Number of Filters
 Padding
 Stride
• Pooling
 Filter Size
 Stride
• Fully Connected
 Number of neurons
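To make the list above concrete, here is a minimal PyTorch sketch (not the thesis model) showing where each hyper-parameter appears; the specific values, and the assumption of a 28x28 single-channel input, are illustrative only:

```python
import torch.nn as nn

# Illustrative CNN: each hyper-parameter from the list above appears as an argument.
model = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=32,   # number of filters
              kernel_size=3,                    # convolution filter size
              stride=1, padding=1),             # stride and padding
    nn.ReLU(),                                  # non-linear activation layer
    nn.MaxPool2d(kernel_size=2, stride=2),      # pooling filter size and stride
    nn.Flatten(),
    nn.Linear(32 * 14 * 14, 128),               # number of neurons in the fully connected layer
)
```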
INTRODUCTION TO RNN
• RNNs are a type of artificial neural network designed to recognize patterns in sequences of data; they are used to process data sequentially.
• Why can't we accomplish this task with a feed-forward network?
• The drawback of a feed-forward network is that it doesn't remember its inputs over time.
• To process data sequentially, we need a network that behaves recurrently.
• Architecture of RNN
RNNs have loops.
• RNNs are not all that different from an ordinary neural network. An RNN can be thought of as multiple copies of the same network, each passing a message to a successor. An unrolled RNN is shown below.
• In the last few years, there has been incredible success applying RNNs to a variety of problems: speech recognition, language modeling, translation, image captioning; the list goes on.
An Unrolled RNN
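The unrolling can be sketched in a few lines of NumPy; this is a generic vanilla-RNN step (h_t = tanh(W_xh·x_t + W_hh·h_{t-1} + b_h)) with toy dimensions, not the thesis implementation:

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, b_h):
    """Unroll a vanilla RNN over a sequence: h_t = tanh(W_xh @ x_t + W_hh @ h_{t-1} + b_h)."""
    h = np.zeros(W_hh.shape[0])              # initial hidden state h_0
    hidden_states = []
    for x_t in xs:                            # one "copy" of the network per time step
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        hidden_states.append(h)
    return hidden_states

# Toy example: 4-dimensional inputs, 3-dimensional hidden state, sequence length 5.
rng = np.random.default_rng(0)
xs = [rng.standard_normal(4) for _ in range(5)]
hs = rnn_forward(xs, rng.standard_normal((3, 4)), rng.standard_normal((3, 3)), np.zeros(3))
```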
DRAWBACK OF AN RNN
• An RNN has a problem with long-term dependencies: it doesn't remember inputs after a certain number of time steps.
• This problem occurs due to exploding or vanishing gradients while performing backpropagation.
• I'll try to explain this problem with an example.
• Let's consider a language model trying to predict the next word based on the previous ones.
• For example, "the clouds are in the sky". To predict "sky" we don't need any further context. In such cases, where the gap between the relevant information and the place where it's needed is small, RNNs can learn to use the past information.
• But there are also cases where we need more context from the input.
• For example, "I grew up in France ... I speak fluent French".
• Unfortunately, as that gap grows, RNNs become unable to learn to connect the information.
• This happens due to the vanishing gradient and exploding gradient problems.
VANISHING GRADIENT
EXPLODING GRADIENT
HOW TO OVERCOME THESE CHALLENGES?
• For vanishing gradients we can use:
 ReLU activation function: activation functions like ReLU have a gradient of 1 for positive inputs, so the gradient does not shrink during backpropagation.
 LSTMs, GRUs: network architectures that have been specially designed to combat this problem.
• For exploding gradients we can use:
 Clip gradients at a threshold: clip the gradient when it grows higher than a threshold (a short sketch follows below).
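As an illustration of the clipping remedy, a generic PyTorch sketch (not the thesis training loop; the tiny model and data are placeholders):

```python
import torch
import torch.nn as nn

# Placeholder model, optimizer and batch, just to show where clipping goes.
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(8, 10), torch.randn(8, 1)

loss = nn.functional.mse_loss(model(x), y)
loss.backward()
# Rescale all gradients so their global norm never exceeds the threshold (5.0 here),
# preventing a single step from blowing up the weights (the exploding-gradient remedy).
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)
optimizer.step()
```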
LONG SHORT-TERM MEMORY (LSTM)
• Long Short-Term Memory networks, usually just called "LSTMs", are a special kind of RNN.
• They are capable of learning long-term dependencies.
ARCHITECTURE OF LSTM
• Why is an LSTM different from an RNN? Because an LSTM has a cell state that deals with long-term dependencies.
WORKING OF LSTM
• Step 1: The first step in the LSTM is to identify the information that is no longer required and will be thrown away from the cell state. This decision is made by a neural network layer with a sigmoid activation function, called the forget gate layer.
• W_f = weight matrix
• h_{t-1} = output from the previous time step
• X_t = new input
• b_f = bias
f_t = sigmoid(W_f · [h_{t-1}, X_t] + b_f)
WORKING OF LSTM
• Step 2: The next step is to decide what new information we're going to store in the cell state. This happens in two parts: a neural network layer with a sigmoid activation, called the "input gate layer", decides which values will be updated, and a layer with a tanh activation creates a vector of new candidate values, C'_t, that could be added to the state.
• In the next step, we'll combine these two to update the state.
i_t = sigmoid(W_i · [h_{t-1}, X_t] + b_i)
C'_t = tanh(W_c · [h_{t-1}, X_t] + b_c)
WORKING OF LSTM
• Step 3: Now, we update the old cell state C_{t-1} into the new cell state C_t. First, we multiply the old state C_{t-1} by f_t, forgetting the things we decided to forget earlier. Then, we add i_t * C'_t, the new candidate values scaled by how much we decided to update each cell state value.
C_t = f_t * C_{t-1} + i_t * C'_t
WORKING OF LSTM
• Step 4: Finally, we run a sigmoid layer which decides what parts of the cell state we're going to output. Then, we put the cell state through tanh (to push the values to be between -1 and 1) and multiply it by the output of the sigmoid gate, so that we only output the parts we decided to.
O_t = sigmoid(W_o · [h_{t-1}, X_t] + b_o)
h_t = O_t * tanh(C_t)
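Putting Steps 1-4 together, a minimal NumPy sketch of one LSTM time step; the weight shapes, names, and toy usage are assumptions for illustration, not the thesis implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_f, W_i, W_c, W_o, b_f, b_i, b_c, b_o):
    """One LSTM step following Steps 1-4; each W_* acts on the concatenation [h_{t-1}, X_t]."""
    z = np.concatenate([h_prev, x_t])       # [h_{t-1}, X_t]
    f_t = sigmoid(W_f @ z + b_f)            # Step 1: forget gate
    i_t = sigmoid(W_i @ z + b_i)            # Step 2: input gate
    c_cand = np.tanh(W_c @ z + b_c)         # Step 2: candidate values C'_t
    c_t = f_t * c_prev + i_t * c_cand       # Step 3: new cell state
    o_t = sigmoid(W_o @ z + b_o)            # Step 4: output gate
    h_t = o_t * np.tanh(c_t)                # Step 4: new hidden state
    return h_t, c_t

# Toy usage: 3-dimensional hidden state, 4-dimensional input.
rng = np.random.default_rng(0)
H, D = 3, 4
W = lambda: rng.standard_normal((H, H + D))
h1, c1 = lstm_step(rng.standard_normal(D), np.zeros(H), np.zeros(H),
                   W(), W(), W(), W(), np.zeros(H), np.zeros(H), np.zeros(H), np.zeros(H))
```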
PROPOSED LSTM VARIANT WITH PEEPHOLE CONNECTION
• The simplest but very powerful extension of the conventional LSTM unit is to introduce weighted "peephole" connections from the cell state (C_{t-1}) to all the gates in the same memory unit. The peephole connections allow every gate to inspect the current cell state even when the output gate is closed.
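A sketch of how the gate equations change with peephole connections, following the description above (each gate additionally sees C_{t-1} through an elementwise weight vector); the names and shapes are illustrative assumptions, not the exact thesis code:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def peephole_lstm_step(x_t, h_prev, c_prev, W, p, b):
    """One peephole LSTM step: every gate also "peeks" at the cell state C_{t-1}
    through an elementwise weight vector p['f'], p['i'], p['o']."""
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W['f'] @ z + p['f'] * c_prev + b['f'])    # forget gate sees C_{t-1}
    i_t = sigmoid(W['i'] @ z + p['i'] * c_prev + b['i'])    # input gate sees C_{t-1}
    c_t = f_t * c_prev + i_t * np.tanh(W['c'] @ z + b['c'])
    o_t = sigmoid(W['o'] @ z + p['o'] * c_prev + b['o'])    # output gate sees C_{t-1}
    return o_t * np.tanh(c_t), c_t
```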
STOCHASTIC “HARD” ATTENTION MODEL
• With an attention mechanism, the image is first divided into n parts, and a CNN computes a representation y_1, y_2, ..., y_n of each part. When the LSTM is generating a new word, the attention mechanism focuses on the relevant part of the image, so the decoder only uses specific parts of the image.
• In a stochastic process like the hard attention mechanism, rather than using all the hidden states y_t as input for the decoding, the process finds the probability of attending to each hidden state with respect to a location variable s_t. The gradients are obtained by reinforcement learning (a toy sketch follows below).
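A toy sketch of the hard-attention step just described: score each image part against the decoder state, turn the scores into probabilities, and sample one part as the context. All names, shapes, and the dot-product scoring are illustrative assumptions; in the real model the sampling is trained with reinforcement learning:

```python
import numpy as np

rng = np.random.default_rng(0)

# CNN representations y_1..y_n of the n image parts (toy: n=6 parts, 4-dim features).
parts = rng.standard_normal((6, 4))
decoder_state = rng.standard_normal(4)            # current LSTM hidden state

scores = parts @ decoder_state                    # relevance score of each part
probs = np.exp(scores) / np.exp(scores).sum()     # softmax over parts -> location probabilities
location = rng.choice(len(parts), p=probs)        # hard attention: sample one location s_t
context = parts[location]                         # only this part feeds the next decoder step
```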
PROPOSED MODEL
• Original Image
• Predicted LaTeX:
• Rendered Predicted Image:
Actual Test results on Test set:
RESULTS
• The proposed method is compared with two previous methods, INFTY and WYGIWYS, on the basis of the BLEU (Bilingual Evaluation Understudy) metric and Exact Match. BLEU is a metric that evaluates the quality of the predicted LaTeX markup representation of the image (a small BLEU sketch follows the table below). Exact Match is the metric that represents the percentage of images classified correctly.
• It can be seen that the proposed method scores better than the previous methods. The proposed model generated results close to 76%, which is the highest in this research area. Previously, the highest result was around 75%, achieved by the WYGIWYS (What You Get Is What You See) model. The BLEU and Exact Match scores of the proposed model are only slightly above the existing model; however, this is a significant achievement considering the low GPU resources and small dataset.
Model            Preprocessing   BLEU    Exact Match
INFTY            -               51.20   15.60
WYGIWYS          Tokenize        73.71   74.46
PROPOSED MODEL   Tokenize        75.08   75.87
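For reference, the BLEU sketch mentioned above: a hedged example of how BLEU could be computed for a predicted LaTeX token sequence with NLTK. The tokens are toy values; this is not the thesis evaluation script:

```python
from nltk.translate.bleu_score import sentence_bleu

# Ground-truth and predicted LaTeX markup, split into tokens (toy example).
reference = [r"\frac { a } { b } + c".split()]   # list of reference token lists
hypothesis = r"\frac { a } { b } + d".split()    # predicted token list

# BLEU compares n-gram overlap between the hypothesis and the reference(s).
score = sentence_bleu(reference, hypothesis)
print(f"BLEU: {score:.3f}")
```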
FUTURE WORK.
• For possible future work, this research can be scaled from images of printed mathematical formulas to images of handwritten mathematical formulas. To recognize handwritten mathematical formulas, one could implement a bidirectional LSTM with a CNN.
• This model could be used to solve mathematical questions based on formulas.
• An API (Application Program Interface) could be created to solve such mathematical problems.
• Questions?
• Thank you!
Editor's Notes
  • #15: In addition, these types of networks don't take into account the relation between space and the pixels in the images. Within images, we know that pixels nearby in space are much more correlated than those further apart. So these networks, by being fully connected, don't take this consideration into account. Using this understanding of spatial relations, we are going to delete some of the connections.
  • #17: So, instead of a fully connected layer, the units in the hidden layer are now connected only to the nearby pixels in the input layer.
  • #18: Now, instead of weights, we call these values filters.
  • #74: Point 5: it is obvious that the next word is going to be "sky"; in such a case no further context is needed.
  • #78: Truncated BPTT; RMSprop to adjust the learning rate.