The brain’s guide to dealing with context in
language understanding
Ted Willke, Javier Turek, and Vy Vo
Intel Labs
November 8th, 2019
Alex Huth and Shailee Jain
UT-Austin
Natural Language Understanding
A form of natural language processing that deals with machine
reading comprehension.
Example:
“The problem to be solved is: Tom has twice as many fish as Mary has guppies. If Mary has 3 guppies, what is the number of fish Tom has?”
(D.G. Bobrow, 1964)
A 1960’s example
Input Text:
“The problem to be solved is: If the number of customers Tom gets is twice the square of 20 percent of the number of advertisements he runs, and the number of advertisements he runs is 45, what is the number of customers Tom gets?”
NLP (Lisp example) converts the input text into canonical sentences, with mark-up:
“The number (of/op) customers Tom (gets/verb) is 2 (times/op 1) the (square/op 1) of 20 (percent/op 2) (of/op) the number (of/op) advertisements (he/pro) runs (period/dlm) The number (of/op) advertisements (he/pro) runs is 45 (period/dlm) (what/qword) is the number (of/op) customers Tom (gets/verb) (qmark/DLM)”
NLU then produces the answer: “The number of customers Tom gets is 162”
NLU derives meaning from the lexicon, grammar, and context. E.g., what is the meaning of “(he/pro) runs” here?
(D.G. Bobrow, 1964)
Applications of NLU
Super-valuable stuff!
• Machine translation (Google Translate)
• Question answering (The Stanford Question Answering Dataset 2.0)
• Machine reasoning (Aristo, Allen AI)
• Even visual question answering! (Zhu et al., 2015)
The importance of context in language understanding
•Retaining information about
narratives is key to effective
comprehension.
•This information must be:
- Represented
- Organized
- Effectively applied
https://rajpurkar.github.io/SQuAD-explorer/explore/v2.0/dev/Economic_inequality.html
The brain is great at this. What can it teach us?
Key questions for this talk
How does the brain organize and represent narratives?
What can deep learning models tell us about the brain?
Are the more effective ones more brain-like?
How well do deep learning models deal with narrative context?
The brain’s organization
In order to understand language, the human brain explicitly represents information at a hierarchy of different timescales across different brain areas.
•Early stages: auditory processing at the millisecond scale, up to words at the sub-second scale
•Later stages: derive meaning by combining information across minutes and hours
Representations at long timescales have been shown to exist in separate brain areas, but little is known about their structure and format.
(Lerner et al., 2011)
Key questions for this talk
How does the brain organize and represent narratives?
How well do deep learning models deal with narrative context?
What can deep learning models tell us about the brain?
Are the more effective ones more brain-like?
A look at recent state-of-the-art models
Recurrent Neural Networks
Temporal Convolutional Networks
Transformer Networks
Evaluating the performance of these models
•Sequence modeling: given an input sequence x_0, …, x_T and desired corresponding outputs (predictions) y_0, …, y_T, we wish to learn a function ŷ_0, …, ŷ_T = f(x_0, …, x_T), where each ŷ_t depends only on the past inputs x_0, …, x_t (causal)
•Sequence modeling applied to language is language modeling, e.g., predicting the next character or word
•Self-supervised, basis for many other NLP tasks, and exploits context for prediction
We use it as a proxy to study the performance of backbone models for NLU.
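To make the objective concrete, here is a minimal sketch of next-character prediction with teacher forcing in PyTorch; the toy vocabulary, sizes, and vanilla-RNN backbone are illustrative assumptions, not the models compared later in this talk.

```python
import torch
import torch.nn as nn

class CausalCharModel(nn.Module):
    """Next-character predictor: the output at step t sees only inputs 0..t."""
    def __init__(self, vocab_size=4, embed_dim=8, hidden_dim=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.RNN(embed_dim, hidden_dim, batch_first=True)  # left-to-right, so causal
        self.head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x):                # x: (batch, T) integer tokens
        h, _ = self.rnn(self.embed(x))   # h: (batch, T, hidden)
        return self.head(h)              # logits: (batch, T, vocab)

# Teacher forcing: inputs are x_0..x_{T-1}, targets are x_1..x_T.
seq = torch.tensor([[0, 1, 2, 2, 3]])    # "hello" over the toy vocabulary [h, e, l, o]
model = CausalCharModel()
logits = model(seq[:, :-1])
loss = nn.functional.cross_entropy(logits.reshape(-1, 4), seq[:, 1:].reshape(-1))
```

Minimizing this loss trains ŷ_t to predict x_{t+1} from x_0, …, x_t only, which is exactly the causal constraint above.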
Example sequence modeling tasks
•Add: Add two numbers that are marked in a long sequence, and output
the sum after a delay (one plausible generator is sketched after this list)
•Copy: Copy a short sequence that appears much earlier in a long
sequence
•Classify (MNIST): Given a sequence of pixel values from MNIST
(784x1), predict the corresponding digit (0-9)
•Predict word (LAMBADA): Given a dataset of 10K passages from
novels, with average context of 4.6 sentences, predict the last word of a
target sentence
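For concreteness, one plausible generator for the Add task; the sequence length, value range, and marker encoding are assumptions for illustration, and published benchmarks vary in the details.

```python
import numpy as np

def make_add_example(T=100, rng=np.random.default_rng()):
    """Input: (value, marker) pairs; target: the sum of the two marked values."""
    values = rng.random(T)                    # a long sequence of random numbers
    markers = np.zeros(T)
    i, j = rng.choice(T, size=2, replace=False)
    markers[[i, j]] = 1.0                     # mark exactly two positions
    x = np.stack([values, markers], axis=1)   # shape (T, 2)
    y = values[i] + values[j]                 # emitted only at the end (the delay)
    return x, y
```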
A look at recent state-of-the-art models
Recurrent Neural Networks
Temporal Convolutional Networks
Transformer Networks
Using recurrence to solve the problem
We can process a sequence of vectors x_t by applying a recurrence formula at each time step:
h_t = f_W(h_{t−1}, x_t)
where h_t is the new state, h_{t−1} is the old state, x_t is the input vector at time t, and f_W is some function with parameters W.
The same function and parameters are used at every time step!
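In plain NumPy, the classic vanilla-RNN choice of f_W is a tanh of a linear map of old state and input (a sketch; sizes and initialization are illustrative):

```python
import numpy as np

n_hidden, n_input = 16, 8
W_hh = 0.1 * np.random.randn(n_hidden, n_hidden)   # recurrent (state-to-state) weights
W_xh = 0.1 * np.random.randn(n_hidden, n_input)    # input-to-state weights

def rnn_step(h_prev, x_t):
    """One application of h_t = f_W(h_{t-1}, x_t)."""
    return np.tanh(W_hh @ h_prev + W_xh @ x_t)

h = np.zeros(n_hidden)
for x_t in np.random.randn(5, n_input):            # a length-5 toy sequence
    h = rnn_step(h, x_t)                           # same f_W, same W, every step
```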
Example:
Character-level
language model
Predicting the next
character…
Vocabulary:
[h,e,l,o]
Training sequence:
“hello”
(Example adapted from Stanford’s excellent CS231n course. Thank you Fei-Fei Li, Justin Johnson, and Serena Young!)
Example: Character-level language model (sampling)
Vocabulary: [h,e,l,o]
At test time, sample characters one at a time and feed them back to the model.
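At test time the model becomes its own input. Reusing the hypothetical CausalCharModel sketched earlier, a minimal sampling loop looks like this:

```python
import torch

vocab = ['h', 'e', 'l', 'o']

def sample(model, start_idx=0, length=5):
    """Sample one character at a time, feeding each back in as the next input."""
    tokens = [start_idx]
    for _ in range(length - 1):
        logits = model(torch.tensor([tokens]))         # (1, t, vocab)
        probs = torch.softmax(logits[0, -1], dim=-1)   # distribution over the next char
        tokens.append(torch.multinomial(probs, 1).item())
    return ''.join(vocab[i] for i in tokens)

# e.g., sample(model) might return 'hlelo' before training and 'hello' after
```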
Dealing with longer timescales
• Learning long-term dependencies is difficult
- Vanishing (largest singular value < 1) and exploding (largest singular value > 1) gradient problem
- Smaller weight given to long-term interactions
- Little training success for sequences > 10-20 in length
• Solution: Gated RNNs
- Control over timescale of integration of feedback
- Eliminates repeated matrix multiplies
One possible solution: LSTM
• Long Short-Term Memory
- Provides uninterrupted gradient flow through the additive cell state (sketched below)
- Solves the problem at the expense of more parameters
• As revolutionary for sequential processing as
CNNs were for spatial processing
- Toy problems: long sequence recall, long-distance
interactions (math), classification and ordering of
widely-separated symbols, noisy inputs, etc.
- Real applications: neural machine translation, text-to-speech, music and handwriting generation
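To see where the uninterrupted gradient flow comes from, here is one step of a textbook LSTM cell written out by hand (a NumPy sketch of the standard formulation, with sizes chosen for brevity):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n = 16                                      # hidden size; input size kept equal for brevity
W = 0.1 * np.random.randn(4 * n, 2 * n)     # input, forget, output, candidate gates stacked

def lstm_step(h_prev, c_prev, x_t):
    z = W @ np.concatenate([h_prev, x_t])
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    c_t = f * c_prev + i * np.tanh(g)       # additive cell update: gradients flow through '+'
    h_t = o * np.tanh(c_t)                  # hidden state is a gated view of the cell
    return h_t, c_t
```

The extra parameters are the price: four stacked weight blocks where a vanilla RNN has one.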
Multilayer RNNs
Cells are stacked over depth and unrolled over time. Each layer l computes
h^l_t = tanh( W^l [ h^l_{t−1} ; h^{l−1}_t ] )
where h ∈ ℝ^n and each W^l is an [n × 2n] matrix.
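The same recurrence, iterated over both depth and time, in NumPy (a sketch; weights random, sizes illustrative):

```python
import numpy as np

L, T, n = 3, 5, 16                         # depth, sequence length, hidden size
W = [0.1 * np.random.randn(n, 2 * n) for _ in range(L)]   # one [n x 2n] matrix per layer

h = np.zeros((L + 1, T + 1, n))            # h[l, t]; layer 0 holds the inputs
h[0, 1:] = np.random.randn(T, n)           # toy input sequence
for t in range(1, T + 1):
    for l in range(1, L + 1):
        # h^l_t = tanh(W^l [h^l_{t-1}; h^{l-1}_t])
        h[l, t] = np.tanh(W[l - 1] @ np.concatenate([h[l, t - 1], h[l - 1, t]]))
```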
Writing Shakespeare
Multi-layer RNN: 3 layers with 512 hidden nodes, unrolled over depth and time.
At first the samples are nearly random; they improve as we train further… and further… and further….
(Andrej Karpathy’s blog: The Unreasonable Effectiveness of Recurrent Neural Networks)
After a few hours of training:
(Andrej Karpathy’s blog: The Unreasonable Effectiveness of Recurrent Neural Networks)
(Andrej Karpathy’s blog: The Unreasonable Effectiveness of Recurrent Neural Networks)
The Stacks Project: open-source textbook on algebraic geometry
•LaTeX source!
•455,910 lines of code
Can RNNs learn complex syntactic structures?
(Andrej Karpathy’s blog: The Unreasonable Effectiveness of Recurrent Neural Networks)
Algebraic Geometry (LaTeX)
Generates nearly compilable LaTeX!
(Andrej Karpathy’s blog: The Unreasonable Effectiveness of Recurrent Neural Networks)
Algebraic Geometry (LaTeX)
(Andrej Karpathy’s blog: The Unreasonable Effectiveness of Recurrent Neural Networks)
Algebraic Geometry (LaTeX)
Too long-term a dependency? The opened environment never closes!
(Andrej Karpathy’s blog: The Unreasonable Effectiveness of Recurrent Neural Networks)
Code generation?
•The Linux source code, concatenated into a giant file (474 MB of C)
•10 million parameter RNN
(Andrej Karpathy’s blog: The Unreasonable Effectiveness of Recurrent Neural Networks)
Comments here and there
Proper syntax for strings and pointers
Correctly learns to use brackets
Often uses undefined variables!
Declares variables it never uses!
(Andrej Karpathy’s blog: The Unreasonable Effectiveness of Recurrent Neural Networks)
The sampled functions use variables within scope, but they are vacuous! Another problem with long-term dependencies.
A look at recent state-of-the-art models
Recurrent Neural Networks
Temporal Convolutional Networks
Transformer Networks
Temporal Convolutional Networks
(Bai et al., 2018)
TCN = 1D FCN + causal convolution
Benefits:
• Parallelism!
• Flexible receptive field size
• Stable gradients
• Low memory for training
• Variable input lengths
Details:
• Uses dilated convolutions for an exponential receptive field vs. depth
• The effective history of one layer is (k − 1)d, with dilation d = 𝒪(2^i), where k is the kernel size and i is the layer number
• Uses residuals, ReLUs, and weight normalization
• Spatial dropout
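A minimal sketch of the core ingredient, a causal dilated 1D convolution, in PyTorch (left-padding enforces causality; the residual blocks, weight norm, and dropout from the paper are omitted; sizes are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConv1d(nn.Module):
    """Dilated 1D convolution whose output at time t sees only inputs 0..t."""
    def __init__(self, channels, kernel_size=3, dilation=1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)

    def forward(self, x):                          # x: (batch, channels, T)
        return self.conv(F.pad(x, (self.pad, 0)))  # pad the past (left) side only

# Dilation 2^i at layer i gives an exponentially growing receptive field:
tcn = nn.Sequential(*[CausalConv1d(8, dilation=2 ** i) for i in range(4)])
y = tcn(torch.randn(1, 8, 100))                    # receptive field: 1 + 2*(2^4 - 1) = 31 steps
```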
TCNs versus LSTMs
(Bai et al., 2018)
The ‘unlimited memory’ of LSTMs is quite limited
compared to the expansive receptive field of the
generic TCN.
Copy memory task (last 10 elements evaluated)
A look at recent state-of-the-art models
Recurrent Neural Networks
Temporal Convolutional Networks
Transformer Networks
Transformer Networks
(Vaswani et al., 2017)
Relies entirely on attention to compute representations!
Details:
• Encoder-decoder structure and auto-regressive model
• Multi-headed self-attention mechanisms
• FC feed-forward networks applied to each position separately and identically
• Input and output embeddings used
• No recurrence and no convolution, so must inject positional encodings
Benefits:
• Low computational complexity
• Highly-parallelizable computation
• Low ‘path length’ for long-term dependencies
Attention(Q, K, V) = softmax( QKᵀ / √d_k ) V
The encoder has self-attention at each layer; the decoder attends to all positions in the input sequence, with its own self-attention masked for causality.
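That formula transcribes almost line-for-line into PyTorch (a single-head sketch; the optional mask implements the decoder's causal self-attention):

```python
import math
import torch

def attention(Q, K, V, causal=False):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)       # pairwise similarities
    if causal:                                              # position t may not attend past t
        mask = torch.triu(torch.ones_like(scores, dtype=torch.bool), diagonal=1)
        scores = scores.masked_fill(mask, float('-inf'))
    return torch.softmax(scores, dim=-1) @ V

Q = K = V = torch.randn(10, 64)                             # sequence length 10, d_k = 64
out = attention(Q, K, V, causal=True)                       # out: (10, 64)
```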
Why self-attention?
(Vaswani et al., 2017)
Here n is the sequence length, d is the representation dimension, k is the kernel size for convolutions, and r is the neighborhood size in restricted attention.
It’s not only the length of context that matters, but also the ease with which it can be accessed: recurrent and convolutional layers put longer path lengths between distant positions, and a recurrent layer needs more operations than self-attention whenever d > n.
Transformers vs TCNs
(Vaswani et al., 2017)
Machine translation: even with a relatively limited context (e.g., 128), Transformers beat both Google’s TCN for NMT and FAIR’s TCN with attention.
(Dai et al., 2019)
And with a segment-level recurrence mechanism, Transformer-XL is freed of fixed context lengths and soars on WikiText-103 word-level sequence modeling.
Transformer-XL
(Dai et al., 2019)
Continued gains in performance out to contexts of 1000+ tokens.
The generated sample is total hallucination! (but nice generalization)
Key questions for this talk
How does the brain organize and represent narratives?
How well do deep learning models deal with narrative context?
What can deep learning models tell us about the brain?
Are the more effective ones more brain-like?
Are deep neural networks organized by timescale?
Do the layers of a neural network, like the brain, represent context at short, intermediate, and long timescales? (E.g., completing “The boy went out to fly an _____” with “airplane”.)
The methodology
Pipeline: Story → Neural models → Neural activations
Goal: Determine how well NN layer activations predict fMRI data (regression).
Predicting brain activity with encoding models
Eickenberg et al., NeuroImage 2017
Kell et al., Neuron 2018
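A minimal sketch of such an encoding model: ridge regression from a layer's activations to every voxel, scored by held-out correlation. The synthetic data, scikit-learn pipeline, and train/test split here are assumptions for illustration, not the authors' actual analysis (which, e.g., must account for hemodynamic delays).

```python
import numpy as np
from sklearn.linear_model import Ridge

T, F, V = 300, 128, 1000          # time points, layer features, voxels (all hypothetical)
X = np.random.randn(T, F)         # stand-in for NN layer activations over the story
Y = np.random.randn(T, V)         # stand-in for the measured fMRI responses

X_tr, X_te, Y_tr, Y_te = X[:200], X[200:], Y[:200], Y[200:]
model = Ridge(alpha=10.0).fit(X_tr, Y_tr)     # one linear map per voxel, fit jointly
Y_hat = model.predict(X_te)

# Encoding performance: per-voxel correlation between predicted and held-out response
r = np.array([np.corrcoef(Y_hat[:, v], Y_te[:, v])[0, 1] for v in range(V)])
print(f"mean prediction correlation: {r.mean():.3f}")
```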
Relative predictive power of models
LSTM vs Embedding (Jain et al., 2018); Transformer vs Embedding (Jain et al., unpublished)
Layer-specific correlations for LSTM
(Jain et al., 2018)
The cortical map contrasts a low-level speech processing region with a higher semantic region (white = no layer preference).
Open questions
Why do LSTMs perform so poorly? They are not all that predictive, and they do not exhibit layer-specific correlations.
Do TCNs and Transformers exhibit multi-timescale characteristics?
Layer-specific correlations for Transformer
(Jain et al., unpublished)
The map shows a gradient of preferred layers from early to late. Yes!
Layer-specific correlations for Transformer
(Jain et al., unpublished) TCNs look similar.
Encoding model performance for Transformer
• Averaged across 3 subjects
• Contextual models from all layers
outperform embedding
• Increasing context length (to a
point) helps all layers
• Long context representations are
still missing information!
TCNs exhibit similar characteristics but do not seem to learn the same representations.
(Jain et al., unpublished)
Summary and Challenges
•The brain’s language pathway is organized into a multi-timescale hierarchy, making it
very effective at utilizing context
•Language models are catching up, with Transformer-XL in the lead
•TCNs and Transformers indeed have explicit multi-timescale hierarchies
- Last layers have lower predictive performance, why?
- How to get more out of context at longer timescales?
- Lack of clear timescales in RNNs should lead to a revisiting of their depth
characteristics. (E.g., see Turek et al. 2019, https://arxiv.org/abs/1909.00021)
•More study needed on representations
- What specific information is captured in representations across the cortex?
- Are the same representations found across deep learning architectures?
Thank you!
ted.willke@intel.com
NeurIPS Workshop on
Context and Compositionality in Biological and Artificial Neural Systems
Saturday, December 14th, 2019
https://context-composition.github.io/