ICML 2016 MemNN Tutorial
Language Understanding
Jason Weston
Facebook AI Research
Intelligent Conversational
Agents
End-to-End Dialog Agents
While it is possible to build useful dialog agents as a set
of separate black boxes with joining logic (Google Now,
Cortana, Siri, ...), we believe a true dialog agent should:
Be able to combine all its knowledge to fulfill
complex tasks.
Handle long open-ended conversations, which involves
effectively tracking many latent variables.
Be able to learn (new tasks) via conversation.
Our bet: end-to-end machine learning systems are
the way forward in the long run.
Memory Networks
A class of models that combine a large memory with a learning
component that can read and write to it.
Incorporates reasoning with attention over memory (RAM).
Most ML models have limited memory, which is more or less all that’s
needed for “low-level” tasks, e.g. object detection.
[Figure by Sainbayar Sukhbaatar]
Variants of the class…
Some options and extensions:
Representation of inputs and memories could use all
kinds of encodings: bag of words, RNN style reading at
word or character level, etc.
Different possibilities for the output module: e.g. a multi-class
classifier, or an RNN that outputs sentences.
If the memory is huge (e.g. Wikipedia) we need to
organize the memories. Solution: hash the memories to
store in buckets (topics). Then, memory addressing and
reading doesn’t operate on all memories (see the sketch after this list).
If the memory is full, there could be a way of removing
one it thinks is most useless; i.e. it “forgets” somehow.
That would require a scoring function for the utility of each
memory.
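To make the hashing idea above concrete, here is a minimal Python sketch (not from the released MemNN code) of word-bucket hashing, where addressing only scores memories that share at least one word with the input; the class and method names are illustrative:

```python
from collections import defaultdict
import numpy as np

class HashedMemory:
    """Toy word-bucket hashing: each word acts as a bucket (a crude "topic")."""

    def __init__(self):
        self.memories = []               # list of (text, embedding) pairs
        self.buckets = defaultdict(set)  # word -> indices of memories containing it

    def write(self, text, embedding):
        idx = len(self.memories)
        self.memories.append((text, embedding))
        for word in set(text.lower().split()):
            self.buckets[word].add(idx)

    def read(self, query_text, query_emb):
        # Only consider memories sharing at least one word with the query,
        # instead of addressing the whole memory store.
        candidates = set()
        for word in set(query_text.lower().split()):
            candidates |= self.buckets.get(word, set())
        if not candidates:
            return None
        # Address within the candidate set by inner product with the query embedding.
        best = max(candidates,
                   key=lambda i: float(np.dot(query_emb, self.memories[i][1])))
        return self.memories[best][0]
```

The original MemNN paper also considers hashing via clusters of word embeddings rather than raw words, which trades some speed for better recall when the query and memory share no exact words.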
Task (1) Factoid QA with
Single Supporting Fact
(“where is actor”)
Memory Network Models: implemented models
[Figure by Sainbayar Sukhbaatar]
The First MemNN
Implementation
I (input): converts the input text to a bag-of-word-embeddings representation x.
G (generalization): stores x in the next available memory slot m_N.
O (output): loops over all memories k = 1 or 2 times:
1st loop max: finds the best match m_i with x.
2nd loop max: finds the best match m_j with (x, m_i).
The output o is represented with (x, m_i, m_j).
Similar to before, except now for both m_o1 and m_o2 we need
two terms, considering them as either the second or the third
argument to s_Ot, as they may appear on either side during
inference.
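As a rough illustration of the O module above, here is a minimal Python sketch of the k = 2 hard-attention loop, assuming the scoring function s_O is a learned bilinear form between bag-of-embeddings features and a memory vector; the names (s_O, output_module, U) are illustrative, and the time-feature terms s_Ot are omitted:

```python
import numpy as np

def s_O(query_vecs, memory_vec, U):
    # Score one memory against the query features with a bilinear form.
    # query_vecs: list of feature vectors (bag of embeddings); U: learned matrix.
    q = np.sum(query_vecs, axis=0)
    return float(q @ U @ memory_vec)

def output_module(x_vec, memory_vecs, U):
    # 1st loop max: best supporting memory m_o1 given the input x alone.
    o1 = max(range(len(memory_vecs)),
             key=lambda i: s_O([x_vec], memory_vecs[i], U))
    # 2nd loop max: best second memory m_o2 given (x, m_o1).
    o2 = max(range(len(memory_vecs)),
             key=lambda i: s_O([x_vec, memory_vecs[o1]], memory_vecs[i], U))
    # The output o passed on to the response module is represented by (x, m_o1, m_o2).
    return o1, o2
```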
bAbI Experiment 1
• 10k sentences. (Actor: only ask questions about actors.)
• Difficulty: how many sentences in the past the entity was
mentioned.
• Fully supervised (supporting sentences are labeled).
• Compare an RNN (no supervision)
and MemNNs with k = 1 or 2 hops, with/without time
features.
Parsing:
“Grammar as a Foreign Language” O. Vinyals, L. Kaiser, T. Koo, S.
Petrov, I. Sutskever, G. Hinton.
Entailment:
“Reasoning about Entailment with Neural Attention” T. Rocktäschel,
E. Grefenstette, K. Hermann, T. Kočiský, P. Blunsom.
Summarization:
“A Neural Attention Model for Abstractive Sentence Summarization” A. Rush, S. Chopra, J. Weston.
Reasoning with synthetic language:
“A Roadmap towards Machine Intelligence” T. Mikolov, A. Joulin, M.
Baroni.
Conducting Dialog:
“Hierarchical Neural Network Generative Models for Movie Dialogues” I.
Serban, A. Sordoni, Y. Bengio, A. Courville, J. Pineau.
“A Neural Network Approach to Context-Sensitive Generation of
Conversational Responses” Sordoni et al.
“Neural Responding Machine for Short-Text Conversation” L. Shang, Z. Lu,
H. Li.
“Evaluating Prerequisite Qualities for Learning End-to-End Dialog Systems”
J. Dodge, A. Gane, X. Zhang, A. Bordes, S. Chopra, A. Miller, A. Szlam, J. Weston.
(3) Factoid QA with Three
Supporting Facts
Similarly, one can make a task with three supporting facts:
Note that the two questions above have exactly the same words,
but in a different order, and different answers.
(8) Lists/Sets
Tests ability to produce lists/sets:
[Figure: memory module and controller module. The addressing signal (the controller state vector) is dot-producted with the memory vectors; a softmax over the scores gives the attention weights (a soft address), and the resulting read from memory is passed to the controller (added to the controller state).]
Question & Answering
[Figure: QA example. The controller addresses the memory module via dot product + softmax, reads a weighted sum of memory vectors, and predicts the answer “kitchen”.]
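A minimal numpy sketch of the soft addressing shown in the figures above (dot product between the controller state and each memory vector, softmax to get attention weights, weighted-sum read added back to the controller state); this ignores the separate input/output embedding matrices of the full model, and the function name is illustrative:

```python
import numpy as np

def memory_hop(controller_state, memory_vectors):
    # Addressing signal: dot product of the controller state with each memory vector.
    scores = memory_vectors @ controller_state            # (num_memories,)
    # Softmax turns the scores into attention weights (a soft address).
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Read: weighted sum of memory vectors, added to the controller state.
    read_vector = weights @ memory_vectors                # (dim,)
    return controller_state + read_vector, weights
```

Stacking this hop several times gives the multi-hop (“3 hops”) models reported in the results below.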
Model      Mean accuracy   # failed tasks (out of 20)
LSTM       49%             20
3 hops     87.6%           11
So we still fail on some tasks…
… and we could also make more tasks that we fail on!
1) Bypass module
2) Self-Supervision
• However, this is now beaten by many results, especially (Yih et al., ACL ‘15),
which achieves 52.5! Several hand-engineered features are used in
that case. Note that WebQuestions is very small (4k train+valid).
Recent Work: New Models for QA on Documents
Miller et al. Key-Value Memory Networks for Directly Reading Documents. arXiv:1606.03126.
WikiQA Results
What about dialog data, with multiple
exchanges?
Everything we showed so far was question answering,
potentially with long-term context.
We have also built a Movie Dialog Dataset:
a closed but large domain about movies (75k entities, 3.5M
examples).
Ask facts about movies?
Ask for opinions (recommendations) about movies?
Dialog combining facts and opinions?
General chit-chat about movies (statements not
questions)?
Some movies I like are Heat, Kids, Fight Club, Shaun of the
Dead, The Avengers, Skyfall, and Jurassic Park. Can you
suggest something else I might like? Ocean's Eleven
(Dialog 3) QA+Recs: combination
dialog
Sample input contexts and target replies (in red) from Dialog Task 3:
I think the Terminator movies really suck, I mean the first one
was kinda ok, but after that they got really cheesy. Even the
second one which people somehow think is great. And after
that... forgeddabotit.
C’mon the second one was still pretty cool.. Arny was still so
badass, as was Sararah Connor’s character.. and the way they
blended real action and effects was perhaps the last of its
kind...
Results
Ubuntu Data
Dialog dataset: Ubuntu IRC channel logs, where users ask
questions about issues they are having with
Ubuntu and get answers from other users. (Lowe et
al., ‘15)
E.g. a baby talking to its parents, and seeing them talk to each other.
Learning From Human
Responses
[Figure: example dialog beginning “Mary went to the hallway.” …]
Data:
bAbI tasks: fb.ai/babi
SimpleQuestions dataset (100k questions): fb.ai/babi
Children’s Book Test dataset: fb.ai/babi
Movie Dialog Dataset: fb.ai/babi
Code:
Memory Networks: https://github.com/facebook/MemNN
Simulation tasks generator: https://github.com/facebook/bAbI-tasks
RAM Issues
How to decide what to write and what not to write in the
memory?
How to represent knowledge to be stored in memories?
Types of memory (arrays, stacks, or stored within the weights of the
model): when should they be used, and how can they be learnt?
How to do fast retrieval of relevant knowledge from memories
when the scale is huge?
How to build hierarchical memories, e.g. multiscale attention?
How to build hierarchical reasoning, e.g. composition of
functions?
How to incorporate forgetting/compression of information?
How to evaluate reasoning models? Are artificial tasks a good
way? Where do they break down, and where are real tasks needed?
Can we draw inspiration from how animal or human memories work?
Thanks!