
Lecture 14:

Question Answering

Wei Xu
(many slides from Greg Durrett)
QA is very broad
‣ Factoid QA: What states border Mississippi? When was Barack Obama born?
‣ Lots of this could be handled by QA from a knowledge base, if we had a
big enough knowledge base
‣ “Question answering” as a term is so broad as to be meaningless
‣ Is P=NP?
‣ What is 4+5?
‣ What is the translation of [sentence] into French? [McCann et al., 2018]
Classical Question Answering
‣ Form a semantic representation via semantic parsing, then execute it against a structured knowledge base
Q: “where was Barack Obama born”

λx. type(x, Location) ∧ born_in(Barack_Obama, x)


(other representations like SQL possible too…)

‣ How to deal with open-domain data/relations? Need data to learn how to ground every predicate, or need to be able to produce predicates in a zero-shot way
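
‣ As a rough illustration, here is a minimal Python sketch of executing such a logical form against a toy knowledge base (the predicate tables and entities are hypothetical stand-ins, not a real KB):

# Minimal sketch: executing a logical form like
#   λx. type(x, Location) ∧ born_in(Barack_Obama, x)
# against a toy knowledge base. Predicates and entities are hypothetical.
TYPE = {("Honolulu", "Location"), ("Barack_Obama", "Person")}
BORN_IN = {("Barack_Obama", "Honolulu")}
ENTITIES = {"Honolulu", "Barack_Obama"}

def answer(person):
    """All x such that type(x, Location) and born_in(person, x)."""
    return [x for x in ENTITIES
            if (x, "Location") in TYPE and (person, x) in BORN_IN]

print(answer("Barack_Obama"))  # ['Honolulu']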
Reading Comprehension
‣ “AI challenge problem”: answer a question given a context
‣ Recognizing Textual Entailment (2006)
‣ MCTest (2013): 500 passages, 4 questions per passage
‣ Two questions per passage explicitly require cross-sentence reasoning

Richardson et al. (2013)
Dataset Explosion
‣ 10+ QA datasets released since 2015
‣ Children’s Book Test, CNN/Daily Mail, SQuAD, TriviaQA are the most well-known (others: SearchQA, MS MARCO, RACE, WikiHop, …)
‣ Question answering: questions are in natural language
‣ Answers: multiple choice or require picking from the passage
‣ Require human annotation
‣ “Cloze” task: a word (often an entity) is removed from a sentence
‣ Answers: multiple choice, pick from passage, or pick from vocabulary
‣ Can be created automatically from things that aren’t questions (sketched below)
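
‣ For instance, a minimal sketch of automatic cloze creation (the entity span is hard-coded here; a real pipeline would detect it with NER):

# Minimal sketch: blank out an entity to create a cloze question.
sentence = "Mary visited England."
answer = "England"                    # hypothetical detected entity
cloze = sentence.replace(answer, "X")
print(cloze, "->", answer)            # "Mary visited X." -> England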
Children’s Book Test


‣ Children’s Book Test: take a section of a children’s story, block out an entity, and predict it (one-document, multi-sentence cloze task)

Hill et al. (2015)
bAbI
‣ Evaluation on 20 tasks proposed as building blocks for building “AI-complete” systems
‣ Various levels of difficulty, exhibiting different linguistic phenomena
‣ Small vocabulary; language isn’t truly “natural”

Weston et al. (2014)


Dataset Properties
‣ Axis 1: QA vs. cloze (Children’s Book Test)

‣ Axis 2: single-sentence vs. passage


‣ Often shallow methods work well because most answers are in a single sentence (SQuAD, MCTest)
‣ Some explicitly require linking between multiple sentences (MCTest)
‣ Axis 3: single-document (datasets in this lecture) vs. multi-document (TriviaQA, WikiHop, HotpotQA, …)
Memory Networks
‣ Memory networks let you reference the input with attention
‣ Encode input items into two vectors: a key and a value
‣ Keys compute attention weights given a query; a weighted sum of values gives the output

Sukhbaatar et al. (2015)
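
‣ A minimal sketch of one such key-value lookup, with illustrative dimensions and random inputs (not the paper’s exact parameterization):

import numpy as np

def memory_hop(query, keys, values):
    """query: (d,); keys, values: (n, d). Returns a (d,) output vector."""
    scores = keys @ query                     # one attention logit per memory item
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                  # softmax over memory items
    return weights @ values                   # weighted sum of value vectors

rng = np.random.default_rng(0)
q = rng.normal(size=4)
K, V = rng.normal(size=(3, 4)), rng.normal(size=(3, 4))
print(memory_hop(q, K, V))                    # a 4-dimensional output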


Memory Networks
‣ Three layers of memory network, where the query representation is updated additively based on the memories at each step

‣ How to encode the sentences?


‣ Bag of words (average embeddings)
‣ Positional encoding: multiply each word by a vector capturing its position in the sentence

Sukhbaatar et al. (2015)
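
‣ A minimal sketch of the multi-hop update plus a position-weighted bag-of-words encoder (the linear position weights below stand in for the paper’s exact formula):

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def encode_sentence(word_vecs):
    """word_vecs: (T, d). Scale each word by a position weight, then sum."""
    T = word_vecs.shape[0]
    pos_weights = np.linspace(0.5, 1.5, T)[:, None]  # illustrative weights
    return (pos_weights * word_vecs).sum(axis=0)

def multi_hop(query, keys, values, hops=3):
    """Three attention hops with additive query updates."""
    u = query
    for _ in range(hops):
        o = softmax(keys @ u) @ values        # attend over memories
        u = u + o                             # additive query update
    return u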


Evaluation: bAbI

‣ A 3-hop memory network does pretty well, better than an LSTM at processing these types of examples
Evaluation: Children’s Book Test
‣ Outperforms LSTMs substantially with the right supervision
Memory Network Takeaways
‣ Memory networks provide a way of attending to abstractions over the input
‣ Useful for cloze tasks where far-back context is necessary
‣ What can we do with more basic attention?


CNN/Daily Mail: Attentive Reader
CNN/Daily Mail
‣ Single-document, (usually) single-sentence cloze task
‣ Formed from article summaries, so the information should mostly be present; this makes it easier than the Children’s Book Test
‣ Need to process the question; can’t just use LSTM LMs

Hermann et al. (2015), Chen et al. (2016)


CNN/Daily Mail
‣ LSTM reader: encode the question, encode the passage, predict the entity (a multiclass classification problem over entities in the document)

X visited England ||| Mary visited England  →  Mary

‣ Can also use textual entailment-like models: X visited England vs. Mary visited England

Hermann et al. (2015), Chen et al. (2016)


CNN/Daily Mail
‣ Attentive Reader:
u = encode query
s = encode sentence
r = attention(u -> s)
prediction = f(candidate, u, r)

‣ Uses fixed-size representations for the final prediction; multiclass classification

Hermann et al. (2015)
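
‣ A minimal sketch of this recipe; the weight matrices W_att, W_u, W_r and the candidate entity embeddings are illustrative parameters, not the paper’s exact model:

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attentive_read(u, passage_states, candidate_embs, W_att, W_u, W_r):
    """u: (d,) query; passage_states: (T, d); candidate_embs: (C, d)."""
    scores = passage_states @ (W_att @ u)     # attention logits over passage tokens
    r = softmax(scores) @ passage_states      # fixed-size passage summary
    g = np.tanh(W_u @ u + W_r @ r)            # combine query and summary
    return softmax(candidate_embs @ g)        # distribution over candidate entities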


CNN/Daily Mail
‣ Chen et al. (2016): small changes to the Attentive Reader
‣ Additional analysis of the task found that many of the remaining questions were unanswerable or extremely difficult

Stanford Attentive Reader: 76.2 / 76.5 / 79.5 / 78.7 (results table from slide)

Hermann et al. (2015), Chen et al. (2016)


SQuAD: Bidirectional Attention Flow
SQuAD
‣ Single-document, single-sentence question answering task where the answer is always a substring of the passage
‣ Predict start and end indices of the answer in the passage

Rajpurkar et al. (2016)


SQuAD
What was Marie Curie the first female recipient of?

[figure: passage tokens “first female recipient of the Nobel Prize .” with START and END pointers marking the answer span]

‣ Like a tagging problem over the sentence (not multiclass classification), but we need some way of attending to the query

Rajpurkar et al. (2016)
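
‣ A minimal sketch of span prediction as two tagging distributions over passage positions (w_start and w_end are illustrative learned vectors):

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def predict_span(token_states, w_start, w_end, max_len=15):
    """token_states: (T, d). Returns the highest-scoring (start, end) pair."""
    p_start = softmax(token_states @ w_start)
    p_end = softmax(token_states @ w_end)
    best, span = -1.0, (0, 0)
    for i in range(len(p_start)):
        for j in range(i, min(i + max_len, len(p_end))):
            if p_start[i] * p_end[j] > best:  # best valid pair with start <= end
                best, span = p_start[i] * p_end[j], (i, j)
    return span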


Bidirectional Attention Flow
‣ Passage (context) and query are both encoded with BiLSTMs
‣ Context-to-query attention: compute a softmax over the columns of S, take a weighted sum of u based on the attention weights for each passage word
S_ij = h_i · u_j   (passage states H, query states U)
α_ij = softmax_j(S_ij)   (a distribution over query words)
ũ_i = Σ_j α_ij u_j   (the query “specialized” to the i-th passage word)
Seo et al. (2016)
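
‣ A minimal sketch of these equations; note that BiDAF’s actual similarity is a learned function of [h; u; h∘u], so the plain dot product here is a simplification:

import numpy as np

def context_to_query(H, U):
    """H: (T_passage, d) passage states; U: (T_query, d) query states."""
    S = H @ U.T                                  # (T_passage, T_query) similarities
    A = np.exp(S - S.max(axis=1, keepdims=True))
    A /= A.sum(axis=1, keepdims=True)            # softmax over query words per row
    return A @ U                                 # ũ: query "specialized" per passage word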
Bidirectional Attention Flow

Each passage word now “knows about” the query

Seo et al. (2016)


QA with BERT

What was Marie Curie the first female recipient of ? [SEP] One of the most famous people born in Warsaw was Marie …

‣ Predict start and end positions in the passage
‣ No need for cross-attention mechanisms!

Devlin et al. (2019)
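
‣ A minimal usage sketch with the Hugging Face pipeline API (assumes the transformers package is installed; the checkpoint named below is one common public SQuAD-finetuned model):

from transformers import pipeline

# Load a SQuAD-finetuned extractive QA model.
qa = pipeline("question-answering",
              model="distilbert-base-cased-distilled-squad")
out = qa(question="What was Marie Curie the first female recipient of?",
         context="One of the most famous people born in Warsaw was Marie "
                 "Curie, the first female recipient of the Nobel Prize.")
print(out["answer"])  # expect a span like "the Nobel Prize"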
SQuAD SOTA: 2018
‣ BiDAF: 73 EM / 81 F1

‣ nlnet, QANet, r-net: dueling, super complex systems (much more so than BiDAF…)

‣ BERT: transformer-based approach with pretraining on 3B tokens
SQuAD 2.0 SOTA: Spring 2019
‣ SQuAD 2.0: harder dataset because some questions are unanswerable
‣ Industry contest
SQuAD 2.0 SOTA: Today
‣ Performance is very saturated
‣ Harder QA settings are needed!
TriviaQA
‣ Fully figuring this example out is very challenging
‣ Coref: “the failed campaign” ↔ “movie of the same name”
‣ Lots of surface clues: 1961, campaign, etc.
‣ Systems can do well without really understanding the text

Joshi et al. (2017)
What are these models learning?
‣ “Who…”: knows to look for people
‣ “Which film…”: can identify movies and then spot keywords that are related to the question
‣ Unless questions are made super tricky (targeting closely related entities that are easily confused), they’re usually not so hard to answer
Takeaways
‣ Many flavors of reading comprehension tasks: cloze or actual questions, single- or multi-sentence
‣ Memory networks let you reference the input in an attention-like way, useful for generalizing language models to long-range reasoning
‣ Complex attention schemes can match queries against input texts and identify answers