
Seq2seq Pre-training

‣ LMs P(w): trained unidirectionally


‣ Masked LMs: trained bidirectionally but with masking
‣ How can we pre-train a model for P(y|x)?
‣ Why was BERT effective? Predicting a mask requires some kind of text
“understanding”:

‣ What would it take to impart the same “skills” for sequence prediction? (See the shape-level sketch below.)
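A shape-level sketch of the three objectives in plain PyTorch, using a dummy uniform-logits stand-in for a real model; the vocabulary size, rates, and names here are illustrative assumptions, not from the slides:

```python
import torch
import torch.nn.functional as F

V = 100                                        # toy vocabulary size
def dummy_logits(batch, length):               # stand-in for any model's output
    return torch.zeros(batch, length, V)

x = torch.randint(V, (1, 8))                   # input tokens
y = torch.randint(V, (1, 6))                   # target tokens (for seq2seq)

# 1) Causal LM, P(w): predict token t from tokens < t only (unidirectional).
logits = dummy_logits(1, 8)
lm_loss = F.cross_entropy(logits[:, :-1].reshape(-1, V), x[:, 1:].reshape(-1))

# 2) Masked LM (BERT): corrupt some positions, predict only those positions
#    using bidirectional context; unmasked positions are ignored (label -100).
labels = x.clone()
mask = torch.rand(1, 8) < 0.15
mask[0, 0] = True                              # guarantee at least one masked position
labels[~mask] = -100
mlm_loss = F.cross_entropy(dummy_logits(1, 8).reshape(-1, V),
                           labels.reshape(-1), ignore_index=-100)

# 3) Seq2seq, P(y|x): an encoder reads x, a decoder predicts every token of y
#    autoregressively -- the objective the following slides pre-train with BART.
dec_logits = dummy_logits(1, 6)
seq2seq_loss = F.cross_entropy(dec_logits[:, :-1].reshape(-1, V),
                               y[:, 1:].reshape(-1))
```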
BART

Infilling covers longer spans than masking
‣ Several possible strategies for corrupting a sequence are explored in
the BART paper: token masking, token deletion, text infilling, sentence permutation, and document rotation (masking vs. infilling are sketched below)

Lewis et al. (2019)


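A rough Python sketch of two of those corruption schemes, token masking vs. text infilling. The Poisson(λ=3) span lengths follow the paper's description of text infilling; the function names, rates, and whitespace tokenization are assumptions for illustration:

```python
import numpy as np

MASK = "<mask>"
rng = np.random.default_rng(0)

def token_masking(tokens, rate=0.15):
    """BERT-style corruption: mask individual tokens independently."""
    return [MASK if rng.random() < rate else t for t in tokens]

def text_infilling(tokens, mask_ratio=0.3, lam=3.0):
    """BART-style text infilling: cover spans whose lengths are drawn from
    Poisson(lam); each span is replaced by a SINGLE mask token, so the model
    must also figure out how many tokens are missing."""
    out, i, covered = [], 0, 0
    budget = int(round(mask_ratio * len(tokens)))
    while i < len(tokens):
        if covered < budget and rng.random() < mask_ratio:
            span = int(rng.poisson(lam))   # span of 0 just inserts a <mask>
            out.append(MASK)
            i += span                      # skip (remove) the covered tokens
            covered += max(span, 1)
        else:
            out.append(tokens[i])
            i += 1
    return out

toks = "the quick brown fox jumps over the lazy dog near the river".split()
print(token_masking(toks))
print(text_infilling(toks))
```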
BART
‣ Sequence-to-sequence Transformer trained on this data: permute/
mask/delete tokens, then predict the full clean sequence autoregressively (see the training-step sketch below)

Lewis et al. (2019)
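A minimal sketch of one such training step using the Hugging Face BartForConditionalGeneration interface; the checkpoint name facebook/bart-base and the toy sentence pair are assumptions for illustration:

```python
from transformers import BartTokenizerFast, BartForConditionalGeneration

tok = BartTokenizerFast.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

clean  = "The chef cooked dinner for the guests on Friday."
noised = "The chef <mask> for the guests <mask> Friday."  # e.g., output of text_infilling above

# Encoder reads the corrupted text; the decoder is trained to reproduce the
# full clean sequence autoregressively (labels -> token-level cross-entropy).
batch  = tok(noised, return_tensors="pt")
labels = tok(clean, return_tensors="pt").input_ids
loss = model(**batch, labels=labels).loss
loss.backward()                            # one gradient step of the denoising objective
```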


BERT vs. BART
‣ BERT: only parameters are an
encoder, trained with the masked
language modeling objective
(e.g., input A _ C _ E with masked
positions to fill in). Cannot generate
text or do seq2seq tasks.

‣ BART: both an encoder and a
decoder. Can also use just the
encoder wherever we would
use BERT (see the sketch below).

Lewis et al. (2019)


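A small sketch of that distinction with the same Hugging Face interface: the encoder alone supplies BERT-style contextual features, while the full encoder-decoder can generate text. The checkpoint name and the mean-pooling choice are illustrative assumptions:

```python
import torch
from transformers import BartTokenizerFast, BartForConditionalGeneration

tok = BartTokenizerFast.from_pretrained("facebook/bart-base")
bart = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

ids = tok("A surprisingly good movie.", return_tensors="pt").input_ids

# Encoder-only, BERT-style: contextual token vectors you could feed to your
# own classifier head (mean pooling is just one simple choice).
with torch.no_grad():
    hidden = bart.get_encoder()(ids).last_hidden_state   # (1, seq_len, d_model)
features = hidden.mean(dim=1)

# Encoder + decoder: conditional generation, which an encoder-only BERT cannot do.
out_ids = bart.generate(ids, max_new_tokens=20)
print(tok.batch_decode(out_ids, skip_special_tokens=True))
```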
BART for Summarization
‣ Pre-train on the BART task: take random chunks of text, noise them
according to the schemes described, and try to “decode” the clean text

‣ Fine-tune on a summarization dataset: a news article is the input and
a summary of that article is the output (usually 1-3 sentences
depending on the dataset); an inference sketch follows

Lewis et al. (2019)


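For concreteness, a sketch of summarization inference with a BART checkpoint already fine-tuned this way (facebook/bart-large-cnn, fine-tuned on CNN/DailyMail); the article text and generation settings below are placeholders:

```python
from transformers import BartTokenizerFast, BartForConditionalGeneration

tok = BartTokenizerFast.from_pretrained("facebook/bart-large-cnn")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")

article = (
    "City officials announced on Tuesday that the downtown bridge will close "
    "for repairs next month, rerouting traffic onto two nearby avenues. The "
    "project is expected to take six weeks and cost about $2 million."
)

inputs = tok(article, return_tensors="pt", truncation=True, max_length=1024)
summary_ids = model.generate(**inputs, num_beams=4, min_length=20, max_length=80)
print(tok.decode(summary_ids[0], skip_special_tokens=True))

# Fine-tuning itself reuses the pre-training loss, just on (article, summary) pairs:
#   loss = model(**tok(article, return_tensors="pt", truncation=True),
#                labels=tok(summary, return_tensors="pt").input_ids).loss
```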
