Semantic Analysis in Language Technology
  
http://stp.lingfil.uu.se/~santinim/sais/2016/sais_2016.htm



Summarization
Marina Santini
santinim@stp.lingfil.uu.se

Department of Linguistics and Philology
Uppsala University, Uppsala, Sweden

Spring 2016
  
	
  
	
  
Previous Lecture: Relation Extraction
  
What’s a relation?
•  A relation can be formally defined in the form of a tuple
•  t = (e1, e2, …, en)
•  where the ei are entities in a predefined relation r within document D.
•  Most relation extraction systems focus on extracting binary relations.
•  Examples of binary relations include
    •  located-in(CMU, Pittsburgh),
    •  father-of(Manuel Blum, Avrim Blum).
•  It is also possible to go to higher-order relations as well and extract more complex relations (e.g. in biomedicine).
  
Why Relation Extraction?
•  There exists a vast amount of unstructured electronic text on the Web, including newswire, blogs, emails, governmental documents, chats, and so on.
•  The whole idea of IE is to turn unstructured text into structured text by annotating semantic information.
•  RE is the task of recognizing relations between entities in unstructured text.
If a query to a search engine is “When was Gandhi born?”, then the expected answer would be “Gandhi was born in 1869”.
The template of the answer is <PERSON> born-in <YEAR>, which is nothing but the relational triple:
born-in(PERSON, YEAR)
where PERSON and YEAR are the entities.
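The born-in template can be illustrated with a small hand-written pattern. This is only a sketch: the regular expression `BORN_IN` and the function `extract_born_in` are hypothetical names introduced here, and approximating PERSON by capitalized words and YEAR by four digits is an assumption, not the lecture's method.

```python
import re

# Hypothetical pattern for the template <PERSON> born-in <YEAR>.
# PERSON is approximated as a run of capitalized words, YEAR as four digits.
BORN_IN = re.compile(r"((?:[A-Z][a-z]+ )*[A-Z][a-z]+) was born in (\d{4})")

def extract_born_in(text):
    """Return born-in(PERSON, YEAR) triples found in the text."""
    return [("born-in", person, int(year)) for person, year in BORN_IN.findall(text)]

print(extract_born_in("Gandhi was born in 1869."))
# [('born-in', 'Gandhi', 1869)]
```

A real system would replace the capitalization heuristic with a named-entity recognizer.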
  
Watch out!
•  RE = extract facts from unstructured texts, i.e. relations that exist between entities, such as dates, proper names, companies.
•  Other relations (related to Word Senses): semantic relations between concepts: hyperonyms, hyponyms, etc., as in WordNet.
  
How to build relation extractors
1.  Hand-written patterns
2.  Supervised machine learning
3.  Semi-supervised and unsupervised
    •  Bootstrapping (using seeds)
    •  Distant supervision
    •  Unsupervised learning from the web
  
Seed-based or bootstrapping approaches to relation extraction
•  No training set? Maybe you have:
    •  A few seed tuples or
    •  A few high-precision patterns
•  Can you use those seeds to do something useful?
•  Bootstrapping: use the seeds to directly learn to populate a relation
  
Roughly said: Use seeds to initialize a process of annotation, then refine through iterations
  
DIPRE: Extract <author, book> pairs
•  Start with 5 seeds:

Author                Book
Isaac Asimov          The Robots of Dawn
David Brin            Startide Rising
James Gleick          Chaos: Making a New Science
Charles Dickens       Great Expectations
William Shakespeare   The Comedy of Errors

•  Find instances:
The Comedy of Errors, by William Shakespeare, was
The Comedy of Errors, by William Shakespeare, is
The Comedy of Errors, one of William Shakespeare's earliest attempts
The Comedy of Errors, one of William Shakespeare's most
•  Extract patterns (group by middle, take longest common prefix/suffix)
?x , by ?y ,          ?x , one of ?y 's
•  Now iterate, finding new seeds that match the pattern

Brin, Sergei. 1998. Extracting Patterns and Relations from the World Wide Web.
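One bootstrapping iteration can be sketched as follows. This is a heavily simplified illustration, not Brin's DIPRE: it keeps only the "middle" string between book and author (no prefix, suffix, or URL information), and the names `find_middles`, `apply_patterns`, and `PHRASE`, the toy sentences, and the capitalized-phrase heuristic are all assumptions made for the example.

```python
import re

# Assumed heuristic: a title or name is a run of capitalized words,
# optionally joined by lowercase "of", "the" or "and".
PHRASE = r"([A-Z][a-z]+(?:(?: of| the| and)* [A-Z][a-z]+)*)"

def find_middles(seeds, sentences):
    """Collect the short text found between a seed book and its author."""
    middles = set()
    for author, book in seeds:
        for s in sentences:
            m = re.search(re.escape(book) + r"(.{1,20}?)" + re.escape(author), s)
            if m:
                middles.add(m.group(1))
    return middles

def apply_patterns(middles, sentences):
    """Treat each middle as a pattern '?x <middle> ?y' and propose (author, book) pairs."""
    pairs = set()
    for mid in middles:
        pattern = re.compile(PHRASE + re.escape(mid) + PHRASE)
        for s in sentences:
            m = pattern.search(s)
            if m:
                pairs.add((m.group(2), m.group(1)))
    return pairs

seeds = {("William Shakespeare", "The Comedy of Errors")}
sentences = [  # toy corpus
    "The Comedy of Errors, by William Shakespeare, was",
    "The Comedy of Errors, one of William Shakespeare's earliest attempts",
    "Great Expectations, by Charles Dickens, is",
    "Startide Rising, one of David Brin's most",
]
middles = find_middles(seeds, sentences)               # {", by ", ", one of "}
new_pairs = apply_patterns(middles, sentences) - seeds
print(sorted(new_pairs))
```

The newly found pairs would be added to the seed set and the two steps repeated.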
  
Practical Activity
Search for phrasal patterns on the web

Our seeds:
"* is a novel by *"
"* wrote the novel *"
"the novel * was written by *"
optionally add more phrases…

Further refinements that we felt are needed:
•  get rid of non-informative text included in the returned strings (maybe via adding additional patterns in the regular expressions)
•  Identify named entities
    •  Maybe via regular expressions (e.g. identify words starting with uppercase)
    •  Maybe combining seeds and a NER system
•  etc.
  
Google is fantastic, but also unpredictable… → different behaviours depending on the machines, domains, and some “hidden” criteria…
	
  
End of previous lecture
  
Acknowledgements
Most slides borrowed or adapted from:
Dan Jurafsky and Christopher Manning, Coursera
Some inspiration from Dragomir Radev, Coursera

J&M (2009)
  	
  	
  
	
  
	
  	
  	
  
Text Summarization

Summary

News Summarization

Book Summaries
Cliff’s Notes are a series of student study guides available primarily in the United States.

Movie Summaries

Search Engine Snippets

Genres

Types of Summaries

Stages

Summarization

Human Summarization and Abstracting

Extractive Summarization
  
Summarization in Question Answering
Text Summarization
•  Goal: produce an abridged version of a text that contains information that is important or relevant to a user.
•  Summarization Applications
    •  outlines or abstracts of any document, article, etc.
    •  summaries of email threads
    •  action items from a meeting
    •  simplifying text by compressing sentences
  
What to summarize? Single vs. multiple documents
•  Single-document summarization
    •  Given a single document, produce
        •  abstract
        •  outline
        •  headline
•  Multiple-document summarization
    •  Given a group of documents, produce a gist of the content:
        •  a series of news stories on the same event
        •  a set of web pages about some topic or question
  
Query-focused Summarization & Generic Summarization
•  Generic summarization:
    •  Summarize the content of a document
•  Query-focused summarization:
    •  summarize a document with respect to an information need expressed in a user query.
    •  a kind of complex question answering:
        •  Answer a question by summarizing a document that has the information to construct the answer
  
Summarization for Question Answering: Snippets
•  Create snippets summarizing a web page for a query
    •  Google: 156 characters (about 26 words) plus title and link
  
Summarization for Question Answering: Multiple documents
Create answers to complex questions summarizing multiple documents.
•  Instead of giving a snippet for each document
•  Create a cohesive answer that combines information from each document
  
Extractive summarization & Abstractive summarization
•  Extractive summarization:
    •  create the summary from phrases or sentences in the source document(s)
•  Abstractive summarization:
    •  express the ideas in the source documents using (at least in part) different words
  
Simple baseline: take the first sentence
  
Generating Snippets and other Single-Document Answers

Snippets: query-focused summaries
  
Summarization: Three Stages
1.  content selection: choose sentences to extract from the document
2.  information ordering: choose an order to place them in the summary
3.  sentence realization: clean up the sentences
  
[Figure: summarization pipeline. Labels: Document; Sentence Segmentation; All sentences from documents; Content Selection; Sentence Extraction; Extracted sentences; Information Ordering; Sentence Realization; Sentence Simplification; Summary]
Basic Summarization Algorithm
1.  content selection: choose sentences to extract from the document
2.  information ordering: just use document order
3.  sentence realization: keep original sentences
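The three steps of the basic algorithm can be sketched in a few lines. The scoring used for content selection here (average in-document word frequency) is an assumption chosen only to make the example runnable; the function name `summarize`, the parameter `k`, and the naive sentence splitter are likewise hypothetical.

```python
import re
from collections import Counter

def summarize(document, k=2):
    """Pick k sentences, keep document order, keep the sentences unchanged."""
    # Sentence segmentation (naive: split after '.', '!' or '?').
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]

    # 1. Content selection: score each sentence by the average frequency of its words.
    freq = Counter(w for s in sentences for w in re.findall(r"\w+", s.lower()))

    def score(sentence):
        words = re.findall(r"\w+", sentence.lower())
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = ranked[:k]

    # 2. Information ordering: just use document order.
    # 3. Sentence realization: keep the original sentences.
    return [sentences[i] for i in sorted(chosen)]

print(summarize("Cats sleep a lot. Cats eat fish. Dogs bark.", k=2))
# ['Cats sleep a lot.', 'Cats eat fish.']
```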
  
Unsupervised content selection
•  Intuition dating back to Luhn (1958):
    •  Choose sentences that have salient or informative words
•  Two approaches to defining salient words
    1.  tf-idf: weigh each word w_i in document j by tf-idf

        $$\mathrm{weight}(w_i) = \mathrm{tf}_{ij} \times \mathrm{idf}_i$$

    2.  topic signature: choose a smaller set of salient words
        •  mutual information
        •  log-likelihood ratio (LLR), Dunning (1993), Lin and Hovy (2000)

        $$\mathrm{weight}(w_i) = \begin{cases} 1 & \text{if } -2\log\lambda(w_i) > 10 \\ 0 & \text{otherwise} \end{cases}$$

H. P. Luhn. 1958. The Automatic Creation of Literature Abstracts. IBM Journal of Research and Development. 2:2, 159-165.
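A minimal sketch of the tf-idf weighting, assuming raw term counts for tf and idf_i = log(N / df_i), which is one common variant among several. The function name `tfidf_weights` and the toy documents are hypothetical.

```python
import math
from collections import Counter

def tfidf_weights(docs):
    """docs: list of token lists. Returns one dict per document: word -> tf * idf."""
    n_docs = len(docs)
    # Document frequency: in how many documents does each word occur?
    df = Counter(w for doc in docs for w in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw term frequency in this document
        weights.append({w: tf[w] * math.log(n_docs / df[w]) for w in tf})
    return weights

docs = [
    ["water", "spinach", "is", "a", "vegetable"],
    ["a", "train", "is", "a", "vehicle"],
]
w = tfidf_weights(docs)
print(w[0]["spinach"], w[0]["a"])  # words found in every document get weight 0
```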
  	
  
Topic signature-based content selection with queries
•  choose words that are informative either
    •  by log-likelihood ratio (LLR)
    •  or by appearing in the query

    $$\mathrm{weight}(w_i) = \begin{cases} 1 & \text{if } -2\log\lambda(w_i) > 10 \\ 1 & \text{if } w_i \in \text{question} \\ 0 & \text{otherwise} \end{cases}$$

    (could learn more complex weights)
•  Weigh a sentence (or window) by weight of its words:

    $$\mathrm{weight}(s) = \frac{1}{|S|} \sum_{w \in S} \mathrm{weight}(w)$$

Conroy, Schlesinger, and O’Leary 2006
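The word and sentence weights above can be sketched as follows. The LLR is computed with the standard binomial formulation of Dunning's statistic, comparing a word's rate in the topic text with its rate in a background corpus; the function names, the count dictionaries, and the toy numbers are assumptions made for illustration.

```python
import math

def _log_l(k, n, p):
    """Binomial log-likelihood of k hits in n trials (constant coefficient dropped)."""
    p = min(max(p, 1e-12), 1 - 1e-12)  # clamp to avoid log(0)
    return k * math.log(p) + (n - k) * math.log(1 - p)

def llr(k1, n1, k2, n2):
    """-2 log lambda: k1 of n1 topic tokens vs. k2 of n2 background tokens."""
    p1, p2, p = k1 / n1, k2 / n2, (k1 + k2) / (n1 + n2)
    return 2 * (_log_l(k1, n1, p1) + _log_l(k2, n2, p2)
                - _log_l(k1, n1, p) - _log_l(k2, n2, p))

def word_weight(word, topic, n_topic, background, n_bg, query_words, threshold=10.0):
    """1 if the word is in the query or exceeds the LLR threshold, else 0."""
    if word in query_words:
        return 1
    if llr(topic.get(word, 0), n_topic, background.get(word, 0), n_bg) > threshold:
        return 1
    return 0

def sentence_weight(words, weight_of):
    """Average word weight: (1/|S|) * sum of weight(w) over the words of S."""
    return sum(weight_of(w) for w in words) / len(words) if words else 0.0

# Toy counts: "spinach" is far more frequent in the topic text than in the background.
topic, background = {"spinach": 10, "is": 1}, {"spinach": 10, "is": 100}
weight_of = lambda w: word_weight(w, topic, 100, background, 10000, {"water"})
print(sentence_weight(["water", "spinach", "is"], weight_of))
```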
  
Supervised content selection
•  Given:
    •  a labeled training set of good summaries for each document
•  Align:
    •  the sentences in the document with sentences in the summary
•  Extract features
    •  position (first sentence?)
    •  length of sentence
    •  word informativeness, cue phrases
    •  cohesion
•  Train
    •  a binary classifier (put sentence in summary? yes or no)
•  Problems:
    •  hard to get labeled training data
    •  alignment difficult
    •  performance not better than unsupervised algorithms
•  So in practice:
    •  Unsupervised content selection is more common
	
  
Evaluating Summaries: ROUGE
ROUGE (Recall Oriented Understudy for Gisting Evaluation)
•  Intrinsic metric for automatically evaluating summaries
    •  Based on BLEU (a metric used for machine translation)
    •  Not as good as human evaluation (“Did this answer the user’s question?”)
    •  But much more convenient
•  Given a document D, and an automatic summary X:
    1.  Have N humans produce a set of reference summaries of D
    2.  Run system, giving automatic summary X
    3.  What percentage of the bigrams from the reference summaries appear in X?

$$\mathrm{ROUGE\text{-}2} = \frac{\sum_{S \in \{\mathrm{RefSummaries}\}} \sum_{\text{bigrams } i \in S} \min(\mathrm{count}(i, X), \mathrm{count}(i, S))}{\sum_{S \in \{\mathrm{RefSummaries}\}} \sum_{\text{bigrams } i \in S} \mathrm{count}(i, S)}$$

Lin and Hovy 2003
A ROUGE example:
Q: “What is water spinach?”
Human 1: Water spinach is a green leafy vegetable grown in the tropics.
Human 2: Water spinach is a semi-aquatic tropical plant grown as a vegetable.
Human 3: Water spinach is a commonly eaten leaf vegetable of Asia.
•  System answer: Water spinach is a leaf vegetable commonly eaten in tropical areas of Asia.
•  ROUGE-2 = (3 + 3 + 6) / (10 + 9 + 9) = 12/28 = .43
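The ROUGE-2 computation can be sketched directly from the formula. The tokenization (lowercasing, dropping periods, splitting on whitespace) and the function names are assumptions. With this tokenization the three references contain 10, 10 and 9 bigrams, so the sketch yields 12/29 (about 0.41) rather than the 12/28 shown in the example; the example's denominator presumably reflects a different tokenization.

```python
from collections import Counter

def bigrams(text):
    """Bigram counts after lowercasing and stripping periods (assumed tokenization)."""
    toks = text.lower().replace(".", "").split()
    return Counter(zip(toks, toks[1:]))

def rouge_2_counts(system, references):
    """Return (clipped matching bigrams, total reference bigrams)."""
    x = bigrams(system)
    matched = total = 0
    for ref in references:
        s = bigrams(ref)
        matched += sum(min(x[b], c) for b, c in s.items())
        total += sum(s.values())
    return matched, total

def rouge_2(system, references):
    matched, total = rouge_2_counts(system, references)
    return matched / total if total else 0.0

refs = [
    "Water spinach is a green leafy vegetable grown in the tropics.",
    "Water spinach is a semi-aquatic tropical plant grown as a vegetable.",
    "Water spinach is a commonly eaten leaf vegetable of Asia.",
]
system = "Water spinach is a leaf vegetable commonly eaten in tropical areas of Asia."
print(rouge_2_counts(system, refs), rouge_2(system, refs))
```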
  	
  
Summarization for Complex Questions
Definition questions
Q: What is water spinach?
A: Water spinach (ipomoea aquatica) is a semi-aquatic leafy green plant with long hollow stems and spear- or heart-shaped leaves, widely grown throughout Asia as a leaf vegetable. The leaves and stems are often eaten stir-fried flavored with salt or in soups. Other common names include morning glory vegetable, kangkong (Malay), rau muong (Viet.), ong choi (Cant.), and kong xin cai (Mand.). It is not related to spinach, but is closely related to sweet potato and convolvulus.
  	
  
Medical questions
Q: In children with an acute febrile illness, what is the efficacy of single medication therapy with acetaminophen or ibuprofen in reducing fever?
A: Ibuprofen provided greater temperature decrement and longer duration of antipyresis than acetaminophen when the two drugs were administered in approximately equal doses. (PubMedID: 1621668, Evidence Strength: A)
Demner-Fushman and Lin (2007)
  	
  
Other complex questions
1.  How is compost made and used for gardening (including different types of compost, their uses, origins and benefits)?
2.  What causes train wrecks and what can be done to prevent them?
3.  Where have poachers endangered wildlife, what wildlife has been endangered and what steps have been taken to prevent poaching?
4.  What has been the human toll in death or injury of tropical storms in recent years?

Modified from the DUC 2005 competition (Hoa Trang Dang 2005)
  
Answering harder questions: Query-focused multi-document summarization
•  The (bottom-up) snippet method
    •  Find a set of relevant documents
    •  Extract informative sentences from the documents
    •  Order and modify the sentences into an answer
•  The (top-down) information extraction method
    •  build specific answerers for different question types:
        •  definition questions
        •  biography questions
        •  certain medical questions
Query-Focused Multi-Document Summarization
[Figure: query-focused multi-document summarization pipeline. Labels: Input Docs; Query; Sentence Segmentation; All sentences from documents; Sentence Simplification; All sentences plus simplified versions; Content Selection; Sentence Extraction: LLR, MMR; Extracted sentences; Information Ordering; Sentence Realization; Summary]
Information Ordering
•  Chronological ordering:
    •  Order sentences by the date of the document (for summarizing news) (Barzilay, Elhadad, and McKeown 2002)
•  Coherence:
    •  Choose orderings that make neighboring sentences similar (by cosine).
    •  Choose orderings in which neighboring sentences discuss the same entity (Barzilay and Lapata 2007)
•  Topical ordering
    •  Learn the ordering of topics in the source documents
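Two of these orderings can be sketched briefly: chronological ordering by document date, and a coherence ordering that keeps neighboring sentences similar by cosine. The greedy nearest-neighbor strategy, the bag-of-words vectors, the function names, and the toy sentences are assumptions for illustration; the cited papers use more elaborate models.

```python
import math
from collections import Counter
from datetime import date

def cosine(a, b):
    """Cosine similarity of two sentences as bag-of-words count vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0

def chronological(dated_sentences):
    """dated_sentences: list of (text, document_date); oldest document first."""
    return [text for text, _ in sorted(dated_sentences, key=lambda pair: pair[1])]

def greedy_coherent(texts):
    """Start with the first sentence, then repeatedly append the most similar remaining one."""
    if not texts:
        return []
    order, rest = [texts[0]], list(texts[1:])
    while rest:
        nxt = max(rest, key=lambda t: cosine(order[-1], t))
        rest.remove(nxt)
        order.append(nxt)
    return order

news = [("second report", date(2016, 3, 2)), ("first report", date(2016, 3, 1))]
print(chronological(news))

texts = ["the storm hit the coast", "markets fell sharply", "the coast was flooded by the storm"]
print(greedy_coherent(texts))
```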
  
Domain-specific answering: The Information Extraction method
•  a good biography of a person contains:
    •  a person’s birth/death, fame factor, education, nationality and so on
•  a good definition contains:
    •  genus or hypernym
        •  The Hajj is a type of ritual
•  a medical answer about a drug’s use contains:
    •  the problem (the medical condition),
    •  the intervention (the drug or procedure), and
    •  the outcome (the result of the study).

Information that should be in the answer for 3 kinds of questions
  
The end
Ad

Recommended

Pandas
Pandas
maikroeder
 
NLTK
NLTK
Girish Khanzode
 
Information retrieval 8 term weighting
Information retrieval 8 term weighting
Vaibhav Khanna
 
Cross-lingual Information Retrieval
Cross-lingual Information Retrieval
Shadi Saleh
 
Machine Learning Interpretability / Explainability
Machine Learning Interpretability / Explainability
Raouf KESKES
 
Introduction to text classification using naive bayes
Introduction to text classification using naive bayes
Dhwaj Raj
 
Text data mining1
Text data mining1
KU Leuven
 
Introduction to natural language processing, history and origin
Introduction to natural language processing, history and origin
Shubhankar Mohan
 
Text Analytics Presentation
Text Analytics Presentation
Skylar Ritchie
 
Natural Language Processing
Natural Language Processing
Mariana Soffer
 
Information Extraction
Information Extraction
ssbd6985
 
Text mining
Text mining
ThejeswiniChivukula
 
Probabilistic information retrieval models & systems
Probabilistic information retrieval models & systems
Selman Bozkır
 
Data wrangling week1
Data wrangling week1
Ferdin Joe John Joseph PhD
 
information retrieval Techniques and normalization
information retrieval Techniques and normalization
Ameenababs
 
Topic Modeling
Topic Modeling
Kyunghoon Kim
 
Data Mining Concepts
Data Mining Concepts
Dung Nguyen
 
Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)
VenkateshMurugadas
 
Feature Engineering
Feature Engineering
HJ van Veen
 
Abstractive Text Summarization
Abstractive Text Summarization
Tho Phan
 
Online machine learning in Streaming Applications
Online machine learning in Streaming Applications
Stavros Kontopoulos
 
NAMED ENTITY RECOGNITION
NAMED ENTITY RECOGNITION
live_and_let_live
 
Vector space model of information retrieval
Vector space model of information retrieval
Nanthini Dominique
 
Vector space model in information retrieval
Vector space model in information retrieval
Tharuka Vishwajith Sarathchandra
 
Evaluation in Information Retrieval
Evaluation in Information Retrieval
Dishant Ailawadi
 
Information Extraction
Information Extraction
Rubén Izquierdo Beviá
 
Natural language processing (nlp)
Natural language processing (nlp)
Kuppusamy P
 
Ensemble learning Techniques
Ensemble learning Techniques
Babu Priyavrat
 
Lecture: Semantic Word Clouds
Lecture: Semantic Word Clouds
Marina Santini
 
Lecture: Ontologies and the Semantic Web
Lecture: Ontologies and the Semantic Web
Marina Santini
 

More Related Content

What's hot (20)

Text Analytics Presentation
Text Analytics Presentation
Skylar Ritchie
 
Natural Language Processing
Natural Language Processing
Mariana Soffer
 
Information Extraction
Information Extraction
ssbd6985
 
Text mining
Text mining
ThejeswiniChivukula
 
Probabilistic information retrieval models & systems
Probabilistic information retrieval models & systems
Selman Bozkır
 
Data wrangling week1
Data wrangling week1
Ferdin Joe John Joseph PhD
 
information retrieval Techniques and normalization
information retrieval Techniques and normalization
Ameenababs
 
Topic Modeling
Topic Modeling
Kyunghoon Kim
 
Data Mining Concepts
Data Mining Concepts
Dung Nguyen
 
Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)
VenkateshMurugadas
 
Feature Engineering
Feature Engineering
HJ van Veen
 
Abstractive Text Summarization
Abstractive Text Summarization
Tho Phan
 
Online machine learning in Streaming Applications
Online machine learning in Streaming Applications
Stavros Kontopoulos
 
NAMED ENTITY RECOGNITION
NAMED ENTITY RECOGNITION
live_and_let_live
 
Vector space model of information retrieval
Vector space model of information retrieval
Nanthini Dominique
 
Vector space model in information retrieval
Vector space model in information retrieval
Tharuka Vishwajith Sarathchandra
 
Evaluation in Information Retrieval
Evaluation in Information Retrieval
Dishant Ailawadi
 
Information Extraction
Information Extraction
Rubén Izquierdo Beviá
 
Natural language processing (nlp)
Natural language processing (nlp)
Kuppusamy P
 
Ensemble learning Techniques
Ensemble learning Techniques
Babu Priyavrat
 
Text Analytics Presentation
Text Analytics Presentation
Skylar Ritchie
 
Natural Language Processing
Natural Language Processing
Mariana Soffer
 
Information Extraction
Information Extraction
ssbd6985
 
Probabilistic information retrieval models & systems
Probabilistic information retrieval models & systems
Selman Bozkır
 
information retrieval Techniques and normalization
information retrieval Techniques and normalization
Ameenababs
 
Data Mining Concepts
Data Mining Concepts
Dung Nguyen
 
Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)
VenkateshMurugadas
 
Feature Engineering
Feature Engineering
HJ van Veen
 
Abstractive Text Summarization
Abstractive Text Summarization
Tho Phan
 
Online machine learning in Streaming Applications
Online machine learning in Streaming Applications
Stavros Kontopoulos
 
Vector space model of information retrieval
Vector space model of information retrieval
Nanthini Dominique
 
Evaluation in Information Retrieval
Evaluation in Information Retrieval
Dishant Ailawadi
 
Natural language processing (nlp)
Natural language processing (nlp)
Kuppusamy P
 
Ensemble learning Techniques
Ensemble learning Techniques
Babu Priyavrat
 

Viewers also liked (19)

Lecture: Semantic Word Clouds
Lecture: Semantic Word Clouds
Marina Santini
 
Lecture: Ontologies and the Semantic Web
Lecture: Ontologies and the Semantic Web
Marina Santini
 
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Marina Santini
 
IE: Named Entity Recognition (NER)
IE: Named Entity Recognition (NER)
Marina Santini
 
Lecture: Question Answering
Lecture: Question Answering
Marina Santini
 
Lecture: Vector Semantics (aka Distributional Semantics)
Lecture: Vector Semantics (aka Distributional Semantics)
Marina Santini
 
Towards Contextualized Information: How Automatic Genre Identification Can Help
Towards Contextualized Information: How Automatic Genre Identification Can Help
Marina Santini
 
Information Gain
Information Gain
guest32311f
 
Lecture 2: Preliminaries (Understanding and Preprocessing data)
Lecture 2: Preliminaries (Understanding and Preprocessing data)
Marina Santini
 
Lecture 2 Basic Concepts in Machine Learning for Language Technology
Lecture 2 Basic Concepts in Machine Learning for Language Technology
Marina Santini
 
Lecture 3b: Decision Trees (1 part)
Lecture 3b: Decision Trees (1 part)
Marina Santini
 
09 semantic web & ontologies
09 semantic web & ontologies
Marina Santini
 
Lecture 3: Basic Concepts of Machine Learning - Induction & Evaluation
Lecture 3: Basic Concepts of Machine Learning - Induction & Evaluation
Marina Santini
 
Lecture 9 Perceptron
Lecture 9 Perceptron
Marina Santini
 
Lecture: Regular Expressions and Regular Languages
Lecture: Regular Expressions and Regular Languages
Marina Santini
 
Lecture: Context-Free Grammars
Lecture: Context-Free Grammars
Marina Santini
 
Decision tree example problem
Decision tree example problem
SATYABRATA PRADHAN
 
Lecture 1: What is Machine Learning?
Lecture 1: What is Machine Learning?
Marina Santini
 
Decision tree
Decision tree
R A Akerkar
 
Lecture: Semantic Word Clouds
Lecture: Semantic Word Clouds
Marina Santini
 
Lecture: Ontologies and the Semantic Web
Lecture: Ontologies and the Semantic Web
Marina Santini
 
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Marina Santini
 
IE: Named Entity Recognition (NER)
IE: Named Entity Recognition (NER)
Marina Santini
 
Lecture: Question Answering
Lecture: Question Answering
Marina Santini
 
Lecture: Vector Semantics (aka Distributional Semantics)
Lecture: Vector Semantics (aka Distributional Semantics)
Marina Santini
 
Towards Contextualized Information: How Automatic Genre Identification Can Help
Towards Contextualized Information: How Automatic Genre Identification Can Help
Marina Santini
 
Information Gain
Information Gain
guest32311f
 
Lecture 2: Preliminaries (Understanding and Preprocessing data)
Lecture 2: Preliminaries (Understanding and Preprocessing data)
Marina Santini
 
Lecture 2 Basic Concepts in Machine Learning for Language Technology
Lecture 2 Basic Concepts in Machine Learning for Language Technology
Marina Santini
 
Lecture 3b: Decision Trees (1 part)
Lecture 3b: Decision Trees (1 part)
Marina Santini
 
09 semantic web & ontologies
09 semantic web & ontologies
Marina Santini
 
Lecture 3: Basic Concepts of Machine Learning - Induction & Evaluation
Lecture 3: Basic Concepts of Machine Learning - Induction & Evaluation
Marina Santini
 
Lecture: Regular Expressions and Regular Languages
Lecture: Regular Expressions and Regular Languages
Marina Santini
 
Lecture: Context-Free Grammars
Lecture: Context-Free Grammars
Marina Santini
 
Lecture 1: What is Machine Learning?
Lecture 1: What is Machine Learning?
Marina Santini
 
Ad

Similar to Lecture: Summarization (20)

Nx2422722277
Nx2422722277
IJERA Editor
 
Document Summarization
Document Summarization
Pratik Kumar
 
Survey on Open IE
Survey on Open IE
ChristinaNiklaus
 
Knowledge acquisition using automated techniques
Knowledge acquisition using automated techniques
University of Melbourne, Australia
 
EXTRACTING ARABIC RELATIONS FROM THE WEB
EXTRACTING ARABIC RELATIONS FROM THE WEB
ijcsit
 
Shilpa shukla processing_text
Shilpa shukla processing_text
shilpashukla01
 
Natural Language Processing and Search Intent Understanding C3 Conductor 2019...
Natural Language Processing and Search Intent Understanding C3 Conductor 2019...
Dawn Anderson MSc DigM
 
Hypertext
Hypertext
patrickalfredwaluchio
 
Summarization in Computational linguistics
Summarization in Computational linguistics
Ahmad Mashhood
 
Effective Semantics for Engineering NLP Systems
Effective Semantics for Engineering NLP Systems
Andre Freitas
 
IRJET- A Survey Paper on Text Summarization Methods
IRJET- A Survey Paper on Text Summarization Methods
IRJET Journal
 
Tldr
Tldr
Narayana Murthy
 
Networks and Natural Language Processing
Networks and Natural Language Processing
Ahmed Magdy Ezzeldin, MSc.
 
From Linked Data to Semantic Applications
From Linked Data to Semantic Applications
Andre Freitas
 
Mining Web content for Enhanced Search
Mining Web content for Enhanced Search
Roi Blanco
 
Open IE tutorial 2018
Open IE tutorial 2018
Andre Freitas
 
mlss
mlss
MaiAGE-INRA, Paris Sud, LIMSI-CNRS
 
Copy of 10text (2)
Copy of 10text (2)
Uma Se
 
Chapter 10 Data Mining Techniques
Chapter 10 Data Mining Techniques
Houw Liong The
 
Web and text
Web and text
Institute of Technology Telkom
 
Document Summarization
Document Summarization
Pratik Kumar
 
EXTRACTING ARABIC RELATIONS FROM THE WEB
EXTRACTING ARABIC RELATIONS FROM THE WEB
ijcsit
 
Shilpa shukla processing_text
Shilpa shukla processing_text
shilpashukla01
 
Natural Language Processing and Search Intent Understanding C3 Conductor 2019...
Natural Language Processing and Search Intent Understanding C3 Conductor 2019...
Dawn Anderson MSc DigM
 
Summarization in Computational linguistics
Summarization in Computational linguistics
Ahmad Mashhood
 
Effective Semantics for Engineering NLP Systems
Effective Semantics for Engineering NLP Systems
Andre Freitas
 
IRJET- A Survey Paper on Text Summarization Methods
IRJET- A Survey Paper on Text Summarization Methods
IRJET Journal
 
From Linked Data to Semantic Applications
From Linked Data to Semantic Applications
Andre Freitas
 
Mining Web content for Enhanced Search
Mining Web content for Enhanced Search
Roi Blanco
 
Open IE tutorial 2018
Open IE tutorial 2018
Andre Freitas
 
Copy of 10text (2)
Copy of 10text (2)
Uma Se
 

Lecture: Summarization

  • 1. Semantic Analysis in Language Technology http://stp.lingfil.uu.se/~santinim/sais/2016/sais_2016.htm

 Summarization. Marina Santini, santinim@stp.lingfil.uu.se, Department of Linguistics and Philology, Uppsala University, Uppsala, Sweden. Spring 2016
  • 2. Previous Lecture: Relation Extraction
  • 3. What’s a relation?
      • A relation can be formally defined as a tuple t = (e1, e2, ..., en), where the ei are entities in a predefined relation r within document D.
      • Most relation extraction systems focus on extracting binary relations.
      • Examples of binary relations include:
          • located-in(CMU, Pittsburgh)
          • father-of(Manuel Blum, Avrim Blum)
      • It is also possible to go to higher-order relations and extract more complex relations (e.g., in biomedicine).
  • 4. Why Relation Extraction?
      • There exists a vast amount of unstructured electronic text on the Web, including newswire, blogs, emails, governmental documents, chats, and so on.
      • The whole idea of IE is to turn unstructured text into structured data by annotating semantic information.
      • RE is the task of recognizing relations between entities in unstructured text.
      • If a query to a search engine is “When was Gandhi born?”, then the expected answer would be “Gandhi was born in 1869”. The template of the answer is <PERSON> born-in <YEAR>, which is nothing but the relational triple born-in(PERSON, YEAR), where PERSON and YEAR are the entities.
  • 5. Watch out!
      • RE = extracting facts from unstructured text, i.e., relations that exist between entities such as dates, proper names, and companies.
      • Other relations (related to word senses): semantic relations between concepts (hypernyms, hyponyms, etc.), as in WordNet.
  • 6. How to build relation extractors
      1. Hand-written patterns
      2. Supervised machine learning
      3. Semi-supervised and unsupervised
          • Bootstrapping (using seeds)
          • Distant supervision
          • Unsupervised learning from the web
  • 7. Seed-based or bootstrapping approaches to relation extraction
      • No training set? Maybe you have:
          • a few seed tuples, or
          • a few high-precision patterns
      • Can you use those seeds to do something useful?
      • Bootstrapping: use the seeds to directly learn to populate a relation
      • Roughly said: use seeds to initialize a process of annotation, then refine through iterations
  • 8. DIPRE: Extract <author, book> pairs
      • Start with 5 seeds:
          Author | Book
          Isaac Asimov | The Robots of Dawn
          David Brin | Startide Rising
          James Gleick | Chaos: Making a New Science
          Charles Dickens | Great Expectations
          William Shakespeare | The Comedy of Errors
      • Find instances:
          • The Comedy of Errors, by William Shakespeare, was
          • The Comedy of Errors, by William Shakespeare, is
          • The Comedy of Errors, one of William Shakespeare's earliest attempts
          • The Comedy of Errors, one of William Shakespeare's most
      • Extract patterns (group by middle, take longest common prefix/suffix):
          • ?x , by ?y ,
          • ?x , one of ?y 's
      • Now iterate, finding new seeds that match the patterns
      • Brin, Sergey. 1998. Extracting Patterns and Relations from the World Wide Web.
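The bootstrapping step above can be sketched roughly as follows. This is a simplified illustration, not Brin's full algorithm: it groups instances by the middle string only (no prefix/suffix or URL handling), and it assumes the book title opens the sentence and that an author name is two or more capitalized words. The function names and the extra "Great Expectations" sentence are hypothetical.

```python
import re
from collections import Counter

def middle_patterns(sentences, seeds):
    """Count the strings found between a seed book and its author."""
    middles = Counter()
    for author, book in seeds:
        for sentence in sentences:
            b = sentence.find(book)
            if b == -1:
                continue
            # Look for the author only after the end of the book title.
            a = sentence.find(author, b + len(book))
            if a != -1:
                middles[sentence[b + len(book):a]] += 1
    return middles

# Assumption: an author name is two or more capitalized words.
AUTHOR = r"([A-Z][a-z]+(?: [A-Z][a-z]+)+)"

def match_new_pairs(middle, sentences):
    """Apply one middle pattern to unseen text; the title must open the sentence."""
    regex = re.compile(r"^(.+?)" + re.escape(middle) + AUTHOR)
    pairs = set()
    for sentence in sentences:
        m = regex.match(sentence)
        if m:
            pairs.add((m.group(2), m.group(1)))
    return pairs

seeds = [("William Shakespeare", "The Comedy of Errors")]
instances = [
    "The Comedy of Errors, by William Shakespeare, was",
    "The Comedy of Errors, by William Shakespeare, is",
    "The Comedy of Errors, one of William Shakespeare's earliest attempts",
    "The Comedy of Errors, one of William Shakespeare's most",
]
patterns = middle_patterns(instances, seeds)

# Hypothetical unseen sentence used to harvest a new pair.
unseen = ["Great Expectations, by Charles Dickens, is a novel."]
new_pairs = set()
for middle in patterns:
    new_pairs |= match_new_pairs(middle, unseen)
print(patterns)
print(new_pairs)
```

The newly found pairs would be added to the seed set and the loop repeated; in practice some filtering of low-precision patterns is needed to keep errors from snowballing.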
  • 9. Practical Activity: search for phrasal patterns on the web
      • Our seeds:
          • "* is a novel by *"
          • "* wrote the novel *"
          • "the novel * was written by *"
          • optionally add more phrases…
      • Further refinements that we felt were needed:
          • get rid of non-informative text included in the returned strings (maybe by adding additional patterns to the regular expressions)
          • identify named entities
              • maybe via regular expressions (e.g., identify words starting with uppercase)
              • maybe by combining seeds and a NER system
          • etc.
      • Google is fantastic, but also unpredictable: different behaviours depending on the machines, domains, and some “hidden” criteria…
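As a rough sketch of the refinement suggested above, the three seed phrases can be turned into regular expressions in which each `*` becomes a run of capitalized words. The `NAME` expression is an assumption made for illustration: it misses titles containing lowercase words (such as "The Robots of Dawn"), which is exactly where a NER system would help. The sample text is made up for the example.

```python
import re

# Assumption: a name or title is a sequence of capitalized words.
NAME = r"[A-Z][\w'-]*(?: [A-Z][\w'-]*)*"

# Each seed phrase, with the role played by its first and second wildcard.
PATTERNS = [
    (re.compile(rf"({NAME}) is a novel by ({NAME})"), ("book", "author")),
    (re.compile(rf"({NAME}) wrote the novel ({NAME})"), ("author", "book")),
    (re.compile(rf"the novel ({NAME}) was written by ({NAME})"), ("book", "author")),
]

def extract_pairs(text):
    """Return the set of (author, book) pairs matched by any seed pattern."""
    pairs = set()
    for regex, roles in PATTERNS:
        for m in regex.finditer(text):
            found = dict(zip(roles, m.groups()))
            pairs.add((found["author"], found["book"]))
    return pairs

text = (
    "Great Expectations is a novel by Charles Dickens. "
    "Jane Austen wrote the novel Emma. "
    "Critics agree that the novel Dracula was written by Bram Stoker."
)
pairs = extract_pairs(text)
print(sorted(pairs))
```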
  • 10. End of previous lecture
  • 11. Acknowledgements
      • Most slides borrowed or adapted from Dan Jurafsky and Christopher Manning, Coursera
      • Some inspiration from Dragomir Radev, Coursera
      • J&M (2009)
  • 15. Book Summaries
      • Cliff’s Notes are a series of student study guides available primarily in the United States.
  • 22. Human Summarization and Abstracting
  • 25. Text Summarization
      • Goal: produce an abridged version of a text that contains information that is important or relevant to a user.
      • Summarization applications:
          • outlines or abstracts of any document, article, etc.
          • summaries of email threads
          • action items from a meeting
          • simplifying text by compressing sentences
  • 26. What to summarize? Single vs. multiple documents
      • Single-document summarization: given a single document, produce
          • an abstract
          • an outline
          • a headline
      • Multiple-document summarization: given a group of documents, produce a gist of the content:
          • a series of news stories on the same event
          • a set of web pages about some topic or question
  • 27. Query-focused Summarization & Generic Summarization
      • Generic summarization: summarize the content of a document
      • Query-focused summarization: summarize a document with respect to an information need expressed in a user query
          • a kind of complex question answering: answer a question by summarizing a document that has the information to construct the answer
  • 28. Summarization for Question Answering: Snippets
      • Create snippets summarizing a web page for a query
      • Google: 156 characters (about 26 words) plus title and link
  • 29. Summarization for Question Answering: Multiple documents
      • Create answers to complex questions by summarizing multiple documents.
      • Instead of giving a snippet for each document, create a cohesive answer that combines information from each document
  • 30. Extractive summarization & Abstractive summarization
      • Extractive summarization: create the summary from phrases or sentences in the source document(s)
      • Abstractive summarization: express the ideas in the source documents using (at least in part) different words
  • 31. Simple baseline: take the first sentence
  • 34. Summarization: Three Stages
      1. content selection: choose sentences to extract from the document
      2. information ordering: choose an order to place them in the summary
      3. sentence realization: clean up the sentences
      • [Diagram: Document → Sentence Segmentation → all sentences from documents → Sentence Extraction → extracted sentences → Information Ordering → Sentence Realization → Summary; further labels: Content Selection, Sentence Simplification]
  • 35. Basic Summarization Algorithm
      1. content selection: choose sentences to extract from the document
      2. information ordering: just use document order
      3. sentence realization: keep original sentences
      • [Diagram: same pipeline as on the previous slide]
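The three steps of the basic algorithm might look like this in code. It is only a sketch under simple assumptions: sentence segmentation is a naive split on end punctuation, and the scoring function is passed in by the caller. The `keyword_score` scorer and its keyword set are hypothetical placeholders for a real content-selection method.

```python
import re

def split_sentences(document):
    """Naive segmentation: split after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]

def summarize(document, score, k=2):
    sentences = split_sentences(document)
    # 1. Content selection: keep the k highest-scoring sentences.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    # 2. Information ordering: just use document order.
    chosen = sorted(ranked[:k])
    # 3. Sentence realization: keep the original sentences unchanged.
    return " ".join(sentences[i] for i in chosen)

# Hypothetical scorer: count occurrences of hand-picked keywords.
KEYWORDS = {"water", "spinach"}

def keyword_score(sentence):
    words = re.findall(r"[a-z]+", sentence.lower())
    return sum(w in KEYWORDS for w in words)

doc = ("Water spinach is a leaf vegetable. It grows in the tropics. "
       "Many people like soup. Water spinach is often stir-fried.")
summary = summarize(doc, keyword_score, k=2)
print(summary)
```

Passing a scorer that simply prefers earlier sentences would reproduce the "take the first sentence" baseline.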
  • 36. Unsupervised content selection
      • Intuition dating back to Luhn (1958): choose sentences that have salient or informative words
      • Two approaches to defining salient words:
          1. tf-idf: weigh each word w_i in document j by tf-idf:
             $$\mathrm{weight}(w_i) = \mathrm{tf}_{ij} \times \mathrm{idf}_i$$
          2. topic signature: choose a smaller set of salient words, using mutual information or the log-likelihood ratio (LLR) (Dunning 1993; Lin and Hovy 2000):
             $$\mathrm{weight}(w_i) = \begin{cases} 1 & \text{if } -2\log\lambda(w_i) > 10 \\ 0 & \text{otherwise} \end{cases}$$
      • H. P. Luhn. 1958. The Automatic Creation of Literature Abstracts. IBM Journal of Research and Development 2(2), 159-165.
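A small sketch of the tf-idf weighting above. The slide does not spell out the idf formula, so this assumes the common definition idf_i = log(N / df_i), where N is the number of documents and df_i the number of documents containing w_i. Scoring a sentence by the average weight of its words is likewise an assumption made here for illustration, and the three toy documents are invented.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase and keep alphabetic tokens only (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())

def tfidf_weights(doc_index, documents):
    """weight(w_i) = tf_ij * idf_i for every word of document j = doc_index."""
    token_lists = [tokenize(d) for d in documents]
    vocab_sets = [set(tokens) for tokens in token_lists]
    n_docs = len(documents)
    tf = Counter(token_lists[doc_index])
    weights = {}
    for word, count in tf.items():
        df = sum(1 for vocab in vocab_sets if word in vocab)
        weights[word] = count * math.log(n_docs / df)  # assumed idf = log(N / df)
    return weights

def sentence_score(sentence, weights):
    """Average tf-idf weight of the words in a sentence (an assumed combination rule)."""
    tokens = tokenize(sentence)
    if not tokens:
        return 0.0
    return sum(weights.get(t, 0.0) for t in tokens) / len(tokens)

documents = [
    "The storm flooded the coast. The storm closed the port.",
    "The market fell on Monday.",
    "The team won the final.",
]
weights = tfidf_weights(0, documents)
print(weights["the"], weights["storm"])
print(sentence_score("The storm closed the port.", weights))
```

A word such as "the" that occurs in every document gets weight 0, while "storm", which is frequent in the first document only, gets a high weight.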
  • 37. Topic signature-based content selection with queries
      • Choose words that are informative either
          • by log-likelihood ratio (LLR)
          • or by appearing in the query
        $$\mathrm{weight}(w_i) = \begin{cases} 1 & \text{if } -2\log\lambda(w_i) > 10 \\ 1 & \text{if } w_i \in \text{question} \\ 0 & \text{otherwise} \end{cases}$$
      • Weigh a sentence (or window) by the weight of its words (could learn more complex weights):
        $$\mathrm{weight}(s) = \frac{1}{|s|} \sum_{w \in s} \mathrm{weight}(w)$$
      • Conroy, Schlesinger, and O’Leary 2006
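The two formulas above can be sketched as follows. The LLR statistic is computed here in the usual Dunning style from binomial likelihoods, assuming k1 occurrences of the word among n1 tokens of the input documents and k2 occurrences among n2 tokens of a background corpus; those count names, the helper functions, and the toy numbers are assumptions for illustration.

```python
import math

def log_l(p, k, n):
    """log of p^k * (1-p)^(n-k), treating 0 * log(0) as 0."""
    total = 0.0
    if k > 0:
        total += k * math.log(p)
    if n - k > 0:
        total += (n - k) * math.log(1 - p)
    return total

def llr(k1, n1, k2, n2):
    """-2 log lambda for a word seen k1/n1 times in the input and k2/n2 in the background."""
    p1, p2 = k1 / n1, k2 / n2
    p = (k1 + k2) / (n1 + n2)
    return 2 * (log_l(p1, k1, n1) + log_l(p2, k2, n2)
                - log_l(p, k1, n1) - log_l(p, k2, n2))

def word_weight(word, signature, query_words):
    """1 if the word is a topic-signature word or appears in the query, else 0."""
    return 1 if word in signature or word in query_words else 0

def sentence_weight(tokens, signature, query_words):
    """Average word weight over the sentence, as in weight(s) above."""
    if not tokens:
        return 0.0
    return sum(word_weight(t, signature, query_words) for t in tokens) / len(tokens)

# Toy counts: (k1, n1, k2, n2) per word; threshold 10 as on the slide.
counts = {"spinach": (10, 100, 10, 10000), "is": (5, 100, 500, 10000)}
signature = {w for w, c in counts.items() if llr(*c) > 10}
query_words = {"water"}
score = sentence_weight(["water", "spinach", "is", "tasty"], signature, query_words)
print(signature, score)
```

Note that this statistic is two-sided: a word that is much rarer in the input than in the background also scores high, so some implementations additionally require p1 > p2.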
  • 38. Supervised content selection
      • Given: a labeled training set of good summaries for each document
      • Align: the sentences in the document with sentences in the summary
      • Extract features:
          • position (first sentence?)
          • length of sentence
          • word informativeness, cue phrases
          • cohesion
      • Train: a binary classifier (put sentence in summary? yes or no)
      • Problems:
          • hard to get labeled training data
          • alignment difficult
          • performance not better than unsupervised algorithms
      • So in practice: unsupervised content selection is more common
  • 40. ROUGE (Recall-Oriented Understudy for Gisting Evaluation)
      • Intrinsic metric for automatically evaluating summaries
          • Based on BLEU (a metric used for machine translation)
          • Not as good as human evaluation (“Did this answer the user’s question?”)
          • But much more convenient
      • Given a document D and an automatic summary X:
          1. Have N humans produce a set of reference summaries of D
          2. Run the system, giving automatic summary X
          3. What percentage of the bigrams from the reference summaries appear in X?
        $$\text{ROUGE-2} = \frac{\sum_{S \in \{\text{RefSummaries}\}} \sum_{\text{bigrams } i \in S} \min(\mathrm{count}(i, X), \mathrm{count}(i, S))}{\sum_{S \in \{\text{RefSummaries}\}} \sum_{\text{bigrams } i \in S} \mathrm{count}(i, S)}$$
      • Lin and Hovy 2003
  • 41. A ROUGE example
      • Q: “What is water spinach?”
      • Human 1: Water spinach is a green leafy vegetable grown in the tropics.
      • Human 2: Water spinach is a semi-aquatic tropical plant grown as a vegetable.
      • Human 3: Water spinach is a commonly eaten leaf vegetable of Asia.
      • System answer: Water spinach is a leaf vegetable commonly eaten in tropical areas of Asia.
      • ROUGE-2 = (3 + 3 + 6) / (10 + 9 + 9) = 12/28 = .43
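The ROUGE-2 computation for this example can be reproduced with a few lines of code. This is a minimal sketch assuming lowercased, whitespace-separated tokens with periods stripped, not the official ROUGE toolkit. With that tokenization the second reference yields 10 bigrams rather than 9, so the sketch arrives at 12/29 (about .41) instead of the 12/28 shown on the slide; the matched counts 3, 3 and 6 agree.

```python
from collections import Counter

def bigrams(text):
    """Bigram counts over lowercased whitespace tokens, with periods removed."""
    tokens = text.lower().replace(".", "").split()
    return Counter(zip(tokens, tokens[1:]))

def rouge_2(system, references):
    """Return (matched, total): clipped bigram matches and reference bigram count."""
    sys_counts = bigrams(system)
    matched = total = 0
    for ref in references:
        ref_counts = bigrams(ref)
        matched += sum(min(c, sys_counts[b]) for b, c in ref_counts.items())
        total += sum(ref_counts.values())
    return matched, total

references = [
    "Water spinach is a green leafy vegetable grown in the tropics.",
    "Water spinach is a semi-aquatic tropical plant grown as a vegetable.",
    "Water spinach is a commonly eaten leaf vegetable of Asia.",
]
system = "Water spinach is a leaf vegetable commonly eaten in tropical areas of Asia."

matched, total = rouge_2(system, references)
print(matched, total, round(matched / total, 2))
```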
  • 43. Definition questions
      • Q: What is water spinach?
      • A: Water spinach (ipomoea aquatica) is a semi-aquatic leafy green plant with long hollow stems and spear- or heart-shaped leaves, widely grown throughout Asia as a leaf vegetable. The leaves and stems are often eaten stir-fried flavored with salt or in soups. Other common names include morning glory vegetable, kangkong (Malay), rau muong (Viet.), ong choi (Cant.), and kong xin cai (Mand.). It is not related to spinach, but is closely related to sweet potato and convolvulus.
  • 44. Medical questions
      • Q: In children with an acute febrile illness, what is the efficacy of single medication therapy with acetaminophen or ibuprofen in reducing fever?
      • A: Ibuprofen provided greater temperature decrement and longer duration of antipyresis than acetaminophen when the two drugs were administered in approximately equal doses. (PubMedID: 1621668, Evidence Strength: A)
      • Demner-Fushman and Lin (2007)
  • 45. Other complex questions
      1. How is compost made and used for gardening (including different types of compost, their uses, origins and benefits)?
      2. What causes train wrecks and what can be done to prevent them?
      3. Where have poachers endangered wildlife, what wildlife has been endangered and what steps have been taken to prevent poaching?
      4. What has been the human toll in death or injury of tropical storms in recent years?
      • Modified from the DUC 2005 competition (Hoa Trang Dang 2005)
  • 46. Answering harder questions: Query-focused multi-document summarization
      • The (bottom-up) snippet method
          • Find a set of relevant documents
          • Extract informative sentences from the documents
          • Order and modify the sentences into an answer
      • The (top-down) information extraction method
          • build specific answerers for different question types:
              • definition questions
              • biography questions
              • certain medical questions
  • 47. Query-Focused Multi-Document Summarization
      • [Diagram: Input Docs → Sentence Segmentation → all sentences from documents → Sentence Simplification → all sentences plus simplified versions → Content Selection (Sentence Extraction: LLR, MMR; also takes the Query) → extracted sentences → Information Ordering → Sentence Realization → Summary]
  • 48. Information Ordering
      • Chronological ordering:
          • Order sentences by the date of the document (for summarizing news) (Barzilay, Elhadad, and McKeown 2002)
      • Coherence:
          • Choose orderings that make neighboring sentences similar (by cosine).
          • Choose orderings in which neighboring sentences discuss the same entity (Barzilay and Lapata 2007)
      • Topical ordering:
          • Learn the ordering of topics in the source documents
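One way to picture the cosine-based coherence criterion is a greedy ordering that always appends the remaining sentence most similar to the last one placed. This is only an illustrative heuristic under simple assumptions (bag-of-words vectors, the first extracted sentence stays first, ties broken by original position); it is not the method of the papers cited above, and the example sentences are invented.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def greedy_order(sentences):
    """Order sentences so that each one is followed by its most similar remaining neighbor."""
    if not sentences:
        return []
    vectors = [Counter(s.lower().split()) for s in sentences]
    order = [0]                      # assumption: keep the first sentence first
    remaining = set(range(1, len(sentences)))
    while remaining:
        last = vectors[order[-1]]
        # sorted() makes ties resolve to the earliest original position.
        nxt = max(sorted(remaining), key=lambda i: cosine(last, vectors[i]))
        order.append(nxt)
        remaining.remove(nxt)
    return [sentences[i] for i in order]

extracted = [
    "the storm hit the coast",
    "markets fell sharply",
    "the coast was flooded",
    "investors sold as markets fell",
]
ordered = greedy_order(extracted)
print(ordered)
```

An exhaustive search over all orderings would optimize the total neighbor similarity directly, but it grows factorially with the number of sentences, which is why a greedy pass is used in this sketch.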
  • 49. Domain-specific answering: the Information Extraction method
      • A good biography of a person contains:
          • a person’s birth/death, fame factor, education, nationality and so on
      • A good definition contains:
          • genus or hypernym (e.g., “The Hajj is a type of ritual”)
      • A medical answer about a drug’s use contains:
          • the problem (the medical condition),
          • the intervention (the drug or procedure), and
          • the outcome (the result of the study).
  • 50. Information that should be in the answer for 3 kinds of questions