Lecture 5
Today
• Parts of speech (POS)
• Tagsets
• POS Tagging
– Rule-based tagging
– HMMs and Viterbi algorithm
2
Parts of Speech
• 8 (ish) traditional parts of speech
– Noun, verb, adjective, preposition, adverb,
article, interjection, pronoun, conjunction, etc
– Called: parts-of-speech, lexical categories,
word classes, morphological classes, lexical
tags...
– Lots of debate within linguistics about the
number, nature, and universality of these
• We’ll completely ignore this debate.
3
POS examples
• N noun chair, bandwidth, pacing
• V verb study, debate, munch
• ADJ adjective purple, tall, ridiculous
• ADV adverb unfortunately, slowly
• P preposition of, by, to
• PRO pronoun I, me, mine
• DET determiner the, a, that, those
4
POS Tagging
• The process of assigning a part-of-speech or
lexical class marker to each word in a
collection.
WORD    tag
the     DET
koala   N
put     V
the     DET
keys    N
5
Why is POS Tagging Useful?
• First step of a vast number of practical tasks
• Speech synthesis
– How to pronounce “lead”?
– INsult inSULT
– OBject obJECT
– OVERflow overFLOW
– DIScount disCOUNT
– CONtent conTENT
• Parsing
– Need to know if a word is an N or V before you can parse
• Information extraction
– Finding names, relations, etc.
• Machine Translation
6
Open and Closed Classes
• Closed class: a small fixed membership
– Prepositions: of, in, by, …
– Auxiliaries: may, can, will, had, been, …
– Pronouns: I, you, she, mine, his, them, …
– Usually function words (short common words which
play a role in grammar)
• Open class: new ones can be created all the time
– English has 4: Nouns, Verbs, Adjectives, Adverbs
– Many languages have these 4, but not all!
7
Open Class Words
• Nouns
– Proper nouns (Boulder, Granby, Eli Manning)
• English capitalizes these.
– Common nouns (the rest).
– Count nouns and mass nouns
• Count: have plurals, get counted: goat/goats, one goat, two goats
• Mass: don’t get counted (snow, salt, communism) (*two snows)
• Adverbs: tend to modify things
– Unfortunately, John walked home extremely slowly yesterday
– Directional/locative adverbs (here, home, downhill)
– Degree adverbs (extremely, very, somewhat)
– Manner adverbs (slowly, slinkily, delicately)
• Verbs
– In English, have morphological affixes (eat/eats/eaten)
8
Closed Class Words
Examples:
– prepositions: on, under, over, …
– particles: up, down, on, off, …
– determiners: a, an, the, …
– pronouns: she, who, I, ..
– conjunctions: and, but, or, …
– auxiliary verbs: can, may, should, …
– numerals: one, two, three, third, …
9
Prepositions from CELEX
10
English Particles
11
Conjunctions
12
POS Tagging
Choosing a Tagset
• There are many parts of speech and many potential
distinctions we could draw
• To do POS tagging, we need to choose a standard set of tags to
work with
• Could pick very coarse tagsets
– N, V, Adj, Adv.
• A more commonly used set is finer grained: the “Penn
TreeBank tagset”, with 45 tags
– PRP$, WRB, WP$, VBG
• Even more fine-grained tagsets exist
13
Penn TreeBank POS Tagset
14
Using the Penn Tagset
• The/DT grand/JJ jury/NN
commented/VBD on/IN a/DT number/NN
of/IN other/JJ topics/NNS ./.
• Prepositions and subordinating conjunctions
marked IN (“although/IN I/PRP..”)
• Except the preposition/complementizer “to”
is just marked “TO”.
15
POS Tagging
• Words often have more than one POS: back
– The back door = JJ
– On my back = NN
– Win the voters back = RB
– Promised to back the bill = VB
• The POS tagging problem is to determine
the POS tag for a particular instance of a
word.
16
How Hard is POS Tagging? Measuring
Ambiguity
17
Two Methods for POS Tagging
1. Rule-based tagging
– (ENGTWOL)
2. Stochastic: probabilistic sequence models
• HMM (Hidden Markov Model) tagging
• MEMMs (Maximum Entropy Markov Models)
18
Rule-Based Tagging
• Start with a dictionary
• Assign all possible tags to words from the
dictionary
• Write rules by hand to selectively remove
tags, leaving the correct tag for each word.
19
Start With a Dictionary
• she: PRP
• promised: VBN,VBD
• to: TO
• back: VB, JJ, RB, NN
• the: DT
• bill: NN, VB
20
Assign Every Possible Tag
                     NN
                     RB
     VBN             JJ         VB
PRP  VBD       TO    VB    DT   NN
She  promised  to    back  the  bill
21
Write Rules to Eliminate Tags
22
Stage 1 of ENGTWOL Tagging
• First Stage: Run words through FST
morphological analyzer to get all parts of speech.
• Example: Pavlov had shown that salivation …
23
Stage 2 of ENGTWOL Tagging
• Second Stage: Apply NEGATIVE constraints.
• Example: Adverbial “that” rule
– Eliminates all readings of “that” except the one in
• “It isn’t that odd”
Given input: “that”
If
(+1 A/ADV/QUANT) ;if next word is adj/adv/quantifier
(+2 SENT-LIM) ;following which is E-O-S
(NOT -1 SVOC/A) ; and the previous word is not a
; verb like “consider” which
; allows adjective complements
; in “I consider that odd”
Then eliminate non-ADV tags
Else eliminate ADV
24
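One way to picture this negative constraint in code, as a minimal sketch: the tag names, lexicon entries, and verb list below are illustrative assumptions, not the actual ENGTWOL rules.

# Sketch: apply an ENGTWOL-style negative constraint for adverbial "that".
ADJ_ADV_QUANT = {"JJ", "RB", "QUANT"}   # tags counted as adj/adv/quantifier (assumed names)
SVOC_A_VERBS = {"consider", "find"}     # verbs allowing adjective complements (illustrative)

def apply_that_rule(words, candidate_tags):
    """candidate_tags[i] is the set of tags still possible for words[i]."""
    for i, word in enumerate(words):
        if word.lower() != "that":
            continue
        next_is_adj = i + 1 < len(words) and candidate_tags[i + 1] & ADJ_ADV_QUANT
        next_next_is_eos = i + 2 >= len(words)              # following word ends the sentence
        prev_is_svoc = i > 0 and words[i - 1].lower() in SVOC_A_VERBS
        if next_is_adj and next_next_is_eos and not prev_is_svoc:
            candidate_tags[i] = {"RB"}                        # eliminate non-ADV readings
        else:
            candidate_tags[i].discard("RB")                   # eliminate the ADV reading
    return candidate_tags

words = ["it", "isn't", "that", "odd"]
tags = [{"PRP"}, {"VBZ"}, {"RB", "IN", "DT", "WDT"}, {"JJ"}]
print(apply_that_rule(words, tags)[2])   # -> {'RB'}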
Hidden Markov Model Tagging
• Using an HMM to do POS tagging is a special
case of Bayesian inference
– Foundational work in computational linguistics
– Bledsoe 1959: OCR
– Mosteller and Wallace 1964: authorship
identification
• It is also related to the “noisy channel” model
that’s the basis for ASR, OCR and MT
25
POS Tagging as Sequence Classification
• We are given a sentence (an “observation” or
“sequence of observations”)
– Secretariat is expected to race tomorrow
• What is the best sequence of tags that
corresponds to this sequence of observations?
• Probabilistic view:
– Consider all possible sequences of tags
– Out of this universe of sequences, choose the tag
sequence which is most probable given the
observation sequence of n words w1…wn.
26
Getting to HMMs
• We want, out of all sequences of n tags t1…tn, the single
tag sequence such that P(t1…tn|w1…wn) is highest.
27
Getting to HMMs
• Applying Bayes’ rule rewrites this in terms of a
likelihood and a prior; the resulting equation is still
guaranteed to give us the best tag sequence (see the
derivation written out below)
29
Likelihood and Prior
30
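The equations behind these two slides, written out in standard form (Bayes' rule, then the usual bigram and word-given-tag independence assumptions):

\begin{align*}
\hat{t}_1^n
  &= \operatorname*{argmax}_{t_1^n} P(t_1^n \mid w_1^n)
   = \operatorname*{argmax}_{t_1^n} \frac{P(w_1^n \mid t_1^n)\,P(t_1^n)}{P(w_1^n)}
   = \operatorname*{argmax}_{t_1^n}
     \underbrace{P(w_1^n \mid t_1^n)}_{\text{likelihood}}\;
     \underbrace{P(t_1^n)}_{\text{prior}} \\[4pt]
P(w_1^n \mid t_1^n) &\approx \prod_{i=1}^{n} P(w_i \mid t_i),
\qquad
P(t_1^n) \approx \prod_{i=1}^{n} P(t_i \mid t_{i-1}) \\[4pt]
\hat{t}_1^n &\approx \operatorname*{argmax}_{t_1^n}
  \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})
\end{align*}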
Two Kinds of Probabilities
• Tag transition probabilities P(ti|ti-1)
– Determiners likely to precede adjectives and nouns
• That/DT flight/NN
• The/DT yellow/JJ hat/NN
• So we expect P(NN|DT) and P(JJ|DT) to be high
• But P(DT|JJ) to be low
– Compute P(NN|DT) by counting in a labeled corpus:
P(NN|DT) = C(DT, NN) / C(DT)
31
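A minimal sketch of that counting estimate; the tiny tagged corpus below is invented for illustration.

# Estimate tag transition probabilities by counting:
#   P(t_i | t_{i-1}) = C(t_{i-1}, t_i) / C(t_{i-1})
from collections import Counter

tagged = [("the", "DT"), ("grand", "JJ"), ("jury", "NN"),
          ("commented", "VBD"), ("on", "IN"), ("a", "DT"),
          ("number", "NN"), ("of", "IN"), ("other", "JJ"),
          ("topics", "NNS")]

tags = [t for _, t in tagged]
unigrams = Counter(tags[:-1])                      # counts of the "previous" tag
bigrams = Counter(zip(tags[:-1], tags[1:]))        # counts of adjacent tag pairs

def p_trans(prev, cur):
    return bigrams[(prev, cur)] / unigrams[prev] if unigrams[prev] else 0.0

print(p_trans("DT", "NN"))   # C(DT,NN)/C(DT) = 1/2 = 0.5 on this toy corpus
print(p_trans("DT", "JJ"))   # 1/2 = 0.5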
Two Kinds of Probabilities
32
Example: The Verb “race”
33
Disambiguating “race”
34
Example
• P(NN|TO) = .00047
• P(VB|TO) = .83
• P(race|NN) = .00057
• P(race|VB) = .00012
• P(NR|VB) = .0027
• P(NR|NN) = .0012
• P(VB|TO)P(NR|VB)P(race|VB) = .00000027
• P(NN|TO)P(NR|NN)P(race|NN)=.00000000032
• So we (correctly) choose the verb reading.
35
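A quick check of the two products, using the same numbers as above:

# Compare the two readings of "race" in "to race tomorrow/NR".
p_vb = 0.83 * 0.0027 * 0.00012      # P(VB|TO) * P(NR|VB) * P(race|VB)
p_nn = 0.00047 * 0.0012 * 0.00057   # P(NN|TO) * P(NR|NN) * P(race|NN)
print(p_vb)   # ~2.7e-07
print(p_nn)   # ~3.2e-10  -> the verb reading wins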
Hidden Markov Models
• What we’ve described with these two kinds
of probabilities is a Hidden Markov Model
(HMM)
36
Definitions
• A weighted finite-state automaton adds
probabilities to the arcs
– The probabilities on the arcs leaving any state
must sum to one
• A Markov chain is a special case of a WFSA in
which the input sequence uniquely determines
which states the automaton will go through
• Markov chains can’t represent inherently
ambiguous problems
– Useful for assigning probabilities to unambiguous
sequences
37
Markov Chain for Weather
38
Markov Chain for Words
39
Markov Chain: “First-order observable
Markov Model”
• A set of states
– Q = q1, q2…qN; the state at time t is qt
• Transition probabilities:
– a set of probabilities A = a01, a02, …, an1, …, ann
– Each aij represents the probability of transitioning from
state i to state j
– The set of these is the transition probability matrix A
40
Markov Chain for Weather
• What is the probability of 4 consecutive
rainy days?
• Sequence is rainy-rainy-rainy-rainy
• I.e., state sequence is 3-3-3-3
• P(3,3,3,3) =
– π3 · a33 · a33 · a33 = 0.2 × (0.6)^3 = 0.0432
41
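The same computation in code, as a minimal sketch; only the two probabilities used above are filled in.

# P(rainy, rainy, rainy, rainy) = pi[rainy] * a[rainy][rainy]^3
pi = {"rainy": 0.2}                  # initial probability of rainy (value from the slide)
a = {("rainy", "rainy"): 0.6}        # rainy-to-rainy transition probability (from the slide)

states = ["rainy"] * 4
p = pi[states[0]]
for prev, cur in zip(states, states[1:]):
    p *= a[(prev, cur)]
print(p)   # 0.2 * 0.6**3 = 0.0432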
HMM for Ice Cream
• You are a climatologist in the year 2799
• Studying global warming
• You can’t find any records of the weather in
Baltimore, MD for summer of 2007
• But you find Jason Eisner’s diary
• Which lists how many ice-creams Jason ate
every day that summer
• Our job: figure out how hot it was
42
Hidden Markov Model
• For Markov chains, the output symbols are the same as
the states.
– See hot weather: we’re in state hot
• But in part-of-speech tagging (and other things)
– The output symbols are words
– But the hidden states are part-of-speech tags
• So we need an extension!
• A Hidden Markov Model is an extension of a Markov
chain in which the observed symbols are not the same
as the states.
• This means we don’t know which state we are in.
43
Hidden Markov Models
• States Q = q1, q2, …, qN
• Observations O = o1, o2, …, oN
– Each observation is a symbol from a vocabulary V =
{v1, v2, …, vV}
• Transition probabilities
– Transition probability matrix A = {aij}
aij = P(qt = j | qt-1 = i),  1 ≤ i, j ≤ N
• Observation likelihoods
– Output probability matrix B = {bi(k)}
bi(k) = P(Xt = ok | qt = i)
• Special initial probability vector π
πi = P(q1 = i),  1 ≤ i ≤ N
44
Eisner Task
• Given
– Ice Cream Observation Sequence:
1,2,3,2,2,2,3…
• Produce:
– Weather Sequence: H,C,H,H,H,C…
45
HMM for Ice Cream
46
Transition Probabilities
47
Observation Likelihoods
48
Decoding
• OK, now we have a complete model that can give
us what we need. Recall that we need to find the most
probable tag sequence given the word sequence:
argmax over t1…tn of P(t1…tn|w1…wn)
49
The Viterbi Algorithm
50
Viterbi Example
51
Viterbi Summary
• Create an array
– With columns corresponding to inputs
– Rows corresponding to possible states
• Sweep through the array in one pass, filling
the columns left to right using our transition
probs and observation probs
• The dynamic-programming key is that we need
only store the MAX-prob path to each cell
(not all paths); see the sketch below.
52
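A compact sketch of Viterbi decoding on the ice-cream HMM. The start, transition, and emission values below are illustrative assumptions, not necessarily the numbers on the slides.

# Viterbi decoding: columns = observations, rows = states.
# Each cell stores only the best (max-probability) path reaching that state.

def viterbi(obs, states, start_p, trans_p, emit_p):
    V = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for t in range(1, len(obs)):
        col = {}
        for s in states:
            prob, prev = max(
                (V[t - 1][r][0] * trans_p[r][s] * emit_p[s][obs[t]], r)
                for r in states)
            col[s] = (prob, prev)
        V.append(col)
    # Backtrace from the best final state.
    best = max(states, key=lambda s: V[-1][s][0])
    path = [best]
    for t in range(len(obs) - 1, 0, -1):
        path.append(V[t][path[-1]][1])
    return list(reversed(path)), V[-1][best][0]

# Hidden states: Hot/Cold days; observations: ice creams eaten (1-3).
# Parameter values are assumptions for illustration only.
states = ["H", "C"]
start_p = {"H": 0.8, "C": 0.2}
trans_p = {"H": {"H": 0.7, "C": 0.3}, "C": {"H": 0.4, "C": 0.6}}
emit_p = {"H": {1: 0.2, 2: 0.4, 3: 0.4}, "C": {1: 0.5, 2: 0.4, 3: 0.1}}

print(viterbi([3, 1, 3], states, start_p, trans_p, emit_p))
# -> (['H', 'H', 'H'], 0.012544)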
Evaluation
• So once you have your POS tagger running,
how do you evaluate it?
– Overall error rate with respect to a gold-
standard test set.
– Error rates on particular tags
– Error rates on particular words
– Tag confusions...
53
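A minimal sketch of these measures, given gold and predicted tag sequences (a hypothetical helper, not from the lecture):

# Overall accuracy, per-tag error counts, and a tag confusion matrix.
from collections import Counter

def evaluate(gold, pred):
    assert len(gold) == len(pred)
    correct = sum(g == p for g, p in zip(gold, pred))
    accuracy = correct / len(gold)
    per_tag_errors = Counter(g for g, p in zip(gold, pred) if g != p)
    confusion = Counter((g, p) for g, p in zip(gold, pred) if g != p)
    return accuracy, per_tag_errors, confusion

gold = ["DT", "NN", "VBD", "IN", "DT", "NN"]
pred = ["DT", "NN", "VBN", "IN", "DT", "NN"]
print(evaluate(gold, pred))
# (0.833..., Counter({'VBD': 1}), Counter({('VBD', 'VBN'): 1}))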
Error Analysis
• Look at a confusion matrix
54
Evaluation
• The result is compared with a manually
coded “Gold Standard”
– Typically accuracy reaches 96-97%
– This may be compared with the result for a
baseline tagger (one that uses no context).
• Important: 100% is impossible even for
human annotators.
55
Summary
• Parts of speech
• Tagsets
• Part of speech tagging
• HMM Tagging
– Markov Chains
– Hidden Markov Models
56