NLP-Lectures 4, 5, 6

POS TAGGING ALGORITHMS

Lecture 4
POS TAGGING

• POS tagging is the process of assigning a part-of-speech tag to each word in a sequence.
• Words can belong to many parts of speech. For example, back:
✓The back/JJ door (adjective)
✓On its back/NN (noun)
✓Win the voters back/RB (adverb)
✓Promise to back/VB you in a fight (verb)
• We want to decide the appropriate tag given a particular sequence of tokens.
POS TAGGING ALGORITHMS

• Rule-based tagging uses a limited number of hand-written static rules to resolve tag
ambiguity, which causes high development cost but also high precision.
• Statistical/stochastic tagging: needs supervised learning with tagged corpora and
statistical inference. It is language-independent and provides acceptable precision.
✓ HMM tagging is a probabilistic method that chooses the tag sequence which
maximizes the product of the word likelihoods and the tag sequence probability.
• Hybrid-based tagging:
✓ Maximum Entropy tagging: combines several knowledge sources.
✓ Transformation-based tagging: based on automatically acquired rules.
✓ Decision tree tagging.
RULE-BASED TAGGING

• First stage − it uses a dictionary to assign each word a list of potential parts-of-speech.
• Second stage − it uses large lists of hand-written disambiguation rules to narrow the
list down to a single part-of-speech for each word.
• Properties of Rule-Based POS Tagging:
✓ These taggers are knowledge-driven taggers.
✓ The rules in Rule-based POS tagging are built manually.
✓ The information is coded in the form of rules.
✓ A limited number of rules is used, typically around 1,000.
✓ This causes high development cost but also high precision.
RULE-BASED TAGGING

• Start with a dictionary


• Assign all possible tags to words from the dictionary.
• Write rules (‘by hand’) to selectively remove tags
RULE-BASED TAGGING EXAMPLE

Rule: Eliminate VBN (past participle) if VBD (past tense) is an option when
(VBN or VBD) follows “<s> PRP (personal pronoun)”.
These kinds of rules become unwieldy and force determinism where there may not be any; a minimal sketch of applying such a rule follows.
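
• A minimal sketch of such a rule in Python (toy dictionary and a single hypothetical rule, not the tagger's actual rule set):

# A toy dictionary mapping words to all of their possible tags (hypothetical entries).
DICTIONARY = {
    "i": {"PRP"},
    "promised": {"VBD", "VBN"},
    "to": {"TO"},
    "back": {"VB", "NN", "JJ", "RB"},
}

def apply_vbn_rule(candidates):
    """Eliminate VBN if VBD is also an option and the word follows a sentence-initial PRP."""
    for i, tags in enumerate(candidates):
        follows_sentence_initial_pronoun = (i == 1 and candidates[0] == {"PRP"})
        if follows_sentence_initial_pronoun and {"VBD", "VBN"} <= tags:
            tags.discard("VBN")
    return candidates

words = ["i", "promised", "to", "back"]
candidates = [set(DICTIONARY[w]) for w in words]
print(apply_vbn_rule(candidates))
# 'promised' keeps only VBD; 'back' stays ambiguous until further rules apply.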
STOCHASTIC/ STATISTIC POS TAGGING

• The model that includes frequency or probability (statistics) can be called stochastic.
• Word Frequency Approach
✓It disambiguates words based on the probability that a word occurs
with a particular tag.
✓The tag encountered most frequently with the word in the training set
is the one assigned to an ambiguous instance of that word (see the sketch below).
✓The main issue with this approach is that it may yield an inadmissible
sequence of tags.
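
• A minimal sketch of the word-frequency baseline, assuming a toy hand-tagged corpus (illustrative data only):

from collections import Counter, defaultdict

# Toy tagged training corpus: (word, tag) pairs (hypothetical).
training = [("the", "DT"), ("back", "NN"), ("door", "NN"),
            ("win", "VB"), ("the", "DT"), ("voters", "NNS"), ("back", "RB"),
            ("promise", "VB"), ("to", "TO"), ("back", "VB"), ("you", "PRP")]

tag_counts = defaultdict(Counter)
for word, tag in training:
    tag_counts[word][tag] += 1

def most_frequent_tag(word):
    # Assign the tag seen most often with this word in training (ties broken arbitrarily).
    return tag_counts[word].most_common(1)[0][0] if word in tag_counts else "NN"

print([most_frequent_tag(w) for w in ["promise", "to", "back", "the", "door"]])
# 'back' always gets its single most frequent tag, regardless of context.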
STOCHASTIC/ STATISTIC POS TAGGING

• Tag Sequence Probabilities


✓The tagger calculates the probability of a given sequence of tags
occurring.
✓It is also called the n-gram approach, because the best tag for a given
word is determined by the probability with which it occurs with the n
previous tags.
PROPERTIES OF STOCHASTIC POS TAGGING

• This approach is based on the probability of a tag occurring.


• It requires a tagged training corpus.
• There is no probability for words that do not appear in the training corpus.
• It uses a testing corpus different from the training corpus.
• The simplest stochastic tagger chooses the most frequent tag associated with each
word in the training corpus.
POS TAGGING AND LANGUAGE MODEL
Can We Use Statistics Instead?

• Bayes’ Rule

𝑃(𝑋, 𝑌) = 𝑃(𝑋) 𝑃(𝑌|𝑋) = 𝑃(𝑌) 𝑃(𝑋|𝑌)


CHAIN RULE

• The chain rule (this extends Bayes' rule to longer sequences):

P(w1, w2, …, wn) = P(w1) P(w2|w1) P(w3|w1, w2) … P(wn|w1, …, wn−1)
                 = ∏k P(wk | w1 … wk−1)
LANGUAGE MODEL

• Language model: The statistical model of a language.


(e.g., probabilities of words in an ordered sequence).
WHAT DO WE DO WITH A LANGUAGE MODEL

• A language model can do word prediction (guess the next word…).


• Given a sequence of letters, what is the likelihood of the next letter?
• Language models can score and sort sentences.

• A language model can do POS tagging.


Example: “Promise to back the ball”
P(V, TO, V, DT, N) >> P(V, TO, N, DT, N)
CHAIN RULE

• To assign probabilities to entire sequences.

• To estimate the probability of the last word of an n-gram given the previous
words.
CHAIN RULE

• Example: the probability that the next word is food after “I like Chinese”, i.e., P(food | I like Chinese).
PROBLEM WITH CHAIN RULE

• The longer the sequence, the less likely we are to find it in a training
corpus
THANKS
N-GRAM
Lecture 5
PROBLEM WITH CHAIN RULE

• Assume statistical independence


MARKOV ASSUMPTION (SOLUTION)

• The probability of the next word depends only on the previous k words.
• N-gram is the simplest model that assigns probabilities to sentences and sequences of
words.

• N-Gram probabilities come from a training corpus.


• The larger the n, the more parameters there are to estimate.
N-GRAM

• N-gram is the simplest model that assigns probabilities to sentences and


sequences of words.
• An n-gram is a sequence of N words: “please turn your homework”
✓ A unigram is a single word like “please”, “turn”, “your”, “homework”
✓ A bigram is a two-word sequence of words like
“please turn”, “turn your”, or ”your homework”
✓ A trigram is a three-word sequence of words like
“please turn your”, or “turn your homework”.
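
• A minimal sketch of n-gram extraction over the example phrase (standard library only):

def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "please turn your homework".split()
print(ngrams(tokens, 1))  # unigrams: ('please',), ('turn',), ('your',), ('homework',)
print(ngrams(tokens, 2))  # bigrams:  ('please', 'turn'), ('turn', 'your'), ('your', 'homework')
print(ngrams(tokens, 3))  # trigrams: ('please', 'turn', 'your'), ('turn', 'your', 'homework')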
N-gram (Start and End Symbols)

• Bigram models pad each sentence with a start symbol <s> and an end symbol </s>, so that
probabilities such as P(please | <s>) and P(</s> | homework) are defined at sentence boundaries.
EXAMPLE

• Let’s compute simple N-gram models of speech queries about restaurants.


• Unigram
EXAMPLE – BIGRAM COUNT
EXAMPLE- BIGRAM PROBABILITIES

• Obtain likelihoods by dividing bigram counts by unigram counts.


EXAMPLE- BIGRAM PROBABILITIES
Using Bigrams to Estimate the Probability of Whole Sentences
• We need to use the start (<s>) and end (</s>) tags here.
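
• A minimal sketch of bigram estimation and whole-sentence probability with <s> and </s>, assuming a tiny made-up corpus rather than the restaurant data shown above:

from collections import Counter

# Tiny hypothetical training corpus, one sentence per line.
corpus = ["i want chinese food", "i want british food", "i like chinese food"]
sentences = [["<s>"] + line.split() + ["</s>"] for line in corpus]

unigrams = Counter(w for s in sentences for w in s)
bigrams = Counter((s[i], s[i + 1]) for s in sentences for i in range(len(s) - 1))

def p_bigram(w, prev):
    # MLE estimate: C(prev, w) / C(prev)
    return bigrams[(prev, w)] / unigrams[prev]

def p_sentence(text):
    s = ["<s>"] + text.split() + ["</s>"]
    p = 1.0
    for prev, w in zip(s, s[1:]):
        p *= p_bigram(w, prev)
    return p

print(p_bigram("want", "i"))              # C(i want) / C(i) = 2/3
print(p_sentence("i want chinese food"))  # product of the bigram probabilities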
N-GRAM AND HIDDEN MARKOV
MODELS
PART-OF-SPEECH-TAGGING USING HIDDEN
MARKOV MODEL
• A special case of Bayesian inference.
• HMM taggers make two simplifying assumptions.
✓The first assumption is that the probability of a word appearing depends
only on its own part-of-speech tag; it is independent of the surrounding
words and tags.
✓The second assumption is that the probability of a tag appearing depends
only on the previous tag (the bigram assumption).
• What is the best sequence of tags which corresponds to the sequence of words?
PART-OF-SPEECH-TAGGING USING HIDDEN
MARKOV MODEL

First assumption (word likelihood):  P(w1 … wn | t1 … tn) ≈ ∏i P(wi | ti)
Second assumption (tag bigram):      P(t1 … tn) ≈ ∏i P(ti | ti−1)

PART-OF-SPEECH TAGGING USING HIDDEN
MARKOV MODEL
• Combining the two assumptions, the tagger chooses the tag sequence that maximizes
  ∏i P(wi | ti) P(ti | ti−1)
• Example: determiners are very likely to precede adjectives and nouns, as in sequences
like:
that/DT flight/NN and the/DT yellow/JJ hat/NN.
• We expect the probabilities P(NN|DT) and P(JJ|DT) to be high. But in English, adjectives don't tend
to precede determiners, so the probability P(DT|JJ) ought to be low.
PART-OF-SPEECH-TAGGING USING HMM

• The likelihood 𝑷(𝒘𝒊 |𝒕𝒊 ) represents the probability that, given we see a particular tag,
it is associated with a particular word.
• For example, if we see the tag VBZ (third-person singular present verb) and have to
guess the word it labels, we would likely guess is, since the
verb to be is so common in English.
• A word likelihood such as P(is|VBZ) is again estimated by counting: out of all the times we see
VBZ in a corpus, how many of those times it labels the word is, i.e., P(is|VBZ) = C(VBZ, is) / C(VBZ). A counting-based sketch follows.
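
• A minimal counting-based sketch of estimating the emission probabilities P(w|t) and transition probabilities P(t|t_prev), assuming a toy tagged corpus:

from collections import Counter, defaultdict

# Toy tagged sentences (hypothetical): lists of (word, tag) pairs.
tagged = [[("she", "PRP"), ("is", "VBZ"), ("happy", "JJ")],
          [("he", "PRP"), ("is", "VBZ"), ("here", "RB")],
          [("time", "NN"), ("flies", "VBZ")]]

tag_counts = Counter()
emission_counts = defaultdict(Counter)    # tag -> word counts
transition_counts = defaultdict(Counter)  # previous tag -> tag counts

for sentence in tagged:
    prev = "<s>"
    for word, tag in sentence:
        tag_counts[tag] += 1
        emission_counts[tag][word] += 1
        transition_counts[prev][tag] += 1
        prev = tag

def p_emission(word, tag):
    # P(word | tag) = C(tag, word) / C(tag)
    return emission_counts[tag][word] / tag_counts[tag]

def p_transition(tag, prev):
    # P(tag | prev) = C(prev, tag) / C(prev)
    return transition_counts[prev][tag] / sum(transition_counts[prev].values())

print(p_emission("is", "VBZ"))     # 2/3: of the 3 VBZ tokens, 2 label "is"
print(p_transition("VBZ", "PRP"))  # 2/2 = 1.0 in this toy corpus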
EXAMPLE

• Is race a verb (VB) or a common noun (NN)? A sketch of the comparison follows.
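
• A sketch of the comparison for race preceded by the tag TO (as in “to race”); the probability values are purely illustrative placeholders, not counts from any particular corpus:

# Hypothetical transition and likelihood values (illustrative only).
transition = {
    ("VB", "TO"): 0.83,     # P(VB | TO): verbs commonly follow "to"
    ("NN", "TO"): 0.00047,  # P(NN | TO)
}
likelihood = {
    ("race", "VB"): 0.00012,  # P(race | VB)
    ("race", "NN"): 0.00057,  # P(race | NN)
}

score_vb = transition[("VB", "TO")] * likelihood[("race", "VB")]
score_nn = transition[("NN", "TO")] * likelihood[("race", "NN")]
print("VB" if score_vb > score_nn else "NN", score_vb, score_nn)
# Here the VB reading wins even though P(race | NN) > P(race | VB).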


THANKS
CORPUS
Lecture 6 (part 2)
CORPUS

• A corpus is a large and structured set of machine-readable texts that have been
produced in a natural communicative setting.
• Its plural is corpora.
• Language is infinite but a corpus must be finite in size.
• Main Elements in designing a corpus:
✓Corpus Representativeness
✓Corpus Size
Corpus Representativeness

• “A corpus is thought to be representative of the language variety it is supposed


to represent if the findings based on its contents can be generalized to the said
language variety”.
• “Representativeness refers to the extent to which a sample includes the full
range of variability in a population”.
• Representativeness of a corpus is determined by the following two factors:
✓Balance: the range of genres included in a corpus.
✓Sampling: how the chunks for each genre are selected.
Corpus Representativeness- Balance

• Corpus balance – the range of genres included in a corpus.


• A balanced corpus covers a wide range of text categories, which are supposed to be
representative of the language.
• There is no reliable scientific measure of balance; in practice, the accepted balance
is determined by the corpus's intended uses.
Corpus Representativeness- Sampling

• Corpus representativeness and balance are very closely associated with sampling.


• Sampling decisions include: the kinds of texts included, the number of texts, the selection of
particular texts, the selection of text samples from within texts, and the
length of text samples.
• Each of these involves a sampling decision, whether conscious or not.
CORPUS SIZE

• How large should the corpus be? There is no specific answer to this question.
• The size of the corpus depends upon the purpose as well as on some practical
considerations as follows:
✓Kind of query anticipated from the user.
✓The methodology used by the users to study the data.
✓Availability of the source of data.
• With the advancement in technology, the corpus size also increases.
EXAMPLES OF CORPUS
TREE-BANK CORPUS

• A linguistically parsed text corpus that annotates syntactic or semantic sentence


structure.
• The term ‘treebank’ reflects the fact that the most common way of representing
the grammatical analysis is by means of a tree structure.
• Treebanks are created on the top of a corpus, which has already been annotated
with part-of-speech tags.
TYPES OF TREE-BANK CORPUS

• Semantic Treebanks
✓These treebanks use a formal (if-then) representation of a sentence's semantic
structure.
✓They vary in the depth of their semantic/meaning representation.
✓Examples:
o Robot Commands Treebank,
o Geoquery,
o Groningen Meaning Bank,
o RoboCup Corpus.
TYPES OF TREE-BANK CORPUS

• Syntactic Treebanks
✓ In contrast to semantic treebanks, they annotate the parsed syntactic tree (e.g., dependency grammar).
✓ For example:
o The Penn Arabic Treebank and the Columbia Arabic Treebank are syntactic treebanks created for the
Arabic language.
o The Sinica Treebank is a syntactic treebank created for the Chinese language.
o Lucy, Susanne, and the BLLIP WSJ corpus are syntactic corpora created for the English language.
o The Penn Treebank in English (with shallow semantics).
Applications of Treebank Corpus

• In Computational Linguistics
✓part-of-speech taggers, parsers, semantic analyzers and machine translation
systems.
• In Corpus Linguistics
✓ study syntactic phenomena.
• In Theoretical Linguistics and Psycholinguistics
✓Interaction evidence.
PROPBANK CORPUS

• PropBank, more specifically called “Proposition Bank”, is a corpus annotated with
predicate-argument relations (basic semantic information).
• The corpus is a verb-oriented resource; the annotations are closely related to
the syntactic level.
• In Natural Language Processing (NLP), the PropBank project has played a very significant
role: it supports semantic role labeling.
• Semantic role labeling: assigning labels to words or phrases according to their semantic
role (agent, goal, result).
VERBNET (VN)

• VerbNet (VN) is a hierarchical, domain-independent lexical resource and the largest
verb lexicon available for English.
• It incorporates both semantic as well as syntactic information about its contents.
• VN is a broad-coverage verb lexicon having mappings to other lexical resources
such as WordNet, Xtag and FrameNet.
• It is organized into verb classes.
VERBNET (VN)

• Each VerbNet (VN) class contains:


✓A set of syntactic descriptions or syntactic frames
o Such as transitive, intransitive, prepositional phrases, etc.
✓A set of semantic descriptions
o Such as human, organization
WORDNET

• It is a lexical database for the English language.


• It is available as part of the NLTK corpus collection.
• Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms
called Synsets.
• All the synsets are linked with the help of conceptual-semantic and lexical
relations.
• WordNet is used for various purposes such as word-sense disambiguation,
information retrieval, automatic text classification, machine translation, and
similarity computation (see the sketch below).
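
• A minimal sketch using NLTK's WordNet interface (assumes nltk is installed and the WordNet data can be downloaded):

import nltk
nltk.download("wordnet", quiet=True)  # one-time download of the WordNet data
from nltk.corpus import wordnet as wn

# Synsets: sets of cognitive synonyms for a word.
for synset in wn.synsets("bank")[:3]:
    print(synset.name(), "->", synset.definition())

# Lexical/semantic relations and similarity between concepts.
dog, cat = wn.synset("dog.n.01"), wn.synset("cat.n.01")
print(dog.lemma_names())        # synonyms in the synset
print(dog.hypernyms())          # more general concepts
print(dog.wup_similarity(cat))  # Wu-Palmer similarity score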
OTHER EXAMPLES

• Switchboard corpus
120 hours ≈ 2.4M tokens
2.4K spoken telephone conversations between US English speakers.
• Brown corpus
1M tokens, 61,805 types. Balanced collection of genres in US English from
1961.
THANKS
N-GRAM EVALUATION
Lecture 6
N-GRAM (LOG Probability)

• Multiplying many small probabilities can cause numerical underflow, so in practice we add
log probabilities instead: log(p1 × p2 × … × pn) = log(p1) + log(p2) + … + log(pn).

• Example: Given the log of these conditional probabilities:


log(P(Mary|<s>)) = -4   log(P(likes|Mary)) = -7   log(P(cats|likes)) = -100   log(P(</s>|cats)) = -1
• Approximate the log probability of the following sentence with bigrams : “<s> Mary likes
cats </s>”
• Solution:
log(P(<s> Mary likes cats </s>)) = (-4)+(-7)+(-100)+(-1)= -112
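
• A minimal sketch of the same computation: with log probabilities, the product of bigram probabilities becomes a sum.

import math

# Given log conditional probabilities from the example above.
log_p = {("Mary", "<s>"): -4, ("likes", "Mary"): -7,
         ("cats", "likes"): -100, ("</s>", "cats"): -1}

sentence = ["<s>", "Mary", "likes", "cats", "</s>"]
log_prob = sum(log_p[(w, prev)] for prev, w in zip(sentence, sentence[1:]))
print(log_prob)            # -112
print(math.exp(log_prob))  # back to a raw probability (assuming natural logarithms)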
EVALUATING A LANGUAGE MODEL

• How can we quantify the goodness of a model?


• How do we know whether one model is better than another?
• N-gram language models are evaluated by separating the corpus into a training set and a
test set, training the model on the training set, and evaluating on the test set.

• There are 2 general ways of evaluating LMs:


✓Extrinsic: in terms of some external measure (this depends on some task
or application).
✓Intrinsic: in terms of properties of the LM itself.
EXTRINSIC EVALUATION

• The utility of a language model is often determined in practice, by embedding it in an
application and measuring task performance.


• Example:
1. Alternately embed LMs A and B into a speech recognizer.
2. Run speech recognition using each model.
3. Compare recognition rates between the system that uses LM A and the
system that uses LM B.
INTRINSIC EVALUATION

• An intrinsic evaluation metric is one which measures the quality of a model independent
of any application.
• Perplexity is the most common intrinsic evaluation metric for N-gram language models.
• The perplexity (PP) of a language model on a test set is the inverse probability of the test
set, normalized by the number of words.
• The higher the conditional probability of the word sequence, the lower the
perplexity.
• Minimizing perplexity is equivalent to maximizing the test set probability according
to the language model.
• Perplexity is related inversely to the likelihood of the test sequence according to the
model.
PERPLEXITY

• For a test set W = w1 w2 … wN, the perplexity is the inverse probability of
the test set, normalized by the number of words:
  PP(W) = P(w1 w2 … wN)^(−1/N)
• Expanding the probability of W with the chain rule:
  PP(W) = ( ∏ i=1..N  1 / P(wi | w1 … wi−1) )^(1/N)
• If we are computing the perplexity of W with a bigram language model:
  PP(W) = ( ∏ i=1..N  1 / P(wi | wi−1) )^(1/N)
• Lower perplexity → a better model.
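
• A minimal sketch of bigram perplexity on a toy training/test pair, following the formulas above; unsmoothed MLE probabilities are assumed, so every test bigram must appear in training:

import math
from collections import Counter

train = [["<s>"] + s.split() + ["</s>"] for s in
         ["i want chinese food", "i want british food", "i like chinese food"]]
unigrams = Counter(w for s in train for w in s)
bigrams = Counter((s[i], s[i + 1]) for s in train for i in range(len(s) - 1))

def p_bigram(w, prev):
    return bigrams[(prev, w)] / unigrams[prev]  # MLE, no smoothing

def perplexity(test_sentence):
    s = ["<s>"] + test_sentence.split() + ["</s>"]
    log_prob = sum(math.log(p_bigram(w, prev)) for prev, w in zip(s, s[1:]))
    N = len(s) - 1                  # number of predicted words (including </s>)
    return math.exp(-log_prob / N)  # PP(W) = P(W)^(-1/N)

print(perplexity("i want chinese food"))  # lower is better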
PROBLEMS IN N-GRAM

• Sparse data: our maximum likelihood estimates are based on a particular, limited set
of training data.
• Because any corpus is limited, some perfectly acceptable English word
sequences are missing (zero-probability N-grams).
• A few words occur very frequently.
• Many words occur very infrequently.
• If we have no way to determine the distribution of unseen N-grams, how can we
estimate them?
SMOOTHING

• Assign some non-zero probability to any N-gram, even one that was never
observed in training.
• Smoothing addresses the poor estimates that are due to variability in
small data sets.
• Make the distribution more uniform.
SMOOTHING

• Smoothing algorithms provide a better way of estimating the probability of N-grams.

✓ Laplace Smoothing (Add-one smoothing)

Types: the number of distinct words in a corpus, i.e., the vocabulary size V.


SMOOTHING

• Smoothing algorithms provide a better way of estimating the probability of N-grams


than Maximum Likelihood Estimation.
✓ Laplace Smoothing (Add-one smoothing), with vocabulary size V:
  P_Laplace(wn | wn−1) = (C(wn−1 wn) + 1) / (C(wn−1) + V)

Does this give a proper probability distribution? Yes.


Laplace-smoothed bigram counts and Laplace-smoothed probabilities for the restaurant example (V = 1446). [tables not shown; a sketch follows]
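
• A minimal sketch of add-one (Laplace) smoothing for bigrams, assuming the same toy counting scheme as the earlier bigram sketch:

from collections import Counter

train = [["<s>"] + s.split() + ["</s>"] for s in
         ["i want chinese food", "i want british food", "i like chinese food"]]
unigrams = Counter(w for s in train for w in s)
bigrams = Counter((s[i], s[i + 1]) for s in train for i in range(len(s) - 1))
V = len(unigrams)  # vocabulary size (word types, here including <s> and </s>)

def p_laplace(w, prev):
    # P*(w | prev) = (C(prev, w) + 1) / (C(prev) + V)
    return (bigrams[(prev, w)] + 1) / (unigrams[prev] + V)

print(p_laplace("want", "i"))  # seen bigram: its count is discounted
print(p_laplace("food", "i"))  # unseen bigram: no longer zero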
ADD SMOOTHING (For Larger Corpora)
GOOD-TURING

• Define 𝑁𝑐 as the number of N-grams that occur c times.


• Idea: get rid of zeros by re-estimating the count c from the number of N-grams that
occur c + 1 times:
  c* = (c + 1) · Nc+1 / Nc
• Example: see the sketch after the bullets below.
• Zipf's law intuition:
✓Unseen words should behave like hapax legomena (words that occur only once).
✓ Words that occur a lot should behave like other words that occur a lot.
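
• A minimal sketch of the Good-Turing adjusted count c* = (c+1)·Nc+1/Nc on a hypothetical count table; real implementations also smooth the Nc values themselves:

from collections import Counter

# Hypothetical bigram counts: how many times each bigram was seen.
bigram_counts = {"a b": 3, "b c": 2, "c d": 2, "d e": 1, "e f": 1, "f g": 1}
N = Counter(bigram_counts.values())  # N[c] = number of bigrams seen exactly c times

def good_turing_count(c):
    """Adjusted count c* = (c + 1) * N_{c+1} / N_c (undefined when either count is 0)."""
    if N[c] == 0 or N[c + 1] == 0:
        return None
    return (c + 1) * N[c + 1] / N[c]

total = sum(bigram_counts.values())
print(good_turing_count(1))  # hapax counts are discounted: 2 * N2 / N1 = 2 * 2 / 3
print(N[1] / total)          # probability mass reserved for unseen bigrams: N1 / total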
GOOD-TURING ADJUSTMENTS
GOOD-TURING LIMITATIONS
SMOOTHING

• Smoothing algorithms:

✓ Katz smoothing
✓Simple interpolation (Jelinek-Mercer)
✓Absolute discounting
✓ Kneser-Ney smoothing
• Commonly used N-gram smoothing algorithms rely on lower-order
N-gram counts via backoff or interpolation.
BACKOFF
Interpolation
SMOOTHING

• Interpolation involves combining higher- and lower-order models.


• Interpolation always mixes the probability estimates from all the N-gram
estimators, i.e., we do a weighted interpolation of trigram, bigram, and
unigram counts.
• In a Katz backoff N-gram model, if the N-gram we need has zero
counts, we approximate it by backing off to the (N-1)-gram. We continue
backing off until we reach a history that has some counts.
• Only “back off ” to a lower order N-gram if we have zero evidence for a
higher-order N-gram.
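
• A minimal sketch of linear interpolation and a simplified (“stupid”) backoff with hand-picked weights; this is not the full Katz computation, which also discounts and renormalizes:

def interpolate(p_tri, p_bi, p_uni, lambdas=(0.6, 0.3, 0.1)):
    """P_hat(w | w1, w2) = l1*P(w|w1,w2) + l2*P(w|w2) + l3*P(w); the weights sum to 1."""
    l1, l2, l3 = lambdas
    return l1 * p_tri + l2 * p_bi + l3 * p_uni

# Even if the trigram was never seen (p_tri = 0), the interpolated estimate stays non-zero.
print(interpolate(p_tri=0.0, p_bi=0.2, p_uni=0.05))  # 0.065

def backoff(p_tri, p_bi, p_uni, alpha=0.4):
    """Back off to a lower-order estimate only when the higher order has zero evidence."""
    if p_tri > 0:
        return p_tri
    if p_bi > 0:
        return alpha * p_bi
    return alpha * alpha * p_uni

print(backoff(p_tri=0.0, p_bi=0.2, p_uni=0.05))  # backs off to the bigram estimate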
SMOOTHING

• Jelinek-Mercer performs better on small training sets; Katz performs


better on large training sets.
• Katz smoothing performs well on N-grams with large counts; Kneser-Ney
is best for small counts.
• Interpolated models are superior to backoff models for low (nonzero)
counts.
THANKS
