
N-Gram Language Modeling

Mausam
(Based on slides of Michael Collins, Dan Jurafsky, Dan Klein,
Chris Manning, Luke Zettlemoyer)
Outline
• Motivation
• Task Definition
• N-Gram Probability Estimation
• Evaluation

The Language Modeling Problem
 Setup: Assume a (finite) vocabulary of words

 We can construct an (infinite) set of strings

 Data: given a training set of example sentences


 Problem: estimate a probability distribution
The Noisy-Channel Model
• We want to predict a sentence given acoustics:

  w* = argmax_w P(w | a)

• The noisy channel approach:

  w* = argmax_w P(w | a) = argmax_w P(a | w) P(w)

• Acoustic model: distributions over acoustic waves given a sentence, P(a | w)
• Language model: distributions over sequences of words (sentences), P(w)
Acoustically Scored Hypotheses

the station signs are in deep in english -14732


the stations signs are in deep in english -14735
the station signs are in deep into english -14739
the station 's signs are in deep in english -14740
the station signs are in deep in the english -14741
the station signs are indeed in english -14757
the station 's signs are indeed in english -14760
the station signs are indians in english -14790
the station signs are indian in english -14799
the stations signs are indians in english -14807
the stations signs are indians and english -14815
ASR System Components
• Language model (source): P(w)
• Acoustic model (channel): P(a | w)
• Decoder: from the observed acoustics a, find the best word sequence w:

  w* = argmax_w P(w | a) = argmax_w P(a | w) P(w)
MT System Components
• Language model (source): P(e)
• Translation model (channel): P(f | e)
• Decoder: from an observed foreign sentence f, find the best English sentence e:

  e* = argmax_e P(e | f) = argmax_e P(f | e) P(e)
Probabilistic Language Models: Other Applications
• Why assign a probability to a sentence?
• Machine Translation:
• P(high winds tonite) > P(large winds tonite)
• Speech Recognition
• P(I saw a van) >> P(eyes awe of an)
• Spell Correction
• The office is about fifteen minuets from my house
• P(about fifteen minutes from) > P(about fifteen minuets from)
• + Summarization, question-answering, etc., etc.!!
Outline
• Motivation
• Task Definition
• N-Gram Probability Estimation
• Evaluation

Probabilistic Language Modeling
• Goal: compute the probability of a sentence or
sequence of words:
P(W) = P(w1,w2,w3,w4,w5…wn)

• Related task: probability of an upcoming word:


P(w5|w1,w2,w3,w4)
• A model that computes either of these:
P(W) or P(wn|w1,w2…wn-1) is called a language model.
How to compute P(W)
• How to compute this joint probability:
  P(its, water, is, so, transparent, that)

• Intuition: rely on the chain rule of probability:
  P(w1 w2 … wn) = ∏i P(wi | w1 … wi-1)

• For example:
  P("its water is so transparent") =
    P(its) × P(water | its) × P(is | its water)
    × P(so | its water is) × P(transparent | its water is so)
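To make the chain-rule scoring concrete, here is a minimal Python sketch (mine, not from the slides); cond_prob stands for a hypothetical estimator of P(word | history):

def sentence_prob(words, cond_prob):
    # Chain rule: multiply P(w_i | w_1 ... w_{i-1}) over the sentence.
    prob = 1.0
    for i, w in enumerate(words):
        history = tuple(words[:i])
        prob *= cond_prob(w, history)  # hypothetical conditional-probability lookup
    return prob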
How to estimate these probabilities
• Could we just count and divide?

  P(the | its water is so transparent that) =
    Count(its water is so transparent that the) / Count(its water is so transparent that)

• No! Too many possible sentences!
• We’ll never see enough data for estimating these probabilities
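As a rough illustration of this count-and-divide (maximum-likelihood) idea, here is a sketch over a flat list of corpus tokens; the names are illustrative, not from the slides:

def count_and_divide(corpus_tokens, history, word):
    # MLE estimate: Count(history + word) / Count(history).
    n = len(history)
    history = tuple(history)
    full = history + (word,)
    history_count = sum(
        1 for i in range(len(corpus_tokens) - n + 1)
        if tuple(corpus_tokens[i:i + n]) == history
    )
    full_count = sum(
        1 for i in range(len(corpus_tokens) - n)
        if tuple(corpus_tokens[i:i + n + 1]) == full
    )
    return full_count / history_count if history_count else 0.0

For a long history such as "its water is so transparent that", both counts are almost always zero on any realistic corpus, which is exactly the sparsity problem above.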
Markov Assumption

• Simplifying assumption (due to Andrei Markov):
  P(the | its water is so transparent that) ≈ P(the | that)

• Or maybe:
  P(the | its water is so transparent that) ≈ P(the | transparent that)
Markov Assumption

  P(w1 w2 … wn) ≈ ∏i P(wi | wi-k … wi-1)

• In other words, we approximate each component in the product:

  P(wi | w1 w2 … wi-1) ≈ P(wi | wi-k … wi-1)
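Under the Markov assumption, the chain-rule sketch above changes only in how much history it keeps; k is the assumed order of the model:

def sentence_prob_markov(words, cond_prob, k=2):
    # Same chain rule, but each history is truncated to the last k words.
    prob = 1.0
    for i, w in enumerate(words):
        history = tuple(words[max(0, i - k):i])
        prob *= cond_prob(w, history)  # hypothetical conditional-probability lookup
    return prob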
Simplest Case: Unigram Models
• Simplest case: unigrams
  P(w1 w2 … wn) ≈ ∏i P(wi)
• Generative process: pick a word, pick a word, … until you pick </s>
• Graphical model:
  w1   w2   …   wn-1   </s>
• Examples:
• fifth, an, of, futures, the, an, incorporated, a, a, the,
inflation, most, dollars, quarter, in, is, mass
• thrift, did, eighty, said, hard, 'm, july, bullish
• that, or, limited, the

• Big problem with unigrams: P(the the the the) >> P(I like ice cream)!
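A minimal unigram sketch (illustrative, assuming the training tokens include the </s> symbol used on the slide): estimate P(w) by relative frequency, then pick words independently until </s> comes up:

import random
from collections import Counter

def train_unigram(corpus_tokens):
    # P(w) = Count(w) / total number of tokens.
    counts = Counter(corpus_tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def generate_unigram(probs, eos="</s>", max_len=50):
    # Pick a word, pick a word, ... until </s> (or a length cap).
    vocab = list(probs)
    weights = [probs[w] for w in vocab]
    words = []
    while len(words) < max_len:
        w = random.choices(vocab, weights=weights)[0]
        if w == eos:
            break
        words.append(w)
    return words

Because every word is drawn independently, the samples look like the word salad in the examples above.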
Bigram Models
• Conditioned on previous single word

  P(wi | w1 w2 … wi-1) ≈ P(wi | wi-1)

• Generative process: pick <s>, pick a word conditioned on the previous one,
  repeat until you pick </s>

• Graphical model: <s> → w1 → w2 → … → wn-1 → </s>


• Examples:
• texaco, rose, one, in, this, issue, is, pursuing, growth, in, a,
boiler, house, said, mr., gurria, mexico, 's, motion, control,
proposal, without, permission, from, five, hundred, fifty, five,
yen
• outside, new, car, parking, lot, of, the, agreement, reached
• this, would, be, a, record, november
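A sketch of maximum-likelihood bigram estimation by relative frequencies of adjacent word pairs, using the <s> and </s> boundary symbols from the generative process above (function names are illustrative):

from collections import Counter, defaultdict

def train_bigram(sentences):
    # sentences: list of token lists; returns nested dicts giving P(w | prev).
    pair_counts = defaultdict(Counter)
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        for prev, w in zip(tokens, tokens[1:]):
            pair_counts[prev][w] += 1
    return {
        prev: {w: c / sum(nxt.values()) for w, c in nxt.items()}
        for prev, nxt in pair_counts.items()
    }

For example, training on the single sentence ["the", "station", "signs", "are", "indeed", "in", "english"] gives P(signs | station) = 1.0.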
N-Gram Models
• We can extend to trigrams, 4-grams, 5-grams
• N-gram models are (weighted) regular languages
• Many linguistic arguments that language isn’t regular.
• Long-distance effects: “The computer which I had just put into the
machine room on the fifth floor ___.”
• Recursive structure
• We often get away with n-gram models

• PCFG LM (later):
• [This, quarter, ‘s, surprisingly, independent, attack, paid, off,
the, risk, involving, IRS, leaders, and, transportation, prices, .]
• [It, could, be, announced, sometime, .]
• [Mr., Toseland, believes, the, average, defense, economy, is,
drafted, from, slightly, more, than, 12, stocks, .]
Outline
• Motivation
• Task Definition
• N-Gram Probability Estimation
• Evaluation

Evaluation: How good is our model?
• Does our language model prefer good sentences to bad ones?
• Assign higher probability to “real” or “frequently observed” sentences
• Than “ungrammatical” or “rarely observed” sentences?
• We train parameters of our model on a training set.
• We test the model’s performance on data we haven’t seen.
• A test set is an unseen dataset that is different from our training set,
totally unused.
• An evaluation metric tells us how well our model does on the test set.
Extrinsic evaluation of N-gram models
• Best evaluation for comparing models A and B
• Put each model in a task
• spelling corrector, speech recognizer, MT system
• Run the task, get an accuracy for A and for B
• How many misspelled words corrected properly
• How many words translated correctly
• Compare accuracy for A and B
Difficulty of extrinsic (in-vivo) evaluation of N-gram models
• Extrinsic evaluation
• Time-consuming; requires building applications, new data
• So
• Sometimes use intrinsic evaluation: perplexity
• Bad approximation
• unless the test data looks just like the training data
• So generally only useful in pilot experiments
• But is helpful to think about.
Intuition of Perplexity
• The Shannon Game: how well can we predict the next word?
  I always order pizza with cheese and ____
  The 33rd President of the US was ____
  I saw a ____
• For the pizza example, a model might assign:
  mushrooms 0.1, pepperoni 0.1, anchovies 0.01, …, fried rice 0.0001, …, and 1e-100
• Unigrams are terrible at this game. (Why?)
• A better model of a text is one which assigns a higher probability to the word
  that actually occurs
Perplexity
• The best language model is one that best predicts an unseen test set
• Gives the highest P(sentence)

• Perplexity is the inverse probability of the test set, normalized by the
  number of words:

  PP(W) = P(w1 w2 … wN)^(-1/N) = (1 / P(w1 w2 … wN))^(1/N)

• By the chain rule:
  PP(W) = (∏i 1 / P(wi | w1 … wi-1))^(1/N)

• For bigrams:
  PP(W) = (∏i 1 / P(wi | wi-1))^(1/N)

• Minimizing perplexity is the same as maximizing probability
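A sketch of computing test-set perplexity under a bigram model such as the train_bigram sketch earlier; it accumulates log probabilities to avoid underflow and does not handle unseen bigrams (smoothing is a separate topic):

import math

def bigram_perplexity(sentences, probs):
    # probs: nested dicts giving P(w | prev), e.g. from train_bigram above.
    log_prob, n_words = 0.0, 0
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        for prev, w in zip(tokens, tokens[1:]):
            log_prob += math.log(probs[prev][w])  # raises KeyError on unseen bigrams
            n_words += 1
    return math.exp(-log_prob / n_words)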


The Shannon Game intuition for perplexity
• From Josh Goodman
• How hard is the task of recognizing the digits ‘0,1,2,3,4,5,6,7,8,9’?
• Perplexity = 10
• How hard is recognizing 30,000 names at Microsoft?
• Perplexity = 30,000
• If a system has to recognize
• Operator (1 in 4)
• Sales (1 in 4)
• Technical Support (1 in 4)
• 30,000 names (1 in 120,000 each)
• Perplexity is 53
• Perplexity is weighted equivalent branching factor
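As a quick check of the 53 figure (my arithmetic, not on the slide), the perplexity here is 2 raised to the entropy of the distribution over outcomes:

import math

# Operator, Sales, Technical Support at 1/4 each; 30,000 names at 1/120,000 each.
probs = [0.25, 0.25, 0.25] + [1 / 120000] * 30000
entropy = -sum(p * math.log2(p) for p in probs)
print(2 ** entropy)  # ≈ 52.6, i.e. roughly 53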
Perplexity as branching factor
• Let’s suppose a sentence consisting of random digits
• What is the perplexity of this sentence according to a model that assigns
  P = 1/10 to each digit?
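Working it out with the definition above: PP(W) = ((1/10)^N)^(-1/N) = (1/10)^(-1) = 10, so the perplexity equals the branching factor of ten equally likely digits.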
Another form of Perplexity

• Perplexity can equivalently be computed from log probabilities:
  PP = 2^(-l),  where  l = (1/M) ∑i log2 P(si)
  (M = total words in the test data, si = the i-th test sentence)
• Lower is better!
• Example:
• uniform model over a vocabulary of N words → perplexity is N
• Interpretation: effective vocabulary size (accounting for statistical regularities)
• Typical values for newspaper text:
• Uniform: 20,000; Unigram: 1000s; Bigram: 700-1000; Trigram: 100-200
• Important note:
• It’s easy to get bogus perplexities by having bogus probabilities that sum to
  more than one over their event spaces. Be careful!
Lower perplexity = better model

• Training 38 million words, test 1.5 million words, WSJ

N-gram Order    Unigram    Bigram    Trigram
Perplexity      962        170       109
