
Lecture 5: Language Modeling

(N-gram, BOW)
Lecture Objectives:

• Students will be able to understand language modeling techniques
• Students will be able to understand the statistical computation of terms and the formation of a matrix of term probabilities
CSC-441: Natural Language Processing
What is NLP?
NLP is the branch of computer science focused on developing systems that allow computers to communicate with people using everyday language.
Probabilistic Language Modeling
Assign a probability to a sentence: P(S) = P(w1, w2, …, wn)
• Goal: compute the probability of a sentence or
sequence of words:
P(W) = P(w1,w2,w3,w4,w5…wn)
• Related task: probability of an upcoming word:
P(w5|w1,w2,w3,w4)
• A model that computes either of these:
P(W) or P(wn|w1,w2…wn-1) is called a
language model.
• A better name would be "the grammar", but "language model" or LM is the standard term
Conditional Probability
• A conditional probability is a probability
whose sample space has been limited to only
those outcomes that fulfill a certain condition.
• The conditional probability of event A given
that event B has happened is
P(A|B)=P(A ∩ B)/P(B).
• The order is important: do not assume that
P(A|B) = P(B|A); in general they are
DIFFERENT.
Examples
• Suppose that A and B are events with probabilities:
P(A)=1/3, P(B)=1/4,
P(A ∩ B)=1/10
• Find each of the following:
1. P(A | B) = P(A ∩ B)/P(B) = (1/10)/(1/4) = 4/10 = 2/5
2. P(B | A) = P(A ∩ B)/P(A) = (1/10)/(1/3) = 3/10
3. P(A' | B') = P(A' ∩ B')/P(B') =
P((A ∪ B)')/(1 − P(B)) = (1 − P(A ∪ B))/(1 − P(B)) =
(1 − (P(A) + P(B) − P(A ∩ B)))/(1 − P(B)) =
(1 − (1/3 + 1/4 − 1/10))/(1 − 1/4) = (1 − 29/60)/(3/4) =
(31/60)/(3/4) = 31/45.
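
As a quick check of this arithmetic, here is a minimal Python sketch (not part of the original slides) that reproduces the three results with exact fractions:

from fractions import Fraction

# Given probabilities from the example
P_A = Fraction(1, 3)
P_B = Fraction(1, 4)
P_A_and_B = Fraction(1, 10)

# Conditional probability: P(X | Y) = P(X ∩ Y) / P(Y)
P_A_given_B = P_A_and_B / P_B                    # 2/5
P_B_given_A = P_A_and_B / P_A                    # 3/10

# Complement case: P(A' | B') = (1 - P(A ∪ B)) / (1 - P(B))
P_A_or_B = P_A + P_B - P_A_and_B                 # inclusion-exclusion: 29/60
P_notA_given_notB = (1 - P_A_or_B) / (1 - P_B)   # 31/45

print(P_A_given_B, P_B_given_A, P_notA_given_notB)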
A view of machine translation
• Assigning probabilities to sequences of
words is essential in machine translation.

他 向 记者 介绍了 主要 内容

He to reporters introduced main content

Could be translated as:


•he introduced reporters to the main contents of the statement
•he briefed to reporters the main contents of the statement
•he briefed reporters on the main contents of the statement
Language Models (LMs)

Also known as “N-gram models”


N-gram models
– Definition: p(xn | xn−1, …, xn−N+1)
– Predict the next word given the N−1 previous words
1-gram = unigram
– p(xn)
2-gram = bigram
– p(xn | xn−1)
3-gram = trigram
– p(xn | xn−2, xn−1)
The probability value is invariant with respect to n (the same conditional distribution is used at every position in the sentence)
"N-gram" (without "models") means an N-word sequence
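
To make the "N-word sequence" sense concrete, here is a small illustrative Python sketch (not from the slides) that extracts N-grams from a list of tokens:

def ngrams(tokens, n):
    # Slide a window of length n over the token list
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "I am Sam".split()
print(ngrams(tokens, 1))   # unigrams: [('I',), ('am',), ('Sam',)]
print(ngrams(tokens, 2))   # bigrams:  [('I', 'am'), ('am', 'Sam')]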
Unigram model: P(w1 w2 … wn) ≈ ∏i P(wi)
Bigram model: P(w1 w2 … wn) ≈ ∏i P(wi | wi−1)
Example
<s> I am Sam </s>    <s> Sam I am </s>    <s> I do not like green eggs and ham </s>

Bigram probability estimates (row = previous word, column = next word; the remaining probability mass of am, Sam, and ham goes to </s>, which is not shown as a column):

        I     am    Sam   do    not   like  green  eggs  and   ham
<s>     2/3   0     1/3   0     0     0     0      0     0     0
I       0     2/3   0     1/3   0     0     0      0     0     0
am      0     0     1/2   0     0     0     0      0     0     0
Sam     1/2   0     0     0     0     0     0      0     0     0
do      0     0     0     0     1/1   0     0      0     0     0
not     0     0     0     0     0     1/1   0      0     0     0
like    0     0     0     0     0     0     1/1    0     0     0
green   0     0     0     0     0     0     0      1/1   0     0
eggs    0     0     0     0     0     0     0      0     1/1   0
and     0     0     0     0     0     0     0      0     0     1/1
ham     0     0     0     0     0     0     0      0     0     0
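
The table above can be reproduced with a short Python sketch (illustrative, not from the slides) that counts bigrams in the toy corpus and divides by the count of the previous word:

from collections import Counter

sentences = ["<s> I am Sam </s>",
             "<s> Sam I am </s>",
             "<s> I do not like green eggs and ham </s>"]

unigram_counts = Counter()
bigram_counts = Counter()
for sentence in sentences:
    tokens = sentence.split()
    unigram_counts.update(tokens)
    bigram_counts.update(zip(tokens, tokens[1:]))

def bigram_prob(prev, word):
    # Maximum likelihood estimate: count(prev, word) / count(prev)
    return bigram_counts[(prev, word)] / unigram_counts[prev]

print(bigram_prob("<s>", "I"))   # 2/3
print(bigram_prob("I", "am"))    # 2/3
print(bigram_prob("am", "Sam"))  # 1/2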
Raw bigram counts
• Counted over a corpus of 9,222 sentences
Raw bigram probabilities
• Normalize each bigram count by the unigram count of the previous word:
  P(wi | wi−1) = count(wi−1, wi) / count(wi−1)
• Result: a matrix of bigram probability estimates
Bigram estimates of sentence probabilities
P(<s> I want english food </s>) =
P(I | <s>)
× P(want | I)
× P(english | want)
× P(food | english)
× P(</s> | food)
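
Using the bigram probabilities quoted on the "Examples" slide further below (P(i|<s>) = 0.25, P(english|want) = 0.0011, P(food|english) = 0.5, P(</s>|food) = 0.68), and assuming a value of about 0.33 for P(want|I), which the slides do not list, a minimal Python sketch of this calculation (with words lowercased) is:

bigram_probs = {
    ("<s>", "i"): 0.25,
    ("i", "want"): 0.33,          # assumed value; not given in the slides
    ("want", "english"): 0.0011,
    ("english", "food"): 0.5,
    ("food", "</s>"): 0.68,
}

def sentence_prob(tokens):
    # Multiply the bigram probabilities along the sentence
    prob = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        prob *= bigram_probs[(prev, word)]
    return prob

print(sentence_prob(["<s>", "i", "want", "english", "food", "</s>"]))
# ≈ 3.1e-05 under these (partly assumed) probabilities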
Training Language Models?
Google N-Gram Release
• serve as the inspiration 1390
• serve as the installation 136
• serve as the institute 187
• serve as the institution 279
• serve as the institutional 461
• serve as the instructional 173
• serve as the instructor 286
• serve as the indicator 120
• serve as the indicators 45
• serve as the indispensable 111

http://googleresearch.blogspot.com/2006/08/all-our-n-gram-are-belong-to-you.html
Smoothing techniques
• Laplace (add-1) smoothing
• Pretend we saw each word one more time than we did
• Just add one to all the counts!

• MLE estimate:
  PMLE(wi | wi−1) = count(wi−1, wi) / count(wi−1)

• Add-1 estimate:
  PAdd-1(wi | wi−1) = (count(wi−1, wi) + 1) / (count(wi−1) + V), where V is the vocabulary size
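
A minimal sketch of the add-1 estimate in Python (illustrative; it assumes Counter-style bigram and unigram counts such as the ones built for the toy corpus above):

def add1_bigram_prob(prev, word, bigram_counts, unigram_counts):
    # V = number of distinct word types seen in training
    V = len(unigram_counts)
    # Add-1 (Laplace) estimate: (count(prev, word) + 1) / (count(prev) + V)
    return (bigram_counts[(prev, word)] + 1) / (unigram_counts[prev] + V)

# With the toy-corpus counts from the earlier sketch (V = 12, count("I") = 3):
# add1_bigram_prob("I", "Sam", bigram_counts, unigram_counts)
# -> (0 + 1) / (3 + 12) = 1/15, instead of the MLE estimate of 0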
Examples
• Here are a few other useful probabilities:
P(i | <s>) = 0.25        P(english | want) = 0.0011
P(food | english) = 0.5  P(</s> | food) = 0.68
• Add-1 smoothed probabilities: P(i | <s>) = 0.19 and P(</s> | food) = 0.40
Evaluation: How good is our model?
• Does our language model prefer good sentences to
bad ones?
– i.e., it should assign higher probability to "real" or "frequently observed"
sentences than to "ungrammatical" or "rarely observed" sentences
• We train parameters of our model on a training set.
• We test the model’s performance on data we haven’t seen.
– A test set is an unseen dataset that is different from our training set,
totally unused.
– An evaluation metric tells us how well our model does on the test
set.
Perplexity
The best language model is the one that best predicts an unseen test set.

Perplexity is the inverse probability of the test set, normalized by the number of words:
  PP(W) = P(w1 w2 … wN)^(−1/N)

By the chain rule:
  PP(W) = ( ∏i 1/P(wi | w1 … wi−1) )^(1/N)

For bigrams:
  PP(W) = ( ∏i 1/P(wi | wi−1) )^(1/N)

Minimizing perplexity is the same as maximizing probability.
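
A minimal Python sketch of bigram perplexity (illustrative; it assumes a dictionary of bigram probabilities like the one used in the sentence-probability example above), computed in log space to avoid underflow:

import math

def bigram_perplexity(tokens, bigram_probs):
    # PP(W) = (prod_i 1 / P(w_i | w_{i-1}))^(1/N), N = number of predicted words
    log_prob = 0.0
    n = 0
    for prev, word in zip(tokens, tokens[1:]):
        log_prob += math.log(bigram_probs[(prev, word)])
        n += 1
    return math.exp(-log_prob / n)

# With the probabilities assumed earlier, the "i want english food" sentence
# comes out at a perplexity of roughly 8.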
Perplexity
• What is the perplexity of a sentence of random digits
according to a model that assigns P = 1/10 to
each digit? (Shannon game)
• How about a letter?
– Is it 26?
• Does the model fit the data?
– A good model will give a high probability
to a real sentence
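
As a worked answer to the questions above: for a string of N digits, each predicted with probability 1/10, the perplexity is
  PP(W) = ((1/10)^N)^(−1/N) = 10,
and by the same reasoning a model that assigns P = 1/26 to every letter has perplexity 26.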
Out-of-vocabulary estimation
How do we estimate unknown words?
For an unknown word, the MLE estimate is
count(unknown word)/count(words) = 0/(# of words) = 0,
and hence we cannot compute perplexity
(we would have to divide by 0)!

Solution
Smoothing techniques
Markov Models
• The assumption that the probability of a word
depends only on the previous word is called a
Markov assumption.
• Markov models are the class of probabilistic
models that assume we can predict the
probability of some future unit without looking
too far into the past.

P(the | its water is so transparent that) ≈ P(the | that)


Markov Assumption

P(w1 w2 … wn) ≈ ∏i P(wi | wi−k … wi−1)

• In other words, we approximate each component in the product:

P(wi | w1 w2 … wi−1) ≈ P(wi | wi−k … wi−1)
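
As a small illustration (not from the slides), the k-th order Markov approximation could be written in Python as follows, where cond_prob is an assumed function returning P(word | context) that is expected to handle the shorter contexts at the start of the sentence:

def markov_sentence_prob(tokens, cond_prob, k=1):
    # Approximate P(w1 ... wn) by multiplying P(wi | previous k words)
    # instead of conditioning on the full history w1 ... w(i-1).
    prob = 1.0
    for i, word in enumerate(tokens):
        context = tuple(tokens[max(0, i - k):i])
        prob *= cond_prob(word, context)
    return prob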


Summary

• Language models treat "word sequence prediction" as a probabilistic model
• They can be used for information extraction
• N-gram models help to perform machine translation

