Week 2

Spelling Correction: Edit Distance

Pawan Goyal
CSE, IITKGP

Week 2: Lecture 1
Spelling Correction

[Figure: a misspelled word (e.g. 'behaf') shown with suggested corrections such as behalf, behave, ...]
Edit Distance

The minimum edit distance between two strings is the minimum number of editing operations needed to transform one string into the other:
  Insertion
  Deletion
  Substitution
Minimum Edit Distance

Example
Edit distance from 'intention' to 'execution'
Minimum Edit Distance

[Figure: an alignment between 'intention' and 'execution' showing the edit operations.]

If each operation has a cost of 1 (Levenshtein), the distance between these is 5.
If substitution costs 2 (alternate version), the distance between these is 8.
How to find the Minimum Edit Distance?

Searching for a path (sequence of edits) from the start string to the final string:
  Initial state: the word we are transforming
  Operators: insert, delete, substitute
  Goal state: the word we are trying to get to
  Path cost: what we want to minimize — the number of edits
Minimum Edit as Search

How to navigate?
The space of all edit sequences is huge.
A lot of distinct paths end up at the same state.
We don't have to keep track of all of them;
keep track of the shortest path to each state.
Defining Minimum Edit Distance Matrix

For two strings, X of length n and Y of length m,
we define D(i, j) as
the edit distance between X[1..i] and Y[1..j],
i.e., between the first i characters of X and the first j characters of Y.

Thus, the edit distance between X and Y is D(n, m).
Computing Minimum Edit Distance

Dynamic Programming
A tabular computation of D(n, m)
Solving problems by combining solutions to subproblems
Bottom-up:
  Compute D(i, j) for small i, j
  Compute larger D(i, j) based on previously computed smaller values
  Compute D(i, j) for all i and j till you get to D(n, m)
Dynamic Programming Algorithm

Initialization: D(i, 0) = i, D(0, j) = j

Recurrence: for each i = 1..n and j = 1..m,
D(i, j) = min of
  D(i−1, j) + 1                                   (deletion)
  D(i, j−1) + 1                                   (insertion)
  D(i−1, j−1) + (1 if x[i] ≠ y[j], 0 otherwise)   (substitution)

Termination: D(n, m) is the minimum edit distance.
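A minimal sketch of this tabular computation in Python; the function name and the optional substitution-cost parameter are illustrative (unit costs give the Levenshtein distance, sub_cost=2 gives the alternate version mentioned above):

```python
def min_edit_distance(source, target, sub_cost=1):
    """Levenshtein distance via bottom-up dynamic programming."""
    n, m = len(source), len(target)
    # D[i][j] = edit distance between source[:i] and target[:j]
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i                      # i deletions
    for j in range(1, m + 1):
        D[0][j] = j                      # j insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = source[i - 1] == target[j - 1]
            D[i][j] = min(
                D[i - 1][j] + 1,                              # deletion
                D[i][j - 1] + 1,                              # insertion
                D[i - 1][j - 1] + (0 if same else sub_cost),  # substitution
            )
    return D[n][m]

print(min_edit_distance("intention", "execution"))              # 5 (Levenshtein)
print(min_edit_distance("intention", "execution", sub_cost=2))  # 8 (alternate version)
```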
The Edit Distance Table

[Table: the D(i, j) edit-distance table for 'intention' vs 'execution'.]
Computing Alignments

We often need to align characters of the two strings to each other.
We do this by keeping a "backtrace":
every time we enter a cell, remember where we came from.
When we reach the end, trace back the path from the upper right corner to read off the alignment.
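A rough sketch of recording a backtrace and reading off the alignment; the pointer labels ("up", "left", "diag") and the "-" gap symbol are illustrative choices, not the lecture's exact notation:

```python
def align(source, target):
    """Edit-distance DP with a backtrace; returns aligned character pairs."""
    n, m = len(source), len(target)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    ptr = [[None] * (m + 1) for _ in range(n + 1)]   # where each cell came from
    for i in range(1, n + 1):
        D[i][0], ptr[i][0] = i, "up"         # deletions
    for j in range(1, m + 1):
        D[0][j], ptr[0][j] = j, "left"       # insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = D[i - 1][j - 1] + (0 if source[i - 1] == target[j - 1] else 1)
            choices = [(sub, "diag"), (D[i - 1][j] + 1, "up"), (D[i][j - 1] + 1, "left")]
            D[i][j], ptr[i][j] = min(choices)
    # trace back from the final cell to read off the alignment
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        move = ptr[i][j]
        if move == "diag":
            pairs.append((source[i - 1], target[j - 1]))
            i, j = i - 1, j - 1
        elif move == "up":
            pairs.append((source[i - 1], "-"))
            i -= 1
        else:
            pairs.append(("-", target[j - 1]))
            j -= 1
    return list(reversed(pairs))

print(align("intention", "execution"))
```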
The Edit Distance Table

[Table: the completed edit-distance table for 'intention' vs 'execution'.]
Minimum Edit with Backtrace

[Figure: the edit-distance table annotated with backtrace pointers.]

Adding Backtrace to Minimum Edit

[The recurrence extended so that each cell also records whether it was reached by insertion, deletion, or substitution.]
The distance matrix

A path from (0, 0) to (M, N) corresponds to an alignment of the two sequences.
An optimal alignment is composed of optimal sub-alignments.
Result of Backtrace

[Figure: the alignment of 'intention' and 'execution' read off from the backtrace.]
Performance

Time: O(nm)
Space: O(nm)
Backtrace: O(n + m)
Weighted Edit Distance, Other variations

Pawan Goyal
CSE, IITKGP

Week 2: Lecture 2
Weighted Edit Distance

Why add weights to the computation?
Some letters are more likely to be mistyped than others.
Confusion Matrix for Spelling Errors

[Figure: confusion matrix showing how often each letter is mistyped as another.]
Keyboard Design

[Figure: QWERTY keyboard layout, motivating why certain substitutions are more likely than others.]
Weighted Minimum Edit Distance

Same dynamic program, but with operation-specific costs in place of unit costs:

Initialization: D(0, 0) = 0, D(i, 0) = D(i−1, 0) + del[x[i]], D(0, j) = D(0, j−1) + ins[y[j]]

Recurrence:
D(i, j) = min of
  D(i−1, j) + del[x[i]]
  D(i, j−1) + ins[y[j]]
  D(i−1, j−1) + sub[x[i], y[j]]
How to modify the algorithm with transpose?

Transpose
transpose(x, y) = (y, x)
Also known as metathesis

Modification to the dynamic programming algorithm:

D(i, j) = min of
  D(i−1, j) + 1                                   (deletion)
  D(i, j−1) + 1                                   (insertion)
  D(i−1, j−1) + (1 if x[i] ≠ y[j], 0 otherwise)   (substitution)
  D(i−2, j−2) + 1  if x[i] = y[j−1] and x[i−1] = y[j]   (transposition)
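A sketch of the recurrence above with the transposition case added (the restricted Damerau-Levenshtein distance); variable names are illustrative:

```python
def damerau_levenshtein(x, y):
    """Edit distance allowing insertion, deletion, substitution, and
    transposition of two adjacent letters (each with cost 1)."""
    n, m = len(x), len(y)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i
    for j in range(m + 1):
        D[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if x[i - 1] == y[j - 1] else 1
            D[i][j] = min(D[i - 1][j] + 1,          # deletion
                          D[i][j - 1] + 1,          # insertion
                          D[i - 1][j - 1] + cost)   # substitution / match
            # transposition: the two adjacent letters appear in swapped order
            if i > 1 and j > 1 and x[i - 1] == y[j - 2] and x[i - 2] == y[j - 1]:
                D[i][j] = min(D[i][j], D[i - 2][j - 2] + 1)
    return D[n][m]

print(damerau_levenshtein("acress", "caress"))   # 1: transpose 'ac' -> 'ca'
```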
How to find dictionary entries with smallest edit distance?

Naïve Method
Compute the edit distance from the query term to each dictionary term — an exhaustive search.
Can be made efficient if we do it over a trie structure.
How to find dictionary entries with smallest edit distance?

Generate all terms within edit distance ≤ 2 (deletion + transpose + substitution + insertion) from the query term and search for them in the dictionary.

For a word of length 9 and an alphabet of size 36, this will lead to 114,324 terms to search for.
How to find dictionary entries with smallest edit distance?

Delete-based method
Generate terms with an edit distance ≤ 2 (deletes only) from each dictionary term (offline).
Generate terms with an edit distance ≤ 2 (deletes) from the input term and search for them in this dictionary.
The number of deletes within edit distance ≤ 2 for a word of length 9 is only 45 (9 single deletes + 36 double deletes).
A further check is required to remove the false positives.
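A sketch of this delete-based lookup (the idea behind tools such as SymSpell); the toy dictionary is an assumption, and the final verification step reuses the damerau_levenshtein function from the sketch above:

```python
from itertools import combinations

def deletes_up_to_2(word):
    """All strings obtainable from `word` by deleting at most two characters."""
    variants = {word}
    for k in (1, 2):
        for positions in combinations(range(len(word)), k):
            variants.add("".join(c for i, c in enumerate(word) if i not in positions))
    return variants

# Offline: index every dictionary term under all of its delete-variants.
dictionary = ["behalf", "behave", "behind", "behead"]
index = {}
for term in dictionary:
    for v in deletes_up_to_2(term):
        index.setdefault(v, set()).add(term)

def candidates(query):
    """Online: look up the query's delete-variants, then verify with real edit distance."""
    hits = set()
    for v in deletes_up_to_2(query):
        hits |= index.get(v, set())
    # further check: remove false positives that are actually farther than distance 2
    return {t for t in hits if damerau_levenshtein(query, t) <= 2}

print(candidates("behaf"))
```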
Spelling Correction

Non-word errors: behaf → behalf
Real-word errors:
  Typographical errors: three → there
  Cognitive errors (homophones): piece → peace, too → two
Non-word spelling errors

Non-word spelling error detection
Any word not in a dictionary is an error; the larger the dictionary the better.

Non-word spelling error correction
Generate candidates: real words that are similar to the error word.
Choose the best one:
  Shortest weighted edit distance
  Highest noisy channel probability
Real word spelling errors

For each word w in the sentence, generate a candidate set:
  Find candidate words with similar pronunciations
  Find candidate words with similar spelling
  Include w in the candidate set
Noisy Channel Model for Spelling Correction

Pawan Goyal
CSE, IITKGP

Week 2: Lecture 3
Noisy Channel

ŵ = argmax_{w ∈ V} P(w|x)
  = argmax_{w ∈ V} P(x|w) P(w) / P(x)
  = argmax_{w ∈ V} P(x|w) P(w)

where x is the observed (misspelled) word and V is the vocabulary.
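A minimal sketch of this argmax over candidate words; the probability tables below are toy numbers, not values from the lecture:

```python
# Toy channel model P(x|w) and language model P(w) for an observed typo x = "behaf".
channel = {"behalf": 0.0006, "behave": 0.0002}     # P(x | w): assumed error probabilities
prior   = {"behalf": 0.00002, "behave": 0.00005}   # P(w): assumed unigram probabilities

def correct(x, candidates):
    """Return the candidate w maximizing P(x|w) * P(w)."""
    return max(candidates, key=lambda w: channel.get(w, 0.0) * prior.get(w, 0.0))

print(correct("behaf", ["behalf", "behave"]))
```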
Non-word spelling error: acress

Words with similar pronunciation: small edit distance of the pronunciation to the error.

Damerau-Levenshtein edit distance
Minimum edit distance, where edits are:
  Insertion, Deletion, Substitution,
  Transposition of two adjacent letters
Words within edit distance 1 of acress

[Table: candidate corrections within edit distance 1 of 'acress' (e.g. actress, cress, caress, access, across, acres), together with the edit that produces each.]
Candidate generation

Almost all errors are within edit distance 2.
Computing error probability: confusion matrix

del[x,y]:   count(xy typed as x)
ins[x,y]:   count(x typed as xy)
sub[x,y]:   count(x typed as y)
trans[x,y]: count(xy typed as yx)
Channel model

[Equation: P(x|w) is estimated from the del, ins, sub, and trans counts, each normalized by the count of the corresponding character or character bigram in the corpus.]
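A sketch of turning such confusion counts into P(x|w), in the spirit of the channel model above; the counts and the single-edit bookkeeping are illustrative assumptions:

```python
# Toy confusion counts and character/bigram counts from a hypothetical error corpus.
sub_count  = {("e", "a"): 93}        # sub[x, w]: w typed as x
del_count  = {("a", "c"): 12}        # del[w_prev, w]: bigram typed without its 2nd char
char_count = {"a": 5000, "ac": 300}  # corpus counts of characters / character bigrams

def p_sub(x_char, w_char):
    """P(x|w) for a single substitution: sub[x, w] / count(w)."""
    return sub_count.get((x_char, w_char), 0) / char_count[w_char]

def p_del(prev_char, w_char):
    """P(x|w) for a single deletion: del[w_prev, w] / count(w_prev w)."""
    return del_count.get((prev_char, w_char), 0) / char_count[prev_char + w_char]

print(p_sub("e", "a"))   # probability that 'a' was typed as 'e'
print(p_del("a", "c"))   # probability that 'c' was dropped after 'a'
```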
Channel model for acress

[Table: channel-model probabilities P(acress|w) for each candidate correction w.]
Noisy channel probability for acress

[Table: P(acress|w) · P(w) for each candidate correction of 'acress'.]
Using a bigram language model

Counts from the Corpus of Contemporary American English with add-1 smoothing:
P(actress|versatile) = 0.000021, P(across|versatile) = 0.000021
P(whose|actress) = 0.0010, P(whose|across) = 0.000006

P("versatile actress whose") = 0.000021 × 0.0010 = 210 × 10^−10
P("versatile across whose") = 0.000021 × 0.000006 = 1 × 10^−10
Real-word spelling errors

Examples (the error is itself a real word):
  The study was conducted mainly be John Black
  The design an construction of the system ...

25-40% of spelling errors are real words.
Noisy channel for real-word spell correction

Given a sentence X = w1, w2, w3, . . . , wn

Candidate(w1) = {w1, w′1, w″1, w‴1, . . .}
Candidate(w2) = {w2, w′2, w″2, w‴2, . . .}
Candidate(w3) = {w3, w′3, w″3, w‴3, . . .}

Choose the sequence W that maximizes P(W|X).
Noisy channel for real-word spell correction

[Figure: lattice of candidate words for each position in the sentence, with one path per candidate sequence.]
Simplification: One error per sentence

Example: X = two of thew

  w1, w″2, w3   →  two off thew
  w1, w2, w′3   →  two of the
  w‴1, w2, w3   →  too of thew

Choose the sequence W that maximizes P(W|X).
Getting the probability values

Noisy Channel
Ŵ = argmax_{W ∈ S} P(W|X)
  = argmax_{W ∈ S} P(X|W) P(W)

where X is the observed sentence and S is the set of all possible sequences from the candidate set.

P(X|W)
Same as for non-word spelling correction.
Also requires a probability for no error, P(w|w).
Probability of no error

What is the probability of a correctly typed word, e.g. P("the"|"the")?
Computing P(W)

Use a language model:
  Unigram
  Bigram
  ...
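Putting the pieces together, a rough sketch of real-word correction under the one-error-per-sentence simplification; the candidate generator, channel model, bigram language model, and the P(w|w) value below are stand-in assumptions:

```python
def correct_sentence(words, candidates, p_channel, p_bigram, p_no_error=0.95):
    """Score every sequence that changes at most one word; return the best one."""
    def lm(seq):
        prob = 1.0
        for prev, cur in zip(["<s>"] + seq, seq + ["</s>"]):
            prob *= p_bigram(cur, prev)
        return prob

    best, best_score = list(words), lm(words) * p_no_error ** len(words)
    for i, w in enumerate(words):
        for c in candidates(w):
            if c == w:
                continue
            seq = words[:i] + [c] + words[i + 1:]
            # channel probability for the one changed word, P(w|w) for the rest
            score = lm(seq) * p_channel(w, c) * p_no_error ** (len(words) - 1)
            if score > best_score:
                best, best_score = seq, score
    return best

# toy usage with stand-in models
cands = lambda w: {"two": ["too", "to"], "thew": ["the", "threw"]}.get(w, [])
chan  = lambda x, w: 0.01                                    # assumed constant P(x|w)
bigr  = lambda w, prev: {("of", "the"): 0.3}.get((prev, w), 0.001)
print(correct_sentence(["two", "of", "thew"], cands, chan, bigr))
```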
N-gram Language Models

Pawan Goyal
CSE, IITKGP

Week 2: Lecture 4
Use a Language Model
P(about fifteen minutes from) > P(about fifteen minuets from)
Speech Recognition
P(I saw a van) >> P(eyes awe of an)

Machine Translation
Which sentence is more plausible in the target language?
P(high winds) > P(large winds)

Other Applications
Context Sensitive Spelling Correction
Natural Language Generation
...
A language model also supports predicting the completion of a sentence:
  Please turn off your cell ...
  Your program does not ...

Predictive text input systems can guess what you are typing and give choices on how to complete it.
Goal: compute the probability of a sequence of words:
P(W) = P(w1, w2, w3, . . . , wn)

Related task: probability of an upcoming word:
P(w4 | w1, w2, w3)

A model that computes either of these is called a language model.
How to compute the joint probability P(about, fifteen, minutes, from)?

Basic Idea
Rely on the Chain Rule of Probability.
Conditional Probabilities
P(B|A) = P(A, B) / P(A)
P(A, B) = P(A) P(B|A)

More Variables
P(A, B, C, D) = P(A) P(B|A) P(C|A, B) P(D|A, B, C)

The Chain Rule in General
P(x1, x2, . . . , xn) = P(x1) P(x2|x1) P(x3|x1, x2) . . . P(xn|x1, . . . , xn−1)
Applying the chain rule to the words of a sentence:
P(w1 w2 . . . wn) = ∏_i P(wi | w1 w2 . . . wi−1)
Could we just count and divide?

P(office | about fifteen minutes from) = Count(about fifteen minutes from office) / Count(about fifteen minutes from)

There are far too many possible sentences, so we will never see enough data to estimate such probabilities directly.
Simplifying Assumption: use only the previous word
P(office | about fifteen minutes from) ≈ P(office | from)
In general, with a Markov assumption of order k:
P(w1 w2 . . . wn) ≈ ∏_i P(wi | wi−k . . . wi−1)

We approximate each component in the product:
P(wi | w1 . . . wi−1) ≈ P(wi | wi−k . . . wi−1)
Unigram: P(office)
Bigram: P(office | from)
Trigram: P(office | minutes from)
N-gram models are, in general, an insufficient model of language, because language has long-distance dependencies:
"The computer which I had just put into the machine room on the fifth floor crashed."
Estimating bigram probabilities: the Maximum Likelihood Estimate (MLE)

The value that makes the observed data the "most probable":
P(wi | wi−1) = count(wi−1, wi) / count(wi−1) = c(wi−1, wi) / c(wi−1)
Example corpus:
<s> who am I </s>
<s> I am here </s>
<s> I would like to know </s>

Estimating bigrams with P(wi | wi−1) = c(wi−1, wi) / c(wi−1):
P(I | <s>) = 2/3
P(</s> | here) = 1
P(would | I) = 1/3
P(here | am) = 1/2
P(know | like) = 0
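A quick sketch of computing these MLE bigram estimates from the three example sentences above:

```python
from collections import Counter

sentences = [
    "<s> who am I </s>",
    "<s> I am here </s>",
    "<s> I would like to know </s>",
]

unigrams, bigrams = Counter(), Counter()
for s in sentences:
    tokens = s.split()
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

def p_mle(w, prev):
    """P(w | prev) = c(prev, w) / c(prev)"""
    return bigrams[(prev, w)] / unigrams[prev]

print(p_mle("I", "<s>"))       # 2/3
print(p_mle("would", "I"))     # 1/3
print(p_mle("here", "am"))     # 1/2
print(p_mle("know", "like"))   # 0.0
```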
[Table: bigram counts from the restaurant corpus.]

Normalize by unigrams to obtain the Bigram Probabilities:

[Table: bigram probabilities for the restaurant corpus.]
P(<s> I want english food </s>)
= P(I | <s>) × P(want | I) × P(english | want) × P(food | english) × P(</s> | food)
= 0.000031

Some of the bigram estimates:
P(i | <s>) = .25
P(english | want) = .0011
P(chinese | want) = .0065
P(to | want) = .66
P(eat | to) = .28
P(food | to) = 0
P(want | spend) = 0
Practical issue: we do everything in log space
Avoids underflow
Adding is faster than multiplying
log(p1 × p2 × p3 × p4) = log p1 + log p2 + log p3 + log p4

Handling zeros
Use smoothing
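A small sketch of the log-space computation, using a few of the bigram probabilities listed above:

```python
import math

# multiplying many small probabilities underflows; summing their logs does not
probs = [0.25, 0.0011, 0.0065, 0.28]           # example bigram probabilities
log_prob = sum(math.log(p) for p in probs)     # log(p1*p2*p3*p4) = sum of logs
print(log_prob, math.exp(log_prob))            # exponentiate only at the very end
```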
Language modeling toolkit: SRILM
http://www.speech.sri.com/projects/srilm/

Google N-gram release:
Number of unigrams: 13,588,391
Number of bigrams: 314,843,401
Number of trigrams: 977,069,902
Number of fourgrams: 1,313,818,354
Number of fivegrams: 1,176,470,663
http://googleresearch.blogspot.in/2006/08/all-our-n-gram-are-belong-to-you.html
Example 4-gram counts from the Google N-gram release:
serve as the inspiration 1390
serve as the installation 136
serve as the institute 187
serve as the institution 279
serve as the institutional 461
Evaluation of Language Models, Basic Smoothing

Pawan Goyal
CSE, IITKGP

Week 2: Lecture 5
Evaluating Language Models

A good language model assigns higher probability to grammatical (or frequently observed) sentences than to ungrammatical (or rarely observed) ones.

Parameters of the model are trained on a large corpus of text, called the training set.
Performance is tested on disjoint (held-out) test data using an evaluation metric.
Extrinsic evaluation of N-gram models

Comparison of two models, A and B:
Use each model for one or more tasks: spelling corrector, speech recognizer, machine translation.
Get accuracy values for A and B.
Compare the accuracy of A and B.
Intrinsic evaluation: Perplexity

Intuition: how well can we predict the next word?
  I always order pizza with cheese and . . .
  The president of India is . . .
  I wrote a . . .

A unigram model doesn't work for this game.

A better model of text is one which assigns a higher probability to the actual word.
Perplexity

The best language model is one that best predicts an unseen test set.

Perplexity (PP(W))
Perplexity is the inverse probability of the test data, normalized by the number of words:

PP(W) = P(w1 w2 . . . wN)^(−1/N)
      = (∏_{i=1}^{N} 1 / P(wi | w1 . . . wi−1))^(1/N)

For bigrams:

PP(W) = (∏_{i=1}^{N} 1 / P(wi | wi−1))^(1/N)
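A sketch of computing bigram-model perplexity in log space; `p_bigram` is a placeholder for whatever (smoothed) model is being evaluated, and the inclusion of the sentence boundary tokens in N is an assumption:

```python
import math

def perplexity(test_sentences, p_bigram):
    """PP(W) = P(w1 ... wN)^(-1/N), computed in log space over the test set."""
    log_sum, N = 0.0, 0
    for sent in test_sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            log_sum += math.log(p_bigram(cur, prev))
            N += 1
    return math.exp(-log_sum / N)

# e.g. a uniform model over 10 digit "words" gives perplexity 10
print(perplexity(["3 7 1 4"], lambda w, prev: 1.0 / 10))
```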
Example: A Simple Scenario

Consider a sentence of N random digits, each of the 10 digits occurring with probability 1/10.

PP(W) = P(w1 w2 . . . wN)^(−1/N)
      = ((1/10)^N)^(−1/N)
      = (1/10)^(−1)
      = 10
Lower perplexity = better model

WSJ Corpus
Training: 38 million words
Test: 1.5 million words

[Table: perplexity of unigram, bigram, and trigram models on the test set.]

Unigram perplexity: 962?
The model is as confused on test data as if it had to choose uniformly and independently among 962 possibilities for each word.
The Shannon Visualization Method

Generate random sentences from the bigram model:
Choose a random bigram (<s>, w) as per its probability.
Choose a random bigram (w, x) as per its probability.
And so on, until we choose </s>.
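A sketch of this sampling procedure; the bigram distributions come from whatever model was trained, and the small table below is a toy stand-in:

```python
import random

# toy conditional distributions P(next | current)
bigram_dist = {
    "<s>":  {"I": 0.6, "the": 0.4},
    "I":    {"want": 0.7, "am": 0.3},
    "the":  {"food": 1.0},
    "want": {"food": 0.5, "the": 0.5},
    "am":   {"</s>": 1.0},
    "food": {"</s>": 1.0},
}

def generate():
    """Repeatedly sample a bigram (prev, w) according to its probability until </s>."""
    word, sentence = "<s>", []
    while True:
        options = bigram_dist[word]
        nxt = random.choices(list(options), weights=list(options.values()))[0]
        if nxt == "</s>":
            return " ".join(sentence)
        sentence.append(nxt)
        word = nxt

print(generate())
```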
Shakespeare as Corpus

N = 884,647 tokens, V = 29,066
Shakespeare produced 300,000 bigram types out of V² = 844 million possible bigrams.

Approximating Shakespeare

[Figure: random sentences generated from N-gram models of increasing order trained on Shakespeare.]
Problems with simple MLE estimate: zeros

Training set:                   Test data:
... denied the allegations      ... denied the offer
... denied the reports          ... denied the loan
... denied the claims
... denied the request

Bigrams such as "denied the offer" never occur in training, so the MLE assigns them zero probability; the test set then gets probability 0 and its perplexity cannot be computed.
Language Modeling: Smoothing

[Figure: probability mass moved from seen bigrams to unseen ones.]

Steal probability mass to generalize better.
Laplace Smoothing (Add-one estimation)

Pretend as if we saw each word (N-gram) one more time than we actually did.
Just add one to all the counts!

MLE estimate for a bigram: PMLE(wi | wi−1) = c(wi−1, wi) / c(wi−1)
Add-1 estimate: PAdd−1(wi | wi−1) = (c(wi−1, wi) + 1) / (c(wi−1) + V)

Reconstituted counts as effect of smoothing

Effective bigram count c∗(wn−1 wn):
c∗(wn−1 wn) / c(wn−1) = (c(wn−1 wn) + 1) / (c(wn−1) + V)
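A sketch of add-one (and, more generally, add-k) bigram estimation; the counts below are a toy version of the "denied the ..." training set above, and the vocabulary size V is an assumption:

```python
from collections import Counter

unigrams = Counter({"denied": 4, "the": 4, "allegations": 1, "reports": 1,
                    "claims": 1, "request": 1})
bigrams  = Counter({("denied", "the"): 4, ("the", "allegations"): 1,
                    ("the", "reports"): 1, ("the", "claims"): 1, ("the", "request"): 1})
V = len(unigrams)   # vocabulary size (toy)

def p_add_k(w, prev, k=1):
    """P_add-k(w | prev) = (c(prev, w) + k) / (c(prev) + k*V); k = 1 is Laplace."""
    return (bigrams[(prev, w)] + k) / (unigrams[prev] + k * V)

print(p_add_k("offer", "the"))          # unseen bigram: (0 + 1) / (4 + 6), no longer zero
print(p_add_k("allegations", "the"))    # seen bigram:   (1 + 1) / (4 + 6)
print(p_add_k("offer", "the", k=0.1))   # add-k with a smaller k
```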
Comparing with bigrams: Restaurant corpus

[Tables: original vs. add-1 reconstituted bigram counts for the restaurant corpus.]
More general formulations: Add-k

PAdd−k(wi | wi−1) = (c(wi−1, wi) + k) / (c(wi−1) + kV)

Equivalently, with m = kV:
PAdd−k(wi | wi−1) = (c(wi−1, wi) + m(1/V)) / (c(wi−1) + m)

Unigram prior smoothing:
PUnigramPrior(wi | wi−1) = (c(wi−1, wi) + m·P(wi)) / (c(wi−1) + m)

A good value of k or m?
Can be optimized on a held-out set.