Week-2


Spelling Correction: Edit Distance

Pawan Goyal

CSE, IITKGP

Week 2: Lecture 1

Pawan Goyal (IIT Kharagpur) Spelling Correction: Edit Distance Week 2: Lecture 1 1 / 20
Spelling Correction

I am writing this email on behaf of ...


The user typed ‘behaf’.

Which are some close words?

behalf
behave
....

Isolated word error correction

Pick the one that is closest to ‘behaf’
How to define ‘closest’?
Need a distance metric
The simplest metric: edit distance

Edit Distance

The minimum edit distance between two strings
is the minimum number of editing operations:
Insertion
Deletion
Substitution
Minimum Edit Distance

Example
Edit distance from ‘intention’ to ‘execution’

Minimum Edit Distance

If each operation has a cost of 1 (Levenshtein)
Distance between ‘intention’ and ‘execution’ is 5

If substitution costs 2 (alternate version)
Distance between ‘intention’ and ‘execution’ is 8
How to find the Minimum Edit Distance?

Searching for a path (sequence of edits) from the start string to the final string:
Initial state: the word we are transforming

Operators: insert, delete, substitute
Goal state: the word we are trying to get to
Path cost: the number of edits (what we want to minimize)
Minimum Edit as Search

How to navigate?

The space of all edit sequences is huge
Lots of distinct paths end up at the same state
We don’t have to keep track of all of them
Keep track of the shortest path to each state
Defining Minimum Edit Distance Matrix

For two strings


X of length n

Y of length m

We define D(i, j)
the edit distance between X[1..i] and Y[1..j]
i.e., the first i characters of X and the first j characters of Y

Thus, the edit distance between X and Y is D(n, m)
Computing Minimum Edit Distance

Dynamic Programming

A tabular computation of D(n, m)
Solving problems by combining solutions to subproblems
Bottom-up:
Compute D(i, j) for small i, j
Compute larger D(i, j) based on previously computed smaller values
Compute D(i, j) for all i and j till you get to D(n, m)
Dynamic Programming Algorithm

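The tabular computation described above can be sketched in Python (a minimal illustration of the recurrence, not the lecture's exact pseudocode; the `sub_cost` parameter covers both the Levenshtein and cost-2-substitution variants):

```python
def min_edit_distance(x, y, sub_cost=1):
    """D[i][j] = edit distance between x[:i] and y[:j]."""
    n, m = len(x), len(y)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i                            # delete all of x[:i]
    for j in range(1, m + 1):
        D[0][j] = j                            # insert all of y[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if x[i - 1] == y[j - 1] else sub_cost
            D[i][j] = min(D[i - 1][j] + 1,          # deletion
                          D[i][j - 1] + 1,          # insertion
                          D[i - 1][j - 1] + sub)    # substitution (or match)
    return D[n][m]

print(min_edit_distance("intention", "execution"))              # 5 (Levenshtein)
print(min_edit_distance("intention", "execution", sub_cost=2))  # 8 (substitution costs 2)
```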
The Edit Distance Table

Computing Alignments

Computing edit distance may not be sufficient for some applications

We often need to align characters of the two strings to each other
We do this by keeping a “backtrace”
Every time we enter a cell, remember where we came from
When we reach the end, trace back the path from the upper right corner to read off the alignment
Minimum Edit with Backtrace

Adding Backtrace to Minimum Edit

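One way to add the backtrace, sketched in Python (the gap symbol `*` and the tie-breaking order among pointers are arbitrary illustrative choices, not prescribed by the lecture):

```python
def align(x, y):
    """Levenshtein distance plus one optimal alignment via backtrace."""
    n, m = len(x), len(y)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    ptr = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0], ptr[i][0] = i, "up"           # deletion
    for j in range(1, m + 1):
        D[0][j], ptr[0][j] = j, "left"         # insertion
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if x[i - 1] == y[j - 1] else 1
            # every time we fill a cell, remember where the minimum came from
            D[i][j], ptr[i][j] = min((D[i - 1][j - 1] + sub, "diag"),
                                     (D[i - 1][j] + 1, "up"),
                                     (D[i][j - 1] + 1, "left"))
    top, bottom, i, j = [], [], n, m           # trace back from (n, m)
    while i > 0 or j > 0:
        if ptr[i][j] == "diag":
            top.append(x[i - 1]); bottom.append(y[j - 1]); i, j = i - 1, j - 1
        elif ptr[i][j] == "up":
            top.append(x[i - 1]); bottom.append("*"); i -= 1
        else:
            top.append("*"); bottom.append(y[j - 1]); j -= 1
    return D[n][m], "".join(reversed(top)), "".join(reversed(bottom))

d, a, b = align("intention", "execution")
print(d)   # 5
print(a)
print(b)
```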
The distance matrix

Every non-decreasing path from (0,0) to (M,N) corresponds to an alignment of two sequences.
An optimal alignment is composed of optimal sub-alignments.
Result of Backtrace

Performance

Time: O(nm)
Space: O(nm)
Backtrace: O(n + m)
Weighted Edit Distance, Other variations

Pawan Goyal

CSE, IITKGP

Week 2: Lecture 2

Pawan Goyal (IIT Kharagpur) Weighted Edit Distance, Other variations Week 2: Lecture 2 1 / 12
Weighted Edit Distance

Why add weights to the computation?
Some letters are more likely to be mistyped.
Confusion Matrix for Spelling Errors

Keyboard Design

Weighted Minimum Edit Distance

How to modify the algorithm with transpose?

Transpose
transpose(x, y) = (y, x)
Also known as metathesis

Modification to the dynamic programming algorithm

D(i, j) = min of:
  D(i − 1, j) + 1                                              (deletion)
  D(i, j − 1) + 1                                              (insertion)
  D(i − 1, j − 1) + (1 if x[i] ≠ y[j], else 0)                 (substitution)
  D(i − 2, j − 2) + 1 if x[i] = y[j − 1] and x[i − 1] = y[j]   (transposition)
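The recurrence with the transposition case can be sketched in Python (restricted Damerau-Levenshtein, unit costs assumed):

```python
def dl_distance(x, y):
    """Edit distance with insertion, deletion, substitution, and
    transposition of two adjacent letters (unit costs)."""
    n, m = len(x), len(y)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i
    for j in range(m + 1):
        D[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if x[i - 1] == y[j - 1] else 1
            D[i][j] = min(D[i - 1][j] + 1,          # deletion
                          D[i][j - 1] + 1,          # insertion
                          D[i - 1][j - 1] + sub)    # substitution
            if i > 1 and j > 1 and x[i - 1] == y[j - 2] and x[i - 2] == y[j - 1]:
                D[i][j] = min(D[i][j], D[i - 2][j - 2] + 1)  # transposition
    return D[n][m]

print(dl_distance("ca", "ac"))  # 1 (one transposition; plain Levenshtein gives 2)
```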
How to find dictionary entries with smallest edit distance?

Naïve Method
Compute edit distance from the query term to each dictionary term – an exhaustive search
Can be made efficient if we do it over a trie structure
How to find dictionary entries with smallest edit distance?

Generate all possible terms with an edit distance ≤ 2 (deletion + transpose + substitution + insertion) from the query term and search them in the dictionary.

For a word of length 9 and an alphabet of size 36, this will lead to 114,324 terms to search for
For Chinese, the alphabet size is 70,000 (Unicode Han Characters)
How to find dictionary entries with smallest edit distance?

Symmetric Delete Spelling Correction


Generate terms with an edit distance ≤ 2 (deletes) from each dictionary

term (offline)
Generate terms with an edit distance ≤ 2 (deletes) from the input terms and search in the dictionary

Number of deletes within edit distance ≤ 2 for a word of length 9 will be 45
A further check is required to remove the false positives
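The delete generation used by symmetric delete correction can be sketched in Python; for a 9-letter word with all-distinct letters it produces the 45 variants mentioned above (9 single deletes + 36 double deletes):

```python
def deletes_within_2(word):
    """All strings reachable from `word` by deleting one or two characters."""
    d1 = {word[:i] + word[i + 1:] for i in range(len(word))}          # one delete
    d2 = {w[:i] + w[i + 1:] for w in d1 for i in range(len(w))}       # two deletes
    return d1 | d2

print(len(deletes_within_2("copyright")))  # 45 for a 9-letter word with distinct letters
```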
Spelling Correction

Types of spelling errors: Non-word Errors

behaf → behalf

Types of spelling errors: Real-word Errors
Typographical errors: three → there
Cognitive errors (homophones): piece → peace, too → two
Non-word spelling errors

Non-word spelling error detection


Any word not in a dictionary is an error

The larger the dictionary the better

Non-word spelling error correction
Generate candidates: real words that are similar to the error word
Choose the best one:
Shortest weighted edit distance
Highest noisy channel probability
Real word spelling errors

For each word w, generate candidate set

Find candidate words with similar pronunciations
Find candidate words with similar spelling
Include w in candidate set

Choosing best candidate
Noisy Channel
Noisy Channel Model for Spelling Correction

Pawan Goyal

CSE, IITKGP

Week 2: Lecture 3

Pawan Goyal (IIT Kharagpur) Noisy Channel Model for Spelling Correction Week 2: Lecture 3 1 / 17
Noisy Channel

We see an observation x of the misspelled word

Find the correct word w

ŵ = argmax_{w ∈ V} P(w|x)
  = argmax_{w ∈ V} P(x|w) P(w) / P(x)
  = argmax_{w ∈ V} P(x|w) P(w)
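A toy version of this argmax, using the ‘behaf’ example from Lecture 1; the channel and prior probabilities below are made-up numbers for illustration, not estimates from real data:

```python
# Hypothetical values -- in practice P(x|w) comes from a confusion matrix
# and P(w) from corpus counts.
channel = {"behalf": 0.005, "behave": 0.001}       # P("behaf" | w)
prior   = {"behalf": 0.00002, "behave": 0.00004}   # P(w)

def best_correction(candidates):
    # w-hat = argmax over candidates of P(x|w) * P(w)
    return max(candidates, key=lambda w: channel[w] * prior[w])

print(best_correction(["behalf", "behave"]))  # behalf
```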
Non-word spelling error: acress

Words with similar spelling


Small edit distance to error

Words with similar pronunciation
Small edit distance of pronunciation to error

Damerau-Levenshtein edit distance
Minimum edit distance, where edits are:
Insertion, Deletion, Substitution,
Transposition of two adjacent letters
Words within edit distance 1 of acress

Candidate generation

80% of errors are within edit distance 1

Almost all errors within edit distance 2

Allow deletion of space or hyphen
thisidea → this idea
inlaw → in-law
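Candidate generation at edit distance 1 can be sketched in the style of Peter Norvig's well-known spelling corrector (the lowercase alphabet and function name are illustrative choices, not from the lecture):

```python
def edits1(word, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """All strings one insertion, deletion, substitution, or adjacent
    transposition away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes    = {L + R[1:] for L, R in splits if R}
    transposes = {L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1}
    replaces   = {L + c + R[1:] for L, R in splits if R for c in alphabet}
    inserts    = {L + c + R for L, R in splits for c in alphabet}
    return deletes | transposes | replaces | inserts

print("behalf" in edits1("behaf"))  # True: one insertion away
```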
Computing error probability: confusion matrix

del[x,y]: count (xy typed as x)

ins[x,y]: count (x typed as xy)
sub[x,y]: count (x typed as y)
trans[x,y]: count (xy typed as yx)

Insertion and deletion are conditioned on previous character
Channel model

Channel model for acress

Noisy channel probability for acress

Using a bigram language model

“ ... versatile acress whose ...”

Counts from the Corpus of Contemporary American English with add-1 smoothing
P(actress|versatile) = 0.000021, P(across|versatile) = 0.000021
P(whose|actress) = 0.0010, P(whose|across) = 0.000006
P(“versatile actress whose”) = 0.000021 × 0.0010 = 210 × 10^−10
P(“versatile across whose”) = 0.000021 × 0.000006 ≈ 1 × 10^−10
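The two products above can be reproduced directly from the quoted bigram probabilities (a minimal sketch; a full model would back the table with corpus counts):

```python
# Bigram probabilities quoted on the slide (COCA, add-1 smoothing)
bigram = {
    ("versatile", "actress"): 0.000021,
    ("versatile", "across"):  0.000021,
    ("actress", "whose"):     0.0010,
    ("across", "whose"):      0.000006,
}

def phrase_prob(words):
    """Approximate P(w1 ... wn) as the product of successive bigram probabilities."""
    p = 1.0
    for prev, cur in zip(words, words[1:]):
        p *= bigram[(prev, cur)]
    return p

print(phrase_prob(["versatile", "actress", "whose"]))  # ~2.1e-08 = 210 x 10^-10
print(phrase_prob(["versatile", "across", "whose"]))   # ~1.3e-10
```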
Real-word spelling errors

The study was conducted mainly be John Black
The design an construction of the system ...

25-40% of spelling errors are real words
Noisy channel for real-word spell correction

Given a sentence X = w1, w2, w3, ..., wn

Candidate(w1) = {w1, w1′, w1″, w1‴, ...}
Candidate(w2) = {w2, w2′, w2″, w2‴, ...}
Candidate(w3) = {w3, w3′, w3″, w3‴, ...}

Choose the sequence W that maximizes P(W|X)
Noisy channel for real-word spell correction
Simplification: One error per sentence

Choose among all possible sentences with one word replaced

two of thew (observed X)
w1, w2″, w3 → two off thew
w1, w2, w3′ → two of the
w1‴, w2, w3 → too of thew

Choose the sequence W that maximizes P(W|X)
Getting the probability values

Noisy Channel

Ŵ = argmax_{W ∈ S} P(W|X)
  = argmax_{W ∈ S} P(X|W) P(W)

where X is the observed sentence and S is the set of all the possible sequences from the candidate set

P(X|W)
Same as for non-word spelling correction
Also requires the probability of no error, P(w|w)
Probability of no error

What is the probability for a correctly typed word? P(“the”|“the”)

It may depend on the source text under consideration
1 error in 10 words → 0.9
1 error in 100 words → 0.99
Computing P(W)

Use a Language Model
Unigram
Bigram
...
N-gram Language Models

Pawan Goyal

CSE, IITKGP

Week 2: Lecture 4

Pawan Goyal (IIT Kharagpur) N-gram Language Models Week 2: Lecture 4 1 / 24




Context Sensitive Spelling Correction

The office is about fifteen minuets from my house

EL
Use a Language Model PT
P(about fifteen minutes from) > P(about fifteen minuets from)
N

Pawan Goyal (IIT Kharagpur) N-gram Language Models Week 2: Lecture 4 2 / 24


Probabilistic Language Models: Applications

Speech Recognition
P(I saw a van) >> P(eyes awe of an)

Machine Translation
Which sentence is more plausible in the target language?
P(high winds) > P(large winds)

Other Applications
Context Sensitive Spelling Correction
Natural Language Generation
...


Completion Prediction

Language model also supports predicting the completion of a sentence.
Please turn off your cell ...
Your program does not ...

Predictive text input systems can guess what you are typing and give
choices on how to complete it.


Probabilistic Language Modeling

Goal: Compute the probability of a sentence or sequence of words:

P(W) = P(w1, w2, w3, ..., wn)

Related Task: probability of an upcoming word:

P(w4 | w1, w2, w3)

A model that computes either of these is called a language model


Computing P(W)

How to compute the joint probability
P(about, fifteen, minutes, from)

Basic Idea
Rely on the Chain Rule of Probability


The Chain Rule

Conditional Probabilities
P(B|A) = P(A, B) / P(A)

P(A, B) = P(A)P(B|A)

More Variables
P(A, B, C, D) = P(A)P(B|A)P(C|A, B)P(D|A, B, C)

The Chain Rule in General
P(x1, x2, ..., xn) = P(x1)P(x2|x1)P(x3|x1, x2) ... P(xn|x1, ..., xn−1)


Probability of words in sentences

P(w1 w2 ... wn) = ∏_i P(wi | w1 w2 ... wi−1)

P("about fifteen minutes from") =
P(about) x P(fifteen | about) x P(minutes | about fifteen) x P(from | about fifteen minutes)


Estimating These Probability Values

Count and divide

P(office | about fifteen minutes from) =
Count(about fifteen minutes from office) / Count(about fifteen minutes from)

What is the problem?
We may never see enough data for estimating these


Markov Assumption

Simplifying Assumption: Use only the previous word
P(office | about fifteen minutes from) ≈ P(office | from)

Or the previous couple of words
P(office | about fifteen minutes from) ≈ P(office | minutes from)


Markov Assumption

More Formally: kth order Markov Model

Chain Rule:
P(w1 w2 ... wn) = ∏_i P(wi | w1 w2 ... wi−1)

Using Markov Assumption: only k previous words
P(w1 w2 ... wn) ≈ ∏_i P(wi | wi−k ... wi−1)

We approximate each component in the product
P(wi | w1 w2 ... wi−1) ≈ P(wi | wi−k ... wi−1)


N-Gram Models

P(office | about fifteen minutes from)

An N-gram model uses only N − 1 words of prior context.
Unigram: P(office)
Bigram: P(office | from)
Trigram: P(office | minutes from)

Markov model and Language Model
An N-gram model is an (N − 1)-order Markov Model


N-Gram Models

We can extend to trigrams, 4-grams, 5-grams

In general, an insufficient model of language:
language has long-distance dependencies:
"The computer which I had just put into the machine room on the fifth
floor crashed."

In most of the applications, we can get away with N-gram models


Estimating N-gram probabilities

Maximum Likelihood Estimate
Value that makes the observed data the "most probable"

P(wi | wi−1) = count(wi−1, wi) / count(wi−1)
             = c(wi−1, wi) / c(wi−1)


An Example

P(wi | wi−1) = c(wi−1, wi) / c(wi−1)

<s> I am here </s>
<s> who am I </s>
<s> I would like to know </s>

Estimating bigrams
P(I | <s>) = 2/3
P(</s> | here) = 1
P(would | I) = 1/3
P(here | am) = 1/2
P(know | like) = 0
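The estimates above can be reproduced with a few lines of counting code. A small sketch over the same three toy sentences, using exact fractions so the results match the slide's 2/3 and 1/2:

```python
from fractions import Fraction
from collections import Counter

# Toy corpus from the slide, with sentence boundary markers.
corpus = [
    "<s> I am here </s>",
    "<s> who am I </s>",
    "<s> I would like to know </s>",
]

unigrams, bigrams = Counter(), Counter()
for sent in corpus:
    tokens = sent.split()
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

def p(word, prev):
    """MLE bigram estimate P(word | prev) = c(prev, word) / c(prev)."""
    return Fraction(bigrams[(prev, word)], unigrams[prev])

# Matches the slide: P(I|<s>) = 2/3, P(here|am) = 1/2, P(know|like) = 0
```

The unseen bigram "like know" gets probability 0, which is exactly the sparsity problem that smoothing (later in this week) addresses.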


Bigram counts from 9222 Restaurant Sentences

[Table of bigram counts not preserved in the text extraction]


Computing bigram probabilities

Normalize by unigram counts

Bigram Probabilities
[Tables not preserved in the text extraction]


Computing Sentence Probabilities

P(<s> I want english food </s>)
= P(I | <s>) x P(want | I) x P(english | want) x P(food | english) x P(</s> | food)
= 0.000031


What knowledge does n-gram represent?

P(english | want) = .0011
P(chinese | want) = .0065
P(to | want) = .66
P(eat | to) = .28
P(food | to) = 0
P(want | spend) = 0
P(i | <s>) = .25


Practical Issues

Everything in log space
Avoids underflow
Adding is faster than multiplying
log(p1 × p2 × p3 × p4) = log p1 + log p2 + log p3 + log p4

Handling zeros
Use smoothing
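The log-space identity can be checked numerically. A small sketch, using the bigram probabilities quoted on the earlier sentence-probability slide for "<s> I want english food </s>":

```python
import math

# Bigram probabilities for "<s> I want english food </s>" as on the earlier
# slide: P(I|<s>), P(want|I), P(english|want), P(food|english), P(</s>|food).
probs = [0.25, 0.33, 0.0011, 0.5, 0.68]

# Multiplying many small probabilities risks floating-point underflow;
# summing their logs does not.
log_prob = sum(math.log(p) for p in probs)
prob = math.exp(log_prob)   # ~0.000031, matching the slide
```

For a real test set with millions of tokens, only `log_prob` would ever be materialized; `exp` is applied (if at all) at the very end.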


Language Modeling Toolkit

SRILM
http://www.speech.sri.com/projects/srilm/


Google N-grams

Number of tokens: 1,024,908,267,229
Number of sentences: 95,119,665,584
Number of unigrams: 13,588,391
Number of bigrams: 314,843,401
Number of trigrams: 977,069,902
Number of fourgrams: 1,313,818,354
Number of fivegrams: 1,176,470,663

http://googleresearch.blogspot.in/2006/08/all-our-n-gram-are-belong-to-you.html


Example from the 4-gram data

serve as the inspector 66
serve as the inspiration 1390
serve as the installation 136
serve as the institute 187
serve as the institution 279
serve as the institutional 461


Google Books Ngram Data

[Figure not preserved in the text extraction]


Evaluation of Language Models, Basic Smoothing

Pawan Goyal

CSE, IITKGP

Week 2: Lecture 5

Pawan Goyal (IIT Kharagpur) Evaluation of Language Models, Basic Smoothing Week 2: Lecture 5 1 / 16
Evaluating Language Model

Does it prefer good sentences to bad sentences?
Assign higher probability to real (or frequently observed) sentences than
to ungrammatical (or rarely observed) ones

Training and Test Corpora
Parameters of the model are trained on a large corpus of text, called the
training set.
Performance is tested on a disjoint (held-out) test set using an
evaluation metric.
Extrinsic evaluation of N-gram models

Comparison of two models, A and B
Use each model for one or more tasks: spelling corrector, speech
recognizer, machine translation
Get accuracy values for A and B
Compare accuracy for A and B
Intrinsic evaluation: Perplexity

Intuition: The Shannon Game
How well can we predict the next word?
I always order pizza with cheese and . . .
The president of India is . . .
I wrote a . . .

Unigram model doesn't work for this game.

A better model of text
is one which assigns a higher probability to the actual word
Perplexity

The best language model is one that best predicts an unseen test set

Perplexity (PP(W))
Perplexity is the inverse probability of the test data, normalized by the number
of words:

PP(W) = P(w1 w2 ... wN)^(−1/N)

Applying the chain rule:

PP(W) = ( ∏_i 1 / P(wi | w1 ... wi−1) )^(1/N)

For bigrams:

PP(W) = ( ∏_i 1 / P(wi | wi−1) )^(1/N)
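The definition translates directly into code. A minimal sketch, computing PP(W) in log space from the per-token conditional probabilities P(wi | history):

```python
import math

def perplexity(token_probs):
    """PP(W) = P(w1 ... wN)^(-1/N), given each P(wi | history) in token_probs.
    Computed in log space so long test sets do not underflow."""
    N = len(token_probs)
    log_p = sum(math.log(p) for p in token_probs)
    return math.exp(-log_p / N)

# A model that assigns every token probability 1/10 has perplexity exactly 10,
# matching the random-digits example on the next slide.
```

Note the function fails (math domain error) if any probability is 0, which foreshadows why smoothing is needed before perplexity can be measured on unseen data.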
Example: A Simple Scenario

Consider a sentence consisting of N random digits
Find the perplexity of this sentence as per a model that assigns a
probability p = 1/10 to each digit.

PP(W) = P(w1 w2 ... wN)^(−1/N)
      = ((1/10)^N)^(−1/N)
      = (1/10)^(−1)
      = 10
Lower perplexity = better model

WSJ Corpus
Training: 38 million words
Test: 1.5 million words

[Table of perplexity results not preserved in the text extraction]

Unigram perplexity: 962?
The model is as confused on test data as if it had to choose uniformly and
independently among 962 possibilities for each word.
The Shannon Visualization Method

Use the language model to generate word sequences
Choose a random bigram (<s>, w) as per its probability
Choose a random bigram (w, x) as per its probability
And so on until we choose </s>
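A minimal sketch of this sampling loop, over a tiny hypothetical bigram distribution (the distribution itself is made up for illustration, not taken from the lecture's corpus):

```python
import random

# Tiny hypothetical bigram distribution P(next | current); each row sums to 1.
bigram_dist = {
    "<s>":   {"I": 0.7, "who": 0.3},
    "I":     {"am": 0.6, "would": 0.4},
    "who":   {"am": 1.0},
    "am":    {"here": 1.0},
    "would": {"like": 1.0},
    "like":  {"to": 1.0},
    "to":    {"know": 1.0},
    "here":  {"</s>": 1.0},
    "know":  {"</s>": 1.0},
}

def generate(seed=0):
    """Start at <s>, repeatedly sample the next word from P(. | current) until </s>."""
    rng = random.Random(seed)
    word, words = "<s>", []
    while True:
        nxt = rng.choices(list(bigram_dist[word]),
                          weights=list(bigram_dist[word].values()))[0]
        if nxt == "</s>":
            return " ".join(words)
        words.append(nxt)
        word = nxt
```

Every generated sentence starts with "I" or "who" and ends at </s>, mirroring the choose-a-bigram-then-continue procedure on the slide.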
Shakespeare as Corpus

N = 884,647 tokens, V = 29,066
Shakespeare produced 300,000 bigram types out of V² = 844 million
possible bigrams.
Approximating Shakespeare

[Figure of generated sentences not preserved in the text extraction]
Problems with simple MLE estimate: zeros

Training set
... denied the allegations
... denied the reports
... denied the claims
... denied the request

Test data
... denied the offer
... denied the loan

Zero probability n-grams
P(offer | denied the) = 0
The test set will be assigned a probability 0
And the perplexity can't be computed
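The failure mode is easy to reproduce in a few lines; a small sketch using the training set above with an MLE bigram estimate:

```python
from collections import Counter

# Training set from the slide.
train = ["denied the allegations", "denied the reports",
         "denied the claims", "denied the request"]

unigrams, bigrams = Counter(), Counter()
for s in train:
    toks = s.split()
    unigrams.update(toks)
    bigrams.update(zip(toks, toks[1:]))

def p_mle(w, prev):
    return bigrams[(prev, w)] / unigrams[prev]

p_offer = p_mle("offer", "the")   # unseen bigram -> 0.0
# Any test sentence containing "the offer" then gets probability 0, and
# perplexity, which takes a -1/N power of that probability, cannot be computed.
```

This is precisely the gap that the smoothing methods on the following slides are designed to fill.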
Language Modeling: Smoothing

With sparse statistics
Steal probability mass to generalize better
Laplace Smoothing (Add-one estimation)

Pretend as if we saw each word (N-gram) one more time than we actually
did
Just add one to all the counts!

MLE estimate for bigram:
P_MLE(wi | wi−1) = c(wi−1, wi) / c(wi−1)

Add-1 estimate:
P_Add-1(wi | wi−1) = (c(wi−1, wi) + 1) / (c(wi−1) + V)
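The add-1 estimate is a one-line change to the MLE counter. A minimal sketch over a toy two-sentence corpus; counting the boundary markers in the vocabulary here is an illustrative choice, not prescribed by the lecture:

```python
from fractions import Fraction
from collections import Counter

# Add-1 (Laplace) smoothing for bigrams:
#   P_Add-1(wi | wi-1) = (c(wi-1, wi) + 1) / (c(wi-1) + V)

train = ["<s> denied the allegations </s>",
         "<s> denied the reports </s>"]

unigrams, bigrams = Counter(), Counter()
for s in train:
    toks = s.split()
    unigrams.update(toks)
    bigrams.update(zip(toks, toks[1:]))

V = len(unigrams)   # vocabulary size (boundary markers counted in, for illustration)

def p_add1(w, prev):
    return Fraction(bigrams[(prev, w)] + 1, unigrams[prev] + V)

# Unseen bigrams now get probability 1/(c(prev) + V) instead of 0, and the
# smoothed distribution over the vocabulary still sums to 1.
```

The smoothed P(w | "the") is non-zero for every in-vocabulary word, at the cost of shaving mass off the observed bigrams (the "reconstituted counts" effect on the next slide).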
Reconstituted counts as effect of smoothing

Effective bigram count c*(wn−1 wn):

c*(wn−1 wn) / c(wn−1) = (c(wn−1 wn) + 1) / (c(wn−1) + V)
Comparing with bigrams: Restaurant corpus

[Tables of original and smoothed bigram counts not preserved in the text extraction]
More general formulations: Add-k

P_Add-k(wi | wi−1) = (c(wi−1, wi) + k) / (c(wi−1) + kV)

P_Add-k(wi | wi−1) = (c(wi−1, wi) + m(1/V)) / (c(wi−1) + m)

Unigram prior smoothing:

P_UnigramPrior(wi | wi−1) = (c(wi−1, wi) + mP(wi)) / (c(wi−1) + m)

A good value of k or m?
Can be optimized on a held-out set
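Both generalizations differ from add-1 only in the pseudo-count that gets added. A small sketch; the toy corpus and the default k and m values are illustrative only and would in practice be tuned on held-out data:

```python
from collections import Counter

# Add-k:          P(wi|wi-1) = (c(wi-1,wi) + k)       / (c(wi-1) + kV)
# Unigram prior:  P(wi|wi-1) = (c(wi-1,wi) + m P(wi)) / (c(wi-1) + m)

train = "the cat sat on the mat".split()
unigrams = Counter(train)
bigrams = Counter(zip(train, train[1:]))
V, N = len(unigrams), len(train)

def p_add_k(w, prev, k=0.5):
    return (bigrams[(prev, w)] + k) / (unigrams[prev] + k * V)

def p_unigram_prior(w, prev, m=1.0):
    prior = unigrams[w] / N          # unigram estimate P(wi)
    return (bigrams[(prev, w)] + m * prior) / (unigrams[prev] + m)

# Both remain proper distributions over the vocabulary for any k, m > 0;
# the unigram-prior variant backs off toward P(wi) rather than toward 1/V.
```

Replacing the uniform 1/V prior with P(wi) means frequent words receive more of the stolen probability mass, which is usually a better guess for unseen bigrams.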
