AIM-502: UNIT-2 WORD LEVEL ANALYSIS

2.1 Explain the usage of Unsmoothed and Smoothed N-grams


• N-grams are a type of language model used to predict the next word in a sequence of words.
• Unsmoothed N-grams estimate probabilities directly from raw counts in the training data (maximum likelihood estimation), with no adjustment for sequences that were never observed.
• This means an unsmoothed model assigns zero probability to any N-gram that does not appear in the training data, even if the word sequence is perfectly valid.
• Smoothed N-grams adjust the raw counts so that some probability mass is reserved for rare and unseen N-grams (for example Laplace/add-one, Good-Turing, or Kneser-Ney smoothing).
• This means a smoothed model can assign a small, non-zero probability to sequences it has never seen.
• The main advantage of unsmoothed N-grams is their simplicity: the probabilities exactly reflect the frequencies observed in the training data.
• Their main disadvantage is poor generalization: a single unseen N-gram makes the probability of an entire sentence zero, which is a serious problem with sparse training data.
• The main advantage of smoothed N-grams is that they generalize better to new text, because rare and unseen events still receive some probability.
• Their main disadvantage is that smoothing slightly distorts the probabilities of frequently observed N-grams, since probability mass is taken from seen events and given to unseen ones.
• In general, unsmoothed N-grams are adequate only when the training data covers almost every sequence the model will meet, while smoothed N-grams are preferred in practice because real text always contains unseen word combinations (a short sketch follows this list).
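To make the difference concrete, here is a minimal Python sketch (the toy corpus and function names are assumptions for illustration, not part of the course notes) that estimates bigram probabilities with and without add-one (Laplace) smoothing:

```python
from collections import Counter

# Toy corpus, assumed for illustration only
corpus = "there was heavy rain there was heavy flood".split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
V = len(unigrams)  # vocabulary size

def p_unsmoothed(w_prev, w):
    """Maximum-likelihood (unsmoothed) bigram probability."""
    return bigrams[(w_prev, w)] / unigrams[w_prev]

def p_laplace(w_prev, w):
    """Add-one (Laplace) smoothed bigram probability."""
    return (bigrams[(w_prev, w)] + 1) / (unigrams[w_prev] + V)

print(p_unsmoothed("heavy", "rain"))   # seen bigram: 0.5
print(p_unsmoothed("heavy", "snow"))   # unseen bigram: exactly 0.0
print(p_laplace("heavy", "snow"))      # unseen bigram: small but non-zero
```

The unsmoothed estimate gives the unseen bigram "heavy snow" zero probability, while the smoothed estimate reserves a little probability mass for it.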
2.2 Analyze N-grams
N-gram is a sequence of N words used in the modeling of NLP. Consider an example statement for modeling: "I love reading history books and watching documentaries". In a one-gram or unigram, there is a one-word sequence; for the above statement the unigrams are "I", "love", "reading", "history", "books", "and", "watching", "documentaries". In a two-gram or bigram, there is a two-word sequence, i.e. "I love", "love reading", or "history books". In a three-gram or trigram, there is a three-word sequence, i.e. "I love reading", "reading history books", or "and watching documentaries".
[Figure: illustration of N-gram modeling for N = 1, 2, 3]


Given the previous N-1 words, an N-gram model predicts the most probable word to follow the sequence. The model is a probabilistic language model trained on a collection of text. It is useful in applications such as speech recognition and machine translation. A simple model has limitations that can be improved with smoothing, interpolation, and backoff. So, the N-gram language model is about finding probability distributions over sequences of words. Consider the sentences "There was heavy rain" and "There was heavy flood". From experience we can say that the first statement sounds better. The N-gram language model tells us that "heavy rain" occurs more frequently than "heavy flood", so the first statement is more likely to occur and will be selected by the model. In a one-gram (unigram) model, the model relies only on how often each word occurs, without considering the previous words. In a 2-gram (bigram) model, only the previous word is considered when predicting the current word. In a 3-gram (trigram) model, the two previous words are considered.
In the N-gram language model the following probabilities are calculated:
P("There was heavy rain") = P("There", "was", "heavy", "rain") = P("There") P("was" | "There") P("heavy" | "There was") P("rain" | "There was heavy")

Since it is not practical to estimate conditional probabilities with such long histories, the "Markov assumption" is used to approximate this with a bigram model:
P("There was heavy rain") ≈ P("There") P("was" | "There") P("heavy" | "was") P("rain" | "heavy")
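As a minimal sketch of this bigram approximation (the toy corpus is an assumption for illustration; unseen words would need smoothing as in Section 2.1):

```python
from collections import Counter

# Assumed toy corpus for illustration
corpus = "there was heavy rain there was heavy rain there was heavy flood".split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
N = len(corpus)

def sentence_prob_bigram(sentence):
    """P(w1..wn) ≈ P(w1) * Π P(wi | wi-1) under the Markov assumption."""
    words = sentence.lower().split()
    prob = unigrams[words[0]] / N                      # P(w1)
    for prev, cur in zip(words, words[1:]):
        prob *= bigrams[(prev, cur)] / unigrams[prev]  # P(cur | prev)
    return prob

print(sentence_prob_bigram("There was heavy rain"))   # higher probability
print(sentence_prob_bigram("There was heavy flood"))  # lower probability
```

Because "heavy rain" is counted more often than "heavy flood" in the toy corpus, the first sentence receives the higher probability, which is exactly the selection behaviour described above.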

2.3 Describe Interpolation and Backoff-Word Classes


Interpolation and backoff-word classes are two techniques used in natural language processing (NLP) to improve the accuracy of language models.
Interpolation estimates an n-gram probability as a linear combination of all the lower-order probabilities. For instance, a 4-gram probability can be estimated using a combination of trigram, bigram and unigram probabilities. The weights with which these are combined can be estimated by reserving some part of the corpus (held-out data) for this purpose. In effect, the prediction is a weighted average of the predictions from the different models, with the weights reflecting how reliable each model is.
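A minimal sketch of linear interpolation (the λ weights and the reuse of the toy counts from the earlier sketch are assumptions; in practice the weights would be tuned on held-out data):

```python
from collections import Counter

corpus = "there was heavy rain there was heavy rain there was heavy flood".split()

unigrams = Counter(corpus)
bigrams  = Counter(zip(corpus, corpus[1:]))
trigrams = Counter(zip(corpus, corpus[1:], corpus[2:]))
N = len(corpus)

# Interpolation weights (assumed values; normally estimated on held-out data)
L1, L2, L3 = 0.1, 0.3, 0.6

def p_interpolated(w1, w2, w3):
    """P(w3 | w1 w2) as a weighted mix of trigram, bigram and unigram estimates."""
    p_uni = unigrams[w3] / N
    p_bi  = bigrams[(w2, w3)] / unigrams[w2] if unigrams[w2] else 0.0
    p_tri = trigrams[(w1, w2, w3)] / bigrams[(w1, w2)] if bigrams[(w1, w2)] else 0.0
    return L3 * p_tri + L2 * p_bi + L1 * p_uni

print(p_interpolated("was", "heavy", "rain"))   # all three orders contribute
print(p_interpolated("was", "heavy", "there"))  # trigram and bigram unseen: unigram still contributes
```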
Backoff-word classes are a technique used to improve the accuracy of language models for rare words. A hierarchy of word classes is created, with the most common words in the highest class. When a rare word is encountered, the language model first tries to find a word in the same class; if no word is found in that class, it backs off to the next highest class, and so on.
Interpolation and backoff-word classes are both effective techniques for improving the accuracy of language models.

While backoff considers each lower order one at a time, interpolation considers all the
lower order probabilities together.
However, interpolation is more computationally expensive than backoff-word classes.
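To contrast with interpolation, here is a minimal sketch of order-based backoff in the style of "stupid backoff" (a simplified variant, not the class-hierarchy scheme described above; it reuses the counts from the interpolation sketch, and the 0.4 discount factor is an assumption):

```python
def p_backoff(w1, w2, w3, alpha=0.4):
    """Fall back to lower orders one at a time, discounting by alpha at each step."""
    if bigrams[(w1, w2)] and trigrams[(w1, w2, w3)]:
        return trigrams[(w1, w2, w3)] / bigrams[(w1, w2)]     # trigram is seen: use it
    if unigrams[w2] and bigrams[(w2, w3)]:
        return alpha * bigrams[(w2, w3)] / unigrams[w2]       # otherwise back off to the bigram
    return alpha * alpha * unigrams[w3] / N                   # otherwise back off to the unigram

print(p_backoff("was", "heavy", "rain"))   # trigram seen, no backoff needed
print(p_backoff("was", "heavy", "there"))  # backs off all the way to the unigram
```

Unlike interpolation, which always mixes every order, backoff only consults a lower-order model when the higher-order count is missing, which is why it is cheaper to compute.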

2.4 Explain Part-of-Speech Tagging


Part-of-speech (POS) tagging is a process in natural language processing (NLP) where each word in a text is labeled with its corresponding part of speech. This can include nouns, verbs, adjectives, and other grammatical categories.
POS tagging is useful for a variety of NLP tasks, such as information extraction, named
entity recognition, and machine translation. It can also be used to identify the grammatical
structure of a sentence and to disambiguate words that have multiple meanings.
POS tagging is typically performed using machine learning algorithms, which are trained
on a large annotated corpus of text. The algorithm learns to predict the correct POS tag for
a given word based on the context in which it appears.
There are various POS tagging schemes that have been developed, each with its own set
of tags and rules. Some common POS tagging schemes include the Penn Treebank
tagset and the Universal Dependencies tagset.
Let’s take an example,
Text: “The cat sat on the mat.”
POS tags:
 The: determiner
 cat: noun
 sat: verb
 on: preposition
 the: determiner
 mat: noun
In this example, each word in the sentence has been labeled with its corresponding part of
speech. The determiner “the” is used to identify specific nouns, while the noun “cat” refers
to a specific animal. The verb “sat” describes an action, and the preposition “on” describes
the relationship between the cat and the mat.
POS tagging is a useful tool in natural language processing (NLP) because it allows algorithms to understand the grammatical structure of a sentence and to disambiguate words that have multiple meanings.
Identifying the part of speech of a word is not just a matter of mapping each word to a fixed POS tag. The same word may take different part-of-speech tags in different contexts, so a single common mapping from words to tags is not possible.
For a huge corpus, manually finding the part of speech of each word is not a scalable solution, as the tagging itself might take days. This is why we rely on tool-based POS tagging.
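As a small illustration of tool-based tagging, the sketch below assumes the NLTK library is installed and its tokenizer and tagger models are available (the exact resource names can vary with the NLTK version, and NLTK is only one of several possible tools):

```python
import nltk

# One-time downloads of the tokenizer and tagger models (assumed available)
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

text = "The cat sat on the mat."
tokens = nltk.word_tokenize(text)
print(nltk.pos_tag(tokens))
# Expected output (Penn Treebank tags), roughly:
# [('The', 'DT'), ('cat', 'NN'), ('sat', 'VBD'), ('on', 'IN'),
#  ('the', 'DT'), ('mat', 'NN'), ('.', '.')]
```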

2.5 Differentiate Rule-based stochastic and Transformation-based tagging


Rule-based POS Tagging
One of the oldest techniques of tagging is rule-based POS tagging. Rule-based taggers use a dictionary or lexicon to obtain the possible tags for each word. If a word has more than one possible tag, rule-based taggers use hand-written rules to identify the correct one. Disambiguation is performed by analyzing the linguistic features of a word along with its preceding and following words. For example, if the preceding word is an article, then the word in question must be a noun.
As the name suggests, all such information in rule-based POS tagging is coded in the form of rules.
These rules may be either −
 Context-pattern rules

 Or, as regular expressions compiled into finite-state automata, intersected with a lexically ambiguous sentence representation.
We can also understand Rule-based POS tagging by its two-stage architecture −
 First stage − In the first stage, it uses a dictionary to assign each word a list of
potential parts-of-speech.
 Second stage − In the second stage, it uses large lists of hand-written disambiguation rules to narrow the list down to a single part-of-speech for each word, as in the sketch below.
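A minimal sketch of this two-stage idea (the toy lexicon, tag names, and hand-written rules are all assumptions for illustration, not an actual tagger):

```python
# Stage 1: a small lexicon mapping words to their possible tags (assumed)
lexicon = {"the": ["DET"], "book": ["NOUN", "VERB"],
           "flies": ["NOUN", "VERB"], "quickly": ["ADV"]}

def rule_based_tag(words):
    tagged = []
    for i, w in enumerate(words):
        candidates = lexicon.get(w, ["NOUN"])        # stage 1: dictionary lookup
        tag = candidates[0]
        if len(candidates) > 1:                      # stage 2: hand-written rules
            prev = tagged[i - 1][1] if i > 0 else None
            if prev == "DET" and "NOUN" in candidates:
                tag = "NOUN"                         # after an article, prefer a noun
            elif prev == "NOUN" and "VERB" in candidates:
                tag = "VERB"                         # after a noun, prefer a verb
        tagged.append((w, tag))
    return tagged

print(rule_based_tag("the book flies quickly".split()))
# [('the', 'DET'), ('book', 'NOUN'), ('flies', 'VERB'), ('quickly', 'ADV')]
```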
Stochastic POS Tagging
Another technique of tagging is Stochastic POS Tagging. Now, the question that arises
here is which model can be stochastic. The model that includes frequency or probability
(statistics) can be called stochastic. Any number of different approaches to the problem of
part-of-speech tagging can be referred to as stochastic tagger.
The simplest stochastic tagger applies the following approaches for POS tagging −
Word Frequency Approach
In this approach, the stochastic taggers disambiguate the words based on the probability
that a word occurs with a particular tag. We can also say that the tag encountered most
frequently with the word in the training set is the one assigned to an ambiguous instance
of that word. The main issue with this approach is that it may yield inadmissible sequences of tags.
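A minimal sketch of this word-frequency approach (the tiny tagged training set and function names are assumptions for illustration):

```python
from collections import Counter, defaultdict

# Assumed toy training data: (word, tag) pairs
training = [("the", "DET"), ("cat", "NOUN"), ("sat", "VERB"),
            ("the", "DET"), ("dog", "NOUN"), ("runs", "VERB"),
            ("run", "VERB"), ("run", "NOUN"), ("run", "VERB")]

tag_counts = defaultdict(Counter)
for word, tag in training:
    tag_counts[word][tag] += 1

def most_frequent_tag(word):
    """Assign the tag seen most often with this word in the training set."""
    if word in tag_counts:
        return tag_counts[word].most_common(1)[0][0]
    return "NOUN"  # naive default for unknown words

print(most_frequent_tag("run"))   # VERB (2 of its 3 training occurrences)
print(most_frequent_tag("cat"))   # NOUN
```

Because each word is tagged in isolation, such a tagger can easily produce a tag sequence that is grammatically inadmissible, which is exactly the weakness noted above.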
Tag Sequence Probabilities
It is another approach of stochastic tagging, where the tagger calculates the probability of
a given sequence of tags occurring. It is also called n-gram approach. It is called so
because the best tag for a given word is determined by the probability at which it occurs
with the n previous tags.
Transformation-based Tagging
Transformation-based tagging is also called Brill tagging. It is an instance of transformation-based learning (TBL), a rule-based algorithm for automatically assigning POS tags to the given text. TBL allows us to express linguistic knowledge in a readable form and transforms one state into another by applying transformation rules.
It draws inspiration from both of the previously explained taggers − rule-based and stochastic. Like a rule-based tagger, it is based on rules that specify what tags need to be assigned to what words. Like a stochastic tagger, it is a machine learning technique in which the rules are automatically induced from data.
Working of Transformation-Based Learning (TBL)
In order to understand the working and concept of transformation-based taggers, we need
to understand the working of transformation-based learning. Consider the following steps
to understand the working of TBL −
 Start with the solution − The TBL usually starts with some solution to the problem
and works in cycles.
 Most beneficial transformation chosen − In each cycle, TBL will choose the
most beneficial transformation.
 Apply to the problem − The transformation chosen in the last step will be applied
to the problem.
The algorithm stops when the transformation selected in step 2 no longer adds value, or when there are no more transformations to be selected. This kind of learning is best suited to classification tasks.


Rule-based tagging
 These taggers are knowledge-driven taggers.
 The rules in rule-based POS tagging are built manually.
 The information is coded in the form of rules.
 There is a limited number of rules, approximately around 1000.
 Smoothing and language modeling are defined explicitly in rule-based taggers.

Stochastic tagging
 This POS tagging is based on the probability of a tag occurring.
 It requires a training corpus.
 There is no probability for words that do not exist in the corpus.
 It uses a different testing corpus (other than the training corpus).
 It is the simplest POS tagging because it chooses the most frequent tag associated with a word in the training corpus.

Transformation-based tagging
 We learn a small set of simple rules, and these rules are enough for tagging.
 Development as well as debugging is very easy in TBL because the learned rules are easy to understand.
 Complexity in tagging is reduced because TBL interlaces machine-learned and human-generated rules.
 A transformation-based tagger is much faster than a Markov-model tagger.
 Transformation-based learning (TBL) does not provide tag probabilities.
 In TBL, the training time is very long, especially on large corpora.

2.6 Identify the Issues in PoS tagging


 The main problem with POS tagging is ambiguity. In English, many common words
have multiple meanings and therefore multiple POS. The job of a POS tagger is to
resolve this ambiguity accurately based on the context of use.
For example, the word "shot" can be a noun or a verb. When used as a verb, it could
be in past tense or past participle.
 Context: The part-of-speech tag of a word can depend on the context in which it is used.
For example, the word "run" can be a noun (a race) or a verb (to move quickly), as illustrated in the sketch after this list.
 Variation: The part-of-speech tags of words can vary depending on the dialect or style of the text.
For example, "gonna" is an informal contraction of "going to"; dialectal and informal spellings like this may not fit a standard tagset cleanly, so taggers must handle them specially.

Despite these challenges, part-of-speech tagging is an important task in NLP. It can be used for a variety of tasks, such as machine translation, information retrieval, and sentiment analysis.

2.7 Compare Hidden Markov and Maximum Entropy models


Hidden Markov Model (HMM)
 HMM is a generative model: words are modelled as observations generated from hidden states (the tags).
 It can also be said that HMM uses the joint probability, maximizing the probability of the word sequence.
 In HMM, for the tag-sequence decoding problem, the probabilities are obtained by training on a text corpus.
 HMM is not flexible when it comes to adding features.

Maximum Entropy Model (MEMM)
 MEMM is a discriminative model: it directly uses the posterior probability P(T|W), that is, the probability of a tag sequence given a word sequence.
 MEMM uses conditional probability, conditioned on the previous tag and the current word.
 In MEMM, we build a distribution by adding features, which can be hand-crafted or picked out by training. The idea is to select the maximum entropy distribution given the constraints specified by the features.
 MEMM is more flexible because we can add features such as capitalization, hyphens or word endings, which are hard to consider in HMM. MEMM also allows for diverse, non-independent features.
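To illustrate the HMM side of this comparison, here is a minimal Viterbi decoding sketch over a tiny hand-specified HMM (all tags and probabilities are assumptions for illustration, not trained values):

```python
# Toy HMM: two tags with hand-set probabilities (assumed for illustration)
tags = ["NOUN", "VERB"]
start = {"NOUN": 0.6, "VERB": 0.4}                      # P(tag at position 0)
trans = {("NOUN", "NOUN"): 0.3, ("NOUN", "VERB"): 0.7,  # P(next tag | previous tag)
         ("VERB", "NOUN"): 0.6, ("VERB", "VERB"): 0.4}
emit = {("NOUN", "dogs"): 0.5, ("NOUN", "run"): 0.1,    # P(word | tag)
        ("VERB", "dogs"): 0.05, ("VERB", "run"): 0.5}

def viterbi(words):
    """Most probable tag sequence T maximizing the joint P(T, W)."""
    best = [{t: (start[t] * emit.get((t, words[0]), 1e-6), [t]) for t in tags}]
    for w in words[1:]:
        column = {}
        for t in tags:
            p, path = max(
                (best[-1][pt][0] * trans[(pt, t)] * emit.get((t, w), 1e-6),
                 best[-1][pt][1] + [t])
                for pt in tags)
            column[t] = (p, path)
        best.append(column)
    return max(best[-1].values())[1]

print(viterbi(["dogs", "run"]))   # expected: ['NOUN', 'VERB']
```

The HMM scores the joint probability of tags and words through its transition and emission tables; a MEMM would instead model the probability of each tag given the previous tag, the current word, and any extra features directly.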
