NLP assignment notes
UNIT 1
1. Comparison: Boolean Model vs Vector Space Model vs Probabilistic Model (for NLP & IR)
🔷 Boolean Model
Strengths:
o Simple to implement.
Weaknesses:
o No ranking of results.
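A minimal sketch of Boolean retrieval, assuming a made-up three-document collection and hypothetical helper names (build_index, boolean_and, boolean_or). It shows why the model is simple to implement and why it cannot rank: a query returns a plain set of matching documents.

```python
from collections import defaultdict

# Hypothetical toy collection; IDs and texts are made up for illustration.
DOCS = {
    1: "information retrieval with boolean queries",
    2: "vector space retrieval ranks documents",
    3: "boolean logic and set operations",
}

def build_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def boolean_and(index, *terms):
    """IDs of documents containing every query term (an unranked set)."""
    postings = [index.get(t, set()) for t in terms]
    return set.intersection(*postings) if postings else set()

def boolean_or(index, *terms):
    """IDs of documents containing at least one query term."""
    return set().union(*(index.get(t, set()) for t in terms))

index = build_index(DOCS)
print(boolean_and(index, "boolean", "retrieval"))  # {1}
print(boolean_or(index, "boolean", "retrieval"))   # {1, 2, 3}
```

Every document in the result matches equally; there is no score that says which one is the better answer.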
🔷 Vector Space Model
Strengths:
Weaknesses:
o Ignores relationships between terms (no semantics).
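A minimal sketch of vector space ranking, assuming TF-IDF weights, cosine similarity, and a made-up collection (the function names and the idf smoothing are choices for this sketch, not a fixed standard). It also shows the weakness noted above: terms are matched as independent strings, so a query word never matches a related word.

```python
import math
from collections import Counter

# Hypothetical toy collection for illustration.
DOCS = {
    "d1": "the cat sat on the mat",
    "d2": "the dog chased the cat",
    "d3": "stock markets fell sharply",
}

def tf_idf_vectors(docs):
    """Return sparse TF-IDF vectors per document plus the idf table."""
    tokenized = {d: text.lower().split() for d, text in docs.items()}
    n_docs = len(tokenized)
    df = Counter(t for tokens in tokenized.values() for t in set(tokens))
    # Assumed weighting: idf = ln(N / df) + 1
    idf = {t: math.log(n_docs / count) + 1.0 for t, count in df.items()}
    vectors = {
        d: {t: c * idf[t] for t, c in Counter(tokens).items()}
        for d, tokens in tokenized.items()
    }
    return vectors, idf

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank(query, docs):
    """Score every document against the query, highest first."""
    vectors, idf = tf_idf_vectors(docs)
    q = {t: c * idf.get(t, 0.0) for t, c in Counter(query.lower().split()).items()}
    scores = {d: cosine(q, vec) for d, vec in vectors.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print([d for d, _ in rank("cat chased", DOCS)])  # ['d2', 'd1', 'd3']
```

Unlike the Boolean model, the output is an ordered list. A query for "kitten" would still score 0 against every document, because the model has no notion that "kitten" and "cat" are related.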
🔷 Probabilistic Model
Strengths:
Weaknesses:
🔹 Rule-Based Models
Strengths:
Examples:
🔹 Statistical Models
Types:
Strengths:
Weaknesses:
Examples:
Strengths:
Weaknesses:
Examples:
🔷 Definition:
🔷 Advantages:
🔷 Challenges:
UNIT 2
1. Key Differences between Phonetics and Phonology
🔹 Morphology:
1. Derivation:
2. Inflection:
3. Compounding:
5. Clipping:
o Shortening a longer word.
o advertisement → ad
6. Blending:
8. Reduplication:
o bye-bye, tick-tock
🔹 What is an FST?
FSTs take an input word and break it down into root + affix(es).
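The idea of mapping a surface word to root + affix tags can be sketched with a very small transducer. This is a toy under stated assumptions: the root list, the "+N+SG" / "+N+PL" tag strings, and the helper names build_fst and analyze are hypothetical, and only the regular plural suffix "s" is handled.

```python
def build_fst(roots):
    """Build a tiny deterministic transducer.

    Root letters are copied to the output; a trailing 's' is rewritten
    as the tag '+N+PL'. Assumes no root is another root plus 's'.
    """
    transitions = {}   # (state, input_char) -> (next_state, output_string)
    final_output = {}  # final state -> string emitted at end of input
    next_id = 1
    for root in roots:
        state = 0
        for ch in root:
            key = (state, ch)
            if key not in transitions:
                transitions[key] = (next_id, ch)
                next_id += 1
            state = transitions[key][0]
        final_output[state] = "+N+SG"                   # bare root: singular
        transitions[(state, "s")] = (next_id, "+N+PL")  # root + s: plural
        final_output[next_id] = ""
        next_id += 1
    return transitions, final_output

def analyze(word, fst):
    """Run the transducer over the word; return the analysis or None."""
    transitions, final_output = fst
    state, output = 0, []
    for ch in word:
        if (state, ch) not in transitions:
            return None
        state, emitted = transitions[(state, ch)]
        output.append(emitted)
    if state not in final_output:
        return None
    return "".join(output) + final_output[state]

fst = build_fst(["cat", "dog"])
print(analyze("cats", fst))  # cat+N+PL
print(analyze("dog", fst))   # dog+N+SG
print(analyze("ca", fst))    # None
```

Real morphological analyzers compose many such machines (lexicon, affix rules, spelling rules); this sketch only shows the state/transition/output mechanics.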
🔹 Example:
✅ An FST can:
🔹 Practical Scenarios:
1. Spell-checkers:
2. Text-to-speech systems:
3. Search engines:
4. Machine Translation:
UNIT 3
1. Tokenization in NLP
✅ Definition:
✅ Steps Involved:
3. Sentence Segmentation:
4. Word Tokenization:
5. Punctuation Handling:
7. Language-Specific Rules:
✅ Importance:
Essential for:
o Text analysis
o Information retrieval
o Sentiment analysis
o Machine translation
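The sentence segmentation, word tokenization and punctuation handling steps can be sketched with two regular expressions. This is a naive illustration using only the standard library; the patterns and function names are assumptions for this sketch and will mis-split abbreviations such as "Dr.".

```python
import re

def split_sentences(text):
    """Naive segmentation: split after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize_words(sentence):
    """Words (keeping simple contractions together) and punctuation as separate tokens."""
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", sentence)

text = "Tokenization splits text. It isn't always easy!"
for sentence in split_sentences(text):
    print(tokenize_words(sentence))
# ['Tokenization', 'splits', 'text', '.']
# ['It', "isn't", 'always', 'easy', '!']
```

Language-specific rules (for example, languages written without spaces) need a different approach than whitespace and punctuation patterns.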
Model Type | Description | Examples | When It’s Effective
Rule-Based | Uses hand-crafted linguistic rules to assign PoS tags. | If a word ends with "ly", tag as adverb. | Low-resource settings, languages with rich morphology.
Stochastic (Statistical) | … models like HMMs, CRFs based on word and tag frequencies. | … barks." → High chance "dog" = noun | … large, labeled corpora.
Lexical (Dictionary-Based) | Tags are assigned based on dictionaries/lexicons. | WordNet or large tagged corpora | When working with standard vocabulary and limited context needed.
✅ Examples:
Rule-Based:
Stochastic:
Lexical:
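A small sketch combining the lexical and rule-based ideas: look the word up in a lexicon first, then fall back to the suffix rule from the notes (a word ending in "ly" is tagged as an adverb). The lexicon entries, the NN fallback and the function names are assumptions made for illustration.

```python
# Hypothetical mini-lexicon (dictionary-based tagging).
LEXICON = {"the": "DT", "a": "DT", "dog": "NN", "barks": "VBZ"}

def tag_word(word):
    """Lexicon lookup first, then the '-ly' suffix rule, then a default guess."""
    lower = word.lower()
    if lower in LEXICON:
        return LEXICON[lower]
    if lower.endswith("ly"):
        return "RB"   # rule from the notes: ends with "ly" -> adverb
    return "NN"       # assumed fallback: unknown words are treated as nouns

def tag(sentence):
    return [(w, tag_word(w)) for w in sentence.split()]

print(tag("The dog barks loudly"))
# [('The', 'DT'), ('dog', 'NN'), ('barks', 'VBZ'), ('loudly', 'RB')]
```

The suffix rule is cheap but brittle: "family" and "apply" would also be tagged RB, which is where stochastic taggers that use context tend to do better.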
✅ Definition:
Person names
Organizations
Locations
Dates
Quantities
Monetary values
Events, etc.
✅ Example:
NER Output:
California → Location
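A very simple way to produce output of this shape is a gazetteer (name list) plus regular expressions for dates and monetary values. The sketch below is an assumption-heavy toy: the gazetteer entries, the organization name "Acme Corp", the example sentence and the patterns are all made up for illustration, and practical NER systems are usually statistical or neural.

```python
import re

# Hypothetical gazetteer entries.
GAZETTEER = {"California": "Location", "Acme Corp": "Organization"}

# Hypothetical patterns for two of the entity types listed above.
PATTERNS = [
    (re.compile(r"\$\d+(?:,\d{3})*(?:\.\d+)?"), "Monetary value"),
    (re.compile(r"\b\d{4}-\d{2}-\d{2}\b"), "Date"),
]

def find_entities(text):
    """Return (entity text, label) pairs in order of appearance."""
    found = []
    for name, label in GAZETTEER.items():
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            found.append((m.start(), m.group(), label))
    for pattern, label in PATTERNS:
        for m in pattern.finditer(text):
            found.append((m.start(), m.group(), label))
    return [(span, label) for _, span, label in sorted(found)]

sentence = "Acme Corp opened an office in California on 2021-03-15 for $2,500,000."
for span, label in find_entities(sentence):
    print(f"{span} → {label}")
```

A gazetteer cannot recognize names it has never seen and cannot tell "Washington" the person from "Washington" the location, which is why context-aware models are preferred.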
✅ Applications:
Journalism | Quickly tagging people, places, events in articles.
UNIT 4
1. What is Syntactic Parsing?
Aids in:
o Machine Translation
o Information Extraction
o Question Answering
o Grammar Checking
🟩 3. Types of Parsing
Example:
(S
(VP jumps
(PP over
✅ B. Dependency Parsing
jumps → root
🟩 4. Parsing Techniques
✅ A. Rule-Based Parsing
✅ B. Statistical Parsing
Uses:
✅ D. Neural Parsing
Libraries:
spaCy
AllenNLP
Examples:
S -> NP VP
NP -> Det N
VP -> V NP
V -> 'chased'
Sentence:
/ \
NP VP
/\ /\
Det N V NP
| | | /\
| |
the cat
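How a rule-based parser builds such a tree can be sketched with the CKY algorithm in plain Python. The binary rules are the ones listed in the grammar above; the lexical entries for Det and N (other than 'chased') and the example sentence are assumptions for this sketch, as are the function names.

```python
from collections import defaultdict

# Binary rules from the grammar: S -> NP VP, NP -> Det N, VP -> V NP
BINARY = {("NP", "VP"): "S", ("Det", "N"): "NP", ("V", "NP"): "VP"}
# Lexicon: 'chased' is from the notes; the Det and N entries are assumed.
LEXICON = {"the": "Det", "cat": "N", "dog": "N", "chased": "V"}

def cky_parse(words):
    """Return a parse tree rooted in S as nested tuples, or None if there is none."""
    n = len(words)
    chart = defaultdict(dict)  # (start, end) -> {label: tree}
    for i, word in enumerate(words):
        if word in LEXICON:
            chart[i, i + 1][LEXICON[word]] = (LEXICON[word], word)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point
                for left_label, left in chart[i, k].items():
                    for right_label, right in chart[k, j].items():
                        parent = BINARY.get((left_label, right_label))
                        if parent:
                            chart[i, j][parent] = (parent, left, right)
    return chart[0, n].get("S")

def bracket(tree):
    """Render a tree as a bracketed string."""
    if len(tree) == 2:
        return f"({tree[0]} {tree[1]})"
    return f"({tree[0]} {bracket(tree[1])} {bracket(tree[2])})"

print(bracket(cky_parse("the dog chased the cat".split())))
# (S (NP (Det the) (N dog)) (VP (V chased) (NP (Det the) (N cat))))
```

CKY requires rules with exactly two non-terminals or one terminal on the right-hand side; this grammar already has that shape, so no conversion step is needed here.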
Summarization
Semantics (meaning)
Named Entities
Morphological information
Coreference links
Dependency relations
o Part-of-speech taggers
o Parsers
Annotation Type | Description | Example
Semantic | Meaning-based roles or senses | "bank" → financial vs river
Annotation Type | Description | Example
Universal Dependencies (UD) | Multilingual, dependency parsing | Cross-lingual NLP
Brown Corpus | One of the first annotated corpora | General linguistic research
Raw Sentence:
[jumps/VBZ]VP
[over/IN the/DT lazy/JJ dog/NN]PP
/ \
NP VP
/|\ |
DT JJ NN jumps
🟩 6. Benefits in NLP
o Coreference Resolution
o Machine Translation
✅ What is a Treebank?
✅ Types of Treebanks:
✅ Examples:
o Machine translation
o Grammar correction
o Question answering
Definition | General parsing using machine-learned models | A CFG where each production rule has an associated probability
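To make "each production rule has an associated probability" concrete: under a PCFG, the probability of a parse tree is the product of the probabilities of the rules used to build it. The sketch below assumes made-up probabilities (chosen so that they sum to 1 for each left-hand side) and a hypothetical tuple encoding of trees.

```python
# Hypothetical rule probabilities: (lhs, rhs) -> P(rhs | lhs)
RULE_PROB = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("Det", "N")): 1.0,
    ("VP", ("V", "NP")): 0.7,
    ("VP", ("V",)): 0.3,
    ("Det", ("the",)): 0.6,
    ("Det", ("a",)): 0.4,
    ("N", ("dog",)): 0.5,
    ("N", ("cat",)): 0.5,
    ("V", ("chased",)): 0.5,
    ("V", ("saw",)): 0.5,
}

def tree_probability(tree):
    """tree = (label, child, ...); each child is a subtree tuple or a word string."""
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    prob = RULE_PROB[(label, rhs)]
    for child in children:
        if not isinstance(child, str):
            prob *= tree_probability(child)
    return prob

tree = ("S",
        ("NP", ("Det", "the"), ("N", "dog")),
        ("VP", ("V", "chased"), ("NP", ("Det", "a"), ("N", "cat"))))
print(tree_probability(tree))  # approximately 0.021
```

Here the product is 1.0 × (1.0 × 0.6 × 0.5) × (0.7 × 0.5 × (1.0 × 0.4 × 0.5)) = 0.021. When a sentence has several parses, a PCFG parser picks the tree with the highest such product.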
✅ CFG Basics:
S → NP VP
VP → V NP | V
V → "chased" | "saw"
✅ Example Sentence:
/ \
NP VP
/|\ /\
Det Adj N V NP
Det N
a cat
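import nltk
# Only "S -> NP VP" survives in the notes; the remaining rules are assumed for illustration
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det N
VP -> V NP | V
Det -> 'the' | 'a'
N -> 'dog' | 'cat'
V -> 'chased' | 'saw'
""")
parser = nltk.ChartParser(grammar)
# Input sentence (assumed example)
for tree in parser.parse("the dog chased a cat".split()):
    tree.pretty_print()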
UNIT 5
1. First-Order Logic (FOL) and Description Logics (DLs)
Components:
o Quantifiers:
Example:
o Concept: Person
o Role: hasChild
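One way to see how quantifiers, concepts and roles fit together is to evaluate them over a small finite model. In the sketch below the individuals, the defined concept Parent and the two formulas are assumptions added for illustration; only the concept Person and the role hasChild come from the notes.

```python
# Hypothetical finite interpretation.
DOMAIN = {"alice", "bob", "carol"}
PERSON = {"alice", "bob", "carol"}   # extension of the concept Person
HAS_CHILD = {("alice", "bob")}       # extension of the role hasChild

def exists_role(role, concept):
    """Extension of the DL expression  ∃role.concept."""
    return {x for (x, y) in role if y in concept}

# DL definition (assumed):  Parent ≡ Person ⊓ ∃hasChild.Person
parent = PERSON & exists_role(HAS_CHILD, PERSON)

# FOL with a universal quantifier:  ∀x (Person(x) → ∃y hasChild(x, y))
every_person_has_child = all(
    any((x, y) in HAS_CHILD for y in DOMAIN) for x in PERSON
)

# FOL with existential quantifiers:  ∃x ∃y hasChild(x, y)
someone_has_child = any((x, y) in HAS_CHILD for x in DOMAIN for y in DOMAIN)

print(parent)                  # {'alice'}
print(every_person_has_child)  # False
print(someone_has_child)       # True
```

Concepts behave like unary predicates (sets of individuals) and roles like binary predicates (sets of pairs), which is why DL expressions can be read as restricted FOL formulas.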
Common Roles:
o Agent: The doer of the action (John in “John kicked the ball”)
✅ Selectional Restrictions
Examples:
✅ Importance:
✅ Definition:
✅ Example:
Word: “bank”
o “I deposited money at the bank.” → financial institution
✅ WSD Approaches:
1. Knowledge-Based:
2. Supervised Learning:
3. Unsupervised Learning:
4. Neural Approaches:
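As an illustration of the knowledge-based approach, here is a minimal sketch of the simplified Lesk idea: pick the sense whose dictionary gloss shares the most words with the context. The two glosses, the stopword list and the second test sentence are written for this sketch and are not taken from a real dictionary.

```python
STOPWORDS = {"i", "we", "the", "a", "an", "of", "at", "on", "and", "that", "as", "such"}

# Hypothetical glosses for two senses of "bank".
SENSES = {
    "financial institution": "an institution that accepts deposits of money and makes loans",
    "river bank": "sloping land beside a body of water such as a river",
}

def content_words(text):
    """Lowercased words with surrounding punctuation and stopwords removed."""
    words = (w.strip(".,!?\"'").lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}

def simplified_lesk(context, senses):
    """Return the sense whose gloss overlaps most with the context (first sense wins ties)."""
    context_words = content_words(context)
    return max(senses, key=lambda s: len(context_words & content_words(senses[s])))

print(simplified_lesk("I deposited money at the bank.", SENSES))   # financial institution
print(simplified_lesk("We sat on the bank of the river.", SENSES)) # river bank
```

The overlap here is a single word in each case ("money", "river"), which shows how sensitive the method is to gloss wording; supervised and neural approaches use richer context representations.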
✅ Applications:
PoS Tagging
✅ Example:
Sentence:
"The quick brown fox jumps over the lazy dog."
Tagged Output:
The/DT
quick/JJ
brown/JJ
fox/NN
jumps/VBZ
over/IN
the/DT
lazy/JJ
dog/NN
Tag | Meaning | Example
RB | Adverb | quickly, silently
IN | Preposition/Subordinating Conjunction | on, over
DT | Determiner | the, a
✅ A. Rule-Based Tagging
Example:
If "can" is preceded by a noun and followed by a verb, it’s likely a modal
verb.
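The contextual rule above can be sketched as a correction pass over an initial tagging. The tag sets, the function name and the example sentences are assumptions for illustration; MD is the Penn Treebank tag for modal verbs.

```python
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}
VERB_TAGS = {"VB", "VBP", "VBZ", "VBD"}

def retag_can(tagged):
    """Re-tag 'can' as a modal (MD) when it sits between a noun and a verb."""
    result = list(tagged)
    for i in range(1, len(result) - 1):
        word, _ = result[i]
        if (word.lower() == "can"
                and result[i - 1][1] in NOUN_TAGS
                and result[i + 1][1] in VERB_TAGS):
            result[i] = (word, "MD")
    return result

# Initial tags as a naive lexicon lookup might produce them (assumed input).
print(retag_can([("Birds", "NNS"), ("can", "NN"), ("fly", "VB")]))
# [('Birds', 'NNS'), ('can', 'MD'), ('fly', 'VB')]
print(retag_can([("The", "DT"), ("can", "NN"), ("fell", "VBD")]))
# [('The', 'DT'), ('can', 'NN'), ('fell', 'VBD')]
```

In the second sentence "can" follows a determiner rather than a noun, so the rule does not fire and the noun tag is kept.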
✅ C. Lexical/Dictionary-Based Tagging
o RNNs / LSTMs
NLTK (Python)
spaCy
Stanford NLP
Flair
BERT-based taggers