NLP_Lecture_9_and_10_Week_5
Introduction to Parts-of-Speech
Dionysius Thrax of Alexandria (c. 100 B.C.) wrote a grammatical sketch of Greek (a "technē")
that summarized linguistic knowledge of his era. This work significantly influenced modern
linguistic vocabulary, introducing terms such as:
Syntax
Diphthong
Clitic
Analogy
Earlier scholars like Aristotle and the Stoics had their own lists, but Thrax’s set of eight
parts of speech (noun, verb, pronoun, preposition, adverb, conjunction, participle, and
article) became the standard.
The tradition continued even into modern culture, as seen in Schoolhouse Rock (1973),
an educational TV series that taught grammar through music.
Grammar Rock, a segment of Schoolhouse Rock, kept an eight-part classification but
substituted adjective and interjection for Thrax’s participle and article, which shows how
durable these categories have been.
Modern computational linguistics employs expanded tagsets for more precise classification,
for example the 45-tag Penn Treebank tagset and the 61-tag C5 tagset. These extended
classifications allow for detailed linguistic analysis and computational applications.
Parts-of-speech (POS), also known as word classes, morphological classes, or lexical tags,
provide valuable linguistic insights:
Word Prediction: POS knowledge helps anticipate subsequent words, e.g., possessive
pronouns (my, your) are followed by nouns, whereas personal pronouns (I, you, he) are
typically followed by verbs.
Speech Synthesis and Recognition: Knowing a word’s POS indicates how it is pronounced;
e.g., "content" is pronounced CONtent as a noun but conTENT as an adjective.
Stemming in Information Retrieval (IR): POS aids in selecting key terms for document
indexing and retrieval.
Parsing and Disambiguation: POS tagging enhances parsing efficiency, aids in word-
sense disambiguation, and improves named entity recognition (e.g., detecting names,
dates, times).
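The pronunciation point above can be illustrated with a minimal sketch. It assumes a toy lexicon keyed by (word, POS); the tag names "NOUN" and "ADJ", the capital-letter stress notation, and the function name `pronounce` are choices made for this example only, not part of any real speech system.

```python
# Toy pronunciation lexicon keyed by (word, coarse POS tag).
# Tag names and stress notation are assumptions made for this sketch.
STRESS_BY_POS = {
    ("content", "NOUN"): "CONtent",
    ("content", "ADJ"): "conTENT",
}


def pronounce(word: str, pos: str) -> str:
    """Return the stress pattern for (word, pos), or the word unchanged if unknown."""
    return STRESS_BY_POS.get((word.lower(), pos), word)


print(pronounce("content", "NOUN"))
print(pronounce("content", "ADJ"))
```

Without the POS tag, the lookup key would be the bare word, and the two pronunciations could not be told apart.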
Introduction to English Word Classes
English words are classified into various categories known as word classes or parts of speech.
These classifications are based on syntactic distribution and morphological properties rather than
purely semantic meaning. Word classes are broadly divided into open classes (which allow new
words to be added) and closed classes (which have fixed membership).
1. Open Classes
Open classes are dynamic and continually expand as new words are created or borrowed from
other languages. They include nouns, verbs, adjectives, and adverbs.
1.1 Nouns
Nouns typically name people, places, things, or abstract concepts. They can function as
subjects or objects in a sentence.
Morphological Properties: Can take plural forms (goat → goats) and possessives
(IBM’s revenue).
Syntactic Properties: Occur with determiners (a goat, the ship).
Types of Nouns:
1.2 Verbs
Morphological Forms: Include base form (eat), third-person singular (eats), past tense
(ate), past participle (eaten), and progressive form (eating).
Syntactic Role: Often function as predicates in sentences.
Auxiliary Verbs
Auxiliary verbs are a subtype of verbs that assist the main verb by marking tense, aspect,
mood, or voice. Unlike main verbs, they form a closed class (see Section 2).
1.3 Adjectives
Common Semantic Categories: Color (red, blue), Age (young, old), Value (good, bad).
Syntactic Role: Often occur before nouns (a red car) or after copula verbs (the car is
red).
1.4 Adverbs
Types:
o Locative Adverbs (home, here) indicate location.
o Degree Adverbs (very, extremely) indicate intensity.
o Manner Adverbs (slowly, carefully) describe how an action occurs.
o Temporal Adverbs (yesterday, soon) specify time.
2. Closed Classes
Closed classes have relatively fixed membership: new words are rarely added to them. They
include prepositions, determiners, pronouns, conjunctions, auxiliary verbs, particles,
numerals, and interjections.
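Because closed classes have nearly fixed membership, they can be approximated by simple lookup tables, which is not possible for open classes. The sketch below assumes small, deliberately incomplete word lists; the class names, the sample words, and the function `closed_class_of` are illustrative only.

```python
# Small, deliberately incomplete samples of closed-class words (assumed for this sketch).
CLOSED_CLASSES = {
    "preposition": {"on", "under", "over", "near", "by", "at", "from", "with"},
    "determiner": {"a", "an", "the"},
    "pronoun": {"i", "you", "he", "she", "it", "we", "they"},
    "conjunction": {"and", "but", "or"},
    "auxiliary": {"can", "may", "should", "are"},
}


def closed_class_of(word):
    """Return the closed class a word belongs to, or None if it is not listed."""
    w = word.lower()
    for name, members in CLOSED_CLASSES.items():
        if w in members:
            return name
    return None  # not listed: most likely an open-class word (noun, verb, ...)


print(closed_class_of("The"))
print(closed_class_of("goat"))
```

A real system would also have to handle words that belong to more than one class, which this lookup ignores.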
2.1 Prepositions
Prepositions occur before noun phrases and indicate spatial, temporal, or other relationships.
2.2 Determiners
2.3 Pronouns
Pronouns replace noun phrases and function as references to people, things, or ideas.
Types:
o Personal Pronouns: (I, you, he, she, it, we, they)
o Possessive Pronouns: (my, your, his, her, its, our, their)
o Wh-Pronouns: (who, whom, what, which)
2.4 Conjunctions
2.5 Particles
2.6 Numerals
2.7 Interjections
Introduction to Tagsets
Tagging words with their appropriate part-of-speech (POS) is a fundamental task in natural
language processing (NLP). Several tagsets are used for this purpose, most of them descended
from the original Brown corpus tagset. This document explores major English tagsets, their
applications, and challenges in part-of-speech tagging.
The Penn Treebank tagset contains 45 tags.
Used in corpora such as the Brown Corpus, the Wall Street Journal Corpus, and the
Switchboard Corpus.
Its small size makes it one of the most widely used tagsets.
Example:
o (5.1) The/DT grand/JJ jury/NN commented/VBD on/IN a/DT number/NN of/IN
other/JJ topics/NNS ./.
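The word/TAG notation in example (5.1) is easy to read into a program. The following is a minimal sketch, assuming tokens are separated by whitespace and that the tag follows the last slash in each token; the function name `parse_tagged` is hypothetical.

```python
def parse_tagged(text):
    """Split whitespace-separated 'word/TAG' tokens into (word, tag) pairs."""
    pairs = []
    for token in text.split():
        # Split on the LAST slash so that the token './.' yields ('.', '.').
        word, tag = token.rsplit("/", 1)
        pairs.append((word, tag))
    return pairs


sentence = (
    "The/DT grand/JJ jury/NN commented/VBD on/IN a/DT number/NN "
    "of/IN other/JJ topics/NNS ./."
)
pairs = parse_tagged(sentence)
print(pairs[:3])
```

Splitting on the last slash rather than the first is a design choice that keeps words containing a slash intact.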
The C5 tagset contains 61 tags.
Used in the British National Corpus (BNC).
Used by CLAWS (Constituent Likelihood Automatic Word-tagging System), the tagger
developed by the UCREL project at Lancaster.
3. Tagging Challenges
3.1 Overlap Between Prepositions (IN), Particles (RP), and Adverbs (RB)
Introduction to Rule-Based Tagging
Rule-based part-of-speech (POS) tagging is one of the earliest methods developed for assigning
POS tags to words in a text. The fundamental architecture follows a two-stage process: first,
a dictionary assigns each word a list of its potential parts of speech; second, hand-written
disambiguation rules narrow that list down to a single tag for each word.
This architecture, although initially developed in the 1960s, has been refined over time. One
of the most comprehensive rule-based tagging approaches is the Constraint Grammar approach
(EngCG for English) developed by Karlsson et al. (1995a).
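A two-stage rule-based tagger can be sketched in a few lines, assuming the first stage is a dictionary lookup that returns every possible tag for a word and the second stage applies hand-written rules to pick one. The lexicon entries, the default tag for unknown words, and the single rule below are all invented for illustration and are far simpler than any real system.

```python
# Stage 1: a toy dictionary from words to their possible Penn-style tags (assumed entries).
LEXICON = {
    "the": {"DT"},
    "can": {"MD", "NN", "VB"},
    "rusted": {"VBD", "VBN"},
}


def candidate_tags(word):
    """Look up all possible tags; unknown words default to NN (an assumption)."""
    return set(LEXICON.get(word.lower(), {"NN"}))


def tag_sentence(words):
    """Stage 2: apply one hand-written rule, then fall back deterministically."""
    tags = []
    for i, word in enumerate(words):
        options = candidate_tags(word)
        # Toy rule: directly after a determiner, keep only the noun reading.
        if i > 0 and tags[i - 1] == "DT" and "NN" in options:
            options = {"NN"}
        # Arbitrary fallback when ambiguity remains: alphabetically first tag.
        tags.append(sorted(options)[0])
    return tags


print(tag_sentence(["the", "can", "rusted"]))
```

The alphabetical fallback is only there to keep the sketch deterministic; a real tagger would need many more rules to resolve the remaining ambiguity.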
EngCG Tagger
The EngCG tagger (Voutilainen, 1995, 1999) is a rule-based POS tagger built on the two-stage
architecture: a lexicon lookup followed by rule-based disambiguation.
EngCG Lexicon
Each word is processed using the two-level lexicon transducer to obtain all possible
POS tags.
Example:
o Sentence: Pavlov had shown that salivation...
o Possible Tags:
Rule-Based Disambiguation
EngCG applies rules in a negative manner: rather than selecting the correct tag directly,
each rule removes interpretations that are incorrect in the given context.
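The idea of eliminating readings, rather than choosing one, can be sketched as follows. This is a simplified, hypothetical constraint on the word "that" and not an actual EngCG rule: the tag names (ADV, PRON, DET, CS, ADJ, N, V), the condition, and the function name are assumptions made for this example.

```python
def apply_that_constraint(words, readings):
    """Return a copy of `readings` with a toy elimination rule applied to 'that'.

    `readings` holds one set of candidate tags per word.
    """
    result = [set(r) for r in readings]
    for i, word in enumerate(words):
        if word.lower() != "that":
            continue
        next_is_final_adj = (
            i + 1 == len(words) - 1      # the next word ends the sentence
            and "ADJ" in result[i + 1]   # and it can be an adjective
        )
        if next_is_final_adj:
            removed = result[i] - {"ADV"}  # eliminate every non-adverb reading
        else:
            removed = {"ADV"}              # eliminate only the adverb reading
        survivors = result[i] - removed
        if survivors:                      # never remove the last remaining reading
            result[i] = survivors
    return result


words = ["Pavlov", "had", "shown", "that", "salivation"]
readings = [{"N"}, {"V"}, {"V"}, {"ADV", "PRON", "DET", "CS"}, {"N"}]
print(apply_that_constraint(words, readings)[3])
```

In this example the rule removes only the adverb reading, so "that" is still ambiguous afterwards; further constraints would be needed to reduce it to a single tag.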
Rule Logic:
Example Sentences:
Enhancements in EngCG
For more details, refer to Karlsson et al. (1995b) and Voutilainen (1999).