Morphology
• Example:
(i) When their mother signaled, the girls barried home
unhappily.
The word ‘barried’ does not exist. However,
assuming it is a valid word of English,
we ‘guess’ its meaning from the context and
the position of the word in the
sentence. We do this using our general
knowledge and linguistic knowledge.
When their mother signaled, the girls
barried home unhappily.
Meaning = go
Function words:
– Are the words used to make the sentence
grammatical
– They have little lexical meaning
– They belong to a closed-class category
– Used mainly for determining structure of the
sentence
Examples:
Determiners, pronouns, prepositions, auxiliary
verbs, conjunctions, articles, etc.
Content words/Lexical words
Content words:
– Are the words used to convey the
important concepts in the sentence
– They have strong lexical meaning
– They are nouns, verbs, adjectives, adverbs, etc
– Used mainly for determining topic of the
sentence
Examples:
nouns, verbs, adjectives, adverbs, etc
Morpheme Types
free morphemes:
of, with, she, it, and, although, however, because,
then
bound morphemes:
-s, -ed, -ing, -er, -est
Forming words from Morphemes:
Inflection and Derivation
• Inflectional morphology is the process by
which a root form of a word is modified by
adding prefixes or suffixes that specify its
grammatical function but do not change its
part-of-speech.
• We say that a lemma (root form) is inflected
(modified/combined) with one or more
morphological features to create a surface
form.
• Derivational morphology is the process by
which a root form of a word is modified by
adding prefixes or suffixes to form a new
word, often resulting in a change in its
part-of-speech.
• We say that a lemma (root form) is derived
with one or more morphological features to
create a surface form.
Inflection vs. Derivation
NOUNS
PLAY = play, plays
GIRL = girl, girls
SHEEP = sheep, sheep
-s and ∅ (the zero morpheme) are inflectional morphemes
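The noun examples above can be sketched in code as a lemma combined with a number feature. This is only an illustration: the `ZERO_PLURAL_NOUNS` set and the `inflect_noun` function are hypothetical names, and only the regular -s plural and the zero plural are handled.

```python
# Hypothetical mini-lexicon of nouns whose plural is marked by the zero morpheme.
ZERO_PLURAL_NOUNS = {"sheep", "deer"}


def inflect_noun(lemma: str, number: str) -> str:
    """Return the surface form of a noun lemma for number 'SG' or 'PL'."""
    if number == "SG":
        return lemma
    if number != "PL":
        raise ValueError(f"unknown number feature: {number}")
    if lemma in ZERO_PLURAL_NOUNS:
        return lemma        # zero morpheme: the surface form is unchanged
    return lemma + "s"      # regular -s only; -es, -ies, etc. are not handled


for lemma in ("play", "girl", "sheep"):
    print(lemma.upper(), "=", inflect_noun(lemma, "SG") + ",", inflect_noun(lemma, "PL"))
```

The part of speech stays the same in every case, which is what makes these forms inflectional.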
Derivational Morphology
Two types:
• May change the category {N,V,A,Adv}
drive [V] + -er = driver [N]
eat [V] + -able = eatable [Adj]
girl [N] + -ish = girlish [Adj]
disturb [V] + -ance = disturbance [N]
Nouns
play – playful [adj], replay [verb]
girl – girlish [adj], girlhood [noun]
sheep – sheepish[adj]
-ful, re-, -ish, -hood are derivational morphemes
Four-way Contrasts
• politically
• beautiful
• between
• writing
• raspberries
• unable
• nationalization
Morphology Processing and
Morphology Generation
Morphology Processing or
Morphology Analysis
• The word banks came from the root word
bank.
• Take the word banks and split it into two
pieces:
bank + s i.e.,
(root + affixes)
This process is known as morphology
processing or morphology analysis.
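A minimal sketch of this splitting step, assuming a tiny hand-made lexicon. The names `ROOTS`, `SUFFIXES` and `analyze` are hypothetical, and real analyzers use far larger lexicons and handle spelling changes.

```python
from typing import Optional, Tuple

# Hypothetical toy lexicon of roots and a short list of suffixes.
ROOTS = {"bank", "girl", "play", "walk"}
SUFFIXES = ["ing", "ed", "s"]


def analyze(word: str) -> Optional[Tuple[str, str]]:
    """Split a word into (root, suffix) when both parts are known, else return None."""
    word = word.lower()
    if word in ROOTS:
        return (word, "")              # bare root, no affix
    for suffix in SUFFIXES:
        if word.endswith(suffix):
            root = word[: -len(suffix)]
            if root in ROOTS:          # accept the split only if the root is in the lexicon
                return (root, suffix)
    return None


print(analyze("banks"))   # root 'bank' plus suffix 's'
```

Checking the candidate root against the lexicon keeps the analyzer from splitting words such as "bus" into "bu" + "s".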
Morphology Generation or
Synthesis
• We have the root word and from the root
word we should be able to produce the
word forms.
Example:
nation – national – nationalism – nationality
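Generation can be sketched as the reverse of analysis: start from the root and attach suffixes in order. The `generate` function and the suffix chains below are illustrative assumptions; plain concatenation works for this example but ignores spelling changes that many words require.

```python
from typing import List


def generate(root: str, suffix_chains: List[List[str]]) -> List[str]:
    """Build word forms by attaching each chain of suffixes to the root, in order."""
    forms = [root]
    for chain in suffix_chains:
        form = root
        for suffix in chain:
            form += suffix             # simple concatenation, no spelling rules
        forms.append(form)
    return forms


# Each inner list is one sequence of suffixes applied to the root.
print(generate("nation", [["al"], ["al", "ism"], ["al", "ity"]]))
```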
Compounding
• Disgracefulness
Root = grace [free morpheme]
Affixes = dis-, -ful, -ness [bound morphemes]
Affixes
• Morphemes added to free forms to make other free
forms are called affixes.
• Mainly four kinds of affixes:
1. Prefixes (at beginning) – “un-” in “unable”
2. Suffixes (at end) – “-ed” in “walked”
3. Circumfixes (at both ends) – “en-…-en” in “enlighten”
4. Infixes (in the middle) – “-um-” in kumilad [‘to be red’],
fumikas [‘to be strong’]
[ kilad = ‘red’, fikas = ‘strong’ in Bontoc language]
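The Bontoc infix examples can be reproduced with a short sketch. It assumes, as in the two examples above, that the root begins with a single consonant and that "-um-" is placed right after it; the function name `add_um_infix` is hypothetical.

```python
def add_um_infix(root: str) -> str:
    """Insert the infix 'um' after the first letter of the root.

    Assumption: the root starts with a single consonant, as in the
    Bontoc examples kilad and fikas.
    """
    if not root:
        raise ValueError("root must not be empty")
    return root[0] + "um" + root[1:]


for root in ("kilad", "fikas"):
    print(root, "->", add_um_infix(root))
```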
For instance:
1) The word "better" has "good" as its lemma.
This link is missed by stemming, as it requires a
dictionary look-up.
2) The word "walk" is the base form of the word
"walking", and hence this is matched by both
stemming and lemmatisation.
Stemming
• Stemming is the process of collapsing words into
their morphological root.
• In linguistic analysis, the stem is defined as the
analyzed base form from which all inflected forms
can be formed. It is a process of linguistic
normalization, in which the variant forms of a word
are reduced to a common form, usually root.
• For example, the terms addicted, addicting,
addictions, addictive, and addicts might be conflated
to their stem, addict.
• The process of stemming is often called conflation.
Examples: Porter Stemmer, Snowball Stemmer, WordNet
Stemmer
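Conflation of the "addict" family can be sketched with a toy stemmer. This is not the Porter or Snowball algorithm; the suffix list and the names `toy_stem` and `conflate` are assumptions chosen only to cover the example words.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical suffix list, ordered so that longer suffixes are tried first.
SUFFIXES = ["ions", "ive", "ing", "ed", "s"]


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three letters of stem."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def conflate(words: List[str]) -> Dict[str, List[str]]:
    """Group word forms under their common stem."""
    groups = defaultdict(list)
    for word in words:
        groups[toy_stem(word)].append(word)
    return dict(groups)


print(conflate(["addicted", "addicting", "addictions", "addictive", "addicts"]))
```

All five forms end up under the single key "addict", which is the grouping a search index would use.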
Stemming Utility
Stemming is useful in search engines and
information retrieval for:
• Query expansion.
• Indexing.
• Natural language processing.
Differences between stemming and
lemmatization
Stemming algorithms work by cutting off the end or
the beginning of the word, taking into account a list
of common prefixes and suffixes that can be found in
an inflected word.
This indiscriminate cutting can be successful on some
occasions, but not always, which is why
this approach has some limitations. Below
are examples in English:
Form       Suffix   Stem
Studies    -es      Studi
Studying   -ing     Study
Lemmatization takes into consideration the
morphological analysis of the words. To do so,
it is necessary to have detailed dictionaries
which the algorithm can look through to link
the form back to its lemma.
Form Morphological Information Lemma
Advantages –
- Fewer stemming errors.
- User friendly.
Problems –
- Time consuming.
- Requires back-end updating.
- Difficult to design.
Suffix Stripping Algorithms
• Suffix stripping algorithms do not rely on a
lookup table that consists of inflected forms and
root form relations.
• Instead, a list of "rules" is maintained which
provides a path for the algorithm, given an input
word form, to find its root form.
• Some examples of the rules include:
– if the word ends in 'ed', remove the 'ed'
– if the word ends in 'ing', remove the 'ing'
– if the word ends in 'ly', remove the 'ly'
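The three rules above can be written down directly. This is a deliberately naive sketch (the names `RULES` and `strip_suffix` are illustrative) that applies the first matching rule with no further checks, so it also shows how indiscriminate cutting goes wrong.

```python
# The three example rules: suffixes to remove, tried in this order.
RULES = ["ed", "ing", "ly"]


def strip_suffix(word: str) -> str:
    """Apply the first matching rule and return the remaining string."""
    for suffix in RULES:
        if word.endswith(suffix):
            return word[: -len(suffix)]
    return word                      # no rule applies: return the word unchanged


for word in ("walked", "walking", "quickly", "sing"):
    print(word, "->", strip_suffix(word))
```

"walked", "walking" and "quickly" reduce as intended, but "sing" is cut down to "s" because the rule cannot tell a suffix from part of the root. Practical stemmers add conditions, such as a minimum stem length, to limit this kind of error.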
Lemmatization Algorithms
• Involves first determining the part of speech of a
word, and then applying different normalization
rules for each part of speech.
• The part of speech is first detected prior to
attempting to find the root since for some
languages, the rules change depending on a
word's part of speech.
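A minimal sketch of part-of-speech-dependent normalization follows. It assumes the part of speech is already known (for example, supplied by a tagger), and the exception dictionaries, rule tables and the `lemmatize` function are hypothetical illustrations rather than a complete rule set.

```python
# Hypothetical per-part-of-speech exception dictionaries (irregular forms).
EXCEPTIONS = {
    "VERB": {"ran": "run"},
    "ADJ": {"better": "good"},
}

# Hypothetical per-part-of-speech rules: (suffix to remove, replacement).
RULES = {
    "VERB": [("ies", "y"), ("ing", ""), ("ed", ""), ("s", "")],
    "NOUN": [("ies", "y"), ("s", "")],
    "ADJ": [("est", ""), ("er", "")],
}


def lemmatize(word: str, pos: str) -> str:
    """Return the lemma of a word, given its part of speech."""
    word = word.lower()
    irregular = EXCEPTIONS.get(pos, {})
    if word in irregular:
        return irregular[word]                 # dictionary look-up comes first
    for suffix, replacement in RULES.get(pos, []):
        # Require at least two letters to remain before the suffix.
        if word.endswith(suffix) and len(word) > len(suffix) + 1:
            return word[: -len(suffix)] + replacement
    return word


print(lemmatize("studies", "VERB"), lemmatize("studying", "VERB"))
print(lemmatize("meeting", "VERB"), lemmatize("meeting", "NOUN"))
```

The same form "meeting" is reduced to "meet" as a verb but left unchanged as a noun, which is why the part of speech has to be determined before the rules are applied.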
Hybrid Approaches
• Hybrid approaches use two or more of the
approaches described above.
• A simple example is an algorithm which first
consults a lookup table using brute force.
However, instead of trying to store the entire set
of relations between words in a given language,
the lookup table is kept small and is only used to
store a small number of "frequent exceptions"
like "ran => run".
• If the word is not in the exception list, apply
suffix stripping or lemmatization and output the
result.
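The hybrid idea can be sketched as a small exception table consulted first, with suffix stripping as the fallback. The table contents, the suffix list and the name `hybrid_normalize` are assumptions made for illustration.

```python
# Small lookup table holding only frequent exceptions (hypothetical contents).
FREQUENT_EXCEPTIONS = {"ran": "run", "better": "good"}

# Fallback suffix-stripping rules.
SUFFIXES = ["ing", "ed", "ly"]


def hybrid_normalize(word: str) -> str:
    """Consult the exception table first, then fall back to suffix stripping."""
    word = word.lower()
    if word in FREQUENT_EXCEPTIONS:
        return FREQUENT_EXCEPTIONS[word]
    for suffix in SUFFIXES:
        # Keep at least three letters so short words such as 'red' are left intact.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


for word in ("ran", "walking", "red"):
    print(word, "->", hybrid_normalize(word))
```

Keeping the table small means it only needs updating when a frequent irregular form is missed, while the rules cover the regular cases.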