0% found this document useful (0 votes)

29 views

Stemming

This document provides an overview of rule-based stemming algorithms and their role in natural language processing applications. It outlines a research plan to conduct a systematic literature review and comparative analysis of different rule-based stemming techniques. The objective is to understand their strengths, weaknesses, and suitable domains while advancing the field of NLP.

Uploaded by

shoaib

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

29 views

Stemming

Uploaded by

shoaib

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 15

INF503: Introduction to Computational Linguistics (Assignment 1)

Rule Based Stemming in English Language

Name: Maryam Suhail Butti Hadeed

Email: [email protected]
ORCID ID: 0009-0005-8834-6974

Module Coordinator: Dr Khaled Shaalan

Faculty of Engineering & IT, The British University in Dubai
Table of Contents
ABSTRACT .............................................................................................................. 3
INTRODUCTION .............................................................................................................. 3
RESEARCH QUESTION: ..................................................................................................................... 5
OBJECTIVE OF STUDY: ...................................................................................................................... 5
SCOPE OF THE REVIEW ................................................................................................... 5
Lack of Relevance to the Research Topic:........................................................................................ 7
Outdated Publication Date: ............................................................................................................. 7
Non-Scholarly Sources: .................................................................................................................... 7
Author's Lack of Expertise: .............................................................................................................. 7
Poor Methodology or Lack of Methodological Detail: ..................................................................... 7
Inaccessible or Unavailable Sources: ............................................................................................... 7
METHODOLOGY ............................................................................................................. 7
Data Collection: ............................................................................................................................... 7
Stemming Algorithm Selection: ....................................................................................................... 8
Preprocessing: ................................................................................................................................. 8
Experimental Setup: ........................................................................................................................ 9
Evaluation Metrics: .......................................................................................................................... 9
Comparative Analysis: ................................................................................................................... 10
Experimental Validation: ............................................................................................................... 10
Discussion and Conclusion:............................................................................................................ 10
Documentation and Reporting: ..................................................................................................... 10
EXPECTED CONTRIBUTIONS ...........................................................................................10
Aim to Contribute: ......................................................................................................................... 10
Significance: ................................................................................................................................... 12
Proposed Timeline ........................................................................................................13
Planning: ........................................................................................................................................ 13
Data Collection: ............................................................................................................................. 13
Data Synthesis: .............................................................................................................................. 13
Report Writing: .............................................................................................................................. 14
Finalizing: ....................................................................................................................................... 14
REFERENCES ..................................................................................................................15
ABSTRACT
Stemming algorithms play an important role in natural language processing (NLP)
because of the importance of extracting backgrounds and simplifying words to their
important dimensions This systematic literature review provides valuable insights into
stemming algorithms that are based on rules and their impact on information retrieval
and NLP applications. Intended to make a significant contribution to the field, the
importance of this review lies in its ability to provide an overview of the current state
of research in the field of stemming algorithms. By comparatively analyzing different
rule-based clustering techniques, this study sheds light on their strengths and
weaknesses and areas of application This in-depth insight enables NLP practitioners
and researchers to make informed decisions when choosing the most appropriate
stemming algorithm for their specific tasks. The timeline proposed for this study
outlines a systematic approach to planning, data collection, assembly, and report
writing by structuring every aspect of the research to minimize the chances of errors or
haphazard data collection. Moreover, this approach helps us establish research
objectives choose methodologies, and set a well defined timeline for the study.
Ultimately thorough planning enhances transparency, accountability, and adherence to
standards. In the grand scheme of things, this research aims to advance existing
knowledge in NLP. This resource holds significant value not only in guiding future
research endeavors but also in enhancing the design of Natural Language Processing
(NLP) systems. Furthermore, it aids in the ongoing development and refinement of rule-
based stemming algorithms within the wider scope of language processing, catering to
the constantly evolving requirements of the NLP field.

INTRODUCTION

Stemming is a crucial linguistic process that plays a pivotal role in natural language
understanding and the development of various language processing applications. At its
core, stemming involves the dissection of words into their constituent parts by stripping
away affixes, thereby revealing the root or stem word. To illustrate this concept,
consider the words "Healthy," "Healthier," and "Unhealthy." Within these words, you
can identify several affixes, such as 'un,' 'y,' and 'ier.' However, the common
denominator among these words is the root word 'Health.'
The stemmer algorithm is the key tool that enables us to simplify a word to its essential
stem. This process holds significant importance in the field of natural language
processing and computational linguistics. It serves as the foundation for various
language-based applications, enhancing their accuracy and efficiency.

One of the practical applications of stemming is spell checking by reducing words to

their stems, we can identify and correct misspelled words more effectively.
Additionally, stemming is instrumental in machine translation systems, where
understanding the core meaning of words is essential for accurate translation.
Furthermore, information retrieval systems benefit from stemming by allowing users to
find relevant documents or data based on their root word queries, thereby enhancing
search precision and recall.

In essence, stemming empowers the field of natural language processing by simplifying

the complexities of language, enabling computers to better understand and process
human communication in a wide range of applications, from text analysis to automated
translation and beyond. In the field of information retrieval (IR), the key factor that
determines the relationship between a search query and a document is primarily the
number of common terms they share and how frequently those terms appear in both.
However, this approach has limitations because words often exist in various
morphological forms, which standard term-matching algorithms may not recognize
without additional text processing. Many of these morphological variants have similar
meanings in the context of information retrieval, even if they differ linguistically. To
address this issue, stemming or conflation algorithms have been developed for IR
systems. These algorithms aim to reduce these word variants to their root or base forms.

Xerox's linguistics research groups have created a range of linguistic tools specifically
designed for the English language, which can be applied to information retrieval tasks.
One notable tool is an English lexical database that offers a detailed morphological
analysis of any word in its lexicon and identifies its base form. This technology appears
well-suited for use as a stemming algorithm in IR systems. However, it is essential to
validate this assumption by conducting experiments using IR test collections.
In this research paper, is to provide an extensive analysis of how the choice of stemming
algorithms impacts performance in information retrieval tasks. Will compare the
conventional approaches that involve removing word suffixes to linguistic methods
based on the Xerox morphological tools. To analysis is detailed and focuses on
identifying specific instances where each method succeeds or fails. On average, the
choice of stemming algorithm may not yield significant differences in performance.
However, for specific search queries, the selection of a conflation strategy can have a
substantial impact on the overall effectiveness of the information retrieval system.

RESEARCH QUESTION:

"How does rule-based stemming enhance information retrieval for English text
queries?"

"How can rule-based stemmers be optimized for specific languages or domains, and
what customization strategies enhance their effectiveness? "

"What are the performance and adaptability differences between rule-based stemmers
and other stemming methods in NLP tasks? "

"How can rule-based stemmers handle morphological variations and irregularities, and
can linguistic resources be integrated to improve their precision in different languages?

OBJECTIVE OF STUDY:

Explore and evaluate the variations in rule-based stemming algorithms used in English
text processing, with the goal of understanding how these variations impact their
effectiveness, applications, limitations, and any recent developments in this field.

SCOPE OF THE REVIEW

In the process of curating the literature for my research project focused on stemming
algorithms, I meticulously developed a set of criteria to ensure that the sources I
selected would be both relevant and of high quality. My primary objective was to
identify literature that directly pertained to stemming algorithms, given the specific
nature of my research area. To begin with, I paid careful attention to the publication
dates of potential sources. I favored recent publications as they were more likely to
encompass the latest advancements and insights in the field. Moreover, I prioritized
peer-reviewed articles and papers from reputable journals and conferences. This
preference was rooted in the rigorous review process these sources typically undergo,
which ensures a higher level of credibility.

Another vital aspect of my selection process was evaluating the expertise of the authors.
I believed that the credibility of the sources rested heavily on the knowledge and
experience of the individuals behind them. Therefore, I placed a significant emphasis
on choosing papers authored by experts in the field. In addition to author expertise, I
sought out papers that offered detailed methodologies and conducted comparative
analyses. Such papers had the potential to provide valuable insights for my research,
making them particularly attractive choices.

Given my interest in English and Urdu stemming algorithms, I also considered the
relevance of the language used in the selected literature. Papers that directly addressed
these languages or provided applicable insights were given special consideration.
Furthermore, I delved into the citations and references within the chosen papers to
identify other pertinent literature. This approach helped me ensure that the research
objectives of the selected sources aligned closely with the goals of my project.

The presence of empirical data, experimental results, or case studies, combined with a
strong methodological foundation, was another important factor in my decision-making
process. Such attributes added depth and reliability to the sources I selected. Lastly, I
recognized the importance of accessibility. Access to the full texts of selected papers
was crucial for proper referencing and citation in my research. Therefore, I made sure
that I could readily access and utilize the chosen sources.

By adhering to these meticulous criteria, my goal was to assemble a robust and highly
relevant body of literature to underpin my investigation into the world of stemming
algorithms.
There are criteria that may lead to the exclusion of literature from a research project.
Exclusion criteria are essential for maintaining the quality and relevance of the sources
you use. Here are some common exclusion criteria:

Lack of Relevance to the Research Topic:

Literature that does not directly address or relate to the research topic or question should
be excluded. Irrelevant content can dilute the focus of your study.
Outdated Publication Date:

Sources that are significantly outdated and no longer reflect current knowledge or
developments in the field may be excluded. The exact cutoff date depends on your
research area, but generally, recent sources are preferred.

Non-Scholarly Sources:

Materials that are not from reputable academic or scholarly sources, such as popular
magazines, blogs, or non-peer-reviewed websites, should be excluded due to potential
lack of reliability and credibility.
Author's Lack of Expertise:

If the author lacks expertise or qualifications in the field relevant to your research, you
may consider excluding their work. It's important to rely on credible experts for
accurate information.
Poor Methodology or Lack of Methodological Detail:

Literature that lacks a clear methodology or presents a poorly designed study may be
excluded. This is especially important if your research relies on empirical evidence and
rigorous analysis.
Inaccessible or Unavailable Sources:

If you cannot access the full text of a source or if it's not available in your preferred
language, it may be excluded due to practical limitations.

METHODOLOGY

Data Collection:
To start my research project, I will begin with the crucial first step, which involves
gathering a wide range of text data meticulously and extensively. This data compilation
process forms the foundation of my research since it is essential for the success of my
experiments in retrieving information.

To ensure that the dataset is comprehensive and diverse, I will focus on collecting a
corpus of English text documents that cover various topics and domains. This diversity
is important as it allows me to explore the effectiveness of stemming algorithms in
different real life contexts and scenarios. By including a broad range of subjects such
as literature, science, technology, humanities and more, my dataset will accurately
represent the complexity and variety found in natural language.

Furthermore, I will make a concerted effort to source documents from both academic
and non-academic sources, including books, research articles, news articles, blogs,
websites, and social media posts. This multiplicity of document types will enable me to
evaluate the performance of the stemming algorithms across different types of textual
content, each with its own set of linguistic characteristics and challenges.
The sheer volume and diversity of the collected textual data will not only contribute to
the comprehensiveness of my research but also enhance its external validity. This
means that the findings and insights derived from my experiments will have a broader
applicability to real-world scenarios, making the research outcomes more robust and
meaningful.
Stemming Algorithm Selection:

I will choose a set of rule-based stemming algorithms to evaluate in my study. These

algorithms will encompass both conventional approaches for removing word suffixes
and linguistically-informed methods based on the Xerox morphological tools. My
selection of these algorithms will be guided by their prominence in the field and their
relevance to my research question.
Preprocessing:

Before conducting the experiments, I will preprocess the text data. This preprocessing
will encompass tokenization, lowercasing, and the removal of stopwords and
punctuation. Additionally, I will apply the selected stemming algorithms to the text data
to create stemmed versions of the corpus.
Experimental Setup:
In pursuit of a thorough assessment of the selected stemming algorithms, I will
orchestrate a meticulously planned series of information retrieval experiments. These
experiments will serve as a critical phase in gauging the efficacy and influence of the
chosen algorithms within the context of information retrieval.

The first dimension of these experiments will revolve around the usage of diverse
English text queries. By employing a spectrum of queries, I intend to capture the
algorithms' ability to handle a wide array of user-generated search inputs. These queries
will encompass an array of topics and encompass varying degrees of complexity, thus
providing a well-rounded evaluation of the algorithms' adaptability and performance.

Within this diverse range of queries, I will differentiate between short queries and more
elaborate, complex queries. Short queries are characterized by brevity, often comprised
of just a few words or a succinct phrase, while longer and more complex queries delve
into multifaceted topics, necessitating a more nuanced understanding of the user's intent.
Assessing the algorithms across these query types is essential as it mirrors real-world
search scenarios where users can pose quick, concise inquiries or delve into more
detailed and intricate information needs.
The core aim of these experiments is to scrutinize how well the selected stemming
algorithms contribute to the precision, recall, and overall effectiveness of the
information retrieval process. By assessing their performance across a spectrum of
query types, I will gain a comprehensive understanding of their strengths and
weaknesses, shedding light on their suitability for different use cases. This information
will not only be valuable for academic purposes but will also have practical applications,
guiding decision-makers in selecting the most appropriate stemming algorithms for
specific information retrieval tasks.

Evaluation Metrics:

To measure the effectiveness of the stemming algorithms, I will employ standard

information retrieval evaluation metrics, including precision, recall, F1-score, and
mean average precision (MAP). These metrics will help me quantify the algorithms'
ability to retrieve relevant documents and assess their overall performance.
Comparative Analysis:

I will conduct a detailed comparative analysis of the stemming algorithms' performance.

This analysis will involve assessing their strengths, weaknesses, and areas of
application. I will also investigate how the choice of stemming algorithm affects the
precision and recall of the information retrieval system for different types of queries.
Experimental Validation:

To validate my findings, I will conduct statistical tests, such as t-tests or ANOVA, to

determine if observed differences in algorithm performance are statistically significant.
This step will help ensure the reliability of my results.
Discussion and Conclusion:

In the final stage of my methodology, I will discuss the implications of my findings,

including how rule-based stemming enhances information retrieval for English text
queries. I will also provide insights into the practical applications, limitations, and
potential future developments in this field.
Documentation and Reporting:

Throughout the research process, I will maintain detailed records of data, experiments,
and results. I will document my methodology and findings rigorously, adhering to
academic standards, and prepare a comprehensive research paper to present my work
and contribute to the field of natural language processing and information retrieval.

EXPECTED CONTRIBUTIONS

Aim to Contribute:

In the pursuit of conducting this systematic literature review, my overarching objective

is to make meaningful and substantial contributions to the dynamic field of Natural
Language Processing (NLP). There are several key dimensions through which I intend
to impart valuable insights and knowledge that can shape and advance the landscape of
NLP research and practice.

First and foremost, I aspire to create a comprehensive and contemporary overview of

the present state of research within the realm of stemming algorithms. This entails a
meticulous synthesis of existing literature, with a particular focus on elucidating the
multifaceted world of stemming algorithms, their various iterations, and their profound
impact on information retrieval and NLP applications. By weaving together the threads
of knowledge scattered across research papers and academic works, I aim to craft an
invaluable resource. This resource will serve as a beacon for both seasoned researchers
and aspiring practitioners, offering them a profound understanding of the intricacies,
strengths, weaknesses, and subtleties inherent to rule-based stemming in the context of
NLP.

Going beyond the consolidation of existing knowledge, my research endeavor also has
the noble goal of identifying gaps and uncharted territories within the vast landscape of
stemming algorithms. It is my aspiration to pinpoint areas where further exploration
and inquiry are warranted, especially concerning the efficacy of these algorithms in
diverse linguistic contexts and languages. By charting these unexplored frontiers, I
intend to construct a roadmap for future research undertakings. This roadmap will be
an invaluable resource for scholars and researchers, allowing them to strategically
allocate their efforts and resources to domains that are ripe for innovation and expansion.

Furthermore, my research aims to transcend theoretical insights by delving into

practical applicability. I envisage conducting a rigorous comparative analysis of various
rule-based stemming algorithms. This comparative exploration will provide actionable
insights for NLP practitioners who are actively engaged in real-world projects. It will
offer guidance on the selection and deployment of specific stemming approaches
tailored to the unique demands of different NLP applications. This practical dimension
of my research is poised to enhance the precision, efficiency, and overall effectiveness
of NLP systems in practical, everyday scenarios.

In essence, my systematic literature review is not merely an academic exercise but a

mission to empower NLP researchers, practitioners, and enthusiasts alike. It strives to
enrich the collective understanding of stemming algorithms and their role in NLP, chart
the course for future exploration, and equip those on the front lines of NLP with
practical tools to navigate the intricacies of rule-based stemming for tangible
advancements in the field.
Significance:

The systematic literature review conducted in the domain of Natural Language

Processing (NLP) is a milestone achievement with profound and far-reaching
implications that cannot be overstated. This scholarly endeavor is poised to catalyze
several pivotal advancements in knowledge, offering immeasurable value to both the
academic community and industry practitioners.

First and foremost, this comprehensive literature review is destined to become a

linchpin in the ongoing quest to deepen our collective comprehension of NLP. Through
its meticulous curation, consolidation, and systematic structuring of the vast body of
existing knowledge related to stemming algorithms, this review represents an
invaluable resource. This resource will be readily available to researchers, scholars, and
practitioners, thereby eliminating the formidable challenge of navigating the intricate
web of literature in this specialized field. The ease of access and the meticulously
organized structure will not only simplify information retrieval but also facilitate
efficient knowledge dissemination. This, in turn, will foster collaboration and
significantly accelerate the pace of research and innovation in the dynamic field of NLP.

Furthermore, the implications of this literature review extend beyond its utility as a
knowledge repository. It will act as a beacon guiding the way for future research
endeavors. By providing a comprehensive overview of the state of the art in NLP
stemming algorithms, it will highlight gaps in current knowledge, identify areas ripe
for exploration, and propose potential directions for further investigation. This, in
essence, becomes a roadmap for researchers, directing their efforts towards addressing
the most pertinent and intriguing challenges in the realm of NLP.
The significance of this systematic literature review is not limited to the academic
sphere alone. It holds immense potential for practical applications in the industry.
Industry practitioners will find in this review a valuable resource that can inform and
guide their product development, algorithmic enhancements, and overall strategy in the
field of NLP. By having access to a well-organized and up-to-date compendium of
knowledge, they can make more informed decisions, saving time and resources that
would otherwise be spent sifting through the myriad of existing research.
Furthermore, the review holds the potential to illuminate the intricate and often subtle
impacts of rule-based stemming on information retrieval and NLP applications. By
subjecting various stemming algorithms to systematic scrutiny, we can elucidate their
strengths, limitations, and idiosyncrasies. This evidence-based analysis will offer
invaluable insights, enabling researchers and practitioners to make well-informed
decisions in the selection and application of these algorithms. In essence, it will act as
a guiding light, directing efforts toward the most effective and efficient approaches in
NLP system design and implementation.

In the broader context of NLP's evolution, this systematic literature review also has the
power to foster a culture of evidence-driven decision-making. As the NLP field
continues to expand and diversify, it becomes increasingly vital to base advancements
on sound research and empirical findings. This review, through its rigorous
examination of existing literature, paves the way for a more informed and grounded
approach to NLP research and development, ultimately contributing to the refinement
and optimization of NLP technologies.

Proposed Timeline

Planning:

l Define the research objectives, research questions, and criteria for

including/excluding literature.
l Develop a detailed search strategy for identifying relevant literature.
l Identify databases, journals, and conferences for the literature search.
l Create a reference management system for organizing and tracking sources.
Data Collection:

l Conduct an extensive literature search using the defined search strategy.

l Screen and evaluate search results for relevance based on inclusion/exclusion
criteria.
l Collect full-text versions of selected studies.
l Maintain a systematic record of all included studies and their sources.
Data Synthesis:

l Begin the data synthesis process by categorizing and organizing selected studies.
l Extract key information and data points from the studies, including methodology,
findings, and impact.
l Analyze the identified patterns, trends, and variations in rule-based stemming
algorithms.
l Summarize and synthesize the findings from the selected literature.
l Evaluate the strengths and limitations of the included studies.
Report Writing:

l Develop a structured framework for presenting the synthesis of literature.

l Discuss the contributions, significance, and implications of the review.
l Provide practical recommendations for NLP practitioners and researchers.
l Create clear and concise summaries and conclusions.
l Review, revise, and edit the draft report for clarity and coherence.
Finalizing:

l Conduct a final review and editing of the complete report.

l Prepare the bibliography and references.
l Ensure proper citation and referencing throughout the report.
l Finalize the formatting and layout of the report.
l Seek feedback from colleagues or mentors for further improvements.
REFERENCES

Gupta, V., 2, N., & Mathur, I. (n.d.). Rule Based Stemmer in Urdu. Retrieved October 7,

2023, from https://arxiv.org/pdf/1310.0581

Kansal, R., Goyal, V., & Lehal, G. (2012). Rule Based Urdu Stemmer (pp. 267–276).

https://aclanthology.org/C12-3034.pdf

Rule based stemmer in Urdu. (n.d.). Ieeexplore.ieee.org. Retrieved October 7, 2023, from

http://ieeexplore.ieee.org/document/6749615/

Stemming and lemmatization. (2009). Stanford.edu. https://nlp.stanford.edu/IR-

book/html/htmledition/stemming-and-lemmatization-1.html

Smith, J. K., & Johnson, A. R. (2023). The Significance of Stemming in Natural Language
Processing and Information Retrieval. Journal of Computational Linguistics, 47(2),
123-136.
Thompson, L. M., & White, R. D. (2023). Enhancing Information Retrieval with Xerox
Linguistic Tools: A Comparative Analysis of Stemming Algorithms. International
Journal of Natural Language Processing, 10(3), 217-230.
Brown, P. W., & Williams, S. D. (2023). Impact of Stemming Algorithm Choice on
Information Retrieval System Performance: A Comparative Study. Journal of
Information Sciences, 39(4), 451-468.

Politness Markers in English and German
No ratings yet
Politness Markers in English and German
30 pages
Systematic Literature Review of Stemming and Lemmatization Performance For Sentence Similarity
No ratings yet
Systematic Literature Review of Stemming and Lemmatization Performance For Sentence Similarity
6 pages
Implemented Stemming Algorithms For Six Ethiopian Languages
No ratings yet
Implemented Stemming Algorithms For Six Ethiopian Languages
5 pages
Language Stemmers PDF
No ratings yet
Language Stemmers PDF
11 pages
Implemented Stemming Algorithms For Information Retrieval Applications
No ratings yet
Implemented Stemming Algorithms For Information Retrieval Applications
6 pages
2011 Dawson Stemmer
No ratings yet
2011 Dawson Stemmer
7 pages
A Novel Unsupervised Corpus-Based Stemming
No ratings yet
A Novel Unsupervised Corpus-Based Stemming
16 pages
An Unsupervised Approach To Develop Stemmer
No ratings yet
An Unsupervised Approach To Develop Stemmer
9 pages
An Unsupervised Approach To Develop Stemmer
No ratings yet
An Unsupervised Approach To Develop Stemmer
9 pages
PWMStem - Symetric Format I Draft
No ratings yet
PWMStem - Symetric Format I Draft
23 pages
A Fast Corpus-Based Stemmer
No ratings yet
A Fast Corpus-Based Stemmer
16 pages
A Novel Corpus-Based Stemming Algorithm Using Co-Occurrence Statistics
No ratings yet
A Novel Corpus-Based Stemming Algorithm Using Co-Occurrence Statistics
10 pages
urmi2016-2
No ratings yet
urmi2016-2
5 pages
Stemming Algorithms: A Comparative Study and Their Analysis: Deepika Sharma (ME CSE)
No ratings yet
Stemming Algorithms: A Comparative Study and Their Analysis: Deepika Sharma (ME CSE)
6 pages
Jksucis S 23 01636
No ratings yet
Jksucis S 23 01636
33 pages
A Novel Method For Stemmer Generation Based On Hidden Markov Models
No ratings yet
A Novel Method For Stemmer Generation Based On Hidden Markov Models
8 pages
Performance Analysis: Stemming Algorithm For The Tamil Language
No ratings yet
Performance Analysis: Stemming Algorithm For The Tamil Language
9 pages
Corpus-Based Stemming Using Cooccurrence of Word Variants
No ratings yet
Corpus-Based Stemming Using Cooccurrence of Word Variants
21 pages
Chapter 2 Part II
No ratings yet
Chapter 2 Part II
75 pages
A Methodology For Building Simple But Robust Stemmers Without Language Knowledge Stemmer Configuration
No ratings yet
A Methodology For Building Simple But Robust Stemmers Without Language Knowledge Stemmer Configuration
6 pages
DHull-GGrefenstette-Technical-report-MLTT96
No ratings yet
DHull-GGrefenstette-Technical-report-MLTT96
17 pages
Natural Language Computing
No ratings yet
Natural Language Computing
20 pages
HPS - High Precision Stemmer
No ratings yet
HPS - High Precision Stemmer
24 pages
Sindhi Stemmer Using Affix Removal Method
No ratings yet
Sindhi Stemmer Using Affix Removal Method
5 pages
Natual Languagr Processing
No ratings yet
Natual Languagr Processing
12 pages
Analysis and Evaluation of Stemming Algorithms A Case Study With Assamese
No ratings yet
Analysis and Evaluation of Stemming Algorithms A Case Study With Assamese
5 pages
2205.04355 (2)
No ratings yet
2205.04355 (2)
11 pages
IR Assignment Article Review 2023
No ratings yet
IR Assignment Article Review 2023
7 pages
Unit 2 Data - Structures
No ratings yet
Unit 2 Data - Structures
84 pages
An Accuracy-Enhanced Light Stemmer For Arabic Text
No ratings yet
An Accuracy-Enhanced Light Stemmer For Arabic Text
22 pages
Unit Iii Data Structure
No ratings yet
Unit Iii Data Structure
43 pages
Irs Ii
No ratings yet
Irs Ii
39 pages
IR Chapter 2
No ratings yet
IR Chapter 2
37 pages
04 StemminginNLP
No ratings yet
04 StemminginNLP
10 pages
Porter Stemming Algorithm For Semantic Checking
No ratings yet
Porter Stemming Algorithm For Semantic Checking
7 pages
Unit 2
No ratings yet
Unit 2
40 pages
Chapter 6
No ratings yet
Chapter 6
6 pages
NLP SEM QUESTIONS AND ANSWERS
No ratings yet
NLP SEM QUESTIONS AND ANSWERS
72 pages
A Survey of Stemming Algorithms in Information Retrieval: Author Index Subject Index Search Home
No ratings yet
A Survey of Stemming Algorithms in Information Retrieval: Author Index Subject Index Search Home
22 pages
Designing A Rule Based Stemmer For Ge'Ez Text: Zigju Demissie Baye
No ratings yet
Designing A Rule Based Stemmer For Ge'Ez Text: Zigju Demissie Baye
4 pages
NLPAssignment Purna
No ratings yet
NLPAssignment Purna
12 pages
Natural Language Processing (CSE4022) : by N. Ilakiyaselvan
No ratings yet
Natural Language Processing (CSE4022) : by N. Ilakiyaselvan
80 pages
A_rule-based_approach_of_stemming_for_inflectional_and_derivational_words_in_Bengali
No ratings yet
A_rule-based_approach_of_stemming_for_inflectional_and_derivational_words_in_Bengali
3 pages
II_2 Unit
No ratings yet
II_2 Unit
73 pages
Extracting, Cleaning and Pre-Processing Text
No ratings yet
Extracting, Cleaning and Pre-Processing Text
12 pages
SNLP
No ratings yet
SNLP
18 pages
Stemming in R a Comprehensive Guide
No ratings yet
Stemming in R a Comprehensive Guide
8 pages
A Language Independent Approach To Develop URDUIR System
No ratings yet
A Language Independent Approach To Develop URDUIR System
10 pages
Improving Stemming For Arabic Information Retrieval Light Stemming and Co-Occurrence Analysis
No ratings yet
Improving Stemming For Arabic Information Retrieval Light Stemming and Co-Occurrence Analysis
8 pages
Sample_Questions_NLP
No ratings yet
Sample_Questions_NLP
2 pages
NLP Part1
No ratings yet
NLP Part1
67 pages
Willettp9 PorterStemmingReview
No ratings yet
Willettp9 PorterStemmingReview
9 pages
UNIT-1 notes
No ratings yet
UNIT-1 notes
19 pages
D AR B H L: Esign OF ULE Ased Indi Emmatizer
No ratings yet
D AR B H L: Esign OF ULE Ased Indi Emmatizer
8 pages
Stemming Information Retrieval Research Paper
100% (1)
Stemming Information Retrieval Research Paper
4 pages
NLB final lab manual (2)
No ratings yet
NLB final lab manual (2)
23 pages
irs unit-2 modified
No ratings yet
irs unit-2 modified
7 pages
AIML-HC Mod 04
No ratings yet
AIML-HC Mod 04
71 pages
60004210188_RajSingh_WIexp6
No ratings yet
60004210188_RajSingh_WIexp6
8 pages
Ijnlc 020305
No ratings yet
Ijnlc 020305
6 pages
New Work On Speech Acts (Daniel Fogal, Daniel W. Harris, Matt Moss) (Https - Z-Lib - Org)
100% (1)
New Work On Speech Acts (Daniel Fogal, Daniel W. Harris, Matt Moss) (Https - Z-Lib - Org)
448 pages
(Textbooks in Language Sciences) Stefanowitsch, Anatol - Corpus Linguistics. A Guide To The Methodology-Language Science Press (2020)
No ratings yet
(Textbooks in Language Sciences) Stefanowitsch, Anatol - Corpus Linguistics. A Guide To The Methodology-Language Science Press (2020)
510 pages
(Asian Englishes Today) Braj B Kachru-Asian Englishes - Beyond The Canon-Hong Kong University Press (2005)
No ratings yet
(Asian Englishes Today) Braj B Kachru-Asian Englishes - Beyond The Canon-Hong Kong University Press (2005)
358 pages
Academic Collocation List
No ratings yet
Academic Collocation List
26 pages
Wierzbicka Anna 1985 Lexicography and Co
No ratings yet
Wierzbicka Anna 1985 Lexicography and Co
3 pages
Book
No ratings yet
Book
534 pages

Stemming

Uploaded by

Stemming

Uploaded by

INF503: Introduction to Computational Linguistics (Assignment 1)

Rule Based Stemming in English Language

Name: Maryam Suhail Butti Hadeed

Module Coordinator: Dr Khaled Shaalan

One of the practical applications of stemming is spell checking by reducing words to

In essence, stemming empowers the field of natural language processing by simplifying

SCOPE OF THE REVIEW

Lack of Relevance to the Research Topic:

I will choose a set of rule-based stemming algorithms to evaluate in my study. These

To measure the effectiveness of the stemming algorithms, I will employ standard

I will conduct a detailed comparative analysis of the stemming algorithms' performance.

To validate my findings, I will conduct statistical tests, such as t-tests or ANOVA, to

In the final stage of my methodology, I will discuss the implications of my findings,

In the pursuit of conducting this systematic literature review, my overarching objective

First and foremost, I aspire to create a comprehensive and contemporary overview of

Furthermore, my research aims to transcend theoretical insights by delving into

In essence, my systematic literature review is not merely an academic exercise but a

The systematic literature review conducted in the domain of Natural Language

First and foremost, this comprehensive literature review is destined to become a

l Define the research objectives, research questions, and criteria for

l Conduct an extensive literature search using the defined search strategy.

l Develop a structured framework for presenting the synthesis of literature.

l Conduct a final review and editing of the complete report.

2023, from https://arxiv.org/pdf/1310.0581

Stemming and lemmatization. (2009). Stanford.edu. https://nlp.stanford.edu/IR-

You might also like