A Handbook of Computational Linguistics: Artificial Intelligence in Natural Language Processing
By Youddha Beer Singh (Editor), Aditya Dev Mishra (Editor) and Pushpa Singh (Editor)
About this ebook
This handbook provides a comprehensive understanding of computational linguistics, focusing on the integration of deep learning in natural language processing (NLP). 18 edited chapters cover the state-of-the-art theoretical and experimental research on NLP, offering insights into advanced models and recent applications.
Highlights:
- Foundations of NLP: Provides an in-depth study of natural language processing, including basics, challenges, and applications.
- Advanced NLP Techniques: Explores recent advancements in text summarization, machine translation, and deep learning applications in NLP.
- Practical Applications: Demonstrates use cases on text identification from hazy images, speech-to-sign language translation, and word sense disambiguation using deep learning.
- Future Directions: Includes discussions on the future of NLP, including transfer learning, beyond syntax and semantics, and emerging challenges.
Key Features:
- Comprehensive coverage of NLP and deep learning integration.
- Practical insights into real-world applications
- Detailed exploration of recent research and advancements through 16 easy-to-read chapters
- References and notes on experimental methods used for advanced readers
Ideal for researchers, students, and professionals, this book equips readers with a thorough grounding in computational linguistics and an understanding of how computational techniques are applied to text, language, and speech.
Readership
Researchers, students, and professionals in computer science and related fields (AI, ML, NLP and computational linguistics).
Book preview
A Handbook of Computational Linguistics - Youddha Beer Singh
A Comprehensive Study of Natural Language Processing
Rohit Vashisht¹, *, Sonia Deshmukh¹, Ambrish Gangal¹, Garima Singh¹
¹ KIET Group of Institutions, Ghaziabad, India
Abstract
Natural Language Processing (NLP) has received a great deal of interest in the current era of digitization due to its capacity to computationally represent and analyze human language. Machine translation, email spam recognition, information mining and summarization, as well as medical applications and question answering, are just a few of the many tasks it is used for today. This article outlines the development of NLP from the 1950s to 2023, along with the key outcomes of each period. In addition, the fundamental working components of NLP are used to show the analogy between the processing done by the human brain and NLP. Major NLP applications have been explored with examples. Last but not least, significant challenges and possible future directions in the field have been highlighted.
Keywords: Lemmatization, Language processing, Machine learning, Stemming, Tokenization.
* Corresponding author Rohit Vashisht: KIET Group of Institutions, Ghaziabad, India; E-mail: [email protected]
1. INTRODUCTION
NLP stands for Natural Language Processing, a sub-branch of Artificial Intelligence (AI) that itself grew out of computer science. It focuses on combining computational language understanding with statistical, machine learning, and deep learning models so that computers can grasp spoken and written language much as humans do [1].
With roots in linguistics, NLP has been around for more than 50 years. It has several useful applications in a variety of fields, such as corporate intelligence, search engine optimization, and medical research. Due to NLP, computers can now understand natural speech much like humans. Whether the given input is speech or written text, NLP employs AI to process and interpret real-world input so that a computer can understand it [2]. NLP underpins all computer programs that translate text between languages, respond to spoken requests, and summarize enormous quantities of text quickly, even in real time. Voice-activated GPS systems, virtual assistants, speech-to-text software, chatbots for customer support, and other consumer conveniences are all examples of how NLP is employed in daily life [3]. But NLP is also increasingly being used in enterprise applications as a way to streamline mission-critical business processes, improve employee productivity, and optimize business operations. Fig. (1) shows the different subfields of computer science as well as the emergence of NLP as one of the uses of AI.
Fig. (1). Various Branches of Computer Science.
As shown in Fig. (2), NLP results from the confluence of three pillars: computer science, computational linguistics, and machine learning models [4].
Fig. (2). Three Pillars of NLP.
There are two main functional parts of NLP. The first is Natural Language Understanding (NLU), which converts the provided input into useful representations and examines its various linguistic aspects. The second, Natural Language Generation (NLG), uses text planning, sentence structuring, and text realization to produce meaningful phrases and sentences in natural language from some internal representation. In general, NLU is much more difficult than NLG [5, 6].
The structure of the chapter is outlined as follows: Section 2 discusses the development of NLP from 1950 to 2023. Section 3 describes the working model of NLP and its various components. Sections 4 and 5 discuss the major applications and challenges in the field of NLP, respectively. Section 6 lists potential future paths and summarizes the overall conclusions.
2. Emergence of NLP
NLP is constantly changing and has a significant influence on our world. It started with rule-based techniques, which were simple and constrained. As data grew, it advanced to statistical learning, which was applied to basic question answering, predictive text, and other tasks. Fig. (3) shows the development of NLP over this time span and its key outcomes.
Bell Labs developed Audrey, the first speech recognition device, in 1952. It could recognize all ten digits, but it was abandoned because typing phone digits with a finger was quicker [7]. In 1962, IBM unveiled a shoebox-sized computer that could recognize 16 spoken words. The Turing Test, which Alan Turing devised to determine whether or not a computer is genuinely intelligent, also dates from this era; it uses the generation of natural English and automated interpretation as an intelligence criterion [8].
In 1971, DARPA funded the development of Harpy at Carnegie Mellon University, the first system to recognize over a thousand words [9]. Real-time speech recognition became feasible in the 1980s thanks to improvements in computing power, which accelerated the evolution of NLP. During this period, conceptual ontologies, grammatical theories, symbolic frameworks, and statistical models were created.
From the 2000s through the 2020s, advances in computing capacity have given NLP many practical applications. Modern approaches combine statistical techniques with traditional linguistics. As NLP has advanced, speech recognition algorithms have come to rely on deep neural networks. On a spectrogram, it is possible to see how various vowels or sounds have different frequencies. Speech synthesis gives computers the capacity to output speech, though these voices can sound choppy and artificial. This was very noticeable with the early hand-operated Bell Labs machine, but modern digital voices like Siri and Alexa have improved considerably. These days, chatbots are used for more than just client service; they are also competent in human resources and healthcare. NLP in healthcare can track therapies, examine reports, and review patient files. Routine chores are automated using a combination of cognitive analytics and NLP [10, 11].
Fig. (3). Evolution of NLP.
NLP plays an essential role in technology and in how people engage with it. It is utilized in a wide variety of real-world business and consumer applications, such as chatbots, cyber security, search engines, and big data analytics. Despite some difficulties, NLP is anticipated to remain a vital component of business and daily living.
3. Working Model of NLP
Computers can now comprehend natural conversation much as people do thanks to NLP. Just as humans have sensors such as ears to hear and eyes to see, computing machines have reading programs and microphones to gather input; and just as humans have a brain to process what they perceive, computers have programs to process their inputs. During processing, the data is eventually converted into computer-readable instructions. The clear analogy between computer analysis using NLP and human interpretation is shown in Fig. (4).
Fig. (4). The Analogy between Human Beings & Computational Machines.
Fig. (4) illustrates the two primary working stages of NLP: data pre-processing comes first, followed by algorithm development. Data pre-processing is the process of preparing and cleaning text data so that computers can analyze it; it emphasizes the text features that an algorithm can exploit. This can be accomplished in a number of ways, including:
Tokenization: In data security, tokenization refers to replacing sensitive data with distinctive identification symbols; in NLP, it refers to dividing text into manageable chunks. A text file or unstructured string is thereby transformed into a data structure that is suitable for ML. The resulting tokens can be used directly by a computer to trigger useful reactions and actions, or they can serve as features in an ML pipeline that drives more complex actions or judgements. Tokenization allows for the division of text into phrases, words, characters, and sub-words. Breaking written material into sentences is referred to as sentence or phrase tokenization; for terms or words, it is known as word tokenization. Textual content can be tokenized using a variety of tools, including NLTK, TextBlob, spaCy, Gensim, and Keras. Examples of phrase tokenization and word tokenization are shown in Figs. (5a) and (5b), respectively, and a short code sketch follows the figures.
Fig. (5a). Phrase Tokenization.
Fig. (5b). Term Tokenization.
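As a minimal illustration of both kinds of tokenization, the following sketch uses NLTK, one of the tools listed above; the sample sentence and the printed outputs are illustrative, and it assumes the NLTK package and its punkt tokenizer data are installed.
import nltk
from nltk.tokenize import sent_tokenize, word_tokenize
nltk.download('punkt')  # tokenizer models used by both functions
text = "NLP enables computers to process text. It also handles speech."
# Phrase (sentence) tokenization: split the text into sentences.
print(sent_tokenize(text))
# Word tokenization: split the text into individual terms and punctuation marks.
print(word_tokenize(text))
The first call prints the two sentences as separate strings, while the second prints a list of word tokens such as 'NLP', 'enables', and the final period.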
Elimination of Stop Words: Common terms are eliminated from the document corpus in this case, leaving only the special words that reveal the most about the subject matter. Pronouns, prepositions and articles are typically categorized as stop words. The Python-based NLTK (Natural Language Toolkit) contains a list of stop words that are available in 16 different languages [12]. Table 1 shows examples of sentences or phrases before and after the removal of stop words. Removal of stop words is done after the tokenization step.
Table 1 Phrases with and without Stop Words.
One can check the list of stop words by typing the following statements into the Python shell.
import nltk
from nltk.corpus import stopwords
nltk.download('stopwords')  # fetch the stop word lists if not already present
print(stopwords.words('english'))
The following code will eliminate all stop words from the text after word tokenization and return a collection of tokens (words) without stop words.
# assumes phrase_tokens holds the word tokens produced in the previous step
stop_words = set(stopwords.words('english'))
updated_sentence = []
for sw in phrase_tokens:
    if sw not in stop_words:
        updated_sentence.append(sw)
print(phrase_tokens)
print(updated_sentence)
Stemming & Lemmatization: At this pre-processing step, words are stripped down to their basic components. Lemmatization is the process of grouping the various inflections of a word into its base form with the same meaning. It is used both in compact indexing and in retrieval systems such as search engines. Stemming and lemmatization are closely linked; the distinction is that a stemmer processes a single word without taking its context into account and therefore cannot distinguish between words whose meaning varies with part of speech or context. Stemmers are frequently quicker and simpler to use, so for some applications the reduced accuracy may not matter. Stemming should be used if a big dataset is available and performance is a concern; instead, choose lemmatization if accuracy is crucial and the data set is not enormous [13, 14]. Different lemmatization instances are shown in Table 2, and a short code sketch follows this paragraph. A closer look reveals that the term stripes is lemmatized into the words strip and stripe depending on the context; lemmatization's default context is a noun.
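The following minimal sketch contrasts the two approaches with NLTK's PorterStemmer and WordNetLemmatizer; it assumes NLTK and its WordNet data are installed, and it uses the word "leaves" as an analogue of the stripes example in Table 2, since its lemma likewise changes with the assumed part of speech.
import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer
nltk.download('wordnet')  # lexical database used by the lemmatizer
stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()
# A stemmer truncates each word in isolation, ignoring context.
print(stemmer.stem("studies"))   # studi
print(stemmer.stem("running"))   # run
# The lemmatizer maps inflections to a dictionary form; its default context is a noun.
print(lemmatizer.lemmatize("leaves"))           # leaf (noun reading)
print(lemmatizer.lemmatize("leaves", pos="v"))  # leave (verb reading)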
Part-of-Speech Tagging: This is the labelling of words with their respective parts of speech, such as verbs, nouns and adjectives. It produces a list of ordered pairs (word, tag), where the tag denotes the word's category, such as noun (n), verb (v), adjective (a), or adverb (r). In NLTK this can be done with built-in taggers such as pos_tag, or with the DefaultTagger class. An example of this pre-processing step is presented here [15]. Table 3 lists the ordered pairs with the relevant Part of Speech (POS) categories that will be printed as output [16].
import nltk
from nltk import word_tokenize
nltk.download('punkt')  # tokenizer models
nltk.download('averaged_perceptron_tagger')  # POS tagger model
phrase = "He is going to college"
print(nltk.pos_tag(word_tokenize(phrase)))
In the second phase, an algorithm is created to handle the data after it has undergone the pre-processing step. Despite the wide variety of NLP algorithms, two major categories are frequently used: Rule Based Algorithms (RBA) and Machine Learning Based Algorithms (MLBA).
Table 2 Glimpse of Lemmatization.
Table 3 Part of Speech Tagging.
Rule Based Algorithms: In this framework, linguistic rules are carefully hand-crafted. This tactic has been used since the early days of NLP development. Without the rules, the algorithm cannot understand human language and cannot classify it; unfortunately, this means that accuracy depends entirely on the rules given to the model. Examples of rule-based methods in NLP include regular expressions and context-free grammars. The rules can be ordered, in which case the predicted class is the one corresponding to the highest-priority rule that fires [17]; alternatively, the rules can remain unordered, and each triggered rule casts a weighted vote for its class [18]. Consider the following illustration of an IF-THEN rule R for a binary classification problem.
R: IF (25 ≤ age ≤ 60) AND (salary ≥ 25000)
THEN issue_creditcard = YES
The IF portion of a rule is referred to as the rule antecedent or precondition, and the THEN part is referred to as the rule consequent or conclusion.
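Translated into code, rule R becomes a simple conditional. The sketch below is illustrative only; the function name and the literal thresholds come from the example rule above, not from any particular NLP library.
def issue_creditcard(age, salary):
    # Rule R: IF (25 <= age <= 60) AND (salary >= 25000) THEN issue_creditcard = YES
    if 25 <= age <= 60 and salary >= 25000:
        return "YES"
    return "NO"
print(issue_creditcard(age=30, salary=40000))  # YES: both preconditions hold
print(issue_creditcard(age=22, salary=40000))  # NO: the age precondition fails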
Machine Learning-Based Algorithms: ML systems are statistical. They are given training data that teaches them how to accomplish tasks, and their strategies are refined as more training data is analyzed. NLP algorithms use machine learning, deep learning, and neural networks to derive their own rules through processing and learning over time. To identify entities, sentiment, POS tags, and other textual components, ML for NLP and text analytics employs a number of statistical techniques [19]. These techniques may fall under Supervised Machine Learning (SML), in which a model is trained on labelled text and then applied to further text, or under Unsupervised Machine Learning (USML), a class of algorithms that processes enormous amounts of unlabelled data to extract meaning. ML and NLP have been coupled to enable a more effective data analytics approach, as seen in Fig. (6).
Fig. (6). ML-based NLP Model for Data Analytics.
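As a minimal sketch of the supervised (SML) route described above, the snippet below trains a tiny text classifier with scikit-learn, a library not named in this chapter and used here only for illustration; the four example reviews and their labels are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
# Hypothetical labelled corpus: text paired with a sentiment class.
texts = ["great product, works well",
         "terrible service, very slow",
         "really happy with the quality",
         "awful experience, would not recommend"]
labels = ["positive", "negative", "positive", "negative"]
# Vectorize the text with TF-IDF and fit a Naive Bayes classifier on the labels.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(texts, labels)
# Apply the trained model to new, unseen text.
print(model.predict(["the quality is great"]))  # expected: ['positive']
The same pattern scales to larger corpora; only the training data and, if needed, the choice of classifier change.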
4. Major Applications of NLP
NLP makes it possible for machines to comprehend information in much the same manner that humans perceive it. It helps a computer system interpret the true meaning of a dialogue and recognize the attitudes, tone, opinions, thoughts, and other elements that make it up. The following are some of the numerous NLP applications:
Sentiment Analysis:- The majority of discussions and texts carry emotion, arising from daily interactions, posted material, comments and product reviews. Understanding these feelings is just as crucial as comprehending the meaning of the words themselves. Humans are capable of deciphering the emotional undertones of written and spoken words; with the aid of NLP, computers can also interpret the sentiments of a document in addition to its true meaning (a short code sketch appears at the end of this list of applications).
Text Summarization:- We have a vast corpus of data in today's digital world, which has also broadened the scope of data processing. Processing data manually takes time and is prone to mistakes. NLP has a solution for this as well: in addition to summarizing a text's meaning, it can also decode its emotional significance, so the summarizing procedure is expedited and far less error-prone.
Search Engines:- We must continually navigate this challenging and puzzling world by gathering the necessary knowledge from the resources that are available. The internet is one of the most abundant sources of knowledge. By recognizing the exact meaning of words and the intent behind their creation, NLP aids search engines in understanding what is requested of them and provides us with the desired results.
Smart Assistance:- In the modern world, a new smart device is introduced every day, making the world increasingly smarter. Smart assistants like Siri, Alexa, and Cortana are now available, and when we speak to them as we would to another person, they respond in kind. NLP makes all of this feasible: by decomposing our language into its constituent elements, such as vocabulary, core stems, and other linguistic qualities, it helps the computer system process it.
Language Conversion:- The world's many cultural groups speak a great many languages, yet no one is fluent in all of them. To meet the demands of a globalized world, one needs to be able to work across several languages. NLP assists us by translating language together with all of its emotions.
Chatbots:- NLP equips chatbots with conversational skills, enabling more accurate consumer responses than simple one-word answers. Chatbots are also useful in situations where human labour is scarce or not constantly available. NLP-based chatbots also include emotional intelligence, which enables them to recognize and successfully address the emotional needs of their users.
In addition to the key applications above, NLP is employed in plagiarism detection, customer service automation, customer feedback analysis, analysis and segmentation of medical information, stock forecasting and trading insights, and word processing tasks in general.
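As a minimal sketch of the sentiment analysis application mentioned above, the snippet below uses NLTK's VADER analyzer; it assumes NLTK and the vader_lexicon data are installed, and the two example sentences are illustrative.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer
nltk.download('vader_lexicon')  # lexicon used by the VADER analyzer
sia = SentimentIntensityAnalyzer()
# polarity_scores returns negative, neutral, positive and compound scores for a text.
print(sia.polarity_scores("The product is excellent and arrived quickly."))
print(sia.polarity_scores("The support team was rude and unhelpful."))
The first sentence yields a positive compound score and the second a negative one, which is the document-level sentiment signal described above.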
5. NLP’s Prime Challenges
Although NLP is an effective instrument with many advantages, there are still a number of constraints and issues associated with it.
Ambiguity:- In NLP, sentences and phrases that have the potential for two or more meanings are referred to as ambiguous. If a term can be employed as a noun, adjective, or verb, it may be considered lexically ambiguous. Another possibility is semantic ambiguity, in which a statement may convey various interpretations depending on the reader's context [20].
Text or Speech Errors:- Text analysis may have problems due to misspelt or misused words. Although autocorrect and grammar checkers can correct trivial mistakes, they typically miss the writer's intended meaning [21].
Contextual Words/Phrases and Homonyms:- Homonyms are words that sound the same yet mean different things. For instance, it can be difficult for a computer to distinguish between the terms their and there.
Synonyms:- Synonyms may present issues analogous to those of contextual comprehension, since we use a range of words to convey the same notion. Moreover, different speakers use synonyms to denote slightly different meanings in their own vocabularies, even though some of these words may mean exactly the same thing and others vary only in register or complexity.
Sarcasm and Irony:- Irony and sarcasm frequently use words and phrases that, taken literally, are clearly positive or negative, which causes issues for ML models. Word embeddings are one technique used to get around this problem.
Precision:- Traditionally, computers have required people to speak to them using a programming language that is precise, unambiguous, and highly structured, or by using a small set of clearly pronounced voice commands. Human speech, however, is frequently imprecise; its grammatical structure can change depending on a wide range of complex circumstances, such as slang terms, regional dialects, and social context.
Tone of Voice & Inflection:- Depending on the word or syllable the speaker emphasizes, a sentence can take on a different meaning. When doing speech recognition, NLP algorithms may fail to pick up small but significant shifts in a speaker's tone of voice, and accents that affect tone and inflection make speech even harder for an algorithm to analyze [22].
Challenges of NLP are shown in Fig. (7). Even if NLP has its limitations, any organization can greatly profit from its vast advantages. Many of these limitations will be overcome in the upcoming years as new methods and technologies emerge on a daily basis.
Fig. (7). Challenges of NLP.
CONCLUSION AND FUTURE DIRECTIONS
Natural Language Processing, or NLP for short, is a subfield of computer science, linguistics, and artificial intelligence. Thanks to this technology, machines can comprehend, analyze, manipulate, and interpret human languages. It helps developers organize knowledge for tasks such as translation, automatic summarization, named entity recognition (NER), voice recognition, relation extraction, and topic segmentation. This chapter describes the development of NLP from the 1950s to the present in detail. Additionally, the working paradigm of NLP is illustrated with a clear comparison to human thought processes. Finally, the main uses and difficulties of NLP have been outlined.
One of the most promising future research directions is the evolution of NLP as a human-computer discussion rather than a human-computer interaction. One of the upcoming projects will involve fusing biometrics and NLP to produce more sophisticated smart gadgets. Last but not least, humanoid robots are one of the future advancements in the field of NLP.
References
Recent Advancements in Text Summarization with Natural Language Processing
Asha Rani Mishra¹, *, Payal Garg¹
¹ Department of Computer Science & Technology, G.L Bajaj Institute of Technology and Management, Greater Noida, India
Abstract
Computers can now comprehend and interpret human languages thanks to Natural Language Processing (NLP), a subfield of artificial intelligence. NLP is now being used in a variety of fields, including healthcare, banking, marketing, and entertainment. NLP is employed in the healthcare industry for activities like disease surveillance, medical coding, and clinical documentation. NLP may extract relevant data from patient data and clinical notes. Sentiment classification, fraud prevention, and risk management are three areas of finance where NLP is applied. It can identify trends in financial data, spot anomalies that can point to fraud, and examine news stories and social network feeds to learn more about consumer trends and market dynamics. NLP is utilized in marketing for chatbot development, sentiment analysis, and consumer feedback analysis. It can assist in determining the needs and preferences of the consumer, create tailored marketing campaigns, and offer chatbot-based customer care. Speech recognition, language translation, and content suggestion are all uses of NLP in the entertainment industry. In order to suggest movies, TV series, and other material that viewers are likely to love, NLP analyses user behaviour and preferences. It can also translate text between languages and instantly translate audio and video content. It is anticipated that NLP technology will develop further and be used in new fields and use cases. It will soon be a necessary tool for enterprises and organizations in a variety of sectors. In this chapter, we will highlight the overview and adoption of NLP in different applications. Also, this chapter discusses text summarization, an important application of NLP. Different techniques of generating text summaries along with evaluation metrics are the highlights of the chapter.
Keywords: Cosine Similarity, Extractive Summarization, Natural Language Processing (NLP), ROUGE Scores, TF-IDF, TextRank, Text Summarization.
* Corresponding author Asha Rani Mishra: Department of Computer Science & Technology, G.L Bajaj Institute of Technology and Management, Greater Noida, India; E-mail: [email protected]
1. INTRODUCTION
Data is generated by many systems every day in large volumes. The large volume of text data is present in almost every domain and different sources like tweets,
articles, reviews, and comments. Text data is unstructured in nature since it does not fit into any predefined data model such as a relational database. To store text data, for example, organizations use different file systems and access it as needed. There are many challenges in analyzing such data to extract meaningful patterns or gain insights for business decisions. Most machine learning and data analytics algorithms work with numeric data, and because textual data follows no fixed structure, regular syntax, or patterns, mathematical or statistical models cannot be applied to it directly. Natural language processing (NLP), an essential component of artificial intelligence (AI), provides the transformations that make such data easy for a machine to interpret.
1.1. Evolution in NLP
In the 1950s and 1960s, at the dawn of artificial intelligence and computer science, NLP began to take shape. Some of the earliest rule-based systems in NLP were developed with the goal of interpreting and producing human language. Because of these systems' limited linguistic comprehension, research in the 1990s and 2000s switched to statistical and machine learning-based methods. Unsupervised and semi-supervised learning algorithms for NLP were developed in the early 2000s as a result of the accessibility of enormous volumes of text data and computing capacity. These techniques made it possible to build large-scale language models that could learn to interpret and produce language in an unsupervised or weakly supervised manner.
NLP experienced substantial breakthroughs in the middle of the 2010s as a result of the advent of deep learning techniques such as Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs). These techniques made it possible to build models that learn to represent language in a continuous vector space, allowing for more precise language comprehension and generation.
Transformer-based models like BERT and GPT have become the preeminent paradigm in NLP in recent years. These models can be customized for a wide range of NLP tasks and, in many cases, achieve state-of-the-art performance after being pretrained on substantial amounts of text data. NLP has developed a wide range of applications over the course of its progress, including chatbots, sentiment analysis, speech recognition, machine translation, and virtual assistants. We may anticipate even more ground-breaking uses for NLP in the future as research in the field advances.
1.2. Recent Advancement in NLP
The field of NLP has made numerous advancements recently. The following are some of the most important ones:
Large Pretrained Language Models: The creation of large pretrained language models, like GPT-3, which are capable of a variety of language tasks with high accuracy, is one of the most significant developments in NLP. These models can be customized for particular tasks like text categorization, sentiment analysis, and automated translation because they are trained on vast amounts of text data.
Transfer Learning: Transfer learning is where a model is first trained on a large amount of data for a specific task and then fine-tuned on a smaller dataset for a different but related task. This technique has been shown to improve performance on a wide range of NLP tasks.
Neural Machine Translation (NMT): This method of translation translates text from one language into another by using neural networks. NMT has been shown to produce more fluent and accurate translations than traditional statistical machine translation methods.
Multilingual NLP: Recent developments in multilingual NLP have made it possible to train models in many languages simultaneously, boosting performance on multilingual tasks. Multilingual NLP refers to the ability of NLP models to process and understand text in numerous languages.
Explainable AI (XAI): This term describes an AI model's capacity to offer clear, comprehensible justifications for its predictions. Recent developments in NLP have made it possible to create XAI models that give thorough justifications for their linguistic predictions, enhancing their trustworthiness and interpretability.
Zero-Shot Learning: This method enables NLP models to carry out tasks for which they have not been specifically trained, by exploiting the general knowledge and context acquired during pre-training. Zero-shot learning has been shown to work well for tasks such as text classification and machine translation (a minimal code sketch follows this list).
Ethical Issues: As NLP is used more frequently in a variety of applications, ethical issues like bias, fairness, and privacy are receiving more attention. Researchers are working hard to create strategies that reduce bias and guarantee that NLP models are impartial and respectful of user privacy.
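As a minimal sketch of the zero-shot learning idea above, the snippet below uses the zero-shot-classification pipeline from the Hugging Face transformers library, which is not discussed in this chapter and is used here purely for illustration; the example sentence and candidate labels are invented, and the default model is downloaded on first use.
from transformers import pipeline
# Load a pretrained NLI model behind the zero-shot-classification pipeline.
classifier = pipeline("zero-shot-classification")
result = classifier(
    "The new phone's battery lasts two full days on a single charge.",
    candidate_labels=["technology", "sports", "politics"],
)
# The label the model considers most likely comes first, e.g. "technology".
print(result["labels"][0], result["scores"][0])
No task-specific training is involved; the pretrained model alone ranks the candidate labels against the input sentence.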
1.3. Applications in NLP
Various Applications of NLP are shown in Fig. (1). Understanding some Natural Language Processing applications can help finish various time-consuming jobs more quickly and effectively while reducing the workload.
Fig. (1). Various Applications of NLP.
Email filtering: Email is used daily for jobs, studies, and a variety of other purposes, and messages arrive from all kinds of sources: some are work-related, while others are spam or promotional communications. Natural language processing is used in this situation: it distinguishes between incoming emails that are important and those that are spam, and filters them into the corresponding folders.
Language translation: With the advent of technology, society has become a global village, making it necessary to engage with others who may speak a language that is unfamiliar to us. By translating the language with all of its sentiments, natural language processing is of great help.
Smart assistants: In the modern world, a new smart device is introduced every day, making the world increasingly smarter, and with this development have come intelligent personal assistants like Siri, Alexa, and Cortana, which even respond in the same way as humans. Natural Language Processing makes all of this feasible. Decomposing any language into its constituent parts of speech, root stems, and other linguistic qualities helps the computer system comprehend it; this aids not only in understanding the language but also in processing its meaning and emotions and responding in a human-like manner.
Document analysis: Document analysis is another application of NLP. Companies, colleges, schools and other similar institutions continually have an abundance of data that needs to be stored, organised, and searched. NLP can accomplish all of this: along with performing keyword search, it organises the results into relevant categories, sparing the user the time and effort of browsing through numerous files to find a particular person's information. Beyond that, it also helps the user make informed decisions about how to handle claims and manage risks.
Predictive text: Predictive text is an application comparable to online search, and we use it every time we type on our smartphones. The keyboard suggests possible words whenever we enter a few letters on the screen, and once we have written a few words, it begins to offer candidates for the following word. These predictions may initially be a little inaccurate, but over time the system learns from our messages and begins to propose the next word correctly, even before we have written a single letter of it. NLP accomplishes all of this by giving our smartphones the ability to suggest words and learn from our messaging behaviour.
Automatic summarization: Data has expanded along with numerous innovations and creations, and this growth has also widened the range of data processing. Manual data processing, however, takes time and is prone to mistakes. NLP has a solution for this as well: in addition to summarizing a text's meaning, it can also decipher its emotional significance, so the summarizing procedure is expedited and far less error-prone.
Social media monitoring: Everybody has a social media account, and sharing views, likes, dislikes, experiences, etc. on these platforms reveals a lot about the people. We discover details not only about specific people but also about goods and services. The relevant businesses can process this data to learn more about their goods and services so they can enhance or modify them. NLP is used in this situation. It makes it possible for the computer system to comprehend and analyze unstructured social media data in order to generate the necessary findings in a useful format for businesses.
Sentiment analysis: The majority of discussions and texts are emotional because of daily interactions, posted material and comments, and book, restaurant, and product reviews. Understanding these feelings is just as crucial as comprehending the meaning of the words themselves. Humans are capable of deciphering the emotional undertones of written and spoken words, but with the aid of natural language processing, computers can also comprehend the sentiments of a document in addition to its literal meaning.
Chatbots: As technology has advanced, everything from studying to shopping, buying tickets, and customer service has gone digital. Instead of making the user wait a long time for brief answers, the chatbot responds immediately and accurately. NLP equips these chatbots with conversational capabilities, enabling more accurate consumer responses than simple one-word answers. Chatbots have also been useful in areas with limited or unreliable human resources. NLP-based chatbots also include emotional intelligence, which enables them to efficiently comprehend and address the customer's emotional needs.
1.4. Role of Natural Language Processing in Text Mining
Text analytics, popularly