Sujan Dulal
2024-25 Autumn
I confirm that I understand my coursework needs to be submitted online via Google Classroom under the
relevant module page before the deadline for my assignment to be accepted and marked. I am fully aware
that late submissions will be treated as non-submission and a mark of zero will be awarded.
Acknowledgement
I initially felt excited to work on this coursework but struggled with time management as the
deadline approached, especially while conducting research and gathering information.
However, the support from my friends and module teacher, Zishan Siddique, was invaluable.
My friends motivated me with their insights and understanding, while Mr. Siddique's expertise,
guidance, and patience greatly enhanced my understanding of the subject and helped me
achieve the best possible outcome. I am deeply grateful for their contributions and support
throughout this journey.
Abstract
The purpose of this research is to create a sentiment analysis system that can accurately
identify the sentiment of text in different contexts. To accomplish this, the use of machine
learning techniques, specifically the Naive Bayes algorithm, is proposed to train a sentiment
classifier using a large and diverse dataset of text samples that are labelled with their
corresponding sentiment. Sentiment analysis is a method to understand the opinions and
emotions expressed in text data such as customer feedback and social media posts.
However, building an accurate and robust sentiment analysis system is still a challenging
task due to the complexity and variability of natural language. As the coursework progresses,
the researcher anticipates facing several difficulties and obstacles. To overcome these, it is
crucial to thoroughly examine different materials and use a trial-and-error method to gain a
deeper understanding of any issues that may arise. The outcome of this coursework will lead
to the development of a sentiment analysis system that can be applied in various domains,
providing valuable insights into the opinions and emotions expressed in text data.
Table of Contents
1 Introduction
2 Background
3 Solution
3.4.3 Dataset
4 Conclusion
4.1 Results
4.1.1 Naïve Bayes
5 References
1 Introduction
Machine learning enhances sentiment analysis by automating text analytics operations. Models
are trained on large datasets of text, such as customer feedback, using supervised and
unsupervised approaches, and are then used to identify the sentiment of new text. NLP
techniques are also used to assign weighted sentiment ratings to entities, topics, and
categories within text, allowing for a more nuanced understanding and better data-driven
decisions for businesses. (Anon., n.d.)
Machine Learning (ML): Machine learning is a branch of AI that enables computer systems
to learn and improve without being explicitly programmed. It uses algorithms to analyse data
and make predictions and decisions. Its purpose is to identify patterns and to update
predictions and decisions as new data becomes available, so that the model improves over
time. (jannade, n.d.) There are mainly two types of machine learning:
• Supervised Machine Learning
Supervised learning is a type of machine learning in which the model is trained on labelled
data. During training the model learns the relationship between inputs and outputs, and it is
then evaluated on separate test data to gauge its accuracy. The goal is to apply the learned
relationships to make predictions on new, unseen, unlabelled data. (geeksforgeeks, n.d.)
10 | P a g e Sujan Dulal
Artificial Intelligence CU6051
2 Background
Fine-grained sentiment analysis is a method that offers more detailed information on the
emotions and attitudes in text data than a simple binary classification. It identifies
specific emotions, such as excitement or disappointment, and their level of intensity in
customer feedback, social media posts and similar text. It is particularly useful in
applications where a detailed understanding of customer sentiment is required.
(analyticsvidhya, 2020)
Emotion detection is a field of study that aims to identify and understand the emotions that
people express through various channels of communication. It is a rapidly growing area of
research with wide-ranging applications, from facilitating communication between robots and
humans to enhancing decision-making processes. Various machine learning models have been
developed for extracting emotions from text. One of the most popular methods is the
lexicon-based approach, which relies on collections of emotion-bearing words, commonly called
lexicons, and is widely used by emotion detection systems. (thecleverprogrammer, 2020)
However, some advanced systems use more sophisticated machine learning techniques that are
better equipped to capture the nuances of human emotion and the different ways in which it is
expressed. People express their emotions in many ways, and lexicons alone are not always
adequate for recognising them. For example, "This product is going to kill me" can express
fear and panic, whereas "This product is killing it for me" uses the same word with a positive
connotation, so a lexicon-based approach could misclassify one of the two sentences. In the
lexicon-based approach, each word is labelled as positive or negative, the positive and
negative words in the text are counted, and the counts are combined arithmetically to give
the total emotion score.
The sentiment score (StSc) is often calculated using the following formula:
The text is categorized as negative if the sentiment score is negative. Accordingly, a
positive score indicates a positive text, and a score of zero designates a neutral text.
(Bessa, 2023)
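A minimal sketch of a lexicon-based score, assuming the common normalised form StSc = (P - N) / T, where P and N are the numbers of positive and negative words and T is the total number of words. This form is an assumption chosen to match the description above. The word lists and function names are hypothetical and exist only for illustration.

```python
import re

# Hypothetical mini-lexicons, invented for illustration only.
POSITIVE_WORDS = {"good", "great", "love", "excellent", "happy"}
NEGATIVE_WORDS = {"bad", "poor", "hate", "terrible", "sad"}


def sentiment_score(text: str) -> float:
    """Return (positive - negative) / total words; 0.0 for empty text."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    positive = sum(word in POSITIVE_WORDS for word in words)
    negative = sum(word in NEGATIVE_WORDS for word in words)
    return (positive - negative) / len(words)


def label(score: float) -> str:
    """Map the score to a label using its sign, as described above."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

For example, `label(sentiment_score("I love this great phone"))` counts two positive words out of five and yields a positive label, while a sentence with no lexicon words scores zero and is treated as neutral.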
• Google is a solid example of how sentiment analysis software helps improve products. For
the Chrome web browser, the development team continually monitors both direct and indirect
user feedback (i.e. feedback posted in open sources, most notably blogs).
• KFC is a good illustration of brand tracking with sentiment analysis. KFC used sentiment
analysis for brand monitoring and advertising, combining it with social network monitoring
and campaign management to connect individuals with the brand and eventually get them to
associate with the product.
• YouTube has a vast number of user comments under videos. Sentiment analysis can help
creators and businesses understand the general sentiment of their audience, that is, whether
viewers feel positively, negatively, or neutrally about their content.
3 Solution
Text sentiment analysis helps solve real-world problems by understanding how people feel
through their written words. It has many useful applications:
• Companies use it to read customer reviews and make their services better.
• Doctors can spot signs of depression by analysing patients’ writings. (oh, 2024)
• During emergencies, it helps find people who need food or medical help by checking
social media posts. (kumar, n.d.)
• Schools use student feedback to improve teaching.
The Naive Bayes classification method is a popular technique for identifying patterns in data
and making predictions. It is based on Bayes' Theorem, which expresses the probability of an
event given prior knowledge of conditions that might be related to that event. It is a
probabilistic approach that treats every feature as independent of the others given the
class. The algorithm calculates the posterior probability of each class and chooses the class
with the highest probability. It is used for a variety of tasks and has been particularly
effective in natural language processing (NLP). (Gamal, 2020)
$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$
P(A|B) – posterior
P(B|A) – likelihood
P(A) – prior
P(B) – evidence
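To make the roles of the prior and the likelihood concrete, the following is a small sketch of a multinomial Naive Bayes text classifier written in plain Python. It is not the implementation used for the reported results. The training sentences, the add-one (Laplace) smoothing and all function names are assumptions made for illustration. Because the evidence P(B) is the same for every class, the sketch compares only log prior plus log likelihood.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(samples):
    """Count class frequencies and per-class word frequencies.

    samples: list of (text, label) pairs.
    """
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in samples:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocabulary.add(word)
    return class_counts, word_counts, vocabulary


def predict(text, model):
    """Return the class with the highest log prior + log likelihood."""
    class_counts, word_counts, vocabulary = model
    total_samples = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total_samples)  # log P(A), the prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # ignore words never seen in training
            # log P(word | A) with add-one smoothing, the likelihood
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data, invented for illustration only.
TOY_SAMPLES = [
    ("great phone love it", "positive"),
    ("love the great camera", "positive"),
    ("terrible battery hate it", "negative"),
    ("hate the terrible screen", "negative"),
]
MODEL = train_naive_bayes(TOY_SAMPLES)
```

Calling `predict("love this great screen", MODEL)` multiplies (in log space) the per-word likelihoods for each class independently, which is exactly the independence assumption that gives the method its name.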
Recurrent Neural Networks (RNNs) are considered well suited to sentiment analysis because
they are designed to process sequential data, such as text, where the order of words plays a
critical role. By carrying context from earlier words forward while analysing later ones,
RNNs maintain continuity, which is essential for following the flow of sentiment in a
sentence. This allows them to handle complex sentences and to relate words that are far
apart, although plain RNNs can struggle with very long-range dependencies, which is one
reason gated variants such as LSTMs are often used. RNNs are therefore particularly useful
for tasks where understanding relationships between words is necessary to derive accurate
sentiment predictions. (Patel, 2019)
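The idea of carrying context forward can be sketched in plain Python, assuming a simple Elman-style recurrence h_t = tanh(W_xh x_t + W_hh h_(t-1) + b). The weights and the two-dimensional "word vectors" below are arbitrary illustrative values, not trained parameters, and all names are hypothetical. In practice a deep learning library would be used instead.

```python
import math


def rnn_forward(inputs, w_xh, w_hh, b_h):
    """Return the hidden state after each time step.

    inputs: list of input vectors (one per word), each of length D.
    w_xh:   H x D input-to-hidden weights.
    w_hh:   H x H hidden-to-hidden weights.
    b_h:    length-H bias vector.
    """
    hidden = [0.0] * len(b_h)  # initial state: no context yet
    states = []
    for x in inputs:
        new_hidden = []
        for i in range(len(b_h)):
            total = b_h[i]
            total += sum(w * xv for w, xv in zip(w_xh[i], x))
            # the previous hidden state feeds back in here
            total += sum(w * hv for w, hv in zip(w_hh[i], hidden))
            new_hidden.append(math.tanh(total))
        hidden = new_hidden
        states.append(hidden)
    return states


# Arbitrary illustrative weights and inputs (assumptions, not trained values).
W_XH = [[0.5, -0.3], [0.1, 0.8]]
W_HH = [[0.2, 0.4], [-0.5, 0.3]]
B_H = [0.0, 0.0]
WORD_A = [1.0, 0.0]
WORD_B = [0.0, 1.0]

states_ab = rnn_forward([WORD_A, WORD_B], W_XH, W_HH, B_H)
states_ba = rnn_forward([WORD_B, WORD_A], W_XH, W_HH, B_H)
```

Feeding the same two inputs in a different order produces a different final hidden state, which illustrates why word order influences the prediction of a recurrent model.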
READ the dataset from the file 'Dataset.txt' using the pandas library
INSPECT the dataset for missing values by using the isnull() and sum() functions to count the
missing values
SHOW the first five rows of the labelled dataframe using the head() function
COUNT the total number of negative and positive labels by using the value_counts() function
CONVERT all uppercase text to lowercase and store it in a list lower_text
REMOVE punctuation and other unwanted patterns (matched with regular expressions) from the
text and store the result in a list clean_text
SPLIT the prepared data into a training group and a testing group
TAKE a new input from the user, process it the same way as training data, predict the
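The inspection, cleaning and splitting steps listed above can be sketched as follows. To keep the example dependency-free it uses only the Python standard library in place of pandas, and it works on a few invented rows because the layout of 'Dataset.txt' is not shown. The function names, the 25% test fraction and the fixed seed are assumptions.

```python
import random
import re
import string
from collections import Counter

# Toy (text, label) rows invented for illustration; they stand in for the
# contents of 'Dataset.txt'.
ROWS = [
    ("I LOVE this phone!!!", "positive"),
    ("Worst service ever...", "negative"),
    ("", "negative"),  # a row with missing text
    ("Great value, would buy again.", "positive"),
    ("Not good; battery died.", "negative"),
    ("Absolutely fantastic camera!", "positive"),
    ("Terrible, just terrible.", "negative"),
    ("Happy with the purchase :)", "positive"),
    ("Screen cracked in a week?!", "negative"),
]


def count_missing(rows):
    """INSPECT: number of rows whose text is empty or None."""
    return sum(1 for text, _ in rows if not text)


def label_counts(rows):
    """COUNT: how many rows carry each label."""
    return Counter(label for _, label in rows)


def clean(text):
    """CONVERT to lowercase, then REMOVE punctuation and extra spaces."""
    lowered = text.lower()
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", no_punct).strip()


def split(rows, test_fraction=0.25, seed=0):
    """SPLIT: shuffle a copy of the rows and cut off a test portion."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Drop rows with missing text, clean the rest, then split.
usable = [(clean(text), label) for text, label in ROWS if text]
train_rows, test_rows = split(usable)
```

New input from a user would be passed through the same `clean` function before prediction, so that the model sees text in the same form as its training data.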
Third, the text was transformed to numbers by using the TF-IDF technique so that the data
could be used in machine learning algorithms. Naïve Bayes, Logistic Regression, and
RNNs were developed and trained using this processed data.
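The weighting idea behind TF-IDF can be sketched in a few lines of plain Python, assuming the textbook form tf(w, d) × ln(N / df(w)). Library vectorizers usually add smoothing and normalisation, so their values will differ from this sketch. The example sentences and the function name are invented for illustration.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {word: weight} dictionary per document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each word appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # term frequency times inverse document frequency
            word: (count / len(tokens)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return vectors


# Toy documents, invented for illustration only.
DOCS = [
    "the movie was great",
    "the movie was awful",
    "the plot was great",
]
VECTORS = tf_idf(DOCS)
```

In this toy corpus "the" appears in every document and receives a weight of zero, while "awful" appears in only one document and receives the largest weight there.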
The performance of each model was evaluated using basic assessment tools such as
accuracy and confusion matrix. Finally, the results were presented in the form of graphs to
ensure that they could be easily understood. This approach was instrumental in developing
a process that can now be used to determine the sentiment of text.
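As an illustration of the two assessment tools named above, the sketch below computes accuracy and a confusion matrix in plain Python. The labels and predictions are invented toy values, not the results of this coursework, and the function names are hypothetical.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


# Toy values, invented for illustration only.
LABELS = ["negative", "neutral", "positive"]
Y_TRUE = ["positive", "positive", "negative", "negative", "neutral", "neutral"]
Y_PRED = ["positive", "negative", "negative", "negative", "positive", "neutral"]

ACC = accuracy(Y_TRUE, Y_PRED)
MATRIX = confusion_matrix(Y_TRUE, Y_PRED, LABELS)
```

The diagonal of the matrix holds the correct predictions for each class, so the off-diagonal cells show which classes a model confuses with one another.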
Python is an interpreted, flexible, general-purpose language that is widely used for machine
learning and data analysis because of its simplicity and extensive library support.
Jupyter Notebook
A live coding environment well suited to developing and demonstrating prototypes, and to
visualizing results in cells alongside markdown notes.
NumPy
It is used for analysing high-dimensional data and performing mathematical calculations on
multi-dimensional arrays. (naik, 2024)
Pandas
Allows for easy data operations such as selection, cleaning and structuring data by
converting data sets into data frames for further analysis. (geeksforgeeks, n.d.)
Applied to data visualization where data patterns are represented by plots, histograms and
heat maps. (geeksforgeeks, 2025)
Scikit-Learn
A machine learning library containing Naïve Bayes and Logistic Regression algorithms, and
tools for data pre-processing, feature extraction and assessment. (domino, n.d.)
TensorFlow
It is helpful for building deep learning models such as RNNs or LSTMs, which are well suited
to processing text sequences. (banoula, 2024)
NLTK
Combines features such as tokenization of text documents, removal of stop words and text
preprocessing (stemming and lemmatization). (sidak, 2023)
spaCy/TextBlob
Advanced NLP packages that provide entity recognition and ready-made sentiment analysis.
(gichere, 2023)
TF-IDF Vectorizer
Transforms text data into numeric features by giving more weight to words that are
distinctive for a document and less weight to words that are frequent across all documents.
(kilmen, 2022)
3.4.3 Dataset
A labelled Twitter sentiment dataset was used to train and test the machine learning models.
Each entry is pre-labelled as positive, negative or neutral, which makes the dataset useful
for developing and testing a sentiment analysis system.
Figure 21 of RNN
4 Conclusion
Sentiment analysis helps figure out if a text is positive, negative, or neutral. Businesses use
it to see how customers feel about their products or services. If customers are unhappy,
businesses can improve and make them happier. It also helps marketers understand if their
ads or products are liked or disliked by customers.
This report has described the use of machine learning methods to implement sentiment
analysis. It has also introduced the subject of sentiment analysis and analysed the methods
used to approach various problem domains.
4.1 Results
4.1.1 Naïve Bayes
Accuracy: 0.67
Macro Average:
Accuracy: 0.68
Macro Average:
Accuracy: 0.52
Macro Average:
Class-wise Performance:
• For Naïve Bayes, the Negative class had the highest recall (0.81) and F1-score (0.72).
• For Logistic Regression, the Negative class showed the best overall balance, with an
F1-score of 0.73.
• The RNN achieved high recall (0.89) on the Positive class but performed poorly on the
Neutral class, making no predictions for that category.
Macro Averages: Logistic Regression outperformed the other models, with the best macro
averages across precision, recall, and F1-score.
Logistic Regression is the most balanced model in terms of both accuracy and macro
averages, making it the most suitable for this dataset. However, the performance of all
models for the Neutral class indicates potential room for improvement in handling this
category.
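To show how macro averages are affected when a model never predicts one class, the sketch below computes per-class precision, recall and F1-score and then averages them with equal weight per class. The labels and predictions are invented toy values, not the results reported above. Treating an undefined precision as 0 is a common convention and is an assumption of this sketch.

```python
def per_class_metrics(y_true, y_pred, label):
    """Return (precision, recall, f1) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    # Undefined ratios (no predictions or no true samples) are set to 0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def macro_average(y_true, y_pred, labels):
    """Unweighted mean of each metric over all classes."""
    rows = [per_class_metrics(y_true, y_pred, label) for label in labels]
    return tuple(sum(column) / len(labels) for column in zip(*rows))


# Toy values, invented for illustration; "neutral" is never predicted.
LABELS = ["negative", "neutral", "positive"]
Y_TRUE = ["positive", "positive", "negative", "negative", "neutral", "neutral"]
Y_PRED = ["positive", "positive", "negative", "positive", "positive", "negative"]

MACRO_P, MACRO_R, MACRO_F1 = macro_average(Y_TRUE, Y_PRED, LABELS)
```

Because each class counts equally, a class with no predictions contributes zeros to all three averages and pulls the macro scores down even when the other classes are handled well.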
Gathering public opinion and feedback from customers is crucial for any service to function
well. Surveys are often used to gain insights on how people think and feel about a particular
service. Sentiment analysis can be used within these surveys to evaluate the effectiveness
of services, measure their impact on people and identify areas that require improvement. In
short, sentiment analysis can provide valuable information to make necessary adjustments
to better the services.
5 References
aimtechnologies, n.d. aimtechnologies. [Online]
[Accessed 23 12 2024].
Available at:
https://www.lexalytics.com/technology/sentiment-analysis/#machinelearningsentiment
[Accessed 22 12 2024].
Available at: Artificial Intelligence (AI) is the simulation of human cognitive functions by
computer systems, including expert systems, natural language processing, speech
recognition, and machine vision. It is used in various industries such as self-driving cars, cus
[Accessed 22 12 2024].
Gamal, B., 2020. medium. [Online]