NLP Final

The document discusses using machine learning and deep learning models to perform sentiment analysis on movie reviews from IMDB to classify them as positive or negative. It describes exploring the dataset, implementing traditional models like Naive Bayes and logistic regression as well as deep learning models like LSTM and feed forward neural networks, and evaluating model performance.

Uploaded by

Talha ch
Copyright: © All Rights Reserved
IMDB Movie Ratings Sentiment Analysis Project

Table of Contents
Introduction
Importance of the Topic
Background Review
SMART Objectives
Dataset Description
Exploratory Data Analysis (EDA) and Preprocessing
Traditional Models
    Traditional Machine Learning Methods
    Deep Learning Methods
    Implementations
    Naive Bayes
    Logistic Regression
Deep Learning Models
    LSTM (Long Short-Term Memory)
    Feed Forward Neural Network
Model Refinement and Evaluation
Conclusion
References

Introduction
One of the major challenges in Natural Language Processing (NLP) is accurately quantifying
sentiment from textual data. Human language is nuanced and context-dependent: the very same
words may take on very different emotional colorations depending on the context and tone in
which they are used (Liu, 2022). Sentiment analysis is the branch of NLP that addresses this
problem, using algorithms and linguistic analysis to classify text into predefined sentiment
classes: positive, negative, or neutral. Despite steady improvements, it remains difficult to
detect subtleties in sentiment, especially in subjective domains where emotion and cultural
context are inseparable (Topal, 2016).

Movie reviews provide a rich and meaningful domain for sentiment analysis. More than most
other forms of textual data, movie reviews capture a wide range of emotions, opinions, and
subjective experiences, which makes them an ideal candidate for sentiment analysis (Liu, 2022).
Film also spans many genres, eras, and cultural contexts, which makes the task considerably
harder. Using IMDB movie review data, this paper aims to contribute to the field of NLP by
surfacing the collective sentiment hidden within user-generated movie reviews (Holsapple, 2014).
Through this analysis, we aim to give both the film-watching public and filmmakers a clearer
sense of how movies are received, nurturing a mutual relationship that enriches both culture and
the practice of storytelling.

Importance of the Topic


Sentiment analysis offers insight into many aspects of film production and distribution. Its
relevance lies in its capacity to decode audience reactions and preferences, helping filmmakers
and producers shape their creative vision so that it is in tune with the feelings of their
viewers. Its value goes beyond box office success, because it also bears on environmental
sustainability, thematic relevance, and audience engagement. However, sentiment analysis of
movie viewership remains largely unexplored. This leaves room for data-driven perspectives,
from trends to their underlying reasons, to inform decisions about scriptwriting, actor
selection, budget allocation, and marketing. With filmmakers always striving to captivate
audiences with touching and engrossing work, integrating sentiment analysis into the film
production process holds considerable promise and is, at this point, an underused resource in
the NLP field (Shaukat, 2020).

Background Review
Sentiment analysis in the film industry has been the subject of a number of studies, which
bring varied methodologies and insights to the table. Dahir and Alkindy analyze IMDB movie
reviews using logistic regression, a machine learning algorithm. Their experiment shows that
logistic regression performs well with TF-IDF as the feature extraction method. This is a
supervised learning method and, though very precise, it may be less easily scalable (Dahir,
2023). Similarly, Ruus and Sharma undertook the task of predicting movie revenue from Twitter
and critic data, for which they saw room for the use of random forest models. However, their
approach oversimplifies the nuanced sentiment in reviews rather than capturing it, which could
be a challenge in making accurate revenue predictions (Ruus, 2019).

Lopez and Sumba apply sentiment analysis to IMDB data and suggest hybrid feature extraction
techniques to improve classification accuracy. Although representing sentiment in this
multilayered form gives the model flexibility, the resulting increase in model complexity
brings trade-offs that are worth probing further (Lopez, 2019).

Harish and Kumar proposed a hybrid feature extraction method in which lexicon features are
used along with machine learning features, with the aim of producing the most accurate
sentiment classifier. However, concern remains about the trade-off between accuracy and
computational efficiency, which calls for optimization strategies (Harish, 2019).

Singh and Kulkarni studied IMDB movie review classifiers and reported that logistic regression
gave the best accuracy (Singh, 2022). A fuller explanation of their feature selection method
would, however, make the model and its generalization easier to interpret. Taken together,
these studies advance sentiment analysis methodology within the film industry and further
emphasize the importance of context, feature selection, and scalability for accurately
capturing audience sentiment and predicting movie success.

SMART Objectives
• Specific: Develop a sentiment analysis system that classifies IMDB movie reviews as either
positive or negative with high accuracy.
• Measurable: Model performance will be measured using metrics that quantify sentiment
classification quality, including accuracy, F1 score, and ROC-AUC.
• Achievable: Deep learning models (for example, LSTM and feed forward neural network) and
traditional machine learning models (such as Naive Bayes and logistic regression) will be
implemented and compared. These models have a track record in sentiment analysis, so the goal
is attainable within the project.
• Relevant: The project aims to produce findings that are relevant to stakeholders in the
movie industry. By classifying movie reviews, the sentiment analysis system can guide decisions
about movie production, marketing, and audience engagement.
• Time-bound: Model development, training, and evaluation will follow a fixed schedule so that
results are delivered on time and project deadlines are met.

Dataset Description
The dataset used for this project consists of IMDB movie reviews, each labeled with either a
positive or a negative sentiment (Adam, 2021). In total, there are 39,723 reviews across the
two classes. The dataset is well structured in tabular form, and the distribution between
positive and negative reviews is quite balanced, making it a good representative sample for
the task of sentiment analysis.
Figure 1: Dataset
Strengths:

• Organized and labeled dataset: The data is in a clear, organized format, which makes work
easier for both the analyst and the model developer.
• Balanced distribution of sentiment labels: Positive and negative reviews appear in roughly
equal proportion, which keeps the training and evaluation of the sentiment analysis models
balanced.
• Sufficient size for effective model training: With 39,723 reviews (close to 40,000), the
dataset is large enough by usual standards to train robust sentiment analysis models that
capture different linguistic patterns.

Weaknesses:

• Presence of duplicated records that need preprocessing: Duplicated records introduce noise
and bias into any meaningful analysis, so a preprocessing step is needed to remove them.
• Limited contextual information: The dataset does not include contextual details such as
release date, genre, or reviewer demographics, all of which would help in interpreting
sentiment. Such details could support deeper analysis and more precise sentiment
classification.
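
The duplicate removal and label balance check described above can be sketched as follows. This
is only a minimal illustration using the Python standard library: the function names and the
toy rows are hypothetical and do not come from the project code, and treating reviews that
differ only in letter case or surrounding whitespace as duplicates is an assumption.

```python
from collections import Counter


def drop_duplicates(rows):
    """Keep the first occurrence of each (review, sentiment) pair, preserving order."""
    seen = set()
    unique = []
    for review, sentiment in rows:
        # Assumption: case and surrounding whitespace do not make a review distinct.
        key = (review.strip().lower(), sentiment)
        if key not in seen:
            seen.add(key)
            unique.append((review, sentiment))
    return unique


def label_balance(rows):
    """Return the share of each sentiment label among the rows."""
    counts = Counter(sentiment for _, sentiment in rows)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


# Toy rows standing in for the real dataset (hypothetical examples).
rows = [
    ("A moving, beautifully shot film.", "positive"),
    ("Dull plot and wooden acting.", "negative"),
    ("A moving, beautifully shot film.", "positive"),  # exact duplicate
    ("Great cast, great score.", "positive"),
    ("I walked out halfway through.", "negative"),
]

clean_rows = drop_duplicates(rows)
balance = label_balance(clean_rows)
print(len(rows), "->", len(clean_rows), "rows after de-duplication")
print(balance)
```

Checking the balance after de-duplication matters because removing duplicates can shift the
class proportions slightly.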

Exploratory Data Analysis (EDA) and Preprocessing


In the context of IMDB movie review sentiment analysis, EDA involves exploring the
distribution of sentiment labels, analyzing word frequencies in positive and negative reviews,
and preprocessing the text data for modeling (Massarotto, 2021).

• Word Cloud Visualization

The most frequent words in positive and negative reviews are shown in word clouds, which give
a qualitative overview of the vocabulary associated with each sentiment class (Figures 2 and
3). A word cloud displays word frequencies visually: the larger a word appears, the more
frequently it occurs.

Figure 2: Positive word cloud

Figure 3: Negative word cloud

• Text Preprocessing
Text preprocessing removes noise, standardizes the format, and improves data quality, thus
getting the data ready for analysis and modeling. It removes special characters, URLs, and
stop words, and applies lemmatization so that word representations are uniform, which also
aids dimensionality reduction (Figure 4).

Figure 4: Text preprocessing (cleaned data)

• Baseline Establishment
A baseline provides a point of reference for judging whether the sentiment analysis models
actually improve over simple, naive methods. One commonly used baseline is the majority-class
classifier, which predicts the majority class for every instance in the dataset.
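
The preprocessing and baseline steps above can be illustrated with a short sketch. It is not
the project's actual pipeline: the stop-word list is deliberately tiny, lemmatization is left
out because it needs an external library, and all names and example inputs are hypothetical.

```python
import re
from collections import Counter

# A deliberately tiny stop-word list (assumption); a real run would use a fuller one.
STOP_WORDS = {"a", "an", "the", "and", "is", "it", "this", "of", "to", "was", "in", "for"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")


def clean_text(text):
    """Lowercase, strip URLs and special characters, then drop stop words."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = NON_LETTER_RE.sub(" ", text)
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def majority_class_baseline(train_labels, test_labels):
    """Predict the most common training label for every test instance."""
    majority = Counter(train_labels).most_common(1)[0][0]
    correct = sum(1 for label in test_labels if label == majority)
    return majority, correct / len(test_labels)


tokens = clean_text("This movie was GREAT!!! See https://example.com for more.")
print(tokens)

majority, baseline_acc = majority_class_baseline(
    ["positive", "negative", "positive"],
    ["positive", "negative", "negative", "positive"],
)
print(majority, baseline_acc)
```

On a balanced two-class test set such a baseline scores about 50% accuracy, which is the level
any trained model should clearly exceed.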
Traditional Models

Traditional Machine Learning Methods


In natural language processing (NLP), traditional machine learning methods play a significant
role in sentiment analysis, offering simplicity and interpretability. Here, we describe and
compare five traditional machine learning methods suitable for sentiment analysis of IMDB
movie reviews:

1. Naive Bayes:

• Description: Naive Bayes classifiers assume independence among features given the class
label, making them efficient and effective for text classification tasks (Adam, 2021).

• Strengths: Naive Bayes classifiers are computationally efficient, handle high-dimensional
data well, and perform surprisingly well even though the feature independence assumption is a
simplification.

• Weaknesses: The independence assumption rarely holds in practice, so the model may miss
interactions among features and fall short of optimal performance.

• Comparison: We choose Naive Bayes for its simplicity and very low computational
requirements, which make it attractive when processing very large volumes of text, as is the
case with IMDB movie reviews.

2. Logistic Regression:

• Description: Logistic regression models the probability of an outcome using a logistic
function and is normally used as a linear classifier.

• Strengths: Logistic regression is an interpretable model that scales easily to large
datasets and, with proper feature engineering, can also handle non-linear relationships.

• Weaknesses: Logistic regression assumes a linear relationship between the input features
and the log-odds of the outcome.

• Comparison: Logistic regression was selected for its interpretability and its compatibility
with feature transformations, which suits sentiment analysis tasks where the model's decisions
should be explainable.

3. Support Vector Machines (SVM):

• Description: SVMs seek the hyperplane that optimally separates the data points of different
classes in a high-dimensional space (Ghaddar, 2018).

• Strengths: SVMs work well in high-dimensional spaces, are flexible thanks to their range of
kernel functions, and are robust to overfitting.

• Weaknesses: SVMs are computationally intensive, sensitive to the choice of kernel
parameters, and scale poorly to large datasets.

• Comparison: Although SVMs achieve high classification accuracy, their computational
complexity may outweigh that benefit in large-scale sentiment analysis tasks.

4. Decision Trees:

• Description: Decision trees recursively split the feature space according to feature
values, producing a tree-like structure for classification.

• Strengths: Decision trees are among the most interpretable model types, work with both
numerical and categorical data, and require little data preprocessing.

• Weaknesses: Decision trees are prone to overfitting, sensitive to small variations in the
data, and often generalize poorly to unseen data.

• Comparison: Although they offer interpretability and ease of use, decision trees may lack
the predictive power needed for accurate sentiment analysis of IMDB movie reviews.

5. Random Forests:

• Description: Random forest is an ensemble learning method that combines decision trees
trained on random subsamples of the data drawn with replacement, thus yielding greater
generalization and robustness (Genuer, 2017).

• Strengths: Random forests are less prone to overfitting than single decision trees, can be
used effectively with high-dimensional data, and offer measures of feature importance.

• Weaknesses: Random forests are computationally expensive and require considerable
hyperparameter tuning. They do not scale well to extremely large datasets.

• Comparison: Random forests are more accurate and robust than a single tree and are
therefore better suited to a task like sentiment analysis, where model performance is very
important.
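
For the two methods implemented later in this report, the assumptions named above can be
written out explicitly. The notation is introduced here only for illustration: $w_1, \dots,
w_n$ are the words of a review, $c$ is a sentiment class, $x$ is the feature vector of a
review, and $w$, $b$ are the logistic regression weights and bias.

```latex
% Naive Bayes: words are treated as conditionally independent given the class.
\hat{y} = \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(w_i \mid c)

% Logistic regression: the class probability is a logistic function of a linear score,
P(y = 1 \mid x) = \sigma(w^{\top} x + b) = \frac{1}{1 + e^{-(w^{\top} x + b)}}

% which is equivalent to assuming that the log-odds are linear in the features.
\log \frac{P(y = 1 \mid x)}{1 - P(y = 1 \mid x)} = w^{\top} x + b
```

The product in the first line is what makes Naive Bayes cheap to train, and the linear
log-odds in the last line is what makes each logistic regression weight directly interpretable.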

Deep Learning Methods


1. Long Short-Term Memory (LSTM):

• Description: LSTM networks are a form of recurrent neural network (RNN) designed
specifically to capture long-range dependencies in sequential data, which makes them well
suited to text data with complex temporal dynamics (Sherstinsky, 2020).

• Strengths: LSTMs capture long-term dependencies, mitigate the vanishing gradient problem,
and are well suited to modeling variable-length sequences.

• Weaknesses: LSTMs may struggle to capture fine variations in meaning within the text,
require careful hyperparameter tuning, and can be computationally costly.

• Comparison: LSTM networks were chosen for their ability to model sequence data effectively.

2. Gated Recurrent Unit (GRU):

• Description: GRU networks are another form of RNN designed for the same tasks as LSTMs, and
they address some of the LSTM's limitations, such as computational complexity and overfitting.

• Strengths: GRUs capture long-range dependencies with a capability similar to that of LSTMs
but are more computationally efficient and easier to train.

• Weaknesses: On complex and difficult tasks, GRUs might fail to capture long-term
dependencies effectively, and their performance is more erratic in such cases.

• Comparison: GRU networks balance effectiveness and efficiency, making them a good
alternative for sentiment analysis when computational resources are limited.

3. Feed Forward Neural Network (FFNN):

• Description: FFNNs with a deep architecture of multiple hidden layers can capture complex
patterns and relations expressed in textual data (Emmert-Streib, 2020).

• Strengths: FFNNs can learn hierarchical features and abstract patterns in text with good
performance, and they are computationally efficient and easy to scale.

• Weaknesses: FFNNs may be less successful at capturing the sequential dependencies and
contextual subtleties in IMDB movie review data than recurrent models such as LSTMs and GRUs.

• Comparison: While FFNNs offer a large degree of scalability and efficiency, they are not
expected to match recurrent models in identifying sequential dependencies and capturing
delicate sentiment patterns.

4. Transformer Models (e.g., BERT, GPT):

• Description: Transformer models represent a breakthrough in NLP, utilizing self-attention
mechanisms to capture contextual relationships within input sequences without recurrent
connections (Gruetzemacher, 2022).

• Strengths: Transformer models excel in capturing global dependencies, handling
variable-length inputs efficiently, and achieving state-of-the-art performance on various NLP
tasks.

• Weaknesses: Transformer models may require extensive pretraining on large corpora, suffer
from computational overhead during inference, and struggle with modeling sequential
dependencies over extremely long sequences.

• Comparison: Transformer models offer strong performance in capturing contextual
relationships in text, making them suitable for sentiment analysis tasks where understanding
the broader context of IMDB movie reviews is crucial.

5. Convolutional Neural Networks (CNNs):

• Description: CNNs apply convolutional filters over sequences of word embeddings to detect
local patterns, such as informative phrases, and then pool these features for classification.

• Strengths: CNNs are efficient to train because convolutions can be computed in parallel,
and they are effective at picking up short, local cues of sentiment.

• Weaknesses: Each filter sees only a limited window of words, so CNNs may miss dependencies
between distant parts of a review.

• Comparison: CNNs offer an effective and scalable approach. However, they are less able than
long short-term memory networks to capture the sequential dependencies and nuanced sentiment
patterns in IMDB movie reviews.
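
For the LSTM in item 1 of this list, the gating mechanism behind its handling of long-range
dependencies can be written out in the standard formulation. The symbols are introduced here
for illustration: $x_t$ is the input at step $t$, $h_t$ the hidden state, $c_t$ the cell
state, $\sigma$ the logistic function, and $\odot$ element-wise multiplication.

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)} \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(candidate cell state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state update)} \\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state)}
\end{aligned}
```

The additive form of the cell state update is what lets gradients flow across many time steps,
which is the mitigation of the vanishing gradient problem mentioned above.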
Implementations
Naive Bayes
• Implementation: A Complement Naive Bayes classifier (Adam, 2021) was trained on the dataset
to perform sentiment classification. The classifier's hyperparameters were optimized using
both Optuna and grid search to enhance model performance.
• Results: After hyperparameter tuning, the Naive Bayes classifier achieved a test-set
accuracy of 86.76%, that is, the proportion of movie reviews correctly classified as positive
or negative.

Figure 5: Naive Bayes accuracy

Figure 6: Naive Bayes accuracy with Optuna and Grid Search

• Evaluation: The Naive Bayes classifier was evaluated using several metrics, including
accuracy, F1 score, the confusion matrix, and the ROC curve, as shown in Figures 7, 8, and 9.

Figure 7: Classification report of Naive Bayes

Figure 8: Confusion matrix


Figure 9: ROC Curve
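
To make the Complement Naive Bayes idea concrete, the sketch below implements its core rule in
plain Python. It is an illustration under stated assumptions, not the code used in this
project: it works on raw token counts, omits the TF-IDF transform and weight normalization that
fuller versions of the algorithm include, and the toy documents and function names are
hypothetical. The smoothing constant `alpha` is the kind of hyperparameter a tuning step would
search over.

```python
import math
from collections import Counter


def train_complement_nb(docs, labels, alpha=1.0):
    """Fit Complement Naive Bayes weights from tokenized documents.

    For each class c, word statistics are gathered from every document
    that is NOT in c, and smoothed with the additive constant alpha.
    """
    vocab = sorted({tok for doc in docs for tok in doc})
    weights = {}
    for c in sorted(set(labels)):
        complement_counts = Counter()
        for doc, label in zip(docs, labels):
            if label != c:
                complement_counts.update(doc)
        total = sum(complement_counts.values()) + alpha * len(vocab)
        weights[c] = {
            tok: math.log((complement_counts[tok] + alpha) / total) for tok in vocab
        }
    return weights


def predict_complement_nb(weights, doc):
    """Pick the class whose complement explains the document worst."""
    scores = {
        c: sum(w[tok] for tok in doc if tok in w) for c, w in weights.items()
    }
    return min(scores, key=scores.get)


# Toy tokenized reviews (hypothetical examples).
docs = [
    ["great", "acting", "great", "story"],
    ["loved", "great", "film"],
    ["boring", "plot", "bad", "acting"],
    ["bad", "film", "boring"],
]
labels = ["positive", "positive", "negative", "negative"]

weights = train_complement_nb(docs, labels)
pred_a = predict_complement_nb(weights, ["great", "story"])
pred_b = predict_complement_nb(weights, ["boring", "bad", "film"])
print(pred_a, pred_b)
```

Estimating each class from its complement uses more data per estimate than standard
multinomial Naive Bayes, which is the usual motivation for this variant.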

Logistic Regression
• Implementation: The sentiment classifier was also implemented with logistic regression, a
commonly used linear classification algorithm. The model was trained on this dataset and then
used to predict the sentiment label of movie reviews.
• Results: When tested, the logistic regression model achieved an accuracy of 89.53% and an
F1 score of 0.8953.

Figure 10: Logistic regression accuracy & F1 Score

• Evaluation: The logistic regression model was evaluated using accuracy, F1 score, the
confusion matrix, and the ROC curve.

Figure 11: Confusion matrix (LR)

Figure 12: ROC Curve (LR)
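
The following sketch shows how a logistic regression sentiment classifier can be trained with
plain batch gradient descent. It is a minimal illustration, not the project's implementation:
the four word-count features, the toy data, the learning rate, and the number of epochs are all
assumptions chosen to keep the example small.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic_regression(X, y, lr=0.5, epochs=200):
    """Minimize the average log loss with batch gradient descent."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # gradient of the log loss with respect to the linear score
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Probability that review x is positive."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical features: counts of the words [great, loved, boring, bad].
X = [[2, 0, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 2, 0]]
y = [1, 1, 0, 0]  # 1 = positive, 0 = negative

w, b = train_logistic_regression(X, y)
p_pos = predict_proba(w, b, [1, 0, 0, 0])
p_neg = predict_proba(w, b, [0, 0, 0, 1])
print(w, b, p_pos, p_neg)
```

The sign of each learned weight shows whether its word pushes a review towards the positive or
the negative class, which is the interpretability that motivated choosing this model.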


Deep Learning Models
LSTM (Long Short-Term Memory)
• Implementation: An LSTM-based neural network model was implemented to capture sequential
dependencies in movie reviews.
• Results: After 5 epochs of training, the LSTM model attained an accuracy of 87.48% on the
test set. This is consistent with the model's ability to capture the sequential nature of
language.

Figure 13: Performance of the LSTM model

• Evaluation: Model performance was assessed with the confusion matrix and ROC curve shown in
Figure 14. These give an understanding of the model's ability to classify movie reviews into
positive and negative sentiment.

Figure 14: Confusion matrix and ROC Curve (LSTM)

Feed Forward Neural Network

• Implementation: A feed forward neural network with embedding layers was implemented for
sentiment classification (Emmert-Streib, 2020).
• Results: On the test set, the feed forward neural network achieved an accuracy of 84.80%
and an F1 score of 0.8480.

Figure 15: Feed forward neural network performance

• Evaluation: The performance of the feed forward neural network was assessed with accuracy,
F1 score, the confusion matrix (Figure 16), and the ROC curve (Figure 17). These metrics show
how well the model performs and whether it can classify movie reviews by sentiment correctly.

Figure 16: Feed forward neural network confusion matrix


Figure 17: Feed forward neural network ROC curve
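
The report does not give the network's architecture, so the sketch below is only one plausible
reading of "a feed forward neural network with embedding layers": an embedding lookup, mean
pooling over the review, one ReLU hidden layer, and a sigmoid output. The layer sizes, the
random untrained weights, and all names are assumptions made for illustration.

```python
import math
import random


def init_model(vocab_size, embed_dim=4, hidden_dim=3, seed=0):
    """Create small random (untrained) parameters for illustration only."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {
        "embedding": matrix(vocab_size, embed_dim),
        "W1": matrix(hidden_dim, embed_dim),
        "b1": [0.0] * hidden_dim,
        "W2": [rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)],
        "b2": 0.0,
    }


def forward(model, token_ids):
    """Return the predicted probability that the review is positive."""
    vectors = [model["embedding"][i] for i in token_ids]
    dim = len(vectors[0])
    # Mean pooling collapses the sequence into one vector, discarding word order.
    pooled = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    hidden = [
        max(0.0, sum(w * x for w, x in zip(row, pooled)) + b)  # ReLU
        for row, b in zip(model["W1"], model["b1"])
    ]
    logit = sum(w * h for w, h in zip(model["W2"], hidden)) + model["b2"]
    return 1.0 / (1.0 + math.exp(-logit))


model = init_model(vocab_size=6)
p_forward = forward(model, [1, 2, 3])
p_reversed = forward(model, [3, 2, 1])
print(p_forward, p_reversed)
```

In this design, reversing the word order leaves the prediction unchanged up to floating-point
rounding, which illustrates why feed forward models can miss the sequential dependencies that
recurrent models capture.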

Model Refinement and Evaluation


Hyperparameter Tuning

The hyperparameters of the Naive Bayes classifier were optimized using both Optuna and grid
search. The aim of the search was to find the configuration within the hyperparameter space
that gives the best performance on the sentiment classification problem.
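
The grid search half of this procedure amounts to scoring every combination of candidate
values and keeping the best one, as the sketch below shows. The parameter names `alpha` and
`norm`, the candidate values, and the scoring function are hypothetical stand-ins: in a real
run the scoring function would train the classifier with the given settings and return its
validation score.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Evaluate every parameter combination and return the best one."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_validation_score(alpha, norm):
    # Made-up score surface used only to exercise the search; it is not a
    # measured result and stands in for "train, then score on validation data".
    return -((alpha - 0.5) ** 2) + (0.01 if norm else 0.0)


param_grid = {"alpha": [0.1, 0.5, 1.0, 2.0], "norm": [False, True]}
best_params, best_score = grid_search(param_grid, toy_validation_score)
print(best_params, best_score)
```

Grid search is exhaustive, so its cost grows with the product of the grid sizes. Optuna instead
samples candidate configurations, which helps when the search space is too large to enumerate.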

Evaluation Metrics

Model performance was evaluated using accuracy, F1 score, the confusion matrix, and the ROC
curve with its AUC. Together these give a general view of how effective the models are at
sentiment classification. Accuracy measures the overall correctness of the model's
predictions, while the F1 score balances precision and recall, which is particularly useful
for imbalanced datasets. The confusion matrix shows which kinds of errors the model makes,
namely false positives and false negatives. Finally, the ROC curve and its AUC evaluate the
model's ability to separate positive from negative sentiment across different threshold
levels.
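
The metrics described above can be computed directly from labels, predictions, and scores, as
in this small sketch. The labels and scores are toy values invented for the example, not
results from this project, and the function names are hypothetical. AUC is computed here
through its ranking interpretation: the probability that a randomly chosen positive review
receives a higher score than a randomly chosen negative one.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1


def roc_auc(y_true, scores):
    """Share of positive/negative pairs ranked correctly; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and model scores (hypothetical).
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold at 0.5

counts = confusion_counts(y_true, y_pred)
accuracy, precision, recall, f1 = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(counts, accuracy, precision, recall, f1, auc)
```

Accuracy and F1 depend on the chosen threshold, whereas AUC is computed from the scores alone,
which is why it summarizes performance across all thresholds.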

Comparison of Models

We compared the traditional machine learning models (Naive Bayes and logistic regression) with
the deep learning models (LSTM and feed forward neural network). On test-set accuracy,
logistic regression performed best (89.53%), followed by the LSTM (87.48%), Naive Bayes
(86.76%), and the feed forward neural network (84.80%). This comparison gives useful insight
into the strengths and weaknesses of each model type and helps in choosing the most
appropriate technique for sentiment classification tasks.

Conclusion
Traditional machine learning and deep learning approaches were applied to the sentiment
analysis problem. The resulting models can help explain audience sentiment towards movies and
thus offer actionable insights that stakeholders in the film industry can use to make
informed decisions.

The project leverages NLP techniques to extract useful information from text, enabling
stakeholders to gain deeper insight into audience preferences, trends, and sentiments. The
models developed here provide a scalable and efficient way to analyze large volumes of movie
reviews, which supports data-driven decisions and, more generally, a better movie-watching
experience (Topal, 2016).

References
1. Topal, K. and Ozsoyoglu, G., 2016, August. Movie review analysis: Emotion analysis of
IMDb movie reviews. In 2016 IEEE/ACM International Conference on Advances in
Social Networks Analysis and Mining (ASONAM) (pp. 1170-1176). IEEE.
2. Liu, B., 2022. Sentiment analysis and opinion mining. Springer Nature.
3. Holsapple, C., Hsiao, S.H. and Pakath, R., 2014. Business social media analytics:
Definition, benefits, and challenges.
4. Dahir, U.M. and Alkindy, F.K., 2023. Utilizing machine learning for sentiment analysis
of IMDB movie review data. International Journal of Engineering Trends and
Technology, 71(5), pp.18-26.
5. Ruus, R. and Sharma, R., 2019, November. Predicting Movies’ Box office result-A large
scale study across Hollywood and Bollywood. In International Conference on Complex
Networks and Their Applications (pp. 982-994). Cham: Springer International Publishing.
6. Lopez, B. and Sumba, X., 2019. IMDb sentiment analysis.
7. Harish, B.S., Kumar, K. and Darshan, H.K., 2019. Sentiment analysis on IMDb movie
reviews using hybrid feature extraction method.
8. Singh, A., Kulkarni, C. and Ayan, N.A., 2022. Sentiment Analysis of IMDB Movie
Reviews.
9. Shaukat, Z., Zulfiqar, A.A., Xiao, C., Azeem, M. and Mahmood, T., 2020. Sentiment
analysis on IMDB using lexicon and neural networks. SN Applied Sciences, 2, pp.1-10.
10. Adam, N.L., Rosli, N.H. and Soh, S.C., 2021, September. Sentiment analysis on movie
review using Naïve Bayes. In 2021 2nd international conference on artificial intelligence
and data sciences (AiDAS) (pp. 1-6). IEEE.
11. Massarotto, G. and Ittoo, A., 2021. Gleaning insight from antitrust cases using machine
learning. Stanford Computational Antitrust, 1.
12. Ghaddar, B. and Naoum-Sawaya, J., 2018. High dimensional data classification and
feature selection using support vector machines. European Journal of Operational
Research, 265(3), pp.993-1004.
13. Genuer, R., Poggi, J.M., Tuleau-Malot, C. and Villa-Vialaneix, N., 2017. Random forests
for big data. Big Data Research, 9, pp.28-46.
14. Sherstinsky, A., 2020. Fundamentals of recurrent neural network (RNN) and long short-
term memory (LSTM) network. Physica D: Nonlinear Phenomena, 404, p.132306.
15. Emmert-Streib, F., Yang, Z., Feng, H., Tripathi, S. and Dehmer, M., 2020. An
introductory review of deep learning for prediction models with big data. Frontiers in
Artificial Intelligence, 3, p.4.
16. Gruetzemacher, R. and Paradice, D., 2022. Deep transfer learning & beyond:
Transformer language models in information systems research. ACM Computing Surveys
(CSUR), 54(10s), pp.1-35.
