NLP Final
Table of Contents
Introduction
Background Review
SMART Objectives
Dataset Description
Traditional Models
Implementations
Naive Bayes
Logistic Regression
Conclusion
References
Introduction
One of the major challenges in Natural Language Processing (NLP) is accurately quantifying
sentiment in textual data. Human language is nuanced and context-dependent: the very same
words may take on very different emotional coloring depending on the context and the tone in
which they are used (Liu, 2016). Sentiment analysis is the part of NLP that addresses this
problem, using algorithms and linguistic analysis to classify text into predefined sentiment
classes: positive, negative, or neutral. Despite progress in the field, it remains difficult to
detect subtleties in sentiment, especially in subjective domains, where emotion and cultural
context are inseparable (Topal, 2016).
Movie reviews provide a rich and meaningful domain of study for sentiment analysis. More than
most other forms of textual data, movie reviews capture a diversity of emotions, opinions, and
subjective experiences, which makes them an ideal candidate for sentiment analysis (Liu, 2016).
Film also spans genres of all kinds, eras, and cultural contexts, which makes the task of
sentiment analysis more difficult. Using IMDB data, this paper aims to contribute to the field
of NLP by surfacing the collective sentiment hidden within user-generated movie reviews
(Holsapple, 2014). Through these analyses, we aim to give both the film-watching public and
filmmakers a sense of the worth of the movies being made, nurturing a mutual relationship that
enriches both culture and the practice of storytelling.
Background Review
Sentiment analysis in the film industry has prompted a number of studies, bringing varied
methodologies and insights to the table. Dahir and Alkindy analyze IMDB movie reviews using
machine learning algorithms, including logistic regression. Their experiment shows that logistic
regression performs well with TF-IDF as the feature extraction method. This is a supervised
learning method and, though very precise, it may be less easily scalable (Dahir, 2023).
Similarly, Ruus and Sharma undertook the task of predicting movie revenue from Twitter and
critic data, for which they saw room for the use of random forest models. However, their
approach oversimplifies the nuanced sentiment in reviews rather than capturing it, which could
be a challenge in making accurate revenue predictions (Ruus, 2019).
Lopez and Sumba address sentiment analysis on IMDB data, suggesting hybrid feature extraction
techniques to improve classification accuracy. Although representing sentiment in this
multilayered form gives the model flexibility, the resulting increase in model complexity
involves trade-offs that are worth probing further (Lopez, 2019).
Harish and Kumar proposed a hybrid feature-extraction method, in which lexicon features are
used along with machine learning features, as the route to the most accurate sentiment
classifier. However, concern remains about the trade-off between accuracy and computational
efficiency, which calls for optimization strategies (Harish, 2019).
Singh and Kulkarni studied IMDB movie review classification and concluded that logistic
regression gave the best accuracy (Singh, 2022). A fuller explanation of the feature selection
method would make the model and its generalization easier to interpret. Taken together, these
studies advance sentiment analysis methodology within the film industry, underscoring the
importance of context, feature selection, and scalability for accurately capturing audience
sentiment and predicting movie success.
SMART Objectives
Specific: Develop a sentiment analysis system that classifies IMDB movie reviews as either
positive or negative with high accuracy.
Measurable: Model performance will be measured using metrics that quantify sentiment
classification quality, including accuracy, F1, and AUC-ROC, among others.
Achievable: Deep learning models (for example, LSTM and feed-forward neural networks) and
traditional machine learning models (such as Naïve Bayes and logistic regression) will be
implemented and compared. Given the success of these models in sentiment analysis tasks,
this goal is attainable within the project.
Relevant: The aim is to give stakeholders in the movie industry relevant findings from the
results of the sentiment analysis. By classifying movie reviews, the sentiment analysis
system can guide decisions about movie production, marketing, and audience engagement.
Time-bound: Model development, training, and evaluation will follow a fixed schedule so that
results are delivered on time and project deadlines are met.
Dataset Description
For this project, the dataset used is a set of IMDB movie reviews, each labeled with either a
positive or a negative sentiment (Adam, 2021). In total, there are 39,723 reviews. The dataset
is well structured in tabular form, and the distribution between positive and negative reviews
is quite balanced, making it a good representative sample for the task of sentiment analysis.
Figure 1: Dataset
Strengths:
Organized and labeled dataset: The data is in an organized and clear format, which makes
the work easier for both the analyst and the model developer.
Balanced distribution of sentiment labels: The dataset contains roughly equal proportions
of positive and negative reviews, which balances the training and evaluation of the
sentiment analysis models.
Sufficient size for effective model training: The dataset contains close to 40,000 reviews
(39,723), a size normally considered large enough to train robust sentiment analysis
models that capture different linguistic patterns.
Weaknesses:
Presence of duplicated records that need pre-processing: Duplicated records in the
dataset can introduce noise and bias into the analysis. A preprocessing stage is needed
to remove them and ensure data quality.
Limited contextual information: The dataset does not include contextual details such as
release date, genre, and reviewer demographics, all of which could help in interpreting
sentiment. Such information could support deeper analysis and more precise sentiment
classification.
Text Preprocessing
Text preprocessing aims to remove noise, standardize format, and improve quality, thus
getting the data ready for analysis and modeling. It removes special characters, URLs,
and stop-words, and applies lemmatization so that word representation is uniform, which
also aids dimensionality reduction, as shown in Figure 4.
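The cleaning steps described above can be sketched roughly as follows. This is a minimal illustration, not the pipeline used in the project: the stop-word list is a small hand-picked assumption, the function name `preprocess` is hypothetical, and lemmatization is only marked as a comment because it requires an external library and language resources.

```python
import re

# Illustrative stop-word list (an assumption; a real pipeline would use a fuller list).
STOP_WORDS = {"a", "an", "the", "is", "was", "it", "this", "and", "of", "to", "in"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_ALPHA_RE = re.compile(r"[^a-z\s]")


def preprocess(text):
    """Lowercase, strip URLs and special characters, and drop stop-words."""
    text = text.lower()
    text = URL_RE.sub(" ", text)        # remove URLs first, before punctuation is stripped
    text = NON_ALPHA_RE.sub(" ", text)  # remove digits, punctuation, and other symbols
    tokens = [tok for tok in text.split() if tok not in STOP_WORDS]
    # Lemmatization would be applied to `tokens` at this point; it is omitted
    # here because it depends on an external library.
    return tokens


print(preprocess("This movie was GREAT!!! See https://example.com for more."))
```

URLs are removed before special characters so that the URL pattern can still match its punctuation.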
Baseline Establishment
Baseline performance establishes a point of reference for judging whether sentiment
analysis models improve over simple, naive methods. One commonly used baseline is the
majority-class classifier, which predicts the majority class for every instance in the
dataset.
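A majority-class baseline can be sketched in a few lines. The labels below are toy values for illustration only, not drawn from the IMDB dataset, and the function name `majority_class_baseline` is hypothetical.

```python
from collections import Counter


def majority_class_baseline(train_labels, test_labels):
    """Return the majority training label and its accuracy on the test labels."""
    majority_label, _ = Counter(train_labels).most_common(1)[0]
    correct = sum(1 for label in test_labels if label == majority_label)
    return majority_label, correct / len(test_labels)


# Toy labels for illustration only (not taken from the IMDB dataset).
train = ["positive", "negative", "positive", "positive", "negative"]
test = ["positive", "negative", "negative", "positive"]
label, accuracy = majority_class_baseline(train, test)
print(label, accuracy)
```

On a dataset with a balanced split between two classes, this baseline scores close to 50%, so any useful model should clearly exceed that figure.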
Traditional Models
1. Naive Bayes:
Comparison: We choose Naive Bayes because of its simplicity and very low
computational requirements, which make it attractive when processing very large
volumes of textual data, as is the case with IMDB movie reviews.
2. Logistic Regression:
Logistic regression models the probability that an instance belongs to a class
using the logistic function, and it is normally used as a linear classifier.
3. Support Vector Machines (SVM):
SVMs try to find the hyperplane that optimally separates the data points of
different classes in a high-dimensional space (Ghaddar, 2018).
4. Decision Trees:
Decision trees recursively split the feature space according to feature values to
produce a tree-like structure for classification.
Strengths: Decision trees are among the most interpretable model types, handle both
numerical and categorical data, and require little data preprocessing.
Comparison: Although they offer interpretability and ease of use, decision trees
may lack the predictive power needed to achieve good accuracy in sentiment analysis
of IMDB movie reviews.
5. Random Forests:
Random forest is an ensemble learning method that combines decision trees trained
on random subsamples of the data drawn with replacement, thus yielding greater
generalization and robustness (Genuer, 2017).
Strengths: Random forests are less prone to overfitting than single decision trees,
can be used effectively with high-dimensional data, and offer measures of feature
importance.
Comparison: Random forests are more accurate and robust than a single tree, and
hence should give better results in a task like sentiment analysis, where model
performance is very important.
Long Short-Term Memory (LSTM) Networks:
Strengths: LSTMs capture long-term dependencies, mitigate the vanishing gradient
problem, and are well suited to modeling variable-length sequences.
Weaknesses: LSTMs may struggle to capture fine variations in meaning within the
text, require careful hyperparameter tuning, and can be computationally costly.
Comparison: LSTM networks were chosen for their ability to model sequence data
effectively.
Gated Recurrent Units (GRU):
Strengths: GRUs can capture long-range dependencies with capability comparable to
LSTMs while being more computationally efficient and easier to train.
Weaknesses: On complex and difficult tasks, GRUs might fail to capture long-term
dependencies effectively.
Comparison: The performance of GRUs is more erratic in such cases. GRU networks
trade some effectiveness for efficiency, making them a good alternative for
sentiment analysis tasks when computational resources are limited.
Feed-Forward Neural Networks (FFNN):
Description: FFNNs have a deep architecture with multiple hidden layers, hence
they can capture complex patterns and relations expressed in textual data
(Emmert-Streib, 2020).
Weaknesses: FFNNs may be less successful than recurrent models such as LSTMs and
GRUs at capturing the sequential dependencies and contextual subtleties in IMDB
movie review data.
Comparison: While FFNNs afford a large degree of scalability and efficiency, they
are not expected to match recurrent models in identifying sequential dependencies
and capturing delicate sentiment patterns.
Convolutional Neural Networks (CNN):
Comparison: CNNs offer effective and scalable methods. However, they may be less
able than long short-term memory networks to capture the sequential dependencies
and nuanced sentiment patterns in IMDB movie reviews.
Implementations
Naive Bayes
Implementation: A Complement Naive Bayes classifier (Adam, 2021) was trained on
the dataset to perform sentiment classification. Hyperparameters of the classifier were
optimized using both Optuna and GridSearch techniques to enhance model performance.
Results: After hyperparameter tuning, the Naive Bayes classifier achieved a test-set
accuracy of 86.76%. This is the proportion of movie reviews correctly classified as
having either a positive or a negative sentiment.
Evaluation: The performance of the Naive Bayes classifier was measured with several
metrics, including accuracy, F1 score, the confusion matrix, and the ROC-AUC curve, as
shown in Figures 7, 8, and 9.
Figure 7: Classification report of Naive Bayes
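To illustrate the idea behind Complement Naive Bayes, the following is a minimal from-scratch sketch on toy data. It is not the implementation used in this project, which would normally rely on a library. The function names, the smoothing value `alpha=1.0`, and the example documents are all assumptions, and the optional weight normalization found in some variants is left out. Each class is scored using word statistics gathered from all the other classes, and the class whose complement fits the document worst is predicted.

```python
import math
from collections import Counter


def train_complement_nb(docs, labels, alpha=1.0):
    """Estimate per-class log weights from the word counts of all OTHER classes."""
    vocab = sorted({tok for doc in docs for tok in doc})
    weights = {}
    for c in sorted(set(labels)):
        # Count words in every document that does NOT belong to class c.
        comp_counts = Counter()
        for doc, label in zip(docs, labels):
            if label != c:
                comp_counts.update(doc)
        total = sum(comp_counts.values()) + alpha * len(vocab)
        weights[c] = {w: math.log((comp_counts[w] + alpha) / total) for w in vocab}
    return weights


def predict_complement_nb(weights, doc):
    """Pick the class whose complement explains the document worst."""
    scores = {
        c: sum(w_c[tok] for tok in doc if tok in w_c)  # unseen words are ignored
        for c, w_c in weights.items()
    }
    return min(scores, key=scores.get)


# Toy tokenized reviews for illustration only.
docs = [
    ["great", "film"],
    ["loved", "great", "acting"],
    ["boring", "film"],
    ["terrible", "boring", "plot"],
]
labels = ["positive", "positive", "negative", "negative"]

model = train_complement_nb(docs, labels)
print(predict_complement_nb(model, ["great", "acting"]))
print(predict_complement_nb(model, ["boring", "plot"]))
```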
Logistic Regression
Implementation: The sentiment classifier was implemented with logistic regression, a
commonly used linear classification algorithm. The model was trained on this dataset
and then used to predict the sentiment label of movie reviews.
Results: When tested, the logistic regression model achieved an accuracy of 89.53% with
an F1 score of 0.8953.
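The logistic function at the core of this classifier can be sketched as follows. The weights and bias here are hypothetical values chosen for illustration. They are not learned from the IMDB data, and the training step that would fit them is not shown.

```python
import math


def sigmoid(z):
    """Logistic function mapping any real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, features):
    """Probability of the positive class for one feature vector."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


# Hypothetical weights for two features (not learned from the IMDB data).
weights, bias = [1.5, -2.0], 0.0
p = predict_proba(weights, bias, [1.0, 0.0])
print("positive" if p >= 0.5 else "negative", round(p, 3))
```

The model is linear in the features, since the weighted sum `z` is computed first and the logistic function only converts it to a probability. A threshold of 0.5 on that probability corresponds to the decision boundary `z = 0`.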
The hyperparameters of the Naive Bayes classifier were optimized using both Optuna and
GridSearch techniques. The goal of the search was to find the configuration within the
hyperparameter space that gives the best performance on the sentiment classification problem.
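The exhaustive search performed by a grid search can be sketched as below. This is an illustration under stated assumptions rather than the tuning code used in the project: the parameter names `alpha` and `norm`, the grid values, and the scoring function `toy_score` are all hypothetical. In practice the scoring function would train the classifier with the given parameters and return a validation metric.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Try every combination in param_grid and return the best one with its score."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical stand-in for "train the classifier and return validation accuracy".
def toy_score(params):
    return 1.0 - abs(params["alpha"] - 0.5) - (0.1 if params["norm"] else 0.0)


grid = {"alpha": [0.1, 0.5, 1.0], "norm": [False, True]}
best_params, best_score = grid_search(grid, toy_score)
print(best_params, best_score)
```

Grid search evaluates every combination, so its cost grows with the product of the grid sizes. Optuna instead samples configurations rather than enumerating them all, which can reduce the cost when the search space is large.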
Evaluation Metrics
The models were evaluated using accuracy, F1 score, the confusion matrix, and the ROC-AUC
curve. Together, these metrics give a general view of how effective the models are at
sentiment classification. Accuracy measures the overall correctness of the model's
predictions, while the F1 score balances precision and recall and is particularly useful for
imbalanced datasets. The confusion matrix gives clear insight into the kinds of errors the
model makes, such as false positives and false negatives. The ROC-AUC curve evaluates the
model's ability to separate positive and negative sentiments at different threshold levels.
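As a rough illustration of how these metrics are computed, the sketch below derives them from toy labels and scores. The data and function names are assumptions made for the example, not results from this project. The area under the ROC curve is computed here as the fraction of positive/negative pairs in which the positive example receives the higher score, with ties counted as half.

```python
def confusion_counts(y_true, y_pred, positive="positive"):
    """Return (tp, fp, fn, tn) for a binary task."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive="positive"):
    """Accuracy, precision, recall, and F1 derived from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true, scores, positive="positive"):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and scores for illustration only.
y_true = ["positive", "positive", "positive", "negative", "negative", "negative"]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6]
y_pred = ["positive" if s >= 0.5 else "negative" for s in scores]

print(confusion_counts(y_true, y_pred))
print(classification_metrics(y_true, y_pred))
print(roc_auc(y_true, scores))
```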
Comparison of Models
We compared the traditional machine learning models with deep learning models such as LSTM
and feed-forward neural networks. Based on the performance metrics, we identify the best
approach for sentiment analysis on the IMDB movie reviews. The comparative study gives
useful insight into the strengths and weaknesses of each model type, helping to choose the
most appropriate technique for sentiment classification tasks.
Conclusion
Traditional machine learning and deep learning approaches have been used to address the
sentiment analysis problem. The developed models can help explain audience sentiment towards
movies and thereby offer stakeholders in the film industry actionable insights for making
informed decisions.
By leveraging NLP techniques to extract useful information from text, this work enables
stakeholders to gain deeper insight into audience preferences, trends, and sentiments. The
models developed within this project provide a scalable and efficient way to analyze a large
volume of movie reviews, supporting data-driven decisions and, more generally, a better
movie-watching experience (Topal, 2016).
References
1. Topal, K. and Ozsoyoglu, G., 2016, August. Movie review analysis: Emotion analysis of
IMDb movie reviews. In 2016 IEEE/ACM International Conference on Advances in
Social Networks Analysis and Mining (ASONAM) (pp. 1170-1176). IEEE.
2. Liu, B., 2022. Sentiment analysis and opinion mining. Springer Nature.
3. Holsapple, C., Hsiao, S.H. and Pakath, R., 2014. Business social media analytics:
Definition, benefits, and challenges.
4. Dahir, U.M. and Alkindy, F.K., 2023. Utilizing machine learning for sentiment analysis
of IMDB movie review data. International Journal of Engineering Trends and
Technology, 71(5), pp.18-26.
5. Ruus, R. and Sharma, R., 2019, November. Predicting Movies’ Box office result-A large
scale study across Hollywood and Bollywood. In International Conference on Complex
Networks and Their Applications (pp. 982-994). Cham: Springer International Publishing.
6. Lopez, B. and Sumba, X., 2019. IMDb sentiment analysis.
7. Harish, B.S., Kumar, K. and Darshan, H.K., 2019. Sentiment analysis on IMDb movie
reviews using hybrid feature extraction method.
8. Singh, A., Kulkarni, C. and Ayan, N.A., 2022. Sentiment Analysis of IMDB Movie
Reviews.
9. Shaukat, Z., Zulfiqar, A.A., Xiao, C., Azeem, M. and Mahmood, T., 2020. Sentiment
analysis on IMDB using lexicon and neural networks. SN Applied Sciences, 2, pp.1-10.
10. Adam, N.L., Rosli, N.H. and Soh, S.C., 2021, September. Sentiment analysis on movie
review using Naïve Bayes. In 2021 2nd international conference on artificial intelligence
and data sciences (AiDAS) (pp. 1-6). IEEE.
11. Massarotto, G. and Ittoo, A., 2021. Gleaning insight from antitrust cases using machine
learning. Stanford Computational Antitrust, 1.
12. Ghaddar, B. and Naoum-Sawaya, J., 2018. High dimensional data classification and
feature selection using support vector machines. European Journal of Operational
Research, 265(3), pp.993-1004.
13. Genuer, R., Poggi, J.M., Tuleau-Malot, C. and Villa-Vialaneix, N., 2017. Random forests
for big data. Big Data Research, 9, pp.28-46.
14. Sherstinsky, A., 2020. Fundamentals of recurrent neural network (RNN) and long short-
term memory (LSTM) network. Physica D: Nonlinear Phenomena, 404, p.132306.
15. Emmert-Streib, F., Yang, Z., Feng, H., Tripathi, S. and Dehmer, M., 2020. An
introductory review of deep learning for prediction models with big data. Frontiers in
Artificial Intelligence, 3, p.4.
16. Gruetzemacher, R. and Paradice, D., 2022. Deep transfer learning & beyond:
Transformer language models in information systems research. ACM Computing Surveys
(CSUR), 54(10s), pp.1-35.