A contest of sentiment analysis: k-nearest neighbor versus neural network
Corresponding Author:
Fachrul Kurniawan
Department of Informatics Engineering, Faculty of Science and Technology
Universitas Islam Negeri Maulana Malik Ibrahim
St. Gajayana 50, Malang City, East Java, Indonesia
Email: [email protected]
1. INTRODUCTION
Community discourse shapes public perceptions and policy frameworks, especially on contentious
topics such as Islamophobia that provoke heated debate across social media platforms [1]. Devising sound
policy on such issues requires a careful methodology for gauging public sentiment, which in turn depends on
text-mining technologies that can identify and classify discourse precisely [2]. Applied well, such
technologies can provide essential insight into the nature of public opinion.
A further contribution of this study is the use of machine learning and deep learning algorithms to
classify conversations as carrying positive or negative sentiment [3]. Machine learning improves system
performance by acquiring knowledge from labeled training datasets [4], [5]. Deep learning is a subfield of
machine learning that goes beyond the limitations of traditional approaches by loosely modeling the operation
of the human brain, which lets it handle complex data types efficiently. These developments offer a more
nuanced understanding of community perceptions.
Deep learning, as introduced by Geoffrey Hinton in 2006, shifted the dominant paradigm in machine
learning by automating feature engineering and improving performance across a wide range of applications,
including speech analysis, text classification, and image recognition [6], [7]. The architectures it covers,
including the basic neural network (NN), the convolutional neural network (CNN), and the recurrent neural
network (RNN), have proved their worth in both supervised and unsupervised learning scenarios [8]. This
breadth underlines the potential of deep learning to change how we analyze and comprehend digital
communication.
By contrast, traditional machine learning algorithms such as decision trees, random forests, support
vector machines (SVM), and naïve Bayes all require features to be predefined manually and are programmed
for specific tasks [9]–[12]. This study considers two algorithms: NN, representing deep learning, and
k-nearest neighbor (K-NN), representing traditional machine learning. NN models mimic the function of human
neural systems by interpreting stimuli to generate actionable responses [13], [14]. K-NN, in contrast, suits
pattern identification and classification tasks because it classifies data by the proximity of nearby data
points to one another [15]–[19]. Comparing the two yields insight into the advantages and disadvantages of
each and into their influence on sentiment analysis (SA) approaches.
Apart from the underlying approaches, the research compares and critically evaluates the
classification performance of the NN and K-NN algorithms using the same processing phases. Through a
structured evaluation of each algorithm's advantages, disadvantages, and performance measures, the study
provides insight into their suitability for SA of discourse about Islamophobia. The ultimate objective is to
determine which of the two techniques, NN (deep learning) or K-NN (traditional machine learning), classifies
sentiment in online discussions more accurately and more quickly. The comparison is also intended to improve
our understanding of the technological foundation of SA and how it informs public and policy responses.
2. METHODS
The research method was planned to address the aims of the study step by step. It follows a
chronological sequence: careful collection of data, processing of that data, and derivation of the key
research findings. The intent is to give researchers everything they need to carry out the study effectively.
Figure 1 illustrates this research method and serves as a guide to the study's workflow.
A contest of sentiment analysis: k-nearest neighbour versus neural network (Fachrul Kurniawan)
1628 ISSN: 2252-8938
training [26]. This iterative adjustment enhances the algorithm's ability to provide accurate outputs aligned
with expected values, which is crucial for optimizing classification outcomes in SA tasks [27].
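The iterative adjustment described above can be illustrated with a minimal sketch. It is not the network used in this study: it trains a single sigmoid neuron by gradient descent on hypothetical toy data, and the function name `train_neuron`, the two-feature inputs, and the number of epochs are assumptions made for illustration. Only the learning rate of 0.1 echoes the configuration reported in the results.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic activation mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_neuron(samples, labels, learning_rate=0.1, epochs=200):
    """Train one sigmoid neuron with per-sample gradient descent on log loss."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    losses = []  # mean log loss per epoch, to observe the adjustment
    for _ in range(epochs):
        total_loss = 0.0
        for x, y in zip(samples, labels):
            output = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            error = output - y  # gradient of the log loss w.r.t. the pre-activation
            # Move each weight against its gradient, scaled by the learning rate.
            weights = [w - learning_rate * error * xi for w, xi in zip(weights, x)]
            bias -= learning_rate * error
            p = min(max(output, 1e-12), 1 - 1e-12)  # clamp to avoid log(0)
            total_loss += -(y * math.log(p) + (1 - y) * math.log(1 - p))
        losses.append(total_loss / len(samples))
    return weights, bias, losses


# Hypothetical toy data: two features per document, 1 = positive, 0 = negative.
samples = [[2.0, 0.0], [1.5, 0.5], [0.0, 2.0], [0.5, 1.5]]
labels = [1, 1, 0, 0]
weights, bias, losses = train_neuron(samples, labels)
print(f"first-epoch loss {losses[0]:.3f}, last-epoch loss {losses[-1]:.3f}")
```

Each pass nudges the weights so that the outputs move closer to the expected labels, which is why the mean loss falls from the first epoch to the last.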
K-NN, on the other hand, is a proximity-based classification method. The algorithm classifies new
observations by their similarity to previously labeled examples [28]. K-NN is therefore a supervised
learning technique: it learns from prelabeled training datasets and assigns each new instance the class of
its nearest neighbors among the training observations. The main challenge in K-NN classification is choosing
k, the number of neighbors to be considered [29]. A large value of k smooths out noise but can over-smooth
the decision boundary, whereas a small k gives less smoothed predictions that are vulnerable to outliers.
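The effect of k can be seen in a small sketch. The code below is an illustrative majority-vote K-NN on hypothetical two-dimensional points, not the implementation evaluated in this paper; the function name `knn_predict`, the Euclidean distance, and the toy data (including one deliberately mislabeled point) are assumptions.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points closest to query."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical data: a small positive cluster, a larger negative cluster, and
# one mislabeled "negative" point sitting inside the positive cluster.
train_points = [
    (1.0, 1.0), (1.2, 0.8), (0.8, 1.2),              # positive cluster
    (3.0, 3.0), (3.2, 2.8), (2.8, 3.2), (3.1, 3.1),  # negative cluster
    (1.05, 1.0),                                     # outlier labeled negative
]
train_labels = ["positive"] * 3 + ["negative"] * 4 + ["negative"]

query = (1.1, 1.0)
for k in (1, 3, 8):
    print(k, knn_predict(train_points, train_labels, query, k=k))
```

With k = 1 the single nearest point is the outlier, so the query is labeled negative. With k = 3 the two genuine positive neighbors outvote it. With k = 8 every training point votes and the larger negative class wins regardless of locality, which is the over-smoothing effect described above.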
The effectiveness of both NN and K-NN, however, depends on the quality and properties of the
training data. The hierarchical structure of NN, combined with its learning process, makes it very good at
recognizing complex patterns and relations in the data. The transparency of K-NN and its reliance on local
similarity, on the other hand, make it very useful for some classification tasks, particularly those in
which the points form discrete or discontinuous clusters. Judging which algorithm suits the specific needs
or attributes of a given dataset therefore takes experience and a deep knowledge of each algorithm's merits
and demerits. In addition, NN and K-NN are only two of a broad range of classification algorithms that could
be used in SA, and each has relative merits that depend on factors such as the complexity, size, and
noisiness of the dataset. These techniques keep evolving with each step in machine learning and artificial
intelligence, and hybrid models or new methods may offer better performance and efficiency across the broad
applications and contexts of SA.
2.3. Evaluation
A confusion matrix is an essential instrument for evaluating the effectiveness of classification
algorithms or models when ground truth data is accessible [30]. It systematically illustrates the model's
predictive ability by distinguishing between correct and incorrect classifications. Table 1 illustrates the
core concepts within the confusion matrix. In Table 1, true positives (TP) and true negatives (TN) represent
correct predictions, where the model accurately identifies positive and negative instances, respectively.
Conversely, false positives (FP) and false negatives (FN) represent inaccurate predictions: an FP is a
negative instance misclassified as positive, and an FN is a positive instance misclassified as negative.
Classification performance metrics derived from the confusion matrix include accuracy, precision,
recall, and the F1-score, all expressed as percentages to quantify the model's effectiveness in
distinguishing between classes. Precision (1) is the ratio of correctly predicted positive cases to all
instances predicted as positive, highlighting the model's exactness in positive predictions. Recall (2) is
the fraction of correctly predicted positive instances out of all actual positive instances, emphasizing the
model's ability to capture all positives. Accuracy (3) is the ratio of correctly predicted instances (both
positive and negative) to the total number of instances, providing an overall assessment of the model's
correctness. The F1-score (4) is the harmonic mean of precision and recall, offering a balanced measure that
considers the contribution of both metrics to the model's performance.
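$$\text{Precision} = \frac{TP}{TP + FP} \times 100\% \tag{1}$$
$$\text{Recall} = \frac{TP}{TP + FN} \times 100\% \tag{2}$$
$$\text{Accuracy} = \frac{TP + TN}{TP + FN + FP + TN} \times 100\% \tag{3}$$
$$F\text{-}\text{Measure} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \times 100\% \tag{4}$$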
These metrics collectively evaluate how well a classification model performs, informing researchers and
practitioners about its strengths and areas needing improvement in various real-world applications of SA and
beyond.
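As a worked illustration of the four metrics, the sketch below computes them from confusion matrix counts. The function name `classification_metrics` and the example counts are hypothetical and are not taken from this study's tables. Precision and recall are kept as fractions until the final step, so each percentage is scaled only once.

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute precision, recall, accuracy, and F1 (as percentages) from counts."""
    precision = tp / (tp + fp)          # correct positives / predicted positives
    recall = tp / (tp + fn)             # correct positives / actual positives
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return {
        "precision": precision * 100,
        "recall": recall * 100,
        "accuracy": accuracy * 100,
        "f1": f1 * 100,
    }


# Hypothetical counts chosen only to make the arithmetic easy to follow.
metrics = classification_metrics(tp=40, tn=30, fp=10, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}%")
```

The sketch assumes at least one predicted positive and one actual positive; a production version would need to guard against division by zero when either count is empty.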
Table 2 shows that the maximum accuracy attained is 0.78, for a configuration of 10 nodes and a
learning rate of 0.1. The chosen test outcome is then assessed with a confusion matrix. The confusion matrix
results below were obtained with a learning rate of 0.1, ten nodes, and a partition of 80% training data and
20% testing data. Figure 3 shows the results of NN.
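An 80%/20% partition of this kind can be sketched as follows. This is a generic shuffle-and-cut split, assuming a fixed random seed for reproducibility; the function name `train_test_split`, the seed, and the placeholder documents are hypothetical, and the paper does not state how its own partition was drawn.

```python
import random


def train_test_split(items, train_fraction=0.8, seed=42):
    """Shuffle a copy of items and cut it into training and testing parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # local generator, global state untouched
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Ten placeholder document ids stand in for the labeled posts.
documents = list(range(10))
train_docs, test_docs = train_test_split(documents)
print(len(train_docs), "training items,", len(test_docs), "testing items")
```

Shuffling before cutting avoids a split that merely follows the collection order, which matters when posts were gathered chronologically or by topic.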
The confusion matrix shows 180 correct predictions of the negative class. This result indicates the
algorithm's proficiency in accurately predicting positive and negative classes. The metrics, including
accuracy, precision, recall, and F1-score, are then computed manually as in (5)-(8).
$$\text{Accuracy} = \frac{180 + 133}{180 + 133 + 37 + 50} \times 100\% = 0.7825 \times 100\% = 78.25\% \tag{7}$$
Classification with the K-NN algorithm assigns each item the class that occurs most often among a
specified number of its nearest neighbors. In this investigation, the number of neighbors is three. The data
must first be weighted, since the weights are the basis for determining the nearest neighbors. The
performance of the method is then evaluated with a confusion matrix to obtain precise measurement values for
the system. The measurements for the K-NN algorithm are shown in Figure 4.
Figure 4 presents the measurements for the K-NN algorithm. From them, a confusion matrix is
extracted and used to calculate accuracy, precision, recall, and F1-score with the formulae presented in the
previous section. The resulting values are given in (9)-(12).
$$\text{Accuracy} = \frac{161 + 124}{161 + 124 + 55 + 60} \times 100\% = 0.7125 \times 100\% \approx 71\% \tag{11}$$
Our results show that NN and K-NN can classify the sentiment in Indonesian public health discussions
well. The NN model reaches its best accuracy of 78% with ten nodes and a learning rate of 0.1. The K-NN
algorithm also performs reasonably well, with an accuracy of 71% using three nearest neighbors. This implies
that although NN performs best in terms of precision and recall, K-NN remains effective at proximity-based
classification tasks.
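The two headline accuracies can be rechecked directly from the counts that appear in the accuracy equations. The short sketch below does only that arithmetic; the variable names are illustrative, and the counts are grouped simply as correct and incorrect predictions because accuracy does not depend on which class each count belongs to.

```python
def accuracy(correct: int, incorrect: int) -> float:
    """Fraction of correct predictions among all predictions."""
    return correct / (correct + incorrect)


# Counts as they appear in the accuracy equations for each model.
nn_accuracy = accuracy(correct=180 + 133, incorrect=37 + 50)
knn_accuracy = accuracy(correct=161 + 124, incorrect=55 + 60)
gap_points = round((nn_accuracy - knn_accuracy) * 100, 2)

print(f"NN accuracy:   {nn_accuracy:.2%}")
print(f"K-NN accuracy: {knn_accuracy:.2%}")
print(f"Difference:    {gap_points} percentage points")
```

Both test sets contain 400 instances, so the two accuracies are directly comparable, and NN leads K-NN by 7 percentage points.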
Table 3 shows the classification results of both methods, NN and K-NN. Each method was run on 10
different data samples to observe a variety of classifications. Table 3 also shows that the classification
results from each algorithm agree with the prelabeled training data provided by the expert. Although the
metrics from the experiments are not very high, the classification results are consistently in line with the
training data, which indicates that both algorithms were implemented effectively.
Our findings concur with the available literature that machine learning algorithms play a crucial
role in SA. The study illustrates that each approach offers distinct advantages, with both algorithms
attaining respectable classification accuracies: 78% for NN and 71% for K-NN. The NN model is adaptable and
can change its weights based on the subtleties of the training data; hence, it suits the intricate sentiment
patterns of health-related discussions [31]. In contrast, the simplicity of K-NN and its reliance on nearest
neighbors make it a more straightforward approach to initial sentiment assessment, which is valuable for
faster analysis of large volumes of data [32].
Comparing the obtained results with previous studies gives a nuanced view of the approaches applied
in the field of SA. Improved performance in real applications may not correspond to an increase in the
classification metrics. Because the NN model can handle complex patterns, it promises deeper SA, especially
in fields that place high demands on contextual understanding. The efficiency of K-NN in proximity-based
classification, on the other hand, underlines its usefulness for rapid sentiment estimation, though it is
less sensitive to subtle shifts in sentiment.
The present study developed a comprehensive dataset from one social media platform over a fixed
period. It is, nevertheless, a very specific dataset on Indonesian health discourse on Facebook, which limits
its generalisability across platforms and geographies. Future research should expand these datasets to a
diverse set of social media sources and languages in order to validate findings across broader contexts.
These results indicate the need to continue exploring other SA methods and to extend the data to
multilingual and multicultural sources. Advanced natural language processing (NLP) techniques may be
integrated into future studies to further enhance sentiment classification accuracy across platforms. Such
work could also describe temporal trends and dynamic shifts in how the public views public health.
Thus, our study shows that machine learning algorithms such as NN and K-NN are feasible tools for
performing SA on public health discourse on social media. The findings give insight into the application of
these algorithms to understanding public perception and sentiment, and can thus inform targeted health
communication strategies. More diversified datasets and further advances in analysis techniques will make SA
of health-related social media content more robust in future research.
4. CONCLUSION
This study concludes that, in classifying public discourse about Islamophobia, deep learning
achieves higher accuracy than traditional machine learning. Experimental results indicate a 71% accuracy
rate for machine learning (K-NN) and a 78% accuracy rate for deep learning (NN), a difference of 7
percentage points in favor of deep learning. As explained in the preceding sections, deep learning is an
advanced development of machine learning techniques. This investigation supports the notion that deep
learning yields more accurate outcomes than traditional machine learning, as evidenced by the accuracy rates
of the two models. Consequently, deep learning methods are advisable for text-based classification tasks
that require high accuracy.
ACKNOWLEDGMENTS
The authors thank Universitas Islam Negeri Maulana Malik Ibrahim Malang for supporting this work.
REFERENCES
[1] Y. A. Ahmed, M. N. Ahmad, N. Ahmad, and N. H. Zakaria, “Social media for knowledge-sharing: a systematic literature review,”
Telematics and Informatics, vol. 37, pp. 72–112, Apr. 2019, doi: 10.1016/j.tele.2018.01.015.
[2] Z.-H. Zhou, Machine learning. Springer Nature, 2021, doi: 10.1007/978-981-15-1967-3.
[3] J. Huyan, W. Li, S. Tighe, Z. Xu, and J. Zhai, “CrackU‐net: a novel deep convolutional neural network for pixelwise pavement
crack detection,” Structural Control and Health Monitoring, vol. 27, no. 8, Aug. 2020, doi: 10.1002/stc.2551.
[4] E. H. Houssein, A. Hammad, and A. A. Ali, “Human emotion recognition from EEG-based brain–computer interface using machine
learning: a comprehensive review,” Neural Computing and Applications, vol. 34, no. 15, pp. 12527–12557, Aug. 2022, doi:
10.1007/s00521-022-07292-4.
[5] S. Nurmaini et al., “An automated ECG beat classification system using deep neural networks with an unsupervised feature
extraction technique,” Applied Sciences, vol. 9, no. 14, Jul. 2019, doi: 10.3390/app9142921.
[6] G. Kocher and G. Kumar, “Machine learning and deep learning methods for intrusion detection systems: recent developments and
challenges,” Soft Computing, vol. 25, no. 15, pp. 9731–9763, Aug. 2021, doi: 10.1007/s00500-021-05893-0.
[7] P. Sharma, S. Jain, S. Gupta, and V. Chamola, “Role of machine learning and deep learning in securing 5G-driven industrial IoT
applications,” Ad Hoc Networks, vol. 123, Dec. 2021, doi: 10.1016/j.adhoc.2021.102685.
[8] K. Korfmann, O. E. Gaggiotti, and M. Fumagalli, “Deep learning in population genetics,” Genome Biology and Evolution, vol. 15,
no. 2, Feb. 2023, doi: 10.1093/gbe/evad008.
[9] G. Battineni, N. Chintalapudi, and F. Amenta, “Machine learning in medicine: performance calculation of dementia prediction by
support vector machines (SVM),” Informatics in Medicine Unlocked, vol. 16, 2019, doi: 10.1016/j.imu.2019.100200.
[10] A. All Tanvir, E. M. Mahir, S. Akhter, and M. R. Huq, “Detecting fake news using machine learning and deep learning algorithms,”
in 2019 7th International Conference on Smart Computing & Communications (ICSCC), Jun. 2019, pp. 1–5, doi:
10.1109/ICSCC.2019.8843612.
[11] B. He, D. J. Armaghani, and S. H. Lai, “Assessment of tunnel blasting-induced overbreak: a novel metaheuristic-based random
forest approach,” Tunnelling and Underground Space Technology, vol. 133, Mar. 2023, doi: 10.1016/j.tust.2022.104979.
[12] A. Coatrini-Soares et al., “Microfluidic e-tongue to diagnose bovine mastitis with milk samples using machine learning with
decision tree models,” Chemical Engineering Journal, vol. 451, Jan. 2023, doi: 10.1016/j.cej.2022.138523.
[13] J. Mei, E. Muller, and S. Ramaswamy, “Informing deep neural networks by multiscale principles of neuromodulatory systems,”
Trends in Neurosciences, vol. 45, no. 3, pp. 237–250, Mar. 2022, doi: 10.1016/j.tins.2021.12.008.
[14] C. Sun, X. Liu, Q. Jiang, X. Ye, X. Zhu, and R.-W. Li, “Emerging electrolyte-gated transistors for neuromorphic perception,”
Science and Technology of Advanced Materials, vol. 24, no. 1, Dec. 2023, doi: 10.1080/14686996.2022.2162325.
[15] S. M. Ayyad, A. I. Saleh, and L. M. Labib, “Gene expression cancer classification using modified k-nearest neighbors technique,”
Biosystems, vol. 176, pp. 41–51, Feb. 2019, doi: 10.1016/j.biosystems.2018.12.009.
[16] H. Saadatfar, S. Khosravi, J. H. Joloudari, A. Mosavi, and S. Shamshirband, “A new k-nearest neighbors classifier for big data
based on efficient data pruning,” Mathematics, vol. 8, no. 2, Feb. 2020, doi: 10.3390/math8020286.
[17] D. A. Anggoro and D. Novitaningrum, “Comparison of accuracy level of support vector machine (SVM) and artificial neural
network (ANN) algorithms in predicting diabetes mellitus disease,” ICIC Express Letters, vol. 15, no. 1, pp. 9–18, 2021, doi:
10.24507/icicel.15.01.9.
[18] S. Uddin, I. Haque, H. Lu, M. A. Moni, and E. Gide, “Comparative performance analysis of k-nearest neighbour (KNN) algorithm
and its different variants for disease prediction,” Scientific Reports, vol. 12, no. 1, Apr. 2022, doi: 10.1038/s41598-022-10358-x.
[19] S. R. S. Chakravarthy, N. Bharanidharan, and H. Rajaguru, “Deep learning-based metaheuristic weighted k-nearest neighbor
algorithm for the severity classification of breast cancer,” IRBM, vol. 44, no. 3, Jun. 2023, doi: 10.1016/j.irbm.2022.100749.
[20] B. A. H. Murshed, S. Mallappa, O. A. M. Ghaleb, and H. D. E. Al-ariki, “Efficient twitter data cleansing model for data analysis of
the pandemic tweets,” Studies in Systems, Decision and Control, vol. 348, pp. 93–114, 2021, doi: 10.1007/978-3-030-67716-9_7.
[21] B. Zhong et al., “Deep learning-based extraction of construction procedural constraints from construction regulations,” Advanced
Engineering Informatics, vol. 43, Jan. 2020, doi: 10.1016/j.aei.2019.101003.
[22] R. K. Dey and A. K. Das, “Modified term frequency-inverse document frequency based deep hybrid framework for sentiment
analysis,” Multimedia Tools and Applications, vol. 82, no. 21, pp. 32967–32990, Sep. 2023, doi: 10.1007/s11042-023-14653-1.
[23] K. Wanjale, A. Chitre, P. Patil, R. Parmar, S. Raka, and S. Ghattuwar, “A comprehensive survey of stemming methods in
information retrieval,” International Journal of Multidisciplinary Engineering in Current Research, vol. 7, no. 10, pp. 15–22, 2022.
[24] A. Xiong, D. Liu, H. Tian, Z. Liu, P. Yu, and M. Kadoch, “News keyword extraction algorithm based on semantic clustering and
word graph model,” Tsinghua Science and Technology, vol. 26, no. 6, pp. 886–893, Dec. 2021, doi: 10.26599/TST.2020.9010051.
[25] J. Yang et al., “Neuromorphic engineering: from biological to spike‐based hardware nervous systems,” Advanced Materials,
vol. 32, no. 52, Dec. 2020, doi: 10.1002/adma.202003610.
[26] A. Goldstein et al., “Shared computational principles for language processing in humans and deep language models,” Nature
Neuroscience, vol. 25, no. 3, pp. 369–380, Mar. 2022, doi: 10.1038/s41593-022-01026-4.
[27] H. Dagdougui, F. Bagheri, H. Le, and L. Dessaint, “Neural network model for short-term and very-short-term load forecasting in
district buildings,” Energy and Buildings, vol. 203, Nov. 2019, doi: 10.1016/j.enbuild.2019.109408.
[28] L. Jiao, X. Geng, and Q. Pan, “BP kNN: k-nearest neighbor classifier with pairwise distance metrics and belief function theory,”
IEEE Access, vol. 7, pp. 48935–48947, 2019, doi: 10.1109/ACCESS.2019.2909752.
[29] M. R. Romadhon and F. Kurniawan, “A comparison of Naive Bayes methods, logistic regression and K-NN for predicting healing
of COVID-19 patients in Indonesia,” in 2021 3rd East Indonesia Conference on Computer and Information Technology
(EIConCIT), pp. 41–44, Apr. 2021, doi: 10.1109/EIConCIT50028.2021.9431845.
[30] D. Chicco, N. Tötsch, and G. Jurman, “The Matthews correlation coefficient (MCC) is more reliable than balanced accuracy,
bookmaker informedness, and markedness in two-class confusion matrix evaluation,” BioData Mining, vol. 14, no. 1, Feb. 2021,
doi: 10.1186/s13040-021-00244-z.
[31] S. Thangamayan, S. N. Jagdale, T. R. K. Lakshmi, J. G. Thatipudi, P. K, and B. Khan, “Artificial intelligence oriented user sentiment
evaluation system on social networks using modified deep learning principles,” in 2024 5th International Conference on Mobile
Computing and Sustainable Informatics (ICMCSI), pp. 272–279, Jan. 2024, doi: 10.1109/ICMCSI61536.2024.00046.
[32] Rizkiansyah, A. Herliana, D. P. Alamsyah, and T. F. Tjoe, “Comparison of the k-nearest neighbor and decision tree algorithm to
the sentiment analysis of investment applications users in Indonesia,” in 2022 Seventh International Conference on Informatics and
Computing (ICIC), pp. 01–06, Dec. 2022, doi: 10.1109/ICIC56845.2022.10006970.