Contextual Bias Study - Mohan Shrivastava
1. INTRODUCTION:
Consider two headlines describing the same event: "Government raises taxes to improve healthcare" and "Government burdens citizens with another tax increase." Although both headlines refer to the same action (a tax increase), the first presents it in a positive light, suggesting it serves a good cause (improving healthcare), while the second frames it as a negative act that burdens citizens. This subtle difference in framing reflects contextual bias. Detecting such bias is challenging because it relies on understanding the broader context, including tone, subtle implications, and connotations, rather than overt expressions of bias. Pre-trained models in Natural Language Processing (NLP), such as BERT, RoBERTa, and GPT, have increasingly been used to capture such subtle language patterns, offering promising approaches to detecting contextual bias.
1.1 Objectives:
1. To study the effectiveness of pre-trained NLP models such as BERT and GPT in identifying contextual bias in news articles, and to compare their performance across a dataset of news articles.
2. To improve on existing models so that they can analyze text and detect bias based on context rather than explicit sentiment.
1.2 Motivation:
Contextual bias in news articles is a subtle yet powerful force that shapes
public opinion by presenting facts in a particular light, often without
appearing overtly biased. Unlike explicit bias—which is easy to spot
through emotionally charged words or outright distortions—contextual
bias is difficult to capture. It occurs due to careful choice of words, the
structure of sentences, or the selective emphasis on certain aspects of a
story.
In an era where fake news spreads rapidly and the media, the "fourth pillar of democracy," is often openly biased, news coverage plays a major role in shaping political perceptions. In a democracy, where everyone should have access to unbiased news and then form opinions according to their own conscience, spotting these subtle forms of bias has become crucial.
Pre-trained models like BERT, RoBERTa, and GPT have revolutionized NLP
by understanding context at a deep level. These models have been
applied in various tasks such as sentiment analysis and misinformation
identification. However, adapting these models to identify subtle
contextual framing is still underexplored (Menzner & Leidner, 2024).
2. METHODOLOGY:
1. Data Collection
News articles are sourced from publicly available datasets such as
the News Bias Dataset (Baly et al., 2018) and supplemented with
additional manually labeled data to enhance contextual bias
annotations.
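As an illustration, the following is a minimal loading sketch in Python; the file name news_bias.csv and the column names text and label are placeholder assumptions for whatever export of the combined corpus is actually used.

```python
import pandas as pd

# Sketch only: "news_bias.csv" and the columns "text"/"label" are
# placeholder names for the combined corpus (News Bias Dataset plus
# manually labeled articles).
df = pd.read_csv("news_bias.csv")
df = df[["text", "label"]].dropna()

# Map the annotation scheme used in step 3 to integer ids.
label2id = {"neutral": 0, "positive": 1, "negative": 2}
df["label_id"] = df["label"].map(label2id)

# Inspect the class balance before training.
print(df["label"].value_counts())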
2. Preprocessing
Metadata such as publication date and source is removed so that the model focuses on article content, and the remaining text is tokenized using pre-trained tokenizers such as Hugging Face's Tokenizers.
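A minimal tokenization sketch is shown below, assuming the roberta-base tokenizer; the two example texts are the illustrative headlines from the introduction.

```python
from transformers import AutoTokenizer

# Sketch of the preprocessing step: only the article body is encoded;
# metadata is assumed to have been stripped already.
tokenizer = AutoTokenizer.from_pretrained("roberta-base")

articles = [
    "Government raises taxes to improve healthcare.",
    "Government burdens citizens with another tax increase.",
]

# Truncate/pad to a fixed length so batches have a uniform shape.
encodings = tokenizer(
    articles,
    truncation=True,
    padding="max_length",
    max_length=256,
    return_tensors="pt",
)

print(encodings["input_ids"].shape)  # torch.Size([2, 256])
```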
3. Model Training
We fine-tune a pre-trained language model, such
as RoBERTa or GPT, on annotated datasets to classify contextual
bias. This involves supervised learning with labeled data containing
bias categories such as "neutral," "positive," and "negative."
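A condensed fine-tuning sketch using the Hugging Face Trainer follows; the two example texts, the output directory bias-model, and the hyperparameters are illustrative placeholders, with real training data coming from the annotated corpus of step 1.

```python
import torch
from torch.utils.data import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Placeholder examples; in the study these come from the annotated corpus.
texts = [
    "Government raises taxes to improve healthcare.",
    "Government burdens citizens with another tax increase.",
]
labels = [1, 2]  # 0 = neutral, 1 = positive, 2 = negative

tokenizer = AutoTokenizer.from_pretrained("roberta-base")

class BiasDataset(Dataset):
    """Wraps tokenized articles and bias labels for the Trainer."""
    def __init__(self, texts, labels):
        self.enc = tokenizer(texts, truncation=True, padding=True)
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[i])
        return item

# Three output classes: neutral / positive / negative framing.
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=3
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bias-model", num_train_epochs=1),
    train_dataset=BiasDataset(texts, labels),
)
trainer.train()

# Save model and tokenizer together for the evaluation step.
trainer.save_model("bias-model")
tokenizer.save_pretrained("bias-model")
```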
4. Evaluation
Performance is assessed with metrics such as F1-score and accuracy. Additionally, SHAP (SHapley Additive exPlanations) values help explain individual model predictions.
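The sketch below illustrates this evaluation step, assuming a held-out test split and the bias-model checkpoint saved in the training sketch; the gold labels and predictions here are dummy values, and SHAP's support for transformers text-classification pipelines is used for the explanations.

```python
import shap
from sklearn.metrics import accuracy_score, f1_score
from transformers import pipeline

# Dummy gold labels and predictions standing in for a real test split.
golds = [0, 1, 2, 0, 2]
preds = [0, 1, 1, 0, 2]
print("accuracy:", accuracy_score(golds, preds))
print("macro F1:", f1_score(golds, preds, average="macro"))

# SHAP token-level attributions: which words push the classifier toward
# a given bias label. "bias-model" is the checkpoint saved in the
# training sketch above; top_k=None returns scores for all classes,
# which the explainer needs.
clf = pipeline("text-classification", model="bias-model", top_k=None)
explainer = shap.Explainer(clf)
shap_values = explainer(
    ["Government burdens citizens with another tax increase."]
)
print(shap_values)
```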