
Using pre-trained models to detect contextual bias from news articles

1. INTRODUCTION:

From the article by Krieger and Spinde, I quote: “Media bias is a multi-faceted construct influencing individual behavior and collective decision-making. Slanted news reporting is the result of one-sided and polarized writing which can occur in various forms. In this work, we focus on an important form of media bias, i.e. bias by word choice. Detecting biased word choices is a challenging task due to its linguistic complexity and the lack of representative gold-standard corpora.”

https://arxiv.org/pdf/2205.10773

1.1 Background:


Contextual bias refers to the subtle manipulation or distortion of
information in a text that influences the reader's perception without being
explicitly stated. It often involves the choice of words, framing, and the
selection of context in which information is presented. For example,
consider two headlines reporting the same event:

o "Government raises taxes to improve healthcare."

o "Government imposes higher taxes to burden citizens."

Although both headlines refer to the same action (tax increase), the first
one presents it in a positive light, suggesting it is for a good cause
(improving healthcare), while the second frames it as a negative act that
burdens citizens. This subtle difference in framing reflects contextual bias.
Detecting such bias is challenging because it relies on understanding the
broader context, including tone, subtle implications, and connotations,
rather than overt expressions of bias. Pre-trained models in Natural Language Processing (NLP), such as BERT, RoBERTa, and GPT, have increasingly been used to capture such subtle language patterns, offering promising solutions for detecting contextual bias.
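
As a toy illustration (not this study's method), the two headlines above can be scored with an off-the-shelf Hugging Face classifier. The default sentiment checkpoint is an assumption used purely to show that word choice alone shifts a model's output; surface sentiment is, of course, not the same thing as contextual bias:

```python
# Minimal sketch: score the two example headlines with a generic
# pre-trained classifier (requires: pip install transformers torch).
from transformers import pipeline

# Loads a default sentiment model -- an assumption for illustration;
# any text-classification checkpoint could be substituted.
classifier = pipeline("sentiment-analysis")

headlines = [
    "Government raises taxes to improve healthcare.",
    "Government imposes higher taxes to burden citizens.",
]

for headline, result in zip(headlines, classifier(headlines)):
    # Same underlying event, but the framing flips the predicted label.
    print(f"{result['label']:>8} ({result['score']:.2f})  {headline}")
```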

1.2 Aims and Objectives:

To study the effectiveness of pre-trained NLP models like BERT and GPT in identifying contextual bias in news articles, and to compare their performance across a dataset of news articles.

To improve on existing models so that they can analyze text and detect bias based on context rather than explicit sentiment.

1.3 Questions addressed:

 How accurately can pre-trained models detect contextual bias in news articles?
 What limitations exist when using these models for such tasks?
 How can these models be further fine-tuned or optimized to improve bias detection?

2. LITERATURE REVIEW

2.1 Introduction to Contextual Bias in News

Contextual bias in news articles is a subtle yet powerful force that shapes
public opinion by presenting facts in a particular light, often without
appearing overtly biased. Unlike explicit bias—which is easy to spot
through emotionally charged words or outright distortions—contextual
bias is difficult to capture. It occurs due to careful choice of words, the
structure of sentences, or the selective emphasis on certain aspects of a
story.

As in the example above, two reports on the same event can paint vastly different pictures simply by using contrasting adjectives or omitting key details.

In an era where fake news spreads rapidly and the media, the “Fourth Pillar of Democracy,” is often openly biased (nothing surprising), news coverage contributes greatly to shaping political perceptions. In a democracy, where everyone must have access to unbiased news and then form opinions based on their own conscience, spotting these subtle forms of bias has become crucial.

2.2 Existing Approaches to Bias Detection

Pre-trained models like BERT, RoBERTa, and GPT have revolutionized NLP
by understanding context at a deep level. These models have been
applied in various tasks such as sentiment analysis and misinformation
identification. However, adapting these models to identify subtle
contextual framing is still underexplored (Menzner & Leidner, 2024).

[Know more:

GPT model: https://arxiv.org/abs/2005.14165

BERT model: https://arxiv.org/abs/1810.04805

RoBERTa model: https://arxiv.org/abs/1907.11692]


2.3 Research Gap

Despite significant advancements, several gaps persist in contextual bias detection:

1. Explicit Bias and Sentiment Analysis

 Studies such as those by Krieger et al. (2022) have focused on detecting explicit bias using models like DA-RoBERTa. These approaches are effective for overtly polarized content but fail to address the subtler framing techniques of contextual bias.
 “Limitations of our approach are the exclusively pre-training focus on sentence-level classification and the restricted evaluation incorporating a single data set/task due to the lack of existing representative bias corpora.”
 Read more: https://arxiv.org/pdf/2205.10773

2. Datasets for Bias Detection

 Most datasets used in bias detection research are annotated for explicit bias or sentiment rather than contextual bias. For example, the News Bias Dataset (Baly et al., 2018) includes annotations for factuality and political bias but lacks granularity in framing analysis.
 “In future work, we are also interested in characterizing the factuality of reporting for media in other languages and go beyond left vs. right bias that is typical of the Western world and to model other kinds of biases.”
 Read more: https://arxiv.org/pdf/1810.01765

3. Challenges in Explainability:

 The opaque nature of pre-trained models limits their adoption in domains requiring transparency (Krieger et al., 2022).

4. Underutilization of Advanced Techniques:

 Techniques like domain-specific pre-training and fine-tuning remain underexplored in contextual bias detection (Feng et al., 2023).

2.4 Summary of the Literature Review

The review highlights the evolution of bias detection methods, emphasizing the shift from explicit to contextual bias. While pre-trained NLP models offer potential solutions, challenges such as the lack of annotated datasets, limited explainability, and underutilization of advanced techniques persist. This research aims to fill these gaps by adapting pre-trained models for contextual bias detection, leveraging domain-specific datasets, and ensuring model transparency.

3. Methodology

The methodology outlines the approach for adapting pre-trained language models to detect contextual bias in news articles. This involves system design, tools and technologies, and the evaluation framework.

3.1 System Design

The proposed system consists of the following steps:

1. Data Collection
News articles are sourced from publicly available datasets such as
the News Bias Dataset (Baly et al., 2018) and supplemented with
additional manually labeled data to enhance contextual bias
annotations.

2. Preprocessing
Articles are tokenized, and metadata like publication date and
source are removed to focus on content. Tokenization is performed
using pre-trained models like Hugging Face’s Tokenizers.

3. Model Training
We fine-tune a pre-trained language model, such as RoBERTa or GPT, on annotated datasets to classify contextual bias. This involves supervised learning with labeled data containing bias categories such as "neutral," "positive," and "negative" (a minimal sketch follows this list).

4. Evaluation
Metrics such as F1-score and accuracy are used to assess performance. Additionally, SHAP (SHapley Additive exPlanations) values help explain model predictions.
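
Below is a minimal sketch of steps 1-3 using the Hugging Face stack. The CSV file name, its column layout, the three-way label scheme, and all hyperparameters are assumptions for illustration, not fixed choices of this study:

```python
# Sketch: fine-tune RoBERTa as a 3-way contextual-bias classifier.
# Assumed input: a CSV with "text" and "label" columns, where label is
# 0 = neutral, 1 = positive, 2 = negative framing (hypothetical schema).
import pandas as pd
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

df = pd.read_csv("bias_annotations.csv")   # hypothetical file name
df = df[["text", "label"]]                 # drop metadata (source, date, ...)

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=3)

def tokenize(batch):
    # Pad/truncate to a fixed length so the default collator can batch.
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=256)

dataset = Dataset.from_pandas(df).map(tokenize, batched=True)
splits = dataset.train_test_split(test_size=0.2, seed=42)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bias-roberta",
                           num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=splits["train"],
    eval_dataset=splits["test"],
)
trainer.train()
trainer.save_model("bias-roberta")          # reused by the evaluation sketch
tokenizer.save_pretrained("bias-roberta")
```

A GPT-style checkpoint could in principle be swapped in, since AutoModelForSequenceClassification attaches a classification head to either architecture, though GPT models additionally require a padding token to be defined.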

3.2 Tools and Technologies Used

 Programming Language: Python

 Libraries and Frameworks:

o transformers by Hugging Face for model fine-tuning

o pandas and numpy for data handling

o SHAP for explainability analysis

o scikit-learn for evaluation metrics

 Pre-trained Models:

o RoBERTa (Liu et al., 2019)

o GPT (Brown et al., 2020)

3.3 Evaluation

The model's effectiveness is evaluated on a held-out test set. Performance metrics include:

 F1-score: Ensures balanced evaluation for biased and unbiased classifications.

 SHAP Values: Visualize how individual words or phrases influence predictions.
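
A sketch of how these metrics could be computed for the fine-tuned model is shown below. The checkpoint path and the default LABEL_0/LABEL_1/LABEL_2 naming continue the hypothetical setup from the training sketch; SHAP's text explainer is applied to a Hugging Face pipeline as in the SHAP documentation (pip install shap):

```python
# Sketch: held-out evaluation (accuracy, macro F1) plus SHAP explanations.
import shap
from sklearn.metrics import accuracy_score, f1_score
from transformers import pipeline

# Wrap the fine-tuned checkpoint (hypothetical path) in a pipeline;
# top_k=None returns scores for all three labels per input.
clf = pipeline("text-classification", model="bias-roberta", top_k=None)

texts = [
    "Government raises taxes to improve healthcare.",
    "Government imposes higher taxes to burden citizens.",
]
gold = [1, 2]  # assumed labels: 0 neutral, 1 positive, 2 negative framing

# Predicted class = highest-scoring label; default names look like "LABEL_2".
preds = [max(scores, key=lambda s: s["score"])["label"] for scores in clf(texts)]
pred_ids = [int(label.split("_")[-1]) for label in preds]

print("accuracy:", accuracy_score(gold, pred_ids))
print("macro F1:", f1_score(gold, pred_ids, average="macro"))

# SHAP accepts a transformers pipeline directly and attributes each
# prediction to individual words and phrases.
explainer = shap.Explainer(clf)
shap.plots.text(explainer(texts))  # renders per-token contributions (notebook)
```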
