
Using pre-trained models to detect contextual bias from news articles

1. INTRODUCTION:

From the article by Krieger and Spinde, I quote: “Media bias is a multi-faceted construct influencing individual behavior and collective decision-making. Slanted news reporting is the result of one-sided and polarized writing which can occur in various forms. In this work, we focus on an important form of media bias, i.e. bias by word choice. Detecting biased word choices is a challenging task due to its linguistic complexity and the lack of representative gold-standard corpora.”

https://arxiv.org/pdf/2205.10773

1.1 Background:


Contextual bias refers to the subtle manipulation or distortion of
information in a text that influences the reader's perception without being
explicitly stated. It often involves the choice of words, framing, and the
selection of context in which information is presented. For example,
consider two headlines reporting the same event:

o "Government raises taxes to improve healthcare."

o "Government imposes higher taxes to burden citizens."

Although both headlines refer to the same action (tax increase), the first
one presents it in a positive light, suggesting it is for a good cause
(improving healthcare), while the second frames it as a negative act that
burdens citizens. This subtle difference in framing reflects contextual bias.
Detecting such bias is challenging because it relies on understanding the
broader context, including tone, subtle implications, and connotations,
rather than overt expressions of bias. Pre-trained models in Natural Language Processing (NLP), such as BERT, RoBERTa, and GPT, have increasingly been used to capture such subtle language patterns, offering promising solutions for detecting contextual bias.
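
As a toy illustration (not this study's method), the two headlines above can be scored with an off-the-shelf Hugging Face classifier. The default sentiment checkpoint is an assumption used purely to show that word choice alone shifts a model's output; surface sentiment is, of course, not the same thing as contextual bias:

```python
# Minimal sketch: score the two example headlines with a generic
# pre-trained classifier (requires: pip install transformers torch).
from transformers import pipeline

# Loads a default sentiment model -- an assumption for illustration;
# any text-classification checkpoint could be substituted.
classifier = pipeline("sentiment-analysis")

headlines = [
    "Government raises taxes to improve healthcare.",
    "Government imposes higher taxes to burden citizens.",
]

for headline, result in zip(headlines, classifier(headlines)):
    # Same underlying event, but the framing flips the predicted label.
    print(f"{result['label']:>8} ({result['score']:.2f})  {headline}")
```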

1.2 Aims and Objectives:

To study the effectiveness of pre-trained NLP models like BERT and GPT in identifying contextual bias in news articles, and to compare their performance across a dataset of news articles.

To improve on existing models so that they can analyze text and detect bias based on context rather than explicit sentiment.

1.3 Questions addressed:

 How accurately can pre-trained models detect contextual bias in news articles?
 What limitations exist when using these models for such tasks?
 How can these models be further fine-tuned or optimized to improve bias detection?

2. LITERATURE REVIEW

2.1 Introduction to Contextual Bias in News

Contextual bias in news articles is a subtle yet powerful force that shapes
public opinion by presenting facts in a particular light, often without
appearing overtly biased. Unlike explicit bias—which is easy to spot
through emotionally charged words or outright distortions—contextual
bias is difficult to capture. It occurs due to careful choice of words, the
structure of sentences, or the selective emphasis on certain aspects of a
story.

As in the example above, two reports on the same event can paint vastly different pictures simply by using contrasting adjectives or omitting key details.

In an era where fake news spreads rapidly and the media, the “Fourth Pillar of Democracy,” is often openly biased (nothing surprising), news coverage contributes greatly to shaping political perceptions. In a democracy, where everyone must have access to unbiased news and then form opinions based on their own conscience, spotting these subtle forms of bias has become crucial.

2.2 Existing Approaches to Bias Detection

Pre-trained models like BERT, RoBERTa, and GPT have revolutionized NLP
by understanding context at a deep level. These models have been
applied in various tasks such as sentiment analysis and misinformation
identification. However, adapting these models to identify subtle
contextual framing is still underexplored (Menzner & Leidner, 2024).

[Know more:

GPT model: https://arxiv.org/abs/2005.14165

BERT model: https://arxiv.org/abs/1810.04805

RoBERTa model: https://arxiv.org/abs/1907.11692]


2.3 Research Gap

Despite significant advancements, several gaps persist in contextual bias detection:

1. Explicit Bias and Sentiment Analysis

 Studies such as those by Krieger et al. (2022) have focused on detecting explicit bias using models like DA-RoBERTa. These approaches are effective for overtly polarized content but fail to address the subtler framing techniques of contextual bias.
 “Limitations of our approach are the exclusively pre-training focus on sentence-level classification and the restricted evaluation incorporating a single data set/task due to the lack of existing representative bias corpora.”
 Read more: https://arxiv.org/pdf/2205.10773

2. Datasets for Bias Detection

 Most datasets used in bias detection research are annotated for explicit bias or sentiment rather than contextual bias. For example, the News Bias Dataset (Baly et al., 2018) includes annotations for factuality and political bias but lacks granularity in framing analysis.
 “In future work, we are also interested in characterizing the factuality of reporting for media in other languages and go beyond left vs. right bias that is typical of the Western world and to model other kinds of biases.”
 Read more: https://arxiv.org/pdf/1810.01765

3. Challenges in Explainability:

 The opaque nature of pre-trained models limits their adoption in domains requiring transparency (Krieger et al., 2022).

4. Underutilization of Advanced Techniques:

 Techniques like domain-specific pre-training and fine-tuning remain underexplored in contextual bias detection (Feng et al., 2023).

2.4 Summary of the Literature Review

The review highlights the evolution of bias detection methods, emphasizing the shift from explicit to contextual bias. While pre-trained NLP models offer potential solutions, challenges such as the lack of annotated datasets, limited explainability, and underutilization of advanced techniques persist. This research aims to fill these gaps by adapting pre-trained models for contextual bias detection, leveraging domain-specific datasets, and ensuring model transparency.

3. Methodology

The methodology outlines the approach for adapting pre-trained language models to detect contextual bias in news articles. This involves system design, tools and technologies, and the evaluation framework.

3.1 System Design

The proposed system consists of the following steps:

1. Data Collection
News articles are sourced from publicly available datasets such as
the News Bias Dataset (Baly et al., 2018) and supplemented with
additional manually labeled data to enhance contextual bias
annotations.

2. Preprocessing
Articles are tokenized, and metadata like publication date and
source are removed to focus on content. Tokenization is performed
using pre-trained models like Hugging Face’s Tokenizers.

3. Model Training
We fine-tune a pre-trained language model, such as RoBERTa or GPT, on annotated datasets to classify contextual bias. This involves supervised learning with labeled data containing bias categories such as "neutral," "positive," and "negative" (a minimal sketch follows this list).

4. Evaluation
Metrics such as F1-score and accuracy are used to assess performance. Additionally, SHAP (SHapley Additive exPlanations) values help explain model predictions.
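
Below is a minimal sketch of steps 1-3 using the Hugging Face stack. The CSV file name, its column layout, the three-way label scheme, and all hyperparameters are assumptions for illustration, not fixed choices of this study:

```python
# Sketch: fine-tune RoBERTa as a 3-way contextual-bias classifier.
# Assumed input: a CSV with "text" and "label" columns, where label is
# 0 = neutral, 1 = positive, 2 = negative framing (hypothetical schema).
import pandas as pd
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

df = pd.read_csv("bias_annotations.csv")   # hypothetical file name
df = df[["text", "label"]]                 # drop metadata (source, date, ...)

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=3)

def tokenize(batch):
    # Pad/truncate to a fixed length so the default collator can batch.
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=256)

dataset = Dataset.from_pandas(df).map(tokenize, batched=True)
splits = dataset.train_test_split(test_size=0.2, seed=42)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bias-roberta",
                           num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=splits["train"],
    eval_dataset=splits["test"],
)
trainer.train()
trainer.save_model("bias-roberta")          # reused by the evaluation sketch
tokenizer.save_pretrained("bias-roberta")
```

A GPT-style checkpoint could in principle be swapped in, since AutoModelForSequenceClassification attaches a classification head to either architecture, though GPT models additionally require a padding token to be defined.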

3.2 Tools and Technologies Used

 Programming Language: Python

 Libraries and Frameworks:

o transformers by Hugging Face for model fine-tuning

o pandas and numpy for data handling

o SHAP for explainability analysis

o scikit-learn for evaluation metrics

 Pre-trained Models:

o RoBERTa (Liu et al., 2019)

o GPT (Brown et al., 2020)

3.3 Evaluation

The model's effectiveness is evaluated on a held-out test set. Performance metrics include:

 F1-score: Ensures balanced evaluation for biased and unbiased classifications.

 SHAP Values: Visualize how individual words or phrases influence predictions.
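
A sketch of how these metrics could be computed for the fine-tuned model is shown below. The checkpoint path and the default LABEL_0/LABEL_1/LABEL_2 naming continue the hypothetical setup from the training sketch; SHAP's text explainer is applied to a Hugging Face pipeline as in the SHAP documentation (pip install shap):

```python
# Sketch: held-out evaluation (accuracy, macro F1) plus SHAP explanations.
import shap
from sklearn.metrics import accuracy_score, f1_score
from transformers import pipeline

# Wrap the fine-tuned checkpoint (hypothetical path) in a pipeline;
# top_k=None returns scores for all three labels per input.
clf = pipeline("text-classification", model="bias-roberta", top_k=None)

texts = [
    "Government raises taxes to improve healthcare.",
    "Government imposes higher taxes to burden citizens.",
]
gold = [1, 2]  # assumed labels: 0 neutral, 1 positive, 2 negative framing

# Predicted class = highest-scoring label; default names look like "LABEL_2".
preds = [max(scores, key=lambda s: s["score"])["label"] for scores in clf(texts)]
pred_ids = [int(label.split("_")[-1]) for label in preds]

print("accuracy:", accuracy_score(gold, pred_ids))
print("macro F1:", f1_score(gold, pred_ids, average="macro"))

# SHAP accepts a transformers pipeline directly and attributes each
# prediction to individual words and phrases.
explainer = shap.Explainer(clf)
shap.plots.text(explainer(texts))  # renders per-token contributions (notebook)
```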
