Final report for plag check
This work focuses on sentiment analysis in financial news headlines to understand investor sentiment. Using large language models (LLMs), I analyze sentiment from the perspective of retail investors. The chosen dataset contains categorized sentiments of financial news headlines and serves as the basis for my analysis. I fine-tuned Llama2-7B, Llama3-8B, and Qwen2.5-14B to evaluate their effectiveness in sentiment classification. My experiments demonstrate that the fine-tuned Qwen2.5 achieves the highest accuracy. It showed significant improvements in accuracy after fine-tuning, indicating its robustness in capturing the nuances of financial sentiment. This model can be instrumental in providing market insights, supporting risk management, and aiding investment decisions by accurately predicting the sentiment of financial news. The results highlight the potential of advanced LLMs in transforming how we analyze and interpret financial information, offering a powerful tool for stakeholders in the financial industry. This work also presents a comprehensive NLP pipeline that integrates transformer-based Named Entity Recognition with fine-tuned large language models to enhance sentiment analysis and entity resolution in financial news.

I. INTRODUCTION

The financial industry is a dynamic and rapidly changing environment where news and information play a critical role in shaping market behavior and investor sentiment. With the constant influx of financial news, it becomes imperative for businesses, investors, and analysts to accurately gauge the sentiment conveyed in these news items. Sentiment analysis, a branch of Natural Language Processing (NLP), offers a sophisticated method to automatically determine the emotional tone behind words, providing valuable insights into market trends, investor confidence, and consumer behavior.

The sentiment of financial news can significantly impact market movements, influencing decisions made by retail investors, institutional investors, and other stakeholders. For instance, positive news about a company's performance can boost investor confidence, leading to a rise in stock prices, while negative news can trigger fear and sell-offs. Therefore, understanding the sentiment embedded in financial news headlines can aid various strategic decision-making processes, including market insights, risk management, and investment strategies.

Advancements in NLP and Large Language Models (LLMs) have opened new avenues for sentiment analysis. These technologies can process large volumes of textual data with high accuracy. Traditional sentiment analysis models often struggle with the complexity and nuances of financial language, which can include jargon, idiomatic expressions, and context-specific meanings. LLMs such as Llama are pretrained on vast corpora and can be fine-tuned for specific tasks, offering a significant improvement over traditional methods.

The primary goal is to identify the most effective model for this task. I hypothesize that fine-tuning these pre-trained models on the Indian Financial News dataset will enhance their ability to capture the subtle nuances of financial sentiment. The fine-tuned Qwen2.5-14B achieved the highest precision, recall, and F1-score, demonstrating its robustness and accuracy in sentiment classification.

The implications of this research are significant for the financial industry. Accurate sentiment analysis can provide deeper market insights, help in identifying potential reputational risks, and support more informed investment decisions. By understanding the sentiment of financial news, businesses can better anticipate market reactions and develop strategies to mitigate risks and capitalize on opportunities.

II. RELATED WORK

The analysis of sentiment in financial news is a well-researched area, drawing interest from various domains including finance, computer science, and linguistics. The application of NLP and LLMs to this field has evolved significantly over the past decade, driven by the increasing availability of data and advancements in machine learning algorithms. This section reviews the key literature in sentiment analysis, particularly focusing on financial news, and highlights the contributions of recent studies that have utilized advanced NLP techniques and LLMs.

With the rise of deep learning, researchers have explored various neural architectures, including recurrent neural networks (RNNs), long short-term memory networks (LSTMs), and transformers, to improve sentiment classification accuracy. The introduction of pre-trained language models such as BERT and FinBERT has significantly enhanced the ability to capture financial sentiment by leveraging domain-specific corpora. FinBERT, trained on financial texts, has been widely adopted for sentiment analysis tasks, outperforming general-purpose models like BERT in financial applications.
Moreover, integrating financial news sentiment analysis with real-time trading strategies has gained attention in quantitative finance. Several studies have shown that market sentiment extracted from news sources can serve as a leading indicator of stock price movements. The combination of sentiment-aware models with automated trading algorithms has demonstrated potential in enhancing decision-making processes for investors and traders.

P. Malo et al. [1] introduce the FinancialPhraseBank dataset and propose methods for detecting sentiment in financial texts using machine learning. B. Pang and L. Lee [2] provide an extensive review of sentiment analysis techniques, including early applications to financial text. J. Si et al. [3] explore the use of topic-based sentiment analysis of Twitter data to predict stock prices. J. Devlin et al. [4] introduce BERT, a breakthrough in NLP that has significantly influenced sentiment analysis research, including applications in finance.

M. Hu and B. Liu [5] discuss techniques for extracting sentiment from text, laying groundwork for applications in financial news analysis. F. Z. Xing et al. [6] survey natural language processing techniques applied to financial forecasting, with a focus on sentiment analysis. S. Kogan et al. [7] use regression analysis on sentiment extracted from financial reports to predict firm risk. B. M. Barber [8] examines the influence of news attention on investor behavior, highlighting the role of sentiment.

X. Zhang et al. [9] demonstrate the potential of Twitter sentiment to predict stock market indicators, emphasizing real-time analysis. F. Li [10] applies Naive Bayesian classification to forward-looking statements in corporate filings to gauge sentiment and predict future performance. B. G. Choi et al. [11] investigate how the sentiment of earnings announcements affects investor perceptions and market outcomes. B. S. Kumar and V. Ravi [12] review various text mining applications in finance, including sentiment analysis of financial news.

The reviewed literature highlights the importance of sentiment analysis in finance and the advancements made with NLP and LLM technologies. Studies show that models like BERT, LSTM networks, and deep learning techniques have improved financial sentiment analysis. My study builds on this by fine-tuning advanced models and demonstrating the superior performance of the Llama3-8B and Qwen2.5-14B models in classifying financial news sentiment, contributing to more accurate and efficient tools for the industry.

This work builds upon previous research by fine-tuning the LLaMA3 model for financial sentiment analysis using a dataset of 27,000 labeled headlines. By leveraging LoRA and half-precision (fp16) training on an NVIDIA H100 GPU, this study aims to push the boundaries of accuracy and efficiency in financial text analysis.

III. DATASET AND ANALYSIS

The datasets utilized in this study include the FinancialPhraseBank, sourced from the Kaggle repository titled “Sentiment Analysis for Financial News” by Ankur Z., and the Indian Financial News dataset from Hugging Face, which contains 27,000 financial news headlines. The FinancialPhraseBank dataset is specifically designed for sentiment analysis in the financial sector, featuring news headlines annotated with sentiment labels. It consists of two columns, Sentiment and Headline, where the sentiment is categorized into three distinct classes: positive, neutral, and negative. This structured classification provides a strong foundation for understanding sentiment trends in financial news from a retail investor's perspective.

To ensure data consistency, a stratified train-test split was performed, with 70% of the data allocated for training and 30% for testing. Additionally, 5% of the total dataset was separated as an evaluation set. The training dataset was then shuffled to mitigate ordering biases. Fig. 1 shows the distribution of the sentiment classes.

Fig. 1. Class Distribution

To facilitate numerical processing by machine learning models, the textual labels are mapped to integer values using a predefined encoding scheme. Specifically, the label Positive is mapped to the integer 2, Neutral (including entries labeled as “none”, which are semantically aligned with neutrality) is mapped to 1, and Negative is assigned the value 0. This encoding transforms the categorical sentiment annotations into a numerical format suitable for model training and evaluation.

A. Data Augmentation and Prompt Engineering

To optimize sentiment classification, the training samples were reformatted into a structured prompt format. The prompt explicitly instructed the model to classify sentiment into Positive, Neutral, or Negative. The test data was structured similarly, except without sentiment labels, ensuring a realistic inference scenario.
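The data preparation steps described in this section (label encoding, a stratified 70/30 split with a shuffled training set, and prompt construction) can be sketched as follows. This is a minimal illustration and not the code used in the study: the function names, the random seed, and the exact prompt wording are assumptions, and the 5% evaluation subset is left out because the text does not state which split it was drawn from.

```python
import random
from collections import defaultdict
from typing import Optional

# Encoding described in the text: Negative -> 0, Neutral/"none" -> 1, Positive -> 2.
LABEL_TO_ID = {"negative": 0, "neutral": 1, "none": 1, "positive": 2}


def encode_label(label: str) -> int:
    """Map a textual sentiment label to its integer class."""
    return LABEL_TO_ID[label.strip().lower()]


def stratified_split(rows, test_fraction=0.30, seed=42):
    """Split (headline, label) rows so each class keeps roughly the same share."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[1]].append(row)
    rng = random.Random(seed)  # seed is an arbitrary choice for reproducibility
    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label][:]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    rng.shuffle(train)  # shuffle training rows to mitigate ordering bias
    return train, test


def build_prompt(headline: str, label: Optional[str] = None) -> str:
    """Hypothetical prompt template; the label is omitted for test rows."""
    prompt = (
        "Classify the sentiment of this financial news headline as "
        "Positive, Neutral, or Negative.\n"
        f"Headline: {headline}\n"
        "Sentiment:"
    )
    return prompt if label is None else f"{prompt} {label}"
```

Calling `build_prompt(headline, label)` would produce a training sample, while `build_prompt(headline)` leaves the answer slot empty so that the model has to generate the class at inference time.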
IV. MODEL SELECTION

The LLaMA-2-7B, LLaMA-3-8B and Qwen2.5-14B models were chosen as the base models due to their state-of-the-art performance in text generation and contextual understanding. Fine-tuning was performed using low-rank adaptation (LoRA) to enhance efficiency while minimizing computational overhead. The LoRA configuration targeted key transformer layers to optimize sentiment classification performance.

V. LLAMA MODEL ARCHITECTURE

A. Embedding Layer

The embedding layer converts input tokens into dense vector representations. Instead of absolute position embeddings, LLaMA 3 uses Rotary Position Embeddings (RoPE), which encode positional information directly into the attention mechanism. Each token $t_i$ in the input sequence is mapped to a high-dimensional space:

$$E(t_i) = W_e t_i \tag{1}$$

where $W_e$ is the embedding matrix.

B. Transformer Layers

The core of the model consists of multiple Transformer layers, each comprising the following components:

1) Multi-Head Grouped-Query Attention (GQA): LLaMA 3-8B uses Grouped-Query Attention (GQA) instead of standard Multi-Head Attention, reducing memory overhead while maintaining performance. The attention mechanism is defined as:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right) V \tag{2}$$

where:
• $Q$, $K$, $V$ are the query, key, and value matrices,
• $d_k$ is the dimensionality of the keys,
• Rotary Position Embeddings (RoPE) are applied to $Q$ and $K$ before computing attention scores.

2) SwiGLU Feedforward Network (FFN): Instead of a standard feedforward network, LLaMA 3 uses SwiGLU (Swish-Gated Linear Units) for improved efficiency and expressiveness. The FFN is defined as:

$$\mathrm{FFN}(X) = (\mathrm{Swish}(XW_1) \odot XW_2) W_3 \tag{3}$$

where:
• $\mathrm{Swish}(x) = x \cdot \mathrm{sigmoid}(x)$ is an activation function,
• $W_1$, $W_2$, $W_3$ are learned weight matrices,
• $\odot$ represents element-wise multiplication.

3) Layer Normalization and Residual Connections: Each Transformer layer includes pre-normalization and residual connections to stabilize training and improve convergence:

$$Z_l = \mathrm{LayerNorm}(X_l + \mathrm{Attention}(Q_l, K_l, V_l)) \tag{4}$$

$$X_{l+1} = \mathrm{LayerNorm}(Z_l + \mathrm{FFN}(Z_l)) \tag{5}$$

where:
• $X_l$ is the input to the $l$-th layer,
• $Z_l$ is the output after attention,
• LayerNorm ensures stable activations.

C. Output Layer

The final transformer outputs are projected to the vocabulary space to generate logits for each token. LLaMA 3 uses tied embeddings, meaning the output weights share parameters with the input embeddings:

$$\mathrm{logits} = W_e^T X_L + b_o \tag{6}$$

where:
• $W_e^T$ is the transposed embedding matrix,
• $X_L$ is the final hidden state,
• $b_o$ is a bias term.

VI. QWEN2.5 MODEL ARCHITECTURE

A. Embedding Layer

The embedding layer in Qwen2.5 converts input tokens into dense vector representations. Qwen2.5 utilizes Rotary Positional Embeddings (RoPE) to encode positional information directly into the attention mechanism. Each input token $t_i$ is transformed using the embedding matrix $W_e$:

$$E(t_i) = W_e \cdot t_i \tag{7}$$

Qwen2.5 models use byte-level byte-pair encoding (BBPE) with a vocabulary of 151,643 tokens, shared across all variants for consistency.

B. Transformer Layers

Qwen2.5 maintains a decoder-only Transformer-based architecture, composed of several stacked layers that include:

1) Multi-Head Grouped Query Attention (GQA): Grouped Query Attention (GQA) is employed to enhance KV cache efficiency. In this setup, multiple query heads share fewer key/value heads, reducing memory usage:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^T + B_{QKV}}{\sqrt{d_k}}\right) V \tag{8}$$

where:
• $Q$, $K$, $V$ are the query, key, and value matrices,
• $d_k$ is the dimensionality of the keys,
• $B_{QKV}$ is a bias term specific to Qwen2.5,
• RoPE is applied to $Q$ and $K$ before attention computation.

2) SwiGLU Feedforward Network (FFN): Qwen2.5 adopts the SwiGLU activation function for the feedforward network, which improves efficiency and expressiveness:

$$\mathrm{FFN}(X) = (\mathrm{Swish}(XW_1) \odot XW_2) W_3 \tag{9}$$

with:
• $\mathrm{Swish}(x) = x \cdot \mathrm{sigmoid}(x)$,
• $W_1$, $W_2$, $W_3$ as learnable parameters,
• $\odot$ indicating element-wise multiplication.
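The SwiGLU feedforward formula, FFN(X) = (Swish(XW1) ⊙ XW2)W3, can be made concrete with a small pure-Python sketch. It uses plain lists of rows instead of tensors, and the helper names (`swish`, `matmul`, `swiglu_ffn`) are illustrative assumptions, not part of either model's codebase.

```python
import math


def swish(x: float) -> float:
    """Swish(x) = x * sigmoid(x), written as x / (1 + exp(-x))."""
    return x / (1.0 + math.exp(-x))


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def swiglu_ffn(x, w1, w2, w3):
    """Compute (Swish(X W1) ⊙ X W2) W3 for row-major list matrices."""
    gate = [[swish(v) for v in row] for row in matmul(x, w1)]  # Swish(X W1)
    up = matmul(x, w2)                                         # X W2
    hidden = [[g * u for g, u in zip(g_row, u_row)]            # element-wise product
              for g_row, u_row in zip(gate, up)]
    return matmul(hidden, w3)                                  # project back with W3
```

Here `w1` and `w2` map the model dimension to the hidden dimension and `w3` maps it back, so the output has the same width as the input. The gate branch decides, element by element, how much of the `X W2` branch passes through.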
3) Layer Normalization and Residual Connections: Instead of traditional LayerNorm, Qwen2.5 uses RMSNorm with pre-normalization to stabilize training and maintain gradient flow:

$$Z_l = \mathrm{RMSNorm}(X_l + \mathrm{Attention}(Q_l, K_l, V_l)) \tag{10}$$

$$X_{l+1} = \mathrm{RMSNorm}(Z_l + \mathrm{FFN}(Z_l)) \tag{11}$$

where:
• $X_l$ is the input to the $l$-th layer,
• $Z_l$ is the output after attention computation.

C. Output Layer

In most Qwen2.5 models, output embeddings are tied with input embeddings to reduce parameter count:

$$\mathrm{logits} = W_e^T X_L + b_o \tag{12}$$

[...] and QLoRA techniques. This approach focuses on fine-tuning a limited set of additional parameters while keeping most pre-trained model parameters fixed, thus reducing computational and storage costs and mitigating the risk of catastrophic forgetting.

The training process was conducted using the AdamW optimizer with gradient accumulation and a learning rate scheduler to optimize performance. A warmup phase was applied to stabilize learning, and the training progress was logged and monitored using Weights and Biases for real-time tracking.

To ensure robust evaluation, accuracy was computed at regular intervals, and early stopping was implemented to prevent overfitting. The best model was selected based on validation accuracy to ensure optimal generalization to unseen data.
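The idea of training only a small set of additional parameters while the pre-trained weights stay fixed can be illustrated with the standard LoRA update, where the effective weight is W + (alpha / r) · B · A. The sketch below is a toy pure-Python version under stated assumptions: the class name `LoRALinear`, the rank, and the alpha value are hypothetical, and the 4-bit quantization that QLoRA adds is not modeled.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class LoRALinear:
    """Frozen weight W (d_out x d_in) plus a trainable low-rank update B @ A."""

    def __init__(self, weight, rank=2, alpha=4.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen pre-trained weight, never updated
        # A starts with small random values, B starts at zero, so the
        # adapted layer initially behaves exactly like the frozen one.
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.b = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def effective_weight(self):
        """Return W + (alpha / rank) * B @ A."""
        delta = matmul(self.b, self.a)
        return [[w + self.scale * d for w, d in zip(w_row, d_row)]
                for w_row, d_row in zip(self.weight, delta)]

    def forward(self, x):
        """x is a list of input row vectors; returns x @ W_eff^T."""
        transposed = [list(col) for col in zip(*self.effective_weight())]
        return matmul(x, transposed)

    def trainable_params(self):
        """Count only the adapter parameters in A and B."""
        return len(self.a) * len(self.a[0]) + len(self.b) * len(self.b[0])
```

For an 8 × 8 weight and rank 2, the adapter holds 32 trainable values against 64 frozen ones. At the scale of a real projection matrix the ratio becomes far smaller, which is where the savings in compute and storage come from.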
TABLE II. CLASSIFICATION REPORT-LLAMA3-8B

TABLE V. PERFORMANCE

B. Entity Extraction

The function extract_entities filters relevant entities from each headline. These entities may include companies, brands, or regional indexes that are relevant for equity tracking.
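The text names the entity filtering function but does not show its body. The following is one plausible shape for it, offered only as a sketch: it assumes the NER step returns a list of dictionaries with `word`, `entity_group`, and `score` keys (the layout commonly produced by aggregated token-classification output), and the set of relevant labels and the score threshold are hypothetical choices.

```python
# Hypothetical choice of NER labels treated as relevant for equity tracking.
RELEVANT_GROUPS = {"ORG", "MISC"}


def extract_entities(ner_output, min_score=0.5):
    """Keep unique, confidently tagged entity names, preserving their order."""
    seen, kept = set(), []
    for ent in ner_output:
        name = ent["word"].strip()
        if ent["entity_group"] in RELEVANT_GROUPS and ent["score"] >= min_score:
            key = name.lower()  # case-insensitive de-duplication
            if key not in seen:
                seen.add(key)
                kept.append(name)
    return kept
```

Given tagged spans for a headline, the function would return only the organization-like names above the threshold, each listed once.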
REFERENCES
[1] P. Malo, A. Sinha, P. Korhonen, J. Wallenius, and P. Takala, “Good debt
or bad debt: Detecting semantic orientations in economic texts,” Journal
of the Association for Information Science and Technology, vol. 65, no.
4, pp. 782–796, 2014.
[2] B. Pang and L. Lee, “Opinion mining and sentiment analysis,”
Foundations and Trends® in Information Retrieval, vol. 2, no. 1–2, pp.
1–135, 2008.
[3] J. Si, A. Mukherjee, B. Liu, Q. Li, H. Li, and X. Deng, “Exploiting topic
based twitter sentiment for stock prediction,” in Proc. of the 51st Annual
Meeting of the Association for Computational Linguistics (Volume 2:
Short Papers), 2013, pp. 24–29.
[4] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training
of deep bidirectional transformers for language understanding,” arXiv
preprint arXiv:1810.04805, 2018.
[5] M. Hu and B. Liu, “Mining and summarizing customer reviews,” in
Proc. of the 10th ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining, 2004, pp. 168–177.
[6] F. Z. Xing, E. Cambria, and R. E. Welsch, “Natural language based
financial forecasting: a survey,” Artificial Intelligence Review, vol. 50,
no. 1, pp. 49–73, 2018.
[7] S. Kogan, D. Levin, B. R. Routledge, J. S. Sagi, and N. A. Smith,
“Predicting risk from financial reports with regression,” in Proc. of Human
Language Technologies: The 2009 Annual Conference of the North
American Chapter of the Association for Computational Linguistics,
2009, pp. 272–280.