
Introduction to Large Language Models

Dr. Aijun Zhang

October 2024


Recommended Texts

• Tunstall, L., von Werra, L. and Wolf, T. (2022). Natural Language Processing with Transformers. O'Reilly.
• Bishop, C. M. and Bishop, H. (2023). Deep Learning: Foundations and Concepts. Springer.
• Alammar, J. and Grootendorst, M. (2024). Hands-On Large Language Models. O'Reilly.
• Raschka, S. (2024). Build a Large Language Model (From Scratch). Manning.


Outline

1 LLMs - A Quick Overview

2 Attention is All You Need (2017)


Transformer Architecture
Attention Mechanism
Specialized Transformer LLMs

3 Interpreting Contextual Embeddings

4 Text Classification and Fine-Tuning


Landscape of Artificial Intelligence

Source: Raschka (2024)


Evolution of Language Models

Source: Alammar and Grootendorst (2024)


Growing Scales of Language Models

Scaling laws: increasing parameters, data quality, and compute resources generally improves LLM performance.

Click into VizSweet


Key Tasks of Large Language Models

• Text Classification: Sentiment analysis, Spam detection, Named Entity Recognition (NER), Natural Language Inference (NLI)

• Text Generation: Creative writing, Chatbot, Translation, Coding

• Summarization: News article summaries, Research paper abstracts, Document condensation

• Question Answering: Customer support queries, Educational FAQs

• Knowledge Integration: Retrieval-Augmented Generation (RAG) for up-to-date responses, evidence-based information generation

• Reasoning: Logical deduction, Mathematics problem solving, Chain-of-Thought analysis


Outline

1 LLMs - A Quick Overview

2 Attention is All You Need (2017)


Transformer Architecture
Attention Mechanism
Specialized Transformer LLMs

3 Interpreting Contextual Embeddings

4 Text Classification and Fine-Tuning


Attention is All You Need (2017)


• By Google Brain: Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J.,
Jones, L., Gomez, A. N., Kaiser, L. and Polosukhin, I. (NIPS 2017)
• It revolutionized neural networks by introducing the Transformer, a
highly flexible architecture enabling LLMs like BERT, GPT, and T5.
• The core innovation of self-attention allows each token to capture
global context efficiently, eliminating the need for recurrence.
• It sparked groundbreaking models across NLP and other domains, with
applications extending to vision, audio, and multimodal tasks.


Transformer Architecture

• Encoder processes the input

• Decoder generates the output, predicting the next token auto-regressively

• Feed-forward network (deep learning)

• Multi-head self-attention

• Masked attention for the decoder

• Positional encoding

• Check the PyTorch class: Docs > torch.nn > Transformer (see the sketch below)
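
A minimal sketch of instantiating PyTorch's built-in torch.nn.Transformer, to make the encoder-decoder pieces concrete; the hyperparameters and random inputs below are illustrative, not values from the slides.

```python
import torch
import torch.nn as nn

# Minimal encoder-decoder Transformer via the built-in PyTorch class.
# Hyperparameters are illustrative defaults, not taken from the slides.
model = nn.Transformer(
    d_model=512,          # embedding dimension
    nhead=8,              # number of attention heads
    num_encoder_layers=6,
    num_decoder_layers=6,
    dim_feedforward=2048,
    batch_first=True,
)

src = torch.rand(2, 10, 512)   # (batch, source length, d_model)
tgt = torch.rand(2, 7, 512)    # (batch, target length, d_model)

# Causal mask so the decoder cannot attend to future target tokens
tgt_mask = model.generate_square_subsequent_mask(7)

out = model(src, tgt, tgt_mask=tgt_mask)
print(out.shape)  # torch.Size([2, 7, 512])
```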


Attention Mechanism

Diagrams: Single-head Attention and Multi-head Attention


Scaled Dot-Product Self-Attention

Each output embedding is a weighted sum of the input embeddings x_{(1)}, ..., x_{(N)}:

$$ y_{(k)} \leftarrow \sum_{i=1}^{N} \alpha_{ki}\, x_{(i)}, \qquad
   \alpha_{ki} = \frac{\exp\big(x_{(k)} x_{(i)}^{\mathsf T}\big)}{\sum_{j=1}^{N} \exp\big(x_{(k)} x_{(j)}^{\mathsf T}\big)} $$

Expressed in matrix form:

$$ Y = \mathrm{Softmax}\big[X X^{\mathsf T}\big]\, X $$

(Q, K, V)-parameterization with weight matrices $W^{(q)}, W^{(k)}, W^{(v)} \in \mathbb{R}^{D \times D}$:

$$ Y = \mathrm{Softmax}\big[X W^{(q)} (X W^{(k)})^{\mathsf T}\big]\, X W^{(v)}
   \equiv \mathrm{Softmax}\big[Q K^{\mathsf T}\big]\, V $$

Scaled self-attention:

$$ Y = \mathrm{Softmax}\!\left[\frac{Q K^{\mathsf T}}{\sqrt{D}}\right] V
   \equiv \mathrm{Attention}(Q, K, V) $$
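
A minimal PyTorch sketch of the scaled formula above (single head, no masking or batching); the shapes and random weights are purely illustrative.

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention.

    X: (N, D) token embeddings; Wq, Wk, Wv: (D, D) projection matrices.
    Implements Y = Softmax(Q K^T / sqrt(D)) V from the slide.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    D = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / D**0.5   # (N, N) attention logits
    weights = F.softmax(scores, dim=-1)         # each row sums to 1
    return weights @ V                          # (N, D) contextual outputs

# Toy usage with random embeddings (shapes only, illustrative)
N, D = 5, 16
X = torch.randn(N, D)
Wq, Wk, Wv = (torch.randn(D, D) for _ in range(3))
Y = scaled_dot_product_self_attention(X, Wq, Wk, Wv)
print(Y.shape)  # torch.Size([5, 16])
```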


Specialized Transformer LLMs

• Encoder-Only Models: BERT (Bidirectional Encoder Representations from Transformers), DistilBERT, RoBERTa, DeBERTa, etc. Ideal for understanding tasks such as text classification, sentiment analysis, and named entity recognition, with bidirectional attention capturing full context.

• Decoder-Only Models: GPT (Generative Pretrained Transformer), GPT-2, GPT-3, and beyond. Optimized for generation tasks, with unidirectional attention. Live example: ChatGPT (GPT-3.5+)

• Encoder-Decoder Models: T5 (Text-to-Text Transfer Transformer), BART. Balance input understanding with output generation, suitable for tasks like translation, summarization, and paraphrasing.

Highlight: Encoder-only models (BERT) are for understanding tasks, while decoder-only models (GPT) are for generative tasks.


Transformer Explainer: Interactive Learning of GPT-2, by Cho et al. (2024)


Live Demo: BertViz

BertViz: Visualize Attention in NLP Models (BERT, GPT2, BART, etc.)


https://github.com/jessevig/bertviz

Try it on Google Colab: BertViz Interactive Tutorial
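
A minimal sketch of running BertViz's head view in a notebook, following the pattern in the project's README; the checkpoint and example sentence are illustrative.

```python
# Run inside a Jupyter/Colab notebook (BertViz renders interactive HTML).
from transformers import AutoTokenizer, AutoModel
from bertviz import head_view

model_name = "bert-base-uncased"  # any BERT-family checkpoint works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_attentions=True)

inputs = tokenizer("The cat sat on the mat because it was tired.",
                   return_tensors="pt")
outputs = model(**inputs)

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
head_view(outputs.attentions, tokens)  # interactive attention-head view
```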


Outline

1 LLMs - A Quick Overview

2 Attention is All You Need (2017)


Transformer Architecture
Attention Mechanism
Specialized Transformer LLMs

3 Interpreting Contextual Embeddings

4 Text Classification and Fine-Tuning


Static and Contextual Embeddings

• Embeddings provide a way to represent textual data as dense, continuous vectors in a high-dimensional space, capturing the semantic meaning of words, phrases, and documents.

• Traditional Static Embeddings: Word2Vec, GloVe, etc. Easy to compute and able to capture basic semantic relationships. However, each word has a single, fixed embedding regardless of context.

• Contextual Embeddings: BERT, GPT, etc. Generate dynamic, context-dependent embeddings for each token. BERT is the most popular, as it captures both the left and right context of a word in a sentence (see the sketch below).
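
A minimal sketch contrasting contextual with static behavior: the same word receives different BERT embeddings in different sentences, whereas a static embedding would assign a single vector. The checkpoint and example sentences are illustrative choices.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

# The word "bank" appears in two different senses (illustrative sentences)
sentences = ["She sat by the river bank.", "He deposited cash at the bank."]
embs = []
for s in sentences:
    inputs = tokenizer(s, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]   # (seq_len, 768)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    embs.append(hidden[tokens.index("bank")])           # contextual vector

cos = torch.nn.functional.cosine_similarity(embs[0], embs[1], dim=0)
print(f"cosine similarity of 'bank' across contexts: {cos.item():.3f}")
```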


Interpreting Contextual Embeddings

• Interpretability matters: it is crucial to understand how LLMs work and make decisions, fostering transparency, trust, and reliability.

• Challenges in Interpreting LLMs:
  – Complexity of contextual embeddings, high dimensionality
  – Black-box nature: millions/billions of parameters, highly nonlinear
  – Semantic ambiguity: polysemy (words with multiple meanings) in diverse contexts

• A structured process to interpret contextual embeddings (sketched below):
  1. Dimensionality reduction for effective clustering
  2. Extract topics for each cluster
  3. Further dimensionality reduction for 2D or 3D visualization
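
A hedged end-to-end sketch of these three steps, using UMAP for reduction, k-means for clustering, and per-cluster TF-IDF keywords as topics. The libraries, Rotten Tomatoes corpus, and parameter choices are illustrative assumptions, not prescribed by the slides.

```python
import numpy as np
import umap
from datasets import load_dataset
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sentence_transformers import SentenceTransformer

# Illustrative corpus: a subset of Rotten Tomatoes reviews
docs = load_dataset("rotten_tomatoes", split="train")["text"][:2000]

# 0. Contextual embeddings (illustrative model choice)
embeddings = SentenceTransformer("all-MiniLM-L6-v2").encode(docs)

# 1. Dimensionality reduction for effective clustering
reduced = umap.UMAP(n_components=5, random_state=42).fit_transform(embeddings)

# 2. Cluster, then extract TF-IDF keywords per cluster as "topics"
labels = KMeans(n_clusters=8, random_state=42).fit_predict(reduced)
vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(docs)
terms = np.array(vectorizer.get_feature_names_out())
for c in np.unique(labels):
    cluster_mean = tfidf[labels == c].mean(axis=0).A1   # mean TF-IDF per term
    print(c, terms[cluster_mean.argsort()[::-1][:5]])   # top-5 keywords

# 3. Further reduction to 2D for visualization
coords_2d = umap.UMAP(n_components=2, random_state=42).fit_transform(embeddings)
```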


Interpreting Contextual Embeddings

Source: Alammar and Grootendorst (2024)


Dimensionality Reduction Techniques


• PCA (Linear). Strength: captures maximum variance. Weakness: assumes linearity, less flexible. Best use: medium data with linear relationships.
• t-SNE (Nonlinear). Strength: preserves local structure. Weakness: computationally expensive. Best use: small data, ideal for 2D/3D visualization.
• UMAP (Nonlinear). Strength: balances local & global structure, scalable. Weakness: requires tuning. Best use: clustering in large, complex datasets.
• Random Projection (Linear, random). Strength: fast, scalable, preserves distances. Weakness: lower interpretability. Best use: high-dimensional data with simple structure.
• AE (Auto-Encoders) (Nonlinear). Strength: learns nonlinear relationships, customizable. Weakness: requires tuning & significant data. Best use: complex datasets with nonlinear patterns.
• VAE (Variational AE) (Nonlinear, probabilistic). Strength: captures data variability, generates new data. Weakness: complex training, requires tuning. Best use: when data variability and generation are important.
• MDS (Nonlinear). Strength: flexible metrics, preserves distances. Weakness: computationally intensive. Best use: semantic similarity in embeddings.


Clustering Techniques

• k-Means Clustering: minimizes distances to centroids. Advantages: simple, efficient, scalable. Limitations: requires k in advance; assumes spherical clusters. Best use: when clusters are approximately spherical and equally sized.
• Agglomerative/Hierarchical Clustering: merges closest clusters iteratively, forming a hierarchy. Advantages: captures hierarchical structure, no need for k. Limitations: computationally intensive, sensitive to noise. Best use: data with inherent hierarchical structure.
• DBSCAN: groups densely packed points; labels sparse points as outliers. Advantages: handles non-convex shapes, detects outliers. Limitations: sensitive to parameters, not good for varying densities. Best use: arbitrary shapes, outlier detection, unknown number of clusters.
• Spectral Clustering: uses a similarity matrix and eigenvalues for clustering. Advantages: good for non-convex clusters, adaptable similarity measures. Limitations: computationally expensive, requires k. Best use: small data with complex relationships.


Topic Extraction Techniques

• TF-IDF (Term Frequency – Inverse Document Frequency): identifies important terms by comparing term frequency in a document to its frequency in the corpus; high scores indicate terms important to the document but uncommon in the corpus. Use in clusters: applied to each cluster to find keywords that best represent its content.
• KeyBERT (Keyword Extraction with BERT): a transformer-based method using BERT embeddings to extract semantically relevant keywords. Use in clusters: identifies representative keywords or phrases, providing rich, contextually accurate topic descriptions for each cluster.
• LDA (Latent Dirichlet Allocation): a topic modeling technique that assumes documents are mixtures of topics, each with a unique word distribution. Use in clusters: extracts coherent topics from clusters, revealing specific themes within broader topics.
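
For example, a minimal KeyBERT sketch; the cluster text and parameters are made up for illustration, and KeyBERT downloads a sentence-transformers backbone by default.

```python
from keybert import KeyBERT

# Illustrative: extract keywords for one cluster's concatenated documents
cluster_text = " ".join([
    "the film is a moving portrait of grief and recovery",
    "a quietly devastating drama anchored by a superb lead performance",
])

kw_model = KeyBERT()
keywords = kw_model.extract_keywords(
    cluster_text,
    keyphrase_ngram_range=(1, 2),  # single words and bigrams
    stop_words="english",
    top_n=5,
)
print(keywords)  # list of (keyword, relevance score) pairs
```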


BERTopic Pipeline for Interpretable Topic Modeling

https://github.com/MaartenGr/BERTopic

Source: Alammar and Grootendorst (2024)
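
A minimal sketch of running the BERTopic pipeline on the Rotten Tomatoes reviews used in the demo; the defaults (embedding model, UMAP, HDBSCAN, c-TF-IDF) are BERTopic's own, and the dataset call is an illustrative assumption.

```python
from datasets import load_dataset
from bertopic import BERTopic

# Illustrative corpus: Rotten Tomatoes movie reviews (text field only)
docs = load_dataset("rotten_tomatoes", split="train")["text"]

# BERTopic chains embeddings -> UMAP -> HDBSCAN -> c-TF-IDF under the hood
topic_model = BERTopic(verbose=True)
topics, probs = topic_model.fit_transform(docs)

print(topic_model.get_topic_info().head())  # topic sizes and top keywords
print(topic_model.get_topic(0))             # top terms for topic 0
```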


Live Demo: BERTopic

Try it on Google Colab: BERTopic Demo with Rotten Tomatoes Dataset


Outline

1 LLMs - A Quick Overview

2 Attention is All You Need (2017)


Transformer Architecture
Attention Mechanism
Specialized Transformer LLMs

3 Interpreting Contextual Embeddings

4 Text Classification and Fine-Tuning


Text Classification with Pretrained Transformers

Source: Tunstall et al. (2022)

1. Select a pretrained transformer model, e.g. SBERT or DistilBERT;
2. Prepare the labeled text data, split into training and validation sets;
3. Add a classifier layer with sigmoid/softmax activation;
4. Freeze the transformer model parameters, train only the classifier (see the sketch below);
5. Evaluate performance and perform outcome analysis.
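
A hedged sketch of this frozen-encoder recipe: DistilBERT supplies [CLS] features and only a classifier head is trained on top. The checkpoint, dataset call, and the logistic-regression head (standing in for the sigmoid/softmax classifier layer) are illustrative assumptions.

```python
import torch
from datasets import load_dataset
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
encoder = AutoModel.from_pretrained("distilbert-base-uncased").to(device).eval()

def cls_features(texts, batch_size=32):
    """Frozen-encoder features: the [CLS] hidden state for each text."""
    feats = []
    for i in range(0, len(texts), batch_size):
        batch = tokenizer(texts[i:i + batch_size], padding=True,
                          truncation=True, return_tensors="pt").to(device)
        with torch.no_grad():
            out = encoder(**batch).last_hidden_state[:, 0]  # [CLS] position
        feats.append(out.cpu())
    return torch.cat(feats).numpy()

train = load_dataset("rotten_tomatoes", split="train")
test = load_dataset("rotten_tomatoes", split="test")

# Train only the classifier head; the transformer stays frozen
clf = LogisticRegression(max_iter=1000)
clf.fit(cls_features(train["text"]), train["label"])
print("test accuracy:", clf.score(cls_features(test["text"]), test["label"]))
```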


Fine-Tuning Transformers for Classification

Source: Tunstall et al. (2022)

• Pros: Fine-tuning the entire model adapts fully to the task, yielding
higher accuracy and flexibility (see the sketch below).
• Cons: Increased computational demands, potential risk of overfitting.
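
A hedged sketch of full fine-tuning with the Hugging Face Trainer API; the checkpoint, dataset call, epochs, and batch size are illustrative assumptions rather than the exact settings behind the AUC table on the next slide.

```python
import numpy as np
from datasets import load_dataset
from sklearn.metrics import roc_auc_score
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True)

dataset = load_dataset("rotten_tomatoes").map(tokenize, batched=True)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    probs = 1.0 / (1.0 + np.exp(logits[:, 0] - logits[:, 1]))  # P(class 1)
    return {"auc": roc_auc_score(labels, probs)}

args = TrainingArguments(output_dir="distilbert-rt",
                         num_train_epochs=2,
                         per_device_train_batch_size=16)

trainer = Trainer(model=model, args=args,
                  train_dataset=dataset["train"],
                  eval_dataset=dataset["validation"],
                  compute_metrics=compute_metrics,
                  tokenizer=tokenizer)   # enables dynamic padding
trainer.train()
print(trainer.evaluate(dataset["test"]))  # reports eval_auc on the test split
```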


Live Demo: DistilBERT

• DistilBERT is a small, fast, cheap and light Transformer model trained by distilling BERT base.

• Check the Hugging Face Transformers page: DistilBERT

Table: DistilBERT Classification on Rotten Tomatoes Data


            DistilBERT-raw   DistilBERT-finetuned
train-AUC   0.918097         0.980089
test-AUC    0.882378         0.911010

Try it on Google Colab: Text Classification using DistilBERT


Thank you!
https://www.linkedin.com/in/ajzhang/
