
Course Introduction

Tanmoy Chakraborty
Associate Professor, IIT Delhi
https://tanmoychak.com/

Introduction to Large Language Models


Instructors:
• Tanmoy Chakraborty (IIT Delhi)
• Soumen Chakrabarti (IIT Bombay)

Teaching Assistants:
• Anwoy Chatterjee (PhD student, IIT Delhi)
• Poulami Ghosh (PhD student, IIT Bombay)


Course Content
• This is an introductory graduate course and we will be teaching the fundamental
concepts underlying large language models.

• This course will start with a short introduction to NLP and Deep Learning, and then move
on to the architectural intricacies of Transformers, followed by the recent advances in
LLM research.

• The topics are organized into five groups:

Basics
• Introduction
• Intro to NLP
• Intro to Deep Learning
• Intro to Language Models (LMs)
• Word Embeddings (Word2Vec, GloVe)
• Neural LMs (CNN, RNN, Seq2Seq, Attention)

Architecture
• Intro to Transformer
• Positional encoding
• Tokenization strategies
• Decoder-only LM, Prefix LM, Decoding strategies
• Encoder-only LM, Encoder-decoder LM

Learnability
• Instruction fine-tuning
• In-context learning
• Advanced prompting (Chain of Thoughts, Graph of Thoughts, Prompt Chaining, etc.)
• Alignment
• PEFT

Knowledge & Retrieval
• Knowledge graphs
• Open-book question answering
• Retrieval augmentation techniques

Ethics and Misc.
• Overview of recently popular models
• Bias, toxicity and hallucination



Pre-Requisites
• Excitement about language!
• Willingness to learn

Mandatory:
• Data Structures & Algorithms
• Machine Learning
• Python programming

Desirable:
• NLP
• Deep learning

This course will NOT cover:
• Details of NLP, Machine Learning and Deep Learning
• Generative models for modalities other than text



Reading and Reference Materials
• Books (optional reading)
  • Speech and Language Processing, Dan Jurafsky and James H. Martin: https://web.stanford.edu/~jurafsky/slp3/
  • Foundations of Statistical Natural Language Processing, Chris Manning and Hinrich Schütze
  • Natural Language Processing, Jacob Eisenstein: https://github.com/jacobeisenstein/gt-nlp-class/blob/master/notes/eisenstein-nlp-notes.pdf
  • A Primer on Neural Network Models for Natural Language Processing, Yoav Goldberg: http://u.cs.biu.ac.il/~yogo/nnlp.pdf
• Journals
  • Computational Linguistics, Natural Language Engineering, TACL, JMLR, TMLR, etc.
• Conferences
  • ACL, EMNLP, NAACL, COLING, ICML, NeurIPS, ICLR, AAAI, WWW, KDD, SIGIR, etc.



Research Papers Repositories

• https://aclanthology.org/
• https://arxiv.org/list/cs.CL/recent



Acknowledgements (Non-exhaustive List)
• Advanced NLP, Graham Neubig: http://www.phontron.com/class/anlp2022/
• Advanced NLP, Mohit Iyyer: https://people.cs.umass.edu/~miyyer/cs685/
• NLP with Deep Learning, Chris Manning: http://web.stanford.edu/class/cs224n/
• Understanding Large Language Models, Danqi Chen: https://www.cs.princeton.edu/courses/archive/fall22/cos597G/
• Natural Language Processing, Greg Durrett: https://www.cs.utexas.edu/~gdurrett/courses/online-course/materials.html
• Large Language Models: https://stanford-cs324.github.io/winter2022/
• Natural Language Processing at UMBC: https://laramartin.net/NLP-class/
• Computational Ethics in NLP: https://demo.clab.cs.cmu.edu/ethical_nlp/
• Self-supervised Models, CS 601.471/671 (jhu.edu)
• WING.NUS Large Language Models: https://wing-nus.github.io/cs6101/
• And many more…



What is a Language Model (LM)?
A language model gives a probability distribution over sequences of tokens.

For example, with vocabulary V = {arrived, delhi, have, is, monsoon, rains, the}, an LM might assign:
• P(the monsoon rains have arrived) ≈ 0.2
• P(monsoon the have rains arrived) ≈ 0.001
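Concretely, the probability of a sequence factorizes with the chain rule, P(x1, …, xn) = ∏i P(xi | x1, …, xi−1), so a model can score it token by token. A minimal sketch (assuming the Hugging Face transformers and PyTorch packages, with GPT-2 purely as an illustrative model; none of this code is from the slides):

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def sequence_log_prob(text: str) -> float:
    """Sum of log P(x_i | x_<i) over the tokens of `text` (chain rule), skipping the first token."""
    ids = tokenizer(text, return_tensors="pt").input_ids          # shape (1, n)
    with torch.no_grad():
        logits = model(ids).logits                                # shape (1, n, |V|)
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)         # predictions for positions 2..n
    targets = ids[0, 1:]                                          # tokens actually observed there
    return log_probs[torch.arange(targets.numel()), targets].sum().item()

# A fluent sequence should score higher (less negative) than a scrambled one.
print(sequence_log_prob("the monsoon rains have arrived"))
print(sequence_log_prob("monsoon the have rains arrived"))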


LMs can ‘Generate’ Text!

Given the input ‘the monsoon rains have’, an LM can calculate
P(xi | the monsoon rains have), ∀ xi ∈ V,
with V = {arrived, delhi, have, is, monsoon, rains, the}.

For generation, the next token is sampled from this probability distribution.

Auto-regressive LMs calculate this distribution efficiently, e.g. using ‘deep’ neural networks.
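A minimal sketch of this next-token view (same assumptions as the earlier snippet: Hugging Face transformers, PyTorch, and GPT-2 as an illustrative stand-in with its own, much larger vocabulary):

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

context = "the monsoon rains have"
ids = tokenizer(context, return_tensors="pt").input_ids
with torch.no_grad():
    logits = model(ids).logits[0, -1]        # scores for whatever token comes next
probs = torch.softmax(logits, dim=-1)        # P(x_i | context) for every x_i in the vocabulary

next_id = torch.multinomial(probs, num_samples=1)   # sample (rather than take the argmax)
print(tokenizer.decode(next_id))                    # e.g. " been", " arrived", ...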


‘Large’ Language Models
‘Large’ refers both to the model’s size (number of parameters) and to the massive size of the training dataset.

Model sizes have increased by roughly 5000x over just the last 4 years!

Other recent models: PaLM (540B), OPT (175B), BLOOM (176B), Gemini-Ultra (1.56T), GPT-4 (1.76T).

Disclaimer: For API-based models like GPT-4/Gemini-Ultra, the number of parameters is not announced officially – these are rumored numbers from the web.

Image source: https://hellofuture.orange.com/en/the-gpt-3-language-model-revolution-or-evolution/


LLMs in AI Landscape

Image source: https://www.manning.com/books/build-a-large-language-model-from-scratch


Evolution of (L)LMs

Image source: https://synthedia.substack.com/p/a-timeline-of-large-language-model


Post-Transformers Era
The LLM Race

Google Designed Transformers: But Could It Take Advantage?
• The beginning of the use of Transformers as language representation models (BERT).
• BERT achieved SOTA on 11 NLP tasks.
• Compressed variants soon followed: DistilBERT, TinyBERT, MobileBERT.


However, someone was waiting for the right opportunity!!

Guess Who?


OpenAI Started Pushing the Frontier

• Use of a decoder-only architecture
• The idea of generative pre-training over a large corpus


The Beginning of Scale

• GPT-1 (117M) → GPT-2 (1.5B): a ~13x increase in # parameters
• Minimal architectural changes (some LayerNorms added, modified weight initialization)
• Increase in context length: GPT-1 (512 tokens) → GPT-2 (1024 tokens)
• Performance boosts across tasks


What Was Google Developing in Parallel?

• Similar broad goal of converting all text-based language problems into a text-to-text format
• Used an encoder-decoder architecture
• Pre-training strategy differs from GPT’s; it is more similar to BERT’s


Was It Only Google vs. OpenAI? Where Did Meta Stand?

• Replication study of BERT pretraining
• Measured the impact of many key hyperparameters and of training data size
• Found that BERT was significantly undertrained and could match or exceed the performance of every model published after it

• Proposed methods to learn cross-lingual language models (XLMs)
• Obtained SOTA on:
  • cross-lingual classification
  • unsupervised and supervised machine translation


OpenAI Continues to Scale

• 175B parameters (GPT-3)!
• OpenAI stops open-sourcing!


Google Starts Scaling Too (But Is It Late?)

• 540B parameters (PaLM)!
• Google follows OpenAI in stopping open-sourcing!
• It’s now the “LLM Race”


2021-2022: A Flurry of LLMs

• Megatron-Turing NLG
• Codex


Meta Promotes Open-sourcing!

• OPT: a suite of decoder-only pre-trained transformers ranging from 125M to 175B parameters
• Open-sourced!


The ChatGPT Moment

November 30, 2022



2023: The Year of Rapid Pace

• Feb 2023: Google releases Bard
• Feb 2023: Meta releases its LLaMA family of open-source models
• March 2023: Anthropic, a start-up founded in 2021 by ex-OpenAI researchers, releases Claude
• March 2023: OpenAI releases GPT-4, which is multimodal
• June 2023: Microsoft releases Phi-1, a 1.3B LLM for code
• Sept 2023: Mistral AI releases the Mistral-7B model
• Nov 2023: xAI releases Grok
• Dec 2023: Google releases Gemini


And 2024 is seeing even more rapid advancements!

Why Does This Course Exist?
Why do we need a separate course on LLMs? What changes with the scale of LMs?

Emergence
Although the technical machinery is almost the same, ‘just scaling up’ these models results in new emergent behaviors, which lead to significantly different capabilities and societal impacts.

Content credits: https://stanford-cs324.github.io/winter2022/


Why Does This Course Exist?
LLMs show emergent capabilities not observed previously in ‘small’ LMs.
• In-context learning: a pre-trained language model can be guided with prompts alone to perform different tasks (without separate task-specific fine-tuning).
• In-context learning is an example of emergent behavior.

LLMs are widely adopted in the real world.
• Research: LLMs have transformed the NLP research world, achieving state-of-the-art performance across a wide range of tasks such as sentiment classification, question answering, summarization, and machine translation.
• Industry: a very incomplete list of high-profile large language models used in production systems:
  • Google Search (BERT)
  • Facebook content moderation (XLM)
  • Microsoft’s Azure OpenAI Service (GPT-3/3.5/4)

Content credits: https://stanford-cs324.github.io/winter2022/
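In practice, in-context learning amounts to careful prompt construction: a few demonstrations plus a query, with no gradient update. A minimal sketch (assuming the Hugging Face transformers package; GPT-2 is only an illustrative stand-in here, since reliable few-shot behaviour emerges at much larger scales):

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Two labelled demonstrations followed by the query -- the "task" lives entirely in the prompt.
prompt = (
    "Review: The movie was fantastic. Sentiment: positive\n"
    "Review: I hated every minute of it. Sentiment: negative\n"
    "Review: The plot was dull and predictable. Sentiment:"
)
ids = tokenizer(prompt, return_tensors="pt").input_ids
out = model.generate(ids, max_new_tokens=2, do_sample=False,
                     pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(out[0, ids.shape[1]:]))   # the model's continuation, ideally " negative"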


Why Does This Course Exist?
With tremendous capabilities, LLM usage also carries various risks.
• Reliability & disinformation: LLMs often hallucinate – they generate responses that seem correct but are not factually correct.
  • A significant challenge for high-stakes applications like healthcare
• Social bias: most LLMs show performance disparities across demographic groups, and their predictions can enforce stereotypes.
  • E.g., P(He is a doctor) > P(She is a doctor)
  • Training data contains inherent bias
• Toxicity: LLMs can generate toxic/hateful content.
  • They are trained on a huge amount of Internet data (e.g., Reddit), which inevitably contains offensive content
  • A challenge for applications such as writing assistants or chatbots
• Security: LLMs are trained on a scrape of the public Internet – anyone can put up a website that can enter the training data.
  • An attacker can perform a data poisoning attack.

Content credits: https://stanford-cs324.github.io/winter2022/


We Will Cover Almost All of These in 5 Modules

Module-1: Basics
• A refresher on the basics of NLP required to understand and appreciate LLMs.
• A brief introduction to the basics of Deep Learning.
• The basics of statistical language modelling.
• How did we end up in neural NLP? We will discuss the transition and the foundations of neural NLP.
• Initial neural LMs.
Topics: Intro to NLP, Intro to Deep Learning, Intro to Language Models (LMs), Word Embeddings (Word2Vec, GloVe), Neural LMs (CNN, RNN, Seq2Seq, Attention)


Module-2: Architecture
• Workings of the vanilla Transformer
• Positional encoding and tokenization strategies (a short tokenization sketch follows this list)
• Different Transformer variants
• How do their training strategies differ? How are masked LMs (like BERT) different from auto-regressive LMs (like GPT)?
• Response generation (decoding) strategies
Topics: Intro to Transformer, Positional encoding, Tokenization strategies, Decoder-only LM / Prefix LM / Decoding strategies, Encoder-only LM / Encoder-decoder LM
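A quick look at what a subword tokenizer actually does. A minimal sketch (assuming Hugging Face transformers; GPT-2's byte-level BPE is just one of the strategies the module will cover):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")   # byte-level BPE

for word in ["monsoon", "unhappiness", "Chakraborty"]:
    pieces = tokenizer.tokenize(word)               # subword pieces
    ids = tokenizer.convert_tokens_to_ids(pieces)   # integer ids the model actually sees
    print(word, "->", pieces, "->", ids)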


Module-3: Learnability
• What makes modern LLMs so good at following user instructions?
• What is in-context learning? What are its various facets?
• What kinds of prompting techniques are required to elicit reasoning in LLMs?
• How are LLMs made to generate responses preferred by humans? Does this remove toxicity in responses?
• Efficiency is crucial in production systems: how are LLMs efficiently fine-tuned? (A small PEFT sketch follows this list.)
Topics: Instruction fine-tuning, In-context learning, Advanced prompting, Alignment, PEFT
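For the efficiency question above, parameter-efficient fine-tuning (PEFT) methods such as LoRA train only small adapter matrices on top of a frozen base model. A minimal sketch (assuming the Hugging Face peft and transformers packages; the module names and hyperparameters are illustrative choices, not prescribed by the course):

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("gpt2")

# Wrap the frozen base model with low-rank adapters on the attention projections.
config = LoraConfig(
    r=8,                         # rank of the low-rank update
    lora_alpha=16,
    target_modules=["c_attn"],   # GPT-2's fused QKV projection (illustrative choice)
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()   # typically well under 1% of the full model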


Module-4: Knowledge and Retrieval
• Knowledge graphs (KGs)
  • Representation, completion
  • Tasks: alignment and isomorphism
  • Distinction between graph neural networks and neural KG inference
• Open-book question answering: retrieving from structured and unstructured sources
• Retrieval augmentation techniques (a minimal sketch follows this list)
  • Key-value memory networks in QA for simple paths in KGs
  • Early HotPotQA solvers, pointer networks, reading comprehension
  • REALM, RAG, FiD, Unlimiformer
  • KGQA (e.g., EmbedKGQA, GrailQA)
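The core idea behind the retrieval-augmentation techniques listed above: fetch relevant text, then condition the generator on it. A minimal sketch (assuming scikit-learn for a toy TF-IDF retriever; the document store is made up for illustration, and real systems such as REALM or RAG use learned dense retrievers and an actual LLM in place of the final print):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# A toy document store (illustrative snippets, not from the slides).
docs = [
    "The monsoon typically reaches Delhi by late June or early July.",
    "BERT is an encoder-only transformer pre-trained with masked language modelling.",
    "GPT-2 is a decoder-only transformer trained with next-token prediction.",
]

question = "When do the monsoon rains arrive in Delhi?"

# Retrieve: score every document against the question and keep the best one.
vectorizer = TfidfVectorizer().fit(docs + [question])
doc_vecs, q_vec = vectorizer.transform(docs), vectorizer.transform([question])
best = cosine_similarity(q_vec, doc_vecs).argmax()

# Augment: stuff the retrieved evidence into the prompt an LLM would answer from.
prompt = f"Context: {docs[best]}\nQuestion: {question}\nAnswer:"
print(prompt)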


Module-5: Ethics and Miscellaneous
• A discussion on ethical issues and risks of LLM usage (bias, toxicity and hallucination)
• An overview of recent popular LLMs, like GPT-4, Llama 3, Claude 3, Mistral, and Gemini


Suggestions (For Effective Learning)
• To understand the concepts clearly, experiment with the models (Hugging Face makes life easier).
• Smaller models (like GPT-2) can be run on Google Colab / Kaggle.
• Even 7B models can be run with proper quantization (a loading sketch follows below).

Always get your hands dirty!
LLM research is all about implementing and experimenting with your ideas.

Rule of thumb: never believe in any hypothesis until your experiments verify it!
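A minimal sketch of loading a larger model in 4-bit, as hinted above (assuming the transformers, bitsandbytes and accelerate packages and a CUDA GPU; the checkpoint name is just an example of a ~7B model):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mistral-7B-v0.1"       # example 7B checkpoint

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                       # weights stored in 4-bit NF4
    bnb_4bit_compute_dtype=torch.float16,    # compute still runs in fp16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",                       # place layers on the available GPU(s)
)

ids = tokenizer("The monsoon rains have", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**ids, max_new_tokens=10)[0]))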
