Intro Class
Tanmoy Chakraborty
Associate Professor, IIT Delhi
https://tanmoychak.com/
• This course begins with a short introduction to NLP and deep learning, then moves on to the architectural intricacies of Transformers, followed by recent advances in LLM research.
Mandatory:
• Data Structures & Algorithms
• Machine Learning
• Python programming

Desirable:
• NLP
• Deep Learning
https://aclanthology.org/
https://arxiv.org/list/cs.CL/recent
Language Model
Vocabulary: V = {arrived, delhi, have, is, monsoon, rains, the}
P(monsoon the have rains arrived) ≈ 0.001
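The low probability for the scrambled word order can be illustrated with the chain rule, P(w1…wn) = ∏ P(wi | w1…wi−1), here approximated with a bigram model. The probability table below is entirely made up for illustration; a real LM would learn these values from data.

```python
# Toy illustration (hypothetical probabilities): an LM scores a sentence
# via the chain rule, approximating the history by the previous word (bigram).
bigram = {
    ("<s>", "the"): 0.4, ("the", "monsoon"): 0.3, ("monsoon", "rains"): 0.5,
    ("rains", "have"): 0.4, ("have", "arrived"): 0.6,
    ("<s>", "monsoon"): 0.01, ("monsoon", "the"): 0.01, ("the", "have"): 0.02,
    ("have", "rains"): 0.01, ("rains", "arrived"): 0.05,
}

def score(sentence, lm=bigram):
    """Chain-rule probability of a sentence under the toy bigram table."""
    words = ["<s>"] + sentence.split()
    p = 1.0
    for prev, cur in zip(words, words[1:]):
        p *= lm.get((prev, cur), 1e-6)  # tiny probability for unseen bigrams
    return p

print(score("the monsoon rains have arrived"))   # well-ordered: relatively high
print(score("monsoon the have rains arrived"))   # scrambled: much lower
```

Under any reasonable model, the grammatical ordering scores orders of magnitude higher than the scrambled one, which is exactly the behavior the slide depicts.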
Vocabulary: V = {arrived, delhi, have, is, monsoon, rains, the}
Given input 'the monsoon rains have', the LM can calculate P(xi | the monsoon rains have), ∀ xi ∈ V.
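Concretely, a neural LM emits one score (logit) per vocabulary word and a softmax turns those scores into the distribution P(xi | context). The logits below are made-up numbers chosen so that 'arrived' is the most likely continuation; only the softmax mechanics are the point here.

```python
import math

# Minimal sketch (hypothetical logits): given the context
# 'the monsoon rains have', the LM scores every word in V,
# and softmax normalizes the scores into P(xi | context).
V = ["arrived", "delhi", "have", "is", "monsoon", "rains", "the"]
logits = [4.0, 0.5, 0.2, 0.3, 1.0, 1.5, 0.4]  # made-up scores for illustration

def softmax(xs):
    m = max(xs)                      # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

probs = dict(zip(V, softmax(logits)))
print(max(probs, key=probs.get))  # 'arrived' gets the highest probability here
```

Because softmax normalizes the scores, the outputs sum to 1 and form a valid probability distribution over the whole vocabulary.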
Image source: https://synthedia.substack.com/p/a-timeline-of-large-language-model
Guess Who?
175 B parameters!
540 B parameters!
Megatron-Turing NLG
Codex
Emergence
Although the underlying technical machinery is almost the same, 'just scaling up' these models results in new emergent behaviors, which lead to significantly different capabilities and societal impacts.
• How do their training strategies differ? How are Masked LMs (like BERT) different from Auto-regressive LMs (like GPT)?
• Encoder-only LM, Encoder-decoder LM
• Representation, completion
• Retrieval augmentation techniques
• Tasks: Alignment and isomorphism
• Distinction between graph neural networks and neural KG inference
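One concrete difference between auto-regressive LMs (GPT-style) and masked LMs (BERT-style) is the attention mask used during training. The sketch below builds both mask patterns for a short sequence; it is a schematic illustration, not any particular model's implementation.

```python
# Illustrative sketch: attention masks that distinguish auto-regressive
# (GPT-style) from bidirectional/masked (BERT-style) models.
# 1 = position may be attended to, 0 = blocked.
n = 5  # sequence length

# Auto-regressive (causal): token i may only attend to positions <= i,
# so the model predicts each token from its left context only.
causal = [[1 if j <= i else 0 for j in range(n)] for i in range(n)]

# Bidirectional (BERT-style): every token attends to every position;
# training instead hides a fraction of input tokens and predicts them.
bidirectional = [[1] * n for _ in range(n)]

for row in causal:
    print(row)  # lower-triangular pattern: left-to-right generation
```

The lower-triangular causal mask is what makes left-to-right text generation possible, while the all-ones mask lets a masked LM use context from both sides when filling in hidden tokens.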
LLM Research is all about implementing and experimenting with your ideas.

Rule of thumb:
• Never believe in any hypothesis until your experiments verify it!
• Always get your hands dirty!