
Introduction and course overview

Christopher Potts
CS224u: Natural Language Understanding
Our team
• Kawin Ethayarajh: Evaluation in NLP
• Sidd Karamcheti: Robot learning and NLP, scaling
• Mina Lee: Human–AI Interaction, HCI, LLMs
• Siyan Li: Human-centered NLP; distinguished course alum
• Lisa Li: Diffusion models, prefix tuning, in-context learning
• Tolúlope Ògúnremí: Multilingual and low-resource NLP
• Tianyi Zhang: LLMs, emergence
[Screenshots contrasting systems in 2012 and 2022]

A golden age for NLU

Which U.S. states border no U.S. states?

This question has been put to systems spanning 1980 to 2022. The 1980 system could answer an intricate query of the kind its developers had prepared it for:

Which country bordering the Mediterranean borders a country that is bordered by a country whose population exceeds the population of India? turkey.

But the novel question stumped it:

Which U.S. states border no U.S. states? I donʼt understand.

[Screenshots: systems from 2009, 2020, 2021, and 2022 attempting the same question]

OpenAI GPT-3

Spotting modelsʼ “cheap tricks” (Levesque 2013)

Benchmarks saturate faster than ever

Kiela et al. 2021


Emergent abilities of large language models

Jason Weiʼs blog post


What is going on?

AI model development past and present

[Timeline diagram, 1960–2020, annotated with “Task parameters”]

The Transformer

How on earth does this work?

Oh, this is actually pretty simple!

Wait, why does this work so well?

Self-supervision
1. The modelʼs only objective is to learn co-occurrence patterns in
the sequences it is trained on.
2. Alternatively: to assign high probability to attested sequences.
3. Generation then involves sampling from the model (see the sketch below).
4. The sequences can contain anything.
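
To make points 1–3 concrete, here is a minimal, runnable sketch of self-supervision in its simplest form: a bigram language model whose only training signal is co-occurrence counts over raw text, with generation by sampling. The toy corpus is invented for illustration; real systems use neural networks and vastly more data, but the objective has the same shape.

```python
# Self-supervision in miniature: count co-occurrences, then sample.
import random
from collections import Counter, defaultdict

corpus = [
    "better late than never",
    "every day I eat breakfast lunch and dinner",
    "the key to happiness is practice",
]

# "Training" is nothing but recording which token follows which
# in the attested sequences.
counts = defaultdict(Counter)
for line in corpus:
    tokens = ["<s>"] + line.split() + ["</s>"]
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1

def sample(max_len=20):
    """Generate by repeatedly sampling the next token in proportion
    to how often it followed the current token during training."""
    token, output = "<s>", []
    for _ in range(max_len):
        candidates, freqs = zip(*counts[token].items())
        token = random.choices(candidates, weights=freqs)[0]
        if token == "</s>":
            break
        output.append(token)
    return " ".join(output)

print(sample())
```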

Large-scale pretraining

Model size

[Chart: model size by release year, 2018–2023, on a log scale from 100M to 1T parameters: GPT and BERT near the bottom, GPT-2 near 1B, Megatron (8.3B), Megatron (11B), GPT-3 (175B), Megatron-Turing NLG, and PaLM near the top]

A growing number of powerful LLMs

[The same chart, now with entries from loose collectives, academic groups, and start-ups: FLAN T5 XL (Google; 3B), Alpaca (Stanford; 7B), FLAN T5 XXL (Google; 11B), and LLaMA (Meta; 13B)]

Prompting
When you prompt a language model, you put it in a temporary state and then generate a sample from the model, as in these examples (a code sketch follows):
• Better late than __________
• Every day, I eat breakfast, lunch, and __________
• The President of the U.S. is __________
• The key to happiness is __________
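
As a concrete illustration, here is a minimal sketch using the Hugging Face transformers library. The small gpt2 checkpoint is an illustrative choice rather than the course's model, and sampled completions will vary from run to run.

```python
# Prompting sketch: each prompt puts the model in a temporary state,
# and generation samples a continuation from that state.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompts = [
    "Better late than",
    "Every day, I eat breakfast, lunch, and",
    "The President of the U.S. is",
    "The key to happiness is",
]

for prompt in prompts:
    out = generator(prompt, max_new_tokens=8, do_sample=True)
    print(out[0]["generated_text"])
```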

The GPT-3 paper and the rise of in-context learning

[Figure: an example from the GPT-3 paper; the visible completion fragment reads “only on moonlit nights”]

Pure self-supervision vs. regular supervision

Standard supervision for nervous anticipation:
• “My palms started to sweat as the lotto numbers were read off.” → nervous anticipation = 1
• “I took a deep breath as the curtain started to rise on my debut night.” → nervous anticipation = 1
• “I couldnʼt shake a deep feeling of unease about the whole affair.” → nervous anticipation = 0

“Few-shot in-context learning”:
• Hey model, here is an example of nervous anticipation: “My palms started to sweat as the lotto numbers were read off.”
• Hey model, hereʼs an example without nervous anticipation: “...”
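
The few-shot framing is just string construction over labeled examples. Here is a sketch of how such a prompt might be assembled; the example texts come from the slide, while the template wording for the final query is an illustrative assumption.

```python
# Assemble a few-shot in-context learning prompt: the supervision is
# conveyed by example in the prompt, with no gradient updates.
examples = [
    ("My palms started to sweat as the lotto numbers were read off.", True),
    ("I took a deep breath as the curtain started to rise on my debut night.", True),
    ("I couldn't shake a deep feeling of unease about the whole affair.", False),
]

def build_prompt(examples, target_text):
    lines = []
    for text, is_positive in examples:
        relation = "of" if is_positive else "without"
        lines.append(
            f'Hey model, here is an example {relation} nervous anticipation: "{text}"'
        )
    # The final query line is a hypothetical template for the new case.
    lines.append(
        f'Hey model, does this express nervous anticipation? "{target_text}"'
    )
    return "\n".join(lines)

print(build_prompt(examples, "My heart raced as the envelope was opened."))
```
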
Learning from human feedback

ChatGPT blog post


Step-by-step and chain-of-thought reasoning

Can models reason about negation? Does the model know that if the customer doesnʼt have any loans, then the customer doesnʼt have any auto loans?

Old-school prompting style (so 2021)

It reversed the question!


Step-by-step prompting style (cutting edge!)
Logical and commonsense reasoning exam.
Explain your reasoning in detail, then answer with Yes or No. Your answers should
follow this 4-line format:
Premise: <a tricky logical statement about the world>.
Question: <question requiring logical deduction>.
Reasoning: <an explanation of what you understand about the possible scenarios>.
Answer: <Yes or No>.
Premise: the customer doesn’t have any loans
Question: Can we logically conclude for sure that the customer doesn’t have any auto loans?
Reasoning: Let's think logically step by step. The premise basically tells us that
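
The template deliberately ends mid-sentence so that the model continues the Reasoning line and then emits its own Answer line. Here is a sketch of wrapping this template around new premise/question pairs and pulling the verdict back out; the parsing heuristic is an illustrative assumption.

```python
# Build a step-by-step prompt in the 4-line format and extract the
# final Yes/No from the model's completion.
HEADER = (
    "Logical and commonsense reasoning exam.\n"
    "Explain your reasoning in detail, then answer with Yes or No. "
    "Your answers should follow this 4-line format:\n"
    "Premise: <a tricky logical statement about the world>.\n"
    "Question: <question requiring logical deduction>.\n"
    "Reasoning: <an explanation of what you understand about the possible scenarios>.\n"
    "Answer: <Yes or No>.\n"
)

def build_prompt(premise, question):
    return (
        f"{HEADER}\n"
        f"Premise: {premise}\n"
        f"Question: {question}\n"
        f"Reasoning: Let's think logically step by step."
    )

def extract_answer(completion):
    """Scan a generated completion for its 'Answer:' line."""
    for line in completion.splitlines():
        if line.strip().startswith("Answer:"):
            return line.split("Answer:", 1)[1].strip()
    return None

print(build_prompt(
    "the customer doesn't have any loans",
    "Can we logically conclude for sure that the customer "
    "doesn't have any auto loans?",
))
```
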
Course overview

High-level overview

Topics:
1. Contextual representations
2. Multi-domain sentiment analysis
3. Retrieval-augmented in-context learning
4. Compositional generalization
5. Benchmarking and adversarial training and testing
6. Model introspection
7. Methods and metrics

Work:
1. 3 assignment/bakeoff combos
2. 3 offline quizzes
3. Final project:
   a. Lit review
   b. Experiment protocol
   c. Final paper

Background materials
• CS224n is a prerequisite for this course, so we are going to skip
a lot of the fundamentals we have covered in past years.
• If you need a refresher, check out the background page of the
course site:
• Fundamentals of scientific computing in AI
• Static vector representations
• Supervised learning

Core goals
• Hands-on experience with a wide range of challenging NLU
problems.
• A mentor from the teaching team will guide you through the
project assignments – there are many examples of these projects
becoming important publications.
• Central goal: to make you the best – most insightful and
responsible – NLU researcher and practitioner wherever you go
next.

Course theme

Transformer-based pretraining

Progression and exploration for Transformers
1. Core concepts and goals
2. Architectures
3. Positional encoding
4. Distillation
5. Diffusion objectives [Lisa!]
6. Practical pretraining and fine-tuning [Sidd!]

Course theme

Retrieval-augmented in-context learning

LLMs for everything vs. Retrieval-augmented

What do we need?

• Synthesis/Fluency 🤠
• Efficiency
• Updateability
• Provenance/Factualness
• Safety/Security

Efficiency

LLMs for everything 😬 vs. Retrieval-augmented 🤠:
• Smaller LMs are cheaper to develop, cheaper to maintain, and cheaper to deploy.
• Managing large search indices is a familiar problem.

Updateability

LLMs for everything 😬 vs. Retrieval-augmented 🤠:
• Document update: one forward pass of the LLM.

Provenance / Factualness

LLMs for everything 😓 (these links are not real!) vs. Retrieval-augmented 🤠


Safety and security

LLMs for everything 😓


User privacy challenge: LLMs are known to memorize long strings
from their training data.
Client security challenge: No known way to compartmentalize LLM
capabilities.
Organizational security challenge: No known way to restrict
access to specific parts of an LLMʼs capabilities.

Safety and security

Retrieval-augmented 🤠: access restrictions are imposed at the document level in a familiar way.

What do we need?

• Synthesis/Fluency: LLMs for everything 🤠, Retrieval-augmented 🤠
• Efficiency: LLMs for everything 😬, Retrieval-augmented 🤠
• Updateability: LLMs for everything 😬, Retrieval-augmented 🤠
• Provenance/Factualness: LLMs for everything 😓, Retrieval-augmented 🤠
• Safety/Security: LLMs for everything 😓, Retrieval-augmented 🤠

The present: Wrangling pretrained components

[Diagram: “Task parameters”, contrasting the recent past with the present]

Models can communicate in natural language

[Diagram: Text → Retriever → texts with scores; Text → LLM → texts with scores]

Few-shot OpenQA

All we are given: the question Q: What is the course to take?

Example prompt:
p1: Pragmatics is the study of language use.
q1: What is pragmatics?
a1: The study of language use

p2: Bert is a Muppet who lives with Ernie.
q2: Who is Bert?
a2: Bert is a Muppet

b: The course to take is NLU!
Q: What is the course to take?
A:

Design choices:
• Sampled train demonstrations D: D = random, or D = IR.kNN(Q)
• Demonstration passages: pi = IR.retrieve(qi, k=1); Hindsight: pi = IR.retrieve(qi + ai, k=1); or P = IR.retrieve(qi, k=5), pi = LM.argmax_{p ∈ P}(ai | qi, p)
• Retrieval (OpenQA): b = IR.retrieve(Q, k=1)
• Query rewriting: Q = LM(Q, D)
• Answer: A = LM(y | Q, D, b), restricted to y a substring of b; or ∑_{b ∈ B} IR(b | Q) · LM(y | Q, D, b)
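
To make the retrieve-then-read pattern concrete, here is a runnable sketch in which a TF-IDF scorer stands in for the IR system and the resulting prompt would then be handed to the LM. The passages, the prompt template, and the choice of TF-IDF are all illustrative assumptions; real systems use much stronger retrievers.

```python
# Retrieve-then-read: score passages against the question Q, take the
# top passage b, and build the prompt for LM(y | Q, b).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

passages = [
    "Pragmatics is the study of language use.",
    "Bert is a Muppet who lives with Ernie.",
    "The course to take is NLU!",
]

def retrieve(query, k=1):
    """A stand-in for IR.retrieve(Q, k): rank passages by TF-IDF cosine."""
    vectorizer = TfidfVectorizer().fit(passages + [query])
    sims = cosine_similarity(
        vectorizer.transform([query]), vectorizer.transform(passages)
    )[0]
    ranked = sorted(zip(sims, passages), reverse=True)
    return [passage for _, passage in ranked[:k]]

Q = "What is the course to take?"
b = retrieve(Q, k=1)[0]
prompt = f"Passage: {b}\nQuestion: {Q}\nAnswer:"
print(prompt)  # This prompt would be sent to the LM for the answer y.
```
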
Course theme

Compositional generalization

The COGS challenge (Kim and Linzen 2020)

Stubborn COGS splits (Wu et al. 2023)

ReCOGS (Wu et al. 2023)

ReCOGS remains challenging (Wu et al. 2023)

Course theme

Better and more diverse benchmark tasks

Water and air of our field
Jacques Cousteau: “Water and air,
the two essential fluids on which
all life depends, have become
global garbage cans.”

We ask a lot of our datasets
1. Optimize models
2. Evaluate models
3. Compare models
4. Enable new capabilities in models
5. Measure fieldwide progress
6. Scientific inquiry

What does benchmark saturation really mean?

Kiela et al. 2021


Dynabench

Course theme

More meaningful evaluations

Strathernʼs Law: “When a measure becomes a target, it ceases to be a good measure.”

What we seem to value (selected “values encoded in ML research” from Birhane et al. 2021):
• Performance
• Efficiency
• Interpretability (for researchers)
• Applicability in the real world
• Robustness
• Scalability
• Interpretability (for users)
• Beneficence
• Privacy
• Fairness
• Justice

Towards multidimensional leaderboards

Dynascoring

Ma, Ethayarajh, Thrush, et al. 2021

A new era of more meaningful evaluations?

Assessment today:
• One-dimensional
• Largely insensitive to context (use-case)
• Terms set by the research community
• Opaque
• Tailored to machine tasks

Assessments tomorrow:
• High-dimensional and fluid
• Highly sensitive to context (use-case)
• Terms set by the stakeholders
• Judgments ultimately made by users
• Tailored to human tasks

Course theme

Faithful, human-interpretable explanations of models

Trust

[Diagram: trust in relation to safety, approved use, reliability, and bias, resting on analytic guarantees about model behaviors]

Standards for explanation

Faithful: We can give mechanistic, mathematical explanations of how models work that are perfectly faithful and accurate. However, these explanations fail to illuminate the concepts we care about.

Human-interpretable: We can give human-interpretable explanations that are not true to how our models actually work. These can seem satisfying, but if we canʼt guarantee that they are faithful to how the models actually work, we are simply confusing ourselves.

Goal: concept-level explanations of the causal effects.

Explanation methods for NLP models
• Train/test evaluations cannot provide guarantees about behavior on new examples.
• Probing methods illuminate internal representations but do not support causal inferences.
• Attribution methods illuminate the causal dynamics of models but donʼt characterize their internal representations.
• Active manipulations of model internal states provide causal insights and rich characterizations of those states (a toy sketch follows this list).
• Interchange intervention training: train models to conform to the structure of high-level symbolic models.
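
Here is a toy sketch of one such active manipulation, an interchange intervention: cache an intermediate representation computed for a "source" input and splice part of it into the forward pass on a "base" input, then observe how the output changes. The two-layer network is a stand-in for illustration, not a real NLU model.

```python
# Interchange intervention on a toy network: swap part of the hidden
# state from a source input into the base input's forward pass.
import torch
import torch.nn as nn

torch.manual_seed(0)
layer1 = nn.Linear(4, 4)
layer2 = nn.Linear(4, 2)

base = torch.randn(1, 4)
source = torch.randn(1, 4)

with torch.no_grad():
    h_base = torch.relu(layer1(base))
    h_source = torch.relu(layer1(source))

    # Intervene: replace the first two hidden units of the base run
    # with the corresponding units from the source run.
    h_swapped = h_base.clone()
    h_swapped[:, :2] = h_source[:, :2]

    print("normal output:    ", layer2(h_base))
    print("intervened output:", layer2(h_swapped))
```
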
Looking ahead

[Figure: 1980 vs. 2023, with a vector similarity index]

Course mechanics

Core course components

Quizzes 15%
Homeworks and bakeoffs 35%
Literature review 10%
Experiment protocol 10%
Final project paper 30%


Fully asynchronous
• All lectures are recorded, and attendance is not required.
• Attending lectures is a great way to participate in shaping the course and to build connections with the teaching team.
• Office hours are offered in person and on Zoom; details to
come.
• Continuous evaluation: three assignments, four online quizzes,
and three components to the project work.

A note on grading original systems
All the homeworks culminate in an “original system” question that becomes your bakeoff
entry. Here are the basic guidelines we will adopt for grading this work:
1. We want to emphasize that this needs to be an original system. It doesnʼt suffice to
download code from the Web, retrain, and submit, even if this leads to an outstanding
bakeoff score. You can build on othersʼ code, but you have to do something new and
meaningful with it.
2. Systems that are very creative and well-motivated will be given full credit even if they do
not perform well on the bakeoff data. We want to encourage creative exploration!
3. Other systems will receive less than full credit, based on the judgment of the teaching
team. The specific criteria will vary based on the nature of the assignment. Point
deductions will be justified in feedback.

Project work
1. The second half of the course is devoted to projects.
2. The associated lectures, notebooks, and readings are focused
on methods, metrics, and best practices.
3. The assignments are all project-related; details are available at the course website.
4. Exceptional final projects from past years (access restricted)
5. Lots of guidance on projects

Crucial course links
• Website
• Code repository
• Discussion forum
• Gradescope
• Teaching team: cs224u-spr2223-staff@lists.stanford.edu

Quizzes
1. Quiz 0 is on course requirements and related details. The sole
purpose of the quiz is to create a clear incentive for you to study
the website and understand your rights and obligations.
2. Quizzes 1–4 create a course-related incentive for individual
students to study the material beyond what is required for the
more free-form and collaborative assignments.
3. All quizzes are open notes, open book, open ChatGPT, etc., but no collaboration is permitted.
For next time
1. Get set up using setup.ipynb in the course repo.
2. Make sure youʼre in the discussion forum. If not, follow the link given at the homepage for our course Canvas.
3. Consider doing Quiz 0 as a way of getting to know your rights and obligations for this course.
4. Check out hw_sentiment.ipynb. If this material is new to you or you need a refresher, check out the background materials.

Computing resources
1. We expect to get you AWS credits.
2. Consider getting a Colab Pro account; at $9.99/month, a
three-month subscription is cheaper than even the cheapest
textbooks.
3. Sign up for SageMaker Studio Lab for additional free GPU support.
4. Sign up for Cohere for free access (for now) to outstanding language models (and OpenAI still offers $5 in credits for new accounts).

Core goals (repeated from above)
• Hands-on experience with a wide range of challenging NLU
problems.
• A mentor from the teaching team will guide you through the
project assignments – there are many examples of these projects
becoming important publications.
• Central goal: to make you the best – most insightful and
responsible – NLU researcher and practitioner wherever you go
next.