
[Technically dispatch] ChatGPT, and why are these new ML models so good


Explaining recent advances in creepy good AI
JUSTIN
DEC 13, 2022


This post is sponsored by Hightouch!

Hightouch helps you sync your customer data from your warehouse into the tools your business teams rely on every day. No custom code, no APIs, just SQL. In a few clicks, you’ve got the data you need in Salesforce, Hubspot, etc. Check out their guide to Reverse ETL, or book a demo here.

When I graduated with a Data Science degree in 2017, AI was kind of like a funny toy, and mostly something researchers (read: not me) spent their time on. Getting a half-decent result from an ML model involved a bunch of code, several failed attempts at training, and then the inevitable abdication and surrender.

Today, it’s coming for my job:

[screenshot of a ChatGPT response omitted]

Not bad!

This is from a model called ChatGPT, a recent release from OpenAI¹ that acts as a sort of conversation companion. ChatGPT has been making the rounds on the web for prompt responses that are very good, like the above (but don’t worry, all of this post is handwritten by yours truly).

Nothing like this existed when I was in school — and even over just the past year, the quality of available ML models has accelerated dramatically. Sentiment among people I know in AI has never been higher, and hundreds of startups have been popping up, building on top of these so-called Large Language Models (LLMs).

How did things improve so quickly? And what are these models actually doing?

Basics of ML models and text generation
Admittedly, this is not the first time I’ve written about this. You might remember GPT-3,
another OpenAI model that dropped in 2020 with some seriously impressive results for
generating text:

GPT-3 is a language generation model. Machine Learning is just about figuring out relationships – what’s the impact of something on another thing? This is pretty straightforward when you’re tackling structured problems – like predicting housing prices based on the number of bedrooms – but gets kind of confusing when you move into the realm of language and text. What are ML models doing when they generate text? How does that work?

The easiest way to understand text generation is to think about a really good friend of
yours (assuming you have one). At some point if you hang out enough, you get a good
feel for their mannerisms, phrasing, and preferred topics of conversation - to the point
where you might be able to reliably predict what they’re going to say next (“finishing
each other’s sentences”). That’s exactly how GPT-3 and other models like it work - they
learn a lot (I mean, like really a lot) about text, and then based on what they’ve seen so
far, predict what’s coming next.
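That “predict what’s coming next” idea is simple enough to sketch in a few lines of Python. Here’s a toy version built on word-pair counts (real models like GPT-3 use deep neural networks trained on huge corpora, not simple counts, so treat this purely as an illustration of the prediction loop):

```python
from collections import Counter, defaultdict

# Toy "language model": count which word tends to follow which.
corpus = "the cat sat on the mat and the cat slept on the mat".split()

following = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    following[current_word][next_word] += 1

def predict_next(word):
    # Pick the word that most often followed `word` in the training text
    return following[word].most_common(1)[0][0]

# Generate text by repeatedly predicting the next word
word = "the"
sentence = [word]
for _ in range(5):
    word = predict_next(word)
    sentence.append(word)

print(" ".join(sentence))  # -> "the cat sat on the cat"
```

GPT-3 runs the same loop, except the word-pair counts are replaced by 175 billion learned parameters and the training text is a large slice of the internet.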

The actual internals of language models are obviously Very Scary and Very
Complicated - there’s a reason that most big advancements come from big research
teams full of PhDs.

ChatGPT is trained on text and code from across the web (articles, books, comments, etc.),
but also actual human conversations:

We trained this model using Reinforcement Learning from Human Feedback (RLHF),
using the same methods as InstructGPT, but with slight differences in the data
collection setup. We trained an initial model using supervised fine-tuning: human AI
trainers provided conversations in which they played both sides—the user and an AI
assistant. We gave the trainers access to model-written suggestions to help them
compose their responses.
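OpenAI hasn’t published the exact format of that training data, but conceptually each supervised fine-tuning example from those trainer-written conversations looks something like this (a hypothetical sketch; the field names are invented for illustration):

```python
# Hypothetical shape of one supervised fine-tuning example. A human
# trainer wrote both sides of the dialogue; the model is trained to
# produce the assistant's reply given the conversation so far.
example = {
    "conversation": [
        {"role": "user", "text": "Can you explain what an API is?"},
        {"role": "assistant", "text": "Sure! An API is a way for two programs to talk to each other."},
        {"role": "user", "text": "Can you give me a real-world example?"},
    ],
    # The target the model learns to predict, written by the human trainer
    "target_response": "When a weather app shows you a forecast, it's calling a weather service's API behind the scenes.",
}
```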

There’s a whole taxonomy of OpenAI models and what other models they’re built off of.

Why this is happening now


Every time an exciting new ML model or interface comes out, the question is always “why
now?” and the answers are usually sort of the same.

It’s hard to point to a particular single breakthrough – instead, many of the trends that
have been driving better models for years have just continued to develop.
1. More training data – OpenAI and others have been training models on just colossally
large sets of data taken from increasingly diverse sources. Better data means better
models

2. More complex model architectures – foundation models and transformer architectures (neither of which you need to know in depth) are driving more complex, deeper neural networks. In English: models are getting more complex

3. More horsepower! – the machines we’re using to train and run these models are getting stronger and more efficient. It’s a rough estimate, but training GPT-3 probably cost millions of dollars in compute (see the back-of-the-envelope sketch below)
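For a sense of where “millions of dollars” comes from, here’s that back-of-the-envelope sketch. The parameter and token counts are from the GPT-3 paper; the hardware throughput and pricing are rough assumptions, not OpenAI’s actual bill:

```python
# Rule of thumb: training FLOPs ≈ 6 × parameters × training tokens
params = 175e9                      # GPT-3 has 175 billion parameters
tokens = 300e9                      # trained on ~300 billion tokens
total_flops = 6 * params * tokens   # ≈ 3.15e23 FLOPs

# Assume a V100-class GPU delivering ~30 effective TFLOP/s after
# utilization losses, rented at ~$1.50 per GPU-hour (rough 2020 pricing)
effective_flops_per_sec = 30e12
gpu_hours = total_flops / effective_flops_per_sec / 3600
cost_usd = gpu_hours * 1.50

print(f"{gpu_hours:,.0f} GPU-hours, ~${cost_usd / 1e6:.1f}M")
# -> 2,916,667 GPU-hours, ~$4.4M
```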

These bullet points are nothing new; they’re just steady progress on the same dimensions
as the past few years. Instead, my “big brain” take on why AI has been making it into the
discourse so much more lately is less about the models themselves, and more about who
uses the models.

Model interfaces are now for the public


Over the past couple of years, models have gone from private, code-first, and inaccessible to widely available to the public.

If you look at models that have made it into the public discourse recently, like DALL-E or
Stable Diffusion, they share a unique quality: whoever built the model also built an
interface to the model. Using ChatGPT is as simple as typing a prompt into OpenAI’s
website; generating a photo with DALL-E is too. It’s for everyone! And that is very weird.

It’s hard to overstate how novel this is. For as long as I can remember, AI was a research-driven discipline, which meant that interfaces to cutting-edge models were in what researchers were familiar with: code. In fact, you can trace a pretty clear progression in these model interfaces – a fancy word for how you use them – from niche and closed off, to widely applicable and open.

1. Academic: a new model architecture is discussed in some paper on arXiv; practitioners read and discuss

2. Code and local first: a new cutting-edge model is available for download, and you work with it on your computer (or a server) with code

3. Code and remote first: a new model is released via API, where you can make requests and get responses. You still use code to do that (this is how GPT-3 worked; see the sketch after this list)

4. [today] UI and remote first: many new models are released to the public via a slick UI
that’s easy to interact with
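To make stage 3 concrete, here’s roughly what working with GPT-3 looked like, as a sketch against OpenAI’s public completions endpoint (you’d need your own API key; `text-davinci-003` was one of the GPT-3-family models available at the time):

```python
import requests

# Stage 3: the model runs on OpenAI's servers, and you interact with
# it by sending HTTP requests from your own code.
response = requests.post(
    "https://api.openai.com/v1/completions",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model": "text-davinci-003",
        "prompt": "Explain what an API is in one sentence:",
        "max_tokens": 60,
    },
)
print(response.json()["choices"][0]["text"])
```

Stage 4 collapses all of that into a text box on a website.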

In other words, anyone can use these new ML models easily, which is a very significant departure from how things used to work. AI has gone from a by-researchers-for-researchers (and practitioners) discipline to a by-researchers-for-the-public discipline.

The implications of this are huge! Part of OpenAI’s research philosophy is transparency: they want major developments to be known and available to the public. This is obviously a double-edged sword; the public is erratic, doesn’t understand what’s going on under the hood, and is very prone to misusing and misunderstanding it. But it’s also pretty cool to be able to use this stuff, and maybe more than cool, it’s useful.

What ChatGPT means for you and the workforce


With every advance in AI comes the perennial question: is my job safe?

I’m not a sociologist, and I’m definitely not an economist. Papers and bodies of work have been dedicated to how AI impacts the economy, and most of it is probably conjecture. What we can talk about are a couple of practical changes that you might see over the next few years.

The concept of the prompt engineer


Models like ChatGPT and DALL-E respond to prompts given by the user. In the same sense that searching on Google is now a skill, getting good at crafting ML model prompts may become one too. Some have called this eventual job (or more likely, a skill you’ll use in your job) prompt engineering. E.g. here’s a post with tips and tricks for prompt engineering.
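To give a flavor of the skill (a made-up illustration, not taken from the linked post), compare a lazy prompt with an engineered one for the same request:

```python
# The engineered prompt pins down audience, format, and constraints
# instead of leaving the model to guess.
lazy_prompt = "Write about databases."

engineered_prompt = (
    "You are writing for non-technical sales reps. "
    "In 3 short bullet points, explain what a database is, "
    "using a filing-cabinet analogy. Avoid jargon."
)
```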

Businesses built on language models


In my day job at a VC fund, I’m seeing an absolute explosion in startups using language
models like GPT-3 to make something easier. This isn’t a new thing, but it’s definitely
accelerating. A few categories and examples:

Generating text instead of writing it: Jasper raised $125M (!) for a tool that generates
marketing copy and content with AI.

Using text to analyze data: building charts, writing queries, etc. generated by ML models (see the sketch after this list)

Using text to build interfaces: describing what you want (e.g. a customer support tool)
and generating the code with ML models
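As a sketch of how that second category works under the hood (hypothetical, not any particular product’s actual prompt), a text-to-SQL feature is essentially wrapping the user’s question and the table schema in a prompt and letting the model write the query:

```python
# Hypothetical core of a "use text to analyze data" product
schema = "orders(id, customer_id, amount, created_at)"
question = "What was our total revenue last month?"

prompt = f"""Given this table: {schema}
Write a SQL query to answer: {question}
SQL:"""

# The model's completion would be something like:
#   SELECT SUM(amount) FROM orders
#   WHERE created_at >= date_trunc('month', now()) - interval '1 month'
#     AND created_at < date_trunc('month', now());
```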

I have my gripes with the no-code universe, but it is undoubtedly true that we’re going to see non-technical people empowered by this kind of stuff. Which is why you subscribe to Technically!

The era of the idea guy


Taken to an extreme, AI like ChatGPT is going to make building things as simple as coming up with the idea (now: the prompt), and the model will do the work for you. This is not where we’re at now, but early applications like generating decent marketing copy have definitely got me scared, as a writer.

Is the dawn of the era of the idea guy beginning? What do you think about what’s been
going on in ML? How do you think it might help or hurt you at work? Chime in with a
comment.


¹ OpenAI is a for-profit research company, with very, very deep pockets, focused on building powerful AI models that benefit humanity (or something like that).


