ChatGPT, and why are these new ML models so good
Hightouch helps you sync your customer data from your warehouse into the tools your
business teams rely on every day. No custom code, no APIs, just SQL. In a few clicks,
you've got the data you need in Salesforce, Hubspot, etc. Check out their guide to
Reverse ETL, or book a demo here.
When I graduated with a Data Science degree in 2017, AI was kind of like a funny toy, and
mostly something researchers (read: not me) spent their time on. Getting a half-decent
result from an ML model involved a bunch of code, several failed attempts at training, and
then the inevitable abdication and surrender.
[Screenshot: a ChatGPT response to a prompt]
Not bad!
This is from a model called ChatGPT, a recent release from OpenAI 1 that acts as a sort of
conversation companion. ChatGPT has been making the rounds on the web for prompt
responses that are very good, like the above (but don't worry, all of this post is hand-written
by yours truly).
Nothing like this existed when I was in school — and even over just the past year, the
quality of available ML models has improved dramatically. Excitement among people I
know in AI has never been higher, and hundreds of startups have been popping up,
building on top of these so-called Large Language Models (LLMs).
How did things improve so quickly? And what are these models actually doing?
GPT-3 is a language generation model. Machine Learning is just about figuring out
relationships – what's the impact of something on another thing? This is pretty
straightforward when you're tackling structured problems – like predicting housing
prices based on the number of bedrooms – but gets kind of confusing when you move
into the realm of language and text. What are ML models doing when they generate
text? How does that work?
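To make "figuring out relationships" concrete, here's a minimal sketch of the structured case mentioned above: fitting a line that predicts a house's price from its bedroom count. The data and numbers are entirely made up for illustration:

```python
# Toy linear regression: learn price = slope * bedrooms + intercept
# from a handful of (bedrooms, price) examples. All data is fabricated.

def fit_line(xs, ys):
    """Ordinary least squares for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
            / sum((x - mean_x) ** 2 for x in xs)
    intercept = mean_y - slope * mean_x
    return slope, intercept

bedrooms = [1, 2, 3, 4]
prices_k = [100, 150, 200, 250]  # prices in $thousands, made up

slope, intercept = fit_line(bedrooms, prices_k)

def predict_price(n_bedrooms):
    return slope * n_bedrooms + intercept

print(predict_price(5))  # → 300.0 (the learned relationship, extrapolated)
```

That's the whole game for structured problems: learn the relationship from examples, then apply it to new inputs.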
The easiest way to understand text generation is to think about a really good friend of
yours (assuming you have one). At some point if you hang out enough, you get a good
feel for their mannerisms, phrasing, and preferred topics of conversation - to the point
where you might be able to reliably predict what they’re going to say next (“finishing
each other’s sentences”). That’s exactly how GPT-3 and other models like it work - they
learn a lot (I mean, like really a lot) about text, and then based on what they’ve seen so
far, predict what’s coming next.
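The "predict what's coming next" idea can be sketched with a deliberately tiny stand-in: a bigram model that counts which word tends to follow which in some text, then predicts the most common follower. Real LLMs predict over tokens using billions of learned parameters rather than raw counts, but this is the counting version of the same idea:

```python
from collections import Counter, defaultdict

# Tiny "language model": count which word follows which in a toy corpus,
# then predict the most frequent follower. The corpus is made up.
corpus = "the cat sat on the mat and the cat ate the fish".split()

followers = defaultdict(Counter)
for word, next_word in zip(corpus, corpus[1:]):
    followers[word][next_word] += 1

def predict_next(word):
    return followers[word].most_common(1)[0][0]

print(predict_next("the"))  # → cat  ("cat" follows "the" most often here)
```

GPT-3 does this at vastly larger scale, with a model that generalizes rather than memorizes — but at the interface level it's still "given what I've seen so far, what comes next?"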
The actual internals of language models are obviously Very Scary and Very
Complicated - there’s a reason that most big advancements come from big research
teams full of PhDs.
ChatGPT is trained on text and code from across the web (articles, books, comments, etc.),
but also actual human conversations:
We trained this model using Reinforcement Learning from Human Feedback (RLHF),
using the same methods as InstructGPT, but with slight differences in the data
collection setup. We trained an initial model using supervised fine-tuning: human AI
trainers provided conversations in which they played both sides—the user and an AI
assistant. We gave the trainers access to model-written suggestions to help them
compose their responses.
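One ingredient of the RLHF setup described above can be sketched as a toy: score candidate responses with a reward signal and prefer the highest-scoring one. In real RLHF the reward model is trained on human preference rankings and the language model is then optimized against it; the "reward" below is a fabricated heuristic purely for illustration, not anything OpenAI actually uses:

```python
# Toy stand-in for the RLHF idea of preferring responses that score higher
# on a learned reward signal. The reward function here is a made-up
# heuristic (longer, politer answers win), NOT a real reward model.

candidates = [
    "idk",
    "Sure! Here is a step-by-step explanation of what that error means.",
    "figure it out yourself",
]

def toy_reward(response):
    # Hypothetical heuristic standing in for a trained reward model.
    politeness_bonus = 5 if response.rstrip().endswith(".") else 0
    return len(response.split()) + politeness_bonus

best = max(candidates, key=toy_reward)
print(best)
```

The real pipeline replaces that heuristic with a model trained to mimic human trainers' preferences, and uses it to push the language model toward responses people actually rate as helpful.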
There’s a whole taxonomy of OpenAI models and what other models they’re built off of.
It’s hard to point to a particular single breakthrough – instead, many of the trends that
have been driving better models for years have just continued to develop.
1. More training data – OpenAI and others have been training models on just colossally
large sets of data taken from increasingly diverse sources. Better data means better
models
2. More horsepower! – the machines we're using to train and run these models are
getting stronger and more efficient. It's a rough estimate, but training GPT-3 probably
cost millions of dollars in compute
These bullet points are nothing new; they’re just steady progress on the same dimensions
as the past few years. Instead, my “big brain” take on why AI has been making it into the
discourse so much more lately is less about the models themselves, and more about who
uses the models.
If you look at models that have made it into the public discourse recently, like DALL-E or
Stable Diffusion, they share a unique quality: whoever built the model also built an
interface to the model. Using ChatGPT is as simple as typing a prompt into OpenAI’s
website; generating a photo with DALL-E is too. It’s for everyone! And that is very weird.
It’s hard to overstate how novel this is. For as long as I can remember, AI was a research
driven discipline, which meant that interfaces to cutting edge models were in what
researchers were familiar with: code. In fact you can trace a pretty clear progression in
these model interfaces – a fancy word for how you use them – from niche and closed off,
to widely applicable and open.
1. Code and local first: a new cutting edge model is available for download, and you work
with it on your computer (or a server) with code
2. Code and remote first: a new model is released via API, where you can make requests
and get responses. You still use code to do that (this is how GPT-3 worked)
3. [today] UI and remote first: many new models are released to the public via a slick UI
that's easy to interact with
In other words, anyone can use these new ML models easily, which is a very significant
departure from how things used to work. AI has gone from a by-researchers-for-researchers
(and practitioners) discipline to a by-researchers-for-the-public discipline.
The implications of this are huge! Part of OpenAI's research philosophy is transparency:
they want major developments to be known and available to the public. This is obviously a
double-edged sword; the public is erratic, doesn't understand what's going on under the
hood, and is prone to misusing and misunderstanding these models. But it's also pretty cool
to be able to use this stuff, and maybe more than cool, it's useful.
Generating text instead of writing it: Jasper raised $125M (!) for a tool that generates
marketing copy and content with AI.
Using text to analyze data: building charts, writing queries, etc. generated by ML
models
Using text to build interfaces: describing what you want (e.g. a customer support tool)
and generating the code with ML models
I have my gripes with the no code universe, but it is undoubtedly true that we’re going to
see non-technical people empowered by this kind of stuff. Which is why you subscribe to
Technically!
Is the dawn of the era of the idea guy beginning? What do you think about what’s been
going on in ML? How do you think it might help or hurt you at work? Chime in with a
comment.
1 OpenAI is a for-profit research company, with very very deep pockets, focused on building
powerful AI models that benefit humanity (or something like that).