Language Models are Few-Shot Learners: GPT-3 Out of the Box (Part 2)

新兴AI民工

Published 2025-07-10 14:52:13

License: CC 4.0 BY-SA

Column: Deep Networks / Classic Large-Model Papers Explained
Tags: language models, gpt-3, artificial intelligence, paper walkthrough

Article link: https://blog.csdn.net/pcgamer/article/details/149194587

Continued from the previous post.

Approach

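The abstract and Introduction covered in the previous post give a high-level overview. Chapter 2 of the paper, Approach, describes the model design and the zero-shot, one-shot, and few-shot settings, among other things.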

The chapter opens by stating that GPT-3 largely follows GPT-2, with the model size, dataset size, and training length scaled up accordingly: "Our basic pre-training approach, including model, data, and training, is similar to the process described in [RWC+19], with relatively straightforward scaling up of the model size, dataset size and diversity, and length of training."

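In-context learning is also used much as in GPT-2, except that this paper explores the different settings systematically: "Our use of in-context learning is also similar to [RWC+19], but in this work we systematically explore different settings for learning within the context."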
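To make "different settings for learning within the context" concrete, here is a minimal Python sketch of how the same query can be wrapped with 0, 1, or several demonstrations. The function name `build_prompt`, the `=>` separator, and the translation pairs are illustrative assumptions in the style of the English-to-French example used in the GPT-3 paper; this is not the authors' evaluation code. The key point is that no weights are updated: the only thing that changes between settings is the text placed in the context.

```python
from typing import List, Tuple


def build_prompt(
    task_description: str,
    demonstrations: List[Tuple[str, str]],
    query: str,
    k: int,
) -> str:
    """Assemble a prompt from a task description, k demonstrations, and a query.

    k = 0 gives a zero-shot prompt, k = 1 a one-shot prompt, and k > 1 a
    few-shot prompt. The layout is a hypothetical one chosen for illustration.
    """
    if k < 0 or k > len(demonstrations):
        raise ValueError("k must be between 0 and the number of demonstrations")

    lines = [task_description]
    # Each demonstration is a completed input/output pair shown in the context.
    for source, target in demonstrations[:k]:
        lines.append(f"{source} => {target}")
    # The query is left incomplete so the model has to continue it.
    lines.append(f"{query} =>")
    return "\n".join(lines)


# Hypothetical demonstration pairs (English to French).
demos = [
    ("sea otter", "loutre de mer"),
    ("peppermint", "menthe poivrée"),
    ("plush giraffe", "girafe en peluche"),
]

task = "Translate English to French:"
zero_shot = build_prompt(task, demos, "cheese", k=0)
one_shot = build_prompt(task, demos, "cheese", k=1)
few_shot = build_prompt(task, demos, "cheese", k=3)

print(few_shot)
```

All three prompts end with the same unfinished line, `cheese =>`, so any difference in the model's answer comes only from the demonstrations included before it.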

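The paper's aim, then, is to evaluate GPT-3 from several angles, that is, to measure how little it depends on any one specific NLP task, as raised in Chapter 1. These angles are the zero-shot, one-shot, and few-shot evaluation settings mentioned above: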
Therefore, we start this section by explicitly defining and contrasti