Attention Is All You Need: The Introduction of the Transformer (Part 2)

新兴AI民工

Published 2025-07-04 16:31:04

Column: Deep Networks / Classic Large-Model Papers Explained. Tags: transformer, deep learning, artificial intelligence

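Copyright notice: This is an original article by the author, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.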

Article link: https://blog.csdn.net/pcgamer/article/details/149113687

This post continues from the previous one.

Model Architecture

Applications of Attention in our Model

The previous post covered the attention mechanism proposed in the paper: a new way of computing attention, Scaled Dot-Product Attention, and its extension, Multi-Head Attention.
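
For reference, the two definitions can be written as they appear in the paper, where $d_k$ is the dimension of the keys, $h$ is the number of heads, and $W_i^Q$, $W_i^K$, $W_i^V$, $W^O$ are learned projection matrices:

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right) V

\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\, W^{O},
\qquad
\mathrm{head}_i = \mathrm{Attention}(Q W_i^{Q},\; K W_i^{K},\; V W_i^{V})
```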

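The paper next explains how the three attention blocks in the architecture diagram differ from one another. The diagram is reproduced here for easy comparison.
[Figure: the model architecture diagram]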

First is the Multi-Head Attention on the encoder side. The paper's original text reads: The encoder contains self-attention layers. In a self-attention layer all of the keys, values and queries come from the same place, in this case, the output of the previous layer in the encoder. Each position in the encoder can attend to all positions in the previous layer of the encoder.
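
To make "all of the keys, values and queries come from the same place" concrete, here is a minimal NumPy sketch of a single self-attention head. It is only an illustration under assumptions, not the paper's implementation: the function names (`encoder_self_attention` and so on), the toy sizes, and the random weights are all hypothetical, and the multi-head split, residual connection, and layer normalization are left out.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(q, k, v):
    # softmax(Q K^T / sqrt(d_k)) V, with no mask applied.
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)
    return weights @ v, weights

def encoder_self_attention(prev_layer_output, w_q, w_k, w_v):
    # Self-attention: Q, K and V are all projections of the SAME input,
    # namely the output of the previous encoder layer.
    q = prev_layer_output @ w_q
    k = prev_layer_output @ w_k
    v = prev_layer_output @ w_v
    return scaled_dot_product_attention(q, k, v)

# Toy sizes chosen for illustration only (assumptions, not from the paper).
rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8
x = rng.normal(size=(seq_len, d_model))  # stands in for the previous layer's output
w_q, w_k, w_v = (
    rng.normal(scale=d_model ** -0.5, size=(d_model, d_k)) for _ in range(3)
)

out, weights = encoder_self_attention(x, w_q, w_k, w_v)
print(out.shape)      # one output vector per input position
print(weights.shape)  # one attention row per position, over all positions
```

Because no mask is applied, every row of `weights` spreads nonzero probability over all positions in the sequence, which is what the quoted passage means by each encoder position being able to attend to all positions in the previous layer.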
