This document summarizes recent research on applying the self-attention mechanism of Transformers to domains beyond language, particularly computer vision. It discusses models that apply self-attention to images by splitting them into patches, including ViT, DeiT, and T2T. It also covers more general, domain-agnostic attention architectures such as the Perceiver. Finally, it discusses work showing that pretrained language Transformers can be transferred to other modalities with most of their weights frozen, functioning as universal computation engines.
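As a rough illustration of the patch-based approach mentioned above, the sketch below shows a minimal ViT-style model in PyTorch: the image is cut into non-overlapping patches, each patch is linearly projected to a token embedding, and the token sequence is processed by a standard Transformer encoder. This is a minimal sketch only; the class name (TinyPatchTransformer) and all hyperparameter choices (patch_size=16, dim=192, etc.) are illustrative assumptions, not taken from the referenced papers.

# Minimal, illustrative ViT-style patch Transformer (assumptions, not the papers' code).
import torch
import torch.nn as nn

class TinyPatchTransformer(nn.Module):
    def __init__(self, image_size=224, patch_size=16, dim=192, depth=4, heads=3, num_classes=1000):
        super().__init__()
        num_patches = (image_size // patch_size) ** 2
        # Non-overlapping patch "tokenization" via a strided convolution.
        self.to_tokens = nn.Conv2d(3, dim, kernel_size=patch_size, stride=patch_size)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, dim))
        encoder_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=depth)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, images):                      # images: (B, 3, H, W)
        tokens = self.to_tokens(images)             # (B, dim, H/p, W/p)
        tokens = tokens.flatten(2).transpose(1, 2)  # (B, num_patches, dim)
        cls = self.cls_token.expand(tokens.size(0), -1, -1)
        x = torch.cat([cls, tokens], dim=1) + self.pos_embed
        x = self.encoder(x)                         # self-attention over the patch tokens
        return self.head(x[:, 0])                   # classify from the [CLS] token

logits = TinyPatchTransformer()(torch.randn(2, 3, 224, 224))  # -> shape (2, 1000)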