The document discusses hyperparameter optimization in machine learning models. It introduces various hyperparameters that can affect model performance, and notes that as models become more complex, the number of hyperparameters increases, making manual tuning difficult. It formulates hyperparameter optimization as a black-box optimization problem to minimize validation loss and discusses challenges like high function evaluation costs and lack of gradient information.
The document summarizes a research paper that compares the performance of MLP-based models to Transformer-based models on various natural language processing and computer vision tasks. The key points are:
1. Gated MLP (gMLP) architectures can achieve performance comparable to Transformers on most tasks, demonstrating that attention mechanisms may not be strictly necessary.
2. However, attention still provides benefits for some NLP tasks, as models combining gMLP and attention outperformed pure gMLP models on certain benchmarks.
3. For computer vision, gMLP achieved results close to Vision Transformers and CNNs on image classification, indicating gMLP can match their data efficiency.
The document summarizes a research paper that compares the performance of MLP-based models to Transformer-based models on various natural language processing and computer vision tasks. The key points are:
1. Gated MLP (gMLP) architectures can achieve performance comparable to Transformers on most tasks, demonstrating that attention mechanisms may not be strictly necessary.
2. However, attention still provides benefits for some NLP tasks, as models combining gMLP and attention outperformed pure gMLP models on certain benchmarks.
3. For computer vision, gMLP achieved results close to Vision Transformers and CNNs on image classification, indicating gMLP can match their data efficiency.
【DL輪読会】Deep Transformers without Shortcuts: Modifying Self-attention for Fait...Deep Learning JP
The document proposes modifications to self-attention in Transformers to improve faithful signal propagation without shortcuts like skip connections or layer normalization. Specifically, it introduces a normalization-free network that uses dynamic isometry to ensure unitary transformations, a ReZero technique to implement skip connections without adding shortcuts, and modifications to attention and normalization techniques to address issues like rank collapse in Transformers. The methods are evaluated on tasks like CIFAR-10 classification and language modeling, demonstrating improved performance over standard Transformer architectures.
2. 書誌情報
RLCD: Reinforcement Learning from Contrast Distillation for Language Model
Alignment
https://arxiv.org/abs/2307.12950
タイトル:
著者:
人間のフィードバックデータを使用せずに人間の好みに合わせて言語モデルを調整する方法である,コ
ントラスト蒸留による強化学習 (RLCD) という手法を提案
概要:
2
Kevin Yang, Dan Klein, Asli Celikyilmaz, Nanyun Peng, Yuandong Tian
UC Berkeley, Meta AI, UCLA
3. • Reinforcement Learning with Human Feedback (RLHF) は,人間の好みに合わせて
調整(Alignment)するために用いられる(無害性,有用性,真実性など)
背景
3