Transformer
Source: https://github.com/jessevig/bertviz
Model Architecture
High Level
Positional Encoding
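Since the Transformer has no recurrence, it injects order information by adding a fixed sinusoidal encoding to the input embeddings. A minimal NumPy sketch of that encoding (the function name is my own, not from the slides):

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """PE[pos, 2i] = sin(pos / 10000^(2i/d_model)), PE[pos, 2i+1] = cos(...)."""
    positions = np.arange(seq_len)[:, None]                      # (seq_len, 1)
    div = np.power(10000.0, np.arange(0, d_model, 2) / d_model)  # (d_model/2,)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(positions / div)  # even dimensions
    pe[:, 1::2] = np.cos(positions / div)  # odd dimensions
    return pe
```

Each position gets a unique pattern, and because the wavelengths form a geometric progression, relative offsets correspond to linear transformations of the encoding.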
Layer Normalization & Residual Connection
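Each sub-layer (attention or feed-forward) is wrapped in a residual connection followed by layer normalization, i.e. LayerNorm(x + Sublayer(x)) in the original post-norm arrangement. A simplified NumPy sketch (helper names are my own; learnable gain/bias parameters are omitted):

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalize each position's feature vector to zero mean, unit variance.
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True)
    return (x - mean) / (std + eps)

def residual_block(x, sublayer):
    # Post-norm residual as in the original Transformer: LayerNorm(x + Sublayer(x)).
    return layer_norm(x + sublayer(x))
```

The residual path keeps gradients flowing through deep stacks, while layer norm stabilizes the scale of activations at every position.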
Position-wise Feed Forward Networks
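The position-wise feed-forward network applies the same two-layer MLP with a ReLU in between to every position independently: FFN(x) = max(0, xW1 + b1)W2 + b2. A minimal NumPy sketch (function name and dimensions are illustrative; the paper uses d_model = 512 and an inner dimension of 2048):

```python
import numpy as np

def position_wise_ffn(x, w1, b1, w2, b2):
    # Same weights applied at every position: expand, ReLU, project back.
    hidden = np.maximum(0.0, x @ w1 + b1)  # (positions, d_ff)
    return hidden @ w2 + b2                # (positions, d_model)
```

Because the weights are shared across positions, this is equivalent to two 1x1 convolutions over the sequence.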
● Achieves SOTA on two machine translation datasets
● Trains at lower cost than existing SOTA models
Model Variation Study