Tall Transformer with Style-Augmented Training | 37.8 | Better Translation for Vietnamese | |
Transformer+BPE+FixNorm+ScaleNorm | 32.8 | Transformers without Tears: Improving the Normalization of Self-Attention | |
Transformer+LayerNorm-simple | 31.4 | Understanding and Improving Layer Normalization | |