| Model | BLEU | Paper | Code |
|---|---|---|---|
| Transformer + R-Drop + Cutoff | 37.90 | R-Drop: Regularized Dropout for Neural Networks | |
| Mask Attention Network (small) | 36.3 | Mask Attention Networks: Rethinking and Strengthen Transformer | |
| TaLK Convolutions | 35.5 | Time-aware Large Kernel Convolutions | |
| Transformer | 34.44 | Attention Is All You Need | |
| Minimum Risk Training [Edunov2017] | 32.84 | Classical Structured Prediction Losses for Sequence to Sequence Learning | |
| Back-Translation Finetuning | 28.83 | Tag-less Back-Translation | - |
| Actor-Critic [Bahdanau2017] | 28.53 | An Actor-Critic Algorithm for Sequence Prediction | |
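The scores above are corpus-level BLEU. As a minimal sketch of how such a score is typically computed, the snippet below uses the sacrebleu library on hypothetical sentences; note that the papers in this table may differ in tokenization and exact evaluation setup, so their reported numbers are not directly reproduced this way.

```python
# Minimal sketch: corpus-level BLEU with sacrebleu.
# The hypothesis/reference sentences are illustrative, not from any paper above.
import sacrebleu

hypotheses = [
    "the cat sat on the mat",
    "he went to the store yesterday",
]
# One list per reference set, aligned index-by-index with the hypotheses.
references = [[
    "the cat sat on the mat",
    "he went to the shop yesterday",
]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.2f}")
```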