| Model | BLEU | Paper | Code |
|---|---|---|---|
| FLAN 137B (few-shot, k=9) | 38.1 | Finetuned Language Models Are Zero-Shot Learners | - |
| Attentional encoder-decoder + BPE | 33.3 | Edinburgh Neural Machine Translation Systems for WMT 16 | - |
| Denoising autoencoders (non-autoregressive) | 30.30 | Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement | - |
| Adaptively Sparse Transformer (1.5-entmax) | 33.1 | Adaptively Sparse Transformers | - |
| Levenshtein Transformer (distillation) | 33.26 | Levenshtein Transformer | - |