| Model | BLEU | Paper |
| --- | --- | --- |
| FLAN 137B (few-shot, k=9) | 38.1 | Finetuned Language Models Are Zero-Shot Learners |
| Attentional encoder-decoder + BPE | 33.3 | Edinburgh Neural Machine Translation Systems for WMT 16 |
| Levenshtein Transformer (distillation) | 33.26 | Levenshtein Transformer |
| Adaptively Sparse Transformer (1.5-entmax) | 33.1 | Adaptively Sparse Transformers |
| Denoising autoencoders (non-autoregressive) | 30.30 | Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement |