| Model | BLEU | Paper | Code |
| --- | --- | --- | --- |
| Attentional encoder-decoder + BPE | 34.2 | Edinburgh Neural Machine Translation Systems for WMT 16 | |
| FLAN 137B (few-shot, k=11) | 26.1 | Finetuned Language Models Are Zero-Shot Learners | |
| SMT + iterative backtranslation (unsupervised) | 18.23 | Unsupervised Statistical Machine Translation | |
| Unsupervised NMT + weight-sharing | 10.86 | Unsupervised Neural Machine Translation with Weight Sharing | |
| Unsupervised S2S with attention | 9.64 | Unsupervised Machine Translation Using Monolingual Corpora Only | |
| Exploiting Monolingual Data at Scale (single) | - | Exploiting Monolingual Data at Scale for Neural Machine Translation | - |