Machine Translation on WMT2014 English-German
Metrics
BLEU score
Hardware Burden
Operations per network pass
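BLEU is the translation-quality metric on this benchmark, while Hardware Burden and Operations per network pass track compute cost (the "G" suffix in the table presumably denotes giga). As a minimal sketch, the snippet below computes a corpus-level BLEU score with the sacrebleu package; the package choice and the toy sentences are assumptions for illustration, not the scoring setup used by the listed papers.

```python
# Minimal sketch: corpus-level BLEU with sacrebleu (an assumption; the
# leaderboard does not specify the exact scoring tool each paper used).
import sacrebleu

# Hypothetical system outputs and references for WMT14 En-De.
hypotheses = [
    "Der Hund rennt über die Wiese .",
    "Wir haben das Modell auf WMT14 trainiert .",
]
references = [
    "Der Hund läuft über die Wiese .",
    "Wir trainierten das Modell auf WMT14 .",
]

# sacrebleu takes a list of hypothesis strings and a list of reference
# streams (one list of strings per reference set).
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU = {bleu.score:.2f}")
```

Individual papers may differ in tokenization and casing when reporting BLEU, so cross-paper comparisons in the table below should be read with that caveat.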
Results
Performance results of various models on this benchmark; missing values are shown as "-".
Comparison Table
Model Name | BLEU score | Hardware Burden | Operations per network pass |
---|---|---|---|
improving-neural-language-modeling-via | 29.52 | - | - |
multi-branch-attentive-transformer | - | - | - |
advaug-robust-adversarial-augmentation-for-1 | 29.57 | - | - |
incorporating-a-local-translation-mechanism | 27.35 | - | - |
flowseq-non-autoregressive-conditional | 22.94 | - | - |
very-deep-transformers-for-neural-machine | 30.1 | - | - |
muse-parallel-multi-scale-attention-for | 29.9 | - | - |
deep-residual-output-layers-for-neural | 28.1 | - | - |
frage-frequency-agnostic-word-representation | 29.11 | - | - |
glancing-transformer-for-non-autoregressive | 25.21 | - | - |
partialformer-modeling-part-instead-of-whole | 29.56 | - | - |
bi-simcut-a-simple-strategy-for-boosting-1 | 30.78 | - | - |
simple-recurrent-units-for-highly | 28.4 | 34G | - |
Model 14 | 20.7 | - | - |
190506596 | 29.7 | - | - |
lite-transformer-with-long-short-range | 26.5 | - | - |
accelerating-neural-transformer-via-an | 26.05 | - | - |
phrase-based-neural-unsupervised-machine | 17.16 | - | - |
kermit-generative-insertion-based-modeling | 28.7 | - | - |
finetuning-pretrained-transformers-into-rnns | 28.7 | - | - |
accelerating-neural-transformer-via-an | 26.31 | - | - |
mask-attention-networks-rethinking-and | 29.1 | - | - |
effective-approaches-to-attention-based | 20.9 | - | - |
attention-is-all-you-need | 27.3 | - | 330000000.0G |
modeling-localness-for-self-attention | 29.2 | - | - |
edinburghs-syntax-based-systems-at-wmt-2015 | 20.7 | - | - |
effective-approaches-to-attention-based | 11.3 | - | - |
rethinking-perturbations-in-encoder-decoders | 33.89 | - | - |
convolutional-sequence-to-sequence-learning | 25.16 | 72G | - |
attention-is-all-you-need | 28.4 | 871G | 2300000000.0G |
sequence-level-knowledge-distillation | 18.5 | - | - |
random-feature-attention-1 | 28.2 | - | - |
neural-machine-translation-in-linear-time | 23.75 | - | - |
phrase-based-neural-unsupervised-machine | 20.23 | - | - |
levenshtein-transformer | 27.27 | - | - |
universal-transformers | 28.9 | - | - |
scaling-neural-machine-translation | 29.3 | 9G | - |
deterministic-non-autoregressive-neural | 21.54 | - | - |
non-autoregressive-translation-by-learning | 26.6 | - | - |
adaptively-sparse-transformers | 26.93 | - | - |
the-evolved-transformer | 29.8 | - | - |
flowseq-non-autoregressive-conditional | 18.55 | - | - |
rethinking-batch-normalization-in | 30.1 | - | - |
data-diversification-an-elegant-strategy-for | 30.7 | - | - |
phrase-based-neural-unsupervised-machine | 17.94 | - | - |
resmlp-feedforward-networks-for-image | 26.4 | - | - |
subformer-a-parameter-reduced-transformer | 29.3 | - | - |
neural-machine-translation-with-adequacy | 28.99 | - | - |
googles-neural-machine-translation-system | 26.3 | - | - |
flowseq-non-autoregressive-conditional | 23.14 | - | - |
depthwise-separable-convolutions-for-neural | 26.1 | - | - |
adaptively-sparse-transformers | 25.89 | - | - |
lessons-on-parameter-sharing-across-layers-in | 35.14 | - | - |
effective-approaches-to-attention-based | 14.0 | - | - |
non-autoregressive-neural-machine-translation-1 | 19.17 | - | - |
unsupervised-statistical-machine-translation | 14.08 | - | - |
weighted-transformer-network-for-machine | 28.9 | - | - |
advaug-robust-adversarial-augmentation-for-1 | 28.08 | - | - |
convolutional-sequence-to-sequence-learning | 26.4 | 54G | - |
non-autoregressive-translation-with | 27.06 | - | - |
synthesizer-rethinking-self-attention-in | 28.47 | - | - |
omninet-omnidirectional-representations-from | 29.8 | - | - |
the-evolved-transformer | 28.4 | 2488G | - |
accelerating-neural-transformer-via-an | 25.91 | - | - |
pay-less-attention-with-lightweight-and | 28.9 | - | - |
learning-to-encode-position-for-transformer | 29.2 | - | - |
r-drop-regularized-dropout-for-neural | 30.91 | 49G | - |
advaug-robust-adversarial-augmentation-for-1 | 28.58 | - | - |
flowseq-non-autoregressive-conditional | 23.64 | - | - |
self-attention-with-relative-position | 29.2 | - | - |
pay-less-attention-with-lightweight-and | 29.7 | - | - |
time-aware-large-kernel-convolutions | 29.6 | - | - |
exploring-the-limits-of-transfer-learning | 32.1 | - | - |
resmlp-feedforward-networks-for-image | 26.8 | - | - |
bert-mbert-or-bibert-a-study-on | 31.26 | - | - |
mega-moving-average-equipped-gated-attention | 29.01 | - | - |
dense-information-flow-for-neural-machine | 25.52 | - | - |
mask-attention-networks-rethinking-and | 30.4 | - | - |
the-best-of-both-worlds-combining-recent | 28.5 | 44G | 2.81G |
incorporating-bert-into-neural-machine-1 | 30.75 | - | - |
fast-and-simple-mixture-of-softmaxes-with-bpe | 29.6 | - | - |
hat-hardware-aware-transformers-for-efficient | 28.4 | - | - |
flowseq-non-autoregressive-conditional | 20.85 | - | - |
deep-recurrent-models-with-fast-forward | 20.7 | 119G | - |
understanding-back-translation-at-scale | 35.0 | 146G | - |
bi-simcut-a-simple-strategy-for-boosting-1 | 30.56 | - | - |
outrageously-large-neural-networks-the | 26.03 | 24G | - |
synchronous-bidirectional-neural-machine | 29.21 | - | - |
incorporating-a-local-translation-mechanism | 25.20 | - | - |
depth-growing-for-neural-machine-translation | 30.07 | 24G | - |
neural-semantic-encoders | 17.9 | - | - |
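To query or rank the comparison table programmatically, it can be loaded into a dataframe. A minimal sketch, assuming the table has been exported to a CSV file named wmt14_en_de_results.csv with the same column headers (the filename and export step are assumptions; this page does not provide a download):

```python
# Minimal sketch: ranking the comparison table by BLEU score with pandas.
# Assumes a CSV export "wmt14_en_de_results.csv" with the columns
# Model Name, BLEU score, Hardware Burden, Operations per network pass.
import pandas as pd

df = pd.read_csv("wmt14_en_de_results.csv")

# Treat "-" placeholders as missing values and coerce BLEU to numeric.
df = df.replace("-", pd.NA)
df["BLEU score"] = pd.to_numeric(df["BLEU score"], errors="coerce")

# Show the ten highest-scoring entries.
top10 = df.sort_values("BLEU score", ascending=False).head(10)
print(top10[["Model Name", "BLEU score"]].to_string(index=False))
```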