HyperAI

Machine Translation on WMT2014 English-German

Metrics

BLEU score
Hardware Burden
Operations per network pass
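
Entries on this leaderboard report corpus-level BLEU. As a minimal, hedged sketch (not the exact scoring pipeline behind every entry, and assuming `hypotheses` and `references` are lists of plain-text sentence strings), corpus BLEU can be computed with the sacrebleu library:

```python
# Minimal sketch: corpus-level BLEU with sacrebleu.
# Assumptions: `hypotheses` and `references` are illustrative placeholder data,
# one reference translation per system output. Many WMT14 En-De papers report
# tokenized BLEU (multi-bleu.perl), so results from this sketch may differ
# slightly from the numbers in the table below.
import sacrebleu

hypotheses = ["The cat sits on the mat .", "He went to the market ."]
references = ["The cat sat on the mat .", "He went to the market ."]

# corpus_bleu takes the system outputs and a list of reference streams.
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU = {bleu.score:.2f}")
```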

Results

Performance results of various models on this benchmark

Comparison table
| Model name | BLEU score | Hardware Burden | Operations per network pass |
| --- | --- | --- | --- |
| improving-neural-language-modeling-via | 29.52 | - | - |
| multi-branch-attentive-transformer | - | - | - |
| advaug-robust-adversarial-augmentation-for-1 | 29.57 | - | - |
| incorporating-a-local-translation-mechanism | 27.35 | - | - |
| flowseq-non-autoregressive-conditional | 22.94 | - | - |
| very-deep-transformers-for-neural-machine | 30.1 | - | - |
| muse-parallel-multi-scale-attention-for | 29.9 | - | - |
| deep-residual-output-layers-for-neural | 28.1 | - | - |
| frage-frequency-agnostic-word-representation | 29.11 | - | - |
| glancing-transformer-for-non-autoregressive | 25.21 | - | - |
| partialformer-modeling-part-instead-of-whole | 29.56 | - | - |
| bi-simcut-a-simple-strategy-for-boosting-1 | 30.78 | - | - |
| simple-recurrent-units-for-highly | 28.4 | 34G | - |
| Model 14 | 20.7 | - | - |
| 190506596 | 29.7 | - | - |
| lite-transformer-with-long-short-range | 26.5 | - | - |
| accelerating-neural-transformer-via-an | 26.05 | - | - |
| phrase-based-neural-unsupervised-machine | 17.16 | - | - |
| kermit-generative-insertion-based-modeling | 28.7 | - | - |
| finetuning-pretrained-transformers-into-rnns | 28.7 | - | - |
| accelerating-neural-transformer-via-an | 26.31 | - | - |
| mask-attention-networks-rethinking-and | 29.1 | - | - |
| effective-approaches-to-attention-based | 20.9 | - | - |
| attention-is-all-you-need | 27.3 | - | 330000000.0G |
| modeling-localness-for-self-attention | 29.2 | - | - |
| edinburghs-syntax-based-systems-at-wmt-2015 | 20.7 | - | - |
| effective-approaches-to-attention-based | 11.3 | - | - |
| rethinking-perturbations-in-encoder-decoders | 33.89 | - | - |
| convolutional-sequence-to-sequence-learning | 25.16 | 72G | - |
| attention-is-all-you-need | 28.4 | 871G | 2300000000.0G |
| sequence-level-knowledge-distillation | 18.5 | - | - |
| random-feature-attention-1 | 28.2 | - | - |
| neural-machine-translation-in-linear-time | 23.75 | - | - |
| phrase-based-neural-unsupervised-machine | 20.23 | - | - |
| levenshtein-transformer | 27.27 | - | - |
| universal-transformers | 28.9 | - | - |
| scaling-neural-machine-translation | 29.3 | 9G | - |
| deterministic-non-autoregressive-neural | 21.54 | - | - |
| non-autoregressive-translation-by-learning | 26.6 | - | - |
| adaptively-sparse-transformers | 26.93 | - | - |
| the-evolved-transformer | 29.8 | - | - |
| flowseq-non-autoregressive-conditional | 18.55 | - | - |
| rethinking-batch-normalization-in | 30.1 | - | - |
| data-diversification-an-elegant-strategy-for | 30.7 | - | - |
| phrase-based-neural-unsupervised-machine | 17.94 | - | - |
| resmlp-feedforward-networks-for-image | 26.4 | - | - |
| subformer-a-parameter-reduced-transformer | 29.3 | - | - |
| neural-machine-translation-with-adequacy | 28.99 | - | - |
| googles-neural-machine-translation-system | 26.3 | - | - |
| flowseq-non-autoregressive-conditional | 23.14 | - | - |
| depthwise-separable-convolutions-for-neural | 26.1 | - | - |
| adaptively-sparse-transformers | 25.89 | - | - |
| lessons-on-parameter-sharing-across-layers-in | 35.14 | - | - |
| effective-approaches-to-attention-based | 14.0 | - | - |
| non-autoregressive-neural-machine-translation-1 | 19.17 | - | - |
| unsupervised-statistical-machine-translation | 14.08 | - | - |
| weighted-transformer-network-for-machine | 28.9 | - | - |
| advaug-robust-adversarial-augmentation-for-1 | 28.08 | - | - |
| convolutional-sequence-to-sequence-learning | 26.4 | 54G | - |
| non-autoregressive-translation-with | 27.06 | - | - |
| synthesizer-rethinking-self-attention-in | 28.47 | - | - |
| omninet-omnidirectional-representations-from | 29.8 | - | - |
| the-evolved-transformer | 28.4 | 2488G | - |
| accelerating-neural-transformer-via-an | 25.91 | - | - |
| pay-less-attention-with-lightweight-and | 28.9 | - | - |
| learning-to-encode-position-for-transformer | 29.2 | - | - |
| r-drop-regularized-dropout-for-neural | 30.91 | 49G | - |
| advaug-robust-adversarial-augmentation-for-1 | 28.58 | - | - |
| flowseq-non-autoregressive-conditional | 23.64 | - | - |
| self-attention-with-relative-position | 29.2 | - | - |
| pay-less-attention-with-lightweight-and | 29.7 | - | - |
| time-aware-large-kernel-convolutions | 29.6 | - | - |
| exploring-the-limits-of-transfer-learning | 32.1 | - | - |
| resmlp-feedforward-networks-for-image | 26.8 | - | - |
| bert-mbert-or-bibert-a-study-on | 31.26 | - | - |
| mega-moving-average-equipped-gated-attention | 29.01 | - | - |
| dense-information-flow-for-neural-machine | 25.52 | - | - |
| mask-attention-networks-rethinking-and | 30.4 | - | - |
| the-best-of-both-worlds-combining-recent | 28.5 | 44G | 2.81G |
| incorporating-bert-into-neural-machine-1 | 30.75 | - | - |
| fast-and-simple-mixture-of-softmaxes-with-bpe | 29.6 | - | - |
| hat-hardware-aware-transformers-for-efficient | 28.4 | - | - |
| flowseq-non-autoregressive-conditional | 20.85 | - | - |
| deep-recurrent-models-with-fast-forward | 20.7 | 119G | - |
| understanding-back-translation-at-scale | 35.0 | 146G | - |
| bi-simcut-a-simple-strategy-for-boosting-1 | 30.56 | - | - |
| outrageously-large-neural-networks-the | 26.03 | 24G | - |
| synchronous-bidirectional-neural-machine | 29.21 | - | - |
| incorporating-a-local-translation-mechanism | 25.20 | - | - |
| depth-growing-for-neural-machine-translation | 30.07 | 24G | - |
| neural-semantic-encoders | 17.9 | - | - |