HyperAI

Machine Translation on WMT2014 English-German

Metrics

BLEU score
Hardware Burden
Operations per network pass
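
For WMT2014 English-German, BLEU is typically reported on the newstest2014 test set. As a minimal illustration (not part of this leaderboard's tooling), the sketch below shows how a corpus-level BLEU score could be computed with the sacrebleu Python package; the hypothesis and reference sentences are placeholder examples, not WMT data.

```python
# Minimal sketch: corpus-level BLEU with the sacrebleu package.
# The sentences below are placeholder examples, not WMT2014 data.
import sacrebleu

hypotheses = [
    "The cat sat on the mat .",
    "There is a book on the table .",
]
references = [
    "The cat sat on the mat .",
    "A book is on the table .",
]

# sacrebleu expects a list of reference streams (one list per reference set).
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU = {bleu.score:.2f}")
```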

Results

Performance results of various models on this benchmark

Comparison table
| Model name | BLEU score | Hardware Burden | Operations per network pass |
|---|---|---|---|
| improving-neural-language-modeling-via | 29.52 | | |
| multi-branch-attentive-transformer | - | - | - |
| advaug-robust-adversarial-augmentation-for-1 | 29.57 | - | - |
| incorporating-a-local-translation-mechanism | 27.35 | | |
| flowseq-non-autoregressive-conditional | 22.94 | | |
| very-deep-transformers-for-neural-machine | 30.1 | - | - |
| muse-parallel-multi-scale-attention-for | 29.9 | | |
| deep-residual-output-layers-for-neural | 28.1 | | |
| frage-frequency-agnostic-word-representation | 29.11 | | |
| glancing-transformer-for-non-autoregressive | 25.21 | - | - |
| partialformer-modeling-part-instead-of-whole | 29.56 | - | - |
| bi-simcut-a-simple-strategy-for-boosting-1 | 30.78 | - | - |
| simple-recurrent-units-for-highly | 28.4 | 34G | |
| Model 14 | 20.7 | | |
| 190506596 | 29.7 | | |
| lite-transformer-with-long-short-range | 26.5 | - | - |
| accelerating-neural-transformer-via-an | 26.05 | - | - |
| phrase-based-neural-unsupervised-machine | 17.16 | | |
| kermit-generative-insertion-based-modeling | 28.7 | | |
| finetuning-pretrained-transformers-into-rnns | 28.7 | | |
| accelerating-neural-transformer-via-an | 26.31 | - | - |
| mask-attention-networks-rethinking-and | 29.1 | | |
| effective-approaches-to-attention-based | 20.9 | | |
| attention-is-all-you-need | 27.3 | - | 330000000.0G |
| modeling-localness-for-self-attention | 29.2 | | |
| edinburghs-syntax-based-systems-at-wmt-2015 | 20.7 | | |
| effective-approaches-to-attention-based | 11.3 | | |
| rethinking-perturbations-in-encoder-decoders | 33.89 | | |
| convolutional-sequence-to-sequence-learning | 25.16 | 72G | |
| attention-is-all-you-need | 28.4 | 871G | 2300000000.0G |
| sequence-level-knowledge-distillation | 18.5 | | |
| random-feature-attention-1 | 28.2 | | |
| neural-machine-translation-in-linear-time | 23.75 | | |
| phrase-based-neural-unsupervised-machine | 20.23 | | |
| levenshtein-transformer | 27.27 | - | - |
| universal-transformers | 28.9 | | |
| scaling-neural-machine-translation | 29.3 | 9G | |
| deterministic-non-autoregressive-neural | 21.54 | | |
| non-autoregressive-translation-by-learning | 26.6 | - | - |
| adaptively-sparse-transformers | 26.93 | - | - |
| the-evolved-transformer | 29.8 | - | - |
| flowseq-non-autoregressive-conditional | 18.55 | | |
| rethinking-batch-normalization-in | 30.1 | - | - |
| data-diversification-an-elegant-strategy-for | 30.7 | | |
| phrase-based-neural-unsupervised-machine | 17.94 | | |
| resmlp-feedforward-networks-for-image | 26.4 | - | - |
| subformer-a-parameter-reduced-transformer | 29.3 | | |
| neural-machine-translation-with-adequacy | 28.99 | | |
| googles-neural-machine-translation-system | 26.3 | | |
| flowseq-non-autoregressive-conditional | 23.14 | | |
| depthwise-separable-convolutions-for-neural | 26.1 | | |
| adaptively-sparse-transformers | 25.89 | - | - |
| lessons-on-parameter-sharing-across-layers-in | 35.14 | - | - |
| effective-approaches-to-attention-based | 14.0 | | |
| non-autoregressive-neural-machine-translation-1 | 19.17 | | |
| unsupervised-statistical-machine-translation | 14.08 | | |
| weighted-transformer-network-for-machine | 28.9 | | |
| advaug-robust-adversarial-augmentation-for-1 | 28.08 | - | - |
| convolutional-sequence-to-sequence-learning | 26.4 | 54G | |
| non-autoregressive-translation-with | 27.06 | - | - |
| synthesizer-rethinking-self-attention-in | 28.47 | | |
| omninet-omnidirectional-representations-from | 29.8 | | |
| the-evolved-transformer | 28.4 | 2488G | - |
| accelerating-neural-transformer-via-an | 25.91 | - | - |
| pay-less-attention-with-lightweight-and | 28.9 | - | - |
| learning-to-encode-position-for-transformer | 29.2 | | |
| r-drop-regularized-dropout-for-neural | 30.91 | 49G | |
| advaug-robust-adversarial-augmentation-for-1 | 28.58 | - | - |
| flowseq-non-autoregressive-conditional | 23.64 | | |
| self-attention-with-relative-position | 29.2 | - | - |
| pay-less-attention-with-lightweight-and | 29.7 | - | - |
| time-aware-large-kernel-convolutions | 29.6 | - | - |
| exploring-the-limits-of-transfer-learning | 32.1 | - | - |
| resmlp-feedforward-networks-for-image | 26.8 | - | - |
| bert-mbert-or-bibert-a-study-on | 31.26 | - | - |
| mega-moving-average-equipped-gated-attention | 29.01 | - | - |
| dense-information-flow-for-neural-machine | 25.52 | | |
| mask-attention-networks-rethinking-and | 30.4 | | |
| the-best-of-both-worlds-combining-recent | 28.5 | 44G | 2.81G |
| incorporating-bert-into-neural-machine-1 | 30.75 | - | - |
| fast-and-simple-mixture-of-softmaxes-with-bpe | 29.6 | | |
| hat-hardware-aware-transformers-for-efficient | 28.4 | - | - |
| flowseq-non-autoregressive-conditional | 20.85 | | |
| deep-recurrent-models-with-fast-forward | 20.7 | 119G | |
| understanding-back-translation-at-scale | 35.0 | 146G | |
| bi-simcut-a-simple-strategy-for-boosting-1 | 30.56 | - | - |
| outrageously-large-neural-networks-the | 26.03 | 24G | |
| synchronous-bidirectional-neural-machine | 29.21 | | |
| incorporating-a-local-translation-mechanism | 25.20 | | |
| depth-growing-for-neural-machine-translation | 30.07 | 24G | |
| neural-semantic-encoders | 17.9 | | |