
Machine Translation on WMT2014 English-German

Evaluation Metrics

BLEU score
Hardware Burden
Operations per network pass
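
BLEU measures n-gram overlap between a system translation and one or more reference translations, reported here on a 0-100 corpus-level scale; Hardware Burden and Operations per network pass are compute-cost measures (training-hardware compute and per-forward-pass operations, respectively). As an illustration, below is a minimal sketch of scoring a system output with the sacrebleu library; the example sentences are invented, and many WMT14 English-German numbers in this table were originally reported as tokenized BLEU, so exact scores depend on the scorer and tokenization used.

```python
# Minimal sketch: corpus-level BLEU with sacrebleu (example data is invented).
import sacrebleu

# System outputs, one detokenized string per test sentence.
hypotheses = [
    "The cat sat on the mat.",
    "There is a book on the table.",
]

# References: a list of reference sets, each aligned with `hypotheses`.
# Multiple reference sets can be passed as additional inner lists.
references = [[
    "The cat sat on the mat.",
    "A book is on the table.",
]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.2f}")  # 0-100 scale, as in the table below
```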

Evaluation Results

Performance results for each model on this benchmark.

Comparison Table
| Model name | BLEU score | Hardware Burden | Operations per network pass |
|---|---|---|---|
| improving-neural-language-modeling-via | 29.52 | - | - |
| multi-branch-attentive-transformer | - | - | - |
| advaug-robust-adversarial-augmentation-for-1 | 29.57 | - | - |
| incorporating-a-local-translation-mechanism | 27.35 | - | - |
| flowseq-non-autoregressive-conditional | 22.94 | - | - |
| very-deep-transformers-for-neural-machine | 30.1 | - | - |
| muse-parallel-multi-scale-attention-for | 29.9 | - | - |
| deep-residual-output-layers-for-neural | 28.1 | - | - |
| frage-frequency-agnostic-word-representation | 29.11 | - | - |
| glancing-transformer-for-non-autoregressive | 25.21 | - | - |
| partialformer-modeling-part-instead-of-whole | 29.56 | - | - |
| bi-simcut-a-simple-strategy-for-boosting-1 | 30.78 | - | - |
| simple-recurrent-units-for-highly | 28.4 | 34G | - |
| Model 14 | 20.7 | - | - |
| 1905.06596 | 29.7 | - | - |
| lite-transformer-with-long-short-range | 26.5 | - | - |
| accelerating-neural-transformer-via-an | 26.05 | - | - |
| phrase-based-neural-unsupervised-machine | 17.16 | - | - |
| kermit-generative-insertion-based-modeling | 28.7 | - | - |
| finetuning-pretrained-transformers-into-rnns | 28.7 | - | - |
| accelerating-neural-transformer-via-an | 26.31 | - | - |
| mask-attention-networks-rethinking-and | 29.1 | - | - |
| effective-approaches-to-attention-based | 20.9 | - | - |
| attention-is-all-you-need | 27.3 | - | 330000000.0G |
| modeling-localness-for-self-attention | 29.2 | - | - |
| edinburghs-syntax-based-systems-at-wmt-2015 | 20.7 | - | - |
| effective-approaches-to-attention-based | 11.3 | - | - |
| rethinking-perturbations-in-encoder-decoders | 33.89 | - | - |
| convolutional-sequence-to-sequence-learning | 25.16 | 72G | - |
| attention-is-all-you-need | 28.4 | 871G | 2300000000.0G |
| sequence-level-knowledge-distillation | 18.5 | - | - |
| random-feature-attention-1 | 28.2 | - | - |
| neural-machine-translation-in-linear-time | 23.75 | - | - |
| phrase-based-neural-unsupervised-machine | 20.23 | - | - |
| levenshtein-transformer | 27.27 | - | - |
| universal-transformers | 28.9 | - | - |
| scaling-neural-machine-translation | 29.3 | 9G | - |
| deterministic-non-autoregressive-neural | 21.54 | - | - |
| non-autoregressive-translation-by-learning | 26.6 | - | - |
| adaptively-sparse-transformers | 26.93 | - | - |
| the-evolved-transformer | 29.8 | - | - |
| flowseq-non-autoregressive-conditional | 18.55 | - | - |
| rethinking-batch-normalization-in | 30.1 | - | - |
| data-diversification-an-elegant-strategy-for | 30.7 | - | - |
| phrase-based-neural-unsupervised-machine | 17.94 | - | - |
| resmlp-feedforward-networks-for-image | 26.4 | - | - |
| subformer-a-parameter-reduced-transformer | 29.3 | - | - |
| neural-machine-translation-with-adequacy | 28.99 | - | - |
| googles-neural-machine-translation-system | 26.3 | - | - |
| flowseq-non-autoregressive-conditional | 23.14 | - | - |
| depthwise-separable-convolutions-for-neural | 26.1 | - | - |
| adaptively-sparse-transformers | 25.89 | - | - |
| lessons-on-parameter-sharing-across-layers-in | 35.14 | - | - |
| effective-approaches-to-attention-based | 14.0 | - | - |
| non-autoregressive-neural-machine-translation-1 | 19.17 | - | - |
| unsupervised-statistical-machine-translation | 14.08 | - | - |
| weighted-transformer-network-for-machine | 28.9 | - | - |
| advaug-robust-adversarial-augmentation-for-1 | 28.08 | - | - |
| convolutional-sequence-to-sequence-learning | 26.45 | 4G | - |
| non-autoregressive-translation-with | 27.06 | - | - |
| synthesizer-rethinking-self-attention-in | 28.47 | - | - |
| omninet-omnidirectional-representations-from | 29.8 | - | - |
| the-evolved-transformer | 28.4 | 2488G | - |
| accelerating-neural-transformer-via-an | 25.91 | - | - |
| pay-less-attention-with-lightweight-and | 28.9 | - | - |
| learning-to-encode-position-for-transformer | 29.2 | - | - |
| r-drop-regularized-dropout-for-neural | 30.91 | 49G | - |
| advaug-robust-adversarial-augmentation-for-1 | 28.58 | - | - |
| flowseq-non-autoregressive-conditional | 23.64 | - | - |
| self-attention-with-relative-position | 29.2 | - | - |
| pay-less-attention-with-lightweight-and | 29.7 | - | - |
| time-aware-large-kernel-convolutions | 29.6 | - | - |
| exploring-the-limits-of-transfer-learning | 32.1 | - | - |
| resmlp-feedforward-networks-for-image | 26.8 | - | - |
| bert-mbert-or-bibert-a-study-on | 31.26 | - | - |
| mega-moving-average-equipped-gated-attention | 29.01 | - | - |
| dense-information-flow-for-neural-machine | 25.52 | - | - |
| mask-attention-networks-rethinking-and | 30.4 | - | - |
| the-best-of-both-worlds-combining-recent | 28.5 | 44G | 2.81G |
| incorporating-bert-into-neural-machine-1 | 30.75 | - | - |
| fast-and-simple-mixture-of-softmaxes-with-bpe | 29.6 | - | - |
| hat-hardware-aware-transformers-for-efficient | 28.4 | - | - |
| flowseq-non-autoregressive-conditional | 20.85 | - | - |
| deep-recurrent-models-with-fast-forward | 20.7 | 119G | - |
| understanding-back-translation-at-scale | 35.0 | 146G | - |
| bi-simcut-a-simple-strategy-for-boosting-1 | 30.56 | - | - |
| outrageously-large-neural-networks-the | 26.03 | 24G | - |
| synchronous-bidirectional-neural-machine | 29.21 | - | - |
| incorporating-a-local-translation-mechanism | 25.20 | - | - |
| depth-growing-for-neural-machine-translation | 30.07 | 24G | - |
| neural-semantic-encoders | 17.9 | - | - |