HyperAI
HyperAI超神经
Machine Translation on WMT2014 English-German
Evaluation Metrics
BLEU score
Hardware Burden
Operations per network pass
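The ranking metric, BLEU, scores a candidate translation by its modified n-gram precision against a reference, scaled by a brevity penalty. The sketch below illustrates the idea for a single sentence pair; it is a simplification, not the exact tokenization or smoothing used by the evaluation tooling behind this leaderboard.

```python
import math
from collections import Counter


def bleu(candidate, reference, max_n=4):
    """Illustrative BLEU for one tokenized sentence pair (no smoothing)."""
    precisions = []
    for n in range(1, max_n + 1):
        # Clipped n-gram overlap between candidate and reference.
        cand = Counter(tuple(candidate[i:i + n])
                       for i in range(len(candidate) - n + 1))
        ref = Counter(tuple(reference[i:i + n])
                      for i in range(len(reference) - n + 1))
        overlap = sum((cand & ref).values())
        precisions.append(overlap / max(sum(cand.values()), 1))
    if min(precisions) == 0:
        return 0.0  # any zero precision collapses the geometric mean
    # Brevity penalty: punish candidates shorter than the reference.
    bp = 1.0 if len(candidate) >= len(reference) else \
        math.exp(1 - len(reference) / len(candidate))
    return 100 * bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

A perfect match scores 100; any missing 4-gram lowers the geometric mean, which is why the leaderboard gaps between 29 and 35 BLEU represent substantial quality differences.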
Evaluation Results

Performance of each model on this benchmark:

| Model | BLEU score | Hardware Burden | Operations per network pass | Paper Title | Repository |
|---|---|---|---|---|---|
| Transformer Cycle (Rev) | 35.14 | - | - | Lessons on Parameter Sharing across Layers in Transformers | - |
| Noisy back-translation | 35.0 | 146G | - | Understanding Back-Translation at Scale | - |
| Transformer+Rep(Uni) | 33.89 | - | - | Rethinking Perturbations in Encoder-Decoders for Fast Training | - |
| T5-11B | 32.1 | - | - | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | - |
| BiBERT | 31.26 | - | - | BERT, mBERT, or BiBERT? A Study on Contextualized Embeddings for Neural Machine Translation | - |
| Transformer + R-Drop | 30.91 | 49G | - | R-Drop: Regularized Dropout for Neural Networks | - |
| Bi-SimCut | 30.78 | - | - | Bi-SimCut: A Simple Strategy for Boosting Neural Machine Translation | - |
| BERT-fused NMT | 30.75 | - | - | Incorporating BERT into Neural Machine Translation | - |
| Data Diversification - Transformer | 30.7 | - | - | Data Diversification: A Simple Strategy For Neural Machine Translation | - |
| SimCut | 30.56 | - | - | Bi-SimCut: A Simple Strategy for Boosting Neural Machine Translation | - |
| Mask Attention Network (big) | 30.4 | - | - | Mask Attention Networks: Rethinking and Strengthen Transformer | - |
| Transformer (ADMIN init) | 30.1 | - | - | Very Deep Transformers for Neural Machine Translation | - |
| PowerNorm (Transformer) | 30.1 | - | - | PowerNorm: Rethinking Batch Normalization in Transformers | - |
| Depth Growing | 30.07 | 24G | - | Depth Growing for Neural Machine Translation | - |
| MUSE (Parallel Multi-scale Attention) | 29.9 | - | - | MUSE: Parallel Multi-Scale Attention for Sequence to Sequence Learning | - |
| Evolved Transformer Big | 29.8 | - | - | The Evolved Transformer | - |
| OmniNetP | 29.8 | - | - | OmniNet: Omnidirectional Representations from Transformers | - |
| Local Joint Self-attention | 29.7 | - | - | Joint Source-Target Self Attention with Locality Constraints | - |
| DynamicConv | 29.7 | - | - | Pay Less Attention with Lightweight and Dynamic Convolutions | - |
| TaLK Convolutions | 29.6 | - | - | Time-aware Large Kernel Convolutions | - |
(Top 20 of 91 leaderboard entries shown.)