
Language Modelling On One Billion Word

Evaluation Metrics

Number of params
PPL

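PPL (perplexity) is the exponential of the average per-token negative log-likelihood on the test set; lower is better. Below is a minimal sketch of that relationship; the `perplexity` helper and the toy probabilities are illustrative assumptions, not part of the benchmark itself.

```python
import math

def perplexity(token_logprobs):
    """Corpus perplexity: exp of the average per-token negative log-likelihood."""
    avg_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_nll)

# Toy example with made-up token probabilities 0.25, 0.5, 0.125 (not benchmark data):
logprobs = [math.log(p) for p in (0.25, 0.5, 0.125)]
print(perplexity(logprobs))  # ~4.0, i.e. as uncertain as a uniform choice over 4 tokens
```
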
Evaluation Results

Performance of each model on this benchmark:

| Model | Number of params | PPL | Paper Title | Repository |
|---|---|---|---|---|
| OmniNetT (Large) | 100M | 21.5 | OmniNet: Omnidirectional Representations from Transformers | |
| LSTM-8192-1024 + CNN Input | 1.04B | 30.0 | Exploring the Limits of Language Modeling | |
| Cohere Large | - | 25.06 | - | - |
| Transformer-XL Large | 0.8B | 21.8 | Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | |
| Transformer-XL Base | 0.46B | 23.5 | Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | |
| Adaptive Input Large | 0.46B | 23.91 | Adaptive Input Representations for Neural Language Modeling | |
| DynamicConv | 0.34B | 26.67 | Pay Less Attention with Lightweight and Dynamic Convolutions | |
| Adaptive Input Very Large | 1.0B | 23.02 | Adaptive Input Representations for Neural Language Modeling | |
| RNN-1024 + 9 Gram | 20B | 51.3 | One Billion Word Benchmark for Measuring Progress in Statistical Language Modeling | |
| OmniNetB (Large) | - | 22 | OmniNet: Omnidirectional Representations from Transformers | |
| GPT-2 | 1.54B | 42.16 | Language Models are Unsupervised Multitask Learners | - |
| Evolved Transformer Big | - | 28.6 | The Evolved Transformer | |
| OmniNetP (Large) | 100M | 21.6 | OmniNet: Omnidirectional Representations from Transformers | |
| Mesh Tensorflow | 4.9B | 24.0 | Mesh-TensorFlow: Deep Learning for Supercomputers | |
| SRU++ Large | 465M | 23.5 | When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute | |
| Low-Budget MoE | 5B | 34.1 | Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer | |
| BIG G-LSTM-2 | - | 36.0 | Factorization tricks for LSTM networks | |
| LSTM-8192-1024 | 1.8B | 30.6 | Exploring the Limits of Language Modeling | |
| GCNN-14 bottleneck | - | 31.9 | Language Modeling with Gated Convolutional Networks | |
| Sparse Non-Negative | 33B | 52.9 | Skip-gram Language Modeling Using Sparse Non-negative Matrix Probability Estimation | - |