Language Modelling On One Billion Word

Metrics

Number of params

PPL
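
PPL stands for perplexity; lower is better. As a rough illustration of the metric, and not the evaluation code used by any of the papers listed here, perplexity can be computed as the exponential of the average negative log-likelihood per token. The function name `perplexity` and the example probabilities are hypothetical, and the sketch assumes natural-log probabilities. Note that published results on this benchmark may differ in tokenization and normalization.

```python
import math


def perplexity(token_log_probs):
    """Return perplexity given per-token natural-log probabilities.

    Hypothetical helper for illustration: exp of the mean negative
    log-likelihood over all scored tokens.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)


# Made-up example: a model that assigns probability 0.25 to each of four tokens.
example_log_probs = [math.log(0.25)] * 4
print(perplexity(example_log_probs))  # approximately 4
```

A model that spreads its probability uniformly over k choices at every step has a perplexity of about k, which is why the metric is often read as an effective branching factor.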

Results

Performance results of various models on this benchmark

Model name	Number of params	PPL	Paper Title	Repository
OmniNetT (Large)	100M	21.5	OmniNet: Omnidirectional Representations from Transformers
LSTM-8192-1024 + CNN Input	1.04B	30.0	Exploring the Limits of Language Modeling
Cohere Large	-	25.06	-	-
Transformer-XL Large	0.8B	21.8	Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
Transformer-XL Base	0.46B	23.5	Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
Adaptive Input Large	0.46B	23.91	Adaptive Input Representations for Neural Language Modeling
DynamicConv	0.34B	26.67	Pay Less Attention with Lightweight and Dynamic Convolutions
Adaptive Input Very Large	1.0B	23.02	Adaptive Input Representations for Neural Language Modeling
RNN-1024 + 9 Gram	20B	51.3	One Billion Word Benchmark for Measuring Progress in Statistical Language Modeling
OmniNetB (Large)	-	22	OmniNet: Omnidirectional Representations from Transformers
GPT-2	1.54B	42.16	Language Models are Unsupervised Multitask Learners	-
Evolved Transformer Big	-	28.6	The Evolved Transformer
OmniNetP (Large)	100M	21.6	OmniNet: Omnidirectional Representations from Transformers
Mesh Tensorflow	4.9B	24.0	Mesh-TensorFlow: Deep Learning for Supercomputers
SRU++ Large	465M	23.5	When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute
Low-Budget MoE	5B	34.1	Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
BIG G-LSTM-2	-	36.0	Factorization tricks for LSTM networks
LSTM-8192-1024	1.8B	30.6	Exploring the Limits of Language Modeling
GCNN-14 bottleneck	-	31.9	Language Modeling with Gated Convolutional Networks
Sparse Non-Negative	33B	52.9	Skip-gram Language Modeling Using Sparse Non-negative Matrix Probability Estimation	-
