
Language Modelling On One Billion Word

Evaluation Metrics

Number of params
PPL

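PPL (perplexity) is the exponential of the average per-token negative log-likelihood on the test set; lower is better. Below is a minimal sketch of that relationship; the `perplexity` helper and the toy probabilities are illustrative assumptions, not part of the benchmark itself.

```python
import math

def perplexity(token_logprobs):
    """Corpus perplexity: exp of the average per-token negative log-likelihood."""
    avg_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_nll)

# Toy example with made-up token probabilities 0.25, 0.5, 0.125 (not benchmark data):
logprobs = [math.log(p) for p in (0.25, 0.5, 0.125)]
print(perplexity(logprobs))  # ~4.0, i.e. as uncertain as a uniform choice over 4 tokens
```
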
Evaluation Results

Performance of each model on this benchmark:

| Model | Number of params | PPL | Paper Title | Repository |
|---|---|---|---|---|
| OmniNetT (Large) | 100M | 21.5 | OmniNet: Omnidirectional Representations from Transformers | |
| LSTM-8192-1024 + CNN Input | 1.04B | 30.0 | Exploring the Limits of Language Modeling | |
| Cohere Large | - | 25.06 | - | - |
| Transformer-XL Large | 0.8B | 21.8 | Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | |
| Transformer-XL Base | 0.46B | 23.5 | Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | |
| Adaptive Input Large | 0.46B | 23.91 | Adaptive Input Representations for Neural Language Modeling | |
| DynamicConv | 0.34B | 26.67 | Pay Less Attention with Lightweight and Dynamic Convolutions | |
| Adaptive Input Very Large | 1.0B | 23.02 | Adaptive Input Representations for Neural Language Modeling | |
| RNN-1024 + 9 Gram | 20B | 51.3 | One Billion Word Benchmark for Measuring Progress in Statistical Language Modeling | |
| OmniNetB (Large) | - | 22 | OmniNet: Omnidirectional Representations from Transformers | |
| GPT-2 | 1.54B | 42.16 | Language Models are Unsupervised Multitask Learners | - |
| Evolved Transformer Big | - | 28.6 | The Evolved Transformer | |
| OmniNetP (Large) | 100M | 21.6 | OmniNet: Omnidirectional Representations from Transformers | |
| Mesh Tensorflow | 4.9B | 24.0 | Mesh-TensorFlow: Deep Learning for Supercomputers | |
| SRU++ Large | 465M | 23.5 | When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute | |
| Low-Budget MoE | 5B | 34.1 | Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer | |
| BIG G-LSTM-2 | - | 36.0 | Factorization tricks for LSTM networks | |
| LSTM-8192-1024 | 1.8B | 30.6 | Exploring the Limits of Language Modeling | |
| GCNN-14 bottleneck | - | 31.9 | Language Modeling with Gated Convolutional Networks | |
| Sparse Non-Negative | 33B | 52.9 | Skip-gram Language Modeling Using Sparse Non-negative Matrix Probability Estimation | - |