
Language Modelling On One Billion Word

Metrics

Number of params
PPL
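
Lower PPL (perplexity) is better: perplexity is the exponential of the model's average per-token negative log-likelihood on the test set. As a quick reference, here is a minimal Python sketch of that formula; the `perplexity` helper and its `token_logprobs` input are illustrative, not part of any benchmark tooling:

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(mean negative log-likelihood per token).

    token_logprobs: iterable of natural-log token probabilities
    (hypothetical input; any language model that scores tokens
    can produce these).
    """
    logprobs = list(token_logprobs)
    nll = -sum(logprobs) / len(logprobs)  # average negative log-likelihood
    return math.exp(nll)

# Example: a model assigning probability 0.1 to every token
# has perplexity 10.
print(perplexity([math.log(0.1)] * 5))  # -> 10.0
```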

Results

Performance results of the different models on this benchmark

| Model name | Number of params | PPL | Paper Title | Repository |
| --- | --- | --- | --- | --- |
| OmniNetT (Large) | 100M | 21.5 | OmniNet: Omnidirectional Representations from Transformers | - |
| LSTM-8192-1024 + CNN Input | 1.04B | 30.0 | Exploring the Limits of Language Modeling | - |
| Cohere Large | - | 25.06 | - | - |
| Transformer-XL Large | 0.8B | 21.8 | Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | - |
| Transformer-XL Base | 0.46B | 23.5 | Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | - |
| Adaptive Input Large | 0.46B | 23.91 | Adaptive Input Representations for Neural Language Modeling | - |
| DynamicConv | 0.34B | 26.67 | Pay Less Attention with Lightweight and Dynamic Convolutions | - |
| Adaptive Input Very Large | 1.0B | 23.02 | Adaptive Input Representations for Neural Language Modeling | - |
| RNN-1024 + 9 Gram | 20B | 51.3 | One Billion Word Benchmark for Measuring Progress in Statistical Language Modeling | - |
| OmniNetB (Large) | - | 22 | OmniNet: Omnidirectional Representations from Transformers | - |
| GPT-2 | 1.54B | 42.16 | Language Models are Unsupervised Multitask Learners | - |
| Evolved Transformer Big | - | 28.6 | The Evolved Transformer | - |
| OmniNetP (Large) | 100M | 21.6 | OmniNet: Omnidirectional Representations from Transformers | - |
| Mesh Tensorflow | 4.9B | 24.0 | Mesh-TensorFlow: Deep Learning for Supercomputers | - |
| SRU++ Large | 465M | 23.5 | When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute | - |
| Low-Budget MoE | 5B | 34.1 | Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer | - |
| BIG G-LSTM-2 | - | 36.0 | Factorization tricks for LSTM networks | - |
| LSTM-8192-1024 | 1.8B | 30.6 | Exploring the Limits of Language Modeling | - |
| GCNN-14 bottleneck | - | 31.9 | Language Modeling with Gated Convolutional Networks | - |
| Sparse Non-Negative | 33B | 52.9 | Skip-gram Language Modeling Using Sparse Non-negative Matrix Probability Estimation | - |