HyperAI

Language Modelling On One Billion Word

Metrics

Number of params
PPL
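
Perplexity (PPL) is the exponentiated average negative log-likelihood per token, so lower values indicate a better language model. The snippet below is a minimal, illustrative sketch of how PPL is typically computed from per-token log-probabilities; the function name and the example values are hypothetical and not part of this benchmark's evaluation code.

```python
import math

def perplexity(token_log_probs):
    """Compute perplexity as exp of the mean negative log-likelihood.

    `token_log_probs` holds the natural-log probabilities a model assigns
    to each ground-truth token (hypothetical inputs, for illustration).
    """
    n = len(token_log_probs)
    avg_nll = -sum(token_log_probs) / n
    return math.exp(avg_nll)

# Example with four illustrative token log-probabilities:
print(perplexity([-2.1, -0.7, -3.0, -1.4]))  # ≈ 6.05
```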

Results

Performance results of various models on this benchmark

| Model name | Number of params | PPL | Paper Title | Repository |
|---|---|---|---|---|
| OmniNetT (Large) | 100M | 21.5 | OmniNet: Omnidirectional Representations from Transformers | - |
| LSTM-8192-1024 + CNN Input | 1.04B | 30.0 | Exploring the Limits of Language Modeling | - |
| Cohere Large | - | 25.06 | - | - |
| Transformer-XL Large | 0.8B | 21.8 | Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | - |
| Transformer-XL Base | 0.46B | 23.5 | Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | - |
| Adaptive Input Large | 0.46B | 23.91 | Adaptive Input Representations for Neural Language Modeling | - |
| DynamicConv | 0.34B | 26.67 | Pay Less Attention with Lightweight and Dynamic Convolutions | - |
| Adaptive Input Very Large | 1.0B | 23.02 | Adaptive Input Representations for Neural Language Modeling | - |
| RNN-1024 + 9 Gram | 20B | 51.3 | One Billion Word Benchmark for Measuring Progress in Statistical Language Modeling | - |
| OmniNetB (Large) | - | 22 | OmniNet: Omnidirectional Representations from Transformers | - |
| GPT-2 | 1.54B | 42.16 | Language Models are Unsupervised Multitask Learners | - |
| Evolved Transformer Big | - | 28.6 | The Evolved Transformer | - |
| OmniNetP (Large) | 100M | 21.6 | OmniNet: Omnidirectional Representations from Transformers | - |
| Mesh Tensorflow | 4.9B | 24.0 | Mesh-TensorFlow: Deep Learning for Supercomputers | - |
| SRU++ Large | 465M | 23.5 | When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute | - |
| Low-Budget MoE | 5B | 34.1 | Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer | - |
| BIG G-LSTM-2 | - | 36.0 | Factorization tricks for LSTM networks | - |
| LSTM-8192-1024 | 1.8B | 30.6 | Exploring the Limits of Language Modeling | - |
| GCNN-14 bottleneck | - | 31.9 | Language Modeling with Gated Convolutional Networks | - |
| Sparse Non-Negative | 33B | 52.9 | Skip-gram Language Modeling Using Sparse Non-negative Matrix Probability Estimation | - |