Language Modelling On One Billion Word
Metrics
Number of params
PPL
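PPL (perplexity) is reported without a definition on this page. As a minimal sketch, assuming the usual definition (the exponential of the average negative log-likelihood per token), it could be computed as below. The function name `perplexity` and the example probabilities are hypothetical, and the papers listed here may differ in tokenization and normalization details, so this is illustrative rather than the benchmark's official scoring procedure.

```python
import math


def perplexity(log_probs):
    """Return perplexity given per-token natural-log probabilities.

    Assumes PPL = exp(-(1/N) * sum(log p_i)); lower is better.
    """
    if not log_probs:
        raise ValueError("need at least one token log-probability")
    # Average negative log-likelihood over the N scored tokens.
    avg_nll = -sum(log_probs) / len(log_probs)
    return math.exp(avg_nll)


# Hypothetical example: a model assigning probability 0.25 to each of
# four tokens is as uncertain as a uniform choice among four options.
example_log_probs = [math.log(0.25)] * 4
print(perplexity(example_log_probs))  # approximately 4
```

Under this definition a lower PPL means the model assigns higher probability to the held-out text.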
Results
Performance results of various models on this benchmark
| Model name | Number of params | PPL | Paper Title | Repository |
| --- | --- | --- | --- | --- |
| OmniNetT (Large) | 100M | 21.5 | OmniNet: Omnidirectional Representations from Transformers | |
| LSTM-8192-1024 + CNN Input | 1.04B | 30.0 | Exploring the Limits of Language Modeling | |
| Cohere Large | - | 25.06 | - | - |
| Transformer-XL Large | 0.8B | 21.8 | Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | |
| Transformer-XL Base | 0.46B | 23.5 | Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | |
| Adaptive Input Large | 0.46B | 23.91 | Adaptive Input Representations for Neural Language Modeling | |
| DynamicConv | 0.34B | 26.67 | Pay Less Attention with Lightweight and Dynamic Convolutions | |
| Adaptive Input Very Large | 1.0B | 23.02 | Adaptive Input Representations for Neural Language Modeling | |
| RNN-1024 + 9 Gram | 20B | 51.3 | One Billion Word Benchmark for Measuring Progress in Statistical Language Modeling | |
| OmniNetB (Large) | - | 22 | OmniNet: Omnidirectional Representations from Transformers | |
| GPT-2 | 1.54B | 42.16 | Language Models are Unsupervised Multitask Learners | - |
| Evolved Transformer Big | - | 28.6 | The Evolved Transformer | |
| OmniNetP (Large) | 100M | 21.6 | OmniNet: Omnidirectional Representations from Transformers | |
| Mesh Tensorflow | 4.9B | 24.0 | Mesh-TensorFlow: Deep Learning for Supercomputers | |
| SRU++ Large | 465M | 23.5 | When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute | |
| Low-Budget MoE | 5B | 34.1 | Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer | |
| BIG G-LSTM-2 | - | 36.0 | Factorization tricks for LSTM networks | |
| LSTM-8192-1024 | 1.8B | 30.6 | Exploring the Limits of Language Modeling | |
| GCNN-14 bottleneck | - | 31.9 | Language Modeling with Gated Convolutional Networks | |
| Sparse Non-Negative | 33B | 52.9 | Skip-gram Language Modeling Using Sparse Non-negative Matrix Probability Estimation | - |