Language Modelling on One Billion Word
Metrics
Number of params
PPL
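As a reminder of what the PPL column below measures, perplexity is the exponential of the average per-token negative log-likelihood over a held-out corpus. A minimal illustrative sketch (the function name and the toy probabilities are assumptions for demonstration, not values from this benchmark):

```python
import math

def perplexity(log_probs):
    """Corpus perplexity from per-token natural-log probabilities.

    log_probs: list of log p(token | context) values assigned by a model.
    Returns exp(average negative log-likelihood).
    """
    avg_nll = -sum(log_probs) / len(log_probs)
    return math.exp(avg_nll)

# Toy example: three tokens assigned probabilities 0.25, 0.5, 0.125.
lp = [math.log(0.25), math.log(0.5), math.log(0.125)]
print(perplexity(lp))  # → 4.0 (reciprocal of the geometric mean probability)
```

Lower is better: a perplexity of 21.5 roughly means the model is, on average, as uncertain as if it were choosing uniformly among 21.5 tokens at each step.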
Results
Performance results of the different models on this benchmark.
Comparison table
Model name | Number of params | PPL |
---|---|---|
omninet-omnidirectional-representations-from | 100M | 21.5 |
exploring-the-limits-of-language-modeling | 1.04B | 30.0 |
Model 3 | - | 25.06 |
transformer-xl-attentive-language-models | 0.8B | 21.8 |
transformer-xl-attentive-language-models | 0.46B | 23.5 |
adaptive-input-representations-for-neural | 0.46B | 23.91 |
pay-less-attention-with-lightweight-and | 0.34B | 26.67 |
adaptive-input-representations-for-neural | 1.0B | 23.02 |
one-billion-word-benchmark-for-measuring | 20B | 51.3 |
omninet-omnidirectional-representations-from | - | 22 |
language-models-are-unsupervised-multitask | 1.54B | 42.16 |
the-evolved-transformer | - | 28.6 |
omninet-omnidirectional-representations-from | 100M | 21.6 |
mesh-tensorflow-deep-learning-for | 4.9B | 24.0 |
when-attention-meets-fast-recurrence-training | 465M | 23.5 |
outrageously-large-neural-networks-the | 5B | 34.1 |
factorization-tricks-for-lstm-networks | - | 36.0 |
exploring-the-limits-of-language-modeling | 1.8B | 30.6 |
language-modeling-with-gated-convolutional | - | 31.9 |
skip-gram-language-modeling-using-sparse-non | 33B | 52.9 |
outrageously-large-neural-networks-the | 5B | 28.0 |
h-transformer-1d-fast-one-dimensional | 53M | - |
when-attention-meets-fast-recurrence-training | 328M | 25.1 |
simple-and-effective-masked-diffusion | 110M | 20.09 |
simple-and-effective-masked-diffusion | 110M | 23.00 |
h-transformer-1d-fast-one-dimensional | 144M | - |
exploring-the-limits-of-language-modeling | 43B | 23.7 |