Language Modelling on enwik8
Metrics
Bit per Character (BPC)
Number of params
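Bits per character (BPC) is the average negative log-likelihood that the model assigns to each ground-truth character, measured in base 2; lower is better. The following is a minimal sketch of how BPC can be computed from per-character predicted probabilities; the function name and array layout are illustrative assumptions, not part of this benchmark.

```python
import numpy as np

def bits_per_character(char_probs):
    """Bits-per-character (BPC) from the model's predicted probability
    of each ground-truth character.

    char_probs: sequence of probabilities p(x_t | x_<t), one per character.
    Returns the average negative log2-likelihood per character.
    """
    char_probs = np.asarray(char_probs, dtype=np.float64)
    return float(-np.mean(np.log2(char_probs)))

# Example: a model that assigns probability 0.5 to every character
# scores exactly 1.0 BPC.
print(bits_per_character([0.5, 0.5, 0.5]))  # -> 1.0
```

Equivalently, BPC is the per-character cross-entropy in nats divided by ln 2.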
Results
Performance results of various models on this benchmark
Comparison table
Model name | Bit per Character (BPC) | Number of params |
---|---|---|
character-level-language-modeling-with-deeper | 1.11 | 44M |
transformer-xl-attentive-language-models | 1.06 | 41M |
longformer-the-long-document-transformer | 1.00 | 41M |
single-headed-attention-rnn-stop-thinking | 1.33 | 51M |
long-short-transformer-efficient-transformers | 0.97 | 110M |
multiplicative-lstm-for-sequence-modelling | 1.24 | 46M |
hypernetworks | 1.34 | 27M |
accessing-higher-level-representations-in | 0.96 | 77M |
adaptive-attention-span-in-transformers | 1.02 | 39M |
language-models-are-unsupervised-multitask | 0.93 | 1542M |
transformer-xl-attentive-language-models | 0.99 | 277M |
cluster-former-clustering-based-sparse | 1.22 | - |
transformer-xl-attentive-language-models | 1.03 | 88M |
compressive-transformers-for-long-range-1 | 0.97 | 277M |
generating-sequences-with-recurrent-neural | 1.67 | - |
adaptive-attention-span-in-transformers | 0.98 | 209M |
2305-14952 | 0.940 | 22M |
dynamic-evaluation-of-transformer-language | 0.940 | 277M |
hierarchical-multiscale-recurrent-neural | 1.32 | 35M |
not-all-memories-are-created-equal-learning-1 | 0.95 | 208M |
memory-efficient-stochastic-methods-for | 1.033 | 41M |
when-attention-meets-fast-recurrence-training | 0.97 | 108M |
efficient-content-based-sparse-attention-with-1 | 0.99 | - |
mogrifier-lstm | 1.195 | 48M |
improving-transformer-models-by-reordering | 0.968 | 209M |
the-information-pathways-hypothesis | 1.024 | - |
190410509 | 0.99 | 95M |
augmenting-self-attention-with-persistent | - | 114M |
single-headed-attention-rnn-stop-thinking | 1.076 | 52M |
recurrent-highway-networks | 1.27 | 46M |
long-short-transformer-efficient-transformers | 0.99 | - |
augmenting-self-attention-with-persistent | 1.01 | 39M |
single-headed-attention-rnn-stop-thinking | 1.068 | 54M |
mogrifier-lstm | 1.146 | 48M |
bp-transformer-modelling-long-range-context | 1.02 | 38M |
hierarchical-transformers-are-more-efficient | 0.997 | - |
longformer-the-long-document-transformer | 0.99 | 102M |
character-level-language-modeling-with-deeper | 1.06 | 235M |
fast-slow-recurrent-neural-networks | 1.25 | 47M |
neural-machine-translation-in-linear-time | 1.31 | - |
when-attention-meets-fast-recurrence-training | 0.95 | 195M |
an-analysis-of-neural-language-modeling-at | 1.232 | 47M |