Language Modelling on Penn Treebank (Word Level)

Word-level language modelling on the Penn Treebank corpus, using the standard preprocessed version with a 10,000-word vocabulary. Models are evaluated by perplexity; lower is better.
Metrics
Params: number of model parameters
Test perplexity: perplexity on the PTB test set (lower is better)
Validation perplexity: perplexity on the PTB validation set (lower is better)
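
Perplexity is the exponential of the average per-word negative log-likelihood, so a model with perplexity P is, on average, as uncertain as a uniform choice among P words. A minimal sketch of the computation (the `perplexity` helper and its `log_probs` input are illustrative, not taken from any entry below):

```python
import math

def perplexity(log_probs):
    """Word-level perplexity: exp of the mean negative log-likelihood.

    log_probs: one natural-log probability per word of the evaluation
    set, as assigned by the model being scored.
    """
    log_probs = list(log_probs)
    nll = -sum(log_probs) / len(log_probs)
    return math.exp(nll)

# A model that assigns probability 1/50 to every word has perplexity 50.
print(perplexity([math.log(1 / 50)] * 1000))  # -> 50.0
```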
Results
Performance of various models on this benchmark.
Comparison Table
Paper | Params | Test perplexity | Validation perplexity |
---|---|---|---|
Gradual Learning of Recurrent Neural Networks | 26M | 46.34 | 46.64 |
Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling | - | 66.0 | 68.1 |
Partially Shuffling the Training Data to Improve Language Models | 22M | 53.92 | 55.89 |
Partially Shuffling the Training Data to Improve Language Models | 23M | 52.0 | 53.79 |
A Theoretically Grounded Application of Dropout in Recurrent Neural Networks | - | 79.7 | 81.9 |
Improving Neural Language Modeling via Adversarial Training | 22M | 46.01 | 46.63 |
Direct Output Connection for a High-Rank Language Model | 185M | 47.17 | 48.63 |
Pushing the Bounds of Dropout | 24M | 55.3 | 57.1 |
Neural Architecture Search with Reinforcement Learning | 25M | 64.0 | - |
Trellis Networks for Sequence Modeling | - | 54.19 | - |
Mogrifier LSTM | 24M | 44.9 | 44.8 |
Deep Equilibrium Models | 24M | 57.1 | - |
An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling | - | 78.93 | - |
Seq-U-Net: A One-Dimensional Causal U-Net for Efficient Sequence Modelling | 14.7M | 108.47 | - |
Dynamic Evaluation of Neural Sequence Models | 24M | 51.1 | 51.6 |
Recurrent Highway Networks | 23M | 65.4 | 67.9 |
FRAGE: Frequency-Agnostic Word Representation | 22M | 46.54 | 47.38 |
Regularizing and Optimizing LSTM Language Models | 24M | 52.8 | 53.9 |
Improved Language Modeling by Decoding the Past | 22M | 47.3 | 48.0 |
Seq-U-Net: A One-Dimensional Causal U-Net for Efficient Sequence Modelling | 14.9M | 107.95 | - |
Direct Output Connection for a High-Rank Language Model | 23M | 52.38 | 54.12 |
Deep Independently Recurrent Neural Network (IndRNN) | - | 50.97 | - |
Deep Independently Recurrent Neural Network (IndRNN) | - | 56.37 | - |
Regularizing and Optimizing LSTM Language Models | 24M | 57.3 | 60.0 |
Recurrent Neural Network Regularization | - | 78.4 | 82.2 |
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | 24M | 54.55 | 56.72 |
Fraternal Dropout | 24M | 56.8 | 58.9 |
Learning Associative Inference Using Fast Weight Memory | 24M | 54.48 | 56.76 |
R-Transformer: Recurrent Neural Network Enhanced Transformer | - | 84.38 | - |
Efficient Neural Architecture Search via Parameter Sharing | 24M | 58.6 | 60.8 |
An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling | - | 92.48 | - |
Breaking the Softmax Bottleneck: A High-Rank RNN Language Model | 22M | 47.69 | 48.33 |
Language Models with Transformers (arXiv:1904.09408) | 395M | 31.3 | 36.1 |
AutoDropout: Learning Dropout Patterns to Regularize Deep Networks | - | 54.9 | 58.1 |
Breaking the Softmax Bottleneck: A High-Rank RNN Language Model | 22M | 54.44 | 56.54 |
Deep Residual Output Layers for Neural Language Generation | 24M | 49.4 | 49.5 |
DARTS: Differentiable Architecture Search | 23M | 56.1 | 58.3 |
A Theoretically Grounded Application of Dropout in Recurrent Neural Networks | - | 75.2 | 77.9 |
Advancing State of the Art in Language Modeling | - | 47.31 | 48.92 |
Language Models are Few-Shot Learners (GPT-3) | 175,000M | 20.5 | - |
Language Models are Unsupervised Multitask Learners (GPT-2) | 1,542M | 35.76 | - |
Deep Residual Output Layers for Neural Language Generation | 24M | 55.7 | 58.2 |
Recurrent Neural Network Regularization | - | 82.7 | 86.2 |
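
For context, the scores above are typically produced by streaming the PTB test set through the model and exponentiating the mean cross-entropy over all words. The sketch below assumes a hypothetical stateless `model(x)` that maps a 1-D tensor of word ids to per-position logits; real recurrent evaluations also carry hidden state across chunks, which is omitted here for brevity.

```python
import math
import torch
import torch.nn.functional as F

@torch.no_grad()
def evaluate_ppl(model, data, bptt=70):
    """Perplexity of `model` over `data`, a 1-D LongTensor of word ids
    (e.g. the PTB test stream). Assumes model(x) returns logits of
    shape (len(x), vocab_size)."""
    model.eval()
    total_nll, total_words = 0.0, 0
    for i in range(0, data.numel() - 1, bptt):
        seq_len = min(bptt, data.numel() - 1 - i)
        x = data[i:i + seq_len]          # inputs
        y = data[i + 1:i + 1 + seq_len]  # next-word targets
        logits = model(x)
        # Sum, not mean, so the shorter final chunk is weighted correctly.
        total_nll += F.cross_entropy(logits, y, reduction="sum").item()
        total_words += seq_len
    return math.exp(total_nll / total_words)
```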