HyperAI

Language Modelling on Penn Treebank (Word Level)

Metrics

Params
Test perplexity
Validation perplexity
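Perplexity, the metric reported in the table below, is the exponential of the average per-token negative log-likelihood; lower is better. A minimal sketch of the standard computation (the function name and sample values are illustrative, not from this benchmark):

```python
import math

def perplexity(neg_log_likelihoods):
    """Perplexity = exp(mean per-token negative log-likelihood, natural log)."""
    return math.exp(sum(neg_log_likelihoods) / len(neg_log_likelihoods))

# A model that assigns every token probability 1/100 has perplexity ≈ 100.
nlls = [math.log(100)] * 5
print(perplexity(nlls))
```

Test and validation perplexity differ only in which split of the Penn Treebank corpus the per-token likelihoods are averaged over.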

Results

Performance results of various models on this benchmark

Comparison table
| Model name | Params | Test perplexity | Validation perplexity |
| --- | --- | --- | --- |
| gradual-learning-of-recurrent-neural-networks | 26M | 46.34 | 46.64 |
| tying-word-vectors-and-word-classifiers-a | - | 66.0 | 68.1 |
| partially-shuffling-the-training-data-to-1 | 22M | 53.92 | 55.89 |
| partially-shuffling-the-training-data-to-1 | 23M | 52.0 | 53.79 |
| a-theoretically-grounded-application-of | - | 79.7 | 81.9 |
| improving-neural-language-modeling-via | 22M | 46.01 | 46.63 |
| direct-output-connection-for-a-high-rank | 185M | 47.17 | 48.63 |
| pushing-the-bounds-of-dropout | 24M | 55.3 | 57.1 |
| neural-architecture-search-with-reinforcement | 25M | 64.0 | - |
| trellis-networks-for-sequence-modeling | - | 54.19 | - |
| mogrifier-lstm | 24M | 44.9 | 44.8 |
| deep-equilibrium-models | 24M | 57.1 | - |
| an-empirical-evaluation-of-generic | - | 78.93 | - |
| seq-u-net-a-one-dimensional-causal-u-net-for | 14.7M | 108.47 | - |
| dynamic-evaluation-of-neural-sequence-models | 24M | 51.1 | 51.6 |
| recurrent-highway-networks | 23M | 65.4 | 67.9 |
| frage-frequency-agnostic-word-representation | 22M | 46.54 | 47.38 |
| regularizing-and-optimizing-lstm-language | 24M | 52.8 | 53.9 |
| improved-language-modeling-by-decoding-the | 22M | 47.3 | 48.0 |
| seq-u-net-a-one-dimensional-causal-u-net-for | 14.9M | 107.95 | - |
| direct-output-connection-for-a-high-rank | 23M | 52.38 | 54.12 |
| deep-independently-recurrent-neural-network | - | 50.97 | - |
| deep-independently-recurrent-neural-network | - | 56.37 | - |
| regularizing-and-optimizing-lstm-language | 24M | 57.3 | 60.0 |
| recurrent-neural-network-regularization | - | 78.4 | 82.2 |
| transformer-xl-attentive-language-models | 24M | 54.55 | 56.72 |
| fraternal-dropout | 24M | 56.8 | 58.9 |
| learning-associative-inference-using-fast-1 | 24M | 54.48 | 56.76 |
| r-transformer-recurrent-neural-network | - | 84.38 | - |
| efficient-neural-architecture-search-via-1 | 24M | 58.6 | 60.8 |
| an-empirical-evaluation-of-generic | - | 92.48 | - |
| breaking-the-softmax-bottleneck-a-high-rank | 22M | 47.69 | 48.33 |
| 190409408 | 395M | 31.3 | 36.1 |
| autodropout-learning-dropout-patterns-to | - | 54.9 | 58.1 |
| breaking-the-softmax-bottleneck-a-high-rank | 22M | 54.44 | 56.54 |
| deep-residual-output-layers-for-neural | 24M | 49.4 | 49.5 |
| darts-differentiable-architecture-search | 23M | 56.1 | 58.3 |
| a-theoretically-grounded-application-of | - | 75.2 | 77.9 |
| advancing-state-of-the-art-in-language | - | 47.31 | 48.92 |
| language-models-are-few-shot-learners | 175000M | 20.5 | - |
| language-models-are-unsupervised-multitask | 1542M | 35.76 | - |
| deep-residual-output-layers-for-neural | 24M | 55.7 | 58.2 |
| recurrent-neural-network-regularization | - | 82.7 | 86.2 |