Language Modelling On Penn Treebank Word
Metrics: Params, Test perplexity, Validation perplexity
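Both perplexity columns report standard word-level perplexity: the exponential of the average negative log-likelihood per word on the corresponding Penn Treebank split (lower is better). The sketch below illustrates the computation; the function name and example probabilities are illustrative and are not taken from any listed paper's evaluation code.

```python
import math

def word_perplexity(token_log_probs):
    """Word-level perplexity: exp of the mean negative log-likelihood
    (natural log) over all words in the evaluated split."""
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)

# A model assigning probability 1/80 to each word on average scores a
# perplexity of about 80, in the range of the LSTM baselines in the table.
example_log_probs = [math.log(1 / 80)] * 1000
print(word_perplexity(example_log_probs))  # ≈ 80.0
```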
Results
Reported model performance on this benchmark (20 of the 43 leaderboard entries are shown below).
| Model Name | Params | Test perplexity | Validation perplexity | Paper Title |
| --- | --- | --- | --- | --- |
| TCN | 14.7M | 108.47 | - | Seq-U-Net: A One-Dimensional Causal U-Net for Efficient Sequence Modelling |
| Seq-U-Net | 14.9M | 107.95 | - | Seq-U-Net: A One-Dimensional Causal U-Net for Efficient Sequence Modelling |
| GRU (Bai et al., 2018) | - | 92.48 | - | An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling |
| R-Transformer | - | 84.38 | - | R-Transformer: Recurrent Neural Network Enhanced Transformer |
| Zaremba et al. (2014) - LSTM (medium) | - | 82.7 | 86.2 | Recurrent Neural Network Regularization |
| Gal & Ghahramani (2016) - Variational LSTM (medium) | - | 79.7 | 81.9 | A Theoretically Grounded Application of Dropout in Recurrent Neural Networks |
| LSTM (Bai et al., 2018) | - | 78.93 | - | An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling |
| Zaremba et al. (2014) - LSTM (large) | - | 78.4 | 82.2 | Recurrent Neural Network Regularization |
| Gal & Ghahramani (2016) - Variational LSTM (large) | - | 75.2 | 77.9 | A Theoretically Grounded Application of Dropout in Recurrent Neural Networks |
| Inan et al. (2016) - Variational RHN | - | 66.0 | 68.1 | Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling |
| Recurrent highway networks | 23M | 65.4 | 67.9 | Recurrent Highway Networks |
| NAS-RL | 25M | 64.0 | - | Neural Architecture Search with Reinforcement Learning |
| Efficient NAS | 24M | 58.6 | 60.8 | Efficient Neural Architecture Search via Parameter Sharing |
| AWD-LSTM | 24M | 57.3 | 60.0 | Regularizing and Optimizing LSTM Language Models |
| DEQ-TrellisNet | 24M | 57.1 | - | Deep Equilibrium Models |
| AWD-LSTM 3-layer with Fraternal dropout | 24M | 56.8 | 58.9 | Fraternal Dropout |
| Dense IndRNN | - | 56.37 | - | Deep Independently Recurrent Neural Network (IndRNN) |
| Differentiable NAS | 23M | 56.1 | 58.3 | DARTS: Differentiable Architecture Search |
| AWD-LSTM-DRILL | 24M | 55.7 | 58.2 | Deep Residual Output Layers for Neural Language Generation |
| 2-layer skip-LSTM + dropout tuning | 24M | 55.3 | 57.1 | Pushing the bounds of dropout |