Language Modelling on WikiText-103
Metrics: Number of params · Test perplexity (lower is better)

Performance results of various models on this benchmark are listed in the table below.
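Here "test perplexity" is the standard language-modelling metric: the exponentiated average negative log-likelihood per token on the WikiText-103 test set. For reference, the usual definition (standard, not specific to this leaderboard) is:

```latex
% Test perplexity of a model p over a test sequence w_1, \dots, w_N:
\mathrm{PPL} \;=\; \exp\!\left( -\frac{1}{N} \sum_{i=1}^{N} \log p\,(w_i \mid w_{<i}) \right)
```

Intuitively, a perplexity of 29.0 means the model is, on average, as uncertain at each step as a uniform choice among 29 tokens.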
| Model Name | Number of params | Test perplexity | Paper Title |
| --- | --- | --- | --- |
| LSTM | - | 48.7 | Improving Neural Language Models with a Continuous Cache |
| Temporal CNN | - | 45.2 | Convolutional Sequence Modeling Revisited |
| TCN | - | 45.19 | An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling |
| GCNN-8 | - | 44.9 | Language Modeling with Gated Convolutional Networks |
| Neural cache model (size = 100) | - | 44.8 | Improving Neural Language Models with a Continuous Cache |
| Neural cache model (size = 2,000) | - | 40.8 | Improving Neural Language Models with a Continuous Cache |
| GPT-2 Small | 124M | 37.50 | Language Models are Unsupervised Multitask Learners |
| GCNN-8 | - | 37.2 | Language Modeling with Gated Convolutional Networks |
| LSTM | - | 36.4 | Fast Parametric Learning with Activation Memorization |
| LSTM (Hebbian) | - | 34.3 | Fast Parametric Learning with Activation Memorization |
| 4 layer QRNN | 151M | 33.0 | An Analysis of Neural Language Modeling at Multiple Scales |
| AWD-LSTM-MoS + ATOI | - | 32.85 | Alleviating Sequence Information Loss with Data Overlapping and Prime Batch Sizes |
| DEQ-Transformer (small) | 138M | 32.4 | Deep Equilibrium Models |
| LSTM (RMC) | - | 31.6 | Relational recurrent neural networks |
| Primal.+Trans. | - | 31.0 | Primal-Attention: Self-attention through Asymmetric Kernel SVD in Primal Representation |
| Rfa-Gate-Gaussian-Stateful (Small) | - | 30.5 | Random Feature Attention |
| LSTM (Hebbian, Cache) | - | 29.7 | Fast Parametric Learning with Activation Memorization |
| LSTM (Hebbian, Cache, MbPA) | - | 29.2 | Fast Parametric Learning with Activation Memorization |
| Trellis Network | - | 29.19 | Trellis Networks for Sequence Modeling |
| DEQ-TrellisNet | 180M | 29.0 | Deep Equilibrium Models |

First 20 of 89 results shown.
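To make the metric concrete, below is a minimal sketch of a sliding-window perplexity evaluation for GPT-2 Small on the WikiText-103 test set, assuming the Hugging Face `transformers` and `datasets` packages are installed. Note that this scores perplexity over GPT-2's BPE tokens, whereas leaderboard figures are word-level perplexities (GPT-2's 37.50 was reported after detokenization and renormalization), so the raw output of this script will not match the table exactly.

```python
import torch
from datasets import load_dataset
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

device = "cuda" if torch.cuda.is_available() else "cpu"
model = GPT2LMHeadModel.from_pretrained("gpt2").to(device).eval()
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

# WikiText-103 test split, concatenated into one long token stream.
test = load_dataset("wikitext", "wikitext-103-raw-v1", split="test")
encodings = tokenizer("\n\n".join(test["text"]), return_tensors="pt")
seq_len = encodings.input_ids.size(1)

max_length = model.config.n_positions  # 1024 for GPT-2 Small
stride = 512                           # overlap windows so scored tokens keep context

nll_sum = 0.0
n_tokens = 0
prev_end = 0
for begin in range(0, seq_len, stride):
    end = min(begin + max_length, seq_len)
    trg_len = end - prev_end  # number of new tokens scored in this window
    input_ids = encodings.input_ids[:, begin:end].to(device)
    target_ids = input_ids.clone()
    target_ids[:, :-trg_len] = -100  # mask tokens already scored (context only)

    with torch.no_grad():
        out = model(input_ids, labels=target_ids)

    # out.loss is the mean NLL over the (shifted) unmasked targets; scale it
    # back to a sum. The label shift makes this a close approximation.
    nll_sum += out.loss.item() * trg_len
    n_tokens += trg_len
    prev_end = end
    if end == seq_len:
        break

ppl = torch.exp(torch.tensor(nll_sum / n_tokens))
print(f"BPE-token test perplexity: {ppl.item():.2f}")
```

The stride controls a context/compute trade-off: a smaller stride gives each scored token more preceding context (lower, more faithful perplexity) at the cost of more forward passes.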