Language Modelling On Wikitext 2
Metrics: Number of params, Test perplexity, Validation perplexity
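Both perplexity columns are the exponentiated average negative log-likelihood per token on the respective split; lower is better, and WikiText-2 results are conventionally reported as word-level perplexity. A minimal sketch of the computation in Python (the function name and example numbers are illustrative, not taken from the leaderboard):

```python
import math

def perplexity(total_nll: float, num_tokens: int) -> float:
    """Perplexity = exp(average negative log-likelihood per token).

    `total_nll` is the summed natural-log NLL over the split and
    `num_tokens` the number of scored tokens; lower is better.
    """
    return math.exp(total_nll / num_tokens)

# Example: 1000 tokens with summed NLL of 4605.17 gives
# exp(4.60517) ≈ 100.0.
print(perplexity(4605.17, 1000))  # ~100.0
```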
Results
Performance of various models on this benchmark, ordered from highest to lowest test perplexity.
| Model Name | Number of params | Test perplexity | Validation perplexity | Paper Title |
|---|---|---|---|---|
| OPT-175B (50% Sparsity) | - | 234.77 | - | SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot |
| Grave et al. (2016) - LSTM | - | 99.3 | - | Improving Neural Language Models with a Continuous Cache |
| Inan et al. (2016) - Variational LSTM (tied) (h=650) | - | 87.7 | 92.3 | Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling |
| Inan et al. (2016) - Variational LSTM (tied) (h=650) + augmented loss | - | 87.0 | 91.5 | Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling |
| EGRU | - | 68.9 | - | Efficient recurrent architectures through activity sparsity and sparse back-propagation through time |
| Grave et al. (2016) - LSTM + continuous cache pointer | - | 68.9 | - | Improving Neural Language Models with a Continuous Cache |
| Melis et al. (2017) - 1-layer LSTM (tied) | 24M | 65.9 | 69.3 | On the State of the Art of Evaluation in Neural Language Models |
| AWD-LSTM | 33M | 65.8 | 68.6 | Regularizing and Optimizing LSTM Language Models |
| AWD-LSTM + ATOI | 33M | 64.73 | 67.47 | Alleviating Sequence Information Loss with Data Overlapping and Prime Batch Sizes |
| AWD-LSTM 3-layer with Fraternal dropout | 34M | 64.1 | 66.8 | Fraternal Dropout |
| AWD-LSTM-DRILL | 34M | 61.9 | 64.9 | Deep Residual Output Layers for Neural Language Generation |
| AWD-FWM Schlag et al. (2020) | 37M | 61.65 | 54.48 | Learning Associative Inference Using Fast Weight Memory |
| AWD-LSTM-MoS | 35M | 61.45 | 63.88 | Breaking the Softmax Bottleneck: A High-Rank RNN Language Model |
| AWD-LSTM-MoS + Partial Shuffle | 35M | 59.98 | 62.38 | Partially Shuffling the Training Data to Improve Language Models |
| AWD-LSTM-DOC | 37M | 58.03 | 60.29 | Direct Output Connection for a High-Rank Language Model |
| AWD-LSTM-DOC + Partial Shuffle | 37M | 57.85 | 60.16 | Partially Shuffling the Training Data to Improve Language Models |
| Mogrifier LSTM | 35M | 55.1 | 57.3 | Mogrifier LSTM |
| Ensemble of All | - | 53.73 | 55.4 | Advancing State of the Art in Language Modeling |
| AWD-LSTM-DOC x5 | 185M | 53.09 | 54.19 | Direct Output Connection for a High-Rank Language Model |
| AWD-LSTM + continuous cache pointer | 33M | 52.0 | 53.8 | Regularizing and Optimizing LSTM Language Models |
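To evaluate a model against this benchmark, the WikiText-2 corpus can be loaded in a few lines. A minimal sketch, assuming the Hugging Face `datasets` package; the config names follow the public `wikitext` dataset on the Hub:

```python
# Minimal sketch of loading WikiText-2 for evaluation, assuming the
# Hugging Face `datasets` package is installed (pip install datasets).
from datasets import load_dataset

# "wikitext-2-v1" is the tokenized variant traditionally used for
# word-level perplexity; "wikitext-2-raw-v1" keeps the raw text.
wikitext2 = load_dataset("wikitext", "wikitext-2-v1")

train = wikitext2["train"]        # ~2M words of Wikipedia text
valid = wikitext2["validation"]
test = wikitext2["test"]

print(test[0]["text"])  # each record holds one line of the corpus
```

Perplexities in the table above correspond to the validation and test splits of this corpus.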