Language Modelling On Wikitext 2
Metrics: Number of params, Test perplexity, Validation perplexity
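Both perplexity columns are the exponentiated average negative log-likelihood per token on the respective split; lower is better, and WikiText-2 results are conventionally reported as word-level perplexity. A minimal sketch of the computation in Python (the function name and example numbers are illustrative, not taken from the leaderboard):

```python
import math

def perplexity(total_nll: float, num_tokens: int) -> float:
    """Perplexity = exp(average negative log-likelihood per token).

    `total_nll` is the summed natural-log NLL over the split and
    `num_tokens` the number of scored tokens; lower is better.
    """
    return math.exp(total_nll / num_tokens)

# Example: 1000 tokens with summed NLL of 4605.17 gives
# exp(4.60517) ≈ 100.0.
print(perplexity(4605.17, 1000))  # ~100.0
```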
Results
Performance of various models on this benchmark, ordered from highest to lowest test perplexity.
| Model Name | Number of params | Test perplexity | Validation perplexity | Paper Title |
|---|---|---|---|---|
| OPT-175B (50% Sparsity) | - | 234.77 | - | SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot |
| Grave et al. (2016) - LSTM | - | 99.3 | - | Improving Neural Language Models with a Continuous Cache |
| Inan et al. (2016) - Variational LSTM (tied) (h=650) | - | 87.7 | 92.3 | Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling |
| Inan et al. (2016) - Variational LSTM (tied) (h=650) + augmented loss | - | 87.0 | 91.5 | Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling |
| EGRU | - | 68.9 | - | Efficient recurrent architectures through activity sparsity and sparse back-propagation through time |
| Grave et al. (2016) - LSTM + continuous cache pointer | - | 68.9 | - | Improving Neural Language Models with a Continuous Cache |
| Melis et al. (2017) - 1-layer LSTM (tied) | 24M | 65.9 | 69.3 | On the State of the Art of Evaluation in Neural Language Models |
| AWD-LSTM | 33M | 65.8 | 68.6 | Regularizing and Optimizing LSTM Language Models |
| AWD-LSTM + ATOI | 33M | 64.73 | 67.47 | Alleviating Sequence Information Loss with Data Overlapping and Prime Batch Sizes |
| AWD-LSTM 3-layer with Fraternal dropout | 34M | 64.1 | 66.8 | Fraternal Dropout |
| AWD-LSTM-DRILL | 34M | 61.9 | 64.9 | Deep Residual Output Layers for Neural Language Generation |
| AWD-FWM Schlag et al. (2020) | 37M | 61.65 | 54.48 | Learning Associative Inference Using Fast Weight Memory |
| AWD-LSTM-MoS | 35M | 61.45 | 63.88 | Breaking the Softmax Bottleneck: A High-Rank RNN Language Model |
| AWD-LSTM-MoS + Partial Shuffle | 35M | 59.98 | 62.38 | Partially Shuffling the Training Data to Improve Language Models |
| AWD-LSTM-DOC | 37M | 58.03 | 60.29 | Direct Output Connection for a High-Rank Language Model |
| AWD-LSTM-DOC + Partial Shuffle | 37M | 57.85 | 60.16 | Partially Shuffling the Training Data to Improve Language Models |
| Mogrifier LSTM | 35M | 55.1 | 57.3 | Mogrifier LSTM |
| Ensemble of All | - | 53.73 | 55.4 | Advancing State of the Art in Language Modeling |
| AWD-LSTM-DOC x5 | 185M | 53.09 | 54.19 | Direct Output Connection for a High-Rank Language Model |
| AWD-LSTM + continuous cache pointer | 33M | 52.0 | 53.8 | Regularizing and Optimizing LSTM Language Models |
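To evaluate a model against this benchmark, the WikiText-2 corpus can be loaded in a few lines. A minimal sketch, assuming the Hugging Face `datasets` package; the config names follow the public `wikitext` dataset on the Hub:

```python
# Minimal sketch of loading WikiText-2 for evaluation, assuming the
# Hugging Face `datasets` package is installed (pip install datasets).
from datasets import load_dataset

# "wikitext-2-v1" is the tokenized variant traditionally used for
# word-level perplexity; "wikitext-2-raw-v1" keeps the raw text.
wikitext2 = load_dataset("wikitext", "wikitext-2-v1")

train = wikitext2["train"]        # ~2M words of Wikipedia text
valid = wikitext2["validation"]
test = wikitext2["test"]

print(test[0]["text"])  # each record holds one line of the corpus
```

Perplexities in the table above correspond to the validation and test splits of this corpus.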