# Language Modelling On Wikitext 103
## Metrics

- Number of params
- Test perplexity
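
For reference, test perplexity is the exponentiated mean negative log-likelihood the model assigns to the test tokens (lower is better). For a tokenized test corpus $x_1, \dots, x_N$:

$$
\mathrm{PPL} = \exp\!\Big(-\frac{1}{N}\sum_{i=1}^{N} \log p_\theta(x_i \mid x_{<i})\Big)
$$

Note that the tokenization it is measured over varies by paper (word-level for most entries below, BPE for GPT-2 Small), so values are only strictly comparable under the same vocabulary.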
## Results

Performance results of various models on this benchmark (20 of 89 entries shown).

| Model name | Number of params | Test perplexity | Paper Title |
| --- | --- | --- | --- |
| LSTM | - | 48.7 | Improving Neural Language Models with a Continuous Cache |
| Temporal CNN | - | 45.2 | Convolutional Sequence Modeling Revisited |
| TCN | - | 45.19 | An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling |
| GCNN-8 | - | 44.9 | Language Modeling with Gated Convolutional Networks |
| Neural cache model (size = 100) | - | 44.8 | Improving Neural Language Models with a Continuous Cache |
| Neural cache model (size = 2,000) | - | 40.8 | Improving Neural Language Models with a Continuous Cache |
| GPT-2 Small | 124M | 37.50 | Language Models are Unsupervised Multitask Learners |
| GCNN-8 | - | 37.2 | Language Modeling with Gated Convolutional Networks |
| LSTM | - | 36.4 | Fast Parametric Learning with Activation Memorization |
| LSTM (Hebbian) | - | 34.3 | Fast Parametric Learning with Activation Memorization |
| 4 layer QRNN | 151M | 33.0 | An Analysis of Neural Language Modeling at Multiple Scales |
| AWD-LSTM-MoS + ATOI | - | 32.85 | Alleviating Sequence Information Loss with Data Overlapping and Prime Batch Sizes |
| DEQ-Transformer (small) | 138M | 32.4 | Deep Equilibrium Models |
| LSTM (RMC) | - | 31.6 | Relational recurrent neural networks |
| Primal.+Trans. | - | 31.0 | Primal-Attention: Self-attention through Asymmetric Kernel SVD in Primal Representation |
| Rfa-Gate-Gaussian-Stateful (Small) | - | 30.5 | Random Feature Attention |
| LSTM (Hebbian, Cache) | - | 29.7 | Fast Parametric Learning with Activation Memorization |
| LSTM (Hebbian, Cache, MbPA) | - | 29.2 | Fast Parametric Learning with Activation Memorization |
| Trellis Network | - | 29.19 | Trellis Networks for Sequence Modeling |
| DEQ-TrellisNet | 180M | 29.0 | Deep Equilibrium Models |