HyperAI

Language Modelling on WikiText-103

Metrics

Number of params
Test perplexity
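
For readers unfamiliar with the second metric, here is a minimal sketch of how test perplexity is commonly computed: the exponential of the average negative log-likelihood per test token. The function name and the sample probabilities are illustrative assumptions, not taken from any listed paper. Results on this benchmark are usually reported per word, so models with subword vocabularies typically normalise by the word count instead.

```python
import math

def perplexity(token_log_probs):
    """Perplexity from the natural-log probabilities a model assigned to each test token."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    # Average negative log-likelihood per token, then exponentiate.
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)

# Hypothetical probabilities for a four-token test sequence (assumed values).
example_log_probs = [math.log(p) for p in (0.25, 0.5, 0.125, 0.25)]
ppl = perplexity(example_log_probs)
print(f"perplexity = {ppl:.2f}")
```

Lower is better: a model that spreads its probability uniformly over k choices at every step has a perplexity of k.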

Results

Performance results of various models on this benchmark

| Model Name | Number of params | Test perplexity | Paper Title | Repository |
|---|---|---|---|---|
| Transformer-XL Large + Phrase Induction | 257M | 17.4 | Improving Neural Language Models by Segmenting, Attending, and Predicting the Future | - |
| AWD-LSTM-MoS + ATOI | - | 32.85 | Alleviating Sequence Information Loss with Data Overlapping and Prime Batch Sizes | - |
| LSTM (Hebbian) | - | 34.3 | Fast Parametric Learning with Activation Memorization | - |
| Reformer 125M | - | 26.0 | Reformer: The Efficient Transformer | - |
| LSTM | - | - | How much complexity does an RNN architecture need to learn syntax-sensitive dependencies? | - |
| GCNN-8 | - | 44.9 | Language Modeling with Gated Convolutional Networks | - |
| Transformer-XL Large | 257M | 18.3 | Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | - |
| GRU | - | - | How much complexity does an RNN architecture need to learn syntax-sensitive dependencies? | - |
| Subformer | 96M | 20.39 | Subformer: A Parameter Reduced Transformer | - |
| Routing Transformer | - | 15.8 | Efficient Content-Based Sparse Attention with Routing Transformers | - |
| SRU++ Base | 148M | 18.3 | When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute | - |
| SRU++ Large | 234M | 17.1 | When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute | - |
| DIFFQ (λ=1, g=16) | - | 18.0 | Differentiable Model Compression via Pseudo Quantization Noise | - |
| Transformer+SSA+Self-ensemble | - | 17.18 | The Information Pathways Hypothesis: Transformers are Dynamic Self-Ensembles | - |
| GPT-2 Large | 774M | 22.05 | Language Models are Unsupervised Multitask Learners | - |
| Primal.+Trans. | - | 31.0 | Primal-Attention: Self-attention through Asymmetric Kernel SVD in Primal Representation | - |
| Staged Training | 247M | 17.56 | Shortformer: Better Language Modeling using Shorter Inputs | - |
| Hybrid H3 (355M) | 355M | 16.9 | Hungry Hungry Hippos: Towards Language Modeling with State Space Models | - |
| Transformer-XL (RMS dynamic eval) | 257M | 16.4 | Dynamic Evaluation of Transformer Language Models | - |
| Decay RNN | - | - | How much complexity does an RNN architecture need to learn syntax-sensitive dependencies? | - |