Language Modelling on WikiText-103

Evaluation metrics

Number of params
Test perplexity (computed as sketched below)

Evaluation results

Performance of each model on this benchmark:

| Model name | Number of params | Test perplexity | Paper Title | Repository |
|---|---|---|---|---|
| Transformer-XL Large + Phrase Induction | 257M | 17.4 | Improving Neural Language Models by Segmenting, Attending, and Predicting the Future | - |
| AWD-LSTM-MoS + ATOI | - | 32.85 | Alleviating Sequence Information Loss with Data Overlapping and Prime Batch Sizes | - |
| LSTM (Hebbian) | - | 34.3 | Fast Parametric Learning with Activation Memorization | - |
| Reformer 125M | - | 26.0 | Reformer: The Efficient Transformer | - |
| LSTM | - | - | How much complexity does an RNN architecture need to learn syntax-sensitive dependencies? | - |
| GCNN-8 | - | 44.9 | Language Modeling with Gated Convolutional Networks | - |
| Transformer-XL Large | 257M | 18.3 | Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | - |
| GRU | - | - | How much complexity does an RNN architecture need to learn syntax-sensitive dependencies? | - |
| Subformer | 96M | 20.39 | Subformer: A Parameter Reduced Transformer | - |
| Routing Transformer | - | 15.8 | Efficient Content-Based Sparse Attention with Routing Transformers | - |
| SRU++ Base | 148M | 18.3 | When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute | - |
| SRU++ Large | 234M | 17.1 | When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute | - |
| DIFFQ (λ=1, g=16) | - | 18.0 | Differentiable Model Compression via Pseudo Quantization Noise | - |
| Transformer+SSA+Self-ensemble | - | 17.18 | The Information Pathways Hypothesis: Transformers are Dynamic Self-Ensembles | - |
| GPT-2 Large | 774M | 22.05 | Language Models are Unsupervised Multitask Learners | - |
| Primal.+Trans. | - | 31.0 | Primal-Attention: Self-attention through Asymmetric Kernel SVD in Primal Representation | - |
| Staged Training | 247M | 17.56 | Shortformer: Better Language Modeling using Shorter Inputs | - |
| Hybrid H3 (355M) | 355M | 16.9 | Hungry Hungry Hippos: Towards Language Modeling with State Space Models | - |
| Transformer-XL (RMS dynamic eval) | 257M | 16.4 | Dynamic Evaluation of Transformer Language Models | - |
| Decay RNN | - | - | How much complexity does an RNN architecture need to learn syntax-sensitive dependencies? | - |