Language Modelling on WikiText-103

Evaluation Metrics

Number of params
Test perplexity
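
Both metrics are straightforward to compute for any causal language model. The sketch below is illustrative only, assuming a PyTorch-style model whose forward pass returns next-token logits over its vocabulary; `model` and `token_ids` are hypothetical placeholders, and the numbers in the table further depend on each paper's tokenization, context length, and evaluation windowing.

```python
import math
import torch
import torch.nn.functional as F

def count_params(model: torch.nn.Module) -> int:
    """'Number of params': total trainable parameters (e.g. 257M, 774M)."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

@torch.no_grad()
def test_perplexity(model: torch.nn.Module, token_ids: torch.Tensor) -> float:
    """'Test perplexity': exp of the mean per-token negative log-likelihood
    on the held-out test tokens. token_ids has shape (1, seq_len)."""
    logits = model(token_ids)                  # assumed output: (1, seq_len, vocab)
    nll = F.cross_entropy(                     # predict token t+1 from the prefix up to t
        logits[:, :-1].reshape(-1, logits.size(-1)),
        token_ids[:, 1:].reshape(-1),
        reduction="mean",
    )
    return math.exp(nll.item())
```

Lower test perplexity is better; the parameter count is reported to contextualize model size rather than as a quality score.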

Evaluation Results

Performance of each model on this benchmark

| Model Name | Number of params | Test perplexity | Paper Title | Repository |
| --- | --- | --- | --- | --- |
| Transformer-XL Large + Phrase Induction | 257M | 17.4 | Improving Neural Language Models by Segmenting, Attending, and Predicting the Future | |
| AWD-LSTM-MoS + ATOI | - | 32.85 | Alleviating Sequence Information Loss with Data Overlapping and Prime Batch Sizes | |
| LSTM (Hebbian) | - | 34.3 | Fast Parametric Learning with Activation Memorization | - |
| Reformer 125M | - | 26.0 | Reformer: The Efficient Transformer | |
| LSTM | - | - | How much complexity does an RNN architecture need to learn syntax-sensitive dependencies? | |
| GCNN-8 | - | 44.9 | Language Modeling with Gated Convolutional Networks | |
| Transformer-XL Large | 257M | 18.3 | Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | |
| GRU | - | - | How much complexity does an RNN architecture need to learn syntax-sensitive dependencies? | |
| Subformer | 96M | 20.39 | Subformer: A Parameter Reduced Transformer | - |
| Routing Transformer | - | 15.8 | Efficient Content-Based Sparse Attention with Routing Transformers | |
| SRU++ Base | 148M | 18.3 | When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute | |
| SRU++ Large | 234M | 17.1 | When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute | |
| DIFFQ (λ=1, g=16) | - | 18.0 | Differentiable Model Compression via Pseudo Quantization Noise | |
| Transformer+SSA+Self-ensemble | - | 17.18 | The Information Pathways Hypothesis: Transformers are Dynamic Self-Ensembles | |
| GPT-2 Large | 774M | 22.05 | Language Models are Unsupervised Multitask Learners | - |
| Primal.+Trans. | - | 31.0 | Primal-Attention: Self-attention through Asymmetric Kernel SVD in Primal Representation | |
| Staged Training | 247M | 17.56 | Shortformer: Better Language Modeling using Shorter Inputs | |
| Hybrid H3 (355M) | 355M | 16.9 | Hungry Hungry Hippos: Towards Language Modeling with State Space Models | |
| Transformer-XL (RMS dynamic eval) | 257M | 16.4 | Dynamic Evaluation of Transformer Language Models | |
| Decay RNN | - | - | How much complexity does an RNN architecture need to learn syntax-sensitive dependencies? | |
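
For a sense of how a single row is produced end to end, here is a rough, non-authoritative sketch that scores a pretrained checkpoint on the WikiText-103 test split with Hugging Face transformers and datasets. The checkpoint name, the non-overlapping 1024-token chunking, and the raw-text tokenization are assumptions; published figures such as 22.05 for GPT-2 Large come from each paper's own preprocessing and evaluation protocol, so this sketch will not reproduce them exactly.

```python
import math
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed checkpoint, chosen to roughly match the GPT-2 Large (774M) row.
model_name = "gpt2-large"
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to(device).eval()

test = load_dataset("wikitext", "wikitext-103-raw-v1", split="test")
ids = tokenizer("\n\n".join(test["text"]), return_tensors="pt").input_ids

total_nll, total_tokens = 0.0, 0
with torch.no_grad():
    # Score non-overlapping 1024-token chunks (simpler but slightly
    # pessimistic compared to sliding-window evaluation).
    for start in range(0, ids.size(1) - 1, 1024):
        chunk = ids[:, start : start + 1024].to(device)
        out = model(chunk, labels=chunk)   # loss = mean NLL over the chunk
        n = chunk.size(1) - 1              # first token of each chunk has no target
        total_nll += out.loss.item() * n
        total_tokens += n

print("Number of params:", sum(p.numel() for p in model.parameters()))
print("Test perplexity:", math.exp(total_nll / total_tokens))
```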