
Language Modelling on WikiText-103

Evaluation Metrics

Number of params: the total number of trainable parameters in the model
Test perplexity: perplexity on the WikiText-103 test set, i.e. the exponential of the average negative log-likelihood per token (lower is better)
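Both quantities can be computed directly from a trained model: the parameter count is the total size of all weight tensors, and test perplexity is the exponential of the mean negative log-likelihood over the tokens of the test set. A minimal, framework-agnostic sketch (function names and the toy numbers are illustrative, not taken from any listed submission):

```python
import math

def test_perplexity(token_log_probs):
    """Corpus-level perplexity: exp of the mean negative log-likelihood
    (natural log) over every predicted token in the test set."""
    nll = -sum(token_log_probs)
    return math.exp(nll / len(token_log_probs))

def num_params(weight_shapes):
    """Total parameter count given the shapes of all weight tensors."""
    total = 0
    for shape in weight_shapes:
        n = 1
        for dim in shape:
            n *= dim
        total += n
    return total

# Toy usage: three predicted tokens and two weight matrices.
print(round(test_perplexity([-2.3, -1.1, -3.0]), 2))  # -> 8.44
print(num_params([(1024, 1024), (1024, 50257)]))      # -> 52511744 (~52.5M)
```

Note that perplexities are only directly comparable between submissions that use the same tokenization of the test set; the standard WikiText-103 setup is word-level.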

Evaluation Results

Performance results of each model on this benchmark

Comparison Table
Model name | Number of params | Test perplexity
improving-neural-language-models-by | 257M | 17.4
alleviating-sequence-information-loss-with | - | 32.85
fast-parametric-learning-with-activation | - | 34.3
reformer-the-efficient-transformer-1 | - | 26.0
how-much-complexity-does-an-rnn-architecture | - | -
language-modeling-with-gated-convolutional | - | 44.9
transformer-xl-attentive-language-models | 257M | 18.3
how-much-complexity-does-an-rnn-architecture | - | -
subformer-a-parameter-reduced-transformer | 96M | 20.39
efficient-content-based-sparse-attention-with-1 | - | 15.8
when-attention-meets-fast-recurrence-training | 148M | 18.3
when-attention-meets-fast-recurrence-training | 234M | 17.1
differentiable-model-compression-via-pseudo | - | 18.0
the-information-pathways-hypothesis | - | 17.18
language-models-are-unsupervised-multitask | 774M | 22.05
primal-attention-self-attention-through | - | 31.0
shortformer-better-language-modeling-using | 247M | 17.56
hungry-hungry-hippos-towards-language | 355M | 16.9
dynamic-evaluation-of-transformer-language | 257M | 16.4
how-much-complexity-does-an-rnn-architecture | - | -
hungry-hungry-hippos-towards-language | - | 18.5
generalization-through-memorization-nearest | 247M | 15.79
fast-parametric-learning-with-activation | - | 29.7
190409408 | 395M | 20.4
finetuning-pretrained-transformers-into-rnns | - | 19.6
improving-neural-language-modeling-via | - | 28.0
an-analysis-of-neural-language-modeling-at | 151M | 33.0
on-the-adequacy-of-untuned-warmup-for | - | -
deep-equilibrium-models | 180M | 29.0
transformer-xl-attentive-language-models | 151M | 24.0
rethinking-attention-with-performers | - | 26.8
adaptive-input-representations-for-neural | 247M | 18.70
accessing-higher-level-representations-in | 139M | 18.2
augmenting-self-attention-with-persistent | 133M | 20.6
all-nlp-tasks-are-generation-tasks-a-general | 10000M | 12.22
improving-neural-language-models-with-a | - | 44.8
infty-former-infinite-memory-transformer | - | 16.64
gateloop-fully-data-controlled-linear | 125M | 13.4
improving-neural-language-models-with-a | - | 40.8
transformers-are-rnns-fast-autoregressive | - | 25.6
general-purpose-long-context-autoregressive | - | 18.4
hungry-hungry-hippos-towards-language | 1300M | 12.5
efficiently-modeling-long-sequences-with-1 | 249M | 21.28
infty-former-infinite-memory-transformer | - | 24.22
language-models-are-unsupervised-multitask | 124M | 37.50
language-models-are-unsupervised-multitask | 1542M | 17.48
dynamic-evaluation-of-transformer-language | 257M | 17.0
you-can-t-pick-your-neighbors-or-can-you-when | 247M | 15.5
random-feature-attention-1 | - | 30.5
infty-former-infinite-memory-transformer | - | 16.61
the-information-pathways-hypothesis | - | 17.60
language-models-are-unsupervised-multitask | 355M | 26.37
hyena-hierarchy-towards-larger-convolutional | - | 18.6
improving-transformer-models-by-reordering | 247M | 17.96
an-empirical-evaluation-of-generic | - | 45.19
language-modeling-with-gated-convolutional | - | 37.2
advancing-state-of-the-art-in-language | - | 13.29
time-aware-large-kernel-convolutions | 240M | 23.3
segabert-pre-training-of-segment-aware-bert | 257M | 17.1
convolutional-sequence-modeling-revisited | - | 45.2
megatron-lm-training-multi-billion-parameter | 8300M | 10.81
improving-neural-language-models-with-a | - | 48.7
pay-attention-when-required | - | 22.7
compressive-transformers-for-long-range-1 | - | 17.1
deep-equilibrium-models | 110M | 23.2
mega-moving-average-equipped-gated-attention | 252M | 18.07
fnetar-mixing-tokens-with-autoregressive | 144.4M | 25.81
pay-attention-when-required | - | 18.4
generalization-through-memorization-nearest | 247M | 16.12
random-feature-attention-1 | - | 23.5
fast-parametric-learning-with-activation | - | 36.4
accessing-higher-level-representations-in | 44M | 22.4
delight-very-deep-and-light-weight | 99M | 24.14
hungry-hungry-hippos-towards-language | 2700M | 10.6
memory-efficient-stochastic-methods-for | 122M | 22.91
relational-recurrent-neural-networks | - | 31.6
all-nlp-tasks-are-generation-tasks-a-general | 10000M | 11.33
improving-language-models-by-retrieving-from | 7532M | 2.4
deep-equilibrium-models | 138M | 32.4
hyena-hierarchy-towards-larger-convolutional | - | 18.5
fast-parametric-learning-with-activation | - | 29.2
infty-former-infinite-memory-transformer | - | 24.22
infty-former-infinite-memory-transformer | - | 24.22
shortformer-better-language-modeling-using | 247M | 18.15
trellis-networks-for-sequence-modeling | - | 29.19
revisiting-simple-neural-probabilistic | 148M | 25.2
infty-former-infinite-memory-transformer | - | 16.61
infty-former-infinite-memory-transformer | - | 16.61
hungry-hungry-hippos-towards-language | 125M | 23.7
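Since lower perplexity is better, the table can also be scanned programmatically to surface the strongest entries. A short sketch using a handful of rows copied from the comparison table above (missing parameter counts are written as None):

```python
# A few rows copied from the comparison table above; lower perplexity is better.
rows = [
    ("improving-language-models-by-retrieving-from", "7532M", 2.4),
    ("hungry-hungry-hippos-towards-language", "2700M", 10.6),
    ("megatron-lm-training-multi-billion-parameter", "8300M", 10.81),
    ("advancing-state-of-the-art-in-language", None, 13.29),
    ("generalization-through-memorization-nearest", "247M", 15.79),
]

# Sort by test perplexity and print a small leaderboard.
for name, params, ppl in sorted(rows, key=lambda row: row[2]):
    print(f"{ppl:>6.2f}  {params or '-':>6}  {name}")
```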