Language Modelling on WikiText-103
Evaluation metrics: Number of params, Test perplexity
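Of the two metrics, Number of params is the model's parameter count (the table reports it in millions, e.g. 257M), and Test perplexity is the exponential of the average per-token negative log-likelihood on the WikiText-103 test set; lower is better. The snippet below is a minimal illustrative sketch of that computation in Python, using made-up token probabilities rather than the leaderboard's actual evaluation pipeline:

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp(average negative log-likelihood per token).

    token_log_probs: natural-log probabilities that a language model
    assigned to each token of the held-out (test) text.
    """
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)

# Toy numbers, not WikiText-103 outputs: a model that assigns
# probabilities 0.5, 0.25 and 0.125 to three test tokens.
log_probs = [math.log(p) for p in (0.5, 0.25, 0.125)]
print(perplexity(log_probs))  # ≈ 4.0, the geometric mean of 1/p
```

In practice the log-probabilities come from the model's predictive distribution over the full test set, so the ranking below directly reflects how well each model predicts held-out text.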
Evaluation results
The performance results of each model on this benchmark are listed below.
| Model | Number of params | Test perplexity | Paper Title | Repository |
| --- | --- | --- | --- | --- |
| Transformer-XL Large + Phrase Induction | 257M | 17.4 | Improving Neural Language Models by Segmenting, Attending, and Predicting the Future | - |
| AWD-LSTM-MoS + ATOI | - | 32.85 | Alleviating Sequence Information Loss with Data Overlapping and Prime Batch Sizes | - |
| LSTM (Hebbian) | - | 34.3 | Fast Parametric Learning with Activation Memorization | - |
| Reformer 125M | - | 26.0 | Reformer: The Efficient Transformer | - |
| LSTM | - | - | How much complexity does an RNN architecture need to learn syntax-sensitive dependencies? | - |
| GCNN-8 | - | 44.9 | Language Modeling with Gated Convolutional Networks | - |
| Transformer-XL Large | 257M | 18.3 | Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | - |
| GRU | - | - | How much complexity does an RNN architecture need to learn syntax-sensitive dependencies? | - |
| Subformer | 96M | 20.39 | Subformer: A Parameter Reduced Transformer | - |
| Routing Transformer | - | 15.8 | Efficient Content-Based Sparse Attention with Routing Transformers | - |
| SRU++ Base | 148M | 18.3 | When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute | - |
| SRU++ Large | 234M | 17.1 | When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute | - |
| DIFFQ (λ=1, g=16) | - | 18.0 | Differentiable Model Compression via Pseudo Quantization Noise | - |
| Transformer+SSA+Self-ensemble | - | 17.18 | The Information Pathways Hypothesis: Transformers are Dynamic Self-Ensembles | - |
| GPT-2 Large | 774M | 22.05 | Language Models are Unsupervised Multitask Learners | - |
| Primal.+Trans. | - | 31.0 | Primal-Attention: Self-attention through Asymmetric Kernel SVD in Primal Representation | - |
| Staged Training | 247M | 17.56 | Shortformer: Better Language Modeling using Shorter Inputs | - |
| Hybrid H3 (355M) | 355M | 16.9 | Hungry Hungry Hippos: Towards Language Modeling with State Space Models | - |
| Transformer-XL (RMS dynamic eval) | 257M | 16.4 | Dynamic Evaluation of Transformer Language Models | - |
| Decay RNN | - | - | How much complexity does an RNN architecture need to learn syntax-sensitive dependencies? | - |