Language Modelling On Enwiki8
Evaluation Metrics: Bit per Character (BPC), Number of params
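Bit per Character is the model's average negative log2-probability of the next character on the enwik8 test split, so lower is better; a per-character cross-entropy measured in nats converts to BPC by dividing by ln 2. Below is a minimal sketch of that conversion, assuming PyTorch-style logits and targets; the function name, tensor shapes, and the 205-symbol vocabulary are illustrative assumptions, not part of the benchmark definition.

```python
import math

import torch
import torch.nn.functional as F


def bits_per_character(logits: torch.Tensor, targets: torch.Tensor) -> float:
    """Compute Bit per Character (BPC) from character-level model outputs.

    logits:  (batch, seq_len, vocab_size) unnormalized scores over characters
    targets: (batch, seq_len) integer character ids
    """
    # Mean cross-entropy in nats per character over all positions.
    nll_nats = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
        reduction="mean",
    )
    # Convert nats to bits: divide by ln(2).
    return (nll_nats / math.log(2)).item()


if __name__ == "__main__":
    # Toy usage with random scores; 205 is a commonly used character
    # vocabulary size for enwik8 setups, used here only for illustration.
    vocab, batch, seq = 205, 2, 16
    logits = torch.randn(batch, seq, vocab)
    targets = torch.randint(0, vocab, (batch, seq))
    print(f"BPC: {bits_per_character(logits, targets):.3f}")
```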
Evaluation Results
Performance of each model on this benchmark
| Model Name | Bit per Character (BPC) | Number of params | Paper Title |
| --- | --- | --- | --- |
| LSTM (7 layers) | 1.67 | - | Generating Sequences With Recurrent Neural Networks |
| Hypernetworks | 1.34 | 27M | HyperNetworks |
| SHA-LSTM (4 layers, h=1024, no attention head) | 1.33 | 51M | Single Headed Attention RNN: Stop Thinking With Your Head |
| LN HM-LSTM | 1.32 | 35M | Hierarchical Multiscale Recurrent Neural Networks |
| ByteNet | 1.31 | - | Neural Machine Translation in Linear Time |
| Recurrent Highway Networks | 1.27 | 46M | Recurrent Highway Networks |
| Large FS-LSTM-4 | 1.25 | 47M | Fast-Slow Recurrent Neural Networks |
| Large mLSTM | 1.24 | 46M | Multiplicative LSTM for sequence modelling |
| AWD-LSTM (3 layers) | 1.232 | 47M | An Analysis of Neural Language Modeling at Multiple Scales |
| Cluster-Former (#C=512) | 1.22 | - | Cluster-Former: Clustering-based Sparse Transformer for Long-Range Dependency Encoding |
| LSTM | 1.195 | 48M | Mogrifier LSTM |
| Mogrifier LSTM | 1.146 | 48M | Mogrifier LSTM |
| 64-layer Character Transformer Model | 1.11 | 44M | Character-Level Language Modeling with Deeper Self-Attention |
| SHA-RNN (4 layers, h=1024, single attention head) | 1.076 | 52M | Single Headed Attention RNN: Stop Thinking With Your Head |
| SHA-RNN (4 layers, h=1024, attention head per layer) | 1.068 | 54M | Single Headed Attention RNN: Stop Thinking With Your Head |
| Transformer-XL (12 layers) | 1.06 | 41M | Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context |
| Transformer (64 layers) | 1.06 | 235M | Character-Level Language Modeling with Deeper Self-Attention |
| Skip Cross-Head Transformer-XL | 1.033 | 41M | Memory-efficient Stochastic methods for Memory-based Transformers |
| Transformer-XL (18 layers) | 1.03 | 88M | Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context |
| Transformer+SSA | 1.024 | - | The Information Pathways Hypothesis: Transformers are Dynamic Self-Ensembles |
The table shows 20 of the 42 entries tracked for this benchmark.