
Language Modelling On Text8

Evaluation Metric

Bit per Character (BPC)
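BPC is the average negative log2-likelihood the model assigns to each character of the evaluation text; lower is better, and 1.0 BPC corresponds to assigning probability 0.5 to every character on average. Below is a minimal sketch of how the metric can be computed, assuming the model's per-character log-probabilities are already available; the function name and toy data are illustrative, not part of any benchmark tooling.

import numpy as np

def bits_per_character(char_log_probs):
    """Average negative log2-likelihood per character.

    char_log_probs: natural-log probabilities the model assigned to each
    ground-truth character in the evaluation text (one entry per character).
    """
    nats_per_char = -np.mean(char_log_probs)   # cross-entropy in nats
    return nats_per_char / np.log(2.0)         # convert nats to bits

# Toy usage: a model that assigns probability 0.5 to every character
# scores exactly 1.0 BPC.
print(bits_per_character(np.log(np.full(100, 0.5))))  # -> 1.0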

Evaluation Results

Performance of each model on this benchmark

| Model | Bit per Character (BPC) | Paper Title | Repository |
|---|---|---|---|
| Transformer-LS (small) | 1.09 | Long-Short Transformer: Efficient Transformers for Language and Vision | |
| Focus | 0.98 | Focus Your Attention (with Adaptive IIR Filters) | - |
| GPT-2 | 0.98 | Language Models are Unsupervised Multitask Learners | - |
| All-attention network - 36 layers | 1.08 | Augmenting Self-attention with Persistent Memory | |
| 12L Transformer + 8K adaptive span | 1.11 | Adaptive Attention Span in Transformers | |
| BP-Transformer - 12 Layers | 1.11 | BP-Transformer: Modelling Long-Range Context via Binary Partitioning | - |
| 12-layer Character Transformer Model | 1.18 | Character-Level Language Modeling with Deeper Self-Attention | |
| All-attention network - 18 layers | 1.11 | Augmenting Self-attention with Persistent Memory | |
| td-LSTM (Zhang et al., 2016) | 1.63 | Architectural Complexity Measures of Recurrent Neural Networks | - |
| mLSTM + dynamic eval | 1.19 | Dynamic Evaluation of Neural Sequence Models | |
| Large mLSTM +emb +WN +VD | 1.27 | Multiplicative LSTM for sequence modelling | - |
| LayerNorm HM-LSTM | 1.29 | Hierarchical Multiscale Recurrent Neural Networks | |
| td-LSTM-large | 1.49 | Architectural Complexity Measures of Recurrent Neural Networks | - |
| GAM-RHN-10 | 1.157 | Recurrent Highway Networks with Grouped Auxiliary Memory | |
| Unregularised mLSTM | 1.40 | Multiplicative LSTM for sequence modelling | - |
| Transformer-XL + RMS dynamic eval + decay | 1.038 | Dynamic Evaluation of Transformer Language Models | |
| Bipartite flows (8 flows) | 1.23 | Discrete Flows: Invertible Generative Models of Discrete Data | |
| BFN | 1.41 | Bayesian Flow Networks | |
| 24L Transformer + 8K adaptive span | 1.07 | Adaptive Attention Span in Transformers | |
| 64-layer Character Transformer Model | 1.13 | Character-Level Language Modeling with Deeper Self-Attention | |