Language Modelling On Enwiki8

Evaluation Metrics

Bits per Character (BPC) (see the conversion sketch after this list)
Number of params
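
BPC is the model's average negative log-likelihood over the test characters, expressed in bits (base 2); lower is better, and a BPC of b corresponds to a per-character perplexity of 2^b. Below is a minimal sketch of the conversion from a summed cross-entropy measured in nats; the function name and the example numbers are illustrative and not taken from the leaderboard.

```python
import math

def bits_per_character(total_nll_nats: float, num_characters: int) -> float:
    """Convert a summed negative log-likelihood (in nats) over a test
    sequence into bits per character (BPC): average over the characters
    scored and change the logarithm base from e to 2."""
    return total_nll_nats / (num_characters * math.log(2))

# Illustrative numbers: an average cross-entropy of 0.65 nats per character
# over a 1,000,000-character test split gives 0.65 / ln(2) ~= 0.94 BPC,
# in the range of the strongest rows in the table below.
print(bits_per_character(total_nll_nats=0.65 * 1_000_000, num_characters=1_000_000))
```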

Evaluation Results

Performance of each model on this benchmark

| Model | Bits per Character (BPC) | Number of params | Paper Title | Repository |
|---|---|---|---|---|
| 64-layer Character Transformer Model | 1.11 | 44M | Character-Level Language Modeling with Deeper Self-Attention | - |
| Transformer-XL (12 layers) | 1.06 | 41M | Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | - |
| Longformer (12 layers, h=512) | 1.00 | 41M | Longformer: The Long-Document Transformer | - |
| SHA-LSTM (4 layers, h=1024, no attention head) | 1.33 | 51M | Single Headed Attention RNN: Stop Thinking With Your Head | - |
| Transformer-LS (large) | 0.97 | 110M | Long-Short Transformer: Efficient Transformers for Language and Vision | - |
| Large mLSTM | 1.24 | 46M | Multiplicative LSTM for sequence modelling | - |
| Hypernetworks | 1.34 | 27M | HyperNetworks | - |
| Feedback Transformer | 0.96 | 77M | Addressing Some Limitations of Transformers with Feedback Memory | - |
| Transformer (12 layers, 8k adaptive span) | 1.02 | 39M | Adaptive Attention Span in Transformers | - |
| GPT-2 (48 layers, h=1600) | 0.93 | 1542M | Language Models are Unsupervised Multitask Learners | - |
| Transformer-XL (24 layers) | 0.99 | 277M | Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | - |
| Cluster-Former (#C=512) | 1.22 | - | Cluster-Former: Clustering-based Sparse Transformer for Long-Range Dependency Encoding | - |
| Transformer-XL (18 layers) | 1.03 | 88M | Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | - |
| Compressive Transformer (24 layers) | 0.97 | 277M | Compressive Transformers for Long-Range Sequence Modelling | - |
| LSTM (7 layers) | 1.67 | - | Generating Sequences With Recurrent Neural Networks | - |
| Transformer (24 layers, 8k adaptive span) | 0.98 | 209M | Adaptive Attention Span in Transformers | - |
| Focus | 0.940 | 22M | Focus Your Attention (with Adaptive IIR Filters) | - |
| Transformer-XL (24 layers, RMS dynamic eval, decay) | 0.940 | 277M | Dynamic Evaluation of Transformer Language Models | - |
| LN HM-LSTM | 1.32 | 35M | Hierarchical Multiscale Recurrent Neural Networks | - |
| Expire-Span (24 layers) | 0.95 | 208M | Not All Memories are Created Equal: Learning to Forget by Expiring | - |