
Language Modelling on enwik8

Evaluation Metrics

Bits per Character (BPC) (see the conversion sketch below)
Number of params
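
BPC is the average negative log-likelihood of the test characters measured in base 2; since most frameworks report cross-entropy in nats, the reported loss is divided by ln 2. A minimal sketch of that conversion, assuming the loss is already averaged per character (the function name and the example loss value are illustrative, not from the leaderboard):

```python
import math

def bits_per_character(nll_nats_per_char: float) -> float:
    """Convert an average negative log-likelihood (nats per character)
    into bits per character (BPC) by changing the logarithm base to 2."""
    return nll_nats_per_char / math.log(2)

# Example: a cross-entropy of 0.672 nats/char corresponds to ~0.97 BPC,
# roughly the level of the strongest models in the table below.
print(round(bits_per_character(0.672), 2))  # 0.97
```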

Evaluation Results

Performance of each model on this benchmark.

| Model Name | Bits per Character (BPC) | Number of params | Paper Title |
| --- | --- | --- | --- |
| 64-layer Character Transformer Model | 1.11 | 44M | Character-Level Language Modeling with Deeper Self-Attention |
| Transformer-XL (12 layers) | 1.06 | 41M | Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context |
| Longformer (12 layers, h=512) | 1.00 | 41M | Longformer: The Long-Document Transformer |
| SHA-LSTM (4 layers, h=1024, no attention head) | 1.33 | 51M | Single Headed Attention RNN: Stop Thinking With Your Head |
| Transformer-LS (large) | 0.97 | 110M | Long-Short Transformer: Efficient Transformers for Language and Vision |
| Large mLSTM | 1.24 | 46M | Multiplicative LSTM for sequence modelling |
| Hypernetworks | 1.34 | 27M | HyperNetworks |
| Feedback Transformer | 0.96 | 77M | Addressing Some Limitations of Transformers with Feedback Memory |
| Transformer (12 layers, 8k adaptive span) | 1.02 | 39M | Adaptive Attention Span in Transformers |
| GPT-2 (48 layers, h=1600) | 0.93 | 1542M | Language Models are Unsupervised Multitask Learners |
| Transformer-XL (24 layers) | 0.99 | 277M | Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context |
| Cluster-Former (#C=512) | 1.22 | - | Cluster-Former: Clustering-based Sparse Transformer for Long-Range Dependency Encoding |
| Transformer-XL (18 layers) | 1.03 | 88M | Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context |
| Compressive Transformer (24 layers) | 0.97 | 277M | Compressive Transformers for Long-Range Sequence Modelling |
| LSTM (7 layers) | 1.67 | - | Generating Sequences With Recurrent Neural Networks |
| Transformer (24 layers, 8k adaptive span) | 0.98 | 209M | Adaptive Attention Span in Transformers |
| Focus | 0.940 | 22M | Focus Your Attention (with Adaptive IIR Filters) |
| Transformer-XL (24 layers, RMS dynamic eval, decay) | 0.940 | 277M | Dynamic Evaluation of Transformer Language Models |
| LN HM-LSTM | 1.32 | 35M | Hierarchical Multiscale Recurrent Neural Networks |
| Expire-Span (24 layers) | 0.95 | 208M | Not All Memories are Created Equal: Learning to Forget by Expiring |