Language Modelling on enwik8

Metrics

Bits per Character (BPC)
Number of Parameters
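
BPC is the model's average negative log-likelihood per character, expressed in base 2; lower is better. As a minimal sketch, assuming a training framework that reports per-character cross-entropy in nats (natural log), the conversion to BPC is a division by ln 2:

```python
import math

def bits_per_character(avg_loss_nats: float) -> float:
    """Convert an average per-character cross-entropy loss from nats to bits.

    BPC = loss_nats / ln(2); equivalently, BPC is the base-2 cross-entropy.
    """
    return avg_loss_nats / math.log(2)

# Example: an average loss of 0.67 nats/char corresponds to ~0.97 BPC.
print(round(bits_per_character(0.67), 3))  # 0.967
```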

Results

Performance results of various models on this benchmark

| Model Name | BPC | Params | Paper Title | Repository |
|---|---|---|---|---|
| 64-layer Character Transformer Model | 1.11 | 44M | Character-Level Language Modeling with Deeper Self-Attention | - |
| Transformer-XL (12 layers) | 1.06 | 41M | Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | - |
| Longformer (12 layers, h=512) | 1.00 | 41M | Longformer: The Long-Document Transformer | - |
| SHA-LSTM (4 layers, h=1024, no attention head) | 1.33 | 51M | Single Headed Attention RNN: Stop Thinking With Your Head | - |
| Transformer-LS (large) | 0.97 | 110M | Long-Short Transformer: Efficient Transformers for Language and Vision | - |
| Large mLSTM | 1.24 | 46M | Multiplicative LSTM for sequence modelling | - |
| Hypernetworks | 1.34 | 27M | HyperNetworks | - |
| Feedback Transformer | 0.96 | 77M | Addressing Some Limitations of Transformers with Feedback Memory | - |
| Transformer (12 layers, 8k adaptive span) | 1.02 | 39M | Adaptive Attention Span in Transformers | - |
| GPT-2 (48 layers, h=1600) | 0.93 | 1542M | Language Models are Unsupervised Multitask Learners | - |
| Transformer-XL (24 layers) | 0.99 | 277M | Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | - |
| Cluster-Former (#C=512) | 1.22 | - | Cluster-Former: Clustering-based Sparse Transformer for Long-Range Dependency Encoding | - |
| Transformer-XL (18 layers) | 1.03 | 88M | Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | - |
| Compressive Transformer (24 layers) | 0.97 | 277M | Compressive Transformers for Long-Range Sequence Modelling | - |
| LSTM (7 layers) | 1.67 | - | Generating Sequences With Recurrent Neural Networks | - |
| Transformer (24 layers, 8k adaptive span) | 0.98 | 209M | Adaptive Attention Span in Transformers | - |
| Focus | 0.940 | 22M | Focus Your Attention (with Adaptive IIR Filters) | - |
| Transformer-XL (24 layers, RMS dynamic eval, decay) | 0.940 | 277M | Dynamic Evaluation of Transformer Language Models | - |
| LN HM-LSTM | 1.32 | 35M | Hierarchical Multiscale Recurrent Neural Networks | - |
| Expire-Span (24 layers) | 0.95 | 208M | Not All Memories are Created Equal: Learning to Forget by Expiring | - |
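
Because enwik8 is the first 10^8 bytes of an English Wikipedia dump, a BPC score translates directly into an implied compressed size: size in bytes = BPC × number of characters / 8. A quick sketch with a hypothetical helper name, ignoring model size and the train/validation/test split used in actual evaluations:

```python
def implied_compressed_mb(bpc: float, num_chars: int = 10**8) -> float:
    """Compressed size in MB implied by a given BPC over num_chars characters.

    Hypothetical illustration only: reported BPC is measured on a held-out
    split, and a fair compression comparison would also count model size.
    """
    return bpc * num_chars / 8 / 1e6

# GPT-2's 0.93 BPC would imply ~11.6 MB for all 100M characters.
print(round(implied_compressed_mb(0.93), 1))  # 11.6
```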