
Language Modelling on enwik8

Metrics

Bits per Character (BPC) (see the sketch below)
Number of params
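
BPC is the model's average negative log2-probability of each held-out character, i.e. the usual cross-entropy loss in nats divided by ln 2. A minimal sketch of that conversion; the helper name `bits_per_character` is hypothetical and not taken from any of the listed papers:

```python
import math

def bits_per_character(char_log_probs_nats):
    """Average negative log2-probability per character (BPC).

    `char_log_probs_nats` holds the model's natural-log probability of each
    ground-truth character in the evaluation text.
    """
    avg_nats = -sum(char_log_probs_nats) / len(char_log_probs_nats)
    return avg_nats / math.log(2)  # nats/char -> bits/char

# A model that assigns probability 0.5 to every character scores exactly 1 BPC.
print(bits_per_character([math.log(0.5)] * 4))  # -> 1.0
```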

Results

Performance of the various models on this benchmark

| Model Name | Bits per Character (BPC) | Number of params | Paper Title | Repository |
| --- | --- | --- | --- | --- |
| 64-layer Character Transformer Model | 1.11 | 44M | Character-Level Language Modeling with Deeper Self-Attention | |
| Transformer-XL (12 layers) | 1.06 | 41M | Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | |
| Longformer (12 layers, h=512) | 1.00 | 41M | Longformer: The Long-Document Transformer | |
| SHA-LSTM (4 layers, h=1024, no attention head) | 1.33 | 51M | Single Headed Attention RNN: Stop Thinking With Your Head | |
| Transformer-LS (large) | 0.97 | 110M | Long-Short Transformer: Efficient Transformers for Language and Vision | |
| Large mLSTM | 1.24 | 46M | Multiplicative LSTM for sequence modelling | - |
| Hypernetworks | 1.34 | 27M | HyperNetworks | |
| Feedback Transformer | 0.96 | 77M | Addressing Some Limitations of Transformers with Feedback Memory | |
| Transformer (12 layers, 8k adaptive span) | 1.02 | 39M | Adaptive Attention Span in Transformers | |
| GPT-2 (48 layers, h=1600) | 0.93 | 1542M | Language Models are Unsupervised Multitask Learners | - |
| Transformer-XL (24 layers) | 0.99 | 277M | Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | |
| Cluster-Former (#C=512) | 1.22 | - | Cluster-Former: Clustering-based Sparse Transformer for Long-Range Dependency Encoding | - |
| Transformer-XL (18 layers) | 1.03 | 88M | Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | |
| Compressive Transformer (24 layers) | 0.97 | 277M | Compressive Transformers for Long-Range Sequence Modelling | |
| LSTM (7 layers) | 1.67 | - | Generating Sequences With Recurrent Neural Networks | |
| Transformer (24 layers, 8k adaptive span) | 0.98 | 209M | Adaptive Attention Span in Transformers | |
| Focus | 0.940 | 22M | Focus Your Attention (with Adaptive IIR Filters) | - |
| Transformer-XL (24 layers, RMS dynamic eval, decay) | 0.940 | 277M | Dynamic Evaluation of Transformer Language Models | |
| LN HM-LSTM | 1.32 | 35M | Hierarchical Multiscale Recurrent Neural Networks | |
| Expire-Span (24 layers) | 0.95 | 208M | Not All Memories are Created Equal: Learning to Forget by Expiring | - |
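
For context, results on this benchmark are conventionally reported on the byte-level enwik8 protocol: the first 90M bytes for training, the next 5M for validation, and the last 5M for testing. The sketch below prepares that split; the mattmahoney.net download URL, archive member name, and output paths are assumptions rather than anything specified on this page.

```python
import os
import urllib.request
import zipfile

# Assumed mirror commonly referenced for enwik8; not taken from this page.
URL = "http://mattmahoney.net/dc/enwik8.zip"

def prepare_enwik8(out_dir: str = "data/enwik8") -> None:
    """Download enwik8 and write the conventional 90M/5M/5M byte split."""
    os.makedirs(out_dir, exist_ok=True)
    archive = os.path.join(out_dir, "enwik8.zip")
    if not os.path.exists(archive):
        urllib.request.urlretrieve(URL, archive)
    with zipfile.ZipFile(archive) as zf:
        raw = zf.read("enwik8")  # ~100M bytes of Wikipedia XML
    splits = {
        "train": raw[:90_000_000],
        "valid": raw[90_000_000:95_000_000],
        "test": raw[95_000_000:],
    }
    for name, data in splits.items():
        with open(os.path.join(out_dir, name + ".txt"), "wb") as f:
            f.write(data)

if __name__ == "__main__":
    prepare_enwik8()
```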