Model | Bit per Character (bpc) | Number of params | Paper / Source | Code |
--- | --- | --- | --- | --- |
Large RHN | 1.27 | 46M | Recurrent Highway Networks | |
Large FS-LSTM-4 | 1.245 | 47M | Fast-Slow Recurrent Neural Networks | |
Transformer-XL + RMS dynamic eval | 0.94 | 277M | Dynamic Evaluation of Transformer Language Models | |
Large mLSTM +emb +WN +VD | 1.24 | 46M | Multiplicative LSTM for sequence modelling | - |
64-layer Character Transformer Model | 1.06 | 235M | Character-Level Language Modeling with Deeper Self-Attention | |
12-layer Character Transformer Model | 1.11 | 44M | Character-Level Language Modeling with Deeper Self-Attention | |
mLSTM + dynamic eval | 1.08 | 46M | Dynamic Evaluation of Neural Sequence Models | |
Longformer Large | 0.99 | 102M | Longformer: The Long-Document Transformer | |
RHN - depth 5 | 1.31 | - | Recurrent Highway Networks | |
Mogrifier LSTM + dynamic eval | 0.988 | 96M | Mogrifier LSTM | |
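
The bpc column is the model's average negative log-likelihood per character expressed in base 2, so results reported as cross-entropy in nats divide by ln 2. A minimal sketch of that conversion (the function name and example numbers are illustrative, not taken from any of the papers above):

```python
import math

def bits_per_character(total_nll_nats: float, num_characters: int) -> float:
    """Convert a summed negative log-likelihood (in nats) over a test set
    into bits per character (bpc), the metric reported in the table."""
    return total_nll_nats / (num_characters * math.log(2))

# An average per-character cross-entropy of 0.70 nats corresponds to ~1.01 bpc.
print(bits_per_character(0.70 * 1_000_000, 1_000_000))
```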