| Model | Bits per character (BPC) | Number of params | Paper / Source | Code |
| ----- | :---: | :---: | ----- | :---: |
| Large RHN | 1.27 | 46M | Recurrent Highway Networks | - |
| Large FS-LSTM-4 | 1.245 | 47M | Fast-Slow Recurrent Neural Networks | - |
| Transformer-XL + RMS dynamic eval | 0.94 | 277M | Dynamic Evaluation of Transformer Language Models | - |
| Large mLSTM +emb +WN +VD | 1.24 | 46M | Multiplicative LSTM for sequence modelling | - |
| 64-layer Character Transformer Model | 1.06 | 235M | Character-Level Language Modeling with Deeper Self-Attention | - |
| Longformer Small | 1.00 | 41M | Longformer: The Long-Document Transformer | - |
| 12-layer Character Transformer Model | 1.11 | 44M | Character-Level Language Modeling with Deeper Self-Attention | - |
| mLSTM + dynamic eval | 1.08 | 46M | Dynamic Evaluation of Neural Sequence Models | - |
| Longformer Large | 0.99 | 102M | Longformer: The Long-Document Transformer | - |
| RHN - depth 5 (Zilly et al., 2016) | 1.31 | - | Recurrent Highway Networks | - |
| Mogrifier LSTM + dynamic eval | 0.988 | 96M | Mogrifier LSTM | - |
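
The BPC column is the model's average per-character cross-entropy expressed in base 2 (lower is better). As a minimal sketch of the conversion, assuming your framework reports the per-character cross-entropy loss in nats (the hypothetical `loss_nats` value below is for illustration only):

```python
import math

def bits_per_character(nats_per_char: float) -> float:
    """Convert a per-character cross-entropy from nats to bits (BPC)."""
    return nats_per_char / math.log(2)

# Illustrative value, not taken from any row of the table above.
loss_nats = 0.75
print(f"{bits_per_character(loss_nats):.3f} BPC")  # ~1.082 BPC
```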