| Model | Score | Paper | Code |
| --- | --- | --- | --- |
| PaLM 540B (fine-tuned) | 78.8 | PaLM: Scaling Language Modeling with Pathways | |
| ST-MoE-32B 269B (fine-tuned) | 77.7 | ST-MoE: Designing Stable and Transferable Sparse Expert Models | - |
| UL2 20B (fine-tuned) | 77.3 | UL2: Unifying Language Learning Paradigms | |
| Hybrid H3 125M (0-shot, rank classification) | 51.4 | Hungry Hungry Hippos: Towards Language Modeling with State Space Models | |
| Hybrid H3 125M (0-shot, logit scoring) | 51.4 | Hungry Hungry Hippos: Towards Language Modeling with State Space Models | |
| GPT-3 175B (few-shot, k=32) | 49.4 | Language Models are Few-Shot Learners | |
| Hybrid H3 125M (3-shot, logit scoring) | 49.1 | Hungry Hungry Hippos: Towards Language Modeling with State Space Models | |
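The unfine-tuned entries in the table are evaluated by scoring answer candidates rather than by free-form generation: "logit scoring" sums the model's log-probabilities over each candidate's tokens, and "rank classification" predicts the highest-scoring candidate. Below is a minimal sketch of this zero-shot evaluation pattern for a causal LM; the model (`gpt2`), prompt, and answer options are illustrative assumptions, not the exact setups used in the papers above.

```python
# Minimal sketch: zero-shot candidate scoring with a causal LM.
# Assumes Hugging Face transformers and PyTorch are installed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # stand-in model, not H3/GPT-3
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def option_logprob(prompt: str, option: str) -> float:
    """Sum of log-probs the model assigns to `option`'s tokens given `prompt`.

    Assumes the prompt's tokenization is a prefix of the tokenization of
    prompt + option (true here because each option starts with a space).
    """
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + option, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits  # [1, seq_len, vocab]
    log_probs = torch.log_softmax(logits, dim=-1)
    total = 0.0
    # Each option token at position `pos` is predicted by the logits at pos-1.
    for pos in range(prompt_ids.shape[1], full_ids.shape[1]):
        token_id = full_ids[0, pos]
        total += log_probs[0, pos - 1, token_id].item()
    return total

# Hypothetical binary-choice example (not from the benchmark itself).
prompt = "Question: Is the sky blue on a clear day?\nAnswer:"
options = [" yes", " no"]
scores = {opt: option_logprob(prompt, opt) for opt in options}
# Rank classification: predict the option with the highest total log-prob.
prediction = max(scores, key=scores.get)
print(scores, "->", prediction)
```

A few-shot setting such as GPT-3's k=32 follows the same pattern, except that k labeled examples are prepended to the prompt before scoring the candidates.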