| Model | Accuracy (%) | Paper | Code |
| --- | --- | --- | --- |
| FLAN 137B (few-shot, k=16) | 72.8 | Finetuned Language Models Are Zero-Shot Learners | - |
| UnifiedQA 406M (fine-tuned) | 73.3 | UnifiedQA: Crossing Format Boundaries With a Single QA System | - |
| Switch Transformer 9B (zero-shot) | 53.4 | Efficient Language Modeling with Sparse all-MLP | - |
| GPT-3 Large 760M (zero-shot) | 57.4 | Language Models are Few-Shot Learners | - |
| phi-1.5-web 1.3B (zero-shot) | 74.0 | Textbooks Are All You Need II: phi-1.5 technical report | - |
| Base Layers 10B (zero-shot) | 51.0 | Efficient Language Modeling with Sparse all-MLP | - |