| Model | EM | F1 | Paper | Code |
|---|---|---|---|---|
| PaLM 540B (fine-tuned) | 94.0 | 94.6 | PaLM: Scaling Language Modeling with Pathways | - |
| GPT-3 175B (one-shot) | - | 90.2 | Language Models are Few-Shot Learners | - |
| ST-MoE-L 4.1B (fine-tuned) | 88.9 | - | ST-MoE: Designing Stable and Transferable Sparse Expert Models | - |
| FLAN 137B (prompt-tuned) | 85.1 | - | Finetuned Language Models Are Zero-Shot Learners | - |
| XLNet + MTL + Verifier (ensemble) | 83.090 | 83.737 | - | - |
| GPT-3 Large 760M (zero-shot) | 82.1 | - | Language Models are Few-Shot Learners | - |
| XLNet + MTL + Verifier (single model) | 81.460 | 82.664 | - | - |
| Switch Transformer 9B | 79.9 | - | Efficient Language Modeling with Sparse all-MLP | - |
| DCReader+BERT (single model) | 69.490 | 71.138 | - | - |
| Base Layers 10B (zero-shot) | 60.7 | - | Efficient Language Modeling with Sparse all-MLP | - |
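The EM and F1 columns are the exact-match and token-overlap scores commonly used for extractive QA benchmarks. For reference, below is a minimal sketch of how these two metrics are typically computed; the helper names and the max-over-gold-answers aggregation are illustrative conventions, not taken from any of the papers above.

```python
import re
import string
from collections import Counter

def normalize(text: str) -> str:
    """Lowercase, strip punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, gold: str) -> float:
    """EM: 1.0 iff the normalized strings are identical."""
    return float(normalize(prediction) == normalize(gold))

def f1_score(prediction: str, gold: str) -> float:
    """Token-level F1 over the bag of normalized tokens."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

# Per example, the score is usually the max over all gold answers;
# the dataset score is the mean over examples (illustrative data below).
golds = ["Barack Obama", "Obama"]
pred = "obama"
em = max(exact_match(pred, g) for g in golds)  # 1.0
f1 = max(f1_score(pred, g) for g in golds)     # 1.0
```

Because EM requires an exact normalized-string match while F1 gives partial credit for token overlap, F1 is always greater than or equal to EM, which is consistent with the paired scores in the table.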