UnifiedQA 11B (fine-tuned) | 79.1 | UnifiedQA: Crossing Format Boundaries With a Single QA System | - |
UL2 20B (chain-of-thought) | 51.4 | UL2: Unifying Language Learning Paradigms | - |
STaR without Rationalization (on GPT-J) | 68.8 | STaR: Bootstrapping Reasoning With Reasoning | - |
PaLM 2 (few-shot, CoT, SC) | 90.4 | PaLM 2 Technical Report | - |
UL2 20B (zero-shot) | 34.2 | UL2: Unifying Language Learning Paradigms | - |