GPT-4 (PoT) | 52.4 | TheoremQA: A Theorem-driven Question Answering dataset | |
GPT-4 (CoT) | 43.8 | TheoremQA: A Theorem-driven Question Answering dataset | |
DART-Math-Llama3-8B-Uniform (0-shot CoT, w/o code) | 15.4 | DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving | |
GPT-3.5-turbo (PoT) | 35.6 | TheoremQA: A Theorem-driven Question Answering dataset | |
DART-Math-Llama3-70B-Uniform (0-shot CoT, w/o code) | 27.4 | DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving | |
Claude-v1 (CoT) | 24.9 | TheoremQA: A Theorem-driven Question Answering dataset | |
Claude-v1 (PoT) | 25.9 | TheoremQA: A Theorem-driven Question Answering dataset | |
DART-Math-Mistral-7B-Uniform (0-shot CoT, w/o code) | 16.4 | DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving | |
DART-Math-Llama3-70B-Prop2Diff (0-shot CoT, w/o code) | 28.2 | DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving | |
DART-Math-DSMath-7B-Prop2Diff (0-shot CoT, w/o code) | 32.2 | DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving | |
GPT-3.5-turbo (CoT) | 30.2 | TheoremQA: A Theorem-driven Question Answering dataset | |
Claude-instant (CoT) | 23.6 | TheoremQA: A Theorem-driven Question Answering dataset | |
text-davinci-003 | 22.8 | TheoremQA: A Theorem-driven Question Answering dataset | |
PaLM-2-bison (CoT) | 21.0 | TheoremQA: A Theorem-driven Question Answering dataset | |
DART-Math-Llama3-8B-Prop2Diff (0-shot CoT, w/o code) | 19.4 | DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving | |
DART-Math-Mistral-7B-Prop2Diff (0-shot CoT, w/o code) | 17.0 | DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving | |
code-davinci-002 | 23.9 | TheoremQA: A Theorem-driven Question Answering dataset | |
DART-Math-DSMath-7B-Uniform (0-shot CoT, w/o code) | 32.5 | DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving | |
PaLM-2-unicorn (CoT) | 31.8 | TheoremQA: A Theorem-driven Question Answering dataset | |