HyperAI

Natural Questions on TheoremQA

Metrics

Accuracy
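
As a rough illustration of the metric, accuracy can be read as the percentage of questions answered correctly. The sketch below assumes exact string matching between predictions and reference answers. The page does not state the benchmark's actual answer-matching rule, so the function name `accuracy_percent` and the sample answers are hypothetical.

```python
def accuracy_percent(predictions, references):
    """Share of predictions that exactly match their reference, in percent.

    Exact matching is an assumption made for this sketch; a real evaluation
    may normalize answers or compare numbers with a tolerance.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == r for p, r in zip(predictions, references))
    return 100.0 * correct / len(references)


# Hypothetical answers, for illustration only (not taken from the benchmark).
preds = ["4", "True", "2.5", "7"]
golds = ["4", "False", "2.5", "8"]
print(accuracy_percent(preds, golds))  # 50.0
```

On this scale, a reported score of 52.4 would mean roughly 52 out of every 100 questions were answered correctly.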

Results

Performance results of various models on this benchmark

| Model name | Accuracy (%) | Paper title |
| --- | --- | --- |
| GPT-4 (PoT) | 52.4 | TheoremQA: A Theorem-driven Question Answering dataset |
| GPT-4 (CoT) | 43.8 | TheoremQA: A Theorem-driven Question Answering dataset |
| DART-Math-Llama3-8B-Uniform (0-shot CoT, w/o code) | 15.4 | DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving |
| GPT-3.5-turbo (PoT) | 35.6 | TheoremQA: A Theorem-driven Question Answering dataset |
| DART-Math-Llama3-70B-Uniform (0-shot CoT, w/o code) | 27.4 | DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving |
| Claude-v1 (CoT) | 24.9 | TheoremQA: A Theorem-driven Question Answering dataset |
| Claude-v1 (PoT) | 25.9 | TheoremQA: A Theorem-driven Question Answering dataset |
| DART-Math-Mistral-7B-Uniform (0-shot CoT, w/o code) | 16.4 | DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving |
| DART-Math-Llama3-70B-Prop2Diff (0-shot CoT, w/o code) | 28.2 | DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving |
| DART-Math-DSMath-7B-Prop2Diff (0-shot CoT, w/o code) | 32.2 | DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving |
| GPT-3.5-turbo (CoT) | 30.2 | TheoremQA: A Theorem-driven Question Answering dataset |
| Claude-instant (CoT) | 23.6 | TheoremQA: A Theorem-driven Question Answering dataset |
| text-davinci-003 | 22.8 | TheoremQA: A Theorem-driven Question Answering dataset |
| PaLM-2-bison (CoT) | 21.0 | TheoremQA: A Theorem-driven Question Answering dataset |
| DART-Math-Llama3-8B-Prop2Diff (0-shot CoT, w/o code) | 19.4 | DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving |
| DART-Math-Mistral-7B-Prop2Diff (0-shot CoT, w/o code) | 17.0 | DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving |
| code-davinci-002 | 23.9 | TheoremQA: A Theorem-driven Question Answering dataset |
| DART-Math-DSMath-7B-Uniform (0-shot CoT, w/o code) | 32.5 | DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving |
| PaLM-2-unicorn (CoT) | 31.8 | TheoremQA: A Theorem-driven Question Answering dataset |