Natural Questions on TheoremQA

Evaluation Metrics

Accuracy
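As a rough illustration of how an accuracy score like the ones reported on this page could be computed, here is a minimal sketch. It assumes a prediction counts as correct when it exactly matches the reference answer after trimming whitespace; the benchmark's actual answer-matching rules are not stated on this page, and the function name `accuracy_percent` and the sample data are hypothetical.

```python
def accuracy_percent(predictions, references):
    """Return the share of predictions that match their reference, as a percentage."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot compute accuracy on an empty set")
    # Assumption: exact string match after trimming whitespace counts as correct.
    correct = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return 100.0 * correct / len(references)


# Hypothetical sample data: 2 of the 4 predictions match their references.
preds = ["4", "True", "3.14", "7"]
refs = ["4", "False", "3.14", "8"]
print(accuracy_percent(preds, refs))  # 50.0
```

Under this assumption, a reported score of 52.4 would mean that 52.4% of the benchmark questions were answered correctly.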

Evaluation Results

Performance results of each model on this benchmark

| Model Name | Accuracy | Paper Title | Repository |
|---|---|---|---|
| GPT-4 (PoT) | 52.4 | TheoremQA: A Theorem-driven Question Answering dataset | - |
| GPT-4 (CoT) | 43.8 | TheoremQA: A Theorem-driven Question Answering dataset | - |
| DART-Math-Llama3-8B-Uniform (0-shot CoT, w/o code) | 15.4 | DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving | - |
| GPT-3.5-turbo (PoT) | 35.6 | TheoremQA: A Theorem-driven Question Answering dataset | - |
| DART-Math-Llama3-70B-Uniform (0-shot CoT, w/o code) | 27.4 | DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving | - |
| Claude-v1 (CoT) | 24.9 | TheoremQA: A Theorem-driven Question Answering dataset | - |
| Claude-v1 (PoT) | 25.9 | TheoremQA: A Theorem-driven Question Answering dataset | - |
| DART-Math-Mistral-7B-Uniform (0-shot CoT, w/o code) | 16.4 | DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving | - |
| DART-Math-Llama3-70B-Prop2Diff (0-shot CoT, w/o code) | 28.2 | DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving | - |
| DART-Math-DSMath-7B-Prop2Diff (0-shot CoT, w/o code) | 32.2 | DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving | - |
| GPT-3.5-turbo (CoT) | 30.2 | TheoremQA: A Theorem-driven Question Answering dataset | - |
| Claude-instant (CoT) | 23.6 | TheoremQA: A Theorem-driven Question Answering dataset | - |
| text-davinci-003 | 22.8 | TheoremQA: A Theorem-driven Question Answering dataset | - |
| PaLM-2-bison (CoT) | 21.0 | TheoremQA: A Theorem-driven Question Answering dataset | - |
| DART-Math-Llama3-8B-Prop2Diff (0-shot CoT, w/o code) | 19.4 | DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving | - |
| DART-Math-Mistral-7B-Prop2Diff (0-shot CoT, w/o code) | 17.0 | DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving | - |
| code-davinci-002 | 23.9 | TheoremQA: A Theorem-driven Question Answering dataset | - |
| DART-Math-DSMath-7B-Uniform (0-shot CoT, w/o code) | 32.5 | DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving | - |
| PaLM-2-unicorn (CoT) | 31.8 | TheoremQA: A Theorem-driven Question Answering dataset | - |