HyperAI

Science Question Answering on ScienceQA

Metrics

Avg. Accuracy
Grades 1-6
Grades 7-12
Image Context
Language Science
Natural Science
No Context
Social Science
Text Context

Results

Performance of various models on this benchmark. All values are accuracies in percent; "-" means no value is listed for that cell.

| Model Name | Avg. Accuracy | Grades 1-6 | Grades 7-12 | Image Context | Language Science | Natural Science | No Context | Social Science | Text Context | Paper Title |
|---|---|---|---|---|---|---|---|---|---|---|
| GPT-3 (QCM→A, 2-shot) | 73.97 | 76.80 | 68.89 | 67.28 | 76.00 | 74.64 | 77.42 | 69.74 | 74.44 | Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering |
| GPT-3 - CoT (QCM→AE, 2-shot) | 74.61 | 78.49 | 67.63 | 66.09 | 77.55 | 76.60 | 79.58 | 65.92 | 75.51 | Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering |
| Chat-UniVi-13B | 90.99 | 91.19 | 90.64 | 88.05 | 88.91 | 90.41 | 90.94 | 95.05 | 89.64 | Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding |
| MC-CoT F-Large | 94.88 | 95.3 | 94.13 | 93.75 | 93.18 | 97.47 | 94.49 | 90.44 | 96.97 | Boosting the Power of Small Multimodal Reasoning Models to Match Larger Models with Self-Consistency Training |
| Honeybee | 94.39 | 95.04 | 93.21 | 93.75 | 91.18 | 95.20 | 93.17 | 96.29 | 94.48 | Honeybee: Locality-enhanced Projector for Multimodal LLM |
| Video-LaVIT | 70.0 | - | - | - | - | - | - | - | - | Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization |
| GPT-3 - CoT (QCM→ALE, 2-shot) | 75.17 | 78.23 | 69.68 | 67.43 | 78.09 | 75.44 | 79.93 | 70.87 | 74.68 | Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering |
| UnifiedQA-BASE - CoT (QCM→ALE) | 74.11 | 77.06 | 68.82 | 66.53 | 78.91 | 71.00 | 81.81 | 76.04 | 66.42 | Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering |
| LLaVA (+ GPT-4) | 92.53 | - | - | - | - | - | - | - | - | - |
| Multimodal CoT | 91.68 | 92.44 | 90.31 | 88.80 | 90.82 | 95.91 | 92.89 | 82.00 | 95.26 | Multimodal Chain-of-Thought Reasoning in Language Models |
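The rows above are not listed in rank order. As a minimal illustrative sketch (not part of the original leaderboard tooling), the Python snippet below copies the Avg. Accuracy values from the table and sorts the models from highest to lowest. The names `avg_accuracy` and `rank_by_average` are hypothetical, chosen only for this example; `None` is accepted as a placeholder for a missing average, although every row here reports one.

```python
# Avg. Accuracy values (%) copied from the leaderboard table.
# None would mark a missing average; every listed row currently has one.
avg_accuracy = {
    "GPT-3 (QCM→A, 2-shot)": 73.97,
    "GPT-3 - CoT (QCM→AE, 2-shot)": 74.61,
    "Chat-UniVi-13B": 90.99,
    "MC-CoT F-Large": 94.88,
    "Honeybee": 94.39,
    "Video-LaVIT": 70.0,
    "GPT-3 - CoT (QCM→ALE, 2-shot)": 75.17,
    "UnifiedQA-BASE - CoT (QCM→ALE)": 74.11,
    "LLaVA (+ GPT-4)": 92.53,
    "Multimodal CoT": 91.68,
}


def rank_by_average(scores):
    """Return (model, score) pairs sorted from highest to lowest score,
    skipping models whose score is None."""
    reported = [(model, score) for model, score in scores.items() if score is not None]
    return sorted(reported, key=lambda pair: pair[1], reverse=True)


# Print a 1-based ranking, e.g. " 1. MC-CoT F-Large: 94.88".
for rank, (model, score) in enumerate(rank_by_average(avg_accuracy), start=1):
    print(f"{rank:2d}. {model}: {score:.2f}")
```

Sorting on the average alone hides per-category differences: for example, Honeybee scores higher than MC-CoT F-Large on Social Science (96.29 vs. 90.44) despite a lower average.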