HyperAI

Common Sense Reasoning on CommonsenseQA

Metrics

Accuracy

Results

Performance results of various models on this benchmark

Model Name | Accuracy | Paper Title | Repository
DEKCOR | 83.3 | Fusing Context Into Knowledge Graph for Commonsense Question Answering | -
UnifiedQA 11B (fine-tuned) | 79.1 | UnifiedQA: Crossing Format Boundaries With a Single QA System | -
RoBERTa+HyKAS Ma et al. (2019) | 73.2 | Towards Generalizable Neuro-Symbolic Systems for Commonsense Question Answering | -
Chain of thought ASDiv | 28.6 | Chain-of-Thought Prompting Elicits Reasoning in Large Language Models | -
KagNet | 58.9 | KagNet: Knowledge-Aware Graph Networks for Commonsense Reasoning | -
OPT 66B (1-shot) | 66.4 | BloombergGPT: A Large Language Model for Finance | -
GPT-4o (HPT) | 92.54 | Hierarchical Prompting Taxonomy: A Universal Evaluation Framework for Large Language Models | -
UnifiedQA 440M (fine-tuned) | 64 | UnifiedQA: Crossing Format Boundaries With a Single QA System | -
UL2 20B (chain-of-thought) | 51.4 | UL2: Unifying Language Learning Paradigms | -
STaR without Rationalization (on GPT-J) | 68.8 | STaR: Bootstrapping Reasoning With Reasoning | -
Few-shot CoT GPT-J | 36.6 | STaR: Bootstrapping Reasoning With Reasoning | -
PaLM 2 (few-shot, CoT, SC) | 90.4 | PaLM 2 Technical Report | -
T5-XXL 11B (fine-tuned) | 78.1 | UnifiedQA: Crossing Format Boundaries With a Single QA System | -
RoBERTa-Large 355M | 72.1 | RoBERTa: A Robustly Optimized BERT Pretraining Approach | -
GPT-3 Direct Finetuned | 73.0 | Human Parity on CommonsenseQA: Augmenting Self-Attention with External Attention | -
BLOOM 176B (1-shot) | 64.2 | BloombergGPT: A Large Language Model for Finance | -
UL2 20B (zero-shot) | 34.2 | UL2: Unifying Language Learning Paradigms | -
DeBERTaV3-large+KEAR | 91.2 | Human Parity on CommonsenseQA: Augmenting Self-Attention with External Attention | -
DRAGON | 78.2 | Deep Bidirectional Language-Knowledge Graph Pretraining | -
BERT_CSlarge | 62.2 | Align, Mask and Select: A Simple Method for Incorporating Commonsense Knowledge into Language Representation Models | -