HyperAI超神经

Common Sense Reasoning on CommonsenseQA

Evaluation Metrics

Accuracy

Evaluation Results

Performance of each model on this benchmark

Model Name | Accuracy | Paper Title | Repository
DEKCOR | 83.3 | Fusing Context Into Knowledge Graph for Commonsense Question Answering |
UnifiedQA 11B (fine-tuned) | 79.1 | UnifiedQA: Crossing Format Boundaries With a Single QA System |
RoBERTa+HyKAS Ma et al. (2019) | 73.2 | Towards Generalizable Neuro-Symbolic Systems for Commonsense Question Answering | -
Chain of thought ASDiv | 28.6 | Chain-of-Thought Prompting Elicits Reasoning in Large Language Models |
KagNet | 58.9 | KagNet: Knowledge-Aware Graph Networks for Commonsense Reasoning |
OPT 66B (1-shot) | 66.4 | BloombergGPT: A Large Language Model for Finance | -
GPT-4o (HPT) | 92.54 | Hierarchical Prompting Taxonomy: A Universal Evaluation Framework for Large Language Models Aligned with Human Cognitive Principles | -
UnifiedQA 440M (fine-tuned) | 64 | UnifiedQA: Crossing Format Boundaries With a Single QA System |
UL2 20B (chain-of-thought) | 51.4 | UL2: Unifying Language Learning Paradigms |
STaR without Rationalization (on GPT-J) | 68.8 | STaR: Bootstrapping Reasoning With Reasoning |
Few-shot CoT GPT-J | 36.6 | STaR: Bootstrapping Reasoning With Reasoning |
PaLM 2 (few-shot, CoT, SC) | 90.4 | PaLM 2 Technical Report |
T5-XXL 11B (fine-tuned) | 78.1 | UnifiedQA: Crossing Format Boundaries With a Single QA System |
RoBERTa-Large 355M | 72.1 | RoBERTa: A Robustly Optimized BERT Pretraining Approach |
GPT-3 Direct Finetuned | 73.0 | Human Parity on CommonsenseQA: Augmenting Self-Attention with External Attention |
BLOOM 176B (1-shot) | 64.2 | BloombergGPT: A Large Language Model for Finance | -
UL2 20B (zero-shot) | 34.2 | UL2: Unifying Language Learning Paradigms |
DeBERTaV3-large+KEAR | 91.2 | Human Parity on CommonsenseQA: Augmenting Self-Attention with External Attention |
DRAGON | 78.2 | Deep Bidirectional Language-Knowledge Graph Pretraining |
BERT_CSlarge | 62.2 | Align, Mask and Select: A Simple Method for Incorporating Commonsense Knowledge into Language Representation Models | -