Coreference Resolution On Winograd Schema

Evaluation Metric

Accuracy
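
A minimal sketch of how this metric is computed on a binary-choice Winograd schema: each prediction is compared with the gold antecedent, and accuracy is the fraction of schemas resolved correctly. The function and data below are illustrative, not taken from HyperAI's evaluation code.

```python
# Minimal accuracy sketch for a binary-choice Winograd benchmark.
# `predictions` and `labels` are illustrative names, not HyperAI's evaluation API.
def accuracy(predictions, labels):
    """Fraction of schemas whose chosen antecedent matches the gold one."""
    assert predictions and len(predictions) == len(labels)
    correct = sum(p == g for p, g in zip(predictions, labels))
    return correct / len(labels)

# 3 of 4 schemas resolved correctly -> 0.75 (i.e. 75%); random guessing on a
# two-way choice gives ~50%, matching the "Random chance baseline" row in the
# results table below.
print(accuracy(["A", "B", "A", "B"], ["A", "B", "B", "B"]))  # 0.75
```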

Evaluation Results

The table below lists each model's performance on this benchmark.

| Model Name | Accuracy (%) | Paper Title | Repository |
|---|---|---|---|
| WKH | 57.1 | WinoGrande: An Adversarial Winograd Schema Challenge at Scale | - |
| Random chance baseline | 50 | Back to Square One: Artifact Detection, Training and Commonsense Disentanglement in the Winograd Schema | - |
| UDSSM-I (ensemble) | 57.1 | Unsupervised Deep Structured Semantic Models for Commonsense Reasoning | - |
| Char-level CNN+LSTM (partial scoring) | 57.9 | A Simple Method for Commonsense Reasoning | - |
| DeBERTa-1.5B | 95.9 | DeBERTa: Decoding-enhanced BERT with Disentangled Attention | - |
| KEE+NKAM (winner of WSC2016) | 58.3 | Commonsense Knowledge Enhanced Embeddings for Solving Pronoun Disambiguation Problems in Winograd Schema Challenge | - |
| RoBERTa-WinoGrande 355M | 90.1 | WinoGrande: An Adversarial Winograd Schema Challenge at Scale | - |
| T5-Large 738M | 66.7 | LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | - |
| BERT-base 110M (fine-tuned on WSCR) | 62.3 | A Surprisingly Robust Trick for Winograd Schema Challenge | - |
| BERTwiki 340M (fine-tuned on WSCR) | 72.5 | A Surprisingly Robust Trick for Winograd Schema Challenge | - |
| UDSSM-II (ensemble) | 62.4 | Unsupervised Deep Structured Semantic Models for Commonsense Reasoning | - |
| RoBERTa-large 354M | 73.9 | Back to Square One: Artifact Detection, Training and Commonsense Disentanglement in the Winograd Schema | - |
| Turing NLR v5 XXL 5.4B (fine-tuned) | 97.3 | Toward Efficient Language Model Pretraining and Downstream Adaptation via Self-Evolution: A Case Study on SuperGLUE | - |
| USSM + Supervised DeepNet + KB | 52.8 | Attention Is (not) All You Need for Commonsense Reasoning | - |
| PaLM 540B (1-shot) | 86.3 | PaLM: Scaling Language Modeling with Pathways | - |
| Hybrid H3 125M (3-shot, logit scoring) | 43.3 | Hungry Hungry Hippos: Towards Language Modeling with State Space Models | - |
| GPT-3 175B (few-shot) | 80.1 | Language Models are Few-Shot Learners | - |
| GPT-2-XL 1.5B | 73.3 | LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | - |
| LaMini-F-T5 783M | 64.1 | LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | - |
| GPT-2 Medium 774M (full scoring) | 64.5 | How Reasonable are Common-Sense Reasoning Tasks: A Case-Study on the Winograd Schema Challenge and SWAG | - |
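
Several rows qualify their numbers with a scoring method ("partial scoring", "full scoring", "logit scoring"). These refer to language-model candidate scoring: each candidate antecedent is substituted for the ambiguous pronoun, and the model keeps the substitution it finds more likely. The sketch below illustrates the full-sentence variant with an off-the-shelf GPT-2 checkpoint from Hugging Face transformers; the model choice and the example schema are assumptions for illustration and do not reproduce any specific row's setup.

```python
# Hedged sketch of language-model candidate scoring for a Winograd schema
# ("full scoring" variant): substitute each candidate for the pronoun and
# keep the substitution the LM assigns the higher log-probability.
# The GPT-2 checkpoint and the example schema are illustrative assumptions.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def sentence_log_prob(text: str) -> float:
    """Sum of token log-probabilities of `text` under the LM."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels=input_ids the model returns the mean token cross-entropy,
        # so multiplying by the number of predicted tokens recovers the sum.
        loss = model(ids, labels=ids).loss
    return -loss.item() * (ids.size(1) - 1)

schema = "The trophy doesn't fit in the suitcase because {} is too big."
candidates = ["the trophy", "the suitcase"]
scores = {c: sentence_log_prob(schema.format(c)) for c in candidates}
print(scores, "->", max(scores, key=scores.get))  # ideally "the trophy"
```

The "partial scoring" variant described in "A Simple Method for Commonsense Reasoning" scores only the tokens that follow the substituted candidate, conditioned on it, which that paper reports to be more reliable than scoring the full sentence.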