HyperAI

Coreference Resolution On Winograd Schema

Metrics

Accuracy

Results

Performance results of various models on this benchmark

Comparison Table
| Model Name | Accuracy |
|---|---|
| winogrande-an-adversarial-winograd-schema | 57.1 |
| back-to-square-one-bias-detection-training | 50 |
| unsupervised-deep-structured-semantic-models | 57.1 |
| a-simple-method-for-commonsense-reasoning | 57.9 |
| deberta-decoding-enhanced-bert-with | 95.9 |
| commonsense-knowledge-enhanced-embeddings-for | 58.3 |
| winogrande-an-adversarial-winograd-schema | 90.1 |
| lamini-lm-a-diverse-herd-of-distilled-models | 66.7 |
| a-surprisingly-robust-trick-for-winograd | 62.3 |
| a-surprisingly-robust-trick-for-winograd | 72.5 |
| unsupervised-deep-structured-semantic-models | 62.4 |
| back-to-square-one-bias-detection-training | 73.9 |
| toward-efficient-language-model-pretraining | 97.3 |
| attention-is-not-all-you-need-for-commonsense | 52.8 |
| palm-scaling-language-modeling-with-pathways-1 | 86.3 |
| hungry-hungry-hippos-towards-language | 43.3 |
| language-models-are-few-shot-learners | 80.1 |
| lamini-lm-a-diverse-herd-of-distilled-models | 73.3 |
| lamini-lm-a-diverse-herd-of-distilled-models | 64.1 |
| on-the-evaluation-of-common-sense-reasoning | 64.5 |
| g-daug-generative-data-augmentation-for | 80 |
| pythia-a-suite-for-analyzing-large-language | 36.5 |
| ask-me-anything-a-simple-strategy-for | 36.5 |
| on-the-evaluation-of-common-sense-reasoning | 55.7 |
| alexatm-20b-few-shot-learning-using-a-large | 68.3 |
| ask-me-anything-a-simple-strategy-for | 74.7 |
| winogrande-an-adversarial-winograd-schema | 52.8 |
| palm-scaling-language-modeling-with-pathways-1 | 89.1 |
| on-the-evaluation-of-common-sense-reasoning | 61.5 |
| palm-2-technical-report-1 | 88.1 |
| a-surprisingly-robust-trick-for-winograd | 71.4 |
| attention-is-not-all-you-need-for-commonsense | 52 |
| on-generalization-in-coreference-resolution | 60.1 |
| lamini-lm-a-diverse-herd-of-distilled-models | 69.6 |
| unsupervised-deep-structured-semantic-models | 54.5 |
| a-simple-method-for-commonsense-reasoning | 63.7 |
| socialiqa-commonsense-reasoning-about-social | 67 |
| hungry-hungry-hippos-towards-language | 61.5 |
| on-the-evaluation-of-common-sense-reasoning | 69.2 |
| palm-2-technical-report-1 | 84.6 |
| back-to-square-one-bias-detection-training | 63 |
| language-models-are-unsupervised-multitask | 70.7 |
| designing-effective-sparse-expert-models | 96.6 |
| bert-pre-training-of-deep-bidirectional | 62.0 |
| exploring-the-benefits-of-training-expert | 62.21 |
| pythia-a-suite-for-analyzing-large-language | 36.5 |
| designing-effective-sparse-expert-models | 93.3 |
| back-to-square-one-bias-detection-training | 55.4 |
| hungry-hungry-hippos-towards-language | 63.5 |
| knowledge-in-context-towards-knowledgeable | 65.40 |
| unifying-language-learning-paradigms | 98.1 |
| back-to-square-one-bias-detection-training | 78.8 |
| finetuned-language-models-are-zero-shot | 86.5 |
| attention-is-all-you-need | 54.1 |
| toward-efficient-language-model-pretraining | 98.6 |
| finetuned-language-models-are-zero-shot | 80.8 |
| ask-me-anything-a-simple-strategy-for | 77.9 |
| palm-2-technical-report-1 | 86.9 |
| palm-scaling-language-modeling-with-pathways-1 | 100 |
| a-simple-method-for-commonsense-reasoning | 62.6 |
| on-generalization-in-coreference-resolution | 59.4 |
| unsupervised-deep-structured-semantic-models | 63.0 |
| a-surprisingly-robust-trick-for-winograd | 70.3 |
| lamini-lm-a-diverse-herd-of-distilled-models | 59 |
| pythia-a-suite-for-analyzing-large-language | 54.8 |
| a-hybrid-neural-network-model-for-commonsense | 75.1 |
| palm-scaling-language-modeling-with-pathways-1 | 89.5 |
| winogrande-an-adversarial-winograd-schema | 83.1 |
| the-cot-collection-improving-zero-shot-and | 66 |
| n-grammer-augmenting-transformers-with-latent-1 | 68.3 |
| unifying-language-learning-paradigms | 79.9 |
| pythia-a-suite-for-analyzing-large-language | 38.5 |
| exploring-the-limits-of-transfer-learning | 93.8 |
| a-knowledge-hunting-framework-for-common | 57.1 |
| back-to-square-one-bias-detection-training | 61.4 |
| back-to-square-one-bias-detection-training | 56.5 |
| guess-the-instruction-making-language-models | 58.37 |
| socialiqa-commonsense-reasoning-about-social | 72.5 |
| unsupervised-deep-structured-semantic-models | 59.2 |
| scaling-instruction-finetuned-language-models | 89.82 |
| tttttackling-winogrande-schemas | 84.6 |
| attention-is-not-all-you-need-for-commonsense | 60.3 |

Note: a model name may appear more than once; each row corresponds to a separate submitted result (e.g. a different model size or evaluation setting).
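For reference, the Accuracy metric reported above is simply the percentage of Winograd schema instances whose pronoun is resolved to the correct antecedent. A minimal sketch (the function name and example answers are illustrative, not from the benchmark itself):

```python
def accuracy(predictions, gold):
    """Percentage of schema instances resolved correctly.

    predictions: model-chosen antecedents, one per schema instance
    gold: the correct antecedents, in the same order
    """
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return 100.0 * correct / len(gold)

# Hypothetical run over four instances, three resolved correctly:
score = accuracy(
    ["trophy", "suitcase", "council", "trophy"],
    ["trophy", "suitcase", "council", "suitcase"],
)
print(score)  # 75.0
```

A score of 50 on a two-candidate schema set is chance level, which puts the lower table entries (e.g. 36.5) below random guessing.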