| Model | Accuracy (%) | Paper | Code |
| --- | --- | --- | --- |
| FLAN 137B (few-shot, k=16) | 72.8 | Finetuned Language Models Are Zero-Shot Learners | - |
| UnifiedQA 406M (fine-tuned) | 73.3 | UnifiedQA: Crossing Format Boundaries With a Single QA System | - |
| Switch Transformer 9B (zero-shot) | 53.4 | Efficient Language Modeling with Sparse all-MLP | - |
| GPT-3 Large 760M (zero-shot) | 57.4 | Language Models are Few-Shot Learners | - |
| phi-1.5-web 1.3B (zero-shot) | 74.0 | Textbooks Are All You Need II: phi-1.5 technical report | - |
| Base Layers 10B (zero-shot) | 51.0 | Efficient Language Modeling with Sparse all-MLP | - |