Common Sense Reasoning On Winogrande
Metric: Accuracy

Results: performance of each model on this benchmark. The full leaderboard contains 77 entries; the top 20 by accuracy are listed below, followed by a minimal evaluation sketch.
| Model Name | Accuracy (%) | Paper Title |
|---|---|---|
| ST-MoE-32B 269B (fine-tuned) | 96.1 | ST-MoE: Designing Stable and Transferable Sparse Expert Models |
| Unicorn 11B (fine-tuned) | 91.3 | UNICORN on RAINBOW: A Universal Commonsense Reasoning Model on a New Multitask Benchmark |
| CompassMTL 567M with Tailor | 90.5 | Task Compass: Scaling Multi-task Pre-training with Task Prefix |
| CompassMTL 567M | 89.6 | Task Compass: Scaling Multi-task Pre-training with Task Prefix |
| UnifiedQA 11B (fine-tuned) | 89.4 | UnifiedQA: Crossing Format Boundaries With a Single QA System |
| Claude 3 Opus (5-shot) | 88.5 | The Claude 3 Model Family: Opus, Sonnet, Haiku |
| GPT-4 (5-shot) | 87.5 | GPT-4 Technical Report |
| ExDeBERTa 567M | 87.0 | Task Compass: Scaling Multi-task Pre-training with Task Prefix |
| LLaMA-2 13B + MixLoRA | 86.3 | MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts |
| LLaMA-3 8B + MoSLoRA | 85.8 | Mixture-of-Subspaces in Low-Rank Adaptation |
| PaLM 2-L (1-shot) | 83.0 | PaLM 2 Technical Report |
| LLaMA-3 8B + MixLoRA | 82.1 | MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts |
| ST-MoE-L 4.1B (fine-tuned) | 81.7 | ST-MoE: Designing Stable and Transferable Sparse Expert Models |
| GPT-3.5 (5-shot) | 81.6 | GPT-4 Technical Report |
| PaLM 540B (0-shot) | 81.1 | PaLM: Scaling Language Modeling with Pathways |
| Camelidae-8×34B | 80.9 | Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks |
| PaLM 2-M (1-shot) | 79.2 | PaLM 2 Technical Report |
| RoBERTa-Winogrande 355M (fine-tuned) | 79.1 | WinoGrande: An Adversarial Winograd Schema Challenge at Scale |
| PaLM 2-S (1-shot) | 77.9 | PaLM 2 Technical Report |
| Mixtral 8x7B (0-shot) | 77.2 | Mixtral of Experts |
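For reference, WinoGrande is a binary fill-in-the-blank task: each item is a sentence containing a blank ("_") and two candidate options, and the Accuracy column above is the fraction of items for which a model selects the correct option. The sketch below shows one common zero-shot way to compute such a number, scoring each completed sentence by its total log-likelihood under a causal language model. It is a minimal illustration, not how any of the listed results were produced: the dataset ID `allenai/winogrande` and config `winogrande_xl` are assumptions about the public Hugging Face release, and `gpt2` is only a small stand-in model.

```python
# Minimal sketch of zero-shot WinoGrande accuracy scoring.
# Assumptions (not from the leaderboard page): the public Hugging Face
# release `allenai/winogrande` with config `winogrande_xl`; each of the
# two candidate completions is scored by total log-likelihood under a
# causal LM (gpt2 as a small stand-in), and the higher-scoring one wins.

import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # stand-in; leaderboard models are far larger

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()


@torch.no_grad()
def sentence_logprob(text: str) -> float:
    """Sum of token log-probabilities of `text` under the LM."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    # With labels=ids, the model returns mean NLL over the seq_len - 1
    # predicted tokens; multiply back to get the total log-likelihood.
    out = model(ids, labels=ids)
    return -out.loss.item() * (ids.shape[1] - 1)


def predict(example) -> str:
    """Return '1' or '2' for the higher-likelihood completion."""
    scores = []
    for option in (example["option1"], example["option2"]):
        filled = example["sentence"].replace("_", option)
        scores.append(sentence_logprob(filled))
    return "1" if scores[0] >= scores[1] else "2"


# The validation split is labeled ("answer" is "1" or "2"); test is not.
val = load_dataset("allenai/winogrande", "winogrande_xl", split="validation")
correct = sum(predict(ex) == ex["answer"] for ex in val)
print(f"accuracy = {correct / len(val):.3f}")
```

Note that the published numbers in the table depend on details this sketch ignores, such as few-shot prompting (the 0-shot/1-shot/5-shot annotations above), scoring only the continuation after the blank rather than the whole sentence, and task-specific fine-tuning, so absolute accuracies will differ.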