
Common Sense Reasoning on WinoGrande

Metrics

Accuracy
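
WinoGrande is scored with plain accuracy: the fraction of the benchmark's binary fill-in-the-blank items a model answers correctly, reported as a percentage. A minimal sketch of the metric follows (not HyperAI's evaluation code; the "1"/"2" label format follows the public WinoGrande data, where each item's answer names option 1 or option 2):

```python
def accuracy(predictions: list[str], gold_labels: list[str]) -> float:
    """Fraction of predictions matching the gold labels, as a percentage."""
    assert len(predictions) == len(gold_labels)
    correct = sum(p == g for p, g in zip(predictions, gold_labels))
    return 100.0 * correct / len(gold_labels)

# Hypothetical example: WinoGrande labels are "1" or "2" (option1/option2).
preds = ["1", "2", "2", "1"]
golds = ["1", "2", "1", "1"]
print(f"Accuracy: {accuracy(preds, golds):.1f}")  # Accuracy: 75.0
```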

Results

Performance results of various models on this benchmark

| Model Name | Accuracy (%) | Paper Title |
| --- | --- | --- |
| CompassMTL 567M with Tailor | 90.5 | Task Compass: Scaling Multi-task Pre-training with Task Prefix |
| FLAN 137B (few-shot, k=16) | 72.8 | Finetuned Language Models Are Zero-Shot Learners |
| UnifiedQA 406M (fine-tuned) | 73.3 | UnifiedQA: Crossing Format Boundaries With a Single QA System |
| Switch Transformer 9B (0-shot) | 53.4 | Efficient Language Modeling with Sparse all-MLP |
| GPT-3 Large 760M (0-shot) | 57.4 | Language Models are Few-Shot Learners |
| Pythia 12B (5-shot) | 66.6 | Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling |
| RoBERTa-large 355M | 54.9 | Back to Square One: Artifact Detection, Training and Commonsense Disentanglement in the Winograd Schema |
| ST-MoE-L 4.1B (fine-tuned) | 81.7 | ST-MoE: Designing Stable and Transferable Sparse Expert Models |
| phi-1.5-web 1.3B (zero-shot) | 74.0 | Textbooks Are All You Need II: phi-1.5 technical report |
| LLaMA 13B (0-shot) | 73.0 | LLaMA: Open and Efficient Foundation Language Models |
| OPT 66B (1-shot) | 66.1 | BloombergGPT: A Large Language Model for Finance |
| Mistral 7B (0-shot) | 74.2 | Mixtral of Experts |
| Flipped-3B | 58.56 | Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners |
| GPT-4 (5-shot) | 87.5 | GPT-4 Technical Report |
| Base Layers 10B (0-shot) | 51 | Efficient Language Modeling with Sparse all-MLP |
| RoE-3B | 61.60 | Exploring the Benefits of Training Expert Language Models over Instruction Tuning |
| Pythia 12B (0-shot) | 63.9 | Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling |
| BERT-large 345M (0-shot) | 51.9 | WinoGrande: An Adversarial Winograd Schema Challenge at Scale |
| RoBERTa-large 355M (0-shot) | 50 | WinoGrande: An Adversarial Winograd Schema Challenge at Scale |
| PaLM 2-S (1-shot) | 77.9 | PaLM 2 Technical Report |
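
The shot counts in the table (0-shot, 1-shot, 5-shot) indicate how many in-context examples precede each test item. For zero-shot entries, a common scoring scheme fills the blank with each candidate option and compares language-model likelihoods of the two completed sentences. A sketch of that scheme is below; it is not the exact procedure of any paper in the table, and `sentence_log_prob` is a hypothetical stand-in for a real LM scorer:

```python
def sentence_log_prob(sentence: str) -> float:
    # Hypothetical placeholder: a real evaluator would sum token log-probs
    # from an actual language model here. This stub simply prefers shorter
    # sentences so the example runs end to end.
    return -float(len(sentence))

def predict(sentence: str, option1: str, option2: str) -> str:
    """Return "1" or "2" for whichever filled-in sentence scores higher."""
    s1 = sentence.replace("_", option1)
    s2 = sentence.replace("_", option2)
    return "1" if sentence_log_prob(s1) >= sentence_log_prob(s2) else "2"

item = {
    "sentence": "The trophy didn't fit in the suitcase because the _ was too big.",
    "option1": "trophy",
    "option2": "suitcase",
}
print(predict(item["sentence"], item["option1"], item["option2"]))  # prints: 1
```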