Question Answering on Social IQA
Evaluation metric: Accuracy

Evaluation results: performance of each model on this benchmark.
| Model Name | Accuracy | Paper Title |
| --- | --- | --- |
| Unicorn 11B (fine-tuned) | 83.2 | UNICORN on RAINBOW: A Universal Commonsense Reasoning Model on a New Multitask Benchmark |
| LLaMA-2 13B + MixLoRA | 82.5 | MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts |
| CompassMTL 567M with Tailor | 82.2 | Task Compass: Scaling Multi-task Pre-training with Task Prefix |
| CompassMTL 567M | 81.7 | Task Compass: Scaling Multi-task Pre-training with Task Prefix |
| LLaMA-3 8B + MoSLoRA (fine-tuned) | 81.0 | Mixture-of-Subspaces in Low-Rank Adaptation |
| DeBERTa-Large 304M | 80.2 | Two is Better than Many? Binary Classification as an Effective Approach to Multi-Choice Question Answering |
| DeBERTa-Large 304M (classification-based) | 79.9 | Two is Better than Many? Binary Classification as an Effective Approach to Multi-Choice Question Answering |
| UnifiedQA 3B | 79.8 | UnifiedQA: Crossing Format Boundaries With a Single QA System |
| ExDeBERTa 567M | 79.6 | Task Compass: Scaling Multi-task Pre-training with Task Prefix |
| LLaMA-3 8B + MixLoRA | 78.8 | MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts |
| LLaMA-2 7B + MixLoRA | 78.0 | MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts |
| RoBERTa-Large 355M (fine-tuned) | 76.7 | RoBERTa: A Robustly Optimized BERT Pretraining Approach |
| BERT-large 340M (fine-tuned) | 64.5 | SocialIQA: Commonsense Reasoning about Social Interactions |
| BERT-base 110M (fine-tuned) | 63.1 | SocialIQA: Commonsense Reasoning about Social Interactions |
| GPT-1 117M (fine-tuned) | 63.0 | SocialIQA: Commonsense Reasoning about Social Interactions |
| phi-1.5-web 1.3B (zero-shot) | 53.0 | Textbooks Are All You Need II: phi-1.5 technical report |
| phi-1.5 1.3B (zero-shot) | 52.6 | Textbooks Are All You Need II: phi-1.5 technical report |
| LLaMA 65B (zero-shot) | 52.3 | LLaMA: Open and Efficient Foundation Language Models |
| Chinchilla (zero-shot) | 51.3 | Training Compute-Optimal Large Language Models |
| Gopher (zero-shot) | 50.6 | Scaling Language Models: Methods, Analysis & Insights from Training Gopher |
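The Accuracy column is simply the fraction of multiple-choice questions a model answers correctly on the Social IQA evaluation split. Below is a minimal sketch of that computation, assuming the Hugging Face `social_i_qa` dataset id and its commonly used field names (`context`, `question`, `answerA`/`answerB`/`answerC`, and a 1-indexed `label`); the `predict` callable is a hypothetical stand-in for any of the models listed above, not part of this leaderboard.

```python
# Hypothetical sketch of the Accuracy metric on the Social IQA dev split.
# Assumptions: the Hugging Face dataset id "social_i_qa" and its usual schema.
from datasets import load_dataset


def evaluate_accuracy(predict):
    """Return the fraction of dev examples whose predicted choice matches the gold label.

    `predict(context, question, candidates)` must return the index (0, 1, or 2)
    of the chosen answer candidate.
    """
    dev = load_dataset("social_i_qa", split="validation")
    correct = 0
    for ex in dev:
        # Three answer candidates per question; the gold label is 1-indexed.
        candidates = [ex["answerA"], ex["answerB"], ex["answerC"]]
        gold = int(ex["label"]) - 1
        pred = predict(ex["context"], ex["question"], candidates)
        correct += int(pred == gold)
    return correct / len(dev)


# Example: a trivial baseline that always picks the first candidate
# (expected to land near 33%, far below the fine-tuned models above).
print(evaluate_accuracy(lambda context, question, candidates: 0))
```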