
Common Sense Reasoning on ARC Challenge

Metrics: Accuracy

Results

Performance of various models on the ARC Challenge benchmark, reported as accuracy.

| Model Name | Accuracy (%) | Paper Title | Repository |
|---|---|---|---|
| Claude 2 (few-shot, k=5) | 91 | Model Card and Evaluations for Claude Models | - |
| PaLM 540B (Self Improvement, CoT Prompting) | 88.3 | Large Language Models Can Self-Improve | - |
| GPT-4 (few-shot, k=25) | 96.4 | GPT-4 Technical Report | |
| Camelidae-8×34B | 65.2 | Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks | |
| StupidLLM | 91.03 | - | - |
| LLaMA 3 8B + MoSLoRA (fine-tuned) | 81.5 | Mixture-of-Subspaces in Low-Rank Adaptation | |
| PaLM 540B (CoT Prompting) | 85.2 | Large Language Models Can Self-Improve | - |
| GLaM 64B/64E (0 shot) | 50.3 | GLaM: Efficient Scaling of Language Models with Mixture-of-Experts | - |
| PaLM 540B (Standard-Prompting) | 87.1 | Large Language Models Can Self-Improve | - |
| ST-MoE-32B 269B (fine-tuned) | 86.5 | ST-MoE: Designing Stable and Transferable Sparse Expert Models | - |
| PaLM 2-S (1-shot) | 59.6 | PaLM 2 Technical Report | |
| BLOOM (few-shot, k=5) | 32.9 | Galactica: A Large Language Model for Science | |
| GAL 120B (zero-shot) | 67.9 | Galactica: A Large Language Model for Science | |
| OPT (few-shot, k=5) | 31.1 | Galactica: A Large Language Model for Science | |
| LLaMA-2 7B + MixLoRA | 58.1 | MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | - |
| OPT-175B (50% Sparsity) | 25.6 | SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | |
| GLaM 64B/64E (1 shot) | 48.2 | GLaM: Efficient Scaling of Language Models with Mixture-of-Experts | - |
| LLaMA-2 13B + MixLoRA | 69.9 | MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | |
| LLaMA-3 8B + MixLoRA | 79.9 | MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | |
| PaLM 2 (few-shot, CoT, SC) | 95.1 | PaLM 2 Technical Report | |
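The entries above were produced under different evaluation protocols (zero-shot, few-shot with various k, chain-of-thought, self-consistency), so the scoring details vary by paper. The metric itself is the same in every case: the fraction of multiple-choice questions answered correctly. The sketch below illustrates that computation on the ARC-Challenge test split using the public AI2 ARC dataset on Hugging Face; `predict_fn` is a hypothetical placeholder for whichever model is being evaluated, not code from any of the listed papers.

```python
# Minimal sketch of ARC-Challenge accuracy scoring (illustrative only).
# Assumes the Hugging Face "allenai/ai2_arc" dataset with the "ARC-Challenge" config;
# predict_fn is a hypothetical callable supplied by the user.
from datasets import load_dataset  # pip install datasets


def evaluate_arc_challenge(predict_fn, split="test"):
    """predict_fn(question: str, options: list[str]) -> int (index of the chosen option)."""
    ds = load_dataset("allenai/ai2_arc", "ARC-Challenge", split=split)
    correct = 0
    for ex in ds:
        options = ex["choices"]["text"]        # answer option strings
        labels = ex["choices"]["label"]        # e.g. ["A", "B", "C", "D"]
        gold = labels.index(ex["answerKey"])   # index of the correct option
        if predict_fn(ex["question"], options) == gold:
            correct += 1
    return 100.0 * correct / len(ds)           # accuracy in %, as reported in the table


# Trivial baseline that always picks the first option, just to show the call pattern:
# print(evaluate_arc_challenge(lambda question, options: 0))
```

A real evaluation would replace `predict_fn` with model-specific prompting (for example, few-shot exemplars or likelihood scoring over the options), which is where the papers in the table differ.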