Question Answering on PIQA

Metrics

Accuracy

Results

Performance results of various models on this benchmark
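Accuracy on PIQA is the fraction of questions for which a model picks the correct one of the two candidate solutions. As a rough illustration only (not the evaluation code behind this leaderboard), the sketch below scores both candidates with a causal language model and selects the higher-likelihood one; the Hugging Face `piqa` dataset with its goal/sol1/sol2/label fields and `gpt2` as a stand-in model are assumptions, not part of the results above.

```python
# Minimal sketch of likelihood-based PIQA evaluation (assumed setup, not the
# leaderboard's official harness). Requires `torch`, `transformers`, `datasets`.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")            # stand-in model
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def sequence_logprob(prompt: str, continuation: str) -> float:
    """Sum of log-probabilities the model assigns to `continuation` given `prompt`."""
    prompt_ids = tok(prompt, return_tensors="pt").input_ids
    cont_ids = tok(" " + continuation, return_tensors="pt").input_ids
    input_ids = torch.cat([prompt_ids, cont_ids], dim=1)
    with torch.no_grad():
        logits = model(input_ids).logits
    # Log-prob of each token, conditioned on everything before it.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    cont_positions = range(prompt_ids.shape[1] - 1, input_ids.shape[1] - 1)
    return sum(log_probs[pos, input_ids[0, pos + 1]].item() for pos in cont_positions)

# May require trust_remote_code=True on newer versions of `datasets`.
data = load_dataset("piqa", split="validation")
correct = 0
for ex in data:
    scores = [sequence_logprob(ex["goal"], ex[f"sol{i}"]) for i in (1, 2)]
    correct += int(scores.index(max(scores)) == ex["label"])
print(f"Accuracy: {correct / len(data):.3f}")
```

Leaderboard entries typically use larger models and more careful prompting or length normalization than this sketch, so the numbers in the table below should not be expected to match its output.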

Comparison Table
Model Name | Accuracy
sheared-llama-accelerating-language-model-pre | 76.2
llama-open-and-efficient-foundation-language-1 | 82.3
two-is-better-than-many-binary-classification | 85.9
bloomberggpt-a-large-language-model-for | 77.6
llama-2-open-foundation-and-fine-tuned-chat | 80.5
llama-2-open-foundation-and-fine-tuned-chat | 81.9
unifiedqa-crossing-format-boundaries-with-a | 85.3
task-compass-scaling-multi-task-pre-training | 85.5
lamini-lm-a-diverse-herd-of-distilled-models | 70.5
palm-2-technical-report-1 | 83.2
sheared-llama-accelerating-language-model-pre | 75.8
mixlora-enhancing-large-language-models-fine | 87.6
piqa-reasoning-about-physical-commonsense-in | 69.2
mixlora-enhancing-large-language-models-fine | 83.2
massive-language-models-can-be-accurately | 80.63
language-models-are-few-shot-learners | 81.0
mixture-of-subspaces-in-low-rank-adaptation | 89.7
sheared-llama-accelerating-language-model-pre | 73.4
llama-open-and-efficient-foundation-language-1 | 79.8
lamini-lm-a-diverse-herd-of-distilled-models | 70.6
shakti-a-2-5-billion-parameter-small-language | 86.2
piqa-reasoning-about-physical-commonsense-in | 66.8
efficient-language-modeling-with-sparse-all | 63.8
training-compute-optimal-large-language | 81.8
mistral-7b | 83.0
two-is-better-than-many-binary-classification | 87.4
mixtral-of-experts | 82.2
pythia-a-suite-for-analyzing-large-language | 75.2
piqa-reasoning-about-physical-commonsense-in | 77.1
pythia-a-suite-for-analyzing-large-language | 76
efficient-language-modeling-with-sparse-all | 63.8
scaling-language-models-methods-analysis-1 | 81.8
bloomberggpt-a-large-language-model-for | 75.8
pythia-a-suite-for-analyzing-large-language | 70.4
mixlora-enhancing-large-language-models-fine | 86.8
pythia-a-suite-for-analyzing-large-language | 76.7
efficient-language-modeling-with-sparse-all | 73
finetuned-language-models-are-zero-shot | 81.7
lamini-lm-a-diverse-herd-of-distilled-models | 71.3
efficient-language-modeling-with-sparse-all | 68.1
palm-2-technical-report-1 | 82.2
bloomberggpt-a-large-language-model-for | 77.9
parameter-efficient-sparsity-crafting-from | 82.7
bloomberggpt-a-large-language-model-for | 77
llama-open-and-efficient-foundation-language-1 | 80.1
lamini-lm-a-diverse-herd-of-distilled-models | 67.2
unicorn-on-rainbow-a-universal-commonsense | 90.1
bert-pre-training-of-deep-bidirectional | 66.7
task-compass-scaling-multi-task-pre-training | 87.3
massive-language-models-can-be-accurately | 54.73
task-compass-scaling-multi-task-pre-training | 88.3
massive-language-models-can-be-accurately | 81.07
piqa-reasoning-about-physical-commonsense-in | 50
llama-open-and-efficient-foundation-language-1 | 82.8
lamini-lm-a-diverse-herd-of-distilled-models | 55.9
textbooks-are-all-you-need-ii-phi-1-5 | 77
finetuned-language-models-are-zero-shot | 80.5
llama-2-open-foundation-and-fine-tuned-chat | 78.8
mixtral-of-experts | 83.6
roberta-a-robustly-optimized-bert-pretraining | 79.4
palm-2-technical-report-1 | 85.0
megatron-lm-training-multi-billion-parameter | 82.0
language-models-are-few-shot-learners | 72.9
lamini-lm-a-diverse-herd-of-distilled-models | 72.2
llama-2-open-foundation-and-fine-tuned-chat | 82.8
massive-language-models-can-be-accurately | 79.54
massive-language-models-can-be-accurately | 79.54