Question Answering on PIQA
Metrics
Accuracy
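PIQA is a binary-choice benchmark: each example pairs a physical goal with two candidate solutions, and accuracy is the fraction of examples where the model selects the correct solution, so a random baseline sits at 50%. Below is a minimal sketch of the metric computation; the prediction and label lists are hypothetical illustration data, not benchmark results.

```python
# Minimal sketch of PIQA-style accuracy: predictions and gold labels
# are 0/1 indices selecting one of the two candidate solutions.
# The example lists below are hypothetical, not benchmark data.

def accuracy(predictions, labels):
    """Fraction of examples where the predicted choice matches the gold label."""
    assert len(predictions) == len(labels) and labels
    correct = sum(p == g for p, g in zip(predictions, labels))
    return correct / len(labels)

preds = [0, 1, 1, 0]   # model's chosen solution per example
gold  = [0, 1, 0, 0]   # annotated correct solution per example
print(f"Accuracy: {accuracy(preds, gold):.1%}")  # Accuracy: 75.0%
```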
Results
Performance results of various models on this benchmark
Comparison table
Model name | Accuracy (%) |
---|---|
sheared-llama-accelerating-language-model-pre | 76.2 |
llama-open-and-efficient-foundation-language-1 | 82.3 |
two-is-better-than-many-binary-classification | 85.9 |
bloomberggpt-a-large-language-model-for | 77.6 |
llama-2-open-foundation-and-fine-tuned-chat | 80.5 |
llama-2-open-foundation-and-fine-tuned-chat | 81.9 |
unifiedqa-crossing-format-boundaries-with-a | 85.3 |
task-compass-scaling-multi-task-pre-training | 85.5 |
lamini-lm-a-diverse-herd-of-distilled-models | 70.5 |
palm-2-technical-report-1 | 83.2 |
sheared-llama-accelerating-language-model-pre | 75.8 |
mixlora-enhancing-large-language-models-fine | 87.6 |
piqa-reasoning-about-physical-commonsense-in | 69.2 |
mixlora-enhancing-large-language-models-fine | 83.2 |
massive-language-models-can-be-accurately | 80.63 |
language-models-are-few-shot-learners | 81.0 |
mixture-of-subspaces-in-low-rank-adaptation | 89.7 |
sheared-llama-accelerating-language-model-pre | 73.4 |
llama-open-and-efficient-foundation-language-1 | 79.8 |
lamini-lm-a-diverse-herd-of-distilled-models | 70.6 |
shakti-a-2-5-billion-parameter-small-language | 86.2 |
piqa-reasoning-about-physical-commonsense-in | 66.8 |
efficient-language-modeling-with-sparse-all | 63.8 |
training-compute-optimal-large-language | 81.8 |
mistral-7b | 83.0 |
two-is-better-than-many-binary-classification | 87.4 |
mixtral-of-experts | 82.2 |
pythia-a-suite-for-analyzing-large-language | 75.2 |
piqa-reasoning-about-physical-commonsense-in | 77.1 |
pythia-a-suite-for-analyzing-large-language | 76.0 |
efficient-language-modeling-with-sparse-all | 63.8 |
scaling-language-models-methods-analysis-1 | 81.8 |
bloomberggpt-a-large-language-model-for | 75.8 |
pythia-a-suite-for-analyzing-large-language | 70.4 |
mixlora-enhancing-large-language-models-fine | 86.8 |
pythia-a-suite-for-analyzing-large-language | 76.7 |
efficient-language-modeling-with-sparse-all | 73.0 |
finetuned-language-models-are-zero-shot | 81.7 |
lamini-lm-a-diverse-herd-of-distilled-models | 71.3 |
efficient-language-modeling-with-sparse-all | 68.1 |
palm-2-technical-report-1 | 82.2 |
bloomberggpt-a-large-language-model-for | 77.9 |
parameter-efficient-sparsity-crafting-from | 82.7 |
bloomberggpt-a-large-language-model-for | 77.0 |
llama-open-and-efficient-foundation-language-1 | 80.1 |
lamini-lm-a-diverse-herd-of-distilled-models | 67.2 |
unicorn-on-rainbow-a-universal-commonsense | 90.1 |
bert-pre-training-of-deep-bidirectional | 66.7 |
task-compass-scaling-multi-task-pre-training | 87.3 |
massive-language-models-can-be-accurately | 54.73 |
task-compass-scaling-multi-task-pre-training | 88.3 |
massive-language-models-can-be-accurately | 81.07 |
piqa-reasoning-about-physical-commonsense-in | 50.0 |
llama-open-and-efficient-foundation-language-1 | 82.8 |
lamini-lm-a-diverse-herd-of-distilled-models | 55.9 |
textbooks-are-all-you-need-ii-phi-1-5 | 77.0 |
finetuned-language-models-are-zero-shot | 80.5 |
llama-2-open-foundation-and-fine-tuned-chat | 78.8 |
mixtral-of-experts | 83.6 |
roberta-a-robustly-optimized-bert-pretraining | 79.4 |
palm-2-technical-report-1 | 85.0 |
megatron-lm-training-multi-billion-parameter | 82.0 |
language-models-are-few-shot-learners | 72.9 |
lamini-lm-a-diverse-herd-of-distilled-models | 72.2 |
llama-2-open-foundation-and-fine-tuned-chat | 82.8 |
massive-language-models-can-be-accurately | 79.54 |
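Many of the zero-shot and few-shot results reported for models such as GPT-3 and LLaMA are obtained with likelihood-based multiple-choice scoring: the language model scores each of the two candidate solutions and the more probable one is taken as the prediction. The sketch below illustrates that protocol under stated assumptions; "gpt2" is a small placeholder model, and the plain "{goal} {solution}" prompt is an assumption, not any paper's exact setup.

```python
# Hedged sketch of zero-shot PIQA scoring with a causal LM: pick the
# candidate solution to which the model assigns the higher log-likelihood.
# "gpt2" is a placeholder; papers in the table above use far larger models.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def sequence_logprob(text: str) -> float:
    """Sum of token log-probabilities the model assigns to `text`."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # Each position predicts the next token, so shift logits and targets by one.
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    targets = ids[:, 1:]
    token_lp = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    return token_lp.sum().item()

def predict(goal: str, sol1: str, sol2: str) -> int:
    """Return 0 or 1: the index of the more likely candidate solution."""
    scores = [sequence_logprob(f"{goal} {sol}") for sol in (sol1, sol2)]
    return int(scores[1] > scores[0])

print(predict("To open a stuck jar,",
              "run the lid under hot water, then twist it off.",
              "hit the lid with a feather until it opens."))
```

Note that several papers normalize the summed log-probability by candidate length (or by an unconditional prior) before comparing, which can shift the resulting accuracy by a few points.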