HyperAI

Multi-Task Language Understanding on MMLU

Metrics

Average (%)

Results

Performance results of various models on this benchmark

Comparison table
| Model name | Average (%) |
| --- | --- |
| albert-a-lite-bert-for-self-supervised | 27.1 |
| roberta-a-robustly-optimized-bert-pretraining | 27.9 |
| measuring-massive-multitask-language | 43.9 |
| mixtral-of-experts | 70.6 |
| unifiedqa-crossing-format-boundaries-with-a | 48.9 |
| scaling-language-models-methods-analysis-1 | 29.5 |
| few-shot-learning-with-retrieval-augmented | 47.9 |
| language-models-are-few-shot-learners | 43.9 |
| glm-130b-an-open-bilingual-pre-trained-model | 44.8 |
| llama-2-open-foundation-and-fine-tuned-chat | 54.8 |
| gpt-neox-20b-an-open-source-autoregressive-1 | 33.6 |
| deepseek-r1-incentivizing-reasoning | 87.5 |
| scaling-instruction-finetuned-language-models | 33.7 |
| model-card-and-evaluations-for-claude-models | 73.4 |
| scaling-instruction-finetuned-language-models | 73.5 |
| the-falcon-series-of-open-language-models | 57.0 |
| llama-open-and-efficient-foundation-language-1 | 68.9 |
| mistral-7b | 60.1 |
| Model 19 | 77.5 |
| Model 20 | 56.7 |
| textbooks-are-all-you-need-ii-phi-1-5 | 37.9 |
| scaling-instruction-finetuned-language-models | 28.7 |
| llama-2-open-foundation-and-fine-tuned-chat | 62.6 |
| scaling-instruction-finetuned-language-models | 59.5 |
| scaling-instruction-finetuned-language-models | 45.1 |
| branch-train-mix-mixing-expert-llms-into-a | 53.2 |
| Model 27 | 31 |
| llama-2-open-foundation-and-fine-tuned-chat | 45.3 |
| bloomberggpt-a-large-language-model-for | 39.2 |
| training-compute-optimal-large-language | 67.5 |
| the-claude-3-model-family-opus-sonnet-haiku | 75.2 |
| mixtral-of-experts | 62.5 |
| bloomberggpt-a-large-language-model-for | 39.1 |
| parameter-efficient-sparsity-crafting-from | 75.6 |
| infoentropy-loss-to-mitigate-bias-of-learning | 29.68 |
| bloomberggpt-a-large-language-model-for | 36 |
| llama-3-meets-moe-efficient-upcycling | 86.6 |
| llama-3-meets-moe-efficient-upcycling | 86.0 |
| the-falcon-series-of-open-language-models | 28.0 |
| breaking-the-ceiling-of-the-llm-community-by | 83.54 |
| the-llama-3-herd-of-models | 73.0 |
| llama-open-and-efficient-foundation-language-1 | 63.4 |
| scaling-instruction-finetuned-language-models | 45.5 |
| measuring-massive-multitask-language | 32.4 |
| Model 45 | 83.7 |
| the-falcon-series-of-open-language-models | 70.6 |
| scaling-instruction-finetuned-language-models | 35.9 |
| galactica-a-large-language-model-for-science-1 | 52.6 |
| scaling-instruction-finetuned-language-models | 72.2 |
| claude-3-5-sonnet-model-card-addendum | 88.7 |
| scaling-instruction-finetuned-language-models | 40.5 |
| unifying-language-learning-paradigms | 39.2 |
| measuring-massive-multitask-language | 43.9 |
| scaling-instruction-finetuned-language-models | 39.7 |
| gpt-4-technical-report-1 | 70.0 |
| llama-open-and-efficient-foundation-language-1 | 57.8 |
| the-llama-3-herd-of-models | 73.7 |
| the-claude-3-model-family-opus-sonnet-haiku | 79 |
| leeroo-orchestrator-elevating-llms | 75.9 |
| Model 60 | 71.8 |
| sieve-general-purpose-data-filtering-system | 87 |