HyperAI

Code Generation on MBPP

Metrics

Accuracy

Results

Performance results of various models on this benchmark

Comparison Table
Model Name                                      Accuracy
llama-open-and-efficient-foundation-language-1  30.2
code-llama-open-foundation-models-for-code      49.4
code-llama-open-foundation-models-for-code      41.4
language-agent-tree-search-unifies-reasoning    81.1
teaching-large-language-models-to-self-debug    53.2
codet-code-generation-with-generated-tests      55.4
planning-driven-programming-a-large-language    84.8
coder-reviewer-reranking-for-code-generation    46.2
llama-2-open-foundation-and-fine-tuned-chat     33
intervenor-prompt-the-coding-ability-of-large   39.8
mapcoder-multi-agent-code-generation-for        83.1
chatgpt-for-software-security-exploring-the     87.5
codet-code-generation-with-generated-tests      67.7
agentcoder-multi-agent-based-code-generation    89.9
chatgpt-for-software-security-exploring-the     71.4
code-llama-open-foundation-models-for-code      44.4
language-agent-tree-search-unifies-reasoning    82.3
palm-scaling-language-modeling-with-pathways-1  47
llama-2-open-foundation-and-fine-tuned-chat     20.8
code-llama-open-foundation-models-for-code      65.5
codesim-multi-agent-code-generation-and-1       90.7
llama-open-and-efficient-foundation-language-1  22
lever-learning-to-verify-language-to-code       68.9
starcoder-may-the-source-be-with-you            35
intervenor-prompt-the-coding-ability-of-large   45.4
qualityflow-an-agentic-workflow-for-program     94.2
when-llm-based-code-generation-meets-the        83.8±0.6
the-claude-3-model-family-opus-sonnet-haiku     80.4
branch-train-mix-mixing-expert-llms-into-a      39.4
coder-reviewer-reranking-for-code-generation    48.3
coder-reviewer-reranking-for-code-generation    66.4
natural-language-to-code-translation-with       58.2
teaching-large-language-models-to-self-debug    70.8
coder-reviewer-reranking-for-code-generation    26.1
code-llama-open-foundation-models-for-code      52.2
palm-scaling-language-modeling-with-pathways-1  36.8
code-llama-open-foundation-models-for-code      56.2
coder-reviewer-reranking-for-code-generation    66.9
codegeex-a-pre-trained-model-for-code           24.4
Model 40                                        90.0
deepseek-coder-when-the-large-language-model    66
mistral-7b                                      47.5
code-llama-open-foundation-models-for-code      62.4
coder-reviewer-reranking-for-code-generation    44.1
chatgpt-for-software-security-exploring-the     76.2
palm-2-technical-report-1                       50
agentcoder-multi-agent-based-code-generation    91.8
code-llama-open-foundation-models-for-code      57
aflow-automating-agentic-workflow-generation    83.4
llama-open-and-efficient-foundation-language-1  37.7
chatgpt-for-software-security-exploring-the     82
deepseek-coder-when-the-large-language-model    60.6
deepseek-coder-when-the-large-language-model    70.8
code-llama-open-foundation-models-for-code      49
deepseek-coder-when-the-large-language-model    70
the-claude-3-model-family-opus-sonnet-haiku     79.4
code-llama-open-foundation-models-for-code      62.2
incoder-a-generative-model-for-code-infilling   19.4
codet-code-generation-with-generated-tests      49.5
from-code-to-correctness-closing-the-last       80.8
starcoder-may-the-source-be-with-you            52.7
deepseek-coder-when-the-large-language-model    65.4
parameter-efficient-sparsity-crafting-from      41.4
coder-reviewer-reranking-for-code-generation    47.3
mixtral-of-experts                              60.7
llama-2-open-foundation-and-fine-tuned-chat     45
the-claude-3-model-family-opus-sonnet-haiku     86.4
codet-code-generation-with-generated-tests      61.9
teaching-large-language-models-to-self-debug    61.4
code-llama-open-foundation-models-for-code      47.6
code-llama-open-foundation-models-for-code      61.2
llama-2-open-foundation-and-fine-tuned-chat     30.6
parameter-efficient-sparsity-crafting-from      48.6
chatgpt-for-software-security-exploring-the     83.2
branch-train-mix-mixing-expert-llms-into-a      42.6
textbooks-are-all-you-need-ii-phi-1-5           43.5
wizardcoder-empowering-code-large-language      51.8
coder-reviewer-reranking-for-code-generation    63
mapcoder-multi-agent-code-generation-for        89.7
code-llama-open-foundation-models-for-code      47
teaching-large-language-models-to-self-debug    72.8
deepseek-coder-when-the-large-language-model    46.2
mapcoder-multi-agent-code-generation-for        93.2
starcoder-2-and-the-stack-v2-the-next           66.2
teaching-large-language-models-to-self-debug    67.6
starcoder-may-the-source-be-with-you            49
code-llama-open-foundation-models-for-code      55
llama-open-and-efficient-foundation-language-1  17.7
coder-reviewer-reranking-for-code-generation    26.7
codet-code-generation-with-generated-tests      34.4
intervenor-prompt-the-coding-ability-of-large   69.8
deepseek-coder-when-the-large-language-model    49.4
teaching-large-language-models-to-self-debug    80.2
teaching-large-language-models-to-self-debug    47.2
deepseek-coder-when-the-large-language-model    80
coder-reviewer-reranking-for-code-generation    24.4