HyperAI

Code Generation on HumanEval

Metrics

Pass@1
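Pass@1 is the probability that a single generated sample for a problem passes all of that problem's unit tests. In practice it is usually estimated with the unbiased pass@k estimator introduced with the HumanEval benchmark: generate n samples per problem, count the c that pass, and compute 1 − C(n−c, k)/C(n, k). A minimal sketch (the function name and the example numbers are illustrative):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator.

    n: total samples generated for a problem
    c: number of those samples that pass the unit tests
    k: evaluation budget

    Returns the probability that at least one of k samples
    drawn without replacement from the n generations is correct.
    """
    if n - c < k:
        # Fewer incorrect samples than the budget: a correct
        # sample is guaranteed to be drawn.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# For k = 1 the estimator reduces to the fraction of correct samples:
print(pass_at_k(200, 193, 1))  # 193/200 = 0.965
```

The per-problem values are then averaged over the benchmark's problems to give the reported Pass@1 score.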

Results

Performance results of various models on this benchmark

Comparison Table
| Model Name | Pass@1 |
| --- | --- |
| from-code-to-correctness-closing-the-last | 96.3 |
| ldb-a-large-language-model-debugger-via | 98.2 |
| hierarchical-prompting-taxonomy-a-universal | 100 |
| hierarchical-prompting-taxonomy-a-universal | 100 |
| aflow-automating-agentic-workflow-generation | 94.7 |
| codesim-multi-agent-code-generation-and-1 | 97.6 |
| Model 7 | 92.0 |
| codesim-multi-agent-code-generation-and-1 | 98.8 |
| l2mac-large-language-model-automatic-computer | 90.2 |
| agentcoder-multi-agent-based-code-generation | 96.3 |
| codesim-multi-agent-code-generation-and-1 | 95.1 |
| mapcoder-multi-agent-code-generation-for | 93.9 |
| octopack-instruction-tuning-code-large | 86.6 |
| Model 14 | 91.65 |
| ldb-a-large-language-model-debugger-via | 99.4 |
| qualityflow-an-agentic-workflow-for-program | 98.8 |
| nexus-a-lightweight-and-scalable-multi-agent | 98.8 |
| planning-driven-programming-a-large-language | 98.2 |
| Model 19 | 85.97 |
| claude-3-5-sonnet-model-card-addendum | 90.2 |
| metagpt-meta-programming-for-multi-agent | 85.9 |