HyperAI
Code Generation on HumanEval
Metrics
Pass@1
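Pass@1 is the percentage of HumanEval problems for which a generated solution passes all of the problem's unit tests. When several samples are drawn per problem, pass@k is commonly computed with the unbiased estimator introduced alongside HumanEval (Chen et al., 2021): with n samples and c passing ones, a problem contributes 1 - C(n-c, k) / C(n, k), which for k = 1 reduces to c / n. The sketch below is a minimal illustration under that assumption. The function names and the sample counts in the usage example are hypothetical, and the entries in the table may differ in sampling and decoding setup, so this does not reproduce any listed result.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem (hypothetical helper).

    n: number of generated samples for the problem
    c: number of those samples that pass all unit tests
    k: evaluation budget, 1 <= k <= n
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # If fewer than k samples fail, every size-k subset contains a passing sample.
    if n - c < k:
        return 1.0
    # 1 - probability that a random size-k subset contains only failing samples.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average per-problem pass@k over (n, c) pairs, as a percentage."""
    return 100.0 * sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Hypothetical counts: three problems, 10 samples each.
    results = [(10, 10), (10, 3), (10, 0)]
    print(round(benchmark_pass_at_k(results, k=1), 2))  # → 43.33
    # With one greedy sample per problem (n = 1), pass@1 is simply the solved fraction.
    print(benchmark_pass_at_k([(1, 1), (1, 0)], k=1))  # → 50.0
```

Using `math.comb` keeps the binomial coefficients exact as integers; for very large n, the product form from the original paper avoids computing huge intermediate values.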
Results
Performance of different models on this benchmark
| Model name | Pass@1 (%) | Paper title |
|---|---|---|
| Llama-3 8B (HPT) | 100 | Hierarchical Prompting Taxonomy: A Universal Evaluation Framework for Large Language Models |
| Claude 3.5 Sonnet (HPT) | 100 | Hierarchical Prompting Taxonomy: A Universal Evaluation Framework for Large Language Models |
| LLMDebugger (OpenAI o1) | 99.4 | Debug like a Human: A Large Language Model Debugger via Verifying Runtime Execution Step-by-step |
| CodeSim (o3-mini) | 98.8 | CODESIM: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging |
| QualityFlow (Sonnet-3.5) | 98.8 | QualityFlow: An Agentic Workflow for Program Synthesis Controlled by LLM Quality Checks |
| Nexus (Claude 3.5 Sonnet) | 98.8 | Nexus: A Lightweight and Scalable Multi-Agent Framework for Complex Tasks Automation |
| LLMDebugger (GPT-4o) | 98.2 | Debug like a Human: A Large Language Model Debugger via Verifying Runtime Execution Step-by-step |
| LPW (GPT-4o) | 98.2 | Planning-Driven Programming: A Large Language Model Programming Workflow |
| CodeSim (GPT-4o and LDB Debugger) | 97.6 | CODESIM: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging |
| MGDebugger (DeepSeek-Coder-V2-Lite) | 96.3 | From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging |
| AgentCoder (GPT-4) | 96.3 | AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation |
| CodeSim (GPT-4o) | 95.1 | CODESIM: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging |
| AFlow (GPT-4o-mini) | 94.7 | AFlow: Automating Agentic Workflow Generation |
| MapCoder (GPT-4) | 93.9 | MapCoder: Multi-Agent Code Generation for Competitive Problem Solving |
| Claude 3.5 Sonnet (0-shot) | 92.0 | - |
| FractalResearch: Pioneer-SWO (GPT-4-turbo) | 91.65 | - |
| L2MAC (GPT-4) | 90.2 | L2MAC: Large Language Model Automatic Computer for Extensive Code Generation |
| GPT-4o (0-shot) | 90.2 | Claude 3.5 Sonnet Model Card Addendum |
| OctorCoder (GPT-4) | 86.6 | OctoPack: Instruction Tuning Code Large Language Models |
| Spark_FP16_medium_v4.1.1 | 85.97 | - |