HyperAI

Code Generation on MBPP

Metrics

Accuracy

Results

Performance results of various models on this benchmark

| Model Name | Accuracy | Paper Title | Repository |
| --- | --- | --- | --- |
| LLaMA 33B (0-shot) | 30.2 | LLaMA: Open and Efficient Foundation Language Models | - |
| Code Llama - Instruct 13B (3-shot) | 49.4 | Code Llama: Open Foundation Models for Code | - |
| Code Llama 7B (3-shot) | 41.4 | Code Llama: Open Foundation Models for Code | - |
| GPT-3.5 Turbo + Language Agent Tree Search | 81.1 | Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models | - |
| StarCoder 15.5B (Self-Debugging with unit tests + trace) | 53.2 | Teaching Large Language Models to Self-Debug | - |
| code-cushman-001 12B (CodeT) | 55.4 | CodeT: Code Generation with Generated Tests | - |
| LPW (GPT-4o) | 84.8 | Planning-Driven Programming: A Large Language Model Programming Workflow | - |
| CodeGen 16B + Coder-Reviewer | 46.2 | Coder Reviewer Reranking for Code Generation | - |
| Llama 2 34B (0-shot) | 33 | Llama 2: Open Foundation and Fine-Tuned Chat Models | - |
| GPT-3.5 Turbo (0-shot) | 39.8 | INTERVENOR: Prompting the Coding Ability of Large Language Models with the Interactive Chain of Repair | - |
| MapCoder (GPT-4) | 83.1 | MapCoder: Multi-Agent Code Generation for Competitive Problem Solving | - |
| GPT-4 (ChatGPT Plus) | 87.5 | How Does Naming Affect LLMs on Code Analysis Tasks? | - |
| code-davinci-002 175B + CodeT | 67.7 | CodeT: Code Generation with Generated Tests | - |
| GPT-3.5 Turbo (ChatGPT) + AgentCoder | 89.9 | AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation | - |
| Claude | 71.4 | How Does Naming Affect LLMs on Code Analysis Tasks? | - |
| Code Llama - Instruct 7B (3-shot) | 44.4 | Code Llama: Open Foundation Models for Code | - |
| o1-mini + Language Agent Tree Search (Hamming.ai) | 82.3 | Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models | - |
| PaLM Coder 540B | 47 | PaLM: Scaling Language Modeling with Pathways | - |
| Llama 2 7B (0-shot) | 20.8 | Llama 2: Open Foundation and Fine-Tuned Chat Models | - |
| Code Llama - Python 70B (3-shot) | 65.5 | Code Llama: Open Foundation Models for Code | - |