Code Generation on WebApp1K React
Metrics
pass@1
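pass@1 here is the standard functional-correctness metric: the fraction of benchmark problems for which a generated solution passes all of the problem's test cases. As a reference, below is a minimal sketch of the unbiased pass@k estimator from Chen et al. (2021), which reduces to a plain pass rate at k = 1; the function and the sample data are illustrative and not part of this benchmark's harness.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021).

    n: total samples generated for a problem
    c: samples that pass all unit tests
    k: number of samples scored
    """
    if n - c < k:
        return 1.0  # every size-k subset contains a passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# Benchmark-level pass@1 is the mean of per-problem estimates;
# with one sample per problem (n = 1, k = 1) it is just the pass rate.
results = [(1, 1), (1, 0), (1, 1), (1, 1)]  # illustrative (n, c) pairs
score = sum(pass_at_k(n, c, 1) for n, c in results) / len(results)
print(f"pass@1 = {score:.4f}")  # 0.7500
```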
Results
Performance results of various models on this benchmark
Model name | pass@1 | Paper Title | Repository |
---|---|---|---|
o1-preview | 0.952 | A Case Study of Web App Coding with OpenAI Reasoning Models | |
o1-mini | 0.939 | A Case Study of Web App Coding with OpenAI Reasoning Models | |
gpt-4o-2024-08-06 | 0.885 | Insights from Benchmarking Frontier Language Models on Web App Code Generation | |
claude-3.5-sonnet | 0.8808 | Insights from Benchmarking Frontier Language Models on Web App Code Generation | |
deepseek-v2.5 | 0.834 | A Case Study of Web App Coding with OpenAI Reasoning Models | |
mistral-large-2 | 0.7804 | Insights from Benchmarking Frontier Language Models on Web App Code Generation | |
deepseek-coder-v2-instruct | 0.7002 | Insights from Benchmarking Frontier Language Models on Web App Code Generation | |
llama-v3p1-405b-instruct | 0.302 | Insights from Benchmarking Frontier Language Models on Web App Code Generation | |