Code Generation on WebApp1K-Duo-React
Evaluation Metric
pass@1
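
The page does not say how pass@1 is computed for this benchmark. As a rough sketch, assuming the commonly used unbiased pass@k estimator (which for k = 1 reduces to the fraction of generated samples that pass all tests), the metric could be calculated as follows. The function names `pass_at_k` and `benchmark_pass_at_1` and the `(n, c)` input layout are illustrative assumptions, not part of the benchmark's tooling.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem.

    n: number of samples generated for the problem
    c: number of those samples that pass all tests
    k: the k in pass@k (k = 1 for this leaderboard)
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: any draw of k contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_1(results: list[tuple[int, int]]) -> float:
    """Average pass@1 over problems; each entry is (n, c) for one problem.

    This input layout is a hypothetical one chosen for illustration.
    """
    return sum(pass_at_k(n, c, 1) for n, c in results) / len(results)
```

Under this assumption, a score such as 0.679 would mean that roughly 68% of sampled solutions passed their tests, averaged over the benchmark's problems.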
Evaluation Results
Performance of each model on this benchmark
Model | pass@1 | Paper Title | Repository |
---|---|---|---|
claude-3-5-sonnet | 0.679 | A Case Study of Web App Coding with OpenAI Reasoning Models | - |
mistral-large-2 | 0.449 | A Case Study of Web App Coding with OpenAI Reasoning Models | - |
deepseek-v2.5 | 0.49 | A Case Study of Web App Coding with OpenAI Reasoning Models | - |
o1-preview | 0.652 | A Case Study of Web App Coding with OpenAI Reasoning Models | - |
o1-mini | 0.667 | A Case Study of Web App Coding with OpenAI Reasoning Models | - |
gpt-4o-2024-08-06 | 0.531 | A Case Study of Web App Coding with OpenAI Reasoning Models | - |