HyperAI

Code Generation on RES-Q

Metrics

pass@1
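
The page names pass@1 without defining it. As a rough sketch, pass@k is commonly estimated per task from n sampled solutions, of which c pass the tests, and then averaged over tasks; with k = 1 this reduces to the fraction of attempts that pass. The function names and the sample data below are hypothetical, and whether this benchmark uses one attempt per task or several is an assumption, not something stated here.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one task: n samples drawn, c of them correct."""
    if n - c < k:
        # Every size-k subset must contain at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_1(results: list[tuple[int, int]]) -> float:
    """Mean pass@1 over tasks, as a percentage.

    Each item is (n_samples, n_correct) for one task.
    """
    scores = [pass_at_k(n, c, 1) for n, c in results]
    return 100.0 * sum(scores) / len(scores)


# Hypothetical data: four tasks, one attempt each, one task solved.
example = [(1, 1), (1, 0), (1, 0), (1, 0)]
print(benchmark_pass_at_1(example))  # 25.0
```

Under this reading, a score of 30.0 would mean that roughly 30% of tasks were solved on the first attempt.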

Results

Performance results of various models on this benchmark

Comparison Table
Model Name                                      pass@1
res-q-evaluating-code-editing-large-language    30.0
res-q-evaluating-code-editing-large-language    58.0
res-q-evaluating-code-editing-large-language    20.0
res-q-evaluating-code-editing-large-language    18.0
res-q-evaluating-code-editing-large-language    30.0
res-q-evaluating-code-editing-large-language    36.0
res-q-evaluating-code-editing-large-language    46.0
res-q-evaluating-code-editing-large-language    29.0
res-q-evaluating-code-editing-large-language    37.0