Instruction Following on IFEval
Metrics
Inst-level loose-accuracy
Inst-level strict-accuracy
Prompt-level loose-accuracy
Prompt-level strict-accuracy
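As a rough illustration of how these four metrics relate, the sketch below aggregates per-instruction pass/fail results. It assumes the usual IFEval convention: instruction-level accuracy is the share of individual instructions followed, and prompt-level accuracy is the share of prompts whose instructions are all followed. Strict and loose differ only in how each response is checked (loose checking also accepts lightly transformed variants of the response), so the same aggregation would apply to either set of checks. The function name and input layout are hypothetical and not taken from the official evaluation code.

```python
from typing import Dict, List


def aggregate_accuracy(results: List[List[bool]]) -> Dict[str, float]:
    """Aggregate per-instruction outcomes into two accuracy scores.

    results[i][j] is True if instruction j of prompt i was followed.
    This layout is an assumption made for illustration only.
    """
    if not results or any(len(r) == 0 for r in results):
        raise ValueError("each prompt needs at least one instruction result")

    total_instructions = sum(len(r) for r in results)
    followed = sum(sum(r) for r in results)       # instructions followed overall
    prompts_ok = sum(all(r) for r in results)     # prompts with every instruction followed

    return {
        "inst_level": 100.0 * followed / total_instructions,
        "prompt_level": 100.0 * prompts_ok / len(results),
    }


# Three prompts with 2, 2, and 1 verifiable instructions respectively.
example = [[True, True], [True, False], [True]]
scores = aggregate_accuracy(example)
print(scores)
```

Because a single missed instruction fails the whole prompt, the prompt-level score can never exceed the instruction-level score computed under the same checks, which matches the ordering seen in the table's columns.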
Results
Performance of various models on the IFEval benchmark (all values are accuracy in percent)
Comparison Table
Model Name | Inst-level loose-accuracy | Inst-level strict-accuracy | Prompt-level loose-accuracy | Prompt-level strict-accuracy |
---|---|---|---|---|
instruction-following-evaluation-for-large | 59.11 | 55.76 | 46.95 | 43.07 |
self-play-with-execution-feedback-improving | 90.4 | 86.7 | 85.6 | 80.2 |
self-play-with-execution-feedback-improving | 88 | 86.1 | 82.3 | 80.2 |
instruction-following-evaluation-for-large | 85.37 | 83.57 | 79.3 | 76.89 |