Instruction Following on IFEval
Evaluation metrics
Inst-level loose-accuracy
Inst-level strict-accuracy
Prompt-level loose-accuracy
Prompt-level strict-accuracy
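The four metrics combine two choices: the unit being scored (single instruction or whole prompt) and how the response is checked (strict or loose). As a rough illustration, the Python sketch below shows only the aggregation step. It assumes each prompt has already been reduced to a list of booleans, one per verifiable instruction, and the function names and sample data are hypothetical, not taken from the benchmark's code. The same aggregation would apply to either strict or loose results; as generally described for IFEval, strict checks the raw response, while loose also accepts a response if a lightly normalized variant of it passes.

```python
from typing import List


def prompt_level_accuracy(results: List[List[bool]]) -> float:
    """Share of prompts in which every instruction was followed."""
    return sum(all(per_prompt) for per_prompt in results) / len(results)


def inst_level_accuracy(results: List[List[bool]]) -> float:
    """Share of individual instructions followed, pooled over all prompts."""
    flat = [ok for per_prompt in results for ok in per_prompt]
    return sum(flat) / len(flat)


# Hypothetical data: three prompts carrying 2, 2, and 1 instructions.
results = [[True, True], [True, False], [True]]

prompt_acc = prompt_level_accuracy(results)  # 2 of 3 prompts fully satisfied
inst_acc = inst_level_accuracy(results)      # 4 of 5 instructions satisfied
```

Because one missed instruction fails the whole prompt, prompt-level accuracy can never exceed instruction-level accuracy on the same results, which matches the ordering seen in every row of the table.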
Evaluation results
Performance of each model on this benchmark
Comparison table
Model name | Inst-level loose-accuracy | Inst-level strict-accuracy | Prompt-level loose-accuracy | Prompt-level strict-accuracy |
---|---|---|---|---|
instruction-following-evaluation-for-large | 59.11 | 55.76 | 46.95 | 43.07 |
self-play-with-execution-feedback-improving | 90.4 | 86.7 | 85.6 | 80.2 |
self-play-with-execution-feedback-improving | 88 | 86.1 | 82.3 | 80.2 |
instruction-following-evaluation-for-large | 85.37 | 83.57 | 79.3 | 76.89 |