HyperAI超神経

Instruction Following on IFEval

Evaluation Metrics

Inst-level loose-accuracy
Inst-level strict-accuracy
Prompt-level loose-accuracy
Prompt-level strict-accuracy
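The page lists the four metrics without defining them. As a rough sketch, assuming the usual IFEval convention: instruction-level accuracy counts each verifiable instruction separately, while prompt-level accuracy counts a prompt as correct only when every instruction in it is followed. Strict scoring checks the raw response, whereas loose scoring also accepts lightly transformed variants of it. The function names and the sample data below are hypothetical and only illustrate the aggregation step, not the official scorer.

```python
from typing import List


def instruction_level_accuracy(results: List[List[bool]]) -> float:
    """Fraction of individual instructions that were followed."""
    flat = [ok for prompt in results for ok in prompt]
    return sum(flat) / len(flat)


def prompt_level_accuracy(results: List[List[bool]]) -> float:
    """Fraction of prompts whose instructions were all followed."""
    return sum(all(prompt) for prompt in results) / len(results)


# Hypothetical outcomes: one inner list per prompt, one bool per instruction.
example = [[True, True], [True, False], [True]]

inst_acc = instruction_level_accuracy(example)   # 4 of 5 instructions followed
prompt_acc = prompt_level_accuracy(example)      # 2 of 3 prompts fully followed
print(f"{inst_acc:.2f} {prompt_acc:.2f}")
```

Under this reading, the same pair of functions would be applied twice, once to the strict per-instruction outcomes and once to the loose ones, which gives the four columns reported on this page.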

Evaluation Results

Performance results of each model on this benchmark

Comparison Table
| Model | Inst-level loose-accuracy | Inst-level strict-accuracy | Prompt-level loose-accuracy | Prompt-level strict-accuracy |
| --- | --- | --- | --- | --- |
| instruction-following-evaluation-for-large | 59.11 | 55.76 | 46.95 | 43.07 |
| self-play-with-execution-feedback-improving | 90.4 | 86.7 | 85.6 | 80.2 |
| self-play-with-execution-feedback-improving | 88 | 86.1 | 82.3 | 80.2 |
| instruction-following-evaluation-for-large | 85.37 | 83.57 | 79.3 | 76.89 |