HyperAI

Instruction Following on IFEval

Metrics

Inst-level loose-accuracy
Inst-level strict-accuracy
Prompt-level loose-accuracy
Prompt-level strict-accuracy
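As a rough illustration of how the two granularities differ, the sketch below computes both from per-instruction pass/fail verdicts. It assumes the definitions in the IFEval paper (Instruction-Following Evaluation for Large Language Models): prompt-level accuracy counts a prompt only when every instruction in it is followed, while instruction-level accuracy pools all instructions. The strict and loose variants differ only in how each verdict is produced (loose also accepts a response if a lightly cleaned-up version of it passes), so the same aggregation applies to both. The function names and the sample data are hypothetical.

```python
from typing import List


def instruction_level_accuracy(results: List[List[bool]]) -> float:
    """Fraction of individual instructions followed, pooled over all prompts."""
    flat = [ok for prompt in results for ok in prompt]
    return sum(flat) / len(flat)


def prompt_level_accuracy(results: List[List[bool]]) -> float:
    """Fraction of prompts in which every instruction was followed."""
    return sum(all(prompt) for prompt in results) / len(results)


# Hypothetical verdicts: three prompts carrying 2, 1 and 3 instructions.
example = [[True, True], [False], [True, False, True]]

inst_acc = instruction_level_accuracy(example)   # 4 of 6 instructions pass
prompt_acc = prompt_level_accuracy(example)      # 1 of 3 prompts fully pass
```

Because one missed instruction fails the whole prompt, prompt-level accuracy can never exceed instruction-level accuracy on the same verdicts, which matches the pattern in the results on this page.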

Results

Performance of different models on this benchmark (all accuracies in %)

| Model Name | Inst-level loose-accuracy | Inst-level strict-accuracy | Prompt-level loose-accuracy | Prompt-level strict-accuracy | Paper Title |
| --- | --- | --- | --- | --- | --- |
| PaLM 2 S | 59.11 | 55.76 | 46.95 | 43.07 | Instruction-Following Evaluation for Large Language Models |
| AutoIF (Llama3 70B) | 90.4 | 86.7 | 85.6 | 80.2 | Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models |
| AutoIF (Qwen2 72B) | 88 | 86.1 | 82.3 | 80.2 | Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models |
| GPT-4 | 85.37 | 83.57 | 79.3 | 76.89 | Instruction-Following Evaluation for Large Language Models |