Dialogue Safety Prediction On Rt Inod
Evaluation Metric
Best-of
Evaluation Results
Performance of each model on this benchmark
Comparison Table
Model Name | Best-of |
---|---|
benchmarking-llama2-mistral-gemma-and-gpt-for | 0.91 |
benchmarking-llama2-mistral-gemma-and-gpt-for | 0.87 |
benchmarking-llama2-mistral-gemma-and-gpt-for | 0.91 |
benchmarking-llama2-mistral-gemma-and-gpt-for | 0.86 |
benchmarking-llama2-mistral-gemma-and-gpt-for | 0.92 |