HyperAI초신경

Dialogue Safety Prediction On Rt Inod

Evaluation Metric

Best-of

Evaluation Results

Performance of each model on this benchmark.

Model	Best-of	Paper Title
Baseline	0.92	Benchmarking Llama2, Mistral, Gemma and GPT for Factuality, Toxicity, Bias and Propensity for Hallucinations
Gemma	0.91	Benchmarking Llama2, Mistral, Gemma and GPT for Factuality, Toxicity, Bias and Propensity for Hallucinations
GPT-4	0.91	Benchmarking Llama2, Mistral, Gemma and GPT for Factuality, Toxicity, Bias and Propensity for Hallucinations
Mistral	0.87	Benchmarking Llama2, Mistral, Gemma and GPT for Factuality, Toxicity, Bias and Propensity for Hallucinations
Llama2	0.86	Benchmarking Llama2, Mistral, Gemma and GPT for Factuality, Toxicity, Bias and Propensity for Hallucinations

