HyperAI초신경

Bias Detection on rt-inod-bias

Evaluation Metrics

Best-of

Evaluation Results

Performance of each model on this benchmark

Model	Best-of	Paper Title
GPT-4	0.5	Benchmarking Llama2, Mistral, Gemma and GPT for Factuality, Toxicity, Bias and Propensity for Hallucinations
Gemma	0.41	Benchmarking Llama2, Mistral, Gemma and GPT for Factuality, Toxicity, Bias and Propensity for Hallucinations
Baseline	0.41	Benchmarking Llama2, Mistral, Gemma and GPT for Factuality, Toxicity, Bias and Propensity for Hallucinations
Mistral	0.36	Benchmarking Llama2, Mistral, Gemma and GPT for Factuality, Toxicity, Bias and Propensity for Hallucinations
Llama2	0.34	Benchmarking Llama2, Mistral, Gemma and GPT for Factuality, Toxicity, Bias and Propensity for Hallucinations
