HyperAI超神経

Dialogue Safety Prediction On Rt Inod

Evaluation Metric

Best-of

Evaluation Results

Performance results of each model on this benchmark

Model	Best-of	Paper Title
Baseline	0.92	Benchmarking Llama2, Mistral, Gemma and GPT for Factuality, Toxicity, Bias and Propensity for Hallucinations
Gemma	0.91	Benchmarking Llama2, Mistral, Gemma and GPT for Factuality, Toxicity, Bias and Propensity for Hallucinations
GPT-4	0.91	Benchmarking Llama2, Mistral, Gemma and GPT for Factuality, Toxicity, Bias and Propensity for Hallucinations
Mistral	0.87	Benchmarking Llama2, Mistral, Gemma and GPT for Factuality, Toxicity, Bias and Propensity for Hallucinations
Llama2	0.86	Benchmarking Llama2, Mistral, Gemma and GPT for Factuality, Toxicity, Bias and Propensity for Hallucinations
