TruthfulQA

Metrics

% info

% true

% true (gpt-judge)

bleu

bleurt

llm_model

mc1

mc2

model_url

organization

parameters

release_date

rouge

updated_time

Results

Performance results of various models on this benchmark

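	% info	% true	% true (gpt-judge)	bleu	bleurt	llm_model	mc1	mc2	model_url	organization	parameters	release_date	rouge	updated_time	Paper title	Code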
API	97.55	20.44	20.56	-17.18	-0.54	GPT-3 175B	0.21	0.33	https://openai.com/index/gpt-3-apps/	OpenAI	175B	2020.5.28	-17.17	2022.5.8	-
