HelloBench
Evaluation metrics
average
chat-rescaled score
heuristic text generation-rescaled score
llm_model
model_url
open-ended qa-rescaled score
organization
parameters
release_date
summarization-rescaled score
text completion-rescaled score
updated_time
Evaluation results
Performance results for each model on this benchmark
Comparison table
Model name | average | chat-rescaled score | heuristic text generation-rescaled score | llm_model | model_url | open-ended qa-rescaled score | organization | parameters | release_date | summarization-rescaled score | text completion-rescaled score | updated_time |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Model 1 | 48.55 | 42.88 | 47.87 | GPT-4o-2024-08-06 | https://platform.openai.com/docs/guides | 54.82 | OpenAI | N/A | 2024/8/6 | 29.71 | 67.49 | 2024/9/24 |
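
The page does not say how the average column is derived. For the single row listed, the figure is consistent with an unweighted mean of the five rescaled category scores. The sketch below checks that arithmetic. The aggregation rule is an assumption inferred from this one row, and the names `category_scores` and `unweighted_mean` are hypothetical, chosen for illustration.

```python
# Rescaled category scores copied from the GPT-4o-2024-08-06 row of the table.
category_scores = {
    "chat": 42.88,
    "heuristic text generation": 47.87,
    "open-ended qa": 54.82,
    "summarization": 29.71,
    "text completion": 67.49,
}


def unweighted_mean(scores):
    """Arithmetic mean of the category scores.

    Assumed aggregation: the source does not state how `average` is computed.
    """
    return sum(scores.values()) / len(scores)


average = unweighted_mean(category_scores)
print(round(average, 2))  # 48.55, matching the reported average for this row
```

If later rows are added, the same check can be repeated per row to see whether the unweighted-mean assumption continues to hold.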