Calm
Metrics
0-shot cot
0-shot icl
1-shot icl
3-shot icl
average
basic
cn
doubt
ef
en
ignore
llm_model
manual cot
model_url
organization
parameters
release_date
robustness
std
updated_time
Results
Performance results of various models on this benchmark
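| | 0-shot cot | 0-shot icl | 1-shot icl | 3-shot icl | average | basic | cn | doubt | ef | en | ignore | llm_model | manual cot | model_url | organization | parameters | release_date | robustness | std | updated_time | Paper Title | Code |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| API | 54.5 | 51.9 | 54.1 | 60.0 | 56.8 | 54.4 | 52.6 | 54.6 | 52.1 | 58.4 | 53.9 | GPT-4 | 75.4 | https://openai.com/product/gpt-4 | OpenAI | N/A | 2023/3/14 | 83.7 | 9.9 | 2024/5/1 | - | |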