Calm

Metrics

0-shot cot

0-shot icl

1-shot icl

3-shot icl

average

basic

doubt

ignore

llm_model

manual cot

model_url

organization

parameters

release_date

robustness

std

updated_time

Results

Performance results of various models on this benchmark

												llm_model	manual cot	model_url	organization	parameters	release_date	robustness	std	updated_time	Paper Title	Code
API	54.5	51.9	54.1	60.0	56.8	54.4	52.6	54.6	52.1	58.4	53.9	GPT-4	75.4	https://openai.com/product/gpt-4	OpenAI	N/A	2023/3/14	83.7	9.9	2024/5/1	-


HyperAI


