Question Answering on DROP

Metrics

Accuracy

Results

Performance results of various models on this benchmark

Comparison Table
Model Name                                Accuracy
large-language-models-can-self-improve    78.2
large-language-models-can-self-improve    83
large-language-models-can-self-improve    71.7
large-language-models-can-self-improve    60
large-language-models-can-self-improve    70.6
large-language-models-can-self-improve    76.2