Bbh

Metrics

bbh

bbhbooleanexpressions

bbhcausaljudgement

bbhdateunderstanding

bbhdisambiguationqa

bbhdycklanguages

bbhformalfallacies

bbhgeometricshapes

bbhhyperbaton

bbhlogicaldeductionfiveobjects

bbhlogicaldeductionsevenobjects

bbhlogicaldeductionthreeobjects

bbhmovierecommendation

bbhmultisteparithmetictwo

bbhnavigate

bbhobjectcounting

bbhpenguinsinatable

bbhreasoningaboutcoloredobjects

bbhruinnames

bbhsalienttranslationerrordetection

bbhsnarks

bbhsportsunderstanding

bbhtemporalsequences

bbhtrackingshuffledobjectsfiveobjects

bbhtrackingshuffledobjectssevenobjects

bbhtrackingshuffledobjectsthreeobjects

bbhweboflies

bbhwordsorting

key

model

num

org

rank

time

Results

Performance results of various models on this benchmark

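	bbh	bbhbooleanexpressions	bbhcausaljudgement	bbhdateunderstanding	bbhdisambiguationqa	bbhdycklanguages	bbhformalfallacies	bbhgeometricshapes	bbhhyperbaton	bbhlogicaldeductionfiveobjects	bbhlogicaldeductionsevenobjects	bbhlogicaldeductionthreeobjects	bbhmovierecommendation	bbhmultisteparithmetictwo	bbhnavigate	bbhobjectcounting	bbhpenguinsinatable	bbhreasoningaboutcoloredobjects	bbhruinnames	bbhsalienttranslationerrordetection	bbhsnarks	bbhsportsunderstanding	bbhtemporalsequences	bbhtrackingshuffledobjectsfiveobjects	bbhtrackingshuffledobjectssevenobjects	bbhtrackingshuffledobjectsthreeobjects	bbhweboflies	bbhwordsorting	key	model	num	org	rank	time	Paper Title	Code
Chat	86.7	96.4	72.2	90.0	85.6	63.2	81.2	49.6	99.2	83.6	58.8	98.4	87.2	87.6	98.8	99.6	97.3	97.6	89.2	69.6	90.4	95.2	100.0	100.0	100.0	100.0	100.0	50.8	1	GPT-4	N/A	OpenAI	1	2023/3/15	-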
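
The overall bbh value in the row above looks consistent with an unweighted mean of the 27 per-task scores. The page does not state how the aggregate is computed, so the following is only a quick arithmetic check, assuming the values appear in the same order as the listed metrics (bbhbooleanexpressions through bbhwordsorting). The names `task_scores`, `reported_bbh`, and `unweighted_mean` are illustrative.

```python
# Per-task scores copied from the GPT-4 row, assumed to follow the order of
# the metric list (bbhbooleanexpressions ... bbhwordsorting).
task_scores = [
    96.4, 72.2, 90.0, 85.6, 63.2, 81.2, 49.6, 99.2, 83.6,
    58.8, 98.4, 87.2, 87.6, 98.8, 99.6, 97.3, 97.6, 89.2,
    69.6, 90.4, 95.2, 100.0, 100.0, 100.0, 100.0, 100.0, 50.8,
]

# Overall score reported in the "bbh" column of the same row.
reported_bbh = 86.7


def unweighted_mean(scores):
    """Return the plain arithmetic mean, giving every task equal weight."""
    return sum(scores) / len(scores)


mean_score = unweighted_mean(task_scores)

# Compare the recomputed mean with the reported aggregate at one decimal place.
print(f"tasks: {len(task_scores)}")
print(f"recomputed mean: {mean_score:.1f}")
print(f"matches reported bbh: {round(mean_score, 1) == reported_bbh}")
```

If the check holds, each task contributes equally to the headline number regardless of how many examples it contains, so a weak task such as bbhgeometricshapes (49.6) pulls the aggregate down as much as any other.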


Bbh | SOTA | HyperAI