HyperAI

Visual Reasoning on CLEVRER

Metrics

Average-per ques.
Counterfactual-per opt.
Counterfactual-per ques.
Descriptive
Explanatory-per opt.
Explanatory-per ques.
Predictive-per opt.
Predictive-per ques.
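
The "per opt." and "per ques." suffixes are not defined on this page. Assuming they follow the usual CLEVRER convention (per-option accuracy scores each answer option independently, while per-question accuracy counts a multiple-choice question as correct only when every one of its options is judged correctly), the difference can be sketched as below. The function names and the toy data are hypothetical and exist only for illustration.

```python
from typing import List


def per_option_accuracy(preds: List[List[bool]], labels: List[List[bool]]) -> float:
    """Fraction of individual answer options that were classified correctly."""
    total = sum(len(question) for question in labels)
    correct = sum(
        p == l
        for q_preds, q_labels in zip(preds, labels)
        for p, l in zip(q_preds, q_labels)
    )
    return correct / total


def per_question_accuracy(preds: List[List[bool]], labels: List[List[bool]]) -> float:
    """Fraction of questions whose options were ALL classified correctly."""
    correct = sum(q_preds == q_labels for q_preds, q_labels in zip(preds, labels))
    return correct / len(labels)


# Toy data (hypothetical): two questions with 3 and 4 options each.
labels = [[True, False, True], [False, False, True, True]]
preds = [[True, False, True], [False, True, True, True]]  # one option wrong in question 2

print(per_option_accuracy(preds, labels))
print(per_question_accuracy(preds, labels))
```

Under this assumption a single wrong option costs one option in the per-option score but the whole question in the per-question score, which would explain why every per-question figure in the results is at or below its per-option counterpart.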

Results

Performance results of various models on this benchmark

Comparison Table
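| Model Name | Average-per ques. | Counterfactual-per opt. | Counterfactual-per ques. | Descriptive | Explanatory-per opt. | Explanatory-per ques. | Predictive-per opt. | Predictive-per ques. |
|---|---|---|---|---|---|---|---|---|
| Model 1 | 73.3 | 79.96 | 50.89 | 88.37 | 89.19 | 81.56 | 84.83 | 72.38 |
| Model 2 | 73.1 | 79.6 | 49.77 | 88.79 | 89.16 | 81.24 | 84.95 | 72.6 |
| Model 3 | 67.57 | 81.01 | 51.07 | 74.98 | 90.81 | 75.62 | 82.9 | 68.61 |
| Model 4 | 60.25 | 66.65 | 25.89 | 81.39 | 83.42 | 72.78 | 78.5 | 60.95 |
| Model 5 | 88.05 | 91.12 | 74.89 | 95.04 | 98.18 | 94.98 | 93.11 | 87.28 |
| Model 6 | 75.52 | 80.38 | 46.52 | 90.7 | 89.58 | 82.82 | 90.52 | 82.03 |
| Model 7 | 90.24 | 94.83 | 84.29 | 93.4 | 96.3 | 91.94 | 95.68 | 91.35 |
| Model 8 | 88.27 | 91.42 | 75.61 | 94.01 | 98.47 | 95.99 | 93.49 | 87.48 |
| Model 9 | 88.71 | 91.25 | 75.35 | 94.77 | 98.25 | 95.46 | 94.16 | 89.25 |
| Model 10 | 69.65 | 74.05 | 42.23 | 88.08 | 87.64 | 79.6 | 82.86 | 68.7 |
| think-before-you-simulate-symbolic-reasoning | 95.24 | 96.61 | 90.72 | 96.46 | 99.94 | 99.81 | 93.96 | 93.96 |
| Model 12 | 91.14 | 92.97 | 80.05 | 95.76 | 98.88 | 96.98 | 95.69 | 91.75 |
| Model 13 | 69.21 | 78.08 | 44.6 | 89.95 | 95.94 | 91.98 | 74.73 | 50.34 |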
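
The reported Average-per ques. values appear to be the unweighted mean of the four per-question columns (Counterfactual-per ques., Descriptive, Explanatory-per ques., Predictive-per ques.). This is an inference from the numbers and is not stated on the page. A minimal sketch that checks it on a few rows copied from the table follows; the helper name and the 0.01 tolerance are assumptions chosen for illustration.

```python
def average_per_question(counterfactual: float, descriptive: float,
                         explanatory: float, predictive: float) -> float:
    """Unweighted mean of the four per-question accuracies (assumed definition)."""
    return (counterfactual + descriptive + explanatory + predictive) / 4


# (reported average, counterfactual, descriptive, explanatory, predictive),
# all per-question values taken from the comparison table.
rows = {
    "Model 1": (73.3, 50.89, 88.37, 81.56, 72.38),
    "Model 4": (60.25, 25.89, 81.39, 72.78, 60.95),
    "Model 7": (90.24, 84.29, 93.4, 91.94, 91.35),
    "think-before-you-simulate-symbolic-reasoning": (95.24, 90.72, 96.46, 99.81, 93.96),
}

for name, (reported, *parts) in rows.items():
    computed = average_per_question(*parts)
    # Allow for the two-decimal rounding used in the table.
    matches = abs(computed - reported) < 0.01
    print(f"{name}: reported={reported}, computed={computed:.4f}, matches={matches}")
```

The per-option columns do not enter this average, so a model can rank differently on Average-per ques. than its per-option scores alone would suggest.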