Visual Reasoning on CLEVRER
Metrics
Average-per ques.
Counterfactual-per opt.
Counterfactual-per ques.
Descriptive
Explanatory-per opt.
Explanatory-per ques.
Predictive-per opt.
Predictive-per ques.
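The "per opt." and "per ques." suffixes distinguish two ways of scoring multiple-choice questions. As a minimal sketch, assuming each multiple-choice question carries several options that are each judged correct or incorrect (the usual CLEVRER convention), per-option accuracy scores every option independently, while per-question accuracy credits a question only when all of its options are judged correctly. The function names and toy data below are hypothetical and serve only as illustration.

```python
from typing import List


def per_option_accuracy(predictions: List[List[bool]], labels: List[List[bool]]) -> float:
    """Percentage of individual options whose predicted label matches the ground truth."""
    correct = 0
    total = 0
    for pred, gold in zip(predictions, labels):
        for p, g in zip(pred, gold):
            correct += int(p == g)
            total += 1
    return 100.0 * correct / total


def per_question_accuracy(predictions: List[List[bool]], labels: List[List[bool]]) -> float:
    """Percentage of questions for which every option is labeled correctly."""
    correct = sum(int(pred == gold) for pred, gold in zip(predictions, labels))
    return 100.0 * correct / len(labels)


# Hypothetical toy data: three questions with 3, 2, and 3 options.
labels = [[True, False, False], [False, True], [True, True, False]]
predictions = [[True, False, False], [False, False], [True, True, True]]

print(per_option_accuracy(predictions, labels))
print(per_question_accuracy(predictions, labels))
```

In the toy data, 6 of 8 options are right but only 1 of 3 questions is fully right, which is why per-question scores in the table are never higher than the matching per-option scores.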
Results
Accuracy (%) of each listed model on CLEVRER, broken down by question type
Comparison Table
Model Name | Average-per ques. | Counterfactual-per opt. | Counterfactual-per ques. | Descriptive | Explanatory-per opt. | Explanatory-per ques. | Predictive-per opt. | Predictive-per ques. |
---|---|---|---|---|---|---|---|---|
Model 1 | 73.3 | 79.96 | 50.89 | 88.37 | 89.19 | 81.56 | 84.83 | 72.38 |
Model 2 | 73.1 | 79.6 | 49.77 | 88.79 | 89.16 | 81.24 | 84.95 | 72.6 |
Model 3 | 67.57 | 81.01 | 51.07 | 74.98 | 90.81 | 75.62 | 82.9 | 68.61 |
Model 4 | 60.25 | 66.65 | 25.89 | 81.39 | 83.42 | 72.78 | 78.5 | 60.95 |
Model 5 | 88.05 | 91.12 | 74.89 | 95.04 | 98.18 | 94.98 | 93.11 | 87.28 |
Model 6 | 75.52 | 80.38 | 46.52 | 90.7 | 89.58 | 82.82 | 90.52 | 82.03 |
Model 7 | 90.24 | 94.83 | 84.29 | 93.4 | 96.3 | 91.94 | 95.68 | 91.35 |
Model 8 | 88.27 | 91.42 | 75.61 | 94.01 | 98.47 | 95.99 | 93.49 | 87.48 |
Model 9 | 88.71 | 91.25 | 75.35 | 94.77 | 98.25 | 95.46 | 94.16 | 89.25 |
Model 10 | 69.65 | 74.05 | 42.23 | 88.08 | 87.64 | 79.6 | 82.86 | 68.7 |
think-before-you-simulate-symbolic-reasoning | 95.24 | 96.61 | 90.72 | 96.46 | 99.94 | 99.81 | 93.96 | 93.96 |
Model 12 | 91.14 | 92.97 | 80.05 | 95.76 | 98.88 | 96.98 | 95.69 | 91.75 |
Model 13 | 69.21 | 78.08 | 44.6 | 89.95 | 95.94 | 91.98 | 74.73 | 50.34 |
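The "Average-per ques." column appears to be the unweighted mean of the four per-question scores (Descriptive, Explanatory, Predictive, Counterfactual). This is an inference from the numbers in the table, not something the page states. A short check on three rows copied from the table, with hypothetical dictionary keys and helper name:

```python
# Per-question scores copied from three rows of the comparison table.
rows = {
    "Model 1": {
        "average": 73.3, "descriptive": 88.37, "explanatory": 81.56,
        "predictive": 72.38, "counterfactual": 50.89,
    },
    "Model 3": {
        "average": 67.57, "descriptive": 74.98, "explanatory": 75.62,
        "predictive": 68.61, "counterfactual": 51.07,
    },
    "think-before-you-simulate-symbolic-reasoning": {
        "average": 95.24, "descriptive": 96.46, "explanatory": 99.81,
        "predictive": 93.96, "counterfactual": 90.72,
    },
}

QUESTION_TYPES = ("descriptive", "explanatory", "predictive", "counterfactual")


def mean_per_question(scores: dict) -> float:
    """Unweighted mean of the four per-question accuracies (assumed definition)."""
    return sum(scores[q] for q in QUESTION_TYPES) / len(QUESTION_TYPES)


for name, scores in rows.items():
    recomputed = mean_per_question(scores)
    print(f"{name}: reported {scores['average']:.2f}, recomputed {recomputed:.2f}")
```

For these rows the recomputed mean agrees with the reported average to within 0.01. Small differences elsewhere in the table would be consistent with the reported values being rounded from higher-precision scores.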