
Common Sense Reasoning on RuCoS

Metrics

Average F1
EM (exact match)
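
Both metrics are computed per query and averaged over the dataset: EM scores 1 if the predicted answer string equals any acceptable gold answer after normalization (0 otherwise), while F1 measures token overlap with the best-matching gold answer. The Python sketch below illustrates the ReCoRD-style computation that RuCoS-type tasks typically follow; the normalization details and function names are illustrative assumptions, not the official RuCoS scorer.

```python
import re
from collections import Counter

def normalize(text: str) -> str:
    """Lowercase, strip punctuation, collapse whitespace.
    (Assumed normalization; the official scorer may differ.)"""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())

def exact_match(prediction: str, gold_answers: list[str]) -> float:
    """EM: 1.0 if the prediction equals any gold answer after
    normalization, else 0.0."""
    pred = normalize(prediction)
    return float(any(pred == normalize(g) for g in gold_answers))

def f1_score(prediction: str, gold_answers: list[str]) -> float:
    """Token-level F1 against the best-matching gold answer."""
    pred_tokens = normalize(prediction).split()
    best = 0.0
    for gold in gold_answers:
        gold_tokens = normalize(gold).split()
        overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
        if overlap == 0:
            continue  # no shared tokens with this gold answer
        precision = overlap / len(pred_tokens)
        recall = overlap / len(gold_tokens)
        best = max(best, 2 * precision * recall / (precision + recall))
    return best

# One query with two acceptable entity answers:
print(exact_match("Москва", ["Москва", "город Москва"]))  # 1.0
print(f1_score("город Москва", ["Москва"]))               # ~0.667
```

"Average F1" in the table below is the mean of per-query F1 over all test queries; EM is averaged the same way.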

Results

Performance of various models on the RuCoS benchmark

Comparison Table
Model Name | Average F1 | EM
Model 1 | 0.74 | 0.716
Model 2 | 0.21 | 0.202
Model 3 | 0.29 | 0.29
Model 4 | 0.68 | 0.658
Model 5 | 0.92 | 0.924
Model 6 | 0.73 | 0.716
Model 7 | 0.86 | 0.859
Model 8 | 0.21 | 0.204
Model 9 | 0.67 | 0.665
mt5-a-massively-multilingual-pre-trained-text | 0.57 | 0.562
russiansuperglue-a-russian-language | 0.93 | 0.89
Model 12 | 0.79 | 0.752
unreasonable-effectiveness-of-rule-based | 0.25 | 0.247
Model 14 | 0.23 | 0.224
russiansuperglue-a-russian-language | 0.26 | 0.252
unreasonable-effectiveness-of-rule-based | 0.26 | 0.257
Model 17 | 0.32 | 0.314
Model 18 | 0.35 | 0.347
Model 19 | 0.36 | 0.351
Model 20 | 0.81 | 0.764
Model 21 | 0.22 | 0.218
unreasonable-effectiveness-of-rule-based | 0.25 | 0.247