HyperAI

Common Sense Reasoning on RuCoS

Metrics

Average F1
EM
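RuCoS is scored with exact match (EM) and token-level F1, as in SQuAD-style reading comprehension. A minimal sketch of how these two metrics are typically computed (the normalization and function names here are illustrative, not the official evaluation script):

```python
import re
from collections import Counter

def normalize(text):
    # Lowercase, replace punctuation with spaces, collapse whitespace
    # (SQuAD-style answer normalization; works for Cyrillic via \w).
    text = re.sub(r"[^\w\s]", " ", text.lower(), flags=re.UNICODE)
    return " ".join(text.split())

def exact_match(prediction, gold):
    # 1.0 if the normalized strings are identical, else 0.0.
    return float(normalize(prediction) == normalize(gold))

def token_f1(prediction, gold):
    # Harmonic mean of token-level precision and recall
    # over the multiset of overlapping tokens.
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

Per-example scores are averaged over the dataset to give the table values below; when an example has several acceptable gold answers, the maximum score over them is usually taken.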

Results

Performance results of various models on this benchmark

Comparison Table

| Model Name | Average F1 | EM |
| --- | --- | --- |
| Model 1 | 0.74 | 0.716 |
| Model 2 | 0.21 | 0.202 |
| Model 3 | 0.29 | 0.29 |
| Model 4 | 0.68 | 0.658 |
| Model 5 | 0.92 | 0.924 |
| Model 6 | 0.73 | 0.716 |
| Model 7 | 0.86 | 0.859 |
| Model 8 | 0.21 | 0.204 |
| Model 9 | 0.67 | 0.665 |
| mt5-a-massively-multilingual-pre-trained-text | 0.57 | 0.562 |
| russiansuperglue-a-russian-language | 0.93 | 0.89 |
| Model 12 | 0.79 | 0.752 |
| unreasonable-effectiveness-of-rule-based | 0.25 | 0.247 |
| Model 14 | 0.23 | 0.224 |
| russiansuperglue-a-russian-language | 0.26 | 0.252 |
| unreasonable-effectiveness-of-rule-based | 0.26 | 0.257 |
| Model 17 | 0.32 | 0.314 |
| Model 18 | 0.35 | 0.347 |
| Model 19 | 0.36 | 0.351 |
| Model 20 | 0.81 | 0.764 |
| Model 21 | 0.22 | 0.218 |
| unreasonable-effectiveness-of-rule-based | 0.25 | 0.247 |