Question Answering on HotpotQA
Metrics
ANS-EM
ANS-F1
JOINT-EM
JOINT-F1
SUP-EM
SUP-F1
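The six metrics above score the answer span (ANS), the supporting facts (SUP), and both together (JOINT). The following is a minimal per-example sketch, assuming the convention used by the public HotpotQA evaluation script: joint precision and recall are the products of the answer and supporting-fact values, and joint EM is the product of the two EM scores. The function names, the simplified answer normalization, and the `(title, sentence_index)` representation of supporting facts are illustrative assumptions, and special handling of yes/no answers is omitted.

```python
import re
import string
from collections import Counter

_PUNCT = set(string.punctuation)


def normalize_answer(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = "".join(ch for ch in text.lower() if ch not in _PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def f1(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    total = precision + recall
    return 2 * precision * recall / total if total > 0 else 0.0


def answer_scores(prediction, gold):
    """Return (em, precision, recall) over normalized answer tokens."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    em = float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return em, 0.0, 0.0
    return em, overlap / len(pred_tokens), overlap / len(gold_tokens)


def support_scores(predicted, gold):
    """Return (em, precision, recall) over sets of (title, sentence_index)."""
    pred_set, gold_set = set(predicted), set(gold)
    hits = len(pred_set & gold_set)
    precision = hits / len(pred_set) if pred_set else 0.0
    recall = hits / len(gold_set) if gold_set else 0.0
    return float(pred_set == gold_set), precision, recall


def score_example(pred_answer, gold_answer, pred_support, gold_support):
    """Compute all six metrics for a single question."""
    ans_em, ans_p, ans_r = answer_scores(pred_answer, gold_answer)
    sup_em, sup_p, sup_r = support_scores(pred_support, gold_support)
    return {
        "ANS-EM": ans_em,
        "ANS-F1": f1(ans_p, ans_r),
        "SUP-EM": sup_em,
        "SUP-F1": f1(sup_p, sup_r),
        # Joint scores multiply the answer and supporting-fact components.
        "JOINT-EM": ans_em * sup_em,
        "JOINT-F1": f1(ans_p * sup_p, ans_r * sup_r),
    }


if __name__ == "__main__":
    scores = score_example(
        "the Eiffel Tower",
        "Eiffel Tower",
        [("Paris", 0), ("Eiffel Tower", 1)],
        [("Eiffel Tower", 1), ("Eiffel Tower", 2)],
    )
    for name, value in scores.items():
        print(f"{name}: {value:.3f}")
```

Under this convention the joint scores are computed per question and then averaged, so a JOINT-F1 value in the table is generally not the product of that row's ANS-F1 and SUP-F1 columns.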
Results
Performance results of various models on this benchmark
Comparison table
Model name | ANS-EM | ANS-F1 | JOINT-EM | JOINT-F1 | SUP-EM | SUP-F1 |
---|---|---|---|---|---|---|
Model 1 | 0.598 | 0.727 | 0.345 | 0.602 | 0.480 | 0.749 |
Model 2 | 0.284 | 0.386 | 0.086 | 0.245 | 0.147 | 0.472 |
Model 3 | 0.307 | 0.402 | 0.000 | 0.000 | 0.000 | 0.000 |
Model 4 | 0.354 | 0.463 | 0.000 | 0.255 | 0.001 | 0.432 |
Model 5 | 0.299 | 0.391 | 0.083 | 0.258 | 0.132 | 0.497 |
hopretriever-retrieve-hops-over-wikipedia-to | 0.671 | 0.799 | 0.432 | 0.706 | 0.574 | 0.835 |
Model 7 | 0.394 | 0.514 | 0.133 | 0.370 | 0.242 | 0.585 |
Model 8 | 0.598 | 0.727 | 0.345 | 0.602 | 0.480 | 0.749 |
Model 9 | 0.335 | 0.427 | 0.110 | 0.284 | 0.156 | 0.493 |
answering-complex-open-domain-questions | 0.379 | 0.486 | 0.180 | 0.391 | 0.307 | 0.642 |
Model 11 | 0.608 | 0.739 | 0.380 | 0.639 | 0.531 | 0.793 |
Model 12 | 0.617 | 0.746 | 0.368 | 0.629 | 0.500 | 0.772 |
Model 13 | 0.482 | 0.613 | 0.306 | 0.530 | 0.483 | 0.739 |
Model 14 | 0.074 | 0.121 | 0.000 | 0.011 | 0.000 | 0.078 |
Model 15 | 0.433 | 0.538 | 0.145 | 0.391 | 0.219 | 0.596 |
chain-of-skills-a-configurable-model-for-open | 0.674 | 0.801 | 0.457 | 0.717 | 0.613 | 0.853 |
beam-retrieval-general-end-to-end-retrieval | 0.727 | 0.850 | 0.505 | 0.775 | 0.663 | 0.901 |
Model 18 | 0.588 | 0.717 | 0.293 | 0.568 | 0.416 | 0.725 |
Model 19 | 0.601 | 0.730 | 0.359 | 0.617 | 0.500 | 0.769 |
Model 20 | 0.671 | 0.799 | 0.431 | 0.698 | 0.572 | 0.826 |
Model 21 | 0.662 | 0.793 | 0.420 | 0.700 | 0.573 | 0.840 |
hotpotqa-a-dataset-for-diverse-explainable | 0.589 | 0.716 | 0.345 | 0.598 | 0.480 | 0.757 |
Model 23 | 0.560 | 0.689 | 0.292 | 0.553 | 0.441 | 0.730 |
dynamically-fused-graph-network-for-multi-hop | - | - | - | 0.5982 | - | - |
Model 25 | 0.369 | 0.460 | 0.115 | 0.291 | 0.153 | 0.468 |
Model 26 | 0.597 | 0.714 | 0.379 | 0.623 | 0.510 | 0.774 |
Model 27 | 0.490 | 0.608 | 0.271 | 0.496 | 0.417 | 0.700 |
Model 28 | 0.529 | 0.648 | 0.312 | 0.548 | 0.428 | 0.720 |
multi-hop-reading-comprehension-through | 0.300 | 0.407 | 0.000 | 0.000 | 0.000 | 0.000 |
Model 30 | 0.601 | 0.730 | 0.350 | 0.609 | 0.485 | 0.759 |
Model 31 | 0.603 | 0.731 | 0.359 | 0.617 | 0.499 | 0.768 |
Model 32 | 0.273 | 0.365 | 0.074 | 0.236 | 0.122 | 0.488 |
big-bird-transformers-for-longer-sequences | - | 0.755 | - | 0.736 | - | 0.891 |
Model 34 | 0.418 | 0.531 | 0.170 | 0.392 | 0.263 | 0.573 |
transformer-xh-multi-evidence-reasoning-with | 0.516 | 0.641 | 0.261 | 0.513 | 0.409 | 0.714 |
Model 36 | 0.581 | 0.710 | 0.000 | 0.000 | 0.000 | 0.000 |
Model 37 | 0.579 | 0.699 | 0.372 | 0.607 | 0.510 | 0.768 |
answering-while-summarizing-multi-task | 0.287 | 0.381 | 0.087 | 0.231 | 0.142 | 0.444 |
Model 39 | 0.596 | 0.724 | 0.345 | 0.601 | 0.479 | 0.748 |
Model 40 | 0.648 | 0.778 | 0.410 | 0.678 | 0.561 | 0.818 |
hierarchical-graph-network-for-multi-hop | 0.567 | 0.692 | 0.356 | 0.599 | 0.500 | 0.764 |
Model 42 | 0.646 | 0.778 | 0.411 | 0.670 | 0.557 | 0.812 |
Model 43 | 0.581 | 0.711 | 0.000 | 0.000 | 0.000 | 0.000 |
multi-hop-paragraph-retrieval-for-open-domain | 0.306 | 0.403 | 0.109 | 0.270 | 0.167 | 0.473 |
Model 45 | 0.617 | 0.746 | 0.368 | 0.629 | 0.500 | 0.772 |
Model 46 | 0.358 | 0.453 | 0.115 | 0.304 | 0.160 | 0.512 |
Model 47 | 0.615 | 0.746 | 0.362 | 0.624 | 0.503 | 0.772 |
a-simple-yet-strong-pipeline-for-hotpotqa | 0.555 | 0.675 | 0.329 | 0.562 | 0.456 | 0.730 |
learning-to-retrieve-reasoning-paths-over-1 | 0.600 | 0.730 | 0.354 | 0.612 | 0.491 | 0.764 |
Model 50 | 0.360 | 0.474 | 0.000 | 0.000 | 0.000 | 0.000 |
Model 51 | 0.300 | 0.407 | 0.000 | 0.000 | 0.000 | 0.000 |
retrieve-rerank-read-then-iterate-answering | 0.663 | 0.791 | 0.428 | 0.696 | 0.569 | 0.832 |
Model 53 | 0.475 | 0.606 | 0.049 | 0.334 | 0.076 | 0.448 |
Model 54 | 0.655 | 0.786 | 0.409 | 0.689 | 0.559 | 0.831 |
retrieve-rerank-read-then-iterate-answering | 0.657 | 0.782 | 0.421 | 0.686 | 0.559 | 0.821 |
Model 56 | 0.421 | 0.517 | 0.247 | 0.429 | 0.371 | 0.598 |
Model 57 | 0.604 | 0.732 | 0.380 | 0.629 | 0.520 | 0.771 |
Model 58 | 0.620 | 0.753 | 0.354 | 0.630 | 0.499 | 0.778 |
multi-paragraph-reasoning-with-knowledge | 0.277 | 0.372 | 0.070 | 0.247 | 0.127 | 0.472 |
answering-complex-open-domain-questions-with | 0.623 | 0.753 | 0.418 | 0.666 | 0.575 | 0.809 |
Model 61 | 0.080 | 0.221 | 0.000 | 0.000 | 0.000 | 0.000 |
Model 62 | 0.670 | 0.795 | 0.444 | 0.708 | 0.594 | 0.843 |
Model 63 | 0.630 | 0.754 | 0.404 | 0.662 | 0.546 | 0.800 |
cognitive-graph-for-multi-hop-reading | 0.371 | 0.489 | 0.124 | 0.349 | 0.228 | 0.577 |
Model 65 | 0.289 | 0.391 | 0.041 | 0.209 | 0.080 | 0.406 |
hotpotqa-a-dataset-for-diverse-explainable | 0.240 | 0.329 | 0.019 | 0.162 | 0.039 | 0.377 |
ddrqa-dynamic-document-reranking-for-open | 0.625 | 0.759 | 0.360 | 0.639 | 0.510 | 0.789 |
revealing-the-importance-of-semantic | 0.453 | 0.573 | 0.251 | 0.476 | 0.387 | 0.708 |
Model 69 | 0.582 | 0.709 | 0.310 | 0.569 | 0.429 | 0.713 |
adaptive-information-seeking-for-open-domain | 0.675 | 0.805 | 0.449 | 0.720 | 0.612 | 0.860 |
Model 71 | 0.236 | 0.320 | 0.033 | 0.175 | 0.056 | 0.400 |
Model 72 | 0.523 | 0.648 | 0.330 | 0.561 | 0.490 | 0.747 |