Referring Expression Segmentation on J-HMDB
Metrics
AP
IoU mean
IoU overall
Precision@0.5
Precision@0.6
Precision@0.7
Precision@0.8
Precision@0.9
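The page lists these metrics without defining them. The sketch below shows one common way they are computed for referring segmentation benchmarks. It rests on assumptions that should be checked against each paper's evaluation code: "IoU overall" is total intersection over total union across all samples, "IoU mean" is the average of per-sample IoU, Precision@K is the fraction of samples whose IoU exceeds K, and AP averages that precision over thresholds 0.50 to 0.95 in steps of 0.05. The names `intersection_and_union` and `evaluate` are hypothetical helpers, and masks are represented as flattened 0/1 sequences for simplicity.

```python
from typing import Dict, List, Sequence, Tuple

Mask = Sequence[int]  # flattened binary mask, 1 = foreground (illustrative format)


def intersection_and_union(pred: Mask, gt: Mask) -> Tuple[int, int]:
    """Count pixels in the intersection and union of two binary masks."""
    inter = sum(1 for p, g in zip(pred, gt) if p and g)
    union = sum(1 for p, g in zip(pred, gt) if p or g)
    return inter, union


def evaluate(preds: List[Mask], gts: List[Mask]) -> Dict[str, float]:
    """Compute the leaderboard metrics under the assumed definitions."""
    if not preds or len(preds) != len(gts):
        raise ValueError("preds and gts must be non-empty and the same length")

    ious = []
    total_inter = 0
    total_union = 0
    for pred, gt in zip(preds, gts):
        inter, union = intersection_and_union(pred, gt)
        total_inter += inter
        total_union += union
        # Assumption: a sample with an empty union scores 0.
        ious.append(inter / union if union else 0.0)

    n = len(ious)

    def precision_at(threshold: float) -> float:
        # Fraction of samples whose IoU exceeds the threshold.
        return sum(iou > threshold for iou in ious) / n

    results = {
        "IoU mean": sum(ious) / n,
        "IoU overall": total_inter / total_union if total_union else 0.0,
    }
    for k in (0.5, 0.6, 0.7, 0.8, 0.9):
        results[f"Precision@{k}"] = precision_at(k)

    # Assumption: AP is precision averaged over thresholds 0.50, 0.55, ..., 0.95.
    thresholds = [t / 100 for t in range(50, 100, 5)]
    results["AP"] = sum(precision_at(t) for t in thresholds) / len(thresholds)
    return results


if __name__ == "__main__":
    preds = [[1, 1, 0, 0], [1, 1, 1, 0]]
    gts = [[1, 1, 0, 0], [1, 0, 0, 0]]
    for name, value in evaluate(preds, gts).items():
        print(f"{name}: {value:.3f}")
```

Under these definitions, "IoU overall" weights large objects more heavily than "IoU mean", which is one reason the two columns can rank models differently.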
Results
Performance results of various models on this benchmark
Comparison table
Model name | AP | IoU mean | IoU overall | Precision@0.5 | Precision@0.6 | Precision@0.7 | Precision@0.8 | Precision@0.9 |
---|---|---|---|---|---|---|---|---|
end-to-end-referring-video-object | 0.392 | 0.698 | 0.701 | 0.939 | 0.852 | 0.616 | 0.166 | 0.001 |
collaborative-spatial-temporal-modeling-for | 0.335 | 0.604 | 0.598 | 0.783 | 0.639 | 0.378 | 0.076 | 0.000 |
clawcranenet-leveraging-object-level-relation | - | 0.655 | 0.644 | 0.880 | 0.796 | 0.566 | 0.147 | 0.002 |
asymmetric-cross-guided-attention-network-for | 0.289 | 0.584 | 0.576 | 0.756 | 0.564 | 0.287 | 0.034 | 0.000 |
actor-and-action-video-segmentation-from-a | 0.233 | 0.542 | 0.541 | 0.699 | 0.460 | 0.173 | 0.014 | 0.000 |
cross-modal-progressive-comprehension-for | 0.342 | 0.617 | 0.616 | 0.813 | 0.657 | 0.371 | 0.07 | 0.000 |
actor-and-action-modular-network-for-text | 0.321 | 0.576 | 0.583 | 0.773 | 0.627 | 0.360 | 0.044 | 0.000 |
spectrum-guided-multi-granularity-referring | 0.450 | 0.725 | 0.737 | 0.972 | 0.917 | 0.714 | 0.225 | 0.003 |
deeply-interleaved-two-stream-encoder-for | 0.441 | 0.666 | 0.68 | 0.874 | 0.791 | 0.586 | 0.182 | 0.30 |
visual-textual-capsule-routing-for-text-based | 0.261 | 0.550 | 0.535 | 0.677 | 0.513 | 0.283 | 0.051 | 0.000 |
end-to-end-referring-video-object | 0.366 | 0.679 | 0.674 | 0.91 | 0.815 | 0.57 | 0.144 | 0.001 |
segmentation-from-natural-language | 0.178 | 0.528 | 0.546 | 0.633 | 0.350 | 0.085 | 0.002 | 0.000 |
context-modulated-dynamic-networks-for-actor | 0.301 | 0.576 | 0.554 | 0.742 | 0.587 | 0.316 | 0.047 | 0.000 |
hierarchical-interaction-network-for-video | - | 0.568 | 0.606 | 0.731 | 0.62 | 0.392 | 0.088 | 0.0 |
polar-relative-positional-encoding-for-video | 0.294 | - | - | 0.572 | 0.690 | 0.319 | 0.06 | 0.001 |
soc-semantic-assisted-object-cluster-for | 0.446 | 0.723 | 0.736 | 0.969 | 0.914 | 0.711 | 0.213 | 0.001 |
soc-semantic-assisted-object-cluster-for | 0.397 | 0.701 | 0.707 | 0.947 | 0.864 | 0.627 | 0.179 | 0.001 |
hierarchical-interaction-network-for-video | - | 0.627 | 0.652 | 0.819 | 0.736 | 0.542 | 0.168 | 0.4 |
tracking-by-natural-language-specification | 0.173 | 0.491 | 0.529 | 0.578 | 0.335 | 0.103 | 0.060 | 0.000 |
actor-and-action-video-segmentation-from-a | 0.267 | 0.570 | 0.555 | 0.712 | 0.518 | 0.264 | 0.030 | 0.000 |
referring-segmentation-in-images-and-videos | - | 0.581 | 0.628 | 0.764 | 0.625 | 0.389 | 0.09 | 0.001 |