Referring Expression Segmentation On A2D
Metrics
AP
IoU mean
IoU overall
Precision@0.5
Precision@0.6
Precision@0.7
Precision@0.8
Precision@0.9
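These are the standard evaluation measures for referring segmentation on A2D: IoU mean averages the per-sample mask IoU, IoU overall divides the total intersection area by the total union area over the whole test set, and Precision@K is the fraction of samples whose IoU exceeds the threshold K. The sketch below illustrates how these quantities can be computed from binary masks; the function names are hypothetical, and treating AP as the mean precision over IoU thresholds 0.50:0.05:0.95 is an assumption about the protocol rather than something stated on this page.

```python
import numpy as np

def mask_intersection_union(pred, gt):
    """Intersection and union pixel counts for two binary masks (H x W arrays)."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter, union

def evaluate(pred_masks, gt_masks, thresholds=(0.5, 0.6, 0.7, 0.8, 0.9)):
    """Compute IoU mean, IoU overall, Precision@K, and an assumed mAP@0.50:0.05:0.95."""
    ious = []
    total_inter, total_union = 0, 0
    for pred, gt in zip(pred_masks, gt_masks):
        inter, union = mask_intersection_union(pred, gt)
        ious.append(inter / union if union > 0 else 1.0)  # empty-vs-empty treated as a perfect match
        total_inter += inter
        total_union += union
    ious = np.array(ious)

    mean_iou = ious.mean()                    # "IoU mean": average of per-sample IoU
    overall_iou = total_inter / total_union   # "IoU overall": total intersection / total union
    precision_at = {t: (ious > t).mean() for t in thresholds}
    # Assumption: "AP" reported as mean precision over IoU thresholds 0.50:0.05:0.95
    ap = np.mean([(ious > t).mean() for t in np.arange(0.50, 0.96, 0.05)])

    return {"AP": ap, "IoU mean": mean_iou, "IoU overall": overall_iou,
            **{f"Precision@{t}": p for t, p in precision_at.items()}}
```

Whether masks are compared per frame or per annotated sample, and how empty ground-truth masks are handled, depends on the benchmark's official evaluation script.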
Results
Performance results of the different models on this benchmark
Comparison table
Model name | AP | IoU mean | IoU overall | Precision@0.5 | Precision@0.6 | Precision@0.7 | Precision@0.8 | Precision@0.9 |
---|---|---|---|---|---|---|---|---|
context-modulated-dynamic-networks-for-actor | 0.333 | 0.531 | 0.623 | 0.607 | 0.525 | 0.405 | 0.235 | 0.045 |
actor-and-action-video-segmentation-from-a | 0.215 | 0.426 | 0.551 | 0.500 | 0.376 | 0.231 | 0.094 | 0.004 |
deeply-interleaved-two-stream-encoder-for | 0.469 | 0.598 | 0.714 | 0.702 | 0.663 | 0.585 | 0.428 | 0.151 |
language-as-queries-for-referring-video | 0.550 | 0.703 | 0.786 | 0.831 | 0.804 | 0.741 | 0.579 | 0.212 |
collaborative-spatial-temporal-modeling-for | 0.399 | 0.561 | 0.662 | 0.654 | 0.589 | 0.497 | 0.333 | 0.091 |
multi-attention-network-for-compressed-video | 0.471 | 0.632 | 0.726 | 0.734 | 0.682 | 0.579 | 0.389 | 0.132 |
asymmetric-cross-guided-attention-network-for | 0.274 | 0.490 | 0.601 | 0.557 | 0.459 | 0.319 | 0.160 | 0.020 |
clawcranenet-leveraging-object-level-relation | - | 0.655 | 0.644 | 0.704 | 0.677 | 0.617 | 0.489 | 0.171 |
cross-modal-progressive-comprehension-for | 0.351 | 0.515 | 0.649 | 0.590 | 0.527 | 0.434 | 0.284 | 0.068 |
soc-semantic-assisted-object-cluster-for | 0.573 | 0.725 | 0.807 | 0.851 | 0.827 | 0.765 | 0.607 | 0.252 |
end-to-end-referring-video-object | 0.447 | 0.618 | 0.702 | 0.721 | 0.684 | 0.607 | 0.456 | 0.164 |
refvos-a-closer-look-at-referring-expressions | - | 0.599 | 0.599 | 0.495 | - | - | - | 0.064 |
actor-and-action-modular-network-for-text | 0.396 | 0.552 | 0.617 | 0.681 | 0.629 | 0.523 | 0.296 | 0.029 |
cross-modal-progressive-comprehension-for | 0.404 | 0.573 | 0.653 | 0.655 | 0.592 | 0.506 | 0.342 | 0.098 |
local-global-context-aware-transformer-for | 0.465 | 0.597 | 0.690 | 0.709 | 0.640 | 0.525 | 0.351 | 0.101 |
visual-textual-capsule-routing-for-text-based | 0.303 | 0.460 | 0.568 | 0.526 | 0.450 | 0.345 | 0.207 | 0.036 |
segmentation-from-natural-language | 0.132 | 0.350 | 0.474 | 0.348 | 0.236 | 0.133 | 0.033 | 0.000 |
actor-and-action-video-segmentation-from-a | 0.198 | 0.421 | 0.536 | 0.475 | 0.347 | 0.211 | 0.080 | 0.002 |
end-to-end-referring-video-object | 0.461 | 0.640 | 0.720 | 0.754 | 0.712 | 0.638 | 0.485 | 0.169 |
referring-segmentation-in-images-and-videos | - | 0.432 | 0.618 | 0.487 | 0.431 | 0.358 | 0.231 | 0.052 |
tracking-by-natural-language-specification | 0.163 | 0.354 | 0.515 | 0.387 | 0.290 | 0.175 | 0.066 | 0.001 |
polar-relative-positional-encoding-for-video | 0.388 | 0.529 | 0.661 | 0.634 | 0.579 | 0.483 | 0.322 | 0.083 |
soc-semantic-assisted-object-cluster-for | 0.504 | 0.669 | 0.747 | 0.790 | 0.756 | 0.687 | 0.535 | 0.195 |
spectrum-guided-multi-granularity-referring | 0.585 | 0.720 | 0.799 | 0.843 | 0.822 | 0.767 | 0.617 | 0.259 |
modeling-motion-with-multi-modal-features-for | 0.419 | 0.558 | 0.673 | 0.645 | 0.597 | 0.523 | 0.375 | 0.130 |
hierarchical-interaction-network-for-video | - | 0.497 | 0.672 | 0.578 | 0.534 | 0.456 | 0.311 | 0.093 |
hierarchical-interaction-network-for-video | - | 0.529 | 0.679 | 0.611 | 0.559 | 0.486 | 0.342 | 0.120 |