# Referring Expression Segmentation on RefCOCO
## Metrics

- Overall IoU
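Overall IoU (also called cumulative IoU) is computed by summing intersections and unions over the entire dataset before dividing, so larger objects contribute more than in a per-image mean IoU. A minimal sketch of the computation, assuming predictions and ground truths are binary NumPy masks (function and variable names are illustrative, not from any specific benchmark toolkit):

```python
import numpy as np

def overall_iou(pred_masks, gt_masks):
    """Cumulative IoU: total intersection over total union across the
    whole dataset, rather than averaging per-sample IoU scores."""
    total_inter = 0
    total_union = 0
    for pred, gt in zip(pred_masks, gt_masks):
        pred = pred.astype(bool)
        gt = gt.astype(bool)
        total_inter += np.logical_and(pred, gt).sum()
        total_union += np.logical_or(pred, gt).sum()
    return total_inter / total_union
```

Because the sums are pooled before the ratio is taken, a model that segments large objects well can score higher on overall IoU than on mean IoU even with identical per-image masks.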
## Results

Overall IoU achieved by various models on this benchmark.

### Comparison Table
| Model | Overall IoU (%) |
|---|---|
| multi-label-cluster-discrimination-for-visual | 79.4 |
| hyperseg-towards-universal-visual | 79.0 |
| densely-connected-parameter-efficient-tuning | 75.2 |
| evf-sam-early-vision-language-fusion-for-text | 75.2 |
| multi-task-visual-grounding-with-coarse-to | 74.68 |
| hierarchical-open-vocabulary-universal-image-1 | 73.9 |
| universal-segmentation-at-arbitrary | 73.18 |
| universal-segmentation-at-arbitrary | 72.70 |
| universal-instance-perception-as-object | 72.47 |
| safari-adaptive-sequence-transformer-for | 70.78 |
| groundhog-grounding-large-language-models-to | 70.5 |
| maskris-semantic-distortion-aware-data | 70.26 |
| general-object-foundation-model-for-images | 69.6 |
| polyformer-referring-image-segmentation-as | 69.33 |
| polyformer-referring-image-segmentation-as | 67.64 |
| maskris-semantic-distortion-aware-data | 67.54 |
| mask-grounding-for-referring-image | 66.16 |
| gres-generalized-referring-expression-1 | 66.04 |
| vlt-vision-language-transformer-and-query | 63.53 |
| cris-clip-driven-referring-image-segmentation | 62.27 |
| mail-a-unified-mask-image-language-trimodal | 62.23 |
| lavt-language-aware-vision-transformer-for | 62.14 |
| vision-language-transformer-and-query | 55.50 |
| comprehensive-multi-modal-interactions-for | 52.75 |
| referring-image-segmentation-via-cross-modal-1 | 49.56 |
| bi-directional-relationship-inferring-network | 48.57 |
| see-through-text-grouping-for-referring-image | 48.18 |
| mattnet-modular-attention-network-for | 46.67 |
| refvos-a-closer-look-at-referring-expressions | 44.71 |
| cross-modal-self-attention-network-for | 43.76 |
| improving-referring-image-segmentation-using | - |