HyperAI

Referring Expression Segmentation on J-HMDB

Metrics

AP
IoU mean
IoU overall
Precision@0.5
Precision@0.6
Precision@0.7
Precision@0.8
Precision@0.9
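
The page lists these metrics without defining them. The sketch below shows how the IoU-based ones are commonly computed for binary segmentation masks. It assumes the usual conventions from the referring video segmentation literature: IoU mean averages the per-sample IoU, IoU overall divides the total intersection by the total union across all samples, and Precision@K is the fraction of samples whose IoU exceeds K. The function names, the mask representation (nested lists of 0/1), and the handling of empty masks are illustrative assumptions, not the benchmark's official evaluation code. AP is left out because its exact protocol (typically an average over IoU thresholds from 0.50 to 0.95) varies between papers.

```python
from typing import Dict, List, Sequence, Tuple

Mask = Sequence[Sequence[int]]  # binary mask given as rows of 0/1 values


def intersection_union(pred: Mask, gt: Mask) -> Tuple[int, int]:
    """Count intersection and union pixels of two same-sized binary masks."""
    inter = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            if p and g:
                inter += 1
            if p or g:
                union += 1
    return inter, union


def segmentation_metrics(
    pairs: List[Tuple[Mask, Mask]],
    thresholds: Sequence[float] = (0.5, 0.6, 0.7, 0.8, 0.9),
) -> Dict[str, float]:
    """Compute IoU mean, IoU overall, and Precision@K over (pred, gt) pairs."""
    if not pairs:
        raise ValueError("at least one (prediction, ground truth) pair is required")

    total_inter = 0
    total_union = 0
    ious: List[float] = []
    for pred, gt in pairs:
        inter, union = intersection_union(pred, gt)
        total_inter += inter
        total_union += union
        # Assumption: a sample where both masks are empty scores an IoU of 0.
        ious.append(inter / union if union else 0.0)

    metrics = {
        "iou_mean": sum(ious) / len(ious),
        "iou_overall": total_inter / total_union if total_union else 0.0,
    }
    for t in thresholds:
        # Precision@K: share of samples whose IoU is strictly above K.
        metrics[f"precision@{t}"] = sum(iou > t for iou in ious) / len(ious)
    return metrics


if __name__ == "__main__":
    perfect = ([[1, 1], [0, 0]], [[1, 1], [0, 0]])  # IoU = 2/2
    loose = ([[1, 1], [1, 1]], [[1, 0], [0, 0]])    # IoU = 1/4
    print(segmentation_metrics([perfect, loose]))
```

Because IoU overall pools pixels before dividing, it weights large objects more heavily than IoU mean does, which is why the two columns can rank models differently.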

Results

Performance results of various models on this benchmark

| Model name | AP | IoU mean | IoU overall | Precision@0.5 | Precision@0.6 | Precision@0.7 | Precision@0.8 | Precision@0.9 | Paper Title | Repository |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MTTR (w=10) | 0.392 | 0.698 | 0.701 | 0.939 | 0.852 | 0.616 | 0.166 | 0.001 | End-to-End Referring Video Object Segmentation with Multimodal Transformers | |
| Hui et al. | 0.335 | 0.604 | 0.598 | 0.783 | 0.639 | 0.378 | 0.076 | 0.000 | Collaborative Spatial-Temporal Modeling for Language-Queried Video Actor Segmentation | - |
| ClawCraneNet | - | 0.655 | 0.644 | 0.880 | 0.796 | 0.566 | 0.147 | 0.002 | ClawCraneNet: Leveraging Object-level Relation for Text-based Video Segmentation | - |
| ACGA | 0.289 | 0.584 | 0.576 | 0.756 | 0.564 | 0.287 | 0.034 | 0.000 | Asymmetric Cross-Guided Attention Network for Actor and Action Video Segmentation From Natural Language Query | |
| Gavrilyuk et al. | 0.233 | 0.542 | 0.541 | 0.699 | 0.460 | 0.173 | 0.014 | 0.000 | Actor and Action Video Segmentation from a Sentence | |
| CMPC-V | 0.342 | 0.617 | 0.616 | 0.813 | 0.657 | 0.371 | 0.07 | 0.000 | Cross-Modal Progressive Comprehension for Referring Segmentation | |
| AAMN | 0.321 | 0.576 | 0.583 | 0.773 | 0.627 | 0.360 | 0.044 | 0.000 | Actor and Action Modular Network for Text-based Video Segmentation | - |
| SgMg (Video-Swin-B) | 0.450 | 0.725 | 0.737 | 0.972 | 0.917 | 0.714 | 0.225 | 0.003 | Spectrum-guided Multi-granularity Referring Video Object Segmentation | |
| VLIDE | 0.441 | 0.666 | 0.68 | 0.874 | 0.791 | 0.586 | 0.182 | 0.30 | Deeply Interleaved Two-Stream Encoder for Referring Video Segmentation | - |
| VT-Capsule | 0.261 | 0.550 | 0.535 | 0.677 | 0.513 | 0.283 | 0.051 | 0.000 | Visual-Textual Capsule Routing for Text-Based Video Segmentation | - |
| MTTR (w=8) | 0.366 | 0.679 | 0.674 | 0.91 | 0.815 | 0.57 | 0.144 | 0.001 | End-to-End Referring Video Object Segmentation with Multimodal Transformers | |
| Hu et al. | 0.178 | 0.528 | 0.546 | 0.633 | 0.350 | 0.085 | 0.002 | 0.000 | Segmentation from Natural Language Expressions | |
| CMDy | 0.301 | 0.576 | 0.554 | 0.742 | 0.587 | 0.316 | 0.047 | 0.000 | Context Modulated Dynamic Networks for Actor and Action Video Segmentation with Language Queries | - |
| RefVOS | - | 0.568 | 0.606 | 0.731 | 0.62 | 0.392 | 0.088 | 0.0 | Hierarchical interaction network for video object segmentation from referring expressions | - |
| PRPE | 0.294 | - | - | 0.572 | 0.690 | 0.319 | 0.06 | 0.001 | Polar Relative Positional Encoding for Video-Language Segmentation | - |
| SOC (Video-Swin-B) | 0.446 | 0.723 | 0.736 | 0.969 | 0.914 | 0.711 | 0.213 | 0.001 | SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation | |
| SOC (Video-Swin-T) | 0.397 | 0.701 | 0.707 | 0.947 | 0.864 | 0.627 | 0.179 | 0.001 | SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation | |
| HINet | - | 0.627 | 0.652 | 0.819 | 0.736 | 0.542 | 0.168 | 0.4 | Hierarchical interaction network for video object segmentation from referring expressions | - |
| Li et al. | 0.173 | 0.491 | 0.529 | 0.578 | 0.335 | 0.103 | 0.060 | 0.000 | Tracking by Natural Language Specification | - |
| Gavrilyuk et al. (Optical flow) | 0.267 | 0.570 | 0.555 | 0.712 | 0.518 | 0.264 | 0.030 | 0.000 | Actor and Action Video Segmentation from a Sentence | |