
Referring Expression Segmentation On A2D

Metrics

AP
IoU mean
IoU overall
Precision@0.5
Precision@0.6
Precision@0.7
Precision@0.8
Precision@0.9
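
The entries above are standard mask-overlap metrics. Below is a minimal sketch, not the official A2D-Sentences evaluation code, of how mean IoU, overall IoU, and Precision@K are commonly computed from predicted and ground-truth segmentation masks; the function name `evaluate`, the boolean NumPy inputs, and the strict `>` threshold comparison are illustrative assumptions. AP is usually reported as the average precision over IoU thresholds from 0.50 to 0.95 in steps of 0.05 and is omitted from this sketch.

```python
# Sketch of the overlap metrics (assumed conventions, not the official script).
import numpy as np

def evaluate(pred_masks, gt_masks, thresholds=(0.5, 0.6, 0.7, 0.8, 0.9)):
    """pred_masks, gt_masks: iterables of boolean arrays of identical shape,
    one pair per referred object in the test set."""
    ious = []
    total_inter, total_union = 0, 0
    for pred, gt in zip(pred_masks, gt_masks):
        inter = np.logical_and(pred, gt).sum()
        union = np.logical_or(pred, gt).sum()
        ious.append(inter / union if union > 0 else 1.0)
        total_inter += inter
        total_union += union
    ious = np.array(ious)
    results = {
        "IoU mean": ious.mean(),                  # average of per-sample IoUs
        "IoU overall": total_inter / total_union, # intersections/unions pooled over the whole set
    }
    for t in thresholds:
        # fraction of samples whose IoU exceeds the threshold
        results[f"Precision@{t}"] = (ious > t).mean()
    return results
```

Overall IoU pools pixel counts across the test set and is therefore dominated by large objects, while mean IoU weights every sample equally; this is why the two columns in the table below can rank methods differently.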

Results

Performance results of various models on this benchmark

| Model name | AP | IoU mean | IoU overall | Precision@0.5 | Precision@0.6 | Precision@0.7 | Precision@0.8 | Precision@0.9 | Paper Title | Repository |
|---|---|---|---|---|---|---|---|---|---|---|
| CMDy | 0.333 | 0.531 | 0.623 | 0.607 | 0.525 | 0.405 | 0.235 | 0.045 | Context Modulated Dynamic Networks for Actor and Action Video Segmentation with Language Queries | - |
| Gavrilyuk et al. (Optical flow) | 0.215 | 0.426 | 0.551 | 0.500 | 0.376 | 0.231 | 0.094 | 0.004 | Actor and Action Video Segmentation from a Sentence | |
| VLIDE | 0.469 | 0.598 | 0.714 | 0.702 | 0.663 | 0.585 | 0.428 | 0.151 | Deeply Interleaved Two-Stream Encoder for Referring Video Segmentation | - |
| ReferFormer (Video-Swin-B) | 0.550 | 0.703 | 0.786 | 0.831 | 0.804 | 0.741 | 0.579 | 0.212 | Language as Queries for Referring Video Object Segmentation | |
| Hui et al. | 0.399 | 0.561 | 0.662 | 0.654 | 0.589 | 0.497 | 0.333 | 0.091 | Collaborative Spatial-Temporal Modeling for Language-Queried Video Actor Segmentation | - |
| MANET | 0.471 | 0.632 | 0.726 | 0.734 | 0.682 | 0.579 | 0.389 | 0.132 | Multi-Attention Network for Compressed Video Referring Object Segmentation | |
| ACGA | 0.274 | 0.490 | 0.601 | 0.557 | 0.459 | 0.319 | 0.160 | 0.020 | Asymmetric Cross-Guided Attention Network for Actor and Action Video Segmentation From Natural Language Query | |
| ClawCraneNet | - | 0.655 | 0.644 | 0.704 | 0.677 | 0.617 | 0.489 | 0.171 | ClawCraneNet: Leveraging Object-level Relation for Text-based Video Segmentation | - |
| CMPC-V (R2D) | 0.351 | 0.515 | 0.649 | 0.590 | 0.527 | 0.434 | 0.284 | 0.068 | Cross-Modal Progressive Comprehension for Referring Segmentation | |
| SOC (Video-Swin-B) | 0.573 | 0.725 | 0.807 | 0.851 | 0.827 | 0.765 | 0.607 | 0.252 | SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation | |
| MTTR (w=8) | 0.447 | 0.618 | 0.702 | 0.721 | 0.684 | 0.607 | 0.456 | 0.164 | End-to-End Referring Video Object Segmentation with Multimodal Transformers | |
| RefVOS | - | 0.599 | 0.599 | 0.495 | - | - | - | 0.064 | RefVOS: A Closer Look at Referring Expressions for Video Object Segmentation | |
| AAMN | 0.396 | 0.552 | 0.617 | 0.681 | 0.629 | 0.523 | 0.296 | 0.029 | Actor and Action Modular Network for Text-based Video Segmentation | - |
| CMPC-V (I3D) | 0.404 | 0.573 | 0.653 | 0.655 | 0.592 | 0.506 | 0.342 | 0.098 | Cross-Modal Progressive Comprehension for Referring Segmentation | |
| Locater | 0.465 | 0.597 | 0.690 | 0.709 | 0.640 | 0.525 | 0.351 | 0.101 | Local-Global Context Aware Transformer for Language-Guided Video Segmentation | |
| VT-Capsule | 0.303 | 0.460 | 0.568 | 0.526 | 0.450 | 0.345 | 0.207 | 0.036 | Visual-Textual Capsule Routing for Text-Based Video Segmentation | - |
| Hu et al. | 0.132 | 0.350 | 0.474 | 0.348 | 0.236 | 0.133 | 0.033 | 0.000 | Segmentation from Natural Language Expressions | |
| Gavrilyuk et al. | 0.198 | 0.421 | 0.536 | 0.475 | 0.347 | 0.211 | 0.080 | 0.002 | Actor and Action Video Segmentation from a Sentence | |
| MTTR (w=10) | 0.461 | 0.640 | 0.720 | 0.754 | 0.712 | 0.638 | 0.485 | 0.169 | End-to-End Referring Video Object Segmentation with Multimodal Transformers | |
| CMSA+CFSA | - | 0.432 | 0.618 | 0.487 | 0.431 | 0.358 | 0.231 | 0.052 | Referring Segmentation in Images and Videos with Cross-Modal Self-Attention Network | - |