HyperAI超神经

Referring Expression Segmentation On Refer 1

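Evaluation metrics

F: contour accuracy (boundary F-measure between the predicted and ground-truth mask contours)
J: region similarity (Jaccard index, i.e. intersection over union of the predicted and ground-truth masks)
J&F: mean of J and F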
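The three metrics can be sketched per frame with NumPy. This is a minimal illustration under stated assumptions, not the official DAVIS / Ref-YouTube-VOS evaluation code. The boundary extraction (4-neighbour test), the square dilation and the fixed pixel tolerance `tol` are simplifications; the official toolkit derives its tolerance from the image diagonal. All function names here are hypothetical. J and F are first averaged over all evaluated frames and expressions, and J&F is the mean of those two averages.

```python
import numpy as np


def jaccard(pred: np.ndarray, gt: np.ndarray) -> float:
    """Region similarity J: intersection over union of two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:  # both masks empty: treat as a perfect match
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)


def boundary(mask: np.ndarray) -> np.ndarray:
    """Mask pixels with at least one 4-neighbour outside the mask (simplified)."""
    m = mask.astype(bool)
    p = np.pad(m, 1, constant_values=False)
    interior = p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:]
    return m & ~interior


def dilate(mask: np.ndarray, r: int) -> np.ndarray:
    """Binary dilation with a (2r+1) x (2r+1) square structuring element."""
    h, w = mask.shape
    p = np.pad(mask.astype(bool), r, constant_values=False)
    out = np.zeros((h, w), dtype=bool)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out |= p[dy:dy + h, dx:dx + w]
    return out


def f_measure(pred: np.ndarray, gt: np.ndarray, tol: int = 2) -> float:
    """Contour accuracy F: boundary F-measure with an assumed pixel tolerance."""
    bp, bg = boundary(pred), boundary(gt)
    n_p, n_g = bp.sum(), bg.sum()
    if n_p == 0 and n_g == 0:
        return 1.0
    if n_p == 0 or n_g == 0:
        return 0.0
    precision = (bp & dilate(bg, tol)).sum() / n_p  # predicted contour near GT
    recall = (bg & dilate(bp, tol)).sum() / n_g     # GT contour near prediction
    if precision + recall == 0:
        return 0.0
    return float(2 * precision * recall / (precision + recall))


def j_and_f(j_scores, f_scores) -> float:
    """J&F: mean of the averaged J and the averaged F."""
    return (float(np.mean(j_scores)) + float(np.mean(f_scores))) / 2


# Usage: a 4x4 ground-truth square vs. a 4x6 prediction that overshoots it.
gt = np.zeros((10, 10), dtype=bool)
gt[2:6, 2:6] = True
pred = np.zeros((10, 10), dtype=bool)
pred[2:6, 2:8] = True
j = jaccard(pred, gt)            # 16 / 24 ≈ 0.667
f = f_measure(pred, gt, tol=1)   # precision 0.75, recall 1.0 -> ≈ 0.857
print(j, f, j_and_f([j], [f]))
```

The example shows why both metrics are reported: a prediction can be fairly close to the ground-truth contour (high F) while still covering extra area (lower J).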

Evaluation results

Performance of each model on this benchmark (scores in %, higher is better):

| Model | F | J | J&F | Paper |
|---|---|---|---|---|
| MTTR (w=12) | 56.64 | 54.00 | 55.32 | End-to-End Referring Video Object Segmentation with Multimodal Transformers |
| ReferFormer (ResNet-50) | 56.6 | 54.8 | 55.6 | Language as Queries for Referring Video Object Segmentation |
| ReferDINO (Swin-B) | 71.5 | 67.0 | 69.3 | ReferDINO: Referring Video Object Segmentation with Visual Grounding Foundations |
| MLRLSA | 48.43 | 50.96 | 49.70 | Multi-Level Representation Learning With Semantic Alignment for Referring Video Object Segmentation |
| GLEE-Pro | 72.9 | 68.2 | 70.6 | General Object Foundation Model for Images and Videos at Scale |
| URVOS | 50.8 | 47.0 | 48.9 | URVOS: Unified Referring Video Object Segmentation Network with a Large-Scale Benchmark |
| MPG-SAM 2 | 76.1 | 71.7 | 73.9 | MPG-SAM 2: Adapting SAM 2 with Mask Priors and Global Context for Referring Video Object Segmentation |
| HTR (Pre-training) | 68.9 | 65.3 | 67.1 | Temporally Consistent Referring Video Object Segmentation with Hybrid Memory |
| ViLLa | 68.6 | 64.6 | 66.5 | ViLLa: Video Reasoning Segmentation with Large Language Model |
| UniRef++-L | 69.0 | 64.8 | 66.9 | UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces |
| SOC (Video-Swin-T) | 60.5 | 57.8 | 59.2 | SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation |
| UNINEXT-H | 72.7 | 67.6 | 70.1 | Universal Instance Perception as Object Discovery and Retrieval |
| UniLSeg-100 | 67.0 | 62.8 | 64.9 | Universal Segmentation at Arbitrary Granularity with Language Instruction |
| GroPrompt | 66.9 | 64.1 | 65.5 | GroPrompt: Efficient Grounded Prompting and Adaptation for Referring Video Object Segmentation |
| VRS-HQ (Chat-UniVi-13B) | 73.1 | 69 | 71 | The Devil is in Temporal Token: High Quality Video Reasoning Segmentation |
| R2VOS (Video-Swin-T) | 63.1 | 59.6 | 61.3 | Towards Robust Referring Video Object Segmentation with Cyclic Relational Consensus |
| MUTR | 70.4 | 66.4 | 68.4 | Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation |
| SOC (Joint training, Video-Swin-B) | 69.3 | 65.3 | 67.3±0.5 | SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation |
| VLT | 65.6 | 61.9 | 63.8 | VLT: Vision-Language Transformer and Query Generation for Referring Segmentation |
| LoSh-R | 66.0 | 62.5 | 64.2 | LoSh: Long-Short Text Joint Prediction Network for Referring Video Object Segmentation |
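Because J&F is defined as the mean of J and F, the reported columns can be cross-checked. The Python sketch below transcribes the rows above, ranks them by J&F, and flags any row whose J&F deviates from (J + F) / 2 by more than an assumed rounding tolerance of 0.11 points. Scores are reported to one or two decimals, so a gap of about 0.1 from rounding is expected (ReferFormer and ViLLa both show one). For the SOC joint-training row only the mean 67.3 of the reported 67.3±0.5 is kept. The names `ROWS` and `check_consistency` are illustrative.

```python
# Rows transcribed from the leaderboard above: (model, F, J, J&F).
ROWS = [
    ("MTTR (w=12)", 56.64, 54.00, 55.32),
    ("ReferFormer (ResNet-50)", 56.6, 54.8, 55.6),
    ("ReferDINO (Swin-B)", 71.5, 67.0, 69.3),
    ("MLRLSA", 48.43, 50.96, 49.70),
    ("GLEE-Pro", 72.9, 68.2, 70.6),
    ("URVOS", 50.8, 47.0, 48.9),
    ("MPG-SAM 2", 76.1, 71.7, 73.9),
    ("HTR (Pre-training)", 68.9, 65.3, 67.1),
    ("ViLLa", 68.6, 64.6, 66.5),
    ("UniRef++-L", 69.0, 64.8, 66.9),
    ("SOC (Video-Swin-T)", 60.5, 57.8, 59.2),
    ("UNINEXT-H", 72.7, 67.6, 70.1),
    ("UniLSeg-100", 67.0, 62.8, 64.9),
    ("GroPrompt", 66.9, 64.1, 65.5),
    ("VRS-HQ (Chat-UniVi-13B)", 73.1, 69.0, 71.0),
    ("R2VOS (Video-Swin-T)", 63.1, 59.6, 61.3),
    ("MUTR", 70.4, 66.4, 68.4),
    ("SOC (Joint training, Video-Swin-B)", 69.3, 65.3, 67.3),  # ±0.5 dropped
    ("VLT", 65.6, 61.9, 63.8),
    ("LoSh-R", 66.0, 62.5, 64.2),
]


def check_consistency(rows, tol=0.11):
    """Return rows whose reported J&F differs from (J + F) / 2 by more than tol."""
    return [r for r in rows if abs((r[1] + r[2]) / 2 - r[3]) > tol]


# Rank by reported J&F, best first.
ranked = sorted(ROWS, key=lambda r: r[3], reverse=True)
for rank, (name, f, j, jf) in enumerate(ranked, start=1):
    print(f"{rank:2d}. {name:<36} J&F={jf:5.2f} (J={j:5.2f}, F={f:5.2f})")
print("inconsistent rows:", check_consistency(ROWS))  # prints an empty list
```

Ranking on the stored J&F rather than a recomputed mean keeps the order identical to what the papers report, while the consistency check only serves as a transcription sanity test.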