HyperAI

Referring Expression Segmentation On Refer 1

Metrics

F
J
J&F

Results

Performance results of various models on this benchmark

| Model name | F | J | J&F | Paper title | Repository |
|---|---|---|---|---|---|
| MTTR (w=12) | 56.64 | 54.00 | 55.32 | End-to-End Referring Video Object Segmentation with Multimodal Transformers | - |
| ReferFormer (ResNet-50) | 56.6 | 54.8 | 55.6 | Language as Queries for Referring Video Object Segmentation | - |
| ReferDINO (Swin-B) | 71.5 | 67.0 | 69.3 | ReferDINO: Referring Video Object Segmentation with Visual Grounding Foundations | - |
| MLRLSA | 48.43 | 50.96 | 49.70 | Multi-Level Representation Learning With Semantic Alignment for Referring Video Object Segmentation | - |
| GLEE-Pro | 72.9 | 68.2 | 70.6 | General Object Foundation Model for Images and Videos at Scale | - |
| URVOS | 50.8 | 47.0 | 48.9 | URVOS: Unified Referring Video Object Segmentation Network with a Large-Scale Benchmark | |
| MPG-SAM 2 | 76.1 | 71.7 | 73.9 | MPG-SAM 2: Adapting SAM 2 with Mask Priors and Global Context for Referring Video Object Segmentation | - |
| HTR (Pre-training) | 68.9 | 65.3 | 67.1 | Temporally Consistent Referring Video Object Segmentation with Hybrid Memory | - |
| ViLLa | 68.6 | 64.6 | 66.5 | ViLLa: Video Reasoning Segmentation with Large Language Model | - |
| UniRef++-L | 69.0 | 64.8 | 66.9 | UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces | - |
| SOC (Video-Swin-T) | 60.5 | 57.8 | 59.2 | SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation | - |
| UNINEXT-H | 72.7 | 67.6 | 70.1 | Universal Instance Perception as Object Discovery and Retrieval | - |
| UniLSeg-100 | 67.0 | 62.8 | 64.9 | Universal Segmentation at Arbitrary Granularity with Language Instruction | - |
| GroPrompt | 66.9 | 64.1 | 65.5 | GroPrompt: Efficient Grounded Prompting and Adaptation for Referring Video Object Segmentation | - |
| VRS-HQ (Chat-UniVi-13B) | 73.1 | 69 | 71 | The Devil is in Temporal Token: High Quality Video Reasoning Segmentation | - |
| R2VOS (Video-Swin-T) | 63.1 | 59.6 | 61.3 | Towards Robust Referring Video Object Segmentation with Cyclic Relational Consensus | - |
| MUTR | 70.4 | 66.4 | 68.4 | Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation | - |
| SOC (Joint training, Video-Swin-B) | 69.3 | 65.3 | 67.3±0.5 | SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation | - |
| VLT | 65.6 | 61.9 | 63.8 | VLT: Vision-Language Transformer and Query Generation for Referring Segmentation | - |
| LoSh-R | 66.0 | 62.5 | 64.2 | LoSh: Long-Short Text Joint Prediction Network for Referring Video Object Segmentation | - |