HyperAI超神经
Referring Video Object Segmentation On Refer
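Evaluation metrics
F: contour accuracy (boundary F-measure between the predicted and ground-truth mask contours)
J: region similarity (Jaccard index, i.e. intersection over union of the predicted and ground-truth masks)
J&F: the mean of J and F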
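The sketch below makes the three metric definitions concrete for a single frame. It is a simplified illustration, not the official benchmark evaluation code. It assumes masks are given as sets of (row, col) foreground pixels and that a boundary pixel is any foreground pixel with a 4-connected background neighbour. It also assumes the F matching tolerance `tol` is a fixed pixel radius, whereas the official DAVIS toolkit derives it from the image diagonal. All function names, including the helper `rect`, are hypothetical.

```python
from typing import Set, Tuple

Pixel = Tuple[int, int]  # (row, col) of a foreground pixel


def rect(r0: int, r1: int, c0: int, c1: int) -> Set[Pixel]:
    """Hypothetical helper: box mask covering rows r0..r1-1 and cols c0..c1-1."""
    return {(r, c) for r in range(r0, r1) for c in range(c0, c1)}


def region_similarity(pred: Set[Pixel], gt: Set[Pixel]) -> float:
    """J: Jaccard index (intersection over union) of two binary masks."""
    union = pred | gt
    if not union:
        return 1.0  # both masks empty: scored as a perfect match (assumed convention)
    return len(pred & gt) / len(union)


def boundary(mask: Set[Pixel]) -> Set[Pixel]:
    """Foreground pixels with at least one 4-connected neighbour outside the mask."""
    return {
        (r, c)
        for (r, c) in mask
        if any(n not in mask for n in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)))
    }


def near(p: Pixel, points: Set[Pixel], tol: int) -> bool:
    """True if some pixel of `points` lies in the (2*tol+1)^2 square centred on p."""
    r, c = p
    return any((r + dr, c + dc) in points
               for dr in range(-tol, tol + 1) for dc in range(-tol, tol + 1))


def contour_accuracy(pred: Set[Pixel], gt: Set[Pixel], tol: int = 1) -> float:
    """F: harmonic mean of boundary precision and recall under a pixel tolerance."""
    bp, bg = boundary(pred), boundary(gt)
    if not bp and not bg:
        return 1.0  # assumed convention for two empty masks
    if not bp or not bg:
        return 0.0
    precision = sum(near(p, bg, tol) for p in bp) / len(bp)
    recall = sum(near(g, bp, tol) for g in bg) / len(bg)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def j_and_f(pred: Set[Pixel], gt: Set[Pixel], tol: int = 1) -> Tuple[float, float, float]:
    """Return (J, F, J&F) for one frame; J&F is the mean of J and F."""
    j = region_similarity(pred, gt)
    f = contour_accuracy(pred, gt, tol)
    return j, f, (j + f) / 2


if __name__ == "__main__":
    gt = rect(2, 6, 2, 6)    # 4x4 ground-truth box
    pred = rect(2, 6, 3, 7)  # same box shifted one column to the right
    print(j_and_f(pred, gt, tol=1))
    print(contour_accuracy(pred, gt, tol=0))
```

In the demo, the one-column shift gives J = 12/20 = 0.6. F is 1.0 at `tol=1`, because every boundary pixel has a counterpart within one pixel, but it drops to 0.5 at `tol=0`. This shows how sensitive F is to the tolerance. On a benchmark, J and F are typically averaged over all annotated frames and objects first, and J&F is then the mean of those two averages.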
Evaluation results
Performance of each model on this benchmark:

| Model | F | J | J&F | Paper title |
|---|---|---|---|---|
| HTML-Video-SwinT | 63.0 | 59.5 | 61.2 | HTML: Hybrid Temporal-scale Multimodal Learning Framework for Referring Video Object Segmentation |
| HTR | 68.9 | 65.3 | 67.1 | Temporally Consistent Referring Video Object Segmentation with Hybrid Memory |
| VLT | 65.6 | 61.9 | 63.8 | VLT: Vision-Language Transformer and Query Generation for Referring Segmentation |
| GLEE-Plus | 69.7 | 65.6 | 67.7 | General Object Foundation Model for Images and Videos at Scale |
| HTML-SwinL | 65.3 | 61.5 | 63.4 | HTML: Hybrid Temporal-scale Multimodal Learning Framework for Referring Video Object Segmentation |
| ReferFormer (Large) | 64.6 | 61.3 | 62.9 | Language as Queries for Referring Video Object Segmentation |
| SOC | 67.9 | 64.1 | 66.0 | SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation |
| HTML-Video-SwinB | 65.2 | 61.5 | 63.4 | HTML: Hybrid Temporal-scale Multimodal Learning Framework for Referring Video Object Segmentation |
| HTML-ResNet101 | 59.8 | 57.3 | 58.5 | HTML: Hybrid Temporal-scale Multimodal Learning Framework for Referring Video Object Segmentation |
| HTML-ResNet50 | 59.0 | 56.5 | 57.8 | HTML: Hybrid Temporal-scale Multimodal Learning Framework for Referring Video Object Segmentation |
| VATEX | 67.5 | 63.3 | 65.4 | Vision-Aware Text Features in Referring Image Segmentation: From Object Understanding to Context Understanding |
| CMSA | 38.1 | 34.8 | 36.4 | Cross-Modal Self-Attention Network for Referring Image Segmentation |
| SgMg | 67.4 | 63.9 | 65.7 | Spectrum-guided Multi-granularity Referring Video Object Segmentation |
| GLEE-Pro | 72.9 | 68.2 | 70.6 | General Object Foundation Model for Images and Videos at Scale |
| FindTrack | 72.0 | 68.6 | 70.3 | Find First, Track Next: Decoupling Identification and Propagation in Referring Video Object Segmentation |
| HyperSeg | - | - | 68.5 | HyperSeg: Towards Universal Visual Segmentation with Large Language Model |
| R2VOS (Swin-T) | 61.5 | 58.9 | 60.2 | Towards Robust Referring Video Object Segmentation with Cyclic Relational Consensus |
| HTML-Video-SwinS | 62.9 | 59.9 | 61.4 | HTML: Hybrid Temporal-scale Multimodal Learning Framework for Referring Video Object Segmentation |
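Since J&F is the mean of J and F, each reported J&F should match (J + F) / 2 up to rounding to one decimal. The Python sketch below transcribes the 18 leaderboard rows and ranks them by J&F. HyperSeg reports only J&F, so its J and F are stored as `None`. The sketch then flags any row whose J&F deviates from the mean of J and F by more than the 0.05 that one-decimal rounding allows. The names `ROWS`, `rank_by_jf` and `inconsistent_rows` are illustrative only.

```python
from typing import List, Optional, Tuple

# (model, F, J, J&F) transcribed from the leaderboard; None = not reported.
Row = Tuple[str, Optional[float], Optional[float], float]

ROWS: List[Row] = [
    ("HTML-Video-SwinT", 63.0, 59.5, 61.2),
    ("HTR", 68.9, 65.3, 67.1),
    ("VLT", 65.6, 61.9, 63.8),
    ("GLEE-Plus", 69.7, 65.6, 67.7),
    ("HTML-SwinL", 65.3, 61.5, 63.4),
    ("ReferFormer (Large)", 64.6, 61.3, 62.9),
    ("SOC", 67.9, 64.1, 66.0),
    ("HTML-Video-SwinB", 65.2, 61.5, 63.4),
    ("HTML-ResNet101", 59.8, 57.3, 58.5),
    ("HTML-ResNet50", 59.0, 56.5, 57.8),
    ("VATEX", 67.5, 63.3, 65.4),
    ("CMSA", 38.1, 34.8, 36.4),
    ("SgMg", 67.4, 63.9, 65.7),
    ("GLEE-Pro", 72.9, 68.2, 70.6),
    ("FindTrack", 72.0, 68.6, 70.3),
    ("HyperSeg", None, None, 68.5),
    ("R2VOS (Swin-T)", 61.5, 58.9, 60.2),
    ("HTML-Video-SwinS", 62.9, 59.9, 61.4),
]


def rank_by_jf(rows: List[Row]) -> List[Row]:
    """Sort rows by J&F, best first; ties keep their table order (sort is stable)."""
    return sorted(rows, key=lambda row: row[3], reverse=True)


def inconsistent_rows(rows: List[Row], tol: float = 0.05) -> List[str]:
    """Names of rows whose J&F differs from (J + F) / 2 by more than rounding allows."""
    bad = []
    for name, f, j, jf in rows:
        if f is None or j is None:
            continue  # rows reporting only J&F cannot be checked
        if abs((j + f) / 2 - jf) > tol + 1e-9:  # small epsilon absorbs float error
            bad.append(name)
    return bad


if __name__ == "__main__":
    for rank, (name, _f, _j, jf) in enumerate(rank_by_jf(ROWS), start=1):
        print(f"{rank:2d}. {name:<20} J&F={jf:.1f}")
    print("inconsistent rows:", inconsistent_rows(ROWS))
```

For the values listed, the consistency check reports no mismatches. GLEE-Pro (70.6) and FindTrack (70.3) rank first and second by J&F, and HTML-SwinL and HTML-Video-SwinB tie at 63.4.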