Referring Expression Segmentation on RefCOCOg
Evaluation Metric
Overall IoU
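The page names the metric without defining it. In referring expression segmentation, Overall IoU is commonly computed as the total intersection area divided by the total union area, accumulated over all test samples, and reported as a percentage. The sketch below illustrates that common definition; the function name `overall_iou` and the nested-list mask representation are illustrative choices, and whether every listed paper computed its score in exactly this way is an assumption.

```python
from typing import Iterable, Sequence

# A binary mask as rows of 0/1 values (hypothetical representation for this sketch).
Mask = Sequence[Sequence[int]]


def overall_iou(pred_masks: Iterable[Mask], gt_masks: Iterable[Mask]) -> float:
    """Cumulative intersection over cumulative union, as a percentage.

    Assumes each predicted mask has the same shape as its ground-truth mask.
    """
    total_inter = 0
    total_union = 0
    for pred, gt in zip(pred_masks, gt_masks):
        for pred_row, gt_row in zip(pred, gt):
            for p, g in zip(pred_row, gt_row):
                # Count pixels set in both masks and pixels set in either mask.
                total_inter += 1 if (p and g) else 0
                total_union += 1 if (p or g) else 0
    if total_union == 0:
        # No foreground pixels anywhere; return 0.0 by convention in this sketch.
        return 0.0
    return 100.0 * total_inter / total_union
```

Because intersections and unions are summed before dividing, large objects contribute more to this score than small ones, which is one reason it can differ from a per-sample mean IoU.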
Evaluation Results
Performance results of each model on this benchmark
| Model Name | Overall IoU | Paper Title | Repository |
|---|---|---|---|
| MLCD-Seg-7B | 79.9 | Multi-label Cluster Discrimination for Visual Representation Learning | |
| GROUNDHOG | 74.1 | GROUNDHOG: Grounding Large Language Models to Holistic Segmentation | - |
| LAVT | 61.24 | LAVT: Language-Aware Vision Transformer for Referring Image Segmentation | |
| EVF-SAM | 76.8 | EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model | |
| UniLSeg-20 | 78.41 | Universal Segmentation at Arbitrary Granularity with Language Instruction | |
| DETRIS | 74.6 | Densely Connected Parameter-Efficient Tuning for Referring Image Segmentation | |
| UniLSeg-100 | 79.27 | Universal Segmentation at Arbitrary Granularity with Language Instruction | |
| GLEE-Pro | 72.9 | General Object Foundation Model for Images and Videos at Scale | |
| PolyFormer-L | 69.2 | PolyFormer: Referring Image Segmentation as Sequential Polygon Generation | |
| MaskRIS (Swin-B) | 65.55 | MaskRIS: Semantic Distortion-aware Data Augmentation for Referring Image Segmentation | |
| SHNet | 49.90 | Comprehensive Multi-Modal Interactions for Referring Image Segmentation | |
| X-Decoder (Davit-d5) | 64.6 | Generalized Decoding for Pixel, Image, and Language | - |
| MagNet | 65.36 | Mask Grounding for Referring Image Segmentation | |
| SafaRi-B | 70.48 | SafaRi: Adaptive Sequence Transformer for Weakly Supervised Referring Expression Segmentation | - |
| VLT (Darknet53) | 52.99 | Vision-Language Transformer and Query Generation for Referring Segmentation | |
| VLT (Swin-B) | 63.49 | VLT: Vision-Language Transformer and Query Generation for Referring Segmentation | |
| PolyFormer-B | 67.76 | PolyFormer: Referring Image Segmentation as Sequential Polygon Generation | |
| HyperSeg | 79.4 | HyperSeg: Towards Universal Visual Segmentation with Large Language Model | |
| VATEX | - | Vision-Aware Text Features in Referring Image Segmentation: From Object Understanding to Context Understanding | |
| C3VG | 74.43 | Multi-task Visual Grounding with Coarse-to-Fine Consistency Constraints | |