HyperAI
Open Vocabulary Semantic Segmentation On 2
Metrics
mIoU
Results
Performance results of different models on this benchmark
| Model Name | mIoU | Paper Title | Repository |
|---|---|---|---|
| TTD (MaskCLIP) | 12.7 | TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias | |
| MAFT-ViTL | 32.0 | Learning Mask-aware CLIP Representations for Zero-Shot Segmentation | - |
| FC-CLIP | 34.1 | Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP | |
| MAFT+ | 36.1 | Collaborative Vision-Text Representation Optimizing for Open-Vocabulary Segmentation | |
| CAT-Seg | 37.9 | CAT-Seg: Cost Aggregation for Open-Vocabulary Semantic Segmentation | |
| OVSeg + OpenDAS | 35.8 | OpenDAS: Open-Vocabulary Domain Adaptation for 2D and 3D Segmentation | - |
| SimSeg | 20.5 | A Simple Baseline for Open-Vocabulary Semantic Segmentation with Pre-trained Vision-language Model | |
| SCAN | 33.5 | Open-Vocabulary Segmentation with Semantic-Assisted Calibration | |
| Mask-Adapter | 38.2 | Mask-Adapter: The Devil is in the Masks for Open-Vocabulary Segmentation | - |
| MaskCLIP | 23.7 | Open-Vocabulary Universal Image Segmentation with MaskCLIP | |
| POMP | 20.7 | - | - |
| EBSeg-L | 32.8 | Open-Vocabulary Semantic Segmentation with Image Embedding Balancing | |
| TTD (TCL) | 17.0 | TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias | |
| ODISE | 29.9 | Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models | |
| PACL | 31.4 | Open Vocabulary Semantic Segmentation with Patch Aligned Contrastive Learning | |
| SILC | 37.7 | SILC: Improving Vision Language Pretraining with Self-Distillation | - |
| OVSeg Swin-B | 29.6 | Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP | |
| LaVG | 15.8 | In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation | |
| MaskCLIP++ | 38.2 | MaskCLIP++: A Mask-Based CLIP Fine-tuning Framework for Open-Vocabulary Image Segmentation | |
| CLIPSelf | 34.5 | CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction | - |