Unsupervised Semantic Segmentation With 7
Metrics
mIoU
Results
Performance results of various models on this benchmark
Model Name | mIoU | Paper Title | Repository |
---|---|---|---|
ProxyCLIP | 83.3 | ProxyCLIP: Proxy Attention Improves CLIP for Open-Vocabulary Segmentation | - |
MaskCLIP | 74.9 | Extract Free Dense Labels from CLIP | - |
TCL | 83.2 | Learning to Generate Text-grounded Mask for Open-world Semantic Segmentation from Only Image-Text Pairs | - |
GroupViT (RedCaps) | 79.7 | GroupViT: Semantic Segmentation Emerges from Text Supervision | - |
Trident | 88.7 | Harnessing Vision Foundation Models for High-Performance, Training-Free Open Vocabulary Segmentation | - |
TagAlign | 87.9 | TagAlign: Improving Vision-Language Alignment with Multi-Tag Classification | - |
COSMOS ViT-B/16 | 77.7 | COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training | - |
ReCo | 57.7 | ReCo: Retrieve and Co-segment for Zero-shot Transfer | - |
0 of 8 row(s) selected.