Open Vocabulary Semantic Segmentation On 5

Metrics

mIoU
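
For readers unfamiliar with the metric, mIoU (mean Intersection over Union) averages the per-class ratio of correctly labelled pixels to the union of predicted and ground-truth pixels for that class. The following is a minimal sketch, not this benchmark's official evaluation code: the function name `mean_iou`, the `ignore_index=255` convention, and the choice to skip classes absent from both prediction and ground truth are assumptions made for illustration. The result is scaled to 0-100 to match how scores are listed in the table.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU (in percent) over flat sequences of integer class labels.

    Assumptions (not taken from the benchmark page):
    - pixels whose ground-truth label equals ignore_index are skipped;
    - classes that appear in neither pred nor target are left out of the mean;
    - every predicted label lies in range(num_classes).
    """
    intersection = [0] * num_classes  # true positives per class
    union = [0] * num_classes         # TP + FP + FN per class

    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        if p == t:
            intersection[t] += 1
            union[t] += 1
        else:
            union[p] += 1  # false positive for the predicted class
            union[t] += 1  # false negative for the true class

    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return 100.0 * sum(ious) / len(ious) if ious else float("nan")


# Toy example with three classes and six pixels.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
```

In this toy example the per-class IoUs are 1/3, 2/3 and 1/2, so `score` is 50.0. Real evaluations usually accumulate the intersection and union counts over the whole dataset before dividing, rather than averaging per-image scores.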

Results

Performance results of various models on this benchmark

Model Name	mIoU	Paper Title	Repository
TCL	83.2	Learning to Generate Text-grounded Mask for Open-world Semantic Segmentation from Only Image-Text Pairs
MaskCLIP++	96.8	High-Quality Mask Tuning Matters for Open-Vocabulary Segmentation
SCAN	97.2	Open-Vocabulary Segmentation with Semantic-Assisted Calibration
POMP	89.4	Prompt Pre-Training with Twenty-Thousand Classes for Open-Vocabulary Visual Recognition
OVSeg Swin-B	94.5	Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP
EBSeg-L	96.4	Open-Vocabulary Semantic Segmentation with Image Embedding Balancing
ODISE	84.6	Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models
ZegFormer	-	Decoupling Zero-Shot Semantic Segmentation
TagAlign (trained with image-text pairs)	87.9	TagAlign: Improving Vision-Language Alignment with Multi-Tag Classification
ZSSeg	-	A Simple Baseline for Open-Vocabulary Semantic Segmentation with Pre-trained Vision-language Model
MAFT-ViTL	92.1	Learning Mask-aware CLIP Representations for Zero-Shot Segmentation
PACL	72.3	Open Vocabulary Semantic Segmentation with Patch Aligned Contrastive Learning
HyperSeg	92.1	HyperSeg: Towards Universal Visual Segmentation with Large Language Model
MAFT+	96.5	Collaborative Vision-Text Representation Optimizing for Open-Vocabulary Segmentation
LaVG	82.5	In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation
SILC	97.6	SILC: Improving Vision Language Pretraining with Self-Distillation	-
FC-CLIP	95.4	Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP
CAT-Seg	97.0	CAT-Seg: Cost Aggregation for Open-Vocabulary Semantic Segmentation
