Open Vocabulary Semantic Segmentation On 1

Metrics

mIoU
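
For readers unfamiliar with the metric, the following is a minimal sketch of how mean Intersection over Union (mIoU) is commonly computed from predicted and ground-truth label maps. The function name `mean_iou`, the `ignore_index` value of 255, and the choice to skip classes absent from both prediction and ground truth are illustrative assumptions, not details taken from this benchmark. The scores in the table appear to be percentages, which would correspond to 100 times the value returned here.

```python
import numpy as np

def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over classes; a sketch, not the benchmark's official scorer.

    Assumes predictions lie in [0, num_classes) and that pixels whose
    ground-truth label equals ignore_index (assumed 255) are excluded.
    """
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()

    # Drop pixels marked as "ignore" in the ground truth.
    valid = target != ignore_index
    pred, target = pred[valid], target[valid]

    # Confusion matrix: rows are ground-truth classes, columns are predictions.
    conf = np.bincount(
        target * num_classes + pred, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)

    intersection = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - intersection

    # Average only over classes that occur in the prediction or ground truth.
    present = union > 0
    return float((intersection[present] / union[present]).mean())


# Tiny example: 4 pixels, 2 classes, last pixel ignored.
score = mean_iou([0, 0, 1, 1], [0, 1, 1, 255], num_classes=2)
print(f"mIoU = {score:.3f}")
```

Evaluation toolkits typically accumulate the confusion matrix over the whole validation set before taking the per-class ratio, rather than averaging per-image scores; the sketch above handles a single pair of label maps.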

Results

mIoU of each model on this benchmark, ranked from highest to lowest

Model	mIoU	Paper Title
HyperSeg	64.6	HyperSeg: Towards Universal Visual Segmentation with Large Language Model
SILC	63.5	SILC: Improving Vision Language Pretraining with Self-Distillation
CAT-Seg	63.3	CAT-Seg: Cost Aggregation for Open-Vocabulary Semantic Segmentation
MaskCLIP++	62.5	High-Quality Mask Tuning Matters for Open-Vocabulary Segmentation
CLIPSelf	62.3	CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction
SED	60.6	SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation
Mask-Adapter	60.4	Mask-Adapter: The Devil is in the Masks for Open-Vocabulary Segmentation
EBSeg-L	60.2	Open-Vocabulary Semantic Segmentation with Image Embedding Balancing
MAFT+	59.4	Collaborative Vision-Text Representation Optimizing for Open-Vocabulary Segmentation
SCAN	59.3	Open-Vocabulary Segmentation with Semantic-Assisted Calibration
MAFT-ViTL	58.5	Learning Mask-aware CLIP Representations for Zero-Shot Segmentation
FC-CLIP	58.4	Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP
ODISE	57.3	Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models
OVSeg Swin-B	55.7	Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP
PACL	50.1	Open Vocabulary Semantic Segmentation with Patch Aligned Contrastive Learning
SimSeg	47.7	A Simple Baseline for Open-Vocabulary Semantic Segmentation with Pre-trained Vision-language Model
MaskCLIP	45.9	Open-Vocabulary Universal Image Segmentation with MaskCLIP
TagAlign (trained with image-text pairs)	37.6	TagAlign: Improving Vision-Language Alignment with Multi-Tag Classification
TTD (TCL)	37.4	TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias
LaVG	34.7	In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation
