Speech Prompted Semantic Segmentation On
Metrics
mAP
mIoU
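As a rough illustration of the mIoU metric listed above, the sketch below computes intersection-over-union for binary masks and averages it over a set of predictions. It assumes each speech prompt yields one predicted binary mask and one ground-truth mask of the same shape; the benchmark's exact evaluation protocol (thresholding, class averaging, handling of empty masks) is not stated on this page, and the function names `binary_iou` and `mean_iou` are hypothetical.

```python
from math import isnan


def binary_iou(pred, target):
    """IoU between two same-shaped binary masks given as nested lists of 0/1."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Assumption: a pair with an empty union is undefined and skipped later.
    return inter / union if union else float("nan")


def mean_iou(preds, targets):
    """Average IoU over all (prediction, ground truth) pairs, ignoring undefined ones."""
    ious = [binary_iou(p, t) for p, t in zip(preds, targets)]
    valid = [v for v in ious if not isnan(v)]
    return sum(valid) / len(valid) if valid else float("nan")


# Two toy 2x2 examples: one half-overlapping prediction, one perfect prediction.
preds = [[[1, 1], [0, 0]], [[0, 1], [0, 1]]]
targets = [[[1, 0], [0, 0]], [[0, 1], [0, 1]]]
score = mean_iou(preds, targets)
```

The values in the table appear to be on a 0 to 100 scale, so a score computed this way would presumably be multiplied by 100 before comparison.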
Results
Performance results of various models on this benchmark
Comparison Table
| Model Name | mAP | mIoU |
|---|---|---|
| contrastive-audio-visual-masked-autoencoder | 27.2 | 19.9 |
| jointly-discovering-visual-objects-and-spoken | 32.2 | 26.3 |
| imagebind-one-embedding-space-to-bind-them | 20.2 | 19.7 |
| separating-the-chirp-from-the-chat-self | 48.7 | 36.8 |