Sound Prompted Semantic Segmentation On
Metrics
mAP
mIoU
Results
Performance results of various models on this benchmark
| Model | mAP | mIoU | Paper Title |
|---|---|---|---|
| DenseAV | 32.7 | 24.7 | Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language |
| CAVMAE | 26.0 | 17.0 | Contrastive Audio-Visual Masked Autoencoder |
| ImageBIND | 19.7 | 20.5 | ImageBind: One Embedding Space To Bind Them All |
| DAVENet | 16.8 | 18.1 | Jointly Discovering Visual Objects and Spoken Words from Raw Sensory Input |