Sound Prompted Semantic Segmentation On

Metrics: mAP, mIoU

Evaluation Results

Performance of each model on this benchmark.

Model	mAP	mIoU	Paper Title
DenseAV	32.7	24.7	Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language
CAVMAE	26.0	17.0	Contrastive Audio-Visual Masked Autoencoder
ImageBIND	19.7	20.5	ImageBind: One Embedding Space To Bind Them All
DAVENet	16.8	18.1	Jointly Discovering Visual Objects and Spoken Words from Raw Sensory Input