Cross-Modal Retrieval on RSITMD
Metrics
Image-to-text R@1
Mean Recall
Text-to-image R@1
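The page does not define these metrics. As a rough sketch, R@1 is usually the percentage of queries whose top-ranked result is a correct match, and Mean Recall is commonly taken as the average of R@1, R@5 and R@10 over both retrieval directions. That averaging convention is an assumption here, not something this leaderboard states. The function names and the toy similarity scores below are hypothetical and only illustrate the computation.

```python
from typing import Sequence, Set


def recall_at_k(
    similarities: Sequence[Sequence[float]],
    relevant: Sequence[Set[int]],
    k: int,
) -> float:
    """Percentage of queries with at least one relevant item in the top k."""
    hits = 0
    for scores, positives in zip(similarities, relevant):
        # Rank candidate indices by descending similarity to the query.
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        if positives.intersection(ranked[:k]):
            hits += 1
    return 100.0 * hits / len(similarities)


def mean_recall(i2t, t2i, i2t_relevant, t2i_relevant, ks=(1, 5, 10)) -> float:
    """Average R@k over both directions and all cutoffs (assumed convention)."""
    values = [recall_at_k(i2t, i2t_relevant, k) for k in ks]
    values += [recall_at_k(t2i, t2i_relevant, k) for k in ks]
    return sum(values) / len(values)


# Toy data: 2 images, 3 captions. Captions 0 and 2 describe image 0,
# caption 1 describes image 1. Scores are made up for illustration.
i2t = [[0.9, 0.1, 0.3], [0.2, 0.4, 0.8]]      # rows: images, cols: captions
t2i = [[0.9, 0.2], [0.1, 0.4], [0.3, 0.8]]    # rows: captions, cols: images
i2t_relevant = [{0, 2}, {1}]
t2i_relevant = [{0}, {1}, {0}]

print(recall_at_k(i2t, i2t_relevant, 1))   # image-to-text R@1
print(recall_at_k(t2i, t2i_relevant, 1))   # text-to-image R@1
print(mean_recall(i2t, t2i, i2t_relevant, t2i_relevant, ks=(1, 2)))
```

The toy call uses cutoffs of 1 and 2 only because the example has too few candidates for R@5 or R@10 to be informative; with real data the default `ks=(1, 5, 10)` would apply under the assumed convention.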
Results
Performance of various models on this benchmark
Comparison table
Model name | Image-to-text R@1 | Mean Recall | Text-to-image R@1 |
---|---|---|---|
rs5m-a-large-scale-vision-language-dataset | 32.30% | 51.81% | 25.04% |
efficient-remote-sensing-with-harmonized | 32.74% | 52.27% | 25.62% |
remote-sensing-cross-modal-text-image | 14.82% | 31.41% | 11.15% |
exploring-a-fine-grained-multiscale-method | 10.63% | 29.72% | 11.51% |
a-prior-instruction-representation-framework | 18.14% | 38.24% | 12.17% |
parameter-efficient-transfer-learning-for-1 | 23.67% | 44.47% | 20.10% |
global-local-information-soft-alignment-for | 32.08% | 50.69% | 23.36% |
reducing-semantic-confusion-scene-aware | 13.35% | 34.11% | 11.24% |
remoteclip-a-vision-language-foundation-model | 28.76% | 50.52% | 23.76% |
direction-oriented-visual-semantic-embedding | 16.81% | 37.73% | 12.20% |