Cross-Modal Retrieval on RSICD
Metrics
Image-to-text R@1
Mean Recall
text-to-image R@1
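The page does not define these metrics. The minimal sketch below assumes the definitions commonly used in remote-sensing image-text retrieval. R@K is the percentage of queries whose top-K results contain at least one ground-truth match. Mean Recall is the average of R@1, R@5 and R@10 over both retrieval directions. RSICD pairs each image with five captions, so an image query counts as a hit if any of its captions is ranked in the top K. The function names `recall_at_k` and `retrieval_metrics` and the input layout are hypothetical and used only for illustration.

```python
from typing import Dict, List, Sequence, Set


def recall_at_k(sim: Sequence[Sequence[float]], gt: Sequence[Set[int]], k: int) -> float:
    """Percentage of queries (rows of sim) whose top-k gallery items include a ground-truth match."""
    hits = 0
    for q, row in enumerate(sim):
        # Rank gallery indices by descending similarity to query q
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if gt[q] & set(ranked[:k]):
            hits += 1
    return 100.0 * hits / len(sim)


def retrieval_metrics(
    sim: List[List[float]],          # sim[i][t]: similarity of image i and caption t (assumed layout)
    img_to_caps: List[List[int]],    # caption indices belonging to each image
    ks: Sequence[int] = (1, 5, 10),
) -> Dict[str, float]:
    # Image-to-text ground truth: all captions of each image
    i2t_gt = [set(caps) for caps in img_to_caps]
    # Text-to-image ground truth: each caption has exactly one source image
    cap_to_img = {t: {i} for i, caps in enumerate(img_to_caps) for t in caps}
    t2i_gt = [cap_to_img[t] for t in range(len(sim[0]))]
    # Transpose so that each row is a caption query over the image gallery
    sim_t = [list(col) for col in zip(*sim)]

    out: Dict[str, float] = {}
    for k in ks:
        out[f"i2t_R@{k}"] = recall_at_k(sim, i2t_gt, k)
        out[f"t2i_R@{k}"] = recall_at_k(sim_t, t2i_gt, k)
    # Mean Recall: unweighted average of every R@K in both directions
    out["mean_recall"] = sum(out.values()) / len(out)
    return out


if __name__ == "__main__":
    # Toy example: 2 images with 2 captions each (real RSICD has 5 per image)
    sim = [[0.9, 0.1, 0.8, 0.2],
           [0.3, 0.4, 0.7, 0.6]]
    print(retrieval_metrics(sim, [[0, 1], [2, 3]], ks=(1, 2)))
```

In the toy run both images retrieve one of their own captions first (image-to-text R@1 = 100%), but only two of the four captions rank their source image first (text-to-image R@1 = 50%). This asymmetry matches the pattern in the results table, where image-to-text R@1 is consistently higher than text-to-image R@1.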
Results
Performance of various models on this benchmark
Comparison table
Model name | Image-to-text R@1 | Mean Recall | text-to-image R@1 |
---|---|---|---|
exploring-a-fine-grained-multiscale-method | 5.21% | 15.53% | 4.08% |
parameter-efficient-transfer-learning-for-1 | 14.13% | 31.12% | 11.63% |
efficient-remote-sensing-with-harmonized | 20.52% | 38.95% | 15.84% |
global-local-information-soft-alignment-for | 20.68% | 37.69% | 14.73% |
a-prior-instruction-representation-framework | 9.88% | 24.46% | 6.97% |
direction-oriented-visual-semantic-embedding | 8.66% | 22.72% | 6.04% |
reducing-semantic-confusion-scene-aware | 7.41% | 20.61% | 5.56% |
rs5m-a-large-scale-vision-language-dataset | 21.13% | 38.87% | 15.59% |
remoteclip-a-vision-language-foundation-model | 18.39% | 36.35% | 14.73% |
remote-sensing-cross-modal-text-image | 6.59% | 18.96% | 4.69% |