Cross-Modal Retrieval on Recipe1M
Metrics
Image-to-text R@1
Text-to-image R@1
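
R@1 (recall at rank 1) is the fraction of queries for which the top-ranked retrieved item is the correct match. The sketch below shows one way to compute it from a similarity matrix. It assumes paired data in which query i matches candidate i; the function name `recall_at_k` and the toy scores are hypothetical and are not taken from any of the listed models. Leaderboard values such as those on this page appear to be percentages, so the fraction would be multiplied by 100 to compare.

```python
import numpy as np

def recall_at_k(similarity: np.ndarray, k: int = 1) -> float:
    """Fraction of queries whose true match is ranked in the top k.

    Assumption: similarity[i, j] scores query i against candidate j,
    and the correct candidate for query i sits at index i.
    """
    n = similarity.shape[0]
    # Score assigned to the correct candidate of each query.
    true_scores = similarity[np.arange(n), np.arange(n)]
    # Zero-based rank: number of candidates scoring strictly higher.
    ranks = (similarity > true_scores[:, None]).sum(axis=1)
    return float((ranks < k).mean())

# Toy example: 3 images (rows) scored against 3 recipes (columns).
sim = np.array([
    [0.9, 0.1, 0.2],
    [0.3, 0.4, 0.8],
    [0.2, 0.1, 0.7],
])
image_to_text_r1 = recall_at_k(sim, k=1)    # images are the queries
text_to_image_r1 = recall_at_k(sim.T, k=1)  # recipes are the queries
```

Transposing the matrix swaps the roles of query and candidate, which is why a single score matrix yields both metrics.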
Results
Performance results of various models on this benchmark
Comparison table
Model name | Image-to-text R@1 | Text-to-image R@1 |
---|---|---|
structured-vision-language-pretraining-for | 73.6 | 74.7 |
cross-modal-retrieval-and-synthesis-x-mrs | 64 | 63.9 |
revamping-cross-modal-recipe-retrieval-with | 60.0 | 60.3 |
cross-modal-retrieval-in-the-cooking-context | 39.8 | 40.2 |
cross-modal-food-retrieval-learning-a-joint | 54.0 | 54.9 |
learning-cross-modal-embeddings-with | 51.8 | 52.8 |
transformer-decoders-with-multimodal | 72.3 | 72.6 |
transformer-decoders-with-multimodal | 68.2 | 68.3 |
structured-vision-language-pretraining-for | 74.9 | 75.6 |