Fine-grained Visual Textual Alignment for Cross-Modal Retrieval using Transformer Encoders