8 months ago

Abstract

Cross-modal text-molecule retrieval models aim to learn a shared feature space of the text and molecule modalities for accurate similarity calculation, which facilitates the rapid screening of molecules with specific properties and activities in drug design. However, previous works have two main defects. First, they are inadequate in capturing modality-shared features, considering the significant gap between text sequences and molecule graphs. Second, they mainly rely on contrastive learning and adversarial training for cross-modality alignment, both of which focus on first-order similarity and ignore second-order similarity, which can capture more structural information in the embedding space. To address these issues, we propose a novel cross-modal text-molecule retrieval model with two-fold improvements. Specifically, on top of two modality-specific encoders, we stack a memory-bank-based feature projector that contains learnable memory vectors to better extract modality-shared features. More importantly, during model training, we calculate four kinds of similarity distributions (text-to-text, text-to-molecule, molecule-to-molecule, and molecule-to-text) for each instance, and then minimize the distances between these similarity distributions (namely, second-order similarity losses) to enhance cross-modal alignment. Experimental results and analysis demonstrate the effectiveness of our model. In particular, our model achieves state-of-the-art (SOTA) performance, outperforming the previously reported best result by 6.4%.
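
To make the idea of a second-order similarity loss concrete, the following is a minimal NumPy sketch, not the paper's implementation. The abstract does not specify the distance measure, the temperature, or which pairs of distributions are compared; the KL divergence, the `temperature` value, the chosen `pairs`, and all function names below are assumptions made for illustration only.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def similarity_distributions(text_emb, mol_emb, temperature=0.1):
    """Return the four per-instance similarity distributions over a batch.

    Row i of each matrix is a probability distribution describing how
    instance i relates to every instance in the batch.
    The temperature is an assumed hyperparameter.
    """
    t = l2_normalize(text_emb)
    m = l2_normalize(mol_emb)
    return {
        "t2t": softmax(t @ t.T / temperature),  # text-to-text
        "t2m": softmax(t @ m.T / temperature),  # text-to-molecule
        "m2m": softmax(m @ m.T / temperature),  # molecule-to-molecule
        "m2t": softmax(m @ t.T / temperature),  # molecule-to-text
    }


def kl_divergence(p, q, eps=1e-12):
    """Mean row-wise KL(p || q); an assumed choice of distance."""
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)))


def second_order_loss(text_emb, mol_emb, temperature=0.1):
    """Sum of distances between pairs of similarity distributions.

    The pairing below is hypothetical; the abstract only states that the
    distances between the four distributions are minimized.
    """
    d = similarity_distributions(text_emb, mol_emb, temperature)
    pairs = [("t2t", "t2m"), ("m2m", "m2t"), ("t2t", "m2m"), ("t2m", "m2t")]
    return sum(kl_divergence(d[a], d[b]) for a, b in pairs)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    text_emb = rng.normal(size=(4, 8))  # 4 text embeddings of dimension 8
    mol_emb = rng.normal(size=(4, 8))   # 4 paired molecule embeddings
    print("second-order loss:", second_order_loss(text_emb, mol_emb))
```

Under these assumptions, the loss approaches zero when the text and molecule embeddings induce the same neighborhood structure over the batch, which is the sense in which second-order similarity captures structural information beyond pairwise (first-order) matching.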
