Reverse Knowledge Distillation: Training a Large Model using a Small One for Retinal Image Matching on Limited Data

Nasser, Sahar Almahfouz; Gupte, Nihar; Sethi, Amit
Abstract

Retinal image matching plays a crucial role in monitoring disease progression and treatment response. However, datasets with matched keypoints between temporally separated pairs of images are not abundant enough to train transformer-based models. We propose a novel approach based on reverse knowledge distillation to train large models with limited data while preventing overfitting. First, we propose architectural modifications to a CNN-based semi-supervised method called SuperRetina that improve its results on a publicly available dataset. We then train a computationally heavier model based on a vision transformer encoder using the lighter CNN-based model, which is counter-intuitive in knowledge-distillation research, where training lighter models based on heavier ones is the norm. Surprisingly, such reverse knowledge distillation improves generalization even further. Our experiments suggest that high-dimensional fitting in representation space may prevent overfitting, unlike training directly to match the final output. We also provide a public dataset with annotations for retinal image keypoint detection and matching to help the research community develop algorithms for retinal image applications.
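
The core idea lends itself to a short sketch. Below is a minimal, hypothetical PyTorch illustration of representation-space reverse distillation: a small, frozen CNN serves as the teacher and a larger ViT encoder as the student, and the student is fit to the teacher's intermediate feature maps rather than to a final output. All class and variable names here (SmallCNNTeacher, ViTStudent, to_teacher, and so on) are assumptions made for illustration; the paper's actual SuperRetina-based architectures and losses differ in detail.

    # Reverse knowledge distillation sketch: small CNN teacher -> large ViT student,
    # matched in representation space (feature maps), not at the final output.
    import torch
    import torch.nn as nn

    class SmallCNNTeacher(nn.Module):
        """Stand-in for the lighter, already-trained CNN-based model (teacher)."""
        def __init__(self, feat_dim=64):
            super().__init__()
            self.backbone = nn.Sequential(
                nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, feat_dim, 3, padding=1), nn.ReLU(),
            )

        def forward(self, x):
            return self.backbone(x)  # (B, feat_dim, H, W) representation map

    class ViTStudent(nn.Module):
        """Stand-in for the computationally heavier transformer-encoder student."""
        def __init__(self, feat_dim=64, patch=8, dim=256, depth=6, heads=8):
            super().__init__()
            self.patch = patch
            self.embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
            layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
            # Project tokens into the teacher's representation space.
            self.to_teacher = nn.Linear(dim, feat_dim)

        def forward(self, x):
            tokens = self.embed(x).flatten(2).transpose(1, 2)  # (B, N, dim)
            tokens = self.encoder(tokens)
            feats = self.to_teacher(tokens)                    # (B, N, feat_dim)
            b, n, c = feats.shape
            h = w = int(n ** 0.5)
            feats = feats.transpose(1, 2).reshape(b, c, h, w)
            # Upsample to the teacher's spatial resolution for comparison.
            return nn.functional.interpolate(feats, scale_factor=self.patch, mode="bilinear")

    teacher, student = SmallCNNTeacher().eval(), ViTStudent()
    for p in teacher.parameters():
        p.requires_grad_(False)  # the small teacher stays frozen

    opt = torch.optim.Adam(student.parameters(), lr=1e-4)
    images = torch.randn(4, 3, 128, 128)  # dummy batch in place of retinal images

    # High-dimensional fitting in representation space: MSE between feature maps.
    loss = nn.functional.mse_loss(student(images), teacher(images))
    opt.zero_grad()
    loss.backward()
    opt.step()

Matching the teacher's feature maps gives the large student a dense, high-dimensional training signal from every spatial location, which is the mechanism the abstract suggests may explain why this regularizes better than supervising the final output alone.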
