Label Propagation for Zero-shot Classification with Vision-Language Models

Vision-Language Models (VLMs) have demonstrated impressive performance on zero-shot classification, i.e. classification when provided merely with a list of class names. In this paper, we tackle the case of zero-shot classification in the presence of unlabeled data. We leverage the graph structure of the unlabeled data and introduce ZLaP, a method based on label propagation (LP) that utilizes geodesic distances for classification. We tailor LP to graphs containing both text and image features and further propose an efficient method for performing inductive inference based on a dual solution and a sparsification step. We perform extensive experiments to evaluate the effectiveness of our method on 14 common datasets and show that ZLaP outperforms the latest related works. Code: https://github.com/vladan-stojnic/ZLaP
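
To make the setting concrete, the sketch below shows classical label propagation (Zhou et al.) on a graph mixing text and image features, which is the baseline that ZLaP builds on; it is not the paper's method itself (ZLaP's geodesic distances, dual solution, and sparsification are omitted). All names, the synthetic data, and the choice of k and alpha are illustrative assumptions.

```python
import numpy as np

def knn_graph(feats, k=5):
    """Symmetric k-NN affinity graph from L2-normalized features (cosine similarity)."""
    sims = feats @ feats.T
    np.fill_diagonal(sims, -np.inf)          # no self-edges
    W = np.zeros_like(sims)
    idx = np.argsort(-sims, axis=1)[:, :k]   # k nearest neighbors per node
    rows = np.arange(len(feats))[:, None]
    W[rows, idx] = np.clip(sims[rows, idx], 0.0, None)
    return np.maximum(W, W.T)                # symmetrize

def label_propagation(W, Y, alpha=0.99, iters=50):
    """Iterate F <- alpha * S @ F + (1 - alpha) * Y, where S is the
    symmetrically normalized affinity matrix D^{-1/2} W D^{-1/2}."""
    d = W.sum(axis=1)
    d[d == 0] = 1.0                          # guard against isolated nodes
    d_inv_sqrt = 1.0 / np.sqrt(d)
    S = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    F = Y.copy()
    for _ in range(iters):
        F = alpha * (S @ F) + (1 - alpha) * Y
    return F

# Hypothetical usage: text embeddings of the class names act as labeled seed
# nodes; image embeddings of the unlabeled data are the nodes to classify.
rng = np.random.default_rng(0)
C, N, d = 3, 20, 8                            # classes, unlabeled images, feature dim
text = rng.normal(size=(C, d))
imgs = rng.normal(size=(N, d))
text /= np.linalg.norm(text, axis=1, keepdims=True)
imgs /= np.linalg.norm(imgs, axis=1, keepdims=True)

feats = np.vstack([text, imgs])               # joint graph over text + image nodes
Y = np.zeros((C + N, C))
Y[:C, :] = np.eye(C)                          # one-hot labels on the text nodes only
F = label_propagation(knn_graph(feats), Y)
preds = F[C:].argmax(axis=1)                  # zero-shot predictions for the images
```

In this transductive form the whole affinity matrix is materialized; the inductive inference and sparsification described in the abstract are precisely what make the approach practical at scale.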