FeatUp: A Model-Agnostic Framework for Features at Any Resolution

Deep features are a cornerstone of computer vision research, capturing image semantics and enabling the community to solve downstream tasks even in the zero- or few-shot regime. However, these features often lack the spatial resolution to directly perform dense prediction tasks like segmentation and depth prediction because models aggressively pool information over large areas. In this work, we introduce FeatUp, a task- and model-agnostic framework to restore lost spatial information in deep features. We introduce two variants of FeatUp: one that guides features with high-resolution signal in a single forward pass, and one that fits an implicit model to a single image to reconstruct features at any resolution. Both approaches use a multi-view consistency loss with deep analogies to NeRFs. Our features retain their original semantics and can be swapped into existing applications to yield resolution and performance gains even without re-training. We show that FeatUp significantly outperforms other feature upsampling and image super-resolution approaches in class activation map generation, transfer learning for segmentation and depth prediction, and end-to-end training for semantic segmentation.
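The multi-view consistency idea mentioned above can be illustrated with a minimal NumPy sketch: a candidate high-resolution feature map is jittered into several "views," downsampled back to the backbone's resolution, and compared against the backbone's features for the same views. This is only an assumption-laden toy (average pooling stands in for a learned downsampler, and the function names `avg_pool` and `multiview_consistency_loss` are hypothetical, not from the FeatUp codebase):

```python
import numpy as np

def avg_pool(x, k):
    # Toy stand-in for a learned downsampler: k-by-k average pooling
    # over an (H, W, C) feature map.
    H, W, C = x.shape
    return x[: H // k * k, : W // k * k].reshape(
        H // k, k, W // k, k, C
    ).mean(axis=(1, 3))

def multiview_consistency_loss(hires_feats, lowres_views, transforms, k):
    # hires_feats:  (H, W, C) candidate high-resolution features.
    # lowres_views: list of (H//k, W//k, C) backbone features, one per
    #               jittered view of the input image.
    # transforms:   the same jitters, expressed as functions on the
    #               high-resolution feature map.
    # The high-res features are valid if, for every view, jittering and
    # downsampling them reproduces the backbone's low-res features.
    loss = 0.0
    for feats_lr, t in zip(lowres_views, transforms):
        pred_lr = avg_pool(t(hires_feats), k)
        loss += np.mean((pred_lr - feats_lr) ** 2)
    return loss / len(lowres_views)
```

For example, if the low-res views really are pooled jitters of some high-res map, that map attains zero loss, while a mismatched map does not; optimizing this objective over many views is what lets the high-resolution features be recovered, analogous to how NeRF recovers a 3D scene from multiple 2D renderings.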