Distill Any Depth: Distillation Creates a Stronger Monocular Depth Estimator

Monocular depth estimation (MDE) aims to predict scene depth from a single RGB image and plays a crucial role in 3D scene understanding. Recent advances in zero-shot MDE leverage normalized depth representations and distillation-based learning to improve generalization across diverse scenes. However, current depth normalization methods for distillation, relying on global normalization, can amplify noisy pseudo-labels, reducing distillation effectiveness. In this paper, we systematically analyze the impact of different depth normalization strategies on pseudo-label distillation. Based on our findings, we propose Cross-Context Distillation, which integrates global and local depth cues to enhance pseudo-label quality. Additionally, we introduce a multi-teacher distillation framework that leverages complementary strengths of different depth estimation models, leading to more robust and accurate depth predictions. Extensive experiments on benchmark datasets demonstrate that our approach significantly outperforms state-of-the-art methods, both quantitatively and qualitatively.
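To make the concern about global normalization concrete, the following is a minimal toy sketch, not the paper's implementation. The abstract does not state the normalization formula, so the sketch assumes a median shift and mean-absolute-deviation scale, a common choice for affine-invariant depth. The helper names (`normalize`, `l1`), the 1D "depth map", and the crop are all hypothetical. It shows how a single noisy teacher value can change the loss in a region that does not contain it when statistics are computed over the whole map, whereas normalizing the crop on its own leaves that region unaffected.

```python
from statistics import median

def normalize(values):
    """Assumed normalization: shift by the median, scale by the mean absolute deviation."""
    m = median(values)
    s = sum(abs(v - m) for v in values) / len(values)
    return [(v - m) / max(s, 1e-6) for v in values]

def l1(a, b):
    """Mean absolute difference between two equally long sequences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

# Hypothetical 1D "depth map": a near region followed by a far region.
student = [1.0, 1.1, 1.2, 1.3, 5.0, 5.2, 5.4, 5.6]
teacher = list(student)
teacher[-1] = 50.0  # one noisy pseudo-label in the far region

crop = slice(0, 4)  # local region that does not contain the noisy value

# Global context: normalize the full map, then compare inside the crop.
global_loss = l1(normalize(student)[crop], normalize(teacher)[crop])

# Local context: normalize only the crop, then compare.
local_loss = l1(normalize(student[crop]), normalize(teacher[crop]))

print(f"global-normalized loss in crop: {global_loss:.3f}")
print(f"local-normalized loss in crop:  {local_loss:.3f}")
```

In this toy setup the student and teacher agree exactly inside the crop, yet the globally normalized loss there is nonzero because the outlier inflates the teacher's scale. The locally normalized loss is zero. Local normalization alone discards the global layout of the scene, which is consistent with the abstract's motivation for combining global and local cues.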