1. Tutorial Introduction

Distill-Any-Depth is a monocular depth estimation project released on February 28, 2025, by Zhejiang University of Technology, Westlake University, Henan University, and the National University of Singapore. It uses a distillation algorithm to combine the strengths of multiple open-source models, achieves high-precision depth estimation with only a small amount of unlabeled data, and reports new state-of-the-art (SOTA) results. The accompanying paper is "Distill Any Depth: Distillation Creates a Stronger Monocular Depth Estimator"; the full citation is given at the end of this tutorial.

Here are its key points:

Multi-Teacher Distillation Framework
- Pseudo labels are generated by randomly selected teacher models, so the strengths of different models are combined and pseudo-label quality improves.
- A cross-context distillation mechanism combines local detail with global information, which makes the model more robust.
Local Normalization Strategy
- Global normalization tends to amplify noise. This project instead normalizes depth within each cropped region, which preserves local detail (such as object edges and small holes) and improves prediction accuracy.
Low Data Dependency
- Only 20,000 unlabeled images are needed, far fewer than the millions of annotated images that traditional methods require, which greatly reduces annotation cost.
Generalization
- On benchmarks such as NYUv2 (indoor), KITTI (outdoor driving), and DIODE (complex lighting), the absolute relative error (AbsRel) is significantly lower than that of previous models.
Robustness
- The model remains stable on transparent objects, reflective surfaces, and dynamic scenes, where traditional models often fail.
Efficiency
- Inference is more than 10 times faster than diffusion-based models (such as Marigold), which supports real-time applications.
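
The local normalization idea in the list above can be illustrated with a minimal Python sketch. It assumes a median and mean-absolute-deviation normalization, a common choice for affine-invariant depth; the source does not state the exact formula, and the function names (`normalize`, `crop_values`, `spread`) and the toy depth map are hypothetical. The sketch normalizes the same crop twice: once with statistics from the whole image and once with statistics from the crop itself.

```python
from statistics import median


def normalize(values, reference=None):
    """Shift and scale `values` using the median and mean absolute deviation
    of `reference` (defaults to `values` themselves).

    Assumption: this median/MAD form is illustrative, not the project's code.
    """
    ref = values if reference is None else reference
    med = median(ref)
    # Fall back to 1.0 when all reference values are identical.
    scale = sum(abs(v - med) for v in ref) / len(ref) or 1.0
    return [(v - med) / scale for v in values]


def crop_values(depth, top, left, height, width):
    """Return the depth values inside a rectangular crop as a flat list."""
    return [v for row in depth[top:top + height] for v in row[left:left + width]]


def spread(values):
    """Range of a list: a crude measure of how much detail survives."""
    return max(values) - min(values)


# Toy depth map: a near surface with fine detail (left columns)
# next to a far background (right columns).
depth = [
    [1.00, 1.02, 50.0, 60.0],
    [1.01, 1.03, 55.0, 65.0],
    [1.00, 1.02, 50.0, 60.0],
    [1.01, 1.03, 55.0, 65.0],
]
all_values = [v for row in depth for v in row]
patch = crop_values(depth, 0, 0, 2, 2)

global_norm = normalize(patch, reference=all_values)  # statistics from the full image
local_norm = normalize(patch)                         # statistics from the crop only

print("global spread:", spread(global_norm))
print("local spread: ", spread(local_norm))
```

In this toy case the far background dominates the global statistics, so the fine variation on the near surface is squeezed into a very small range, while crop-level statistics keep it well separated.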

This tutorial runs on a single RTX 4090 GPU.
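
For reference, the AbsRel metric cited in the benchmark results above is conventionally the mean of |prediction - ground truth| / ground truth over pixels with valid ground truth. Below is a minimal Python sketch, assuming flat lists of depth values and assuming that ground-truth values of zero or less mark invalid pixels. The function name `abs_rel` is hypothetical, and evaluations of relative-depth models typically align the prediction to the ground truth in scale and shift first, a step omitted here.

```python
def abs_rel(pred, gt):
    """Mean absolute relative error over pixels with valid ground truth.

    Assumption: gt <= 0 marks a pixel without a valid depth measurement.
    """
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no valid ground-truth pixels")
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)


# Hypothetical values: the third pixel has no valid ground truth and is skipped.
predicted = [1.1, 2.0, 5.0]
ground_truth = [1.0, 2.0, 0.0]
print("AbsRel:", abs_rel(predicted, ground_truth))
```

Lower values are better: a perfect prediction gives an AbsRel of 0.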

2. Demo Results

3. Operation Steps

1. Start the container

If "Bad Gateway" appears, the model is still initializing. Because the model is large, wait about 2-3 minutes and then refresh the page.

2. Usage steps

Result

4. Discussion

If you come across a high-quality project, leave us a message to recommend it. We have also set up a tutorial discussion group: scan the QR code and add the note [SD Tutorial] to join, discuss technical issues, and share your results.

Citation Information

The citation information for this project is as follows:

@article{he2025distill,
  title   = {Distill Any Depth: Distillation Creates a Stronger Monocular Depth Estimator},
  author  = {Xiankang He and Dongyan Guo and Hongji Li and Ruibo Li and Ying Cui and Chi Zhang},
  year    = {2025},
  journal = {arXiv preprint arXiv:2502.19204}
}

