
DepthMaster: Taming Diffusion Models for Monocular Depth Estimation

Ziyang Song, Zerong Wang, Bo Li, Hao Zhang, Ruijie Zhu, Li Liu, Peng-Tao Jiang, Tianzhu Zhang
Abstract

Monocular depth estimation within the diffusion-denoising paradigm demonstrates impressive generalization ability but suffers from low inference speed. Recent methods adopt a single-step deterministic paradigm to improve inference efficiency while maintaining comparable performance. However, they overlook the gap between generative and discriminative features, leading to suboptimal results. In this work, we propose DepthMaster, a single-step diffusion model designed to adapt generative features for the discriminative depth estimation task. First, to mitigate overfitting to texture details introduced by generative features, we propose a Feature Alignment module, which incorporates high-quality semantic features to enhance the denoising network's representation capability. Second, to address the lack of fine-grained details in the single-step deterministic framework, we propose a Fourier Enhancement module to adaptively balance low-frequency structure and high-frequency details. We adopt a two-stage training strategy to fully leverage the potential of the two modules. In the first stage, we focus on learning the global scene structure with the Feature Alignment module, while in the second stage, we exploit the Fourier Enhancement module to improve the visual quality. Through these efforts, our model achieves state-of-the-art performance in terms of generalization and detail preservation, outperforming other diffusion-based methods across various datasets. Our project page can be found at https://indu1ge.github.io/DepthMaster_page.
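
The abstract does not describe how the Fourier Enhancement module is built, so the following is only a rough illustration of the general idea it names: separating a 2D map into low-frequency structure and high-frequency detail, then recombining the two with weights. It is a NumPy sketch, not the authors' implementation. The function names, the radial `cutoff`, and the fixed scalar weights `w_low` and `w_high` are all assumptions made for illustration; in a learned module the balance would presumably be predicted adaptively rather than set by hand.

```python
import numpy as np


def split_frequencies(x: np.ndarray, cutoff: float = 0.1):
    """Split a 2D map into low- and high-frequency parts.

    `cutoff` is a hypothetical radial threshold in cycles per sample
    (the Nyquist frequency is 0.5). Not taken from the paper.
    """
    h, w = x.shape
    fy = np.fft.fftfreq(h)[:, None]   # vertical frequencies, shape (h, 1)
    fx = np.fft.fftfreq(w)[None, :]   # horizontal frequencies, shape (1, w)
    low_mask = np.sqrt(fx ** 2 + fy ** 2) <= cutoff

    spectrum = np.fft.fft2(x)
    # The mask is symmetric in frequency, so each inverse transform is
    # real up to rounding error; .real drops the tiny imaginary residue.
    low = np.fft.ifft2(spectrum * low_mask).real
    high = np.fft.ifft2(spectrum * ~low_mask).real
    return low, high


def rebalance(x: np.ndarray, w_low: float = 1.0, w_high: float = 1.0,
              cutoff: float = 0.1) -> np.ndarray:
    """Recombine the two bands with fixed weights (assumed, not learned)."""
    low, high = split_frequencies(x, cutoff)
    return w_low * low + w_high * high


# Usage: boost fine detail in a stand-in "depth map" of random values.
depth = np.random.default_rng(0).random((32, 32))
sharpened = rebalance(depth, w_low=1.0, w_high=1.5)
```

Because the two masks are complementary, the bands sum back to the input when both weights are 1, so any change in the output comes only from the chosen weights.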
