HyperAI


3 years ago

The Informed Sampler: A Discriminative Approach to Bayesian Inference in Generative Computer Vision Models

Varun Jampani Sebastian Nowozin Matthew Loper Peter V. Gehler


Abstract

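Computer vision is difficult because of large variability in lighting, shape, and texture; in addition, the image signal is non-additive due to occlusion. Generative models promised to account for this variability by accurately modeling the image formation process as a function of latent variables with prior beliefs. Bayesian posterior inference could then, in principle, explain the observation. While intuitively appealing, generative models for computer vision have largely failed to deliver on that promise because of the difficulty of posterior inference. As a result, the community has favored efficient discriminative approaches. We still believe in the usefulness of generative models in computer vision, but argue that we need to leverage existing discriminative or even heuristic computer vision methods. We implement this idea in a principled way with an informed sampler and, in careful experiments, demonstrate it on challenging generative models that contain renderer programs as their components. We concentrate on the problem of inverting an existing graphics rendering engine, an approach that can be understood as "inverse graphics". The informed sampler, using simple discriminative proposals based on existing computer vision technology, achieves significant improvements in inference.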

One-sentence Summary

The authors propose the Informed Sampler, a Bayesian inference framework that enhances generative computer vision models by integrating discriminative proposals from existing computer vision technology, yielding significant inference improvements for inverse graphics tasks that invert graphics rendering engines.

Key Contributions

  • Introduces an informed MCMC sampler that leverages histogram-of-oriented-gradients (HOG) features and the OpenCV library to generate discriminative proposals for efficient posterior inference in generative computer vision models.
  • Applies the framework to invert existing graphics rendering engines for camera extrinsics estimation, occlusion reasoning, and parametric human body shape estimation using the BlendSCAPE model.
  • Demonstrates that the informed sampler achieves reliable convergence and significant performance improvements on challenging multi-modal problems compared to standard Metropolis-Hastings sampling.

Introduction

Generative computer vision models aim to reconstruct scene parameters by simulating physical image formation, providing a principled framework for inverse graphics and Bayesian inference. However, these models have historically struggled because posterior inference becomes computationally intractable in high-dimensional spaces with complex occlusions and multi-modal distributions. This fundamental bottleneck has driven the field toward purely discriminative approaches that bypass explicit generative reasoning. To overcome this limitation, the authors develop the informed sampler, an MCMC method that leverages standard discriminative computer vision features to generate targeted proposals for latent variables. By combining heuristic guidance with rigorous generative evaluation, this approach enables efficient and reliable posterior estimation in complex rendering-based models that were previously out of reach.

Method

The authors leverage a Metropolis-Hastings Markov chain Monte Carlo (MCMC) framework to perform Bayesian inference over the posterior distribution $p(\theta \mid \hat{I})$, where $\theta$ represents the parameters of a generative model and $\hat{I}$ is an observed image. This approach is necessary because the posterior distribution is typically intractable due to the complex nature of the generative process, which in this context is a graphics engine rendering images. The core of the method is the design of an informed proposal distribution that enhances the efficiency of the sampling process.
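As a minimal sketch of the generic Metropolis-Hastings step described above, assuming a scalar parameter and user-supplied callables, the snippet below shows how a candidate is proposed and accepted. The names `mh_step`, `log_post`, `propose`, and `log_q` are hypothetical, and a toy Gaussian log-density stands in for the renderer-based likelihood $p(\hat{I} \mid \theta)\,p(\theta)$.

```python
import math
import random


def mh_step(theta, log_post, propose, log_q, rng):
    """One Metropolis-Hastings step (hypothetical helper, not the authors' code).

    log_post(theta)     -- unnormalised log posterior, log p(I_hat|theta) + log p(theta)
    propose(theta, rng) -- draws a candidate theta_bar ~ T(.|theta)
    log_q(x, given)     -- log T(x|given), used for the Hastings correction
    Returns (new_theta, accepted_flag).
    """
    cand = propose(theta, rng)
    log_ratio = (log_post(cand) + log_q(theta, cand)
                 - log_post(theta) - log_q(cand, theta))
    # Accept with probability min(1, exp(log_ratio)).
    if rng.random() < math.exp(min(0.0, log_ratio)):
        return cand, True
    return theta, False


# Toy usage: a standard normal "posterior" with a Gaussian random-walk proposal.
def log_post(t):
    return -0.5 * t * t


def propose(t, r):
    return t + r.gauss(0.0, 1.0)


def log_q(x, given):
    # Symmetric random walk, so this term cancels in the ratio.
    return -0.5 * (x - given) ** 2


rng = random.Random(0)
theta, samples, n_acc = 0.0, [], 0
for _ in range(20000):
    theta, acc = mh_step(theta, log_post, propose, log_q, rng)
    samples.append(theta)
    n_acc += acc
print(sum(samples) / len(samples), n_acc / len(samples))
```

Keeping `log_q` explicit matters here because the informed sampler's proposals are not all symmetric; for a symmetric random walk the two `log_q` terms simply cancel.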

The standard MCMC procedure involves iteratively proposing a new state $\bar{\theta}$ from a proposal distribution $T(\cdot \mid \theta_t)$ and accepting or rejecting this proposal based on the Metropolis-Hastings acceptance ratio. The key innovation in this work is the construction of a mixture proposal distribution $T_\alpha(\cdot \mid \hat{I}, \theta_t)$ that combines a local proposal $T_L(\cdot \mid \theta_t)$ with a global proposal $T_G(\cdot \mid \hat{I})$. The local proposal, typically a symmetric distribution such as a multivariate normal, facilitates local exploration of the parameter space. The global proposal $T_G(\cdot \mid \hat{I})$ is conditioned on the observed image $\hat{I}$ and is designed to make larger, more informative jumps in the parameter space. This global proposal is learned in an offline stage using discriminative methods, allowing it to leverage knowledge about the relationship between images and parameters.
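Written out with the notation above, the mixture proposal and the standard Metropolis-Hastings acceptance probability it implies take the following form. This is the textbook Hastings rule applied to $T_\alpha$, shown as a worked illustration; the normalizing constant of $p(\theta \mid \hat{I})$ cancels, so only $p(\hat{I} \mid \theta)\,p(\theta)$ has to be evaluated.

$$T_\alpha(\bar{\theta} \mid \hat{I}, \theta_t) = \alpha\, T_L(\bar{\theta} \mid \theta_t) + (1 - \alpha)\, T_G(\bar{\theta} \mid \hat{I})$$

$$A(\bar{\theta}, \theta_t) = \min\left(1,\ \frac{p(\hat{I} \mid \bar{\theta})\, p(\bar{\theta})\, T_\alpha(\theta_t \mid \hat{I}, \bar{\theta})}{p(\hat{I} \mid \theta_t)\, p(\theta_t)\, T_\alpha(\bar{\theta} \mid \hat{I}, \theta_t)}\right)$$

With $\alpha = 1$ and a symmetric $T_L$, the proposal terms cancel and the rule reduces to plain random-walk Metropolis-Hastings. For $\alpha < 1$ the image-conditioned $T_G$ does not depend on $\theta_t$ and is not symmetric, so the proposal densities must be kept in the ratio.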

The construction of the global proposal $T_G$ is based on a non-parametric density estimation technique. The method first generates a large dataset of paired samples $(\theta^{(i)}, I^{(i)})$ by simulating from the generative model $p(I \mid \theta)\,p(\theta)$. A feature representation $v(I)$ is computed for each image, and a k-means clustering algorithm is applied to group the images based on these features. For each resulting cluster $C_j$, a kernel density estimate (KDE) is fitted to the corresponding set of parameters $\theta^{(C_j)}$. This process yields a conditional density estimate $T_G(\cdot \mid \hat{I})$ for any new image $\hat{I}$: the image is first assigned to a cluster via $v(\hat{I})$, and the corresponding KDE for that cluster is used as the global proposal.
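The offline construction of $T_G$ can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the authors' pipeline: the "generative model" is hypothetical ($\theta \sim U(-3, 3)$ and a scalar feature $v(I) = \theta^2$ plus noise, standing in for rendering followed by HOG extraction), k-means runs on 1-D features, and the KDE bandwidth `h=0.1` is a hypothetical fixed choice. Because the feature hides the sign of $\theta$, each cluster's KDE is bimodal, which is exactly the kind of posterior where image-conditioned jumps help.

```python
import math
import random
import statistics


def simulate(n, rng, noise=0.05):
    """Toy stand-in for sampling p(I|theta)p(theta): theta ~ U(-3, 3), v(I) = theta^2 + noise."""
    pairs = []
    for _ in range(n):
        theta = rng.uniform(-3.0, 3.0)
        pairs.append((theta * theta + rng.gauss(0.0, noise), theta))
    return pairs


def nearest(x, centers):
    """Index of the cluster centre closest to feature x."""
    return min(range(len(centers)), key=lambda j: abs(x - centers[j]))


def kmeans_1d(xs, k, iters=25):
    """Lloyd's k-means on scalar features, initialised at evenly spaced quantiles."""
    srt = sorted(xs)
    centers = [srt[int((j + 0.5) * len(srt) / k)] for j in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            groups[nearest(x, centers)].append(x)
        centers = [statistics.fmean(g) if g else c for g, c in zip(groups, centers)]
    return centers


class ClusterKDE:
    """Gaussian KDE over the theta values of one cluster, with a fixed bandwidth h."""

    def __init__(self, thetas, h=0.1):
        self.thetas, self.h = thetas, h

    def sample(self, rng):
        # Pick a stored theta uniformly, then add Gaussian kernel noise.
        return rng.choice(self.thetas) + rng.gauss(0.0, self.h)

    def log_density(self, x):
        # log of (1/n) * sum_i N(x; theta_i, h^2), computed with log-sum-exp.
        logs = [-0.5 * ((x - t) / self.h) ** 2 for t in self.thetas]
        m = max(logs)
        total = sum(math.exp(v - m) for v in logs)
        return m + math.log(total / len(self.thetas)) - math.log(self.h * math.sqrt(2 * math.pi))


def build_global_proposal(pairs, k):
    """Offline stage: cluster the features, then fit one KDE per cluster on its thetas."""
    centers = kmeans_1d([f for f, _ in pairs], k)
    members = [[] for _ in range(k)]
    for f, theta in pairs:
        members[nearest(f, centers)].append(theta)
    return centers, [ClusterKDE(m) if m else None for m in members]


def global_proposal(feature, centers, kdes):
    """Test time: T_G(.|I_hat) is the KDE of the cluster nearest to v(I_hat)."""
    return kdes[nearest(feature, centers)]


rng = random.Random(0)
pairs = simulate(3000, rng)
centers, kdes = build_global_proposal(pairs, k=10)
t_g = global_proposal(1.5 ** 2, centers, kdes)  # "observed image" rendered with theta = 1.5
draws = [t_g.sample(rng) for _ in range(500)]
```

The `log_density` method is included because, inside an MH sampler, the global proposal must be evaluated as a density, not only sampled from.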

At test time, the informed sampler, referred to as INF-MH, combines the local and global proposals using a mixture coefficient $\alpha \in [0, 1]$. The overall transition kernel is $T_\alpha = \alpha T_L + (1 - \alpha) T_G$. This mixture allows for a flexible balance between local exploration and global, image-conditioned moves. The algorithm first identifies the appropriate cluster for the observed image, then repeatedly samples from the mixture kernel and applies the Metropolis-Hastings acceptance rule, which ensures that the chain has the posterior $p(\theta \mid \hat{I})$ as its stationary distribution. The framework is designed to be general, and the authors demonstrate its application across diverse computer vision problems.
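The following is a minimal sketch of an INF-MH-style loop on a toy bimodal posterior, assuming a scalar parameter. To keep it self-contained, the learned cluster KDE is replaced by a hypothetical fixed two-component Gaussian mixture at $\pm 1.5$, and the constants `SIGMA`, `LOCAL_SD`, `MODES`, and `G_SD` are illustrative choices, not values from the paper. The acceptance ratio uses the full mixture density $T_\alpha$ in both directions.

```python
import math
import random

SIGMA = 0.3                      # hypothetical likelihood noise
LOCAL_SD = 0.1                   # hypothetical step size of the local proposal T_L
MODES, G_SD = (1.5, -1.5), 0.15  # hypothetical stand-in for the learned cluster KDE T_G


def log_post(theta):
    """Toy bimodal posterior: the 'image' only reveals theta^2 ~= 2.25."""
    return -((theta * theta - 2.25) ** 2) / (2 * SIGMA ** 2)


def log_normal(x, mu, sd):
    return -0.5 * ((x - mu) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))


def log_mixture(weighted_logs):
    """log(sum_i w_i * exp(l_i)), skipping zero-weight terms to avoid log(0)."""
    terms = [math.log(w) + l for w, l in weighted_logs if w > 0]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))


def log_t_global(x):
    return log_mixture([(0.5, log_normal(x, mu, G_SD)) for mu in MODES])


def log_t_alpha(x, given, alpha):
    """log T_alpha(x | I_hat, given) = log(alpha*T_L(x|given) + (1-alpha)*T_G(x|I_hat))."""
    return log_mixture([(alpha, log_normal(x, given, LOCAL_SD)),
                        (1 - alpha, log_t_global(x))])


def inf_mh(alpha, n_iter, theta0, rng):
    theta, chain, accepted = theta0, [], 0
    for _ in range(n_iter):
        if rng.random() < alpha:               # local move from T_L
            cand = theta + rng.gauss(0.0, LOCAL_SD)
        else:                                  # global, image-conditioned move from T_G
            cand = rng.gauss(rng.choice(MODES), G_SD)
        log_a = (log_post(cand) + log_t_alpha(theta, cand, alpha)
                 - log_post(theta) - log_t_alpha(cand, theta, alpha))
        if rng.random() < math.exp(min(0.0, log_a)):
            theta, accepted = cand, accepted + 1
        chain.append(theta)
    return chain, accepted / n_iter


def frac_pos(chain):
    """Fraction of samples in the positive mode."""
    return sum(t > 0 for t in chain) / len(chain)


rng = random.Random(1)
inf_chain, inf_acc = inf_mh(alpha=0.8, n_iter=5000, theta0=1.5, rng=rng)
mh_chain, mh_acc = inf_mh(alpha=1.0, n_iter=5000, theta0=1.5, rng=rng)  # local-only baseline
print(frac_pos(inf_chain), frac_pos(mh_chain))
```

In this toy setting the local-only chain (`alpha=1.0`) stays in the mode it starts in, while the mixture kernel visits both modes, which mirrors the multi-modality argument in the text.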

Experiment

The evaluation utilizes multiple parallel MCMC chains across three computer vision tasks to assess sampler convergence and posterior exploration. The camera extrinsics and occluding tiles experiments validate the methods' ability to navigate multi-modal and high-dimensional distributions, demonstrating that informed sampling combined with block-wise updates successfully overcomes the convergence failures of traditional baselines. Furthermore, the body shape estimation task confirms the practical utility of this approach through accurate 3D mesh reconstruction, reliable uncertainty quantification, and robustness under incomplete observations, collectively establishing that leveraging discriminative features to guide MCMC exploration significantly enhances inference reliability across complex vision problems.

The authors compare several sampling methods across three experimental setups, evaluating their performance using acceptance rates, convergence diagnostics, and mode discovery. Informed sampling methods, particularly INF-MH, achieve higher acceptance rates and converge faster than the baseline methods. In multi-modal posterior distributions, INF-MH also discovers more modes than the other samplers, and this advantage in convergence and mode discovery holds across the different experimental setups.

The authors also analyze how the proposal standard deviation affects convergence and acceptance rates. Acceptance rates decrease as the proposal standard deviation increases, with the best performance observed at lower values. The potential scale reduction factor (PSRF), a convergence diagnostic computed from multiple parallel chains, stabilizes after a few thousand iterations for all methods, indicating convergence. The informed sampling approach again achieves higher acceptance rates and faster convergence than the baseline methods.
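For readers unfamiliar with the PSRF, here is a minimal sketch of the basic Gelman-Rubin form for a scalar parameter, assuming equal-length, post-burn-in chains. The function name `psrf` is hypothetical and the exact variant used in the paper (for example, any degrees-of-freedom correction) is not stated here. It compares between-chain and within-chain variance; values near 1 suggest the chains agree, and a commonly used rule of thumb treats values below about 1.1 as converged.

```python
import random
import statistics


def psrf(chains):
    """Potential scale reduction factor (basic Gelman-Rubin R-hat) for a scalar parameter.

    chains: list of m >= 2 equal-length lists of post-burn-in samples.
    """
    n = len(chains[0])
    means = [statistics.fmean(c) for c in chains]
    w = statistics.fmean(statistics.variance(c) for c in chains)  # mean within-chain variance
    b = n * statistics.variance(means)                            # between-chain variance
    var_hat = (n - 1) / n * w + b / n                             # pooled variance estimate
    return (var_hat / w) ** 0.5


rng = random.Random(0)
# Four chains drawing from the same distribution (well mixed).
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]
# Four chains stuck in different modes of a bimodal posterior.
stuck = [[rng.gauss(mu, 0.1) for _ in range(1000)] for mu in (-1.5, 1.5, -1.5, 1.5)]
print(psrf(mixed), psrf(stuck))
```

The second case shows why PSRF exposes the baseline failures discussed here: chains trapped in different modes have small within-chain variance but large between-chain variance, which drives the ratio far above 1.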

In the body shape estimation experiment, the authors infer 3D human body shapes from depth images using a generative model and informed sampling. Mixing global and local proposals improves both convergence and accuracy: the informed sampler achieves lower reconstruction errors and converges faster than the baseline methods. Because the sampler returns posterior samples rather than a single point estimate, the method also quantifies uncertainty in the reconstructed mesh, with higher variance in regions of higher error. Body measurements can likewise be predicted from the posterior distribution over shape parameters, with accurate recovery and confidence intervals that characterize the remaining uncertainty.
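Turning posterior samples into a measurement with an interval can be sketched as follows. Everything here is a hypothetical stand-in: the shape samples are synthetic (not INF-MH output), and `measurement` is an assumed linear map from three shape parameters to a single measurement in centimeters, whereas a real pipeline would compute measurements from the reconstructed mesh. The point is only the pattern: push every posterior sample through the measurement function, then summarize the resulting values.

```python
import random
import statistics


def measurement(shape, weights, offset):
    """Hypothetical linear map from shape parameters to one body measurement (in cm)."""
    return offset + sum(w * b for w, b in zip(weights, shape))


def summarise(values):
    """Posterior mean and equal-tailed 95% interval from Monte Carlo samples."""
    q = statistics.quantiles(values, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
    return statistics.fmean(values), (q[0], q[-1])


# Synthetic stand-in for posterior samples of shape parameters returned by a sampler.
rng = random.Random(0)
true_shape = [0.5, -1.0, 0.2]
posterior = [[b + rng.gauss(0.0, 0.1) for b in true_shape] for _ in range(2000)]

weights, offset = [4.0, 2.0, 1.0], 170.0
values = [measurement(s, weights, offset) for s in posterior]
mean, (lo, hi) = summarise(values)
print(f"{mean:.1f} cm, 95% interval [{lo:.1f}, {hi:.1f}]")
```

Summarizing after the transformation, rather than plugging the posterior mean shape into `measurement`, is what lets the interval reflect the full posterior uncertainty, including for non-linear measurement functions.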

Across multiple experiments, informed samplers converge faster, achieve higher acceptance rates, and sample more efficiently than traditional baseline samplers. The mixture coefficient $\alpha$ in the informed sampling approach has a significant impact on performance, with higher values (a larger share of local proposals) leading to higher acceptance rates.

Evaluated with acceptance rates, potential scale reduction factors, and root mean square error (RMSE), informed sampling methods, particularly those combining global and local proposals, achieve higher acceptance rates, faster convergence, and lower reconstruction errors than the baseline samplers. Mixture kernels that combine global and local proposals perform better than either proposal alone. Performance varies across experimental settings, but in high-dimensional or multi-modal problems the informed samplers remain stable, while the baselines often fail to converge or mix poorly.

The experiments evaluate informed sampling methods against traditional baselines across multiple setups, including probabilistic modeling tasks and a 3D human body shape estimation application. These trials validate how effectively each approach converges, explores complex posterior landscapes, and reconstructs target shapes. Qualitatively, informed samplers consistently demonstrate superior stability and exploration capabilities, particularly when combining global and local proposals. The findings confirm that this approach not only accelerates convergence and improves sampling efficiency in complex scenarios but also provides reliable uncertainty quantification for practical predictions.

