
3 years ago

The Informed Sampler: A Discriminative Approach to Bayesian Inference in Generative Computer Vision Models

Varun Jampani Sebastian Nowozin Matthew Loper Peter V. Gehler

Introduction to Computer Vision


Abstract

Computer vision is hard because of the large variability in lighting, shape, and texture; in addition, the image signal is non-additive due to occlusion. Generative models promised to account for this variability by accurately modelling the image formation process as a function of latent variables with prior beliefs. Bayesian posterior inference could then, in principle, explain the observation. While intuitively appealing, generative models for computer vision have largely failed to deliver on that promise because posterior inference is difficult. As a result, the community has favoured efficient discriminative approaches. We still believe in the usefulness of generative models in computer vision, but argue that we need to leverage existing discriminative or even heuristic computer vision methods. We implement this idea in a principled way with an informed sampler, and in careful experiments demonstrate it on challenging generative models that contain renderer programs as their components. We concentrate on the problem of inverting an existing graphics rendering engine, an approach that can be understood as "Inverse Graphics". The informed sampler, using simple discriminative proposals based on existing computer vision technology, achieves significant improvements in inference.

One-sentence Summary

The authors propose the Informed Sampler, a Bayesian inference framework that enhances generative computer vision models by integrating discriminative proposals from existing computer vision technology, yielding significant inference improvements for inverse graphics tasks that invert graphics rendering engines.

Key Contributions

  • Introduces an informed MCMC sampler that leverages histogram-of-gradients features and the OpenCV library to generate discriminative proposals for efficient posterior inference in generative computer vision models.
  • Applies the framework to invert existing graphics rendering engines for camera extrinsics estimation, occlusion reasoning, and parametric human body shape estimation using the BlendSCAPE model.
  • Demonstrates that the informed sampler achieves reliable convergence and significant performance improvements on challenging multi-modal problems compared to standard Metropolis-Hastings sampling.

Introduction

Generative computer vision models aim to reconstruct scene parameters by simulating physical image formation, providing a principled framework for inverse graphics and Bayesian inference. However, these models have historically struggled because posterior inference becomes computationally intractable in high-dimensional spaces with complex occlusions and multi-modal distributions. This fundamental bottleneck has driven the field toward purely discriminative approaches that bypass explicit generative reasoning. To overcome this limitation, the authors develop the informed sampler, an MCMC method that leverages standard discriminative computer vision features to generate targeted proposals for latent variables. By combining heuristic guidance with rigorous generative evaluation, this approach enables efficient and reliable posterior estimation in complex rendering-based models that were previously out of reach.

Method

The authors leverage a Metropolis-Hastings Markov Chain Monte Carlo (MCMC) framework to perform Bayesian inference over the posterior distribution $p(\theta \mid \hat{I})$, where $\theta$ represents the parameters of a generative model and $\hat{I}$ is an observed image. This approach is necessary because the posterior distribution is typically intractable due to the complex nature of the generative process, which in this context is a graphics engine rendering images. The core of the method is the design of an informed proposal distribution that enhances the efficiency of the sampling process.

The standard MCMC procedure involves iteratively proposing a new state $\bar{\theta}$ from a proposal distribution $T(\cdot \mid \theta_t)$ and accepting or rejecting this proposal based on the Metropolis-Hastings acceptance ratio. The key innovation in this work is the construction of a mixture proposal distribution $T_\alpha(\cdot \mid \hat{I}, \theta_t)$ that combines a local proposal $T_L(\cdot \mid \theta_t)$ with a global proposal $T_G(\cdot \mid \hat{I})$. The local proposal, typically a symmetric distribution like a multivariate normal, facilitates local exploration of the parameter space. The global proposal $T_G(\cdot \mid \hat{I})$ is conditioned on the observed image $\hat{I}$ and is designed to make larger, more informative jumps in the parameter space. This global proposal is learned in an offline stage using discriminative methods, allowing it to leverage knowledge about the relationship between images and parameters.
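For reference, the acceptance step mentioned above can be written out explicitly. The expression below is the textbook Metropolis-Hastings acceptance probability written in this section's notation; it is a sketch under that assumption, not a formula quoted from the page. Because the posterior appears in both numerator and denominator, it only needs to be known up to a constant, so $p(\hat{I} \mid \theta)\,p(\theta)$ can be substituted for $p(\theta \mid \hat{I})$.

```latex
a(\theta_t \to \bar{\theta}) = \min\left(1,\;
  \frac{p(\bar{\theta} \mid \hat{I})\; T_\alpha(\theta_t \mid \hat{I}, \bar{\theta})}
       {p(\theta_t \mid \hat{I})\; T_\alpha(\bar{\theta} \mid \hat{I}, \theta_t)}
\right)
```

When the proposal is symmetric, as with a purely local Gaussian proposal, the two $T$ terms cancel and the ratio reduces to a ratio of posterior densities.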

The construction of the global proposal $T_G$ is based on a non-parametric density estimation technique. The method first generates a large dataset of paired samples $(\theta^{(i)}, I^{(i)})$ by simulating from the generative model $p(I \mid \theta)\,p(\theta)$. A feature representation $v(I)$ is computed for each image, and a k-means clustering algorithm is applied to group the images based on these features. For each resulting cluster $C_j$, a kernel density estimate (KDE) is fitted to the corresponding set of parameters $\theta^{(C_j)}$. This process yields a conditional density estimate $T_G(\cdot \mid \hat{I})$ for any new image $\hat{I}$: the image is first assigned to a cluster via $v(\hat{I})$, and the corresponding KDE for that cluster is used as the global proposal.
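The offline stage described above can be sketched in a few lines of standard-library Python. Everything concrete here is an assumption made for illustration: the parameter and the feature are both scalars, `render_feature` is a hypothetical stand-in for rendering an image and computing $v(I)$, the prior is uniform, and the KDE uses a fixed Gaussian bandwidth. It is not the authors' implementation.

```python
import math
import random

def kmeans_1d(values, k, iters=20, seed=0):
    """Plain Lloyd's k-means on scalar features; returns the cluster centers."""
    rng = random.Random(seed)
    centers = rng.sample(values, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            groups[assign(v, centers)].append(v)
        # Keep the previous center if a cluster happens to end up empty.
        centers = [sum(g) / len(g) if g else centers[j]
                   for j, g in enumerate(groups)]
    return centers

def assign(v, centers):
    """Index of the nearest cluster center."""
    return min(range(len(centers)), key=lambda c: abs(v - centers[c]))

class GaussianKDE:
    """Fixed-bandwidth Gaussian kernel density estimate over scalars."""
    def __init__(self, points, bandwidth):
        self.points = list(points)
        self.h = bandwidth

    def pdf(self, x):
        norm = 1.0 / (len(self.points) * self.h * math.sqrt(2.0 * math.pi))
        return norm * sum(math.exp(-0.5 * ((x - p) / self.h) ** 2)
                          for p in self.points)

    def sample(self, rng):
        # Pick a stored point, then add kernel noise.
        return rng.gauss(rng.choice(self.points), self.h)

def render_feature(theta, rng):
    """Hypothetical stand-in for rendering I ~ p(I | theta) and computing v(I)."""
    return theta ** 2 + rng.gauss(0.0, 0.05)

def fit_global_proposal(n=2000, k=4, bandwidth=0.1, seed=0):
    rng = random.Random(seed)
    thetas = [rng.uniform(-2.0, 2.0) for _ in range(n)]  # theta ~ p(theta), assumed uniform
    feats = [render_feature(t, rng) for t in thetas]     # v(I) for each simulated image
    centers = kmeans_1d(feats, k, seed=seed)
    clusters = [[] for _ in range(k)]
    for t, f in zip(thetas, feats):
        clusters[assign(f, centers)].append(t)
    # One KDE over the parameters of each feature cluster.
    kdes = [GaussianKDE(c, bandwidth) if c else None for c in clusters]
    return centers, kdes

centers, kdes = fit_global_proposal()
observed_feature = 1.0                       # v(I_hat) for a hypothetical observation
T_G = kdes[assign(observed_feature, centers)]
```

Because the toy feature depends on the square of the parameter, the KDE selected for a given observation can place mass on both the positive and the negative solution, which is the kind of multi-modal proposal a purely local sampler has trouble reaching.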

At test time, the informed sampler, referred to as INF-MH, combines the local and global proposals using a mixture coefficient $\alpha \in [0, 1]$. The overall transition kernel is $T_\alpha = \alpha T_L + (1 - \alpha) T_G$. This mixture allows for a flexible balance between local exploration and global, image-conditioned moves. The algorithm proceeds by first identifying the appropriate cluster for the observed image, then sampling from the mixture kernel and applying the Metropolis-Hastings acceptance rule to ensure the correct stationary distribution is achieved. This framework is designed to be general, and the authors demonstrate its application across diverse computer vision problems.
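A minimal sketch of such a mixture-kernel sampler follows, under stated assumptions: the target is a hypothetical bimodal 1-D density standing in for the posterior, `global_pdf` and `global_sample` are stand-ins for a fitted global proposal, and the Hastings ratio is evaluated with the full mixture density, which is one standard way to keep the stationary distribution correct. The function names and default values are illustrative only.

```python
import math
import random

def normal_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def target(theta):
    """Unnormalized bimodal stand-in for p(theta | I_hat) (an assumption)."""
    return normal_pdf(theta, -3.0, 0.5) + normal_pdf(theta, 3.0, 0.5)

def global_pdf(theta):
    """Stand-in for T_G(. | I_hat): broad, covers both modes, independent of theta_t."""
    return 0.5 * normal_pdf(theta, -3.0, 1.0) + 0.5 * normal_pdf(theta, 3.0, 1.0)

def global_sample(rng):
    return rng.gauss(rng.choice([-3.0, 3.0]), 1.0)

def inf_mh(n_steps, alpha=0.5, local_std=0.3, theta0=-3.0, seed=0):
    """Metropolis-Hastings with the kernel alpha * T_L + (1 - alpha) * T_G."""
    rng = random.Random(seed)

    def mix_pdf(to, frm):
        # Density of proposing `to` when the chain is at `frm`.
        return alpha * normal_pdf(to, frm, local_std) + (1.0 - alpha) * global_pdf(to)

    theta, chain, accepted = theta0, [], 0
    for _ in range(n_steps):
        # Draw from the mixture: local move with probability alpha, else global.
        if rng.random() < alpha:
            prop = rng.gauss(theta, local_std)
        else:
            prop = global_sample(rng)
        ratio = (target(prop) * mix_pdf(theta, prop)) / (target(theta) * mix_pdf(prop, theta))
        if rng.random() < min(1.0, ratio):
            theta = prop
            accepted += 1
        chain.append(theta)
    return chain, accepted / n_steps

chain, acceptance_rate = inf_mh(5000)
```

Setting `alpha=1.0` in this sketch recovers a plain random-walk sampler, which makes it easy to compare how often each variant crosses between the two modes.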

Experiment

The evaluation runs multiple parallel MCMC chains on three computer vision tasks to assess sampler convergence and posterior exploration. The camera extrinsics and occluding tiles experiments test the method's ability to navigate multi-modal and high-dimensional distributions, and show that informed sampling combined with block-wise updates overcomes the convergence failures of traditional baselines. The body shape estimation task demonstrates practical utility through accurate 3D mesh reconstruction, uncertainty quantification, and robustness under incomplete observations. Together, the results indicate that using discriminative features to guide MCMC exploration improves inference reliability on complex vision problems.

The authors compare several sampling methods across three experimental setups, evaluating them by acceptance rate, convergence diagnostics, and mode discovery. Informed sampling methods, particularly INF-MH, achieve higher acceptance rates and faster convergence than the baseline methods. In multi-modal posterior distributions, INF-MH also discovers more modes than the other samplers.

The authors analyze how the proposal standard deviation affects convergence and acceptance rates. Acceptance rates decrease as the proposal standard deviation increases, with the best performance observed at lower values. Potential scale reduction factor (PSRF) values stabilize after a few thousand iterations, indicating convergence for all methods. The informed sampling approach achieves higher acceptance rates and faster convergence than the baseline methods.
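The PSRF diagnostic compares the variance between parallel chains with the variance within them; values close to 1 suggest the chains have mixed. The sketch below implements the basic Gelman-Rubin form for a scalar quantity using only the standard library. It is an illustration of the diagnostic in general, and the exact variant used in the experiments is not stated on this page.

```python
import random
import statistics

def psrf(chains):
    """Basic potential scale reduction factor for equal-length scalar chains."""
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    # W: average within-chain variance (sample variance, n - 1 denominator).
    within = statistics.fmean([statistics.variance(c) for c in chains])
    # B / n: sample variance of the chain means.
    between_over_n = statistics.variance(chain_means)
    pooled = (n - 1) / n * within + between_over_n
    return (pooled / within) ** 0.5

rng = random.Random(0)
# Four chains drawn from the same distribution: well mixed.
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]
# Four chains stuck around different means: not mixed.
stuck = [[rng.gauss(mu, 1.0) for _ in range(1000)] for mu in (-3.0, 0.0, 3.0, 6.0)]

r_mixed = psrf(mixed)
r_stuck = psrf(stuck)
```

In this toy setup `r_mixed` stays near 1 while `r_stuck` is far above it, mirroring the situation where chains trapped in separate modes fail the convergence check.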

The authors present results from a body shape estimation experiment in which 3D human body shapes are inferred from depth images using a generative model and informed sampling. The approach mixes global and local proposals, and the informed sampler achieves lower reconstruction errors and faster convergence than the baseline methods. The posterior samples also provide uncertainty estimates for the reconstructed 3D mesh, with higher variance in regions of higher error. Body measurements can be predicted from the posterior distribution over shape parameters, together with associated confidence intervals, and the results show accurate recovery and characterization of uncertainty.
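To illustrate how a measurement and its uncertainty can be read off posterior samples, the sketch below computes a posterior mean and a central 95% interval from a list of sampled values. The samples are synthetic and the measurement name is hypothetical; in practice each value would come from evaluating a measurement on the mesh generated by one posterior sample of the shape parameters.

```python
import random
import statistics

def summarize(samples, level=0.95):
    """Posterior mean and central interval from scalar MCMC samples."""
    ordered = sorted(samples)
    tail = (1.0 - level) / 2.0
    lo = ordered[int(tail * (len(ordered) - 1))]
    hi = ordered[int((1.0 - tail) * (len(ordered) - 1))]
    return statistics.fmean(ordered), (lo, hi)

# Hypothetical posterior samples of one body measurement, in centimetres.
rng = random.Random(0)
height_samples = [rng.gauss(170.0, 2.0) for _ in range(4000)]

mean, (lo, hi) = summarize(height_samples)
```

A wide interval for a given measurement signals that the observation constrains it poorly, for example when the relevant body region is occluded.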

The authors analyze the performance of informed sampling methods against baseline samplers across multiple experiments. Informed samplers converge faster and achieve higher acceptance rates, and therefore better sampling efficiency, than traditional methods. The mixture coefficient has a significant impact on performance, with higher values leading to better acceptance rates.

The authors compare several sampling methods across three experimental setups, evaluating them by acceptance rate, potential scale reduction factor, and root mean square error. Informed sampling methods achieve higher acceptance rates, faster convergence, and lower reconstruction errors than baseline samplers, although the size of the gain varies across settings. Combining global and local proposals in a mixture kernel performs better than using either proposal alone. In high-dimensional or multi-modal problems, the baseline methods often fail to converge or mix poorly, whereas the informed samplers remain stable.

The experiments evaluate informed sampling methods against traditional baselines across multiple setups, including probabilistic modeling tasks and a 3D human body shape estimation application. These trials validate how effectively each approach converges, explores complex posterior landscapes, and reconstructs target shapes. Qualitatively, informed samplers consistently demonstrate superior stability and exploration capabilities, particularly when combining global and local proposals. The findings confirm that this approach not only accelerates convergence and improves sampling efficiency in complex scenarios but also provides reliable uncertainty quantification for practical predictions.

