3 years ago

The Informed Sampler: A Discriminative Approach to Bayesian Inference in Generative Computer Vision Models

Varun Jampani Sebastian Nowozin Matthew Loper Peter V. Gehler

Abstract

Computer vision is hard because of a large variability in lighting, shape, and texture; in addition the image signal is non-additive due to occlusion. Generative models promised to account for this variability by accurately modelling the image formation process as a function of latent variables with prior beliefs. Bayesian posterior inference could then, in principle, explain the observation. While intuitively appealing, generative models for computer vision have largely failed to deliver on that promise due to the difficulty of posterior inference. As a result the community has favoured efficient discriminative approaches. We still believe in the usefulness of generative models in computer vision, but argue that we need to leverage existing discriminative or even heuristic computer vision methods. We implement this idea in a principled way with an informed sampler and in careful experiments demonstrate it on challenging generative models which contain renderer programs as their components. We concentrate on the problem of inverting an existing graphics rendering engine, an approach that can be understood as "Inverse Graphics". The informed sampler, using simple discriminative proposals based on existing computer vision technology, achieves significant improvements of inference.

One-sentence Summary

The authors propose the Informed Sampler, a Bayesian inference framework that enhances generative computer vision models by integrating discriminative proposals from existing computer vision technology, yielding significant inference improvements for inverse graphics tasks that invert graphics rendering engines.

Key Contributions

  • Introduces an informed MCMC sampler that leverages histogram-of-gradients features and the OpenCV library to generate discriminative proposals for efficient posterior inference in generative computer vision models.
  • Applies the framework to invert existing graphics rendering engines for camera extrinsics estimation, occlusion reasoning, and parametric human body shape estimation using the BlendSCAPE model.
  • Demonstrates that the informed sampler achieves reliable convergence and significant performance improvements on challenging multi-modal problems compared to standard Metropolis-Hastings sampling.

Introduction

Generative computer vision models aim to reconstruct scene parameters by simulating physical image formation, providing a principled framework for inverse graphics and Bayesian inference. However, these models have historically struggled because posterior inference becomes computationally intractable in high-dimensional spaces with complex occlusions and multi-modal distributions. This fundamental bottleneck has driven the field toward purely discriminative approaches that bypass explicit generative reasoning. To overcome this limitation, the authors develop the informed sampler, an MCMC method that leverages standard discriminative computer vision features to generate targeted proposals for latent variables. By combining heuristic guidance with rigorous generative evaluation, this approach enables efficient and reliable posterior estimation in complex rendering-based models that were previously out of reach.

Method

The authors leverage a Metropolis-Hastings Markov Chain Monte Carlo (MCMC) framework to perform Bayesian inference over the posterior distribution $p(\theta|\hat{I})$, where $\theta$ represents the parameters of a generative model and $\hat{I}$ is an observed image. This approach is necessary because the posterior distribution is typically intractable due to the complex nature of the generative process, which in this context is a graphics engine rendering images. The core of the method is the design of an informed proposal distribution that enhances the efficiency of the sampling process.

The standard MCMC procedure involves iteratively proposing a new state $\bar{\theta}$ from a proposal distribution $T(\cdot|\theta_t)$ and accepting or rejecting this proposal based on the Metropolis-Hastings acceptance ratio. The key innovation in this work is the construction of a mixture proposal distribution $T_\alpha(\cdot|\hat{I},\theta_t)$ that combines a local proposal $T_L(\cdot|\theta_t)$ with a global proposal $T_G(\cdot|\hat{I})$. The local proposal, typically a symmetric distribution like a multivariate normal, facilitates local exploration of the parameter space. The global proposal $T_G(\cdot|\hat{I})$ is conditioned on the observed image $\hat{I}$ and is designed to make larger, more informative jumps in the parameter space. This global proposal is learned in an offline stage using discriminative methods, allowing it to leverage knowledge about the relationship between images and parameters.
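
For reference, the acceptance rule mentioned above can be written out as follows. This is the textbook Metropolis-Hastings form applied to the mixture proposal, not a formula quoted from the paper; the symbol $a$ for the acceptance probability is introduced here only to avoid a clash with the mixture coefficient $\alpha$.

```latex
\[
a(\theta_t \to \bar{\theta})
  = \min\!\left(1,\;
    \frac{p(\bar{\theta}\,|\,\hat{I})\; T_\alpha(\theta_t\,|\,\hat{I},\bar{\theta})}
         {p(\theta_t\,|\,\hat{I})\; T_\alpha(\bar{\theta}\,|\,\hat{I},\theta_t)}
  \right),
\qquad
p(\theta\,|\,\hat{I}) \propto p(\hat{I}\,|\,\theta)\,p(\theta).
\]
```

Because the global component depends on $\hat{I}$ but not on the current state, the mixture is in general not symmetric, so the proposal densities do not cancel from the ratio as they would for a purely local symmetric proposal. The unknown normalizing constant of the posterior does cancel, which is why only the likelihood and prior need to be evaluated.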

The construction of the global proposal $T_G$ is based on a non-parametric density estimation technique. The method first generates a large dataset of paired samples $(\theta^{(i)}, I^{(i)})$ by simulating from the generative model $p(I|\theta)p(\theta)$. A feature representation $v(I)$ is computed for each image, and a k-means clustering algorithm is applied to group the images based on these features. For each resulting cluster $C_j$, a kernel density estimate (KDE) is fitted to the corresponding set of parameters $\theta^{(C_j)}$. This process yields a conditional density estimate $T_G(\cdot|\hat{I})$ for any new image $\hat{I}$: the image is first assigned to a cluster via $v(\hat{I})$, and the corresponding KDE for that cluster is used as the global proposal.
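
The offline stage described above can be sketched in a few lines of plain Python. Everything concrete here is an assumption made for illustration: the scalar parameter, the toy "renderer" whose feature is $\theta^2$ plus noise (chosen so that one feature value is compatible with two parameter values), the Gaussian kernel, the fixed bandwidth, and the names `kmeans`, `nearest` and `GlobalProposal`. The authors' actual features and implementation are not reproduced.

```python
import math
import random

def nearest(f, centroids):
    """Index of the centroid closest to feature vector f (squared Euclidean)."""
    return min(range(len(centroids)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(f, centroids[j])))

def kmeans(features, k, iters=20):
    """Plain Lloyd's k-means. Initialisation takes evenly spaced order
    statistics of the sorted features: crude, but deterministic."""
    srt = sorted(features)
    centroids = [list(srt[(2 * i + 1) * len(srt) // (2 * k)]) for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for f in features:
            groups[nearest(f, centroids)].append(f)
        for j, g in enumerate(groups):
            if g:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(g) for col in zip(*g)]
    return centroids

class GlobalProposal:
    """Per-cluster Gaussian KDE over a scalar parameter theta (toy T_G)."""

    def __init__(self, thetas, features, k, bandwidth, rng):
        self.rng, self.h = rng, bandwidth
        self.centroids = kmeans(features, k)
        self.clusters = [[] for _ in range(k)]
        for theta, f in zip(thetas, features):
            self.clusters[nearest(f, self.centroids)].append(theta)

    def cluster_of(self, feature):
        """Assign the feature v(I_hat) of a new image to a cluster index."""
        return nearest(feature, self.centroids)

    def sample(self, j):
        """Draw from the KDE of cluster j: pick a stored theta, add kernel noise."""
        return self.rng.gauss(self.rng.choice(self.clusters[j]), self.h)

    def pdf(self, theta, j):
        """Evaluate the KDE of cluster j at theta."""
        pts = self.clusters[j]
        norm = 1.0 / (self.h * math.sqrt(2.0 * math.pi))
        return sum(norm * math.exp(-0.5 * ((theta - c) / self.h) ** 2)
                   for c in pts) / len(pts)

# Simulate (theta, feature) pairs from the toy generative model.
rng = random.Random(1)
thetas = [rng.uniform(-3.0, 3.0) for _ in range(2000)]        # theta ~ p(theta)
features = [[t * t + rng.gauss(0.0, 0.05)] for t in thetas]   # v(I), I ~ p(I|theta)
t_g = GlobalProposal(thetas, features, k=6, bandwidth=0.1, rng=rng)

j = t_g.cluster_of([4.0])                     # feature of an "observed image"
draws = [t_g.sample(j) for _ in range(500)]   # proposals near both -2 and +2
```

In this toy setup the cluster selected for the feature value 4.0 holds parameters near both $-2$ and $+2$, so a single lookup yields a proposal that covers both modes. Both `sample` and `pdf` are needed later, since the Metropolis-Hastings ratio requires evaluating the proposal density as well as drawing from it.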

At test time, the informed sampler, referred to as INF-MH, combines the local and global proposals using a mixture coefficient $\alpha \in [0, 1]$. The overall transition kernel is $T_\alpha = \alpha T_L + (1 - \alpha) T_G$. This mixture allows for a flexible balance between local exploration and global, image-conditioned moves. The algorithm proceeds by first identifying the appropriate cluster for the observed image, then sampling from the mixture kernel and applying the Metropolis-Hastings acceptance rule to ensure the correct stationary distribution is achieved. This framework is designed to be general, and the authors demonstrate its application across diverse computer vision problems.
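
A minimal sketch of one such sampler on a one-dimensional toy problem is given below. The bimodal target `log_post`, the two-component Gaussian mixture standing in for a learned global proposal (`tg_pdf`, `tg_sample`), the step size and the function name `inf_mh` are all assumptions for illustration. Only the mixture form, with $\alpha$ weighting the local proposal, and the Metropolis-Hastings correction follow the description above.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def log_post(theta):
    """Toy unnormalised log-posterior with modes at -2 and +2 (assumption)."""
    a = -0.5 * ((theta + 2.0) / 0.3) ** 2
    b = -0.5 * ((theta - 2.0) / 0.3) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def tg_pdf(theta):
    """Density of the stand-in global proposal T_G (independent of the state)."""
    return 0.5 * normal_pdf(theta, -2.0, 0.5) + 0.5 * normal_pdf(theta, 2.0, 0.5)

def tg_sample(rng):
    return rng.gauss(rng.choice([-2.0, 2.0]), 0.5)

def inf_mh(theta0, n_steps, alpha, sigma=0.2, seed=0):
    """Metropolis-Hastings with proposal alpha * T_L + (1 - alpha) * T_G."""
    rng = random.Random(seed)

    def t_mix(to, frm):
        # Density of proposing `to` from `frm` under the mixture kernel.
        return alpha * normal_pdf(to, frm, sigma) + (1.0 - alpha) * tg_pdf(to)

    theta, samples, accepted = theta0, [], 0
    for _ in range(n_steps):
        if rng.random() < alpha:
            prop = rng.gauss(theta, sigma)   # local random-walk move
        else:
            prop = tg_sample(rng)            # global, state-independent move
        log_ratio = (log_post(prop) - log_post(theta)
                     + math.log(t_mix(theta, prop))
                     - math.log(t_mix(prop, theta)))
        if rng.random() < math.exp(min(0.0, log_ratio)):
            theta, accepted = prop, accepted + 1
        samples.append(theta)
    return samples, accepted / n_steps

informed, acc_inf = inf_mh(theta0=-2.0, n_steps=5000, alpha=0.5)
local_only, acc_loc = inf_mh(theta0=-2.0, n_steps=5000, alpha=1.0)
frac_inf = sum(s > 0 for s in informed) / len(informed)      # time in right mode
frac_loc = sum(s > 0 for s in local_only) / len(local_only)
```

Setting `alpha=1.0` recovers a plain random-walk sampler, which in this toy problem stays in the mode where it started, while the mixed kernel moves between both modes. Note that the acceptance ratio uses the full mixture density in both directions, regardless of which component generated the proposal.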

Experiment

The evaluation uses multiple parallel MCMC chains across three computer vision tasks to assess sampler convergence and posterior exploration. The camera extrinsics and occluding tiles experiments validate the method's ability to navigate multi-modal and high-dimensional distributions, demonstrating that informed sampling combined with block-wise updates overcomes the convergence failures of traditional baselines. The body shape estimation task confirms the practical utility of this approach through accurate 3D mesh reconstruction, reliable uncertainty quantification, and robustness under incomplete observations. Together, these results indicate that using discriminative features to guide MCMC exploration makes inference markedly more reliable across complex vision problems.

The authors compare several sampling methods across three experimental setups, evaluating their performance using acceptance rates, convergence diagnostics, and mode discovery. Informed sampling methods, particularly INF-MH, achieve higher acceptance rates and faster convergence than the baseline methods. In multi-modal posterior distributions, INF-MH also converges faster and discovers more modes than the other samplers.

The authors analyze how the proposal standard deviation affects convergence and acceptance rates. Acceptance rates decrease as the proposal standard deviation increases, with optimal performance observed at lower values. Potential scale reduction factor (PSRF) values stabilize after a few thousand iterations, indicating convergence for all methods. The informed sampling approach achieves higher acceptance rates and faster convergence than the baseline methods.
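
For readers unfamiliar with the diagnostic, the sketch below computes the PSRF in its standard Gelman-Rubin form for a scalar quantity tracked over several chains. Whether the authors use exactly this variant is not stated in this summary, and the function name `psrf` and the synthetic chains are illustrative assumptions.

```python
import math
import random
from statistics import mean, variance

def psrf(chains):
    """Gelman-Rubin potential scale reduction factor for equal-length chains."""
    n = len(chains[0])
    chain_means = [mean(c) for c in chains]
    w = mean([variance(c) for c in chains])   # mean within-chain variance
    b = n * variance(chain_means)             # between-chain variance
    var_hat = (n - 1) / n * w + b / n         # pooled variance estimate
    return math.sqrt(var_hat / w)

rng = random.Random(0)
# Four chains that all sample the same distribution: PSRF close to 1.
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]
# Four chains stuck in two different modes: PSRF far above 1.
stuck = [[rng.gauss(mu, 0.3) for _ in range(1000)] for mu in (-2.0, -2.0, 2.0, 2.0)]

r_mixed = psrf(mixed)
r_stuck = psrf(stuck)
```

Values near 1 mean the chains agree with one another, while large values signal that the chains are exploring different regions. This is why the diagnostic is informative for multi-modal posteriors, where a local sampler can remain trapped in a single mode.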

The authors present results from a body shape estimation experiment in which they infer 3D human body shapes from depth images using a generative model and informed sampling. The approach uses a mixture of global and local proposals, and the informed sampler achieves lower reconstruction errors and faster convergence than the baseline methods. The method also enables uncertainty quantification in the reconstructed 3D mesh, with higher variance in regions of higher error. Body measurements can be predicted from the posterior distribution over shape parameters, with associated confidence intervals; the results show accurate recovery and characterization of uncertainty.

The authors analyze the performance of informed sampling methods in comparison to baseline samplers across multiple experiments. Informed samplers achieve faster convergence and higher acceptance rates than traditional methods, outperforming the baselines in both convergence speed and sampling efficiency. The mixture coefficient significantly affects performance, with higher values leading to better acceptance rates.

The authors compare several sampling methods across three experimental setups, evaluating their performance using acceptance rates, potential scale reduction factors, and root mean square error. Informed sampling methods achieve higher acceptance rates, faster convergence, and lower reconstruction errors than baseline samplers across multiple experiments. Combining global and local proposals in a mixture kernel performs better than using either proposal alone. Performance varies across experimental settings: in high-dimensional or multi-modal problems, informed samplers show superior convergence and stability, whereas baseline methods often fail to converge or exhibit poor mixing.

The experiments evaluate informed sampling methods against traditional baselines across multiple setups, including probabilistic modeling tasks and a 3D human body shape estimation application. These trials validate how effectively each approach converges, explores complex posterior landscapes, and reconstructs target shapes. Qualitatively, informed samplers consistently demonstrate superior stability and exploration capabilities, particularly when combining global and local proposals. The findings confirm that this approach not only accelerates convergence and improves sampling efficiency in complex scenarios but also provides reliable uncertainty quantification for practical predictions.

