
Possibilistic Variational Inference

Updated 29 November 2025
  • Possibilistic Variational Inference is an approach that adapts variational inference by using maxitive measures, replacing integrals with suprema and infima to model epistemic uncertainty.
  • It employs a maxitive analogue of the Donsker–Varadhan principle to derive dual consistency bounds and implements tractable optimization over exponential-family possibility functions.
  • The framework offers robust uncertainty quantification and interpretable plausibility bounds, making it valuable for applications like robust decision-making and safety-critical systems.

Possibilistic variational inference (PVI) is the adaptation of variational inference methods to possibility theory, an imprecise probability framework that models epistemic uncertainty directly via maxitive measures rather than subjective probabilities. Unlike conventional VI—where additivity underpins entropy, expectation, and divergence—PVI operates with suprema and infima, yielding robust and interpretable inference in scenarios with sparse or imprecise information. The PVI methodology is anchored in a maxitive analogue of the Donsker–Varadhan formula, reconstituting core concepts of Bayesian VI for possibility functions and enabling tractable optimization over exponential-family variational classes (Singh et al., 26 Nov 2025).

1. Foundations of Possibility Theory

Possibility theory formalizes uncertain information on an abstract space $\Omega$ with uncertain variables $\theta \in \Theta$. A possibility function $f \in \mathcal{F}(\Theta)$ satisfies $f : \Theta \to [0,1]$ and $\sup_\theta f(\theta) = 1$. For events $A \subset \Theta$, the degree of possibility is $\Pi(A) \coloneqq \sup\{f(\theta) : \theta \in A\}$, establishing the maxitive, rather than additive, nature of uncertainty. Analogues to classical operations include:

  • Marginalization: $f_{\theta}(\theta) = \sup_{\psi \in \Psi} f_{\theta,\psi}(\theta, \psi)$ for a joint $f_{\theta,\psi}$ on $\Theta \times \Psi$.
  • Conditioning: $f_{\theta|\psi}(\theta|\psi) = f_{\theta,\psi}(\theta,\psi)/f_\psi(\psi)$ when $f_\psi(\psi) > 0$.

Possibilistic expectation is the mode set $E_f[\theta] \coloneqq \arg\sup_{\theta} f(\theta)$, and the precision at a unique mode is $\mathcal{I}_f \coloneqq -\nabla^2 \log f(\theta)\,\big|_{\theta = E_f[\theta]}$. The Gaussian-style normal possibility function $\overline{N}(\theta; \mu, \Sigma) \coloneqq \exp\!\left[-\tfrac{1}{2}(\theta - \mu)^\top \Sigma^{-1}(\theta - \mu)\right]$ attains its mode at $\mu$ with precision $\Sigma^{-1}$.
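A minimal NumPy sketch of these definitions follows; the grid resolution, parameters, and the independent-product joint are illustrative assumptions, not taken from the paper:

```python
import numpy as np

# Normal possibility function exp[-(theta - mu)^2 / (2 sigma^2)] in one dimension.
def normal_possibility(theta, mu, sigma):
    return np.exp(-0.5 * ((theta - mu) / sigma) ** 2)

theta = np.linspace(-5.0, 5.0, 2001)
f = normal_possibility(theta, mu=1.0, sigma=0.5)

# A possibility function is normalized by its supremum, not by an integral.
assert np.isclose(f.max(), 1.0)

# Degree of possibility of the event A = {theta > 2}: a sup of f over A.
print("Pi(theta > 2) ~", f[theta > 2.0].max())

# Maxitive marginalization of a joint possibility: a sup over psi, not an integral.
psi = np.linspace(-5.0, 5.0, 401)
joint = normal_possibility(theta[:, None], 0.0, 1.0) * \
        normal_possibility(psi[None, :], 0.0, 2.0)
f_theta = joint.max(axis=1)  # f_theta(theta) = sup_psi f(theta, psi)
```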

2. Maxitive Donsker–Varadhan Principle

PVI replaces integrals and integral-based divergences with suprema and infima throughout. For a prior $\pi(\theta) \propto \exp[-R(\theta)]$ and loss $\ell(\theta)$, the maxitive posterior is:

$$g^*_{\mathrm{max}}(\theta) = \frac{\exp[-\ell(\theta)]\,\pi(\theta)}{\sup_{\theta'}\{\exp[-\ell(\theta')]\,\pi(\theta')\}},$$

with "maxitive evidence" Zmax=supθexp[(θ)]π(θ)Z_{\mathrm{max}} = \sup_\theta \exp[-\ell(\theta)]\pi(\theta).

The core variational identities (Theorem 3) are:

$$\log Z_{\mathrm{max}} = \sup_{g \in \mathcal{F}(\Theta)} \inf_\theta \left\{-\ell(\theta) - \log\frac{g(\theta)}{\pi(\theta)}\right\} = \inf_{g \in \mathcal{F}(\Theta)} \sup_\theta \left\{-\ell(\theta) - \log\frac{g(\theta)}{\pi(\theta)}\right\}.$$

These yield dual consistency bounds:

| Bound name | Mathematical formulation | Order relation |
| --- | --- | --- |
| Lower consistency | $\underline{\mathrm{CBO}}(g) \coloneqq \inf_\theta\{-\ell(\theta) - \log[g(\theta)/\pi(\theta)]\}$ | any maximizer satisfies $g \preceq g^*_{\mathrm{max}}$ |
| Upper consistency | $\overline{\mathrm{CBO}}(g) \coloneqq \sup_\theta\{-\ell(\theta) - \log[g(\theta)/\pi(\theta)]\}$ | any minimizer satisfies $g \succeq g^*_{\mathrm{max}}$ |

Optimizing these bounds recovers the posterior up to the partial order: $g^*_{\mathrm{max}}$ is supremal among maximizers of the lower bound and infimal among minimizers of the upper bound. Symmetric balancing via $\alpha\,\overline{\mathrm{CBO}}(g) - (1-\alpha)\,\underline{\mathrm{CBO}}(g)$ interpolates between the two bounds.
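On a grid, one can verify that for any candidate $g$ with $\sup_\theta g(\theta) = 1$ the two bounds bracket $\log Z_{\mathrm{max}}$; the candidate below is a deliberately mis-centred normal possibility function, and all parameters are illustrative:

```python
import numpy as np

# Dual consistency bounds bracketing log Z_max for a suboptimal candidate g.
theta = np.linspace(-4.0, 4.0, 2001)
prior = np.exp(-0.5 * theta**2)
loss = 0.5 * (theta - 1.5) ** 2
log_Zmax = np.max(-loss + np.log(prior))

g = np.exp(-0.5 * ((theta - 0.3) / 0.8) ** 2)  # candidate with a misplaced mode
objective = -loss - np.log(g) + np.log(prior)  # -l(theta) - log[g(theta)/pi(theta)]

cbo_lower = objective.min()  # lower consistency bound: inf over theta
cbo_upper = objective.max()  # upper consistency bound: sup over theta
assert cbo_lower <= log_Zmax <= cbo_upper
print(cbo_lower, log_Zmax, cbo_upper)
```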

3. Exponential-Family Possibility Functions

As full maximization over $\mathcal{F}(\Theta)$ is generally intractable, PVI is operationalized over tractable variational families. The possibilistic exponential family is:

$$g_\lambda(\theta) = \exp[\lambda^\top T(\theta) - A(\lambda) - B(\theta)],$$

where $T(\theta) \in \mathbb{R}^d$ is the statistic, $B(\theta)$ is a given base function, and the natural parameter $\lambda$ ranges over a convex subset $\Lambda \subset \mathbb{R}^d$. The log-partition function is

$$A(\lambda) = \sup_{\theta \in \Theta}\{\lambda^\top T(\theta) - B(\theta)\}.$$

Examples include possibilistic analogues of the Bernoulli, Poisson, and Gaussian families.
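A minimal sketch of such a family, assuming NumPy and using the Gaussian case $T(\theta) = \theta$, $B(\theta) = \theta^2/2$, for which $A(\lambda) = \lambda^2/2$ in closed form:

```python
import numpy as np

# Possibilistic exponential family g_lambda(theta) = exp[lam*T(theta) - A(lam) - B(theta)],
# with the log-partition A(lam) = sup_theta {lam*T(theta) - B(theta)} taken on a grid.
theta = np.linspace(-10.0, 10.0, 4001)
T = lambda t: t            # statistic
B = lambda t: 0.5 * t**2   # base function: normal possibility with unit precision

def A(lam):
    return np.max(lam * T(theta) - B(theta))

def g(lam):
    return np.exp(lam * T(theta) - A(lam) - B(theta))

lam = 1.3
print(A(lam), lam**2 / 2)             # sup-based and closed-form A(lam) agree
assert np.isclose(g(lam).max(), 1.0)  # sup-normalization holds by construction
```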

4. Optimization Criteria and Update Rules

Restricting $g$ to exponential families, the lower-bound objective simplifies to

$$\underline{\mathrm{CBO}}(g_\lambda) = \inf_\theta\{-\ell(\theta) - \lambda^\top T(\theta) + A(\lambda) + B(\theta) + \log \pi(\theta)\}.$$

Typically, the map $\theta \mapsto \ell(\theta) + \lambda^\top T(\theta) - B(\theta) - \log\pi(\theta)$ is strongly concave, so one can define its mode $\theta_*(\lambda) = \arg\sup_\theta\{\ell(\theta) + \lambda^\top T(\theta) - B(\theta) - \log\pi(\theta)\}$, yielding

$$\underline{\mathrm{CBO}}(g_\lambda) = -\ell(\theta_*(\lambda)) - \lambda^\top T(\theta_*(\lambda)) + A(\lambda) + B(\theta_*(\lambda)) + \log \pi(\theta_*(\lambda)).$$

The gradient-style update (Proposition 6) is

$$\lambda_{t+1} = \lambda_t - \rho_t\left[\theta_*\big(\lambda_t + \nabla_\theta \ell(\theta_*(\lambda_t))\big) - \theta_*(\lambda_t)\right],$$

with the Laplace-style approximation

$$\lambda_{t+1} \approx \lambda_t - \rho_t\,\mathcal{I}_{\lambda_t}^{-1}\nabla_\theta \ell(\theta_*(\lambda_t)),$$

where $\mathcal{I}_{\lambda_t}$ is the mode precision.

Special cases recover known algorithms:

  • Multivariate normal (fixed covariance $\Sigma$): $\lambda_{t+1} \approx \lambda_t - \rho_t \nabla_\mu \ell(\mu_t)$ for $\mu_t = \Sigma\lambda_t$; see the sketch after this list.
  • Binomial ($n$ trials): $\lambda_{t+1} \approx \lambda_t - \rho_t\,[n\lambda_t(n - \lambda_t)]^{-1}\nabla_\theta \ell(\theta_t)$; in the probability parameterization $p = \lambda/n$, $p_{t+1} \approx p_t - (\rho_t/n)\nabla_p \ell(p_t)$.
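A one-dimensional sketch of the normal special case, assuming a flat prior ($\pi \equiv 1$) so that the target mode is the minimizer of $\ell$; the quadratic loss, step size, and iteration count are illustrative:

```python
# Laplace-style PVI update for a normal possibility family with fixed variance,
# in one dimension: lam_{t+1} ~ lam_t - rho * grad_loss(mu_t) with mu_t = sigma2 * lam_t.
sigma2 = 0.5                       # fixed variance of the variational family

def grad_loss(mu):                 # gradient of the illustrative loss l(mu) = (mu - 2)^2 / 2
    return mu - 2.0

lam, rho = 0.0, 0.5                # natural parameter and step size rho_t
for t in range(200):
    mu = sigma2 * lam              # current variational mode
    lam -= rho * grad_loss(mu)

print("variational mode ~", sigma2 * lam)   # converges to 2.0, the minimizer of the loss
```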

5. Theoretical Properties

The compactness of $\mathcal{F}(\Theta)$ and the semicontinuity of the CBO functionals guarantee the existence of optimizers, though not uniqueness. The optimizer set admits a partial ordering, in which $g^*_{\mathrm{max}}$ is the unique extremal representative. Under smoothness and concavity conditions on $\ell$, $T$, and $B$, the gradient-type updates converge to stationary points within the variational family.

Regarding robustness, a possibility posterior can always be made maximally conservative (the constant function $1$), and PVI delivers under- and over-estimates around the mode, bracketing worst- and best-case plausibility. Interpretability stems from possibility functions’ direct encoding of plausibility, which is especially useful when additive probabilities for rare events are unattainable. The max-sup structure generates sharp bounding envelopes for uncertainty.

6. Comparison with Classical Bayesian Variational Inference

Both probabilistic and possibilistic VI begin from Gibbs-type posteriors $p(\theta) \propto \exp[-\ell(\theta)]\,\pi(\theta)$ and employ variational families parameterized by $\lambda$, with updates derived from gradient steps on a bound (the ELBO for probability, the CBO for possibility). Distinguishing features include:

  • Integration vs. maximization: Integrals and additivity in classical VI are replaced by maxima and maxitivity in PVI.
  • Divergence and entropy: the KL divergence is supplanted by the max-relative entropy $D_{\mathrm{max}}(g \Vert f) = \sup_\theta \log[g(\theta)/f(\theta)]$, yielding the dual bounds $\underline{\mathrm{CBO}}$ and $\overline{\mathrm{CBO}}$ for under- and over-estimation of plausibility (illustrated in the sketch after this list).
  • Partition function: in PVI, the log-partition function is computed via a supremum, simplifying normalization and bounding.
  • Non-additivity: Maxitive combination underlies all algebraic structures in possibility theory, distinctly departing from probability theory’s additive framework.
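As a small illustration of the maxitive divergence, the sketch below evaluates $D_{\mathrm{max}}$ between two normal possibility functions on a grid; the parameters are illustrative, and the supremum is finite here because $g$ is strictly narrower than $f$:

```python
import numpy as np

# Max-relative entropy D_max(g || f) = sup_theta log[g(theta)/f(theta)] on a grid.
theta = np.linspace(-8.0, 8.0, 4001)
g = np.exp(-0.5 * ((theta - 0.5) / 0.5) ** 2)  # normal possibility, mode 0.5, variance 0.25
f = np.exp(-0.5 * theta**2)                    # normal possibility, mode 0,   variance 1

D_max = np.max(np.log(g) - np.log(f))
print("D_max(g||f) ~", D_max)   # the closed form gives 1/6 for these parameters
```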

7. Applications and Empirical Insights

PVI is particularly suited for robust decision-making under model misspecification, multi-target filtering and tracking in partially known environments, and safety-critical systems where event probabilities are fundamentally inaccessible but plausibility ranking remains feasible. Empirical studies (Houssineau & Nott, 2022; Xue et al., 2025) indicate that possibilistic Bernoulli-filter and linear tracking algorithms are interpretable as instances of PVI within exponential families, often yielding closed-form updates and demonstrating resilience to outliers.

PVI replaces integral-based entropy and KL-like divergence with sup/inf-based analogues, producing a dual-bound variational apparatus and a natural gradient-style optimization scheme within exponential families. The resulting framework directly quantifies epistemic plausibility, offers tractable solutions under deep uncertainty, and generalizes classical algorithms to the maxitive setting (Singh et al., 26 Nov 2025).
