Maxitive Donsker–Varadhan Theorem

Updated 9 June 2026
  • The Maxitive Donsker–Varadhan theorem is a variational result in possibility theory that replaces additive constructs with suprema and other maxitive analogues to handle epistemic uncertainty.
  • It establishes a dual variational representation that mirrors the classical formula by substituting integrated expectations with supremal operations and KL divergence with max-relative entropy.
  • The theorem underpins coordinate-ascent updates in possibilistic variational inference, facilitating robust optimization in maxitive exponential family models.

The Maxitive Donsker–Varadhan theorem provides the cornerstone variational representation for inference within the framework of possibility theory, mirroring the classical additive Donsker–Varadhan formula while replacing probability-centric constructs with maxitive analogues. This formulation underpins a rigorous approach to possibilistic variational inference (PVI), where epistemic uncertainty is modeled via possibility functions, and essential operations such as integration, expectation, and divergence are inherently maxitive rather than additive. The theorem enables coordinate-ascent update schemes analogous in spirit to classical variational inference, but based on modes and max-relative entropy.

1. Classical Donsker–Varadhan Formula

The classical Donsker–Varadhan variational principle provides a dual representation for the cumulant generating function in terms of probability measures and relative entropy. For a measurable space $(\Theta, \mathcal{F})$ with probability measure $\nu$ and measurable function $h: \Theta \rightarrow \mathbb{R}$ such that $\int e^{h} \, d\nu < \infty$, the formula is:

$$\Lambda(h) := \log \int e^{h(\theta)} \nu(d\theta) = \sup_{\rho \in \mathcal{P}(\Theta)} [ \mathbb{E}_\rho[h] - \mathrm{KL}(\rho\|\nu) ]$$

The supremum is attained at the Gibbs measure $d\rho^* / d\nu \propto e^{h(\theta)}$. This result underpins much of variational inference (VI), where expectations and $\mathrm{KL}$ divergences govern the optimization objective and its gradients.
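The classical identity can be checked numerically on a finite space, where the integral becomes a weighted sum. The following is a minimal sketch under that assumption; the helper names (`log_mgf`, `dv_objective`, `random_probability`) and the random test data are illustrative and not taken from the source.

```python
import math
import random

def log_mgf(h, nu):
    """Lambda(h) = log sum_i nu_i * exp(h_i), with a max shift for stability."""
    m = max(h)
    return m + math.log(sum(w * math.exp(x - m) for x, w in zip(h, nu)))

def dv_objective(rho, h, nu):
    """E_rho[h] - KL(rho || nu); terms with rho_i = 0 contribute zero."""
    mean = sum(r * x for r, x in zip(rho, h))
    kl = sum(r * math.log(r / w) for r, w in zip(rho, nu) if r > 0)
    return mean - kl

def random_probability(n, rng):
    """Random strictly positive probability vector (normalized exponentials)."""
    raw = [rng.expovariate(1.0) + 1e-12 for _ in range(n)]
    total = sum(raw)
    return [x / total for x in raw]

rng = random.Random(0)
n = 6
nu = random_probability(n, rng)              # reference measure
h = [rng.gauss(0.0, 1.0) for _ in range(n)]  # arbitrary test function

lam = log_mgf(h, nu)

# Gibbs measure: rho*_i proportional to nu_i * exp(h_i)
weights = [w * math.exp(x) for x, w in zip(h, nu)]
total_weight = sum(weights)
gibbs = [w / total_weight for w in weights]

# The gap Lambda(h) - objective vanishes at the Gibbs measure and is
# nonnegative for every other probability vector tried.
gap_at_gibbs = lam - dv_objective(gibbs, h, nu)
smallest_gap = min(
    lam - dv_objective(random_probability(n, rng), h, nu) for _ in range(1000)
)
print(gap_at_gibbs, smallest_gap)
```

Random search only illustrates the inequality; the equality at the Gibbs measure is the part that holds exactly, up to floating-point error.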

2. Key Maxitive Structures in Possibility Theory

Possibility theory replaces additive probabilistic constructs with structures tailored to imprecise or incomplete information:

  • Possibility Functions: A function $\pi: \Theta \rightarrow [0,1]$ with $\sup_\theta \pi(\theta) = 1$. The set $\mathcal{F}(\Theta)$ consists of all such $\pi$, ordered pointwise ($\pi \le \pi'$ iff $\pi(\theta) \le \pi'(\theta)$ for all $\theta$).
  • Maxitive Integral: For a nonnegative "reward" $g$, the maxitive (supremal) integral with respect to $\pi$ is $\sup_\theta g(\theta) \, \pi(\theta)$, or $\sup_\theta g(\theta)$ if $g$ already incorporates $\pi$.
  • Maxitive Expectation: The counterpart to expectation is the mode; for a real-valued $h$, $\mathbb{E}^*_\pi[h] = h(\theta^*)$ with $\theta^* \in \arg\max_\theta \pi(\theta)$.
  • Max-Relative Entropy: For $\pi, \pi_0 \in \mathcal{F}(\Theta)$, define $D_{\max}(\pi \| \pi_0) = \sup_\theta \log \frac{\pi(\theta)}{\pi_0(\theta)}$.

These constructs accommodate maximally informative choices under epistemic uncertainty, bypassing the need for probabilistic additivity.
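On a finite grid, these maxitive operations reduce to a few lines of code. The sketch below is illustrative only: it assumes the maxitive integral is $\sup_\theta g(\theta)\pi(\theta)$, the maxitive expectation evaluates $h$ at a mode of $\pi$, and the max-relative entropy is $\sup_\theta \log(\pi(\theta)/\pi_0(\theta))$. The function names and the Gaussian-shaped example values are assumptions, not part of the source.

```python
import math

def is_possibility(pi, tol=1e-12):
    """Check that values lie in [0, 1] and that the maximum equals 1."""
    return all(0.0 <= p <= 1.0 for p in pi) and abs(max(pi) - 1.0) <= tol

def maxitive_integral(g, pi):
    """sup_theta g(theta) * pi(theta) for a nonnegative reward g."""
    return max(gv * p for gv, p in zip(g, pi))

def maxitive_expectation(h, pi):
    """h evaluated at a mode of pi (the first maximizer if there are ties)."""
    mode = max(range(len(pi)), key=lambda i: pi[i])
    return h[mode]

def max_relative_entropy(pi, pi0):
    """sup_theta log(pi / pi0); +inf if pi > 0 somewhere pi0 = 0."""
    worst = -math.inf
    for p, q in zip(pi, pi0):
        if p == 0.0:
            continue  # log(0 / q) = -inf never attains the supremum
        if q == 0.0:
            return math.inf
        worst = max(worst, math.log(p / q))
    return worst

# Hypothetical example: Gaussian-shaped possibility functions on a 5-point grid.
thetas = [-2.0, -1.0, 0.0, 1.0, 2.0]
pi = [math.exp(-t * t / 2.0) for t in thetas]            # mode at 0
pi0 = [math.exp(-(t - 1.0) ** 2 / 2.0) for t in thetas]  # mode at 1
squared = [t * t for t in thetas]

print(is_possibility(pi), is_possibility(pi0))
print(maxitive_integral(squared, pi))
print(maxitive_expectation(squared, pi))
print(max_relative_entropy(pi, pi0))
```

Unlike an additive expectation, the maxitive expectation ignores everything except the location of the mode, which is why ties need an explicit convention in code.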

3. Maxitive Donsker–Varadhan Theorem

Let $\pi_0 \in \mathcal{F}(\Theta)$ be a prior possibility function and $\ell: \Theta \rightarrow [0, \infty)$ a nonnegative loss. Define the maxitive model evidence ("consistency") and its logarithm:

$$Z := \sup_{\theta \in \Theta} e^{-\ell(\theta)} \pi_0(\theta), \qquad \log Z = \sup_{\theta \in \Theta} [ \log \pi_0(\theta) - \ell(\theta) ]$$

The Maxitive Donsker–Varadhan theorem asserts a saddle-point-like characterization:

$$\log Z = \sup_{\pi \in \mathcal{F}(\Theta)} \inf_{\theta \in \Theta} \left[ -\ell(\theta) - \log \frac{\pi(\theta)}{\pi_0(\theta)} \right] \tag{3.1}$$

$$\log Z = \inf_{\pi \in \mathcal{F}(\Theta)} \sup_{\theta \in \Theta} \left[ -\ell(\theta) - \log \frac{\pi(\theta)}{\pi_0(\theta)} \right] \tag{3.2}$$

  • The sup-inf form (3.1) is maximized by any $\pi \le \pi^*$, while the inf-sup form (3.2) is minimized by any $\pi \ge \pi^*$, where the "Gibbs" posterior possibility function $\pi^*$ is

$$\pi^*(\theta) = \frac{e^{-\ell(\theta)} \pi_0(\theta)}{Z}$$

The additive integral $\int e^{h} \, d\nu$ of the classical theorem is replaced with a supremum, and the Kullback–Leibler divergence by the max-relative entropy $D_{\max}$. This represents a maxitive duality structure inherent in possibility theory (Singh et al., 26 Nov 2025).
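A small numerical sanity check can make the saddle-point structure concrete. The sketch below assumes a finite parameter grid and reads the theorem as $\log Z = \sup_\pi \inf_\theta F(\pi, \theta) = \inf_\pi \sup_\theta F(\pi, \theta)$ with $F(\pi, \theta) = -\ell(\theta) - \log(\pi(\theta)/\pi_0(\theta))$, where $\pi_0$ is the prior possibility function, $\ell$ the loss, and $Z = \sup_\theta e^{-\ell(\theta)}\pi_0(\theta)$. This reading, the helper names, and the random test data are illustrative assumptions rather than code from the cited work.

```python
import math
import random

def objective(pi, pi0, loss, i):
    """F(pi, theta_i) = -loss_i - log(pi_i / pi0_i); +inf where pi_i = 0."""
    if pi[i] == 0.0:
        return math.inf
    return -loss[i] - math.log(pi[i] / pi0[i])

def lower_value(pi, pi0, loss):
    """Inner infimum over the grid (sup-inf form)."""
    return min(objective(pi, pi0, loss, i) for i in range(len(pi)))

def upper_value(pi, pi0, loss):
    """Inner supremum over the grid (inf-sup form)."""
    return max(objective(pi, pi0, loss, i) for i in range(len(pi)))

def random_possibility(n, rng):
    """Strictly positive values rescaled so that the maximum is exactly 1."""
    raw = [rng.uniform(0.05, 1.0) for _ in range(n)]
    top = max(raw)
    return [x / top for x in raw]

rng = random.Random(1)
n = 8
pi0 = random_possibility(n, rng)                  # prior possibility function
loss = [rng.uniform(0.0, 3.0) for _ in range(n)]  # nonnegative loss

# Maxitive evidence and the Gibbs posterior possibility function.
consistency = max(math.exp(-l) * p for l, p in zip(loss, pi0))
log_z = math.log(consistency)
gibbs = [math.exp(-l) * p / consistency for l, p in zip(loss, pi0)]

# Random candidates never beat log Z from below or undercut it from above.
candidates = [random_possibility(n, rng) for _ in range(2000)]
best_lower = max(lower_value(pi, pi0, loss) for pi in candidates)
best_upper = min(upper_value(pi, pi0, loss) for pi in candidates)

print(log_z, lower_value(gibbs, pi0, loss), upper_value(gibbs, pi0, loss))
print(best_lower, best_upper)
```

At the Gibbs posterior the objective is constant in $\theta$, so the inner infimum and supremum coincide; that flatness is what makes both bounds tight at the same point.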

4. Sketch of Proof and Theoretical Parallels

The proof begins by expressing the log-consistency of the prior $\pi_0$ and loss $\ell$ as

$$\log Z = \sup_{\theta} [ \log \pi_0(\theta) - \ell(\theta) ]$$

For any $\pi \in \mathcal{F}(\Theta)$ with $\sup_\theta \pi(\theta) = 1$,

$$\inf_{\theta} \left[ -\ell(\theta) - \log \frac{\pi(\theta)}{\pi_0(\theta)} \right] \le \log Z$$

because along points where $\pi(\theta) \rightarrow 1$ the bracket approaches $\log \pi_0(\theta) - \ell(\theta)$, which never exceeds $\log Z$. Taking the supremum over $\pi$ yields (3.1), with equality at the Gibbs posterior $\pi^*$, for which the bracket equals $\log Z$ at every $\theta$. Dually, as $-\log \pi(\theta) \ge 0$,

$$-\ell(\theta) - \log \frac{\pi(\theta)}{\pi_0(\theta)} \ge \log \pi_0(\theta) - \ell(\theta)$$

and taking the supremum over $\theta$, then the infimum over $\pi$, recovers (3.2), again tight at $\pi^*$. The structure thus mirrors the classical Donsker–Varadhan proof: integrals are replaced with suprema, and KL divergence with max-relative entropy.

5. Maxitive Exponential Families and Possibilistic Variational Inference

Maxitive exponential families provide tractable variational classes for PVI:

  • For a natural parameter $\eta$ and sufficient statistic $T$, $\pi_\eta(\theta) = \exp( \langle \eta, T(\theta) \rangle - A(\eta) )$,
  • $A(\eta) = \sup_\theta \langle \eta, T(\theta) \rangle$ ensures $\sup_\theta \pi_\eta(\theta) = 1$.
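As an illustration of sup-normalization, the sketch below builds a one-dimensional Gaussian-shaped possibility function from natural parameters. It assumes the family has the form $\pi_\eta(\theta) = \exp(\langle \eta, T(\theta) \rangle - A(\eta))$ with $T(\theta) = (\theta, \theta^2)$ and $A(\eta) = \sup_\theta \langle \eta, T(\theta) \rangle$. That form, the function names, and the parameter values are assumptions chosen for the example.

```python
import math

def gaussian_natural_params(mu, sigma2):
    """Natural parameters (eta1, eta2) paired with T(theta) = (theta, theta^2)."""
    return (mu / sigma2, -0.5 / sigma2)

def log_partition(eta):
    """A(eta) = sup_theta (eta1*theta + eta2*theta^2), finite only if eta2 < 0."""
    eta1, eta2 = eta
    if eta2 >= 0.0:
        raise ValueError("eta2 must be negative for the supremum to be finite")
    # The concave quadratic peaks at theta = -eta1 / (2 * eta2).
    return -eta1 * eta1 / (4.0 * eta2)

def possibility(theta, eta):
    """pi_eta(theta) = exp(<eta, T(theta)> - A(eta)); its supremum is 1."""
    eta1, eta2 = eta
    return math.exp(eta1 * theta + eta2 * theta * theta - log_partition(eta))

mu, sigma2 = 1.5, 0.7  # hypothetical mode and scale
eta = gaussian_natural_params(mu, sigma2)

grid = [mu + 0.01 * k for k in range(-500, 501)]
values = [possibility(t, eta) for t in grid]

print(max(values))            # close to 1, attained at theta = mu
print(possibility(0.0, eta))  # equals exp(-(0 - mu)^2 / (2 * sigma2))
```

The only change from the probabilistic Gaussian is the normalizer: subtracting a supremum rather than the log of an integral pins the peak height at 1 instead of the total mass.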

The lower consistency bound (CBO), for a prior $\pi_0$, loss $\ell$, and variational member $\pi_\eta$,

$$\mathrm{CBO}(\eta) := \inf_{\theta} \left[ -\ell(\theta) - \log \frac{\pi_\eta(\theta)}{\pi_0(\theta)} \right] \le \log Z$$

is the PVI analogue of the ELBO. Maximizing this over $\eta$ yields the best approximation within the chosen class, i.e.,

$$\hat{\eta} \in \arg\max_{\eta} \mathrm{CBO}(\eta)$$

6. Coordinate-Ascent Updates and Connections to Classical Variational Inference

Coordinate ascent in the PVI framework for exponential families is justified via the Maxitive Donsker–Varadhan theorem. For any maximizer

[…]

a legitimate ascent step is

[…]

where […] is the mode. For key families:

  • Gaussian (known covariance $\Sigma$): With […], […], the update formula recovers standard gradient descent in the mean parameter $\mu$.
  • Binomial ($n$ trials): With […] and the standard parameter, the update yields the familiar gradient-descent recursion on that parameter.

These updates strongly parallel classical variational coordinate ascent, but all expectations and entropic quantities are maxitive.

7. Implications and Research Directions

The Maxitive Donsker–Varadhan theorem enables principled variational inference in contexts dominated by epistemic uncertainty, imprecision, or incomplete information, where additivity is not justified. The PVI methodology with maxitive divergences admits direct analogues of probabilistic update rules, facilitating robust and interpretable optimization in exponential-family models. The construction and analysis of new variational families, as well as the extension to more complex loss landscapes and hierarchical models, represent active research directions within possibilistic inference frameworks (Singh et al., 26 Nov 2025).
