
Amortized Conjugate Posterior (ACP)

Updated 31 March 2026
  • The paper introduces ACP, a hybrid variational inference technique that combines classical conjugate dual bounds from the noisy-OR model with neural amortization for efficient posterior approximation.
  • ACP achieves improved parameter learning and robust ELBO maximization, as demonstrated by superior F1 scores and enhanced topic coherence in data-scarce settings.
  • ACP leverages analytic posterior forms encoded via neural networks to naturally incorporate generative structure, offering scalability and improved generalization in discrete latent variable models.

The Amortized Conjugate Posterior (ACP) is a hybrid variational inference technique designed for structured probabilistic models, exemplified by the binary noisy-OR latent variable model. ACP integrates classical conjugate dual bounds on the likelihood with the scalability and efficiency of amortized inference, achieving improved posterior approximation and parameter learning by leveraging both model structure and neural inference networks. Introduced by Yan et al. (2019), ACP directly maximizes the evidence lower bound (ELBO) while encoding inductive structure from the generative process into the variational family, enabling efficient and robust inference even in data-scarce regimes.

1. Generative Structure: The Noisy-OR Model

ACP operates on discrete latent variable models such as the noisy-OR, where observed binary variables $x = (x_1, \ldots, x_D) \in \{0,1\}^D$ are generated by a collection of $K$ independent binary latent causes $z = (z_1, \ldots, z_K) \in \{0,1\}^K$ together with a fixed leak variable $z_0 \equiv 1$. The prior factorizes as $p(z) = \prod_{k=1}^{K} \mu_k^{z_k} (1-\mu_k)^{1-z_k}$, with $\mu_k$ the Bernoulli parameters. The conditional noisy-OR likelihood is given by

$$p(x_i = 0 \mid z) = \exp\left(-\theta_{i0} - \sum_{k=1}^K \theta_{ik} z_k\right),$$

where $\theta_{ik} = -\log(1 - p_{ik})$ with $p_{ik} = p(x_i = 1 \mid z_k = 1)$. Thus, the marginal log-likelihood $\log p(x) = \log \sum_z p(x, z)$ is combinatorially intractable for exact inference (Yan et al., 2019).
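As an illustration, the generative process above can be sketched in NumPy. All numeric values below ($K$, $D$, the $p_{ik}$ grid, the leak strength) are arbitrary stand-ins for this sketch, not values from the paper:

```python
import numpy as np

# Toy noisy-OR generative model: K latent causes, D observed variables.
rng = np.random.default_rng(0)
K, D = 8, 20
mu = np.full(K, 0.2)                      # Bernoulli prior means mu_k for z_k
p = rng.uniform(0.05, 0.9, size=(D, K))   # p_ik = p(x_i = 1 | z_k = 1)
theta = -np.log1p(-p)                     # theta_ik = -log(1 - p_ik)
theta0 = np.full(D, 0.01)                 # leak term theta_i0 (z_0 = 1)

def sample(n):
    """Draw n (x, z) pairs from the noisy-OR model."""
    z = (rng.random((n, K)) < mu).astype(float)      # sample latent causes
    p_x0 = np.exp(-(theta0 + z @ theta.T))           # p(x_i = 0 | z)
    x = (rng.random((n, D)) >= p_x0).astype(float)   # x_i = 1 otherwise
    return x, z

x, z = sample(5)
print(x.shape, z.shape)  # (5, 20) (5, 8)
```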

2. Classical Conjugate Dual Variational Inference

To circumvent intractability, ACP starts from a classical approach: deriving a tractable upper bound on the problematic log-likelihood terms using Fenchel conjugates. For $x_i = 1$, the bound

$$\log p(x_i = 1 \mid z) \leq \psi_i a_i - g(\psi_i)$$

is employed, with $a_i = \theta_{i0} + \sum_k \theta_{ik} z_k$ and $g$ the Fenchel conjugate of $f(s) = \log(1 - e^{-s})$. For $x_i = 0$, the true likelihood is retained. This upper-bounded surrogate joint likelihood $p_{\mathrm{UB}}(x \mid z; \psi)$ yields a tractable factorized variational posterior

$$q(z \mid x; \psi) = \prod_{k=1}^K \mathrm{Bernoulli}(q_k(x; \theta, \mu, \psi)),$$

with

$$q_k(x; \theta, \mu, \psi) = \sigma\left(\sum_{i: x_i = 1} \psi_i \theta_{ik} - \sum_{i: x_i = 0} \theta_{ik} + \log \frac{\mu_k}{1 - \mu_k}\right),$$

where $\sigma$ is the sigmoid function. The classical conjugate dual inference (CDI) approach optimizes each $\psi_i$ per datapoint by fixed-point iteration to tighten the bound on the marginal likelihood (Yan et al., 2019).
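The analytic posterior can be evaluated directly from this formula. The sketch below uses made-up toy parameters and a fixed $\psi$; a full CDI implementation would additionally run the per-datapoint fixed-point updates on $\psi_i$, which are omitted here:

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def conjugate_posterior(x, psi, theta, mu):
    """Factorized Bernoulli posterior q_k(x; theta, mu, psi):
    sigma( sum_{i: x_i=1} psi_i * theta_ik
           - sum_{i: x_i=0} theta_ik
           + log(mu_k / (1 - mu_k)) ).
    Shapes: x (D,), psi (D,), theta (D, K), mu (K,)."""
    pos = (x * psi) @ theta            # sum over i with x_i = 1, weighted by psi_i
    neg = (1.0 - x) @ theta            # sum over i with x_i = 0
    logit_prior = np.log(mu) - np.log1p(-mu)
    return sigmoid(pos - neg + logit_prior)

# Toy check (all parameter values are illustrative, not from the paper):
rng = np.random.default_rng(1)
D, K = 6, 3
theta = rng.uniform(0.1, 2.0, size=(D, K))
mu = np.full(K, 0.3)
x = np.array([1., 0., 1., 0., 0., 1.])
psi = np.full(D, 0.5)                  # fixed dual variables for the sketch
q = conjugate_posterior(x, psi, theta, mu)
print(q)                               # K Bernoulli means in (0, 1)
```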

3. Amortized Conjugate Posterior: Formulation

Instead of re-optimizing the variational parameters $\psi$ for each datapoint, ACP amortizes them across the dataset using a neural network encoder, typically a multilayer perceptron (MLP) with two hidden layers (e.g., 200–400 ReLU units, no dropout). For each input $x$, the encoder produces $\psi(x; \varphi) \in \mathbb{R}^D$, which parameterizes the variational posterior in the analytic form above. This construction yields the amortized conjugate posterior

$$q_\varphi(z \mid x) = \prod_{k=1}^K \mathrm{Bernoulli}(q_k(x; \theta, \mu, \varphi)),$$

with $q_k$ as defined above but with amortized $\psi_i(x; \varphi)$. The variational family, structured via conjugate duality, inherits inductive bias from the generative model, yielding enhanced generalization in low-data settings (Yan et al., 2019).
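A minimal sketch of the amortization step, with random stand-in weights in place of learned parameters $\varphi$ (the layer sizes follow the 2×200 ReLU description above; everything else is illustrative):

```python
import numpy as np

# Amortized encoder sketch: an MLP maps x to psi(x; phi) in R^D, which
# then parameterizes the analytic conjugate posterior form.
rng = np.random.default_rng(2)
D, H = 20, 200

def init_mlp(sizes):
    """Random stand-in weights; a real system would learn these."""
    return [(rng.normal(0, 0.05, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

phi = init_mlp([D, H, H, D])           # two hidden layers of 200 units

def encoder(x, phi):
    """Forward pass: two ReLU hidden layers, linear output giving psi(x)."""
    h = x
    for W, b in phi[:-1]:
        h = np.maximum(h @ W + b, 0.0)
    W, b = phi[-1]
    return h @ W + b                   # psi(x; phi), one value per observed dim

x = rng.integers(0, 2, size=(4, D)).astype(float)   # batch of binary inputs
psi = encoder(x, phi)
print(psi.shape)  # (4, 20)
```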

4. ELBO Objective and Training Procedure

ACP directly maximizes the ELBO:

$$\mathcal{L}(\varphi, \theta, \mu) = \mathbb{E}_{x \sim \text{data}}\Big[\mathbb{E}_{z \sim q_\varphi(z \mid x)}[\log p(x \mid z; \theta)] - \mathrm{KL}\big[q_\varphi(z \mid x) \,\Vert\, p(z; \mu)\big]\Big].$$

Training proceeds via stochastic gradient ascent using Adam. For each batch of $x$, the encoder outputs $\psi$, which determines the variational posteriors $q_k$; a Gumbel-Softmax relaxation enables reparameterization and backpropagation through the discrete latents. Monte Carlo estimation (with $L = 1$ sample per datapoint sufficing in practice) is used for the $x_i = 1$ likelihood terms, while the $x_i = 0$ and KL terms are computed analytically. Gradients with respect to $\varphi$, $\theta$, and $\mu$ are taken, and parameters are updated until convergence (Yan et al., 2019).

| Step | Method | Details |
|------|--------|---------|
| Encoder | MLP | 2×200 ReLU, linear output, no dropout |
| Latent variable | Gumbel-Softmax | Annealed temperature 1.0 → 0.1 over 10k steps |
| Optimizer | Adam | Learning rate $10^{-3}$, batch size 200 |
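The binary Gumbel-Softmax (Concrete) relaxation and a single-sample ELBO estimate can be sketched as a forward computation. This toy version uses made-up parameters, omits autodiff (a real implementation would backpropagate through it), and, for simplicity, estimates the whole likelihood term from the relaxed sample rather than treating the $x_i = 0$ terms analytically as described above; only the KL is computed in closed form:

```python
import numpy as np

rng = np.random.default_rng(3)

def binary_concrete_sample(q, tau):
    """Gumbel-Softmax (binary Concrete) relaxation of Bernoulli(q):
    a differentiable surrogate for discrete z at temperature tau."""
    u = rng.uniform(1e-6, 1 - 1e-6, size=q.shape)
    logistic_noise = np.log(u) - np.log1p(-u)
    logits = np.log(q) - np.log1p(-q)
    return 1.0 / (1.0 + np.exp(-(logits + logistic_noise) / tau))

def elbo_estimate(x, q, theta, theta0, mu, tau=0.5):
    """L = 1 Monte Carlo ELBO for one datapoint under the noisy-OR model."""
    z = binary_concrete_sample(q, tau)             # relaxed latent sample
    a = theta0 + theta @ z                         # activations a_i
    log_px0 = -a                                   # log p(x_i = 0 | z)
    log_px1 = np.log1p(-np.exp(-a))                # log p(x_i = 1 | z)
    log_lik = np.where(x == 1, log_px1, log_px0).sum()
    kl = (q * (np.log(q) - np.log(mu))             # analytic Bernoulli KL
          + (1 - q) * (np.log1p(-q) - np.log1p(-mu))).sum()
    return log_lik - kl

# Toy evaluation with illustrative parameters (not from the paper):
D, K = 6, 3
theta = rng.uniform(0.1, 2.0, size=(D, K))
theta0, mu = np.full(D, 0.01), np.full(K, 0.3)
x = np.array([1., 0., 1., 0., 0., 1.])
q = np.full(K, 0.4)                                # posterior means from encoder
val = elbo_estimate(x, q, theta, theta0, mu)
print(val)                                         # scalar ELBO estimate
```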

5. Empirical Benchmarks and Comparative Analysis

Empirical studies across inference accuracy, parameter recovery, generative modeling, and real-world topic modeling show that ACP consistently matches or outperforms traditional amortized variational inference (AVI) and unconstrained stochastic variational inference (SVI). With $N_\text{train} = 1000$, both ACP and AVI achieve high F1 ($\sim 94\%$), with negative ELBOs of 14.4 (ACP) and 14.0 (AVI); for $N_\text{train} = 20$, ACP vastly outperforms AVI (NELBO 22.2 vs. 37.2, F1 76.1% vs. 49.2%). For generative and parameter estimation tasks, ACP displays superior robustness and generalizes better, particularly in data-scarce regimes where AVI overfits and SVI underperforms. In topic modeling of NeurIPS title data, ACP maintains high topic coherence (PMI 2.75) even with fewer documents, whereas AVI's coherence drops to 2.55 (Yan et al., 2019).

6. Theoretical and Algorithmic Insights

ACP’s hybrid variational family—plugging analytic, model-derived posterior forms into a neural amortization architecture—injects generative structure into encoder learning, yielding posteriors aligned to the model’s statistical dependencies. This structure reduces the risk of overfitting, particularly when the number of examples is limited. A crucial observation is that maximizing tightness of the classical dual likelihood bound (as in CDI) does not ensure optimal ELBO or posterior quality, whereas ACP maximizes the ELBO directly without this misalignment. When sufficient data are available, ACP achieves theoretical parity with unconstrained AVI in flexibility, but its structural bias is advantageous in practical scenarios with limited data (Yan et al., 2019).

The ACP framework generalizes beyond the noisy-OR model to any latent variable model where conjugate dual upper bounds and tractable analytic posteriors can be derived. Related approaches, such as amortized Bayesian inference for clustering models (Pakman et al., 2018), also exploit the structure afforded by conjugacy within mixture models and Dirichlet process mixtures, using permutation-invariant neural architectures. These methods highlight a growing trend toward integrating analytic structure from classical statistics with the scalability and expressivity conferred by neural amortization. A plausible implication is that amortized conjugate posteriors enable efficient, i.i.d. approximate-posterior sampling at a computational cost competitive with classical MCMC, while capturing structural inductive biases (Yan et al., 2019, Pakman et al., 2018).
