Amortized Conjugate Posterior (ACP)
- The paper introduces ACP, a hybrid variational inference technique that combines classical conjugate dual bounds from the noisy-OR model with neural amortization for efficient posterior approximation.
- ACP achieves improved parameter learning and robust ELBO maximization, as demonstrated by superior F1 scores and enhanced topic coherence in data-scarce settings.
- ACP leverages analytic posterior forms encoded via neural networks to naturally incorporate generative structure, offering scalability and improved generalization in discrete latent variable models.
The Amortized Conjugate Posterior (ACP) is a hybrid variational inference technique designed for structured probabilistic models, exemplified by the binary noisy-OR latent variable model. ACP integrates classical conjugate dual bounds on the likelihood with the scalability and efficiency of amortized inference, achieving improved posterior approximation and parameter learning by leveraging both model structure and neural inference networks. Originally introduced by Steinhardt and Miller (2019), ACP directly maximizes the evidence lower bound (ELBO) while encoding inductive structure from the generative process into the variational family, enabling efficient and robust inference even in data-scarce regimes (Yan et al., 2019).
1. Generative Structure: The Noisy-OR Model
ACP operates on discrete latent variable models such as the noisy-OR, where observed binary variables $x \in \{0,1\}^D$ are generated by a collection of independent binary latent causes $z \in \{0,1\}^K$ together with a fixed leak variable $z_0 = 1$. The prior factorizes as

$$p(z) = \prod_{k=1}^{K} \pi_k^{z_k} (1 - \pi_k)^{1 - z_k},$$

with $\pi_k$ as Bernoulli parameters. The conditional noisy-OR likelihood is given by

$$p(x_d = 0 \mid z) = \exp(-u_d), \qquad p(x_d = 1 \mid z) = 1 - \exp(-u_d),$$

where $u_d = \theta_{d0} + \sum_{k=1}^{K} \theta_{dk} z_k$ with non-negative weights $\theta_{dk}$ (the leak term $\theta_{d0}$ allows $x_d$ to turn on even when no cause is active). Because each $x_d$ couples all $K$ latent causes, the marginal log-likelihood $\log p(x) = \log \sum_{z} p(x, z)$ sums over $2^K$ configurations and is combinatorially intractable for exact inference (Yan et al., 2019).
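As a concrete reference point, the generative side of the model can be sketched in a few lines of NumPy. This is a minimal illustration under the notation above; the function and variable names are illustrative, not the paper's code:

```python
import numpy as np

def noisy_or_likelihood(x, z, theta, theta_leak):
    """p(x | z) for the binary noisy-OR model.

    x          : (D,) observed binary vector
    z          : (K,) binary latent causes
    theta      : (D, K) non-negative weights linking causes to observations
    theta_leak : (D,) leak weights (the always-on cause z_0 = 1)
    """
    u = theta_leak + theta @ z            # activation u_d for each observation
    p_off = np.exp(-u)                    # p(x_d = 0 | z)
    return float(np.prod(np.where(x == 0, p_off, 1.0 - p_off)))
```

Exact marginalization would evaluate this for all $2^K$ settings of `z`, which is exactly the combinatorial blow-up that motivates the variational treatment below.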
2. Classical Conjugate Dual Variational Inference
To circumvent intractability, ACP starts from a classical approach: deriving a tractable upper bound for the problematic log-likelihood terms using Fenchel conjugates. For $x_d = 1$, the bound

$$\log p(x_d = 1 \mid z) = \log\!\big(1 - e^{-u_d}\big) \le \lambda_d u_d - f^*(\lambda_d)$$

is employed, with $\lambda_d > 0$ a variational parameter and $f^*(\lambda) = (1 + \lambda)\log(1 + \lambda) - \lambda \log \lambda$ the Fenchel conjugate of $f(u) = \log(1 - e^{-u})$. For $x_d = 0$, the true likelihood is retained, since $\log p(x_d = 0 \mid z) = -u_d$ is already linear in $z$. The upper-bounded surrogate joint likelihood is linear in $z$ and therefore yields a tractable factorized variational posterior

$$q(z \mid x) = \prod_{k=1}^{K} q_k^{z_k} (1 - q_k)^{1 - z_k},$$

with

$$q_k = \sigma\!\Big(\log \frac{\pi_k}{1 - \pi_k} - \sum_{d:\, x_d = 0} \theta_{dk} + \sum_{d:\, x_d = 1} \lambda_d \theta_{dk}\Big),$$

where $\sigma(\cdot)$ is the sigmoid function. The classical conjugate dual inference (CDI) approach optimizes each $\lambda_d$ per datapoint by fixed-point iteration to tighten the bound on the marginal likelihood (Yan et al., 2019).
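The bound and the resulting closed-form posterior can be checked numerically. The sketch below follows the notation above; the pointwise tightness condition $\lambda = 1/(e^{u} - 1)$ shown in `tightening_lam` is the standard Jaakkola–Jordan result, not necessarily the paper's exact iteration schedule:

```python
import numpy as np

def f_star(lam):
    """Fenchel conjugate of f(u) = log(1 - exp(-u))."""
    return (1.0 + lam) * np.log1p(lam) - lam * np.log(lam)

def log1m_exp_upper_bound(u, lam):
    """Upper bound on log(1 - exp(-u)), valid for any lam > 0."""
    return lam * u - f_star(lam)

def tightening_lam(u):
    """Value of lam at which the bound touches log(1 - exp(-u))."""
    return 1.0 / np.expm1(u)

def cdi_posterior(x, theta, theta_leak, lam, prior_pi):
    """Factorized q(z_k = 1 | x) implied by the linearized surrogate joint."""
    prior_logit = np.log(prior_pi) - np.log1p(-prior_pi)
    off_terms = (x == 0).astype(float) @ theta          # x_d = 0: exact, linear in z
    on_terms = ((x == 1).astype(float) * lam) @ theta   # x_d = 1: bounded, linear in z
    logits = prior_logit - off_terms + on_terms
    return 1.0 / (1.0 + np.exp(-logits))                # sigmoid
```

CDI alternates this posterior update with re-tightening each $\lambda_d$ per datapoint; ACP replaces that inner loop with a single encoder pass.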
3. Amortized Conjugate Posterior: Formulation
Instead of re-optimizing the variational parameters for each datapoint, ACP amortizes them across the dataset using a neural network encoder, typically a multilayer perceptron (MLP) with two hidden layers (e.g., 200–400 ReLU units, no dropout). For each input $x$, the encoder produces $\lambda(x)$, which parameterizes the variational posterior in the analytic form above. This construction yields the amortized conjugate posterior

$$q_\phi(z \mid x) = \prod_{k=1}^{K} q_k(x)^{z_k} \big(1 - q_k(x)\big)^{1 - z_k},$$

with $q_k(x)$ as defined above but with amortized $\lambda_d(x)$. The variational family, structured via conjugate duality, inherits inductive bias from the generative model, yielding enhanced generalization in low-data settings (Yan et al., 2019).
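A minimal NumPy sketch of the amortization step follows. Layer sizes match the text; the softplus output mapping and the Gaussian initialization are illustrative assumptions, chosen only to keep $\lambda(x)$ positive:

```python
import numpy as np

class ACPEncoder:
    """MLP x -> lam(x) > 0: one conjugate-bound parameter per observed dimension."""

    def __init__(self, d_in, d_hidden=200, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.05, (d_in, d_hidden))
        self.b1 = np.zeros(d_hidden)
        self.W2 = rng.normal(0.0, 0.05, (d_hidden, d_hidden))
        self.b2 = np.zeros(d_hidden)
        self.W3 = rng.normal(0.0, 0.05, (d_hidden, d_in))
        self.b3 = np.zeros(d_in)

    def __call__(self, x):
        h1 = np.maximum(0.0, x @ self.W1 + self.b1)   # ReLU hidden layer 1
        h2 = np.maximum(0.0, h1 @ self.W2 + self.b2)  # ReLU hidden layer 2
        raw = h2 @ self.W3 + self.b3                  # linear output
        return np.logaddexp(0.0, raw)                 # softplus keeps lam > 0
```

Given `lam = encoder(x)`, the analytic factorized posterior of the CDI construction is evaluated directly; no per-datapoint fixed-point loop remains at inference time.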
4. ELBO Objective and Training Procedure
ACP directly maximizes the ELBO:

$$\mathcal{L}(\theta, \pi, \phi) = \mathbb{E}_{q_\phi(z \mid x)}\!\big[\log p_\theta(x \mid z)\big] - \mathrm{KL}\big(q_\phi(z \mid x) \,\|\, p_\pi(z)\big).$$
Training proceeds via stochastic gradient ascent using Adam. For each batch of inputs $x$, the encoder outputs $\lambda(x)$, the variational posteriors $q_\phi(z \mid x)$ follow in closed form, and a Gumbel-Softmax relaxation is used to enable reparameterization and backpropagation through the discrete latents. Monte Carlo estimation (with a single sample per datapoint sufficing in practice) is used for the positive ($x_d = 1$) ELBO terms, while the negative ($x_d = 0$) and KL terms are computed analytically. Gradients with respect to the model parameters $\theta$, the prior parameters $\pi$, and the encoder weights $\phi$ are taken, and parameters are updated until convergence (Yan et al., 2019).
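A single-datapoint, single-sample ELBO estimate can be sketched as follows. This assumes the noisy-OR parameterization above, with a relaxed sample `z` standing in for the discrete latent; names and the exact term grouping are illustrative:

```python
import numpy as np

def elbo_estimate(x, z, q, theta, theta_leak, prior_pi):
    """One-sample Monte Carlo ELBO estimate for one datapoint.

    z : relaxed sample from q(z | x) (e.g. via Gumbel-Softmax)
    q : (K,) posterior means q(z_k = 1 | x)
    """
    u = theta_leak + theta @ z
    # Reconstruction: exact linear term for x_d = 0, MC-estimated term for x_d = 1.
    log_lik = np.sum(np.where(x == 0, -u, np.log1p(-np.exp(-u))))
    # KL between factorized Bernoulli posterior and prior, computed analytically.
    kl = np.sum(q * (np.log(q) - np.log(prior_pi))
                + (1 - q) * (np.log1p(-q) - np.log1p(-prior_pi)))
    return log_lik - kl
```

Averaging this estimate over a batch and differentiating through the relaxed sample gives the gradient signal described above.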
| Step | Method | Details |
|---|---|---|
| Encoder | MLP | 2×200 ReLU, linear output, no dropout |
| Latent variable | Gumbel-Softmax | Annealed temperature 1.0→0.1 over 10k steps |
| Optimizer | Adam | Batch size 200 |
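The relaxation step in the table can be sketched as a binary Concrete/Gumbel-Softmax sample via logistic noise. This is a minimal illustration of the reparameterization trick for Bernoulli latents, not the full training loop:

```python
import numpy as np

def binary_gumbel_softmax(logits, tau, rng):
    """Relaxed Bernoulli(sigmoid(logits)) sample, differentiable in logits.

    As tau -> 0 the samples approach hard {0, 1} draws; the annealing
    schedule (1.0 -> 0.1) trades relaxation bias against gradient variance.
    """
    u = rng.uniform(1e-9, 1.0 - 1e-9, size=np.shape(logits))
    logistic = np.log(u) - np.log1p(-u)              # Logistic(0, 1) noise
    return 1.0 / (1.0 + np.exp(-(logits + logistic) / tau))
```

Because the sample is a deterministic, differentiable function of `logits` and external noise, gradients flow from the ELBO back into the encoder despite the latents being discrete in the generative model.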
5. Empirical Benchmarks and Comparative Analysis
Empirical studies across inference accuracy, parameter recovery, generative modeling, and real-world topic modeling show that ACP consistently matches or outperforms traditional amortized variational inference (AVI) and unconstrained stochastic variational inference (SVI). With ample training data, both ACP and AVI achieve high F1 scores, with negative ELBOs of $14.4$ (ACP) and $14.0$ (AVI); with scarce data, ACP vastly outperforms AVI in both negative ELBO ($37.2$ for AVI) and F1. For generative and parameter estimation tasks, ACP displays superior robustness and generalizes better, particularly in data-scarce regimes where AVI suffers from overfitting and SVI underperforms. In topic modeling of NeurIPS title data, ACP maintains high topic coherence (PMI $2.75$) even with fewer documents, whereas AVI's coherence drops to $2.55$ (Yan et al., 2019).
6. Theoretical and Algorithmic Insights
ACP’s hybrid variational family—plugging analytic, model-derived posterior forms into a neural amortization architecture—injects generative structure into encoder learning, yielding posteriors aligned to the model’s statistical dependencies. This structure reduces the risk of overfitting, particularly when the number of examples is limited. A crucial observation is that maximizing tightness of the classical dual likelihood bound (as in CDI) does not ensure optimal ELBO or posterior quality, whereas ACP maximizes the ELBO directly without this misalignment. When sufficient data are available, ACP achieves theoretical parity with unconstrained AVI in flexibility, but its structural bias is advantageous in practical scenarios with limited data (Yan et al., 2019).
7. Extensions and Related Approaches
The ACP framework generalizes beyond the noisy-OR model to any latent variable model where conjugate dual upper bounds and tractable analytic posteriors can be derived. Related approaches, such as amortized Bayesian inference for clustering models (Pakman et al., 2018), also exploit the structure afforded by conjugacy within mixture models and Dirichlet process mixtures, using permutation-invariant neural architectures. These methods highlight a growing trend toward integrating analytic structure from classical statistics with the scalability and expressivity conferred by neural amortization. A plausible implication is that amortized conjugate posteriors enable efficient, i.i.d. approximate-posterior sampling at a computational cost competitive with classical MCMC, while capturing structural inductive biases (Yan et al., 2019, Pakman et al., 2018).