
Adaptive Gaussian Mixture MH

Updated 1 March 2026
  • Fully Adaptive Gaussian Mixture MH (AGM-MH) is an adaptive MCMC method that recursively updates a mixture of Gaussian proposals to approximate complex target distributions.
  • It dynamically adjusts weights, means, and covariances using the full sample history, lowering autocorrelation and enhancing mixing for multi-modal and high-dimensional targets.
  • Empirical results show that AGM-MH achieves lower estimation error and faster convergence compared to traditional nonadaptive Metropolis–Hastings methods, forming a basis for advanced variants like AIMM.

Fully Adaptive Gaussian Mixture Metropolis–Hastings (AGM-MH) is a class of independent Metropolis–Hastings algorithms employing a proposal distribution modeled as a mixture of Gaussian components. Its key innovation is the simultaneous, recursive adaptation of all mixture parameters (weights, means, and covariances) using the entire sample history, with the explicit goal of improving efficiency for multi-modal and high-dimensional target distributions. The proposal is dynamically refined to progressively approximate the target density, thus lowering autocorrelation and enhancing mixing. AGM-MH is foundational in adaptive MCMC and serves as the underlying framework for algorithms such as Adaptive Incremental Mixture MCMC (AIMM) (Luengo et al., 2012, Maire et al., 2016).

1. Algorithmic Structure and Proposal Design

At each iteration $t$, the AGM-MH proposal is a $K$-component Gaussian mixture,

$$q_t(x') = \sum_{i=1}^K w_i(t)\, \mathcal{N}(x'; \mu_i(t), \Sigma_i(t)),$$

where $w_i(t)$, $\mu_i(t)$, and $\Sigma_i(t)$ are the adaptive weights, means, and covariance matrices, respectively. The proposal is independent of the current chain position, and each new sample is accepted with probability

$$\alpha = \min\left\{1, \frac{\pi(x')\, q_t(x_t)}{\pi(x_t)\, q_t(x')}\right\},$$

where $\pi(x)$ is the (unnormalized) target density (Luengo et al., 2012).
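The independent MH step above can be sketched as follows. This is a minimal illustration under my own naming and API choices (it is not code from the cited papers); the target is assumed to be supplied as a log-density, and SciPy's `multivariate_normal` is used for the component densities.

```python
import numpy as np
from scipy.stats import multivariate_normal

def mixture_logpdf(x, weights, means, covs):
    """log q_t(x) for a K-component Gaussian mixture."""
    comps = [np.log(w) + multivariate_normal.logpdf(x, m, S)
             for w, m, S in zip(weights, means, covs)]
    return np.logaddexp.reduce(comps)

def imh_step(x_t, log_target, weights, means, covs, rng):
    """One independent-MH step: draw from q_t, accept with probability alpha."""
    k = rng.choice(len(weights), p=weights)            # pick a mixture component
    x_prop = rng.multivariate_normal(means[k], covs[k])
    # log alpha = log pi(x') + log q_t(x_t) - log pi(x_t) - log q_t(x')
    log_alpha = (log_target(x_prop) + mixture_logpdf(x_t, weights, means, covs)
                 - log_target(x_t) - mixture_logpdf(x_prop, weights, means, covs))
    if np.log(rng.uniform()) < min(0.0, log_alpha):
        return x_prop, True
    return x_t, False
```

Note that, unlike a random-walk proposal, the proposal density of the *current* state also enters the ratio, since $q_t$ does not depend on $x_t$.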

In the AIMM framework, the proposal generalizes dynamically:

$$q_n(x) = \omega_n Q_0(x) + (1 - \omega_n) \sum_{\ell=1}^{M_n} \frac{\beta_\ell}{\sum_{j=1}^{M_n} \beta_j}\, \phi_\ell(x),$$

with $Q_0$ a defensive Gaussian or broad prior component, $\phi_\ell(x)$ Gaussian increments, and both the number of components ($M_n + 1$) and their parameters determined online (Maire et al., 2016).
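Evaluating this defensive mixture is straightforward; a sketch under assumed names, where `q0` is any object exposing a `.pdf` method (e.g. a frozen SciPy distribution) and `betas` holds the unnormalized increment weights $\beta_\ell$:

```python
import numpy as np
from scipy.stats import multivariate_normal

def aimm_proposal_pdf(x, omega_n, q0, betas, comp_means, comp_covs):
    """q_n(x) = omega_n*Q0(x) + (1-omega_n) * sum_l (beta_l/sum beta) * phi_l(x)."""
    if len(betas) == 0:                      # no increments yet: defensive part only
        return q0.pdf(x)
    b = np.asarray(betas, dtype=float)
    b = b / b.sum()                          # normalize the unnormalized weights
    mix = sum(w * multivariate_normal.pdf(x, m, S)
              for w, m, S in zip(b, comp_means, comp_covs))
    return omega_n * q0.pdf(x) + (1.0 - omega_n) * mix
```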

2. Parameter Adaptation and Recursive Formulas

Parameter updates in AGM-MH are performed recursively using accepted samples:

  • At each step, assign the newly accepted sample $x_{t+1}$ to the closest component, $j = \arg\min_i \|x_{t+1} - \mu_i(t)\|^2$.
  • Update the mean:

$$\mu_j(t+1) = \frac{1}{m_j}\, x_{t+1} + \left(1 - \frac{1}{m_j}\right) \mu_j(t),$$

where $m_j$ is the number of samples assigned to component $j$.

  • Update the covariance:

$$\Sigma_j(t+1) = \frac{1}{m_j - 1}\left( \frac{(x_{t+1} - \mu_j(t+1))(x_{t+1} - \mu_j(t+1))^T}{m_j} + \epsilon I \right) + \frac{m_j - 2}{m_j - 1}\, \Sigma_j(t),$$

with a ridge parameter $\epsilon$ ensuring positive definiteness.

  • Update the weights:

$$w_i(t+1) = \frac{m_i}{t + K + 1}.$$
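The three recursions above fit in one update routine. The sketch below mirrors the formulas directly (names and the in-place list handling are my own choices, not from the papers):

```python
import numpy as np

def agm_update(x_new, means, covs, counts, t, eps=1e-6):
    """Assign x_new to the nearest component; update its mean, covariance, and the weights."""
    K = len(means)
    j = int(np.argmin([np.sum((x_new - m) ** 2) for m in means]))  # closest mean
    counts[j] += 1
    m_j = counts[j]
    # recursive mean update: mu_j <- x_new/m_j + (1 - 1/m_j) * mu_j
    means[j] = x_new / m_j + (1.0 - 1.0 / m_j) * means[j]
    # recursive covariance update with ridge eps*I for positive definiteness
    if m_j > 1:
        d = (x_new - means[j]).reshape(-1, 1)
        covs[j] = ((d @ d.T) / m_j + eps * np.eye(len(x_new))) / (m_j - 1) \
                  + (m_j - 2) / (m_j - 1) * covs[j]
    # weight update: w_i = m_i / (t + K + 1)
    weights = np.array(counts, dtype=float) / (t + K + 1)
    return j, weights
```

Because each component's parameters move by $O(1/m_j)$ per accepted sample, the updates shrink automatically as counts grow, which is exactly the diminishing-adaptation behavior discussed in Section 4.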

For AIMM, a new Gaussian component is added when the local discrepancy $W_n(X_{n+1}) = \pi(X_{n+1})/q_n(X_{n+1})$ exceeds a threshold $W$. The new component's mean is the current proposal $X_{n+1}$, its covariance is estimated from the neighborhood of $X_{n+1}$ (using Mahalanobis distance), and its unnormalized weight is set to $\beta_{M_n+1} = [\pi(X_{n+1})]^\gamma$ with $\gamma \in (0, 1)$ (Luengo et al., 2012, Maire et al., 2016).
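A rough sketch of this birth rule, with two simplifications that are mine rather than the papers': plain Euclidean distance stands in for the Mahalanobis neighborhood rule, and the neighborhood is taken as the nearest $\tau$-fraction of past samples.

```python
import numpy as np

def maybe_add_component(x_prop, target_pdf, proposal_pdf, history, W, gamma, tau=0.5):
    """Return (mean, cov, beta) for a new component, or None if no birth occurs."""
    if target_pdf(x_prop) / proposal_pdf(x_prop) <= W:
        return None                               # proposal covers x_prop well enough
    # covariance from the neighborhood of x_prop among past samples
    H = np.asarray(history)
    d = np.linalg.norm(H - x_prop, axis=1)
    nbhd = H[d <= np.quantile(d, tau)]            # nearest tau-fraction of samples
    dim = len(x_prop)
    cov = np.cov(nbhd.T) + 1e-6 * np.eye(dim) if len(nbhd) > 1 else np.eye(dim)
    beta = target_pdf(x_prop) ** gamma            # unnormalized weight beta = pi(x)^gamma
    return x_prop, cov, beta
```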

3. Initialization and Practical Guidelines

Effective performance requires choices for the number of mixture components, their initialization, adaptation timescales, and regularization:

  • Component number $K$: For fixed-$K$ AGM-MH, $K \approx 10$–$20$ (or proportional to the anticipated number of modes); in AIMM, $K$ grows adaptively as components are added.
  • Initial means $\mu_i(0)$: Distributed around expected modes if prior information is available; otherwise, scattered randomly over a large support.
  • Initial covariances $\Sigma_i(0)$: Typically $\sigma_0^2 I_n$ with $\sigma_0$ large to ensure global exploration.
  • Training length $T_\text{train}$: Sufficiently long, e.g., $100 \cdot d$ iterations, so each component accrues samples before adaptation stabilizes.
  • Stopping time $T_\text{stop}$: Either the total budget $T_\text{tot}$ or earlier, with vanishing adaptation guaranteeing ergodicity.
  • Ridge parameter $\epsilon$: Small (e.g., $10^{-6}$–$10^{-3}$) to prevent degeneracy.
  • AIMM-specific tuning: Discrepancy threshold $W \approx d$, neighborhood scale $\tau \in [0.3, 0.7]$, and unnormalized-weight exponent $\gamma \in (0.3, 0.7)$ (Luengo et al., 2012, Maire et al., 2016).

Unused components have weights $w_i \to 0$ and may be pruned to reduce computational cost in fixed-$K$ settings.
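One way to package these guidelines as an initialization routine; every default below (support radius, $\sigma_0$, $K$) is an illustrative choice, not a prescribed value:

```python
import numpy as np

def init_agm_mh(d, K=10, sigma0=10.0, support_radius=20.0, seed=0):
    """Scatter K components over a broad support with wide initial covariances."""
    rng = np.random.default_rng(seed)
    # no prior mode information assumed: random means over a large box
    means = [rng.uniform(-support_radius, support_radius, size=d) for _ in range(K)]
    covs = [sigma0 ** 2 * np.eye(d) for _ in range(K)]     # sigma_0^2 * I_n, sigma_0 large
    weights = np.full(K, 1.0 / K)                          # uniform initial weights
    T_train = 100 * d                                      # training length guideline
    return means, covs, weights, T_train
```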

4. Convergence and Ergodicity

AGM-MH, in both its fixed and incremental forms, is designed to be ergodic with respect to $\pi(x)$. The adaptation mechanism satisfies the "diminishing adaptation" criterion: updates scale as $1/m_j$, so by the Law of Large Numbers they vanish over time. Containment is enforced by the ridge parameter $\epsilon$ in the covariance matrices, which keeps every $\Sigma_i$ strictly positive definite. Standard results (Roberts & Rosenthal, 2007) guarantee ergodicity provided adaptation vanishes and the proposal remains well behaved.

AIMM generalizes this result, with explicit theorems for both unbounded and compact parameter spaces. Under conditions such as lower bounds on covariance determinants, subexponential tails for Q0Q_0, and an upper bound on component number in compact spaces, both diminishing adaptation and containment hold, ensuring convergence in total variation to the target (Luengo et al., 2012, Maire et al., 2016).

5. Computational Complexity

The per-iteration cost for fixed-$K$ AGM-MH is $O(K d^2)$, dominated by:

  • Sampling: $O(K d^2)$ for generating a mixture sample.
  • Evaluation: $O(K d^2)$ per density evaluation.
  • Component search: $O(K d)$ for finding the closest mean.
  • Covariance update: $O(d^2)$ per update (Luengo et al., 2012).

Empirically, as adaptation proceeds, many $w_i$ decay, and pruning or "moving window" techniques (especially in AIMM, via f-AIMM) can further limit computational overhead (Maire et al., 2016).

6. Numerical Performance and Comparative Results

Empirical evaluation demonstrates that AGM-MH achieves substantially lower sample autocorrelation and more accurate estimation relative to nonadaptive Metropolis–Hastings using the same initial proposal, with only mild extra computational cost. Representative findings include:

  • One-dimensional bimodal target: With $K = 2$,
    • Nonadaptive MH autocorrelation: $\sim 0.78$
    • AGM-MH autocorrelation: $\sim 0.18$
    • Mean-square error (MSE) on the mean: $\sim 1.5 \times 10^{-3}$
    • Final component properties: means $\sim \pm 1.88$, variances $\sim 0.16$, weights $\sim 0.5$
  • One-dimensional $M$-component mixtures:
    • MSEs for normalizing-constant estimation decay with $M$, e.g., $1.6 \times 10^{-4}$ ($M = 2$), $2 \times 10^{-5}$ ($M = 6$)
    • AGM-MH autocorrelations: $0.13$–$0.16$ vs. $0.46$–$0.81$ for nonadaptive MH
    • Acceptance rates improve after adaptation
  • Two-dimensional mixtures:
    • With $K = 2$, parameters converge quickly to the true modes and covariances
    • With $K = 10$, only components near modes adapt; unused components become inactive

AIMM and its variant f-AIMM demonstrate competitive or superior performance to adaptive random-walk Metropolis and fixed-mixture AGM-MH for high-dimensional, multimodal, or heavy-tailed targets, with additional algorithmic flexibility in tuning adaptation rates and controlling proposal complexity (Luengo et al., 2012, Maire et al., 2016).

7. Extensions and Theoretical Variants

AGM-MH constitutes a foundational class upon which algorithms such as AIMM are constructed. In AIMM, the number of mixture components is not fixed but grows adaptively in response to local coverage deficiencies, as signaled by large local discrepancy. Efficient local covariance estimation and online weight updates ensure the proposal remains flexible and can concentrate on relevant regions of the target.

Theoretical guarantees extend to unbounded or compact state spaces under routine conditions. Heuristic and empirical strategies, such as capping component numbers ("moving window"), adapting discrepancy thresholds, and rescaling weights, are documented for practical efficiency. These approaches also keep computational cost roughly linear as the number of components grows (Maire et al., 2016).


For detailed algorithms, theoretical proofs, and experimental protocols, see (Luengo et al., 2012) ("Fully Adaptive Gaussian Mixture Metropolis-Hastings Algorithm") and (Maire et al., 2016) ("Adaptive Incremental Mixture Markov chain Monte Carlo").

References

  • Luengo et al. (2012). "Fully Adaptive Gaussian Mixture Metropolis–Hastings Algorithm."
  • Maire et al. (2016). "Adaptive Incremental Mixture Markov Chain Monte Carlo."
