
Adaptive Gaussian Mixture MH

Updated 1 March 2026
  • Fully Adaptive Gaussian Mixture MH (AGM-MH) is an adaptive MCMC method that recursively updates a mixture of Gaussian proposals to approximate complex target distributions.
  • It dynamically adjusts weights, means, and covariances using the full sample history, lowering autocorrelation and enhancing mixing for multi-modal and high-dimensional targets.
  • Empirical results show that AGM-MH achieves lower estimation error and faster convergence compared to traditional nonadaptive Metropolis–Hastings methods, forming a basis for advanced variants like AIMM.

Fully Adaptive Gaussian Mixture Metropolis–Hastings (AGM-MH) is a class of independent Metropolis–Hastings algorithms employing a proposal distribution modeled as a mixture of Gaussian components. Its key innovation is the simultaneous, recursive adaptation of all mixture parameters (weights, means, and covariances) using the entire sample history, with the explicit goal of improving efficiency for multi-modal and high-dimensional target distributions. The proposal is dynamically refined to progressively approximate the target density, thus lowering autocorrelation and enhancing mixing. AGM-MH is foundational in adaptive MCMC and serves as the underlying framework for algorithms such as Adaptive Incremental Mixture MCMC (AIMM) (Luengo et al., 2012, Maire et al., 2016).

1. Algorithmic Structure and Proposal Design

At each iteration $t$, the AGM-MH proposal is a $K$-component Gaussian mixture,

$$q_t(x') = \sum_{i=1}^K w_i(t)\, \mathcal{N}(x'; \mu_i(t), \Sigma_i(t)),$$

where $w_i(t)$, $\mu_i(t)$, and $\Sigma_i(t)$ are the adaptive weights, means, and covariance matrices, respectively. The proposal is independent of the current chain position, and each new sample is accepted with probability

$$\alpha = \min\left\{1, \frac{\pi(x')\, q_t(x_t)}{\pi(x_t)\, q_t(x')}\right\},$$

where $\pi(x)$ is the (unnormalized) target density (Luengo et al., 2012).
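The independent MH step above can be sketched as follows. This is a minimal illustration under my own naming and API choices (it is not code from the cited papers); the target is assumed to be supplied as a log-density, and SciPy's `multivariate_normal` is used for the component densities.

```python
import numpy as np
from scipy.stats import multivariate_normal

def mixture_logpdf(x, weights, means, covs):
    """log q_t(x) for a K-component Gaussian mixture."""
    comps = [np.log(w) + multivariate_normal.logpdf(x, m, S)
             for w, m, S in zip(weights, means, covs)]
    return np.logaddexp.reduce(comps)

def imh_step(x_t, log_target, weights, means, covs, rng):
    """One independent-MH step: draw from q_t, accept with probability alpha."""
    k = rng.choice(len(weights), p=weights)            # pick a mixture component
    x_prop = rng.multivariate_normal(means[k], covs[k])
    # log alpha = log pi(x') + log q_t(x_t) - log pi(x_t) - log q_t(x')
    log_alpha = (log_target(x_prop) + mixture_logpdf(x_t, weights, means, covs)
                 - log_target(x_t) - mixture_logpdf(x_prop, weights, means, covs))
    if np.log(rng.uniform()) < min(0.0, log_alpha):
        return x_prop, True
    return x_t, False
```

Note that, unlike a random-walk proposal, the proposal density of the *current* state also enters the ratio, since $q_t$ does not depend on $x_t$.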

In the AIMM framework, the proposal generalizes dynamically:

$$q_n(x) = \omega_n Q_0(x) + (1 - \omega_n) \sum_{\ell=1}^{M_n} \frac{\beta_\ell}{\sum_{j=1}^{M_n} \beta_j}\, \phi_\ell(x),$$

with $Q_0$ a defensive Gaussian or broad prior component, $\phi_\ell(x)$ Gaussian increments, and both the number of components ($M_n + 1$) and their parameters determined online (Maire et al., 2016).
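Evaluating this defensive mixture is straightforward; a sketch under assumed names, where `q0` is any object exposing a `.pdf` method (e.g. a frozen SciPy distribution) and `betas` holds the unnormalized increment weights $\beta_\ell$:

```python
import numpy as np
from scipy.stats import multivariate_normal

def aimm_proposal_pdf(x, omega_n, q0, betas, comp_means, comp_covs):
    """q_n(x) = omega_n*Q0(x) + (1-omega_n) * sum_l (beta_l/sum beta) * phi_l(x)."""
    if len(betas) == 0:                      # no increments yet: defensive part only
        return q0.pdf(x)
    b = np.asarray(betas, dtype=float)
    b = b / b.sum()                          # normalize the unnormalized weights
    mix = sum(w * multivariate_normal.pdf(x, m, S)
              for w, m, S in zip(b, comp_means, comp_covs))
    return omega_n * q0.pdf(x) + (1.0 - omega_n) * mix
```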

2. Parameter Adaptation and Recursive Formulas

Parameter updates in AGM-MH are performed recursively using accepted samples:

  • At each step, assign the newly accepted sample $x_{t+1}$ to the closest component, $j = \arg\min_i \|x_{t+1} - \mu_i(t)\|^2$.
  • Update the mean:

$$\mu_j(t+1) = \frac{1}{m_j}\, x_{t+1} + \left(1 - \frac{1}{m_j}\right) \mu_j(t),$$

where $m_j$ is the number of samples assigned to component $j$.

  • Update the covariance:

$$\Sigma_j(t+1) = \frac{1}{m_j - 1}\left( \frac{(x_{t+1} - \mu_j(t+1))(x_{t+1} - \mu_j(t+1))^T}{m_j} + \epsilon I \right) + \frac{m_j - 2}{m_j - 1}\, \Sigma_j(t),$$

with a ridge parameter $\epsilon$ ensuring positive definiteness.

  • Update the weights:

$$w_i(t+1) = \frac{m_i}{t + K + 1}.$$
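The three recursions above fit in one update routine. The sketch below mirrors the formulas directly (names and the in-place list handling are my own choices, not from the papers):

```python
import numpy as np

def agm_update(x_new, means, covs, counts, t, eps=1e-6):
    """Assign x_new to the nearest component; update its mean, covariance, and the weights."""
    K = len(means)
    j = int(np.argmin([np.sum((x_new - m) ** 2) for m in means]))  # closest mean
    counts[j] += 1
    m_j = counts[j]
    # recursive mean update: mu_j <- x_new/m_j + (1 - 1/m_j) * mu_j
    means[j] = x_new / m_j + (1.0 - 1.0 / m_j) * means[j]
    # recursive covariance update with ridge eps*I for positive definiteness
    if m_j > 1:
        d = (x_new - means[j]).reshape(-1, 1)
        covs[j] = ((d @ d.T) / m_j + eps * np.eye(len(x_new))) / (m_j - 1) \
                  + (m_j - 2) / (m_j - 1) * covs[j]
    # weight update: w_i = m_i / (t + K + 1)
    weights = np.array(counts, dtype=float) / (t + K + 1)
    return j, weights
```

Because each component's parameters move by $O(1/m_j)$ per accepted sample, the updates shrink automatically as counts grow, which is exactly the diminishing-adaptation behavior discussed in Section 4.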

For AIMM, a new Gaussian component is added when the local discrepancy $W_n(X_{n+1}) = \pi(X_{n+1})/q_n(X_{n+1})$ exceeds a threshold $W$. The new component's mean is the current proposal $X_{n+1}$, its covariance is estimated from the neighborhood of $X_{n+1}$ (using Mahalanobis distance), and its unnormalized weight is set to $\beta_{M_n+1} = [\pi(X_{n+1})]^\gamma$ with $\gamma \in (0, 1)$ (Luengo et al., 2012, Maire et al., 2016).
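A rough sketch of this birth rule, with two simplifications that are mine rather than the papers': plain Euclidean distance stands in for the Mahalanobis neighborhood rule, and the neighborhood is taken as the nearest $\tau$-fraction of past samples.

```python
import numpy as np

def maybe_add_component(x_prop, target_pdf, proposal_pdf, history, W, gamma, tau=0.5):
    """Return (mean, cov, beta) for a new component, or None if no birth occurs."""
    if target_pdf(x_prop) / proposal_pdf(x_prop) <= W:
        return None                               # proposal covers x_prop well enough
    # covariance from the neighborhood of x_prop among past samples
    H = np.asarray(history)
    d = np.linalg.norm(H - x_prop, axis=1)
    nbhd = H[d <= np.quantile(d, tau)]            # nearest tau-fraction of samples
    dim = len(x_prop)
    cov = np.cov(nbhd.T) + 1e-6 * np.eye(dim) if len(nbhd) > 1 else np.eye(dim)
    beta = target_pdf(x_prop) ** gamma            # unnormalized weight beta = pi(x)^gamma
    return x_prop, cov, beta
```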

3. Initialization and Practical Guidelines

Effective performance requires choices for the number of mixture components, their initialization, adaptation timescales, and regularization:

  • Component number $K$: For fixed-$K$ AGM-MH, $K \approx 10$–$20$ (or proportional to the anticipated number of modes); in AIMM, $K$ grows adaptively as components are added.
  • Initial means $\mu_i(0)$: Distributed around expected modes if prior information is available; otherwise, scattered randomly over a large support.
  • Initial covariances $\Sigma_i(0)$: Typically $\sigma_0^2 I_n$ with $\sigma_0$ large to ensure global exploration.
  • Training length $T_\text{train}$: Sufficiently long, e.g., $100 \cdot d$ iterations, so each component accrues samples before adaptation stabilizes.
  • Stopping time $T_\text{stop}$: Either the total budget $T_\text{tot}$ or earlier, with vanishing adaptation guaranteeing ergodicity.
  • Ridge parameter $\epsilon$: Small (e.g., $10^{-6}$–$10^{-3}$) to prevent degeneracy.
  • AIMM-specific tuning: Discrepancy threshold $W \approx d$, neighborhood scale $\tau \in [0.3, 0.7]$, and unnormalized-weight exponent $\gamma \in (0.3, 0.7)$ (Luengo et al., 2012, Maire et al., 2016).

Unused components have weights $w_i \to 0$ and may be pruned to reduce computational cost in fixed-$K$ settings.
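One way to package these guidelines as an initialization routine; every default below (support radius, $\sigma_0$, $K$) is an illustrative choice, not a prescribed value:

```python
import numpy as np

def init_agm_mh(d, K=10, sigma0=10.0, support_radius=20.0, seed=0):
    """Scatter K components over a broad support with wide initial covariances."""
    rng = np.random.default_rng(seed)
    # no prior mode information assumed: random means over a large box
    means = [rng.uniform(-support_radius, support_radius, size=d) for _ in range(K)]
    covs = [sigma0 ** 2 * np.eye(d) for _ in range(K)]     # sigma_0^2 * I_n, sigma_0 large
    weights = np.full(K, 1.0 / K)                          # uniform initial weights
    T_train = 100 * d                                      # training length guideline
    return means, covs, weights, T_train
```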

4. Convergence and Ergodicity

AGM-MH, in both its fixed and incremental forms, is designed to be ergodic with respect to $\pi(x)$. The adaptation mechanism satisfies the "diminishing adaptation" criterion: updates scale as $1/m_j$, so by the Law of Large Numbers they vanish over time. Containment is enforced by the ridge parameter $\epsilon$ in the covariance matrices, which keeps every $\Sigma_i$ strictly positive definite. Standard results (Roberts & Rosenthal, 2007) guarantee ergodicity provided adaptation vanishes and the proposal remains well behaved.

AIMM generalizes this result, with explicit theorems for both unbounded and compact parameter spaces. Under conditions such as lower bounds on covariance determinants, subexponential tails for Q0Q_0, and an upper bound on component number in compact spaces, both diminishing adaptation and containment hold, ensuring convergence in total variation to the target (Luengo et al., 2012, Maire et al., 2016).

5. Computational Complexity

The per-iteration cost for fixed-$K$ AGM-MH is $O(K d^2)$, dominated by:

  • Sampling: $O(K d^2)$ for generating a mixture sample.
  • Evaluation: $O(K d^2)$ per density evaluation.
  • Component search: $O(K d)$ for finding the closest mean.
  • Covariance update: $O(d^2)$ per update (Luengo et al., 2012).

Empirically, as adaptation proceeds, many $w_i$ decay, and pruning or "moving window" techniques (especially in AIMM, via f-AIMM) can further limit computational overhead (Maire et al., 2016).

6. Numerical Performance and Comparative Results

Empirical evaluation demonstrates that AGM-MH achieves substantially lower sample autocorrelation and more accurate estimation relative to nonadaptive Metropolis–Hastings using the same initial proposal, with only mild extra computational cost. Representative findings include:

  • One-dimensional bimodal target: With $K = 2$,
    • Nonadaptive MH autocorrelation: $\sim 0.78$
    • AGM-MH autocorrelation: $\sim 0.18$
    • Mean-square error (MSE) on the mean: $\sim 1.5 \times 10^{-3}$
    • Final component properties: means $\sim \pm 1.88$, variances $\sim 0.16$, weights $\sim 0.5$
  • One-dimensional $M$-component mixtures:
    • MSEs for normalizing-constant estimation decay with $M$, e.g., $1.6 \times 10^{-4}$ ($M = 2$), $2 \times 10^{-5}$ ($M = 6$)
    • AGM-MH autocorrelations: $0.13$–$0.16$ vs. $0.46$–$0.81$ for nonadaptive MH
    • Acceptance rates improve after adaptation
  • Two-dimensional mixtures:
    • With $K = 2$, parameters converge quickly to the true modes and covariances
    • With $K = 10$, only components near modes adapt; unused components become inactive

AIMM and its variant f-AIMM demonstrate competitive or superior performance to adaptive random-walk Metropolis and fixed-mixture AGM-MH for high-dimensional, multimodal, or heavy-tailed targets, with additional algorithmic flexibility in tuning adaptation rates and controlling proposal complexity (Luengo et al., 2012, Maire et al., 2016).

7. Extensions and Theoretical Variants

AGM-MH constitutes a foundational class upon which algorithms such as AIMM are constructed. In AIMM, the number of mixture components is not fixed but grows adaptively in response to local coverage deficiencies, as signaled by large local discrepancy. Efficient local covariance estimation and online weight updates ensure the proposal remains flexible and can concentrate on relevant regions of the target.

Theoretical guarantees extend to unbounded or compact state spaces under routine conditions. Heuristic and empirical strategies, such as capping component numbers ("moving window"), adapting discrepancy thresholds, and rescaling weights, are documented for practical efficiency. These approaches also keep computational cost roughly linear as the number of components grows (Maire et al., 2016).


For detailed algorithms, theoretical proofs, and experimental protocols, see (Luengo et al., 2012) ("Fully Adaptive Gaussian Mixture Metropolis-Hastings Algorithm") and (Maire et al., 2016) ("Adaptive Incremental Mixture Markov chain Monte Carlo").

References

  • Luengo et al. (2012). "Fully Adaptive Gaussian Mixture Metropolis–Hastings Algorithm."
  • Maire et al. (2016). "Adaptive Incremental Mixture Markov Chain Monte Carlo."
