
SeMIS: Sequential Multiple Importance Sampling

Updated 23 February 2026
  • SeMIS is a computational framework that combines multiple proposal distributions to efficiently estimate integrals and perform Bayesian inference.
  • It utilizes mixture estimators, sequential adaptation, and convex optimization to achieve notable variance reduction and robustness in challenging sampling scenarios.
  • Empirical results show significant gains, with up to 22× variance reduction and improved effective sample sizes in applications like rare event simulation and structural model updating.

Sequential Multiple Importance Sampling (SeMIS) is a class of algorithms designed to efficiently estimate integrals and perform Bayesian inference by sequentially combining samples from multiple adaptive proposal distributions. The SeMIS family achieves lower variance and greater robustness than classical importance sampling, leveraging mixture-based estimators, sequential adaptation, and, in some variants, convex optimization for control variate coefficients and mixture weights. Its main applications include rare event simulation, high-dimensional evidence estimation, and uncertainty quantification for complex multimodal or singular integrands.

1. Formal Definition and Problem Setup

SeMIS aims to compute expectations or integrals of the form

$$\mu = \int f(x)\,p(x)\,dx,$$

where $p$ is a nominal or target distribution (which may be the posterior in Bayesian settings), and $f$ is a test or payoff function that can be irregular or highly concentrated (e.g., in rare-event scenarios).

Given $J$ proposal densities $\{q_j\}_{j=1}^J$, chosen so that their mixture dominates $p$ wherever $f(x)p(x) \neq 0$, SeMIS constructs a mixture density

$$q_\alpha(x) = \sum_{j=1}^J \alpha_j q_j(x), \qquad \alpha \in S = \Big\{\alpha \in \mathbb{R}^J : \alpha_j \geq 0,\ \sum_j \alpha_j = 1\Big\}.$$

Samples may be drawn i.i.d. from $q_\alpha$, or generated via stratified allocation according to the $\alpha_j$.
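In code, stratified allocation simply apportions the sample budget across components in proportion to $\alpha_j$. The sketch below assumes a hypothetical `samplers[j](m)` interface returning $m$ draws from $q_j$; the two Gaussian components are illustrative, not from the cited papers:

```python
import random

rng = random.Random(0)

def sample_mixture(alphas, samplers, n):
    """Stratified allocation: draw roughly alpha_j * n samples from each q_j.
    `samplers[j](m)` returns a list of m draws from q_j (hypothetical interface)."""
    counts = [int(a * n) for a in alphas]
    counts[0] += n - sum(counts)  # give any rounding remainder to the first component
    draws = []
    for sampler, c in zip(samplers, counts):
        draws.extend(sampler(c))
    return draws

# two zero-mean Gaussian proposals with different scales (illustrative)
samplers = [lambda m: [rng.gauss(0.0, 1.0) for _ in range(m)],
            lambda m: [rng.gauss(0.0, 3.0) for _ in range(m)]]
x = sample_mixture([0.7, 0.3], samplers, 1000)
print(len(x))  # 1000
```

For i.i.d. sampling one would instead pick a component index with probabilities $\alpha_j$ independently for each draw.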

In high-dimensional Bayesian evidence estimation, SeMIS builds a sequence of proposals $\{q_i(\cdot)\}$ interpolating from the prior to a softly truncated likelihood-weighted prior, e.g.,

$$q_i(\theta;\gamma_i) = p_i\,\pi(\theta)\,\min\left\{\frac{L(\theta)}{\gamma_i L_{\max}},\ 1\right\},$$

with a hyperparameter sequence $0 = \gamma_0 < \gamma_1 < \dots < \gamma_{I-1} = 1$, following the approach in (Binbin et al., 7 Jul 2025).

2. Core Estimators and Importance Weights

SeMIS uses mixture importance sampling (MIS) estimators of the form

$$\hat{\mu}_\alpha = \frac{1}{n} \sum_{i=1}^n w(x_i)\,f(x_i), \qquad w(x) = \frac{p(x)}{q_\alpha(x)},$$

or, in the sequential context,

$$\hat{\mu}_T = \frac{1}{\sum_{s=t_T+1}^{T} N_s} \sum_{s=t_T+1}^{T} \sum_{i=1}^{N_s} w_i^{(s)}\, h\big(x_i^{(s)}\big),$$

where the weights $w_i^{(s)}$ may use either the balance heuristic (summing over all proposal densities seen so far) or discarding–reweighting, in which earlier poor proposals are assigned zero weight (Thijssen et al., 2018).
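As a concrete illustration of the mixture estimator $\hat{\mu}_\alpha$ with balance-heuristic denominator, the sketch below estimates $\mu = \mathbb{E}_p[x^2] = 1$ for $p = \mathcal{N}(0,1)$ using two Gaussian proposals; the proposals, mixture weights, and sample size are illustrative assumptions, not choices from the cited papers:

```python
import math, random

rng = random.Random(1)

def npdf(x, mu, sigma):
    """Normal density, used for both target and proposals."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# target p = N(0,1), test function f(x) = x^2, so mu = E_p[f] = 1
p = lambda x: npdf(x, 0.0, 1.0)
f = lambda x: x * x

# two heavier-tailed proposals and mixture weights alpha (illustrative)
sigmas = [1.5, 3.0]
alpha = [0.5, 0.5]
qs = [lambda x: npdf(x, 0.0, 1.5), lambda x: npdf(x, 0.0, 3.0)]

n, est = 20000, 0.0
for _ in range(n):
    j = 0 if rng.random() < alpha[0] else 1             # pick a mixture component
    x = rng.gauss(0.0, sigmas[j])                       # draw from q_j
    q_alpha = sum(a * q(x) for a, q in zip(alpha, qs))  # balance-heuristic denominator
    est += p(x) / q_alpha * f(x)                        # w(x) * f(x)
est /= n
print(est)  # close to 1.0
```

The key point is that the weight denominator is the full mixture $q_\alpha$, not the density of the component that generated the sample.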

For Bayesian inference, the evidence is estimated as

$$\widehat{Z}_{\mathrm{MIS}} = \sum_{i=0}^{I-1} \sum_{k=1}^{N_i} \frac{L(\theta_{i,k})\,\pi(\theta_{i,k})}{\sum_{j=0}^{I-1} N_j\, q_j(\theta_{i,k};\gamma_j)},$$

with all generated samples entering the estimator, weighted by their normalized contributions (Binbin et al., 7 Jul 2025).

3. Variance Reduction, Regret Bounds, and Control Variates

SeMIS exploits variance reduction via both optimal proposal mixture weighting and (where applicable) control variates. The variance of the MIS estimator is (He et al., 2014):

$$\operatorname{Var}(\hat{\mu}_\alpha) = \frac{\sigma^2_\alpha}{n}, \qquad \sigma^2_\alpha = \int \frac{\big[f(x)\,p(x) - \mu\, q_\alpha(x)\big]^2}{q_\alpha(x)}\,dx.$$

A general regret bound holds: for any mixture $\alpha$ and the optimal control variate $\beta^*$ for that mixture,

$$\sigma^2_{\alpha, \beta^*} \leq \min_k \frac{\sigma^2_{q_k, \beta_k}}{\alpha_k},$$

implying that the uniform mixture ($\alpha_k = 1/J$) suffers at most a factor-$J$ increase in variance compared to the best single proposal, but can be overly conservative when $J \gg 1$ and only a few $q_j$ are effective.

Optimal choice of mixture probabilities and control variate coefficients is enabled by the joint convexity of the variance in $(\alpha, \beta)$, and can be computed in practice by convex optimization (He et al., 2014).
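As a minimal illustration of fitting mixture weights from pilot samples, the sketch below estimates the second moment $\int (f p)^2 / q_\alpha \, dx$ (a simple variance proxy, since $\mu$ does not depend on $\alpha$) and scans a fine grid over $\alpha$ for $J = 2$; a real implementation would use a convex solver with simplex constraints, and the toy rare-event integrand and proposals are assumptions for illustration:

```python
import math, random

rng = random.Random(2)

def npdf(x, s):  # zero-mean normal density with std s
    return math.exp(-0.5 * (x / s) ** 2) / (s * math.sqrt(2 * math.pi))

# target p = N(0,1); integrand concentrated in the tail (toy rare-event flavor)
p = lambda x: npdf(x, 1.0)
f = lambda x: 1.0 if x > 2.5 else 0.0

# fixed pilot samples from each proposal (q1 narrow, q2 wide)
sigmas = [1.0, 3.0]
pilot = [[rng.gauss(0.0, s) for _ in range(2000)] for s in sigmas]

def second_moment(a):
    """Empirical E[(w f)^2] under stratified pilot sampling; the population
    quantity is convex in alpha (He et al., 2014), so for J = 2 a 1-D grid
    search recovers the minimizer."""
    alphas = [a, 1.0 - a]
    total = 0.0
    for j, xs in enumerate(pilot):
        for x in xs:
            q = alphas[0] * npdf(x, sigmas[0]) + alphas[1] * npdf(x, sigmas[1])
            total += alphas[j] * (p(x) * f(x) / q) ** 2 / len(xs)
    return total

best = min((second_moment(a / 100), a / 100) for a in range(1, 100))
print("alpha* ~", best[1])  # small: the wide proposal dominates for the tail event
```

As expected, the optimizer pushes most mixture mass onto the wide proposal, since only it reaches the rare-event region with non-negligible probability.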

4. Sequential and Adaptive Algorithmic Frameworks

Distinct SeMIS variants exist, with common elements:

  • Two-stage optimization and refinement: A pilot stage samples from an initial mixture (often uniform), then fits mixture weights and control variates via convex optimization; the main stage draws further samples from the optimized mixture for the final estimator (He et al., 2014).
  • Sequential adaptation: The proposal $q_t$ at round $t$ is adapted using information from all past samples and weights, followed by drawing new samples and constructing appropriate importance weights via either the full balance heuristic or discarding of poor early proposals (Thijssen et al., 2018).
  • Soft truncation for multimodal posteriors: In high-dimensional Bayesian inference, intermediate proposals interpolate between prior and posterior using softly truncated priors, maintaining tail connectivity for effective mode mixing (Binbin et al., 7 Jul 2025).
  • Weight computation: Mixture-type estimators use balance heuristic weights, while discarding–reweighting reduces computational overhead by discarding earlier samples and focusing on recent proposals.

A representative pseudocode structure for SeMIS (Binbin et al., 7 Jul 2025):

  1. Draw samples from prior or initial proposal.
  2. Adaptively construct proposal sequence $\{q_i\}$ by tuning truncation parameters for specified acceptance probabilities.
  3. For each stage:
    • Seed new proposals using accepted samples from the previous stage.
    • Generate new samples by MCMC (e.g., elliptical slice sampling).
    • Update importance weights via balance heuristic.
  4. Aggregate estimates using all samples and stage-specific weights.
  5. Optionally, resample from the pooled set for approximate posterior draws.
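The sequential loop above can be sketched in miniature for a one-dimensional target, with a Gaussian proposal adapted from weighted moments standing in for the paper's MCMC-seeded proposals (elliptical slice sampling is omitted; the target, schedule, and tuning constants are illustrative assumptions):

```python
import math, random

rng = random.Random(3)

def npdf(x, m, s):
    return math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))

# target p = N(2, 0.5); we estimate mu = E_p[x] = 2 (illustrative)
p = lambda x: npdf(x, 2.0, 0.5)

props = [(0.0, 3.0)]          # initial (mean, std), deliberately mismatched
xs, M, T = [], 500, 5
for t in range(T):
    m_t, s_t = props[-1]
    xs += [rng.gauss(m_t, s_t) for _ in range(M)]          # new block of samples
    # balance heuristic: denominator mixes all proposals used so far (equal counts)
    den = lambda x: sum(npdf(x, m, s) for m, s in props) / len(props)
    w = [p(x) / den(x) for x in xs]
    # adapt: weighted mean/std of the pooled sample seeds the next proposal
    W = sum(w)
    mean = sum(wi * xi for wi, xi in zip(w, xs)) / W
    var = sum(wi * (xi - mean) ** 2 for wi, xi in zip(w, xs)) / W
    props.append((mean, max(math.sqrt(var), 0.1)))

# self-normalized aggregate estimate over all samples and stages
est = sum(wi * xi for wi, xi in zip(w, xs)) / sum(w)
print(est)  # close to 2.0
```

Even though the first block is drawn from a badly mismatched proposal, balance-heuristic reweighting keeps every sample usable in the final estimate.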

5. Computational Cost and Consistency

SeMIS algorithms differ in per-iteration cost:

  • The balance heuristic requires $\mathcal{O}(MT)$ operations per round (with $M$ new samples per round and $T$ total rounds), totaling $\mathcal{O}(MT^2)$.
  • Discarding–reweighting reduces this to $\mathcal{O}(M)$ per round, i.e., $\mathcal{O}(MT)$ overall, by keeping only a subset of the sample blocks and updating denominators incrementally (Thijssen et al., 2018).
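One way to realize the incremental bookkeeping is to retain only the last $K$ sample blocks and add each new proposal's density to the stored denominator sums, giving constant $\mathcal{O}(KM)$ work per round. This is a simplified sketch, not the exact reweighting rule of Thijssen et al.; the target and proposal schedule are toy assumptions:

```python
import math, random
from collections import deque

rng = random.Random(4)

def npdf(x, m, s):
    return math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))

p = lambda x: npdf(x, 0.0, 1.0)   # target density (illustrative)

M, T, K = 200, 50, 5              # block size, total rounds, blocks retained
blocks = deque(maxlen=K)          # per block: [samples, denominator sums, #proposals seen]
for t in range(T):
    mean_t, s_t = 0.0, 1.0 + 2.0 * math.exp(-t)   # toy shrinking proposal schedule
    # incremental update: add the new proposal's density to each retained sample,
    # so per-round cost is O(K*M), independent of t (vs. O(t*M) for full balance)
    for blk in blocks:
        samples, dens, _ = blk
        for i, x in enumerate(samples):
            dens[i] += npdf(x, mean_t, s_t)
        blk[2] += 1
    new = [rng.gauss(mean_t, s_t) for _ in range(M)]
    blocks.append([new, [npdf(x, mean_t, s_t) for x in new], 1])

# weights over retained blocks only; discarded blocks implicitly get weight zero
w = [p(x) / (d / c) for samples, dens, c in blocks for x, d in zip(samples, dens)]
print(len(w))  # K * M = 1000 retained samples
```

The `deque` with `maxlen=K` evicts the oldest block automatically, which is exactly the "assign zero weight to earlier poor proposals" step.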

SeMIS estimators remain consistent under both approaches: under mild regularity conditions (e.g., dominated targets, bounded $L^r$ moments), the estimator converges almost surely to the desired expectation as the total sample size grows. This holds with fixed or adaptive discarding schedules, provided the number of retained samples diverges (Thijssen et al., 2018).

6. Empirical Performance and Applications

SeMIS achieves substantial gains in variance reduction and effective sample size:

  • In integrals with singularities or rare events, tailored mixture weights and control variates yield variance reduction factors up to $19\times$ (optimized mixture, no CV) or $22\times$ (optimized mixture plus CV) versus plain Monte Carlo; the uniform mixture yields no improvement (He et al., 2014).
  • In high-dimensional, multimodal Bayesian inference (e.g., Eggbox and Gaussian-shells problems, up to 20D), SeMIS yields lowest bias (<1%), lowest coefficient of variation (0.1–1.5%), and the best K–S statistics for posterior marginals compared to subset simulation (SuS) and adaptive BUS (aBUS). Effective sample size per likelihood evaluation is frequently doubled or tripled relative to comparators (Binbin et al., 7 Jul 2025).
  • In practical engineering applications such as finite element model updating, SeMIS localizes structural stiffness loss and quantifies uncertainty even under incomplete measurement scenarios, revealing multimodal posteriors when data do not uniquely determine the parameters (Binbin et al., 7 Jul 2025).
  • In diffusion process simulation, optimized discarding-AMIS matches the effective sample size of full balance-AMIS but with an order of magnitude less CPU time (Thijssen et al., 2018).

7. Practical Guidance and Theoretical Significance

For practitioners:

  • A budget of several hundred samples per stage, a level probability $p \sim 0.1$, and balance-heuristic weights are generally effective.
  • Convex optimization for mixture weights and control variates should be performed with safety lower bounds on weights and, if needed, marginal relaxation to ensure feasibility.
  • For Bayesian updating with high-dimensional or multimodal posteriors, softly truncated proposals and adaptive resampling via SeMIS are recommended; elliptical-slice MCMC in whitened coordinates is a robust default kernel (Binbin et al., 7 Jul 2025).
  • Monitoring variance, effective sample size, and estimator stability enables adaptive stopping.
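Effective sample size is typically monitored with the Kish formula $\mathrm{ESS} = (\sum_i w_i)^2 / \sum_i w_i^2$, a standard importance-sampling diagnostic (not specific to the cited papers):

```python
def effective_sample_size(weights):
    """Kish effective sample size: (sum w)^2 / sum w^2.
    Equals n for uniform weights and approaches 1 as weights degenerate,
    making it a simple trigger for adaptive stopping or re-adaptation."""
    s1 = sum(weights)
    s2 = sum(w * w for w in weights)
    return s1 * s1 / s2

print(effective_sample_size([1.0, 1.0, 1.0, 1.0]))               # 4.0 (uniform)
print(effective_sample_size([1.0, 0.0001, 0.0001, 0.0001]))      # ~1.0 (degenerate)
```

A common rule of thumb is to adapt or stop a stage when ESS falls below a fixed fraction of the sample count.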

The theoretical significance of SeMIS lies in its convex-analytic foundation for variance minimization, its formal consistency guarantees under broad conditions, and its analytical regret bounds relative to unknown optimal proposals (He et al., 2014, Thijssen et al., 2018). Its sequential, adaptive structure enables scalable and robust inference in high-dimensional, multimodal, and rare-event regimes, making it a core tool for advanced Monte Carlo computation.
