
Rejection Sampling Method

Updated 2 February 2026
  • Rejection-sampling-based methods are Monte Carlo techniques that sample from target distributions using tractable proposals and accept/reject steps to ensure correctness.
  • They enable efficient sequential Bayesian updates by tracking posterior moments, drastically reducing memory requirements in high-dimensional settings.
  • Extensions such as envelope clipping in approximate rejection sampling balance computational efficiency with controlled estimation error for online inference.

Rejection-sampling-based methods form a class of Monte Carlo algorithms for drawing independent samples from a target probability distribution, typically specified only up to normalization, by leveraging samples from a tractable proposal (or envelope) distribution and an accept/reject criterion based on a ratio of densities. The method is foundational in computational statistics, Bayesian inference, and scientific simulation, with theoretical guarantees on correctness and broad applicability to both discrete and continuous spaces. Its modern formulations address efficiency, adaptivity, memory savings, extensions beyond log-concave domains, and integration with particle filtering for online sequential inference.

1. Classical Rejection Sampling: Principle and Algorithm

Given a latent parameter $\theta$ with known prior $p(\theta)$ and likelihood $p(x \mid \theta)$, the goal is to generate samples from the posterior $p(\theta \mid x) \propto p(x \mid \theta)\,p(\theta)$. The standard rejection sampling algorithm proceeds as follows:

  1. Setup: Choose a proposal density $q(\theta)$ such that $M q(\theta) \geq p(x \mid \theta)\,p(\theta)$ for all $\theta$, with $M \geq \sup_\theta p(x \mid \theta)\,p(\theta)/q(\theta)$ chosen as small as possible. In Bayesian contexts, often $q(\theta) = p(\theta)$.
  2. Sampling: Repeatedly:
    • Draw $\theta \sim q(\theta)$.
    • Draw $u \sim \text{Uniform}[0,1]$.
    • Accept $\theta$ if $u \leq p(x \mid \theta)\,p(\theta)/(M q(\theta))$; else reject and repeat.
  3. Correctness: Accepted samples are exactly distributed according to the target posterior.

The acceptance probability is $p(x)/M$, where $p(x) = \int p(x \mid \theta)\,p(\theta)\,d\theta$. Although exact, classical rejection sampling is intractable for sharply peaked likelihoods or high-dimensional $\theta$, since $M$ can grow rapidly, lowering the acceptance rate exponentially (Wiebe et al., 2015).
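As a concrete illustration (a hypothetical toy model, not an example from the cited work), the steps above can be sketched in Python: with a uniform prior used as the proposal, the accept test reduces to comparing $u$ against the likelihood divided by its maximum.

```python
import math
import random

def rejection_sample_posterior(loglike, prior_sample, log_M, n, rng):
    """Draw n exact posterior samples using the prior as the proposal.

    A candidate theta ~ prior is accepted with probability
    exp(loglike(theta) - log_M), where log_M upper-bounds the log-likelihood.
    """
    samples = []
    while len(samples) < n:
        theta = prior_sample(rng)
        if math.log(rng.random()) <= loglike(theta) - log_M:
            samples.append(theta)
    return samples

# Hypothetical toy model: 7 successes in 10 Bernoulli trials with a
# Uniform(0, 1) prior on the success probability theta.
k, N = 7, 10
loglike = lambda th: k * math.log(th) + (N - k) * math.log(1.0 - th)
log_M = loglike(k / N)  # the likelihood peaks at theta = k/N
rng = random.Random(0)
draws = rejection_sample_posterior(loglike, lambda g: g.random(), log_M, 5000, rng)
print(sum(draws) / len(draws))  # posterior is Beta(8, 4), with mean 8/12
```

Working in log-space avoids underflow when the likelihood is a product of many small factors, which is the usual situation in sequential inference.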

2. Rejection Filtering: Moment Tracking with Particle Updates

Rejection filtering augments rejection sampling by updating only low-order posterior moments (mean $\mu$, covariance $\Sigma$) with each batch of accept/reject trials, instead of storing all accepted samples:

  • For a batch of $m$ trials, initialize the accumulators $M \leftarrow 0$, $S \leftarrow 0$, and the acceptance count $N_a \leftarrow 0$.
  • For each accepted $\theta_i$, update $M \leftarrow M + \theta_i$, $S \leftarrow S + \theta_i \theta_i^\top$, and $N_a \leftarrow N_a + 1$.
  • At batch end: if $N_a > 0$, estimate $\mu_{\text{new}} = M/N_a$ and $\Sigma_{\text{new}} = [S - N_a \mu_{\text{new}} \mu_{\text{new}}^\top]/(N_a - 1)$. If $N_a = 0$, inflate $\Sigma$ by a factor $(1+r)$.

This "particle filtering" for moments delivers an $O(D^2 \log(1/\epsilon))$ memory cost, dramatically less than the $O(N_a D)$ needed to store all samples, where $D$ is the parameter dimension and $\epsilon$ the desired accuracy (Wiebe et al., 2015).

3. Approximate Rejection Sampling: Envelope Clipping

The exact bound $M = \sup_\theta p(x \mid \theta)/q(\theta)$ may be impractically large or unknown. Approximate rejection sampling instead uses an envelope constant $\kappa_E$ with $\kappa_E < \sup_\theta p(x \mid \theta)$, accepting with probability $\min\{p(x \mid \theta)/\kappa_E, 1\}$. Over-acceptance occurs for $\theta$ where $p(x \mid \theta) > \kappa_E$, but the total error is controlled:

If the over-accepted mass $\delta$ is small, the Hellinger distance between the approximate and true posterior is $O(\sqrt{\delta})$ (Wiebe et al., 2015). The acceptance rate remains at least $p(x)(1-\delta)/\kappa_E$.
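A small numerical sketch (a hypothetical toy model, not a computation from the cited paper) makes the trade-off concrete: clipping the likelihood at $\kappa_E$ induces an over-accepted mass $\delta$, and the Hellinger distance between the clipped and true posteriors stays well below $\sqrt{\delta}$.

```python
import math

# Hypothetical toy model: likelihood of 7 successes in 10 Bernoulli trials,
# uniform prior on theta, evaluated on a fine grid.
lik = lambda th: th ** 7 * (1 - th) ** 3
grid = [(i + 0.5) / 1000 for i in range(1000)]
kappa_E = 0.5 * lik(0.7)  # deliberately clipped below the likelihood peak

true_w = [lik(t) for t in grid]                # exact posterior weights
clip_w = [min(lik(t), kappa_E) for t in grid]  # weights after envelope clipping
Z_true, Z_clip = sum(true_w), sum(clip_w)
p = [w / Z_true for w in true_w]
q = [w / Z_clip for w in clip_w]

# Over-accepted mass delta and the Hellinger distance between the posteriors
delta = sum(max(w - kappa_E, 0.0) for w in true_w) / Z_true
hellinger = math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2
                                for a, b in zip(p, q)))
print(delta, hellinger)  # hellinger remains well below sqrt(delta)
```

Even with the envelope cut to half the likelihood peak, the clipped posterior stays close to the truth, illustrating why loose envelope choices remain usable in practice.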

4. Complete Rejection Filtering Algorithms

Exact Rejection Filtering (RFUpdate):

Input: prior p(θ), prior moments (μ, Σ), evidence E, envelope κ_E, trials m, recovery r
Output: posterior moments (μ_new, Σ_new), N_a

M←0, S←0, N_a←0
for i = 1..m:
    θ ∼ p(θ), u∼Uniform(0,1)
    if u ≤ p(E|θ)/κ_E:
        M←M+θ, S←S+θ θᵀ, N_a←N_a+1
if N_a > 1:
    μ_new←M/N_a
    Σ_new←[S−N_a μ_new μ_newᵀ]/(N_a−1)
    return (μ_new,Σ_new,N_a)
else:
    μ_new←μ
    Σ_new←(1+r)Σ
    return (μ_new,Σ_new,N_a)

Approximate Rejection Filtering: As above, but with the accept test $u \leq \min(p(E \mid \theta)/\kappa_E, 1)$.
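For concreteness, the RFUpdate listing above can be rendered as runnable Python for a scalar parameter (a one-dimensional sketch following the listing's variable names; the demo model at the bottom is hypothetical, not from the cited work):

```python
import math
import random

def rf_update(prior_sample, likelihood, kappa_E, m, mu, sigma2, r, rng):
    """One batch of approximate rejection filtering for a scalar parameter.

    Mirrors the RFUpdate listing: accept theta ~ prior with probability
    min(likelihood(theta)/kappa_E, 1), accumulate first and second moments,
    and fall back to inflating the variance by (1 + r) when too few samples
    were accepted to estimate a variance.
    """
    M = S = 0.0
    N_a = 0
    for _ in range(m):
        theta = prior_sample(rng)
        if rng.random() <= min(likelihood(theta) / kappa_E, 1.0):
            M += theta
            S += theta * theta
            N_a += 1
    if N_a > 1:
        mu_new = M / N_a
        sigma2_new = (S - N_a * mu_new * mu_new) / (N_a - 1)
        return mu_new, sigma2_new, N_a
    return mu, (1.0 + r) * sigma2, N_a

# Hypothetical demo: N(0, 1) prior, Gaussian likelihood (noise variance 0.25)
# for a single observation x = 1; the exact posterior is N(0.8, 0.2).
rng = random.Random(1)
lik = lambda th: math.exp(-(1.0 - th) ** 2 / (2 * 0.25))
mu_new, s2_new, n_a = rf_update(lambda g: g.gauss(0.0, 1.0), lik, 1.0,
                                20000, 0.0, 1.0, 0.1, rng)
print(mu_new, s2_new, n_a)
```

In a sequential setting the returned $(\mu_{\text{new}}, \sigma^2_{\text{new}})$ would parameterize the Gaussian prior for the next batch, which is what makes the method's memory footprint independent of the number of accepted samples.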

5. Computational Guarantees and Empirical Performance

Under efficient prior sampling and uniform-variate generation, and with the bounded ratio $p(E \mid \theta)/\kappa_E \in [0,1]$, the algorithm runs in $O(m D^3)$ time per update and requires $O(D^2 \log(D/\epsilon))$ bits of memory at $\epsilon$-precision. The memory savings over particle filters are significant in high-dimensional or streaming settings.

Empirical evaluations demonstrate:

A. Frequency Tracking (active experiments):

  • Unknown oscillator phase $\phi(t_k)$.
  • With $m \approx 100$ proposals per update and $<1$ kbit of memory, rejection filtering achieves an RMSE of $\approx \pi/120$, matching or exceeding SMC (Liu–West) with $10^4$ particles (100× more memory).

B. MNIST Classification (feature querying):

  • Active selection of pixel intensities.
  • Stopping thresholds $\delta \in \{0.1, 0.01, 0.001\}$ yield error rates of 1–2% (even/odd task), outperforming kNN under identical feature budgets.
  • Feature importance arises naturally from query frequencies; removing the lowest-frequency pixels preserves $>99\%$ accuracy.

Key findings: rejection filtering supports sequential Bayesian updates on memory-constrained devices, enables active learning and feature selection, and is robust even with loose envelope choices (Wiebe et al., 2015).

6. Advantages, Limitations, and Domains of Application

Advantages:

  • Memory efficiency: $O(D^2 \log(1/\epsilon))$, orders of magnitude better for large $D$.
  • Asymptotic correctness: Maintains correct posterior summaries under exact envelope.
  • Robustness to approximate envelopes: Controlled error when envelope bounds are not tight.
  • Online and active settings: Applicable to time-dependent, data-adaptive Bayesian inference.

Limitations:

  • Very small acceptance probabilities in high dimensions or extremely peaked likelihoods may still pose practical barriers.
  • Parametric approximation: Assumes that posterior is well captured by first/second moments, e.g., Gaussian form; multimodal or strongly non-Gaussian posteriors may require other strategies.

Domain of Application:

  • Sequential Bayesian parameter estimation, streaming inference, online classification, and settings where both computational and storage costs are constraints.

7. Summary of the Rejection-Sampling-Based Method Innovations

Rejection filtering generalizes classical rejection sampling by incrementally updating estimates of posterior moments via efficient trial batches, rather than by storing full sets of accepted samples. With both exact and envelope-clipped variants, it dramatically reduces required memory, preserves correctness, and is robust to envelope misspecification. Its empirical performance matches or exceeds standard particle filters in time-dependent and feature-selection scenarios, enabling efficient Bayesian learning under compute/memory constraints and online inference (Wiebe et al., 2015).

