Nested Particle Filters for Online Inference

Updated 9 November 2025
  • Nested Particle Filters (NPFs) are a two-level sequential Monte Carlo method that uses coupled outer parameter particles and inner state particles for online Bayesian inference.
  • NPFs recursively update weighted particle sets to approximate both the parameter posterior and state trajectories, enabling efficient handling of high-dimensional and non-Markovian models.
  • The method offers rigorous convergence guarantees and lower computational complexity compared to alternatives like SMC², making it attractive for real-time parameter learning and experimental design.

A nested particle filter (NPF) is a two-level, fully sequential Monte Carlo method designed for efficient online Bayesian inference in state-space models with unknown static parameters. By explicitly maintaining two coupled populations of weighted particles—one for the parameters and one for the latent state trajectories—NPFs provide consistent approximations to the sequence of posterior probability measures over both parameters and system states, with rigorous guarantees for convergence and computational complexity. NPFs achieve online, recursive computation and are especially effective in high-dimensional and non-Markovian models where standard particle filters and single-level SMC methods degenerate or become computationally intractable.

1. Model Structure and Mathematical Notation

Consider a discrete-time state-space Markov model indexed by a static parameter $\theta \in D_\theta \subset \mathbb{R}^{d_\theta}$:

  • $X_0 \sim \tau_0(dx)$,
  • $X_t \mid X_{t-1} = x \sim \tau_{t,\theta}(dx \mid x)$,
  • $Y_t \mid X_t = x \sim g_{t,\theta}(y_t \mid x)$.

For fixed $\theta$, define:

  • the filter: $\phi_{t,\theta}(dx) = P(X_t \in dx \mid Y_{1:t}, \theta)$,
  • the predictive: $\xi_{t,\theta}(dx) = P(X_t \in dx \mid Y_{1:t-1}, \theta) = \tau_{t,\theta}\,\phi_{t-1,\theta}$.

The primary objective is to recursively approximate the parameter posterior $\mu_t(d\theta) = P(d\theta \mid Y_{1:t})$ and, if required, the joint posterior $\pi_t(d\theta, dx) = P(d\theta, dx \mid Y_{1:t})$.
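To make the notation concrete, the following minimal Python sketch fixes a toy one-dimensional instance (an illustrative choice of ours, not a model from the cited papers): the transition kernel $\tau_{t,\theta}$ is a Gaussian autoregression with coefficient $\theta$, and $g_{t,\theta}$ adds Gaussian observation noise. The function names and noise scales are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D state-space model: X_t = theta * X_{t-1} + process noise,
# Y_t = X_t + observation noise; theta is the unknown static parameter.
SIGMA_X, SIGMA_Y = 0.5, 1.0  # assumed noise scales

def sample_transition(theta, x_prev):
    """Draw X_t ~ tau_{t,theta}(dx | x_prev); broadcasts over particle arrays."""
    return theta * x_prev + SIGMA_X * rng.standard_normal(np.shape(x_prev))

def log_likelihood(theta, y_t, x_t):
    """Evaluate log g_{t,theta}(y_t | x_t) for the Gaussian observation model.

    theta is unused in this toy observation model; it is kept in the
    signature for generality, since g may depend on theta in other models.
    """
    return -0.5 * ((y_t - x_t) / SIGMA_Y) ** 2 - np.log(SIGMA_Y * np.sqrt(2.0 * np.pi))
```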

2. Nested Particle Filter Algorithm

An NPF maintains:

  • $N$ "outer" particles $\{\theta_t^{(i)}\}_{i=1}^N$ with weights $W_t^{(i)}$ approximating $\mu_t$;
  • for each $\theta$-particle, an "inner" particle filter of size $M$ approximating the state filter $\phi_{t,\theta_t^{(i)}}$.

At each time step $t$ (a minimal code sketch of the full recursion is given after the key formulas below):

  1. Parameter jitter (rejuvenation): For $i = 1, \dots, N$, sample $\bar\theta_t^{(i)} \sim \kappa_N(d\theta \mid \theta_{t-1}^{(i)})$, a jittering kernel with $\mathrm{Var}(\kappa_N) = O(1/N^\alpha)$.
  2. Inner filter update: Treat $\{x_{t-1}^{(i,j)}\}_{j=1}^M$ as samples from $\phi_{t-1,\bar\theta_t^{(i)}}$ (justified by the continuity of $\phi_{t,\theta}$ in $\theta$, since these particles were generated under $\theta_{t-1}^{(i)}$). For $j = 1, \dots, M$:
    • Propagate: $\bar x_t^{(i,j)} \sim \tau_{t,\bar\theta_t^{(i)}}(\cdot \mid x_{t-1}^{(i,j)})$,
    • Weight: $w_{t|t-1}^{(i,j)} \propto g_{t,\bar\theta_t^{(i)}}(y_t \mid \bar x_t^{(i,j)})$,
    • Normalize the weights and resample to obtain $x_t^{(i,j)}$,
    • Compute the predictive empirical measure: $\xi_{t,\bar\theta_t^{(i)}}^M = \frac{1}{M}\sum_{j=1}^M \delta_{\bar x_t^{(i,j)}}$,
    • Compute the marginal likelihood estimate: $u_t^M(\bar\theta_t^{(i)}) = \frac{1}{M}\sum_{j=1}^M g_{t,\bar\theta_t^{(i)}}(y_t \mid \bar x_t^{(i,j)})$.
  3. Parameter weight update: $\tilde W_t^{(i)} = u_t^M(\bar\theta_t^{(i)})$; normalize: $W_t^{(i)} = \tilde W_t^{(i)} / \sum_k \tilde W_t^{(k)}$.
  4. Outer resampling: Resample $\theta_t^{(i)}$ and the associated $x_t^{(i,j)}$ according to $W_t^{(i)}$; reset all weights to $1/N$.

Key formulas:

  • Parameter posterior: $\mu_t^{N,M}(d\theta) = \frac{1}{N}\sum_{i=1}^N \delta_{\theta_t^{(i)}}(d\theta)$.
  • State filter approximation: $\phi_{t,\theta_t^{(i)}}^M = \frac{1}{M}\sum_{j=1}^M \delta_{x_t^{(i,j)}}$.
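The following minimal Python sketch implements one full NPF step (steps 1-4 above) for the toy model of Section 1, assuming a Gaussian jittering kernel and multinomial resampling at both levels. It is a non-optimized illustration under those assumptions, not the reference implementation of the cited papers.

```python
import numpy as np

rng = np.random.default_rng(1)

def npf_step(theta, x, y_t, jitter_std):
    """One NPF time step, vectorized over particles.

    theta : (N,) outer parameter particles approximating mu_{t-1}
    x     : (N, M) inner state particles; row i targets phi_{t-1, theta_i}
    y_t   : current observation
    jitter_std : std of the Gaussian jittering kernel kappa_N
    """
    N, M = x.shape

    # 1. Parameter jitter (rejuvenation).
    theta_bar = theta + jitter_std * rng.standard_normal(N)

    # 2. Inner filter update: propagate and weight each row's state particles.
    x_bar = sample_transition(theta_bar[:, None], x)        # (N, M)
    logw = log_likelihood(theta_bar[:, None], y_t, x_bar)   # (N, M)
    shift = logw.max(axis=1, keepdims=True)                 # stabilize the exp
    w = np.exp(logw - shift)
    u = w.mean(axis=1) * np.exp(shift[:, 0])  # marginal likelihood estimates u_t^M

    # Inner multinomial resampling, row by row.
    x_new = np.empty_like(x_bar)
    for i in range(N):
        idx = rng.choice(M, size=M, p=w[i] / w[i].sum())
        x_new[i] = x_bar[i, idx]

    # 3.-4. Outer weight update and resampling of (theta, inner filter) pairs.
    W = u / u.sum()
    anc = rng.choice(N, size=N, p=W)
    return theta_bar[anc], x_new[anc]
```

Iterating `npf_step` over the observation stream, with `jitter_std` shrinking like $N^{-\alpha/2}$, mirrors the condition $\mathrm{Var}(\kappa_N) = O(1/N^\alpha)$ in step 1.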

3. Theoretical Properties and Computational Analysis

Computational cost: Each time step involves $O(M)$ work per outer particle (state propagation, weighting, resampling) and $O(N)$ for parameter normalization/resampling; the total is $O(NM)$ per step, or $O(NMT)$ across $T$ steps. In contrast, the SMC² method requires $O(T^2 NM)$, because its inner filters must be re-run from scratch at each rejuvenation step.
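For a concrete sense of scale (illustrative numbers, not taken from the papers): with $N = M = 10^3$ and $T = 10^4$,

$$\underbrace{NMT}_{\text{NPF}} = 10^{10} \qquad \text{vs.} \qquad \underbrace{T^2 NM}_{\text{SMC}^2} = 10^{14}$$

transition/likelihood evaluations, a gap of four orders of magnitude that widens with the horizon.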

Convergence: Under bounded support of $D_\theta$, uniform boundedness and positivity of $g_{t,\theta}(y \mid x)$, Lipschitz continuity of $\phi_{t,\theta}$ in $\theta$, and suitable jittering-kernel properties, the $L_p$ error for any bounded test function $h$ satisfies

$$\mathbb{E}\Big[\,\big|(h, \mu_t^{N,M}) - (h, \mu_t)\big|^p\,\Big]^{1/p} \;\le\; \frac{c_t \|h\|_\infty}{\sqrt{N}} + \frac{\bar c_t \|h\|_\infty}{\sqrt{M}}.$$

The error in approximating the joint posterior enjoys the same rate.

For a fixed computational budget, setting $N = M$ (balancing the two error terms) minimizes the $L_1$ error bound.
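A quick way to see this, assuming the bound above is tight and writing the budget as $C = NM$: substituting $M = C/N$ and minimizing over $N$ gives

$$\frac{d}{dN}\left(\frac{c_t}{\sqrt{N}} + \bar c_t \sqrt{\frac{N}{C}}\right) = 0 \quad\Longrightarrow\quad \frac{N}{M} = \left(\frac{c_t}{\bar c_t}\right)^{2},$$

so comparable constants $c_t \approx \bar c_t$ yield the stated choice $N = M = \sqrt{C}$.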

4. Connections and Comparisons to Other Nested SMC Methods

Relationship to SMC²: Both methods use a two-layer particle structure. SMC² performs particle MCMC moves on the parameters, assigning weights via the full likelihood $P(y_{1:t} \mid \theta)$, and is not recursive in $t$, leading to $O(T^2 NM)$ cost. The NPF is recursive, using only the latest filtering distributions and fresh rejuvenation, avoiding MCMC moves and resulting in $O(TNM)$ cost. SMC² achieves $O(1/\sqrt{N})$ convergence for fixed $M$, while the NPF's convergence requires both $N, M \to \infty$ but attains an overall $O(1/\sqrt{N} + 1/\sqrt{M})$ rate.

Inside-Out Variants (for Risk-Sensitive and Experimental Design Applications):

  • Inside-Out SMC² (IO-SMC²) forms the nested structure by propagating augmented trajectories $z_{0:t}$ in the outer filter, with an inner IBIS filter tracking $p(\theta \mid z_{0:t})$, integrating design selection with posterior tracking. The algorithmic core relies on resampling/tempering and resample-move steps at both levels. The method is well suited to risk-sensitive policy optimization in experimental design with non-exchangeable data, at computational cost $O(NMT)$ (often $O(MT^2)$ for $N \propto T$) (Iqbal et al., 2024).
  • The Inside-Out Nested Particle Filter (IO-NPF) further improves efficiency by replacing costly inner MCMC moves with $O(1)$ rejuvenation steps (random jitter kernels) and fully recursive updates, with empirically favorable scaling for online design in non-Markovian models; it achieves $O(T^2)$ amortized cost (for $N \propto T$, $M = O(1)$) in amortized Bayesian experimental design. IO-NPF also admits backward-sampling smoothers of $O(N)$ cost to address path degeneracy (Iqbal et al., 2024).

5. Backward Sampling and Smoothing

Degeneracy of genealogy tracking in sequential particle smoothing limits the recovery of joint trajectories as $T$ increases. IO-NPF (and related algorithms) uses backward-sampling schemes of the "sparse MCMC" type, performing accept-reject passes in reverse time using Rao–Blackwellized transition probabilities:

  • At time $T$: sample an index $I_T$ from the outer weights.
  • For $t = T-1$ down to $0$: propose an ancestor index and compute the acceptance ratio from the forward weights and likelihoods along the trajectory, using Rao–Blackwellized marginals.
  • This yields an $O(N)$ smoother for the full trajectory with the correct invariant law.

The backward sampler corrects for trajectory degeneracy at negligible additional cost per outer iteration and is applicable in risk-sensitive, non-Markovian, and non-exchangeable settings, as demonstrated on challenging nonlinear dynamical examples.
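As an illustration of this reverse-time pass, here is a minimal backward-simulation sketch in Python for a generic stored particle system. The `trans_logpdf` argument is a placeholder standing in for the model's Rao–Blackwellized transition marginals, whose exact form depends on the model; the capped accept-reject loop is a pragmatic simplification, so this is an assumed sketch rather than the papers' exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(2)

def backward_sample(particles, logweights, trans_logpdf, max_tries=16):
    """Draw one smoothed trajectory backwards through a stored particle system.

    particles  : list of (N,) arrays from the forward pass, one per time step
    logweights : list of (N,) arrays of normalized log-weights
    trans_logpdf(x_next, xs) -> (N,) log transition densities from each entry
        of xs to x_next (stand-in for the Rao-Blackwellized marginals)
    """
    T = len(particles) - 1
    # At time T: sample the terminal index from the outer weights.
    i = rng.choice(len(particles[T]), p=np.exp(logweights[T]))
    traj = [particles[T][i]]

    for t in range(T - 1, -1, -1):
        w = np.exp(logweights[t])
        logp = trans_logpdf(traj[-1], particles[t])
        bound = logp.max()  # envelope for the accept-reject test
        for _ in range(max_tries):
            j = rng.choice(len(w), p=w)  # propose ancestor from forward weights
            if np.log(rng.uniform()) < logp[j] - bound:
                break  # accept: O(N) average cost vs O(N^2) for exact weights
        # After max_tries rejections we keep the last proposal (a small bias;
        # exact implementations loop until acceptance).
        traj.append(particles[t][j])
    return np.array(traj[::-1])
```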

6. Empirical Performance and Practical Implementation

In nonlinear and partially observed dynamical systems (e.g., Lorenz-63, the stochastic pendulum), NPFs with $N = M$ exhibit parameter estimation error scaling as $1/\sqrt{N}$ over tens of thousands of steps, with stable long-run online performance. For Bayesian experimental design, IO-NPF matches or outperforms IO-SMC² in EIG-optimal policy estimation while running an order of magnitude faster ($\sim 0.34$ s vs. $5.7$ s per amortization iteration at $T = 50$). Adding the backward-sampling pass further improves efficiency and effective information gain, with nearly optimal performance relative to exact and implicit baselines at a fraction of the computational cost (Iqbal et al., 2024; Iqbal et al., 2024).

Implementation requires:

  • A mechanism for evaluating the model densities in closed form, e.g. the transition density $f(x_t \mid x_{t-1}, \theta)$ and the observation density $g_{t,\theta}(y_t \mid x_t)$, at both the inner and outer steps.
  • Tuning of the jitter-kernel variance (typically decaying as $1/N$ or $1/M$) to provide sufficient exploration while controlling bias.
  • Choice of $N$ to control the accuracy of the parameter posterior and of $M$ to control the variance of the inner state filters and likelihood estimates (with $M \approx 100$-$500$ sufficient in the reported nonlinear cases).

Algorithmic recursivity ensures scalability and parallelizability over the outer particle population.

7. Practical Limitations and Extensions

NPFs require that the system transition and observation densities ($\tau$, $g$, or $f$) be computable in closed form for each parameter/state pair. Their sequential structure and error control make them well suited to online parameter learning, high-frequency experimental design, and inference in large-scale or non-exchangeable dynamical systems.

A major limitation is the necessity of closed-form tractability at both particle levels; models without tractable densities are not directly amenable to NPF schemes. For very high-dimensional latent or parameter spaces, the method inherits the usual limitations of particle filters; variance- and degeneracy-adaptive schemes can ameliorate but not eliminate the curse of dimensionality. Further, in non-Markovian Feynman-Kac models and long design horizons, forward particle genealogies may still collapse, motivating further research into scalable smoothing and adaptive resampling mechanisms.

NPFs, as well as their Inside-Out and risk-sensitive extensions, provide a general framework for recursive, online, and amortized Bayesian inference and design, with strong theoretical and empirical support for their accuracy and efficiency in high-velocity, high-dimensional, and non-exchangeable sequential learning problems (Crisan et al., 2013; Iqbal et al., 2024; Iqbal et al., 2024).
