Nested Particle Filters for Online Inference

Updated 9 November 2025
  • Nested Particle Filters (NPFs) are a two-level sequential Monte Carlo method that uses coupled outer parameter particles and inner state particles for online Bayesian inference.
  • NPFs recursively update weighted particle sets to approximate both the parameter posterior and state trajectories, enabling efficient handling of high-dimensional and non-Markovian models.
  • The method offers rigorous convergence guarantees and lower computational complexity compared to alternatives like SMC², making it attractive for real-time parameter learning and experimental design.

A nested particle filter (NPF) is a two-level, fully sequential Monte Carlo method designed for efficient online Bayesian inference in state-space models with unknown static parameters. By explicitly maintaining two coupled populations of weighted particles—one for the parameters and one for the latent state trajectories—NPFs provide consistent approximations to the sequence of posterior probability measures over both parameters and system states, with rigorous guarantees for convergence and computational complexity. NPFs achieve online, recursive computation and are especially effective in high-dimensional and non-Markovian models where standard particle filters and single-level SMC methods degenerate or become computationally intractable.

1. Model Structure and Mathematical Notation

Consider a discrete-time state-space Markov model indexed by a static parameter $\theta \in D_\theta \subset \mathbb{R}^{d_\theta}$:

  • $X_0 \sim \tau_0(dx)$,
  • $X_t \mid X_{t-1} = x \sim \tau_{t,\theta}(dx \mid x)$,
  • $Y_t \mid X_t = x \sim g_{t,\theta}(y_t \mid x)$.

For fixed $\theta$, define:

  • the filter: $\phi_{t,\theta}(dx) = P(X_t \in dx \mid Y_{1:t}, \theta)$,
  • the predictive: $\xi_{t,\theta}(dx) = P(X_t \in dx \mid Y_{1:t-1}, \theta) = \tau_{t,\theta}\,\phi_{t-1,\theta}$.

The primary objective is to recursively approximate the parameter posterior $\mu_t(d\theta) = P(d\theta \mid Y_{1:t})$ and, if required, the joint posterior $\pi_t(d\theta, dx) = P(d\theta, dx \mid Y_{1:t})$.
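To make the notation concrete, the following minimal Python sketch fixes a toy one-dimensional instance (an illustrative choice of ours, not a model from the cited papers): the transition kernel $\tau_{t,\theta}$ is a Gaussian autoregression with coefficient $\theta$, and $g_{t,\theta}$ adds Gaussian observation noise. The function names and noise scales are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D state-space model: X_t = theta * X_{t-1} + process noise,
# Y_t = X_t + observation noise; theta is the unknown static parameter.
SIGMA_X, SIGMA_Y = 0.5, 1.0  # assumed noise scales

def sample_transition(theta, x_prev):
    """Draw X_t ~ tau_{t,theta}(dx | x_prev); broadcasts over particle arrays."""
    return theta * x_prev + SIGMA_X * rng.standard_normal(np.shape(x_prev))

def log_likelihood(theta, y_t, x_t):
    """Evaluate log g_{t,theta}(y_t | x_t) for the Gaussian observation model.

    theta is unused in this toy observation model; it is kept in the
    signature for generality, since g may depend on theta in other models.
    """
    return -0.5 * ((y_t - x_t) / SIGMA_Y) ** 2 - np.log(SIGMA_Y * np.sqrt(2.0 * np.pi))
```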

2. Nested Particle Filter Algorithm

An NPF maintains:

  • $N$ "outer" particles $\{\theta_t^{(i)}\}_{i=1}^N$ with weights $W_t^{(i)}$ approximating $\mu_t$;
  • for each $\theta$-particle, an "inner" particle filter of size $M$ approximating the state filter $\phi_{t,\theta_t^{(i)}}$.

At each time step $t$ (a minimal code sketch of the full recursion is given after the key formulas below):

  1. Parameter jitter (rejuvenation): For $i = 1, \dots, N$, sample $\bar\theta_t^{(i)} \sim \kappa_N(d\theta \mid \theta_{t-1}^{(i)})$, a jittering kernel with $\mathrm{Var}(\kappa_N) = O(1/N^\alpha)$.
  2. Inner filter update: Treat $\{x_{t-1}^{(i,j)}\}_{j=1}^M$ as samples from $\phi_{t-1,\bar\theta_t^{(i)}}$ (justified by the continuity of $\phi_{t,\theta}$ in $\theta$, since these particles were generated under $\theta_{t-1}^{(i)}$). For $j = 1, \dots, M$:
    • Propagate: $\bar x_t^{(i,j)} \sim \tau_{t,\bar\theta_t^{(i)}}(\cdot \mid x_{t-1}^{(i,j)})$,
    • Weight: $w_{t|t-1}^{(i,j)} \propto g_{t,\bar\theta_t^{(i)}}(y_t \mid \bar x_t^{(i,j)})$,
    • Normalize the weights and resample to obtain $x_t^{(i,j)}$,
    • Compute the predictive empirical measure: $\xi_{t,\bar\theta_t^{(i)}}^M = \frac{1}{M}\sum_{j=1}^M \delta_{\bar x_t^{(i,j)}}$,
    • Compute the marginal likelihood estimate: $u_t^M(\bar\theta_t^{(i)}) = \frac{1}{M}\sum_{j=1}^M g_{t,\bar\theta_t^{(i)}}(y_t \mid \bar x_t^{(i,j)})$.
  3. Parameter weight update: $\tilde W_t^{(i)} = u_t^M(\bar\theta_t^{(i)})$; normalize: $W_t^{(i)} = \tilde W_t^{(i)} / \sum_k \tilde W_t^{(k)}$.
  4. Outer resampling: Resample $\theta_t^{(i)}$ and the associated $x_t^{(i,j)}$ according to $W_t^{(i)}$; reset all weights to $1/N$.

Key formulas:

  • Parameter posterior: $\mu_t^{N,M}(d\theta) = \frac{1}{N}\sum_{i=1}^N \delta_{\theta_t^{(i)}}(d\theta)$.
  • State filter approximation: $\phi_{t,\theta_t^{(i)}}^M = \frac{1}{M}\sum_{j=1}^M \delta_{x_t^{(i,j)}}$.
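The following minimal Python sketch implements one full NPF step (steps 1-4 above) for the toy model of Section 1, assuming a Gaussian jittering kernel and multinomial resampling at both levels. It is a non-optimized illustration under those assumptions, not the reference implementation of the cited papers.

```python
import numpy as np

rng = np.random.default_rng(1)

def npf_step(theta, x, y_t, jitter_std):
    """One NPF time step, vectorized over particles.

    theta : (N,) outer parameter particles approximating mu_{t-1}
    x     : (N, M) inner state particles; row i targets phi_{t-1, theta_i}
    y_t   : current observation
    jitter_std : std of the Gaussian jittering kernel kappa_N
    """
    N, M = x.shape

    # 1. Parameter jitter (rejuvenation).
    theta_bar = theta + jitter_std * rng.standard_normal(N)

    # 2. Inner filter update: propagate and weight each row's state particles.
    x_bar = sample_transition(theta_bar[:, None], x)        # (N, M)
    logw = log_likelihood(theta_bar[:, None], y_t, x_bar)   # (N, M)
    shift = logw.max(axis=1, keepdims=True)                 # stabilize the exp
    w = np.exp(logw - shift)
    u = w.mean(axis=1) * np.exp(shift[:, 0])  # marginal likelihood estimates u_t^M

    # Inner multinomial resampling, row by row.
    x_new = np.empty_like(x_bar)
    for i in range(N):
        idx = rng.choice(M, size=M, p=w[i] / w[i].sum())
        x_new[i] = x_bar[i, idx]

    # 3.-4. Outer weight update and resampling of (theta, inner filter) pairs.
    W = u / u.sum()
    anc = rng.choice(N, size=N, p=W)
    return theta_bar[anc], x_new[anc]
```

Iterating `npf_step` over the observation stream, with `jitter_std` shrinking like $N^{-\alpha/2}$, mirrors the condition $\mathrm{Var}(\kappa_N) = O(1/N^\alpha)$ in step 1.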

3. Theoretical Properties and Computational Analysis

Computational cost: Each time step involves $O(M)$ work per outer particle (state propagation, weighting, resampling) and $O(N)$ for parameter normalization/resampling; the total is $O(NM)$ per step, or $O(NMT)$ across $T$ steps. In contrast, the SMC² method requires $O(T^2 NM)$, because its inner filters must be re-run from scratch at each rejuvenation step.
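For a concrete sense of scale (illustrative numbers, not taken from the papers): with $N = M = 10^3$ and $T = 10^4$,

$$\underbrace{NMT}_{\text{NPF}} = 10^{10} \qquad \text{vs.} \qquad \underbrace{T^2 NM}_{\text{SMC}^2} = 10^{14}$$

transition/likelihood evaluations, a gap of four orders of magnitude that widens with the horizon.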

Convergence: Under bounded support of $D_\theta$, uniform boundedness and positivity of $g_{t,\theta}(y \mid x)$, Lipschitz continuity of $\phi_{t,\theta}$ in $\theta$, and suitable jittering-kernel properties, the $L_p$ error for any bounded test function $h$ satisfies

$$\mathbb{E}\Big[\,\big|(h, \mu_t^{N,M}) - (h, \mu_t)\big|^p\,\Big]^{1/p} \;\le\; \frac{c_t \|h\|_\infty}{\sqrt{N}} + \frac{\bar c_t \|h\|_\infty}{\sqrt{M}}.$$

The error in approximating the joint posterior enjoys the same rate.

For a fixed computational budget, setting $N = M$ (balancing the two error terms) minimizes the $L_1$ error bound.
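A quick way to see this, assuming the bound above is tight and writing the budget as $C = NM$: substituting $M = C/N$ and minimizing over $N$ gives

$$\frac{d}{dN}\left(\frac{c_t}{\sqrt{N}} + \bar c_t \sqrt{\frac{N}{C}}\right) = 0 \quad\Longrightarrow\quad \frac{N}{M} = \left(\frac{c_t}{\bar c_t}\right)^{2},$$

so comparable constants $c_t \approx \bar c_t$ yield the stated choice $N = M = \sqrt{C}$.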

4. Connections and Comparisons to Other Nested SMC Methods

Relationship to SMC²: Both methods use a two-layer particle structure. SMC² performs particle MCMC moves on the parameters, assigning weights via the full likelihood $P(y_{1:t} \mid \theta)$, and is not recursive in $t$, leading to $O(T^2 NM)$ cost. The NPF is recursive, using only the latest filtering distributions and fresh rejuvenation, avoiding MCMC moves and resulting in $O(TNM)$ cost. SMC² achieves $O(1/\sqrt{N})$ convergence for fixed $M$, while the NPF's convergence requires both $N, M \to \infty$ but attains an overall $O(1/\sqrt{N} + 1/\sqrt{M})$ rate.

Inside-Out Variants (for Risk-Sensitive and Experimental Design Applications):

  • Inside-Out SMC² (IO-SMC²) forms the nested structure by propagating augmented trajectories $z_{0:t}$ in the outer filter, with an inner IBIS filter tracking $p(\theta \mid z_{0:t})$, integrating design selection with posterior tracking. The algorithmic core relies on resampling/tempering and resample-move steps at both levels. The method is well suited to risk-sensitive policy optimization in experimental design with non-exchangeable data, at computational cost $O(NMT)$ (often $O(MT^2)$ for $N \propto T$) (Iqbal et al., 2024).
  • The Inside-Out Nested Particle Filter (IO-NPF) further improves efficiency by replacing costly inner MCMC moves with $O(1)$ rejuvenation steps (random jitter kernels) and fully recursive updates, with empirically favorable scaling for online design in non-Markovian models; it achieves $O(T^2)$ amortized cost (for $N \propto T$, $M = O(1)$) in amortized Bayesian experimental design. IO-NPF also admits backward-sampling smoothers of $O(N)$ cost to address path degeneracy (Iqbal et al., 2024).

5. Backward Sampling and Smoothing

Degeneracy of genealogy tracking in sequential particle smoothing limits the recovery of joint trajectories as $T$ increases. IO-NPF (and related algorithms) uses backward-sampling schemes of the "sparse MCMC" type, performing accept-reject passes in reverse time using Rao–Blackwellized transition probabilities:

  • At time $T$: sample an index $I_T$ from the outer weights.
  • For $t = T-1$ down to $0$: propose an ancestor index and compute the acceptance ratio from the forward weights and likelihoods along the trajectory, using Rao–Blackwellized marginals.
  • This yields an $O(N)$ smoother for the full trajectory with the correct invariant law.

The backward sampler corrects for trajectory degeneracy at negligible additional cost per outer iteration and is applicable in risk-sensitive, non-Markovian, and non-exchangeable settings, as demonstrated on challenging nonlinear dynamical examples.
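As an illustration of this reverse-time pass, here is a minimal backward-simulation sketch in Python for a generic stored particle system. The `trans_logpdf` argument is a placeholder standing in for the model's Rao–Blackwellized transition marginals, whose exact form depends on the model; the capped accept-reject loop is a pragmatic simplification, so this is an assumed sketch rather than the papers' exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(2)

def backward_sample(particles, logweights, trans_logpdf, max_tries=16):
    """Draw one smoothed trajectory backwards through a stored particle system.

    particles  : list of (N,) arrays from the forward pass, one per time step
    logweights : list of (N,) arrays of normalized log-weights
    trans_logpdf(x_next, xs) -> (N,) log transition densities from each entry
        of xs to x_next (stand-in for the Rao-Blackwellized marginals)
    """
    T = len(particles) - 1
    # At time T: sample the terminal index from the outer weights.
    i = rng.choice(len(particles[T]), p=np.exp(logweights[T]))
    traj = [particles[T][i]]

    for t in range(T - 1, -1, -1):
        w = np.exp(logweights[t])
        logp = trans_logpdf(traj[-1], particles[t])
        bound = logp.max()  # envelope for the accept-reject test
        for _ in range(max_tries):
            j = rng.choice(len(w), p=w)  # propose ancestor from forward weights
            if np.log(rng.uniform()) < logp[j] - bound:
                break  # accept: O(N) average cost vs O(N^2) for exact weights
        # After max_tries rejections we keep the last proposal (a small bias;
        # exact implementations loop until acceptance).
        traj.append(particles[t][j])
    return np.array(traj[::-1])
```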

6. Empirical Performance and Practical Implementation

In nonlinear and partially observed dynamical systems (e.g., Lorenz-63, the stochastic pendulum), NPFs with $N = M$ exhibit parameter estimation error scaling as $1/\sqrt{N}$ over tens of thousands of steps, with stable long-run online performance. For Bayesian experimental design, IO-NPF matches or outperforms IO-SMC² in EIG-optimal policy estimation while running an order of magnitude faster ($\sim 0.34$ s vs. $5.7$ s per amortization iteration at $T = 50$). Adding the backward-sampling pass further improves efficiency and effective information gain, with nearly optimal performance relative to exact and implicit baselines at a fraction of the computational cost (Iqbal et al., 2024; Iqbal et al., 2024).

Implementation requires:

  • A mechanism for evaluating the model densities in closed form, e.g. the transition density $f(x_t \mid x_{t-1}, \theta)$ and the observation density $g_{t,\theta}(y_t \mid x_t)$, at both the inner and outer steps.
  • Tuning of the jitter-kernel variance (typically decaying as $1/N$ or $1/M$) to provide sufficient exploration while controlling bias.
  • Choice of $N$ to control the accuracy of the parameter posterior and of $M$ to control the variance of the inner state filters and likelihood estimates (with $M \approx 100$-$500$ sufficient in the reported nonlinear cases).

Algorithmic recursivity ensures scalability and parallelizability over the outer particle population.

7. Practical Limitations and Extensions

NPFs require that the system transition and observation densities ($\tau$, $g$, or $f$) be computable in closed form for each parameter/state pair. Their sequential structure and error control make them well suited to online parameter learning, high-frequency experimental design, and inference in large-scale or non-exchangeable dynamical systems.

A major limitation is the necessity of closed-form tractability at both particle levels; models without tractable densities are not directly amenable to NPF schemes. For very high-dimensional latent or parameter spaces, the method inherits the usual limitations of particle filters; variance- and degeneracy-adaptive schemes can ameliorate but not eliminate the curse of dimensionality. Further, in non-Markovian Feynman-Kac models and long design horizons, forward particle genealogies may still collapse, motivating further research into scalable smoothing and adaptive resampling mechanisms.

NPFs, as well as their Inside-Out and risk-sensitive extensions, provide a general framework for recursive, online, and amortized Bayesian inference and design, with strong theoretical and empirical support for their accuracy and efficiency in high-velocity, high-dimensional, and non-exchangeable sequential learning problems (Crisan et al., 2013; Iqbal et al., 2024; Iqbal et al., 2024).
