
LatentTrack (LT): Online Predictive Filtering

Updated 7 February 2026
  • LatentTrack is a sequential neural architecture designed for online probabilistic prediction under nonstationary dynamics using low-dimensional latent representations.
  • It implements a three-phase predict–generate–update filtering pipeline with lightweight hypernetworks and amortized inference for constant-time adaptation.
  • Empirical evaluations on the Jena Climate benchmark demonstrate state-of-the-art accuracy and calibrated uncertainty compared to traditional Bayesian and latent-variable models.

LatentTrack (LT) is a sequential neural architecture designed for online probabilistic prediction under nonstationary dynamics, implementing causal Bayesian filtering in a low-dimensional latent space. At each time step, a lightweight hypernetwork generates predictor weights conditioned on the current latent, enabling constant-time adaptation of model parameters without per-step gradient-based training. LT’s formulation generalizes to both structured (Markovian) and unstructured latent transition models, and employs amortized inference to update beliefs with each new observation, facilitating a predict–generate–update filtering pipeline in function space. Evaluated on challenging long-horizon regression (e.g., Jena Climate), LT demonstrates state-of-the-art predictive accuracy and uncertainty calibration against both static Bayesian and sequential latent-variable baselines, particularly under evolving and distribution-shifting data regimes (Haq, 31 Jan 2026).

1. Predict–Generate–Update Filtering in Function Space

LT casts the evolution of the effective predictor $f_{\theta_t}$ as Bayesian filtering over a latent state $z_t \in \mathbb{R}^d$ with an associated summary statistic $h_t \in \mathbb{R}^H$ that aggregates past inputs. This leads to a three-phase filtering pipeline at each timestep $t$:

  • Predict (Prior Propagation): Given the historical data $D_{1:t-1}$, the prior over the next latent $z_t$ is produced as a Gaussian, parameterized either marginally (unstructured) by the summary $h_{t-1}$, or in a structured (Markovian) manner conditioned additionally on $z_{t-1}$.
  • Generate: Monte Carlo samples of $z_t$ are mapped by a learned hypernetwork $g_\phi$ to full sets of predictor weights $\theta_t$, yielding a mixture predictive distribution for $y_t$.
  • Update: Upon receipt of $(x_t, y_t)$, an amortized inference network produces the variational posterior $q(z_t | D_{1:t})$, updating the latent belief in constant amortized time (Haq, 31 Jan 2026).

The process maintains fixed per-step computational cost, avoids per-timestep inner-loop learning, and provides calibrated predictive distributions through sampled mixtures of predictors.
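The predict–generate–update loop can be sketched in NumPy. This is a minimal illustration under assumed toy dimensions and plain linear/RNN-style heads; the function names (`predict_prior`, `generate`, `update`) and all weight matrices are hypothetical stand-ins for the paper's learned networks, not its actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions; the paper reports latents of dimension ~8).
D_LATENT, D_HIDDEN, K = 8, 16, 5

def predict_prior(h_prev):
    """Predict: map the running summary h_{t-1} to a Gaussian prior over z_t."""
    mu = W_prior_mu @ h_prev
    sigma = np.exp(W_prior_sig @ h_prev)  # positive std via exp
    return mu, sigma

def generate(z):
    """Generate: hypernetwork g_phi maps a latent sample to predictor weights."""
    return W_hyper @ z + b_hyper  # flat weight vector theta_t

def update(h_prev, x_t, y_t):
    """Update: fold (x_t, y_t) into the summary, emit posterior q(z_t | D_{1:t})."""
    h_t = np.tanh(W_rnn @ np.concatenate([h_prev, [x_t, y_t]]))
    mu = W_post_mu @ h_t
    sigma = np.exp(W_post_sig @ h_t)
    return h_t, mu, sigma

# Toy parameters so the sketch runs end to end.
W_prior_mu = rng.normal(0, 0.1, (D_LATENT, D_HIDDEN))
W_prior_sig = rng.normal(0, 0.1, (D_LATENT, D_HIDDEN))
W_hyper = rng.normal(0, 0.1, (32, D_LATENT))
b_hyper = np.zeros(32)
W_rnn = rng.normal(0, 0.1, (D_HIDDEN, D_HIDDEN + 2))
W_post_mu = rng.normal(0, 0.1, (D_LATENT, D_HIDDEN))
W_post_sig = rng.normal(0, 0.1, (D_LATENT, D_HIDDEN))

h = np.zeros(D_HIDDEN)
for x_t, y_t in [(0.1, 0.3), (0.2, 0.5)]:
    mu_p, sig_p = predict_prior(h)                        # 1. predict
    thetas = [generate(mu_p + sig_p * rng.normal(size=D_LATENT))
              for _ in range(K)]                          # 2. generate K weight sets
    h, mu_q, sig_q = update(h, x_t, y_t)                  # 3. update belief
```

Note that each step's cost is fixed: one summary update, two Gaussian heads, and $K$ hypernetwork passes, with no gradient computation.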

2. Latent-Dynamics Model: Structure and Parameterization

LT’s latent-dynamics model can be specialized as follows:

  • Unstructured Dynamics: The prior is specified marginally,

$$p_o(z_t | D_{1:t-1}) = \mathcal{N}\bigl(z_t;\ \mu^P(h_{t-1}),\ \operatorname{diag}(\sigma^P(h_{t-1})^2)\bigr),$$

where $\mu^P$ and $\sigma^P$ are neural projections from the recurrent summary of past data.

  • Structured (Markov) Dynamics: A Markovian assumption is encoded by

$$p_p(z_t | z_{t-1}, D_{1:t-1}) = \mathcal{N}\bigl(z_t;\ \mu^P(z_{t-1}, h_{t-1}),\ \operatorname{diag}(\sigma^P(z_{t-1}, h_{t-1})^2)\bigr),$$

which supports richer temporal correlation and memory effects by explicitly modeling transitions between consecutive latents.

The variational posterior at each step is amortized as

$$q(z_t | D_{1:t}) = \mathcal{N}\bigl(z_t;\ \mu^Q(h_t),\ \operatorname{diag}(\sigma^Q(h_t)^2)\bigr),$$

where $\mu^Q$ and $\sigma^Q$ are outputs of a neural head conditioned on the updated summary $h_t$. For regression, the observation model is

$$p(y_t | x_t, z_t) = \mathcal{N}\bigl(y_t;\ f_\theta(x_t),\ \ell_\theta^2(x_t)\bigr),$$

where $\theta = g_\phi(z_t)$.
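The difference between the two prior parameterizations is just what the prior head conditions on. A minimal sketch, assuming toy linear heads in place of the paper's neural projections (matrix names `A`, `B` and all sizes are illustrative):

```python
import numpy as np

d, H = 8, 16
rng = np.random.default_rng(1)
A = rng.normal(0, 0.1, (d, H))   # head acting on the summary h_{t-1}
B = rng.normal(0, 0.1, (d, d))   # extra head on z_{t-1} (structured variant only)

h_prev = rng.normal(size=H)
z_prev = rng.normal(size=d)

# Unstructured: prior mean depends only on the recurrent summary h_{t-1}.
mu_unstruct = A @ h_prev
# Structured (Markov): prior mean additionally conditions on z_{t-1}.
mu_struct = A @ h_prev + B @ z_prev

sigma = np.exp(rng.normal(0, 0.1, d))        # diagonal std, shared for the sketch
z_t = mu_struct + sigma * rng.normal(size=d)  # reparameterized sample from the prior
```

The structured variant can express correlated drift between consecutive latents, which the unstructured variant must absorb entirely through $h_{t-1}$.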

3. Hypernetwork-Driven Sequential Weight Generation

The hypernetwork $g_\phi$ parameterizes the mapping from latent state to predictor weights:

$$\theta_t = g_\phi(z_t).$$

  • Input: $z_t \in \mathbb{R}^d$ (typically $d = 8$).
  • Output: Full parameter vector $\theta_t$ for the base regressor (on the order of $10^4$–$10^5$ parameters).
  • Architecture: Example form is a two-layer MLP,

$$g_\phi(z) = W_2\,\sigma(W_1 z + b_1) + b_2,$$

with $W_1$, $W_2$ linear maps and $\sigma$ a nonlinearity (e.g., ReLU).

In contrast with gradient-based adaptation, all weights in the base predictor evolve via changes in $z_t$ and $\phi$, not via direct SGD steps on $\theta_t$.
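The two-layer MLP form above can be written directly. The hidden width and output size below are illustrative assumptions, not the paper's exact architecture; the point is that one forward pass emits the entire flat weight vector of the base model.

```python
import numpy as np

# g_phi(z) = W2 relu(W1 z + b1) + b2, mapping a d=8 latent to all base weights.
d, hidden, n_base_params = 8, 64, 10_000   # sizes are assumptions for the sketch
rng = np.random.default_rng(2)
W1 = rng.normal(0, 0.1, (hidden, d)); b1 = np.zeros(hidden)
W2 = rng.normal(0, 0.01, (n_base_params, hidden)); b2 = np.zeros(n_base_params)

def g_phi(z):
    """Two-layer MLP hypernetwork with a ReLU nonlinearity."""
    return W2 @ np.maximum(W1 @ z + b1, 0.0) + b2

theta_t = g_phi(rng.normal(size=d))  # one forward pass: latent -> full theta_t
```

Because $d$ is small, the cost of adaptation is dominated by this single matrix pipeline rather than by any optimization over $\theta_t$.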

4. Amortized Inference and Variational Training Objective

Learning proceeds by maximizing a filtering ELBO at each time point:

  • Unstructured variant:

$$\mathcal{L}_t = \mathbb{E}_{q(z_t|D_{1:t})}\bigl[\log p(y_t|x_t; g_\phi(z_t))\bigr] - \mathrm{KL}\bigl(q(z_t|D_{1:t}) \,\|\, p_o(z_t|D_{1:t-1})\bigr).$$

  • Structured variant:

$$\mathcal{L}_t^{\mathrm{struct}} = \mathbb{E}_{q(z_t|D_{1:t})}\bigl[\log p(y_t|x_t; g_\phi(z_t))\bigr] - \mathbb{E}_{q(z_{t-1}|D_{1:t-1})}\bigl[\mathrm{KL}\bigl(q(z_t|D_{1:t}) \,\|\, p_p(z_t|z_{t-1}, D_{1:t-1})\bigr)\bigr].$$

Summing across the sequence yields an objective that approximates the log marginal likelihood of the data:

$$\mathcal{L}_{\mathrm{total}} = \sum_{t=1}^{T} \Bigl\{ \mathbb{E}_{q(z_t|D_{1:t})}\bigl[\log p(y_t|x_t; g_\phi(z_t))\bigr] - \mathrm{KL}\bigl(q(z_t|D_{1:t}) \,\|\, p(z_t|\cdot)\bigr) \Bigr\}.$$

A KL-annealing weight $\beta_t$ may be applied to the KL term during training.
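For diagonal Gaussians, the per-step ELBO has a closed-form KL term and a reparameterized Monte Carlo likelihood term. A sketch under those assumptions (function names and the sample count are illustrative, not from the paper):

```python
import numpy as np

def diag_gauss_kl(mu_q, sig_q, mu_p, sig_p):
    """Closed-form KL( N(mu_q, diag sig_q^2) || N(mu_p, diag sig_p^2) )."""
    return np.sum(np.log(sig_p / sig_q)
                  + (sig_q**2 + (mu_q - mu_p)**2) / (2 * sig_p**2) - 0.5)

def step_elbo(mu_q, sig_q, mu_p, sig_p, log_lik_fn, beta=1.0, n_samples=8,
              rng=np.random.default_rng(3)):
    """Monte Carlo estimate of the per-step filtering ELBO, with an optional
    KL-annealing weight beta standing in for beta_t."""
    eps = rng.normal(size=(n_samples, mu_q.size))
    z = mu_q + sig_q * eps  # reparameterized samples from q(z_t | D_{1:t})
    e_loglik = np.mean([log_lik_fn(zi) for zi in z])
    return e_loglik - beta * diag_gauss_kl(mu_q, sig_q, mu_p, sig_p)
```

In the structured variant, `mu_p` and `sig_p` would themselves be recomputed per sampled $z_{t-1}$, turning the KL term into the outer expectation shown above.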

5. Monte Carlo Filtering and Uncertainty Calibration

At inference time, LT forms predictive mixtures via Monte Carlo filtering:

  • Draw $K$ samples $z_t^{(k)} \sim p(z_t | h_{t-1})$.
  • For each, compute $\theta_t^{(k)} = g_\phi(z_t^{(k)})$, yielding mixture components.
  • Predictive distribution:

$$p(y_t | x_t) \approx \frac{1}{K} \sum_{k=1}^{K} \mathcal{N}\bigl(y_t;\ f_{\theta_t^{(k)}}(x_t),\ \ell_{\theta_t^{(k)}}^2(x_t)\bigr).$$

  • Mixture mean and variance separate aleatoric and epistemic components:

$$\mathbb{E}[y_t] = \frac{1}{K} \sum_k f_{\theta^{(k)}}(x_t), \qquad \mathrm{Var}[y_t] = \frac{1}{K} \sum_k \ell_{\theta^{(k)}}^2(x_t) + \mathrm{Var}_k\bigl[f_{\theta^{(k)}}(x_t)\bigr].$$

This explicit mixture provides calibrated uncertainty without overconfidence, as evidenced by near-uniform PIT histograms and tight calibration curves in the empirical study (Haq, 31 Jan 2026).
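The mean/variance decomposition can be computed directly from the $K$ component statistics. A sketch with made-up numbers standing in for $f_{\theta^{(k)}}(x_t)$ and $\ell^2_{\theta^{(k)}}(x_t)$ at a single input:

```python
import numpy as np

# K = 4 sampled predictors evaluated at one input x_t (toy values).
f_k = np.array([1.0, 1.2, 0.9, 1.1])         # component means f_{theta^(k)}(x_t)
ell2_k = np.array([0.04, 0.05, 0.04, 0.06])  # component variances ell^2(x_t)

mean = f_k.mean()            # E[y_t]: average of component means
aleatoric = ell2_k.mean()    # average within-component (observation) noise
epistemic = f_k.var()        # spread across sampled predictors
total_var = aleatoric + epistemic  # Var[y_t] per the mixture identity
```

The split makes the two uncertainty sources inspectable: `aleatoric` reflects irreducible observation noise, while `epistemic` shrinks as the latent belief concentrates.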

6. Computational Complexity and Comparison to Baselines

LT’s adaptation is strictly constant-time, $O(1)$ per step, with per-timestep cost decomposing as one RNN update, two small network heads (prior and posterior), $K$ hypernetwork forward passes, and $K$ base-model evaluations. No inner-loop optimization or gradient steps on the prediction model are needed at test time. In contrast, meta-learning and gradient-based adaptation approaches require multiple backpropagation passes per step. This makes LT highly efficient for streaming-data scenarios (Haq, 31 Jan 2026).

7. Empirical Evaluation: Jena Climate Benchmark

LT was evaluated on the Jena Climate dataset for long-horizon temperature prediction (36 h ahead, strict causal setting). Training uses the first 70% of each series (window length 256, truncated backpropagation through time); evaluation is on the final 30%, over 25 random seeds.

  • Baselines: Capacity- and computation-matched VRNN and DSSM (stateful latent-variable RNNs), plus MC-Dropout, Bayes-by-Backprop, and Deep Ensembles (static Bayesian approaches).
  • Metrics: Negative log-likelihood (NLL), mean squared error (MSE), per-step ranking, and catastrophic failure rate.
  • Key Results:
    • LT-Structured achieves median NLL ≈ 2.32 (VRNN: 3.38, DSSM: 2.93); median MSE ≈ 1.93 (VRNN: 45.6, DSSM: 14.8).
    • Rank-1 in NLL for 58.8% of steps and MSE for 51.4% (each ≤ 20% for baselines).
    • Catastrophic failure rate (max NLL > $10^6$): 12% for LT-Structured, 4% for LT-Unstructured, >38% for VRNN and DSSM.
    • Calibration: PIT histogram flatter, calibration curve tighter for LT, supporting high-quality probabilistic prediction.

These results demonstrate that latent-conditioned function evolution offers a robust alternative to conventional state-space sequence models in online, distribution-shifting settings (Haq, 31 Jan 2026).
