
LatentTrack (LT): Online Predictive Filtering

Updated 7 February 2026
  • LatentTrack is a sequential neural architecture designed for online probabilistic prediction under nonstationary dynamics using low-dimensional latent representations.
  • It implements a three-phase predict–generate–update filtering pipeline with lightweight hypernetworks and amortized inference for constant-time adaptation.
  • Empirical evaluations on the Jena Climate benchmark demonstrate state-of-the-art accuracy and calibrated uncertainty compared to traditional Bayesian and latent-variable models.

LatentTrack (LT) is a sequential neural architecture designed for online probabilistic prediction under nonstationary dynamics, implementing causal Bayesian filtering in a low-dimensional latent space. At each time step, a lightweight hypernetwork generates predictor weights conditioned on the current latent, enabling constant-time adaptation of model parameters without per-step gradient-based training. LT’s formulation generalizes to both structured (Markovian) and unstructured latent transition models, and employs amortized inference to update beliefs with each new observation, facilitating a predict–generate–update filtering pipeline in function space. Evaluated on challenging long-horizon regression (e.g., Jena Climate), LT demonstrates state-of-the-art predictive accuracy and uncertainty calibration against both static Bayesian and sequential latent-variable baselines, particularly under evolving and distribution-shifting data regimes (Haq, 31 Jan 2026).

1. Predict–Generate–Update Filtering in Function Space

LT casts the evolution of the effective predictor $f_{\theta_t}$ as Bayesian filtering over a latent state $z_t \in \mathbb{R}^d$ with an associated summary statistic $h_t \in \mathbb{R}^H$ that aggregates past inputs. This leads to a three-phase filtering pipeline at each timestep $t$:

  • Predict (Prior Propagation): Given the historical data $D_{1:t-1}$, the prior over the next latent $z_t$ is produced as a Gaussian, parameterized either marginally (unstructured) by the summary $h_{t-1}$, or in a structured (Markovian) manner conditioned additionally on $z_{t-1}$.
  • Generate: Monte Carlo samples of $z_t$ are mapped by a learned hypernetwork $g_\phi$ to full sets of predictor weights $\theta_t$, yielding a mixture predictive distribution for $y_t$.
  • Update: Upon receipt of $(x_t, y_t)$, an amortized inference network produces the variational posterior $q(z_t | D_{1:t})$, updating the latent belief in constant amortized time (Haq, 31 Jan 2026).

The process maintains fixed per-step computational cost, avoids per-timestep inner-loop learning, and provides calibrated predictive distributions through sampled mixtures of predictors.
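The predict–generate–update loop can be sketched in NumPy. This is a minimal illustration under assumed toy dimensions and plain linear/RNN-style heads; the function names (`predict_prior`, `generate`, `update`) and all weight matrices are hypothetical stand-ins for the paper's learned networks, not its actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions; the paper reports latents of dimension ~8).
D_LATENT, D_HIDDEN, K = 8, 16, 5

def predict_prior(h_prev):
    """Predict: map the running summary h_{t-1} to a Gaussian prior over z_t."""
    mu = W_prior_mu @ h_prev
    sigma = np.exp(W_prior_sig @ h_prev)  # positive std via exp
    return mu, sigma

def generate(z):
    """Generate: hypernetwork g_phi maps a latent sample to predictor weights."""
    return W_hyper @ z + b_hyper  # flat weight vector theta_t

def update(h_prev, x_t, y_t):
    """Update: fold (x_t, y_t) into the summary, emit posterior q(z_t | D_{1:t})."""
    h_t = np.tanh(W_rnn @ np.concatenate([h_prev, [x_t, y_t]]))
    mu = W_post_mu @ h_t
    sigma = np.exp(W_post_sig @ h_t)
    return h_t, mu, sigma

# Toy parameters so the sketch runs end to end.
W_prior_mu = rng.normal(0, 0.1, (D_LATENT, D_HIDDEN))
W_prior_sig = rng.normal(0, 0.1, (D_LATENT, D_HIDDEN))
W_hyper = rng.normal(0, 0.1, (32, D_LATENT))
b_hyper = np.zeros(32)
W_rnn = rng.normal(0, 0.1, (D_HIDDEN, D_HIDDEN + 2))
W_post_mu = rng.normal(0, 0.1, (D_LATENT, D_HIDDEN))
W_post_sig = rng.normal(0, 0.1, (D_LATENT, D_HIDDEN))

h = np.zeros(D_HIDDEN)
for x_t, y_t in [(0.1, 0.3), (0.2, 0.5)]:
    mu_p, sig_p = predict_prior(h)                        # 1. predict
    thetas = [generate(mu_p + sig_p * rng.normal(size=D_LATENT))
              for _ in range(K)]                          # 2. generate K weight sets
    h, mu_q, sig_q = update(h, x_t, y_t)                  # 3. update belief
```

Note that each step's cost is fixed: one summary update, two Gaussian heads, and $K$ hypernetwork passes, with no gradient computation.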

2. Latent-Dynamics Model: Structure and Parameterization

LT’s latent-dynamics model can be specialized as follows:

  • Unstructured Dynamics: The prior is specified marginally,

$$p_o(z_t | D_{1:t-1}) = \mathcal{N}\bigl(z_t;\ \mu^P(h_{t-1}),\ \operatorname{diag}(\sigma^P(h_{t-1})^2)\bigr),$$

where $\mu^P$ and $\sigma^P$ are neural projections from the recurrent summary of past data.

  • Structured (Markov) Dynamics: A Markovian assumption is encoded by

$$p_p(z_t | z_{t-1}, D_{1:t-1}) = \mathcal{N}\bigl(z_t;\ \mu^P(z_{t-1}, h_{t-1}),\ \operatorname{diag}(\sigma^P(z_{t-1}, h_{t-1})^2)\bigr),$$

which supports richer temporal correlation and memory effects by explicitly modeling transitions between consecutive latents.

The variational posterior at each step is amortized as

$$q(z_t | D_{1:t}) = \mathcal{N}\bigl(z_t;\ \mu^Q(h_t),\ \operatorname{diag}(\sigma^Q(h_t)^2)\bigr),$$

where $\mu^Q$ and $\sigma^Q$ are outputs of a neural head conditioned on the updated summary $h_t$. For regression, the observation model is

$$p(y_t | x_t, z_t) = \mathcal{N}\bigl(y_t;\ f_\theta(x_t),\ \ell_\theta^2(x_t)\bigr),$$

where $\theta = g_\phi(z_t)$.
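The difference between the two prior parameterizations is just what the prior head conditions on. A minimal sketch, assuming toy linear heads in place of the paper's neural projections (matrix names `A`, `B` and all sizes are illustrative):

```python
import numpy as np

d, H = 8, 16
rng = np.random.default_rng(1)
A = rng.normal(0, 0.1, (d, H))   # head acting on the summary h_{t-1}
B = rng.normal(0, 0.1, (d, d))   # extra head on z_{t-1} (structured variant only)

h_prev = rng.normal(size=H)
z_prev = rng.normal(size=d)

# Unstructured: prior mean depends only on the recurrent summary h_{t-1}.
mu_unstruct = A @ h_prev
# Structured (Markov): prior mean additionally conditions on z_{t-1}.
mu_struct = A @ h_prev + B @ z_prev

sigma = np.exp(rng.normal(0, 0.1, d))        # diagonal std, shared for the sketch
z_t = mu_struct + sigma * rng.normal(size=d)  # reparameterized sample from the prior
```

The structured variant can express correlated drift between consecutive latents, which the unstructured variant must absorb entirely through $h_{t-1}$.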

3. Hypernetwork-Driven Sequential Weight Generation

The hypernetwork $g_\phi$ parameterizes the mapping from latent state to predictor weights:

$$\theta_t = g_\phi(z_t).$$

  • Input: $z_t \in \mathbb{R}^d$ (typically $d = 8$).
  • Output: Full parameter vector $\theta_t$ for the base regressor (on the order of $10^4$–$10^5$ parameters).
  • Architecture: Example form is a two-layer MLP,

$$g_\phi(z) = W_2\,\sigma(W_1 z + b_1) + b_2,$$

with $W_1$, $W_2$ linear maps and $\sigma$ a nonlinearity (e.g., ReLU).

In contrast with gradient-based adaptation, all weights in the base predictor evolve via changes in $z_t$ and $\phi$, not via direct SGD steps on $\theta_t$.
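The two-layer MLP form above can be written directly. The hidden width and output size below are illustrative assumptions, not the paper's exact architecture; the point is that one forward pass emits the entire flat weight vector of the base model.

```python
import numpy as np

# g_phi(z) = W2 relu(W1 z + b1) + b2, mapping a d=8 latent to all base weights.
d, hidden, n_base_params = 8, 64, 10_000   # sizes are assumptions for the sketch
rng = np.random.default_rng(2)
W1 = rng.normal(0, 0.1, (hidden, d)); b1 = np.zeros(hidden)
W2 = rng.normal(0, 0.01, (n_base_params, hidden)); b2 = np.zeros(n_base_params)

def g_phi(z):
    """Two-layer MLP hypernetwork with a ReLU nonlinearity."""
    return W2 @ np.maximum(W1 @ z + b1, 0.0) + b2

theta_t = g_phi(rng.normal(size=d))  # one forward pass: latent -> full theta_t
```

Because $d$ is small, the cost of adaptation is dominated by this single matrix pipeline rather than by any optimization over $\theta_t$.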

4. Amortized Inference and Variational Training Objective

Learning proceeds by maximizing a filtering ELBO at each time point:

  • Unstructured variant:

$$\mathcal{L}_t = \mathbb{E}_{q(z_t|D_{1:t})}\bigl[\log p(y_t|x_t; g_\phi(z_t))\bigr] - \mathrm{KL}\bigl(q(z_t|D_{1:t}) \,\|\, p_o(z_t|D_{1:t-1})\bigr).$$

  • Structured variant:

$$\mathcal{L}_t^{\mathrm{struct}} = \mathbb{E}_{q(z_t|D_{1:t})}\bigl[\log p(y_t|x_t; g_\phi(z_t))\bigr] - \mathbb{E}_{q(z_{t-1}|D_{1:t-1})}\bigl[\mathrm{KL}\bigl(q(z_t|D_{1:t}) \,\|\, p_p(z_t|z_{t-1}, D_{1:t-1})\bigr)\bigr].$$

Summing across the sequence yields an objective that approximates the log marginal likelihood of the data:

$$\mathcal{L}_{\mathrm{total}} = \sum_{t=1}^{T} \Bigl\{ \mathbb{E}_{q(z_t|D_{1:t})}\bigl[\log p(y_t|x_t; g_\phi(z_t))\bigr] - \mathrm{KL}\bigl(q(z_t|D_{1:t}) \,\|\, p(z_t|\cdot)\bigr) \Bigr\}.$$

A KL-annealing weight $\beta_t$ may be applied to the KL term during training.
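For diagonal Gaussians, the per-step ELBO has a closed-form KL term and a reparameterized Monte Carlo likelihood term. A sketch under those assumptions (function names and the sample count are illustrative, not from the paper):

```python
import numpy as np

def diag_gauss_kl(mu_q, sig_q, mu_p, sig_p):
    """Closed-form KL( N(mu_q, diag sig_q^2) || N(mu_p, diag sig_p^2) )."""
    return np.sum(np.log(sig_p / sig_q)
                  + (sig_q**2 + (mu_q - mu_p)**2) / (2 * sig_p**2) - 0.5)

def step_elbo(mu_q, sig_q, mu_p, sig_p, log_lik_fn, beta=1.0, n_samples=8,
              rng=np.random.default_rng(3)):
    """Monte Carlo estimate of the per-step filtering ELBO, with an optional
    KL-annealing weight beta standing in for beta_t."""
    eps = rng.normal(size=(n_samples, mu_q.size))
    z = mu_q + sig_q * eps  # reparameterized samples from q(z_t | D_{1:t})
    e_loglik = np.mean([log_lik_fn(zi) for zi in z])
    return e_loglik - beta * diag_gauss_kl(mu_q, sig_q, mu_p, sig_p)
```

In the structured variant, `mu_p` and `sig_p` would themselves be recomputed per sampled $z_{t-1}$, turning the KL term into the outer expectation shown above.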

5. Monte Carlo Filtering and Uncertainty Calibration

At inference time, LT forms predictive mixtures via Monte Carlo filtering:

  • Draw $K$ samples $z_t^{(k)} \sim p(z_t | h_{t-1})$.
  • For each, compute $\theta_t^{(k)} = g_\phi(z_t^{(k)})$, yielding mixture components.
  • Predictive distribution:

$$p(y_t | x_t) \approx \frac{1}{K} \sum_{k=1}^{K} \mathcal{N}\bigl(y_t;\ f_{\theta_t^{(k)}}(x_t),\ \ell_{\theta_t^{(k)}}^2(x_t)\bigr).$$

  • Mixture mean and variance separate aleatoric and epistemic components:

$$\mathbb{E}[y_t] = \frac{1}{K} \sum_k f_{\theta^{(k)}}(x_t), \qquad \mathrm{Var}[y_t] = \frac{1}{K} \sum_k \ell_{\theta^{(k)}}^2(x_t) + \mathrm{Var}_k\bigl[f_{\theta^{(k)}}(x_t)\bigr].$$

This explicit mixture provides calibrated uncertainty without overconfidence, as evidenced by near-uniform PIT histograms and tight calibration curves in the empirical study (Haq, 31 Jan 2026).
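The mean/variance decomposition can be computed directly from the $K$ component statistics. A sketch with made-up numbers standing in for $f_{\theta^{(k)}}(x_t)$ and $\ell^2_{\theta^{(k)}}(x_t)$ at a single input:

```python
import numpy as np

# K = 4 sampled predictors evaluated at one input x_t (toy values).
f_k = np.array([1.0, 1.2, 0.9, 1.1])         # component means f_{theta^(k)}(x_t)
ell2_k = np.array([0.04, 0.05, 0.04, 0.06])  # component variances ell^2(x_t)

mean = f_k.mean()            # E[y_t]: average of component means
aleatoric = ell2_k.mean()    # average within-component (observation) noise
epistemic = f_k.var()        # spread across sampled predictors
total_var = aleatoric + epistemic  # Var[y_t] per the mixture identity
```

The split makes the two uncertainty sources inspectable: `aleatoric` reflects irreducible observation noise, while `epistemic` shrinks as the latent belief concentrates.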

6. Computational Complexity and Comparison to Baselines

LT’s adaptation is strictly constant-time, $O(1)$ per step, with per-timestep cost decomposing as one RNN update, two small network heads (prior and posterior), $K$ hypernetwork forward passes, and $K$ base-model evaluations. No inner-loop optimization or gradient steps on the prediction model are needed at test time. In contrast, meta-learning and gradient-based adaptation approaches require multiple backpropagation passes per step. This makes LT highly efficient for streaming-data scenarios (Haq, 31 Jan 2026).

7. Empirical Evaluation: Jena Climate Benchmark

LT was evaluated on the Jena Climate dataset for long-horizon temperature prediction (36 h ahead, strict causal setting). Training uses the first 70% of each series (window length 256, truncated backpropagation through time); evaluation is on the final 30%, over 25 random seeds.

  • Baselines: Capacity- and computation-matched VRNN and DSSM (stateful latent-variable RNNs), plus MC-Dropout, Bayes-by-Backprop, and Deep Ensembles (static Bayesian approaches).
  • Metrics: Negative log-likelihood (NLL), mean squared error (MSE), per-step ranking, and catastrophic failure rate.
  • Key Results:
    • LT-Structured achieves median NLL ≈ 2.32 (VRNN: 3.38, DSSM: 2.93); median MSE ≈ 1.93 (VRNN: 45.6, DSSM: 14.8).
    • Rank-1 in NLL for 58.8% of steps and MSE for 51.4% (each ≤ 20% for baselines).
    • Catastrophic failure rate (max NLL > $10^6$): 12% for LT-Structured, 4% for LT-Unstructured, >38% for VRNN and DSSM.
    • Calibration: PIT histogram flatter, calibration curve tighter for LT, supporting high-quality probabilistic prediction.

These results demonstrate that latent-conditioned function evolution offers a robust alternative to conventional state-space sequence models in online, distribution-shifting settings (Haq, 31 Jan 2026).
