
AdaNODEs: Adaptive NODEs for Time Series

Updated 26 January 2026
  • AdaNODEs are a test-time adaptation framework that employs a variational encoder–latent–decoder architecture with Neural ODEs to adjust pre-trained models for distribution shifts.
  • They update only lightweight parameters (α and γ) via a dual-term loss combining negative log-likelihood and KL divergence, enabling efficient adaptation to amplitude, frequency, and phase shifts.
  • Empirical evaluations demonstrate significant reductions in MSE on synthetic and real-world tasks while maintaining computational efficiency through adaptive ODE solvers and minimal gradient steps.

AdaNODEs are a test-time adaptation (TTA) framework for time series forecasting that utilizes neural ordinary differential equations (NODEs) to adapt pre-trained models to distribution shifts without the need for source data or target labels at test time. The method distinguishes itself by modifying the latent dynamics of a Variational NODE model using a lightweight, regression-specific adaptation loss and minimally invasive parameter updates, achieving robustness to amplitude, frequency, and phase shifts in time series data (Dang et al., 19 Jan 2026).

1. Architecture and Theoretical Foundations

AdaNODEs employ a variational encoder–latent–decoder architecture, with the latent block realized as a Neural ODE. Given an observed context sequence $\{y_p(t), t\}$ for times $t_0, \ldots, t_k$, the encoder $f_\text{enc}$ produces a Gaussian posterior over the initial latent state:

$$z(t_0) \sim q_\phi(z(t_0) \mid y_p, t_p)$$

The latent state then evolves according to

$$\frac{d z(t)}{dt} = f_\text{node}(z(t), t; \theta),$$

where $z(t) \in \mathbb{R}^d$, $t \in [t_0, t_N]$, and $\theta$ are the NODE parameters, fixed after training on source data. The ODE evolution is performed using adaptive solvers such as RK45, alongside the adjoint method for efficient memory usage. Each latent state $z(t)$ is mapped by the decoder $f_\text{dec}$ to a predictive distribution:

$$p(y(t) \mid z(t); \theta) = \mathcal{N}(\mu(t), \sigma^2(t)),$$

where $\mu$, $\sigma$ are decoder outputs.
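
The following is a minimal sketch of this encoder–latent-ODE–decoder pipeline, assuming a PyTorch/torchdiffeq implementation; the class and function names are illustrative and do not come from the paper.

```python
# Minimal sketch of a variational latent-NODE forecaster, assuming a
# PyTorch/torchdiffeq setup. All names are illustrative, not the authors' code.
import torch
import torch.nn as nn
from torchdiffeq import odeint_adjoint as odeint  # adjoint method for memory efficiency


class LatentODEFunc(nn.Module):
    """f_node(z, t; theta): parameterizes dz/dt in the latent space."""
    def __init__(self, latent_dim=16, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(latent_dim, hidden), nn.Tanh(),
                                 nn.Linear(hidden, latent_dim))

    def forward(self, t, z):
        return self.net(z)


class ContextEncoder(nn.Module):
    """q_phi(z(t0) | y_p, t_p): maps the context window to a Gaussian posterior."""
    def __init__(self, obs_dim=1, latent_dim=16, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(obs_dim + 1, hidden, batch_first=True)  # +1 channel for time
        self.to_mu = nn.Linear(hidden, latent_dim)
        self.to_logvar = nn.Linear(hidden, latent_dim)

    def forward(self, y_p, t_p):
        inp = torch.cat([y_p, t_p.unsqueeze(-1)], dim=-1)  # (B, K, obs_dim + 1)
        _, h = self.rnn(inp)
        return self.to_mu(h[-1]), self.to_logvar(h[-1])


class Decoder(nn.Module):
    """p(y(t) | z(t)): emits mean and std of the predictive Gaussian."""
    def __init__(self, latent_dim=16, obs_dim=1, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(latent_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 2 * obs_dim))

    def forward(self, z):
        mu, log_sigma = self.net(z).chunk(2, dim=-1)
        return mu, log_sigma.exp()


def forecast(encoder, ode_func, decoder, y_p, t_p, t_all):
    """Encode the context, integrate the latent ODE over t_all, decode predictions."""
    mu0, logvar0 = encoder(y_p, t_p)
    z0 = mu0 + torch.randn_like(mu0) * (0.5 * logvar0).exp()    # reparameterized sample
    z_path = odeint(ode_func, z0, t_all, rtol=1e-3, atol=1e-3)  # adaptive RK solver
    return decoder(z_path)                                       # mu(t), sigma(t) along t_all
```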

The motivation for a continuous-time latent evolution is the ability to accommodate irregular sampling, long-range dependencies, and distribution shifts that manifest as frequency or phase changes, which discrete architectures handle less naturally.

2. Test-Time Adaptation in AdaNODEs

AdaNODEs operate in a source-free setting, where, at test time, the model encounters unlabeled, shifted target domain sequences. The adaptation process is performed entirely at inference, leveraging only the model’s predictions and context.

Adaptation Loss

Classic TTA losses (e.g., entropy minimization) are unsuitable for regression. AdaNODEs employ a novel two-term variational loss over lightweight vector parameters $\alpha, \gamma \in \mathbb{R}^d$ acting on the latent ODE:

$$\mathcal{L}(\alpha, \gamma) = \lambda \sum_{t \in \mathcal{T}} \mathcal{L}_t^\text{NLL}(\alpha, \gamma) + (1 - \lambda)\, \mathcal{L}^\text{KL}(\alpha, \gamma)$$

with

  • $\mathcal{L}_t^\text{NLL}$ = expected negative log-likelihood under the predictive posterior,
  • $\mathcal{L}^\text{KL}$ = KL divergence between the context-only and context+forecast latent posteriors.

Minimizing $\mathcal{L}_t^\text{NLL}$ sharpens the model's forecast distribution, analogous to entropy minimization, while the KL term enforces posterior consistency to smooth adaptation and prevent drifting latent encodings.
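
A hedged sketch of this two-term loss follows, assuming Gaussian predictive and latent posteriors as above; the exact form of each term (in particular how the expectation in the NLL term is taken) is an assumption, and the function signature is illustrative.

```python
# Illustrative sketch of the two-term adaptation loss; not the authors' exact code.
import math
import torch
from torch.distributions import Normal, kl_divergence


def adaptation_loss(pred_sigma, post_ctx_mu, post_ctx_sigma,
                    post_full_mu, post_full_sigma, lam=0.5):
    """
    pred_sigma   : predictive std-devs sigma(t) over the forecast horizon
    post_ctx_*   : latent posterior computed from the context window alone
    post_full_*  : latent posterior computed from context + model forecast
    lam          : balance between the NLL and KL terms (lambda in the loss above)
    """
    # Expected NLL under the model's own Gaussian predictive distribution reduces
    # to its entropy, so minimizing it sharpens (narrows) the forecast.
    nll = (torch.log(pred_sigma) + 0.5 * math.log(2 * math.pi * math.e)).sum(dim=-1).mean()

    # KL between context-only and context+forecast latent posteriors keeps the
    # adapted latent encoding from drifting.
    kl = kl_divergence(Normal(post_ctx_mu, post_ctx_sigma),
                       Normal(post_full_mu, post_full_sigma)).sum(dim=-1).mean()

    return lam * nll + (1.0 - lam) * kl
```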

Adaptation Algorithm

Instead of updating any of the pretrained NODE or decoder weights, AdaNODEs introduce and optimize only $\alpha, \gamma$ in the ODE:

$$\frac{d z(t)}{dt} = f_\text{node}(\alpha \odot z(t) + \gamma; \theta),$$

where $\odot$ denotes elementwise multiplication. Here,

  • $\alpha$ scales the latent dynamics (adaptation to amplitude/frequency shifts),
  • $\gamma$ shifts them (adaptation to phase/time-delay shifts).

At inference, $\alpha$ and $\gamma$ are initialized ($\alpha = 1, \gamma = 0$) and updated via gradient descent on $\mathcal{L}$, typically requiring only 1–5 steps and incurring limited computational overhead.
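
A sketch of this adaptation step, under the same assumptions as above: a frozen pretrained vector field is wrapped so that only $\alpha$ and $\gamma$ receive gradients, and a handful of optimizer steps are taken on the adaptation loss. The names (AdaptedODEFunc, adapt, loss_fn) are illustrative.

```python
# Sketch of the test-time adaptation loop: only alpha and gamma are optimized,
# while all pretrained weights stay frozen. Names and the loss call are illustrative.
import torch
import torch.nn as nn
from torchdiffeq import odeint  # swap in odeint_adjoint for lower memory use


class AdaptedODEFunc(nn.Module):
    """Wraps a frozen f_node(t, z) so that dz/dt = f_node(alpha * z + gamma)."""
    def __init__(self, frozen_ode_func, latent_dim):
        super().__init__()
        self.f = frozen_ode_func
        for p in self.f.parameters():
            p.requires_grad_(False)                  # pretrained dynamics stay fixed
        self.alpha = nn.Parameter(torch.ones(latent_dim))
        self.gamma = nn.Parameter(torch.zeros(latent_dim))

    def forward(self, t, z):
        return self.f(t, self.alpha * z + self.gamma)


def adapt(adapted_func, z0, t_all, decoder, loss_fn, steps=3, lr=5e-4):
    """Run a few gradient steps on (alpha, gamma) only; 1-5 steps are typically enough."""
    z0 = z0.detach()  # encoder output; no gradients needed for the frozen encoder
    opt = torch.optim.Adam([adapted_func.alpha, adapted_func.gamma], lr=lr)
    for _ in range(steps):
        z_path = odeint(adapted_func, z0, t_all, rtol=1e-3, atol=1e-3)
        mu, sigma = decoder(z_path)
        loss = loss_fn(mu, sigma)                    # e.g. the two-term loss sketched above
        opt.zero_grad()
        loss.backward()
        opt.step()
    return adapted_func
```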

3. Mechanisms for Handling Distribution Shifts

AdaNODEs explicitly address two primary time series shift types:

  • Amplitude or frequency shifts (e.g., changes in oscillation speed or magnitude)
  • Time delays or phase shifts (e.g., systematic early/late signal arrivals)

Shifts are parameterized at five severity levels, ranging from mild (L1) to severe (L5), affecting scaling and offset in the target time series. By scaling and shifting the latent dynamics through $\alpha, \gamma$, AdaNODEs efficiently counteract such perturbations at the ODE level, rather than relying on overparameterized adaptors or retraining.
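
As an illustration of the mechanism (a constructed example, not taken from the paper), consider a linear latent field $f_\text{node}(z) = A z$ with

$$A = \begin{pmatrix} 0 & -\omega \\ \omega & 0 \end{pmatrix}.$$

With a uniform scale $\alpha$ and shift $\gamma$, the adapted dynamics become

$$\frac{dz}{dt} = f_\text{node}(\alpha z + \gamma) = \alpha A z + A \gamma,$$

so the angular frequency is rescaled from $\omega$ to $\alpha \omega$, while the constant term $A\gamma$ moves the orbit's center from the origin to $z^* = -\gamma/\alpha$, which the decoder can read out as an offset or phase displacement of the observed signal.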

The negative log-likelihood term in the adaptation loss commits the model to confident predictions, while KL regularization maintains consistency to prevent the latent space from degenerating under severe shifts.

4. Empirical Evaluation and Benchmarks

Experiments demonstrate the efficacy of AdaNODEs across synthetic and real-world tasks:

  • Synthetic 1D signals: On tasks with amplitude/frequency or time-shift corruption, AdaNODEs achieve up to 18% relative mean squared error (MSE) reduction at the highest shift severity, with a mean gain of 5.9% over a non-adapting source model.
  • Rotating MNIST: For sequentially rotated digit images, AdaNODEs reduce MSE by 9.6% and increase Pearson’s correlation and Concordance Correlation Coefficient (CCC) by 28.4% and 28.3%, respectively, outperforming both source and state-of-the-art UDA and TTT baselines.

Ablation studies confirm that both terms in $\mathcal{L}$ are required for robust adaptation: omitting the KL term undermines adaptation under large shifts; omitting NLL leads to excessively smoothed, less informative forecasts. Qualitative analyses reveal that AdaNODEs are able to correctly slow down or speed up the latent rotation to match ground truth dynamics, unlike the fixed source model.

| Data Type | Max Impr. (MSE) | Avg Impr. (MSE) | Max Impr. (CC/CCC) |
|---|---|---|---|
| 1D synthetic | 18% | 5.9% | |
| Rotating MNIST | 9.6% | | 28.4% / 28.3% |

5. Algorithmic and Computational Considerations

The adaptation phase in AdaNODEs is computationally efficient:

  • Only two small vectors ($\alpha$, $\gamma$; typically tens to hundreds of scalars) are updated, reducing memory and computational cost.
  • The dominant overhead is the ODE solve, handled efficiently with RK45 and the adjoint sensitivity method.
  • Per-batch adaptation typically incurs a latency of a few tens of milliseconds on GPU hardware.

Hyperparameter choices are robust across tasks:

  • Learning rate $\eta$ typically in $[1 \times 10^{-4}, 1 \times 10^{-3}]$
  • Balancing parameter $\lambda \approx 0.5$ yields optimal tradeoffs between forecast confidence and latent consistency, with $[0.3, 0.7]$ performing robustly.
  • ODE solver tolerances can be relaxed (absolute/relative tolerance $1 \times 10^{-3}$) for speed-accuracy tradeoffs.

Best practices include keeping the number of adaptation steps small to avoid overfitting to noise and leveraging modern ODE solver libraries for memory efficiency.
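
As a compact illustration, the hyperparameters above can be collected in a small configuration object; the defaults mirror the reported ranges, while the field names and the dataclass itself are assumptions for illustration.

```python
# Hypothetical adaptation-time configuration; defaults mirror the ranges reported above.
from dataclasses import dataclass


@dataclass
class AdaptConfig:
    lr: float = 5e-4        # eta, typically in [1e-4, 1e-3]
    lam: float = 0.5        # NLL/KL balance; [0.3, 0.7] reported as robust
    steps: int = 3          # 1-5 gradient steps on (alpha, gamma)
    rtol: float = 1e-3      # relaxed solver tolerances trade accuracy for speed
    atol: float = 1e-3
    solver: str = "dopri5"  # adaptive RK45-style solver (e.g., in torchdiffeq)


config = AdaptConfig()
```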

6. Context and Significance in Temporal Adaptation

AdaNODEs provide a principled solution to distribution shifts in time series forecasting under strict source-free and label-free test-time adaptation constraints. Unlike most TTA literature, which presumes independent and identically distributed data or access to labeled/auxiliary data, AdaNODEs address temporal dependencies directly in the continuous-time latent space. The architecture accommodates irregular sampling, varying signal frequency, and heterogeneous temporal correlations naturally, due to the continuous ODE-based representation.

A plausible implication is broader application to other domains with evolving, temporally correlated data and unknown test-time data statistics, such as biomedical time series, IoT sensor analytics, and geophysical forecasting, particularly where retraining or target supervision at test time is impractical or prohibited.

The AdaNODEs paradigm exemplifies a “light-touch” approach—minimally invasive, interpretable parameter updates in the model’s latent dynamics—yielding robust generalization to unanticipated real-world data shifts, as demonstrated on both synthetic and complex high-dimensional sequences (Dang et al., 19 Jan 2026).
