SurF: A Generative Model for Multivariate Irregular Time Series Forecasting

Published 13 May 2026 in cs.LG | (2605.14069v1)

Abstract: Irregularly sampled multivariate event streams remain a stubbornly difficult modality for generative modeling: tokenization-based approaches break down when inter-event intervals vary by orders of magnitude, and neural temporal point processes are bottlenecked by window-level numerical quadrature. We (i) propose SurF, a generative model that uses the Time Rescaling Theorem (TRT) as a learnable bijection between event sequences and i.i.d. unit-rate exponential noise, enabling a single model to be trained across heterogeneous event-stream datasets; (ii) three efficient parameterizations of the cumulative intensity that scale to long sequences; and (iii) a Transformer-based encoder for multi-dataset pretraining. On six real-world benchmarks, SurF achieves the best reported time RMSE on Earthquake, Retweet, and Taobao, and is within trial-level noise of the strongest specialist on the remaining three. Under a strict leave-one-out protocol, the held-out checkpoint beats every classical and neural-autoregressive baseline on 5/6 datasets and beats every baseline on Amazon and Earthquake, an initial step toward foundation models over asynchronous event streams.


Authors (3)

Summary

The paper introduces a generative normalizing flow model that transforms irregular event streams into i.i.d. unit-rate exponential gaps via the time rescaling theorem for precise forecasting.
It details three parameterizations (MoE, CSB, GLQ) that deliver closed-form likelihoods, efficient sampling, and robust zero-shot transfer across heterogeneous datasets.
Empirical evaluations show SurF’s superior RMSE, calibration, and computational efficiency, establishing a new foundation for asynchronous event stream modeling.

SurF: A Normalizing Flow Approach to Irregular Multivariate Time Series Forecasting

Motivation and Problem Formulation

The paper "SurF: A Generative Model for Multivariate Irregular Time Series Forecasting" (2605.14069) presents a framework for generative modeling of irregularly sampled multivariate event streams—a modality where standard tokenization- and sequence-modeling approaches fail due to variable inter-event intervals and asynchronous channel updates. Temporal Point Processes (TPPs) provide the canonical modeling tool in such regimes, parameterizing the event rate via a conditional intensity $\lambda^*(t \mid \mathcal{H}_t)$ and employing its cumulative intensity $\Lambda^*(t) = \int_0^t \lambda^*(s \mid \mathcal{H}_s) ds$ for likelihood computation and sampling.

SurF addresses two principal bottlenecks in this setting: (1) the inefficiency of window-level numerical quadrature in existing neural TPPs, and (2) the lack of a universal reference distribution for evaluation and cross-dataset transfer, which is critical for foundation modeling in asynchronous event domains.

Theoretical Foundation: Bidirectional Time-Rescaling via the TRT

A key insight is the reinterpretation of the Time Rescaling Theorem (TRT) as a learnable bijection between observed event sequences and i.i.d. unit-rate exponential noise. Traditionally, the TRT is used as a diagnostic: if the fitted cumulative intensity is correct, rescaled inter-arrival times $\Delta z_i = \Lambda^*(t_i) - \Lambda^*(t_{i-1})$ are i.i.d. Exp(1). SurF exploits this structure not only for evaluation but also for training and generative sampling.
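The diagnostic use of the TRT can be checked numerically. The sketch below is an illustration only, not code from the paper: it assumes a hypothetical history-independent intensity `2 + sin(t)` with a known closed-form cumulative intensity, simulates events by thinning, and confirms that the rescaled gaps have mean and variance close to 1, as Exp(1) samples should.

```python
import numpy as np

def intensity(t):
    # Hypothetical ground-truth intensity: oscillating but strictly positive.
    return 2.0 + np.sin(t)

def cumulative_intensity(t):
    # Closed-form integral of the intensity above from 0 to t.
    return 2.0 * t + 1.0 - np.cos(t)

def simulate_by_thinning(t_max, lam_max, rng):
    # Thinning: propose events from a homogeneous process at rate lam_max,
    # then accept each proposal with probability intensity(t) / lam_max.
    times, t = [], 0.0
    while True:
        t += rng.exponential(1.0 / lam_max)
        if t > t_max:
            break
        if rng.uniform() * lam_max <= intensity(t):
            times.append(t)
    return np.array(times)

rng = np.random.default_rng(0)
events = simulate_by_thinning(t_max=2000.0, lam_max=3.0, rng=rng)

# Rescaled gaps: differences of the cumulative intensity at consecutive events,
# with the cumulative intensity equal to 0 at time 0.
z = cumulative_intensity(events)
rescaled_gaps = np.diff(np.concatenate(([0.0], z)))

gap_mean = rescaled_gaps.mean()
gap_var = rescaled_gaps.var()
print(f"events: {len(events)}, mean: {gap_mean:.3f}, variance: {gap_var:.3f}")
```

If the wrong cumulative intensity were used for rescaling, the mean and variance would drift away from 1, which is what makes the rescaled gaps usable as a diagnostic.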

By enforcing strict smoothness and positivity of the intensity, SurF guarantees the invertibility of $\Lambda^*$ per the inverse function theorem (see the reverse-rescaling theorem in the paper), allowing both forward mapping (observed times to exponential noise) and reverse mapping (noise to sampled event times). The theoretical apparatus ensures the transformation is lossless and that the change-of-variables Jacobian is diagonal, yielding tractable likelihoods.
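As a sketch of why the likelihood is tractable, here is the standard point-process change of variables (not necessarily the exact loss used in the paper). Each rescaled time $z_i = \Lambda^*(t_i)$ depends only on $t_1, \dots, t_i$, so the log-determinant reduces to a sum of $\log \partial z_i / \partial t_i = \log \lambda^*(t_i \mid \mathcal{H}_{t_i})$. With an Exp(1) base density on the gaps $\Delta z_i$, and ignoring the survival term for the window after the last event, the log-likelihood of $n$ observed event times is:

$$
\begin{aligned}
\log p(t_1, \dots, t_n)
&= \sum_{i=1}^{n} \left[ \log p_{\mathrm{Exp}(1)}(\Delta z_i) + \log \frac{\partial z_i}{\partial t_i} \right] \\
&= \sum_{i=1}^{n} \left[ \log \lambda^*(t_i \mid \mathcal{H}_{t_i}) - \Delta z_i \right],
\qquad \Delta z_i = \Lambda^*(t_i) - \Lambda^*(t_{i-1}).
\end{aligned}
$$

The second line uses $\log p_{\mathrm{Exp}(1)}(x) = -x$ for $x \ge 0$.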

Figure 1: The SurF noising–denoising framework; event dynamics collapse to independent exponential gaps following time-rescaling, enabling lossless generative reversal.
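The reverse direction can be sketched in the same spirit. The example below is a minimal illustration under stated assumptions, not the paper's sampler: it reuses a hypothetical closed-form cumulative intensity for the intensity `2 + sin(t)`, inverts it by bisection (the helper `invert_cumulative` is made up for this sketch), and checks that mapping the sampled times forward again recovers the original Exp(1) noise.

```python
import numpy as np

def cumulative_intensity(t):
    # Hypothetical strictly increasing cumulative intensity; its derivative
    # 2 + sin(t) is at least 1 everywhere, so an inverse exists.
    return 2.0 * t + 1.0 - np.cos(t)

def invert_cumulative(target, lo, num_iters=100):
    # Find t >= lo with cumulative_intensity(t) == target by bisection.
    # Because the intensity is at least 1, moving forward by d raises the
    # cumulative intensity by at least d, so this upper bound brackets the root.
    hi = lo + (target - cumulative_intensity(lo)) + 1.0
    for _ in range(num_iters):
        mid = 0.5 * (lo + hi)
        if cumulative_intensity(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def sample_event_times(noise):
    # Reverse mapping: cumulative sums of the noise are the target values
    # of the cumulative intensity at successive event times.
    targets = np.cumsum(noise)
    times, t_prev = [], 0.0
    for target in targets:
        t_prev = invert_cumulative(target, t_prev)
        times.append(t_prev)
    return np.array(times)

rng = np.random.default_rng(1)
noise = rng.exponential(1.0, size=100)      # i.i.d. Exp(1) draws
times = sample_event_times(noise)           # noise -> event times

# Forward mapping applied to the samples should give back the noise.
recovered = np.diff(np.concatenate(([0.0], cumulative_intensity(times))))
max_round_trip_error = np.max(np.abs(recovered - noise))
print(f"max round-trip error: {max_round_trip_error:.2e}")
```

Bisection is used here only because it is simple and robust for a monotone function; a history-dependent model would condition the cumulative intensity on previously sampled events at each step.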

Model Architecture and Parameterizations

SurF models the cumulative intensity directly, inverting the classic intensity-first approach. Three parameterizations are proposed:

MoE (Mixture of Exponentials): Closed-form, monotonic decay, suited to processes with simple decaying intensity patterns.
CSB (Cumulative Softplus Basis): Closed-form, accommodates non-monotonic shapes via mixtures of sigmoid-like primitives, proven to be universal for positive functions (see the CSB universality proposition in the paper).
GLQ (Gauss–Legendre Quadrature): Highly flexible, unconstrained positive MLP with fixed-cost per-interval quadrature; numerical error is empirically negligible with $Q=8$.
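To make the trade-off between closed-form and quadrature-based variants concrete, the sketch below compares the two on a toy intensity. Everything in it is an assumption for illustration: the intensity is a hypothetical positive mixture of sigmoids (whose integral is a sum of softplus terms, in the spirit of a softplus basis), the weights `a`, `b`, `c` are arbitrary, and the sigmoid mixture stands in for the MLP that GLQ would integrate. The paper's exact parameterizations may differ.

```python
import numpy as np

# Hypothetical basis parameters: positive weights a, positive slopes b, offsets c.
a = np.array([0.5, 1.5, 0.8])
b = np.array([1.0, 3.0, 0.5])
c = np.array([-2.0, 0.5, 1.0])

def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return np.logaddexp(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def intensity(tau):
    # Positive intensity as a weighted sum of sigmoids; accepts scalars or arrays.
    tau = np.asarray(tau, dtype=float)
    return np.sum(a * sigmoid(b * tau[..., None] + c), axis=-1)

def cumulative_closed_form(tau):
    # Exact integral of the intensity over [0, tau]:
    # d/dtau softplus(b * tau + c) / b = sigmoid(b * tau + c).
    return float(np.sum(a / b * (softplus(b * tau + c) - softplus(c))))

def cumulative_glq(tau, num_nodes=8):
    # Gauss-Legendre quadrature: map nodes from [-1, 1] to [0, tau].
    nodes, weights = np.polynomial.legendre.leggauss(num_nodes)
    s = 0.5 * tau * (nodes + 1.0)
    return float(0.5 * tau * np.sum(weights * intensity(s)))

tau = 1.5
exact = cumulative_closed_form(tau)
approx = cumulative_glq(tau)
abs_error = abs(exact - approx)
print(f"closed form: {exact:.10f}, 8-node quadrature: {approx:.10f}")
```

The quadrature cost is fixed at `num_nodes` intensity evaluations per interval regardless of sequence length, which is the property the GLQ description relies on.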

Each variant computes the SurF amortized loss efficiently, avoiding costly window-level quadrature that inhibits scalability in prior neural TPPs.

SurF as a Foundation Model: Cross-Dataset and Zero-Shot Capabilities

SurF leverages a Transformer-based encoder for shared history representation, enabling a single model to generalize across datasets with heterogeneous timescales and event dynamics. The crucial practical claim is that in the SurF training objective, all datasets are mapped to the same Exp(1) canonical target. This dataset-invariance ensures robust zero-shot transfer: a model trained on a mix of event streams can predict on unseen datasets without fine-tuning.

Empirical Results

SurF is evaluated on six benchmarks (Taobao, Taxi, Retweet, StackOverflow, Amazon, Earthquake) covering diverse domains and temporal granularity. The main metrics are time RMSE for inter-arrival prediction and type (event mark) accuracy.

Best Reported RMSE: SurF achieves the lowest time RMSE on Earthquake, Retweet, and Taobao, and is within trial-level noise of the strongest specialist on Taxi, Amazon, and StackOverflow.
Zero-Shot Superiority: Under strict leave-one-out cross-dataset protocols, SurF outperforms classical and neural autoregressive baselines on $5/6$ datasets without dataset-specific adaptation.
Calibration and Goodness-of-Fit: SurF produces well-calibrated event densities; empirical CDFs of rescaled gaps match the Exp(1) distribution closely (Figure 2), indicating nearly ideal calibration. On each dataset, the lowest negative log-likelihood comes from the parameterization best suited to that dataset's dynamics.
Figure 2: Time-rescaling goodness-of-fit; SurF residuals are closely matched to Exp(1), demonstrating calibration across datasets and parameterizations.
Multi-Horizon Forecasting Stability: SurF retains near-optimal RMSE and type accuracy over iterated forecasting horizons, both in finetuned and zero-shot settings.
Figure 3: Per-horizon inter-arrival RMSE and type accuracy, remaining stable with repeated forecasting steps for both finetuned SurF variants and zero-shot checkpoints.
Recovery of Intensity Structure: On synthetic oscillatory processes, SurF recovers the true firing rate and ISI distribution precisely, while standard Hawkes models fail to capture key dynamics.
Figure 4: SurF recovers true oscillatory intensity and interval distributions; classical Hawkes processes exhibit substantial bias and noise.
Forecast Trajectories: Visualization of forecasted event sequences shows that finetuned SurF closely matches ground-truth event times, and pretrained SurF remains competitive without adaptation.
Figure 5: Forecast trajectories across datasets; markers show actual events (circles) alongside pretrained (squares) and finetuned (triangles) forecasts of event times and types.
Computational Efficiency: SurF-MoE and SurF-CSB require no numerical integration and are markedly faster than quadrature-based neural TPPs. SurF-GLQ maintains fixed per-interval computation for scalability.
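The calibration check described above (comparing the empirical CDF of rescaled gaps against Exp(1)) can be sketched with a Kolmogorov-Smirnov statistic. This is an illustrative example on synthetic gaps, not a reproduction of the paper's evaluation; the helper name `ks_statistic_exp1` and the 1.5x miscalibration factor are assumptions chosen for the demonstration.

```python
import numpy as np

def ks_statistic_exp1(gaps):
    # Largest vertical distance between the empirical CDF of the gaps
    # and the Exp(1) CDF, 1 - exp(-x).
    x = np.sort(np.asarray(gaps, dtype=float))
    n = len(x)
    model_cdf = 1.0 - np.exp(-x)
    upper = np.arange(1, n + 1) / n - model_cdf   # ECDF just after each point
    lower = model_cdf - np.arange(0, n) / n       # ECDF just before each point
    return float(max(upper.max(), lower.max()))

rng = np.random.default_rng(2)
n = 5000

# Gaps from a well-specified model are Exp(1).
well_calibrated = rng.exponential(1.0, size=n)
# Gaps as they would look if the fitted intensity were 1.5x too high.
miscalibrated = 1.5 * rng.exponential(1.0, size=n)

ks_good = ks_statistic_exp1(well_calibrated)
ks_bad = ks_statistic_exp1(miscalibrated)
critical_5pct = 1.36 / np.sqrt(n)   # asymptotic 5% critical value

print(f"well calibrated: {ks_good:.4f}")
print(f"miscalibrated:   {ks_bad:.4f}")
print(f"5% critical:     {critical_5pct:.4f}")
```

A statistic well above the critical value signals that the rescaled gaps are not Exp(1), meaning the fitted cumulative intensity is misspecified.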

Practical and Theoretical Implications

SurF establishes a constructive normalizing flow that enables efficient likelihood computation, exact generative sampling, and calibrated evaluation for irregular multivariate event streams. The explicit framing of the TRT as a trainable bijection advances the field toward dataset-invariant foundation models for asynchronous modalities, a regime that has been historically inaccessible to sequence models trained via tokenization.

Methodologically, SurF delivers:

Closed-form or nearly unbiased likelihoods, decoupled from sequence/window length.
Stable $O(N)$ gradient flow, improving numerical properties relative to RNN-based TPPs.
Universal expressiveness via CSB and GLQ parameterizations, permitting adaptation to arbitrary intensity dynamics.

The zero-shot transfer demonstrated empirically and motivated theoretically positions SurF as a launching point for foundation-scale modeling of heterogeneous event streams—potentially impacting fields ranging from healthcare, geoscience, and financial modeling to complex sensor systems.

Speculation and Future Directions

Scaling SurF to vast corpora of asynchronous event streams is an open challenge. Current applicability is limited by the strict positivity enforced for bijectivity; extending to piecewise-zero intensity (true dead-time) via explicit masking or a learnable floor is theoretically sound but may introduce practical biases in certain regimes.

Further, richer parameterizations for cumulative intensity (e.g., neural spline flows, adaptive basis mixtures) and improved mark modeling (event type) will enhance SurF's adaptability. The explicit bijection structure opens opportunities for out-of-distribution generalization, uncertainty quantification, and domain adaptation via latent canonical spaces.

Conclusion

SurF delivers a principled generative model for irregular multivariate time series, reframing temporal point process modeling as a learnable normalizing flow. Through explicit bidirectional time-rescaling, scalable amortized loss, and robust empirical performance—including strict zero-shot cross-dataset transfer—SurF advances the theoretical and practical foundation for modeling asynchronous event streams, laying the groundwork for universal, foundation-scale forecasting models in this challenging domain.
