
Temporal Point Processes: Models and Advances

Updated 8 February 2026
  • Temporal point processes are stochastic models that represent sequences of discrete events in continuous time using conditional intensity functions.
  • They evolve from classical models like Poisson and Hawkes to advanced neural and nonparametric approaches, enhancing forecasting and uncertainty quantification.
  • Recent advances include parallel sampling, intensity-free methods, and joint mark-time modeling to improve interpretability and scalability.

A temporal point process (TPP) is a stochastic process whose realizations are sequences of discrete events localized in continuous time. TPPs provide a foundational framework for modeling asynchronous event data across diverse scientific, engineering, and social domains. Their mathematical core is the event occurrence mechanism described by the conditional intensity function, which quantifies the instantaneous rate of event arrivals conditioned on the complete event history. Over the past decade, TPPs have evolved from classical models (such as Poisson and Hawkes processes) to sophisticated neural and nonparametric representations. The resulting models now underpin state-of-the-art methods in event sequence forecasting, generative modeling, uncertainty quantification, and adaptivity to complex temporal patterns.

1. Mathematical Foundations of Temporal Point Processes

A TPP defined on $[0,T]$ generates a realization, a sequence of event times $0 < t_1 < t_2 < \cdots < t_N \le T$, and, in the marked case, associated marks $m_i$ drawn from a finite or structured set. The process history up to time $t$ is $\mathcal{H}_t = \{(t_j, m_j) : t_j < t\}$.

The canonical characterizing function is the conditional intensity

$$\lambda^*(t \mid \mathcal{H}_t) = \lim_{\Delta t \to 0} \frac{\mathbb{P}(\text{event in } [t, t+\Delta t) \mid \mathcal{H}_t)}{\Delta t},$$

or, in multivariate (marked) settings,

$$\lambda^*_k(t \mid \mathcal{H}_t) = \lim_{\Delta t \to 0} \frac{\mathbb{P}(\text{type-}k \text{ event in } [t, t+\Delta t) \mid \mathcal{H}_t)}{\Delta t}.$$

The log-likelihood for an observed sequence $\{(t_i, m_i)\}_{i=1}^N$ is

$$\mathcal{L} = \sum_{i=1}^N \log \lambda^*_{m_i}(t_i \mid \mathcal{H}_{t_i}) - \sum_{k=1}^K \int_0^T \lambda^*_k(u \mid \mathcal{H}_u)\, du$$

(Shchur et al., 2021, Zhou et al., 24 Jan 2025). The survival function, hazard function, and CDF/density of the next event time are interrelated via integral transforms, forming the basis for both theory and inference.
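To make the likelihood concrete, the sketch below evaluates $\mathcal{L}$ for a univariate Hawkes process with exponential kernel $\phi(t) = \alpha\beta e^{-\beta t}$, for which the compensator integral has a closed form; the function name, parameter values, and event times are illustrative choices, not taken from the cited works.

```python
import numpy as np

def hawkes_log_likelihood(event_times, T, mu, alpha, beta):
    """Log-likelihood of a univariate Hawkes process with kernel
    phi(t) = alpha * beta * exp(-beta * t) on the window [0, T]."""
    t = np.asarray(event_times, dtype=float)
    log_intensities = 0.0
    decay = 0.0   # running value of sum_{j<i} exp(-beta (t_i - t_j))
    prev = 0.0
    for ti in t:
        decay = decay * np.exp(-beta * (ti - prev))
        log_intensities += np.log(mu + alpha * beta * decay)
        decay += 1.0
        prev = ti
    # Compensator: integral_0^T lambda*(u) du = mu*T + alpha * sum_i (1 - exp(-beta (T - t_i)))
    compensator = mu * T + alpha * np.sum(1.0 - np.exp(-beta * (T - t)))
    return log_intensities - compensator

# Illustrative usage with made-up event times and parameters.
print(hawkes_log_likelihood([0.5, 1.2, 1.3, 2.7], T=5.0, mu=0.8, alpha=0.5, beta=1.5))
```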

2. Classical and Neural Model Classes

Classical TPPs employ hand-crafted intensities:

  • Poisson: constant intensity, no history dependence.
  • Hawkes: intensity excited by past events, typically via decaying kernels: $\lambda^*(t) = \mu + \sum_{t_i < t} \phi(t - t_i)$ (see the simulation sketch after this list).
  • Self-correcting, power-law, and mixture-kernel variants: capture further forms of temporal dependence (Potter et al., 20 Mar 2025).
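The self-exciting structure of the Hawkes bullet above is easiest to see by simulation. Below is a minimal sketch of Ogata's thinning algorithm for the exponential-kernel case; the function name, horizon, and parameter values are illustrative.

```python
import numpy as np

def simulate_hawkes_thinning(mu, alpha, beta, T, rng=None):
    """Sample a univariate Hawkes process with kernel phi(t) = alpha*beta*exp(-beta*t)
    on [0, T] via Ogata's thinning algorithm."""
    rng = np.random.default_rng() if rng is None else rng
    events, t = [], 0.0

    def intensity(s):
        return mu + sum(alpha * beta * np.exp(-beta * (s - ti)) for ti in events)

    while t < T:
        lam_bar = intensity(t)                 # intensity only decays until the next accepted event
        t += rng.exponential(1.0 / lam_bar)    # candidate from a dominating Poisson process
        if t < T and rng.uniform() * lam_bar <= intensity(t):
            events.append(t)                   # accept with probability lambda*(t) / lam_bar
    return events

events = simulate_hawkes_thinning(mu=0.5, alpha=0.6, beta=1.2, T=20.0)
print(len(events), "events simulated")
```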

Neural TPPs generalize this by parameterizing the history-to-intensity mapping via deep networks:

  • RNN-based: a hidden state $h_i$ encodes the history; $\lambda^*(t) = \mathrm{MLP}(h_{i-1},\, t - t_{i-1})$ (Shchur et al., 2021); a minimal sketch follows this list.
  • Transformer/Attention-based: Self-attention layers aggregate past events; decoder MLP outputs intensity/density (Xue et al., 2023, Meng et al., 2024).
  • Neural ODE/SDE TPPs: an ODE/SDE governs the continuous-time hidden trajectory, with discrete updates at event arrivals (Zhou et al., 24 Jan 2025).
  • Intensity-free models: Conditional density of next inter-event time is modeled directly via flows or mixtures, bypassing explicit intensity parameterization (Shchur et al., 2019, Mehrasa et al., 2019).
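The RNN-based design from the list above can be sketched as follows, assuming a GRU history encoder and a softplus-activated MLP head; the class name, layer sizes, and wiring are illustrative assumptions rather than a reference implementation of the cited models.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RNNIntensityTPP(nn.Module):
    """Minimal RNN-based TPP: a GRU summarizes the event history and an MLP
    maps (previous hidden state, elapsed time) to a positive conditional intensity."""

    def __init__(self, hidden_dim=32):
        super().__init__()
        self.gru = nn.GRU(input_size=1, hidden_size=hidden_dim, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(hidden_dim + 1, hidden_dim), nn.Tanh(), nn.Linear(hidden_dim, 1)
        )

    def forward(self, inter_times):
        # inter_times: (batch, seq_len) positive inter-event times tau_i = t_i - t_{i-1}
        x = inter_times.unsqueeze(-1)                      # (batch, seq_len, 1)
        h, _ = self.gru(x)                                 # h_i summarizes events up to i
        h_prev = torch.cat([torch.zeros_like(h[:, :1]), h[:, :-1]], dim=1)  # h_{i-1}
        raw = self.mlp(torch.cat([h_prev, x], dim=-1))     # intensity evaluated at each event time
        return F.softplus(raw).squeeze(-1)                 # (batch, seq_len), lambda* > 0

# Illustrative forward pass on random inter-event times.
model = RNNIntensityTPP()
taus = torch.rand(4, 10) + 0.1
print(model(taus).shape)  # torch.Size([4, 10])
```

Training with the log-likelihood of Section 1 additionally requires the compensator integral, which for neural intensities is typically approximated numerically (see Section 4.1).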

Recent progress includes flow-based non-autoregressive sampling and continuous-time Markov process formulations (Lüdke et al., 7 Oct 2025, Shchur et al., 2020), as well as benchmark suites for reproducible evaluation (Xue et al., 2023).

3. Beyond Intensity: Generative, Uncertainty, and Interpretability Advances

3.1. Non-Autoregressive and Flow-Based Models

Recent models bypass sequential sampling bottlenecks by learning generative flows over event sequences. In Edit-Based Flow Matching (EdiTPP), sequence generation proceeds via a continuous-time Markov chain of insert-delete-substitute edits, parameterized by learned edit rates. This non-autoregressive process allows sampling complete sequences in parallel, yielding statistically and computationally superior generation compared to autoregressive neural TPPs. EdiTPP achieves the best sample quality and 2–5× speedup on benchmarks, and further allows tuning the compute-accuracy tradeoff via the CTMC resolution (Lüdke et al., 7 Oct 2025).

TriTPP parameterizes the TPP density as a triangular normalizing flow from homogeneous Poisson reference times to real events, enabling parallelized likelihood evaluation and fast sampling. Maximum-likelihood estimation remains exact, and the framework supports variational inference in latent continuous-time discrete-state models (Shchur et al., 2020).

3.2. Intensity-Free and Distributional Learning

Intensity-free approaches model the conditional probability density of inter-event times directly using universal one-dimensional normalizing flows or log-normal mixtures, allowing for exact likelihoods, efficient ancestral sampling, and closed-form moments. This paradigm provides state-of-the-art predictive accuracy and unique capabilities such as missing-data imputation and sequence embedding (Shchur et al., 2019, Mehrasa et al., 2019, Subramanian et al., 27 Nov 2025).
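A minimal sketch of the log-normal mixture idea, assuming a fixed-size history embedding produced by some encoder; the mixture size, class name, and head layout are illustrative and not the exact parameterization of the cited papers.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical, LogNormal, MixtureSameFamily

class LogNormMixDecoder(nn.Module):
    """Maps a history embedding to a log-normal mixture over the next inter-event time."""

    def __init__(self, context_dim=32, n_components=8):
        super().__init__()
        # One linear head produces mixture logits, component means, and log-scales.
        self.head = nn.Linear(context_dim, 3 * n_components)

    def forward(self, context):
        logits, mu, log_sigma = self.head(context).chunk(3, dim=-1)
        mix = Categorical(logits=logits)
        comp = LogNormal(mu, log_sigma.exp())
        return MixtureSameFamily(mix, comp)

decoder = LogNormMixDecoder()
ctx = torch.randn(16, 32)            # stand-in history embeddings from an encoder
dist = decoder(ctx)
tau = torch.rand(16) + 0.1           # observed inter-event times
nll = -dist.log_prob(tau).mean()     # exact likelihood, no intensity integral needed
sample = dist.sample()               # efficient ancestral sampling
print(nll.item(), sample.shape)
```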

CDF-based methods such as CuFun sidestep the numerical and modeling constraints of explicit intensity forms by parameterizing the cumulative hazard with a monotonic neural network, leveraging RNN-based history encoding and multiplicative scaling for long-range dependencies. This yields both stable likelihood training and improved capture of periodic structure relative to intensity-based TPPs (Wang et al., 2024).
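The cumulative-hazard route can be illustrated generically: if a monotone network outputs $\Lambda(\tau)$, the next-event density follows as $f(\tau) = \Lambda'(\tau)\, e^{-\Lambda(\tau)}$, with the derivative obtained by automatic differentiation. The sketch below uses a toy monotone parameterization (positive weights via softplus) and is not the CuFun architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MonotoneCumHazard(nn.Module):
    """Toy monotone network for the cumulative hazard Lambda(tau): positive weights
    and increasing activations keep the output non-decreasing in tau, with Lambda(0) = 0."""

    def __init__(self, hidden=16):
        super().__init__()
        self.w1 = nn.Parameter(torch.randn(hidden, 1))
        self.b1 = nn.Parameter(torch.zeros(hidden))
        self.w2 = nn.Parameter(torch.randn(1, hidden))
        self.b2 = nn.Parameter(torch.zeros(1))

    def _g(self, x):
        h = torch.tanh(x @ F.softplus(self.w1).T + self.b1)      # increasing in x
        return F.softplus(h @ F.softplus(self.w2).T + self.b2)   # positive, still increasing

    def forward(self, tau):
        return self._g(tau) - self._g(torch.zeros_like(tau))     # anchor Lambda(0) = 0

net = MonotoneCumHazard()
tau = torch.rand(8, 1) + 0.1
tau.requires_grad_(True)
Lam = net(tau)
# Hazard lambda(tau) = dLambda/dtau via autograd; log f(tau) = log lambda(tau) - Lambda(tau).
hazard = torch.autograd.grad(Lam.sum(), tau, create_graph=True)[0]
log_density = torch.log(hazard + 1e-12) - Lam
print(log_density.shape)  # torch.Size([8, 1])
```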

3.3. Joint Mark-Time and Multivariate Modeling

Classical neural TPPs often posit conditional independence between event time and mark. Recent work directly links the distribution of inter-arrival times to the next event’s mark, modeling their joint distribution either via multivariate intensity parameterization or, more tractably, with a distinct per-type density. This approach yields improved fit and micro-F1 scores across synthetic and real data (Waghmare et al., 2022). Moreover, distribution-free conformal prediction methods now enable construction of joint prediction regions for next time and mark with provable finite-sample marginal coverage (Dheur et al., 2024).
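One tractable realization of the per-type-density idea is to factor the joint distribution as $p(m, \tau \mid \mathcal{H}) = p(m \mid \mathcal{H})\, p(\tau \mid m, \mathcal{H})$, with a separate inter-event-time distribution per mark. The sketch below pairs a categorical mark head with per-mark log-normal time densities; the class name, sizes, and parameterization are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torch.distributions import LogNormal

class JointMarkTimeHead(nn.Module):
    """Joint next-event head: p(m | h) * p(tau | m, h) with per-mark log-normal times."""

    def __init__(self, context_dim=32, n_marks=5):
        super().__init__()
        self.mark_logits = nn.Linear(context_dim, n_marks)
        self.time_params = nn.Linear(context_dim, 2 * n_marks)  # (mu, log_sigma) per mark

    def log_prob(self, context, mark, tau):
        logp_mark = torch.log_softmax(self.mark_logits(context), dim=-1)
        mu, log_sigma = self.time_params(context).chunk(2, dim=-1)
        # Select the time-distribution parameters belonging to the observed mark.
        idx = mark.unsqueeze(-1)
        mu_m = mu.gather(-1, idx).squeeze(-1)
        sig_m = log_sigma.gather(-1, idx).squeeze(-1).exp()
        logp_time = LogNormal(mu_m, sig_m).log_prob(tau)
        return logp_mark.gather(-1, idx).squeeze(-1) + logp_time

head = JointMarkTimeHead()
ctx = torch.randn(16, 32)
marks = torch.randint(0, 5, (16,))
taus = torch.rand(16) + 0.1
print((-head.log_prob(ctx, marks, taus).mean()).item())  # joint NLL
```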

4. Model Training, Benchmarking, and Theory

4.1. Supervised, Semi-Supervised, and Meta-Learning

Standard neural TPP models are trained by maximum likelihood, requiring evaluation or approximation of the integral term in the log-likelihood (Shchur et al., 2021, Xue et al., 2023). Semi-supervised extensions incorporate unlabeled sequences via auxiliary reconstruction objectives, robustifying marker prediction under partial annotation regimes (Reddy et al., 2021). Meta-learning approaches reframe each event sequence as a distinct task, leveraging permutation-invariant context aggregation and latent-variable hierarchies (e.g., neural process or attentive neural process analogs), which improves generalization, especially in nonstationary or partially observed environments (Bae et al., 2023).
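The integral term is rarely available in closed form for neural intensities; a common workaround is Monte Carlo integration over the observation window. A minimal sketch, assuming an arbitrary callable intensity (the toy intensity below stands in for a neural one):

```python
import torch

def mc_compensator(intensity_fn, T, n_samples=1000):
    """Monte Carlo estimate of the compensator integral_0^T lambda*(u) du.

    intensity_fn: callable mapping a tensor of times (n,) to intensities (n,).
    The estimate T * mean(lambda*(u_j)) with u_j ~ Uniform(0, T) is unbiased.
    """
    u = torch.rand(n_samples) * T
    return T * intensity_fn(u).mean()

# Illustrative check against a closed-form integral of a toy decaying intensity.
toy_intensity = lambda u: 1.0 + torch.exp(-u)
approx = mc_compensator(toy_intensity, T=5.0)
exact = 5.0 + (1.0 - torch.exp(torch.tensor(-5.0)))
print(approx.item(), exact.item())
```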

4.2. Benchmarks and Evaluation

Open-source benchmarking tools such as EasyTPP provide standardized data interfaces, evaluation suites, reference implementations, and reproducibility infrastructure for neural TPPs. Key evaluation metrics include negative log-likelihood (NLL), time- and mark-RMSE, event-type error rate, optimal transport distances on long-horizon sequences, and significance assessment via permutation tests (Xue et al., 2023). Unified experimental protocols have clarified the importance of time and mark embedding choice, history encoder structure, and decoder parameterization: vectorial and learnable time embeddings combined with log-normal mixture decoders generally yield the best predictive and calibration performance (Bosser et al., 2023).

4.3. Theoretical Guarantees

Rigorous analyses have recently established approximation and generalization guarantees for RNN-TPPs. Multi-layer tanh networks with bounded spectral norm approximate a broad class of intensity functions (Poisson, (non)linear Hawkes, self-correcting) to arbitrary precision over bounded-length histories. Excess risk bounds for the NLL loss scale as $\tilde{O}(n^{-s/2(s+1)})$ in sample size $n$ under regularity assumptions, and an RNN depth of $L \le 4$ suffices for universal TPP approximation (Chen et al., 2024). Tail-truncation and covering-number arguments address the challenge of an unbounded event count per window.

5. Extensions: Covariates, Interpretability, and LLM Integration

5.1. Covariate-Augmented TPPs and Interpretability

Transformer-based covariate TPPs such as TransFeat-TPP encode time, mark, and feature/covariate vectors into joint embeddings, learning the conditional density of the next event via a log-normal mixture. An attention-based feature-importance module simultaneously recovers interpretable, unsupervised rankings of covariate relevance, per event or globally, and the learned importances match domain knowledge on real datasets (e.g., weather covariates for pollution, clinical measurements for disease) (Meng et al., 2024).

Hybrid-rule TPPs incorporate symbolic temporal-logic rules (interpreted as decayed history contributions) and continuous covariate intensities, discovered via Bayesian-optimized rule mining. This compositional design provably improves both predictive accuracy and interpretability over existing symbolic or neural TPPs in medical event modeling (Cao et al., 15 Apr 2025).

5.2. Continual, Federated, and LLM-Based TPPs

Prompt-based TPPs (PromptTPP) tackle distributional drift and catastrophic forgetting in streaming-data scenarios using small, learnable prompt pools for continual learning, retrieved and fused with event encodings, enabling adaptation under privacy and memory constraints (Xue et al., 2023).

Models such as TPP-LLM integrate pretrained LLMs as semantic decoders, leveraging marker textual descriptions and temporal encodings, with parameter-efficient fine-tuning (e.g., LoRA) for event forecasting. These models outperform classical and transformer-based neural TPP baselines on log-likelihood, accuracy, and RMSE, and facilitate event sequence modeling in settings with rich semantic structure in event types (Liu et al., 2024). LLM-powered TPPs open new directions for causal, multimodal, and few-shot event modeling (Zhou et al., 24 Jan 2025).

6. Open Problems and Future Directions

Key open avenues for TPP research include scaling non-autoregressive and parallel samplers to long horizons, principled uncertainty quantification for event forecasts, and deeper integration of LLMs for causal, multimodal, and few-shot event modeling (Zhou et al., 24 Jan 2025).

Together, these developments establish TPPs as a core mathematical object for modern event sequence machine learning, integrating statistical rigor, neural and symbolic expressivity, interpretability, and computational tractability.
