
Temporal Point Processes: Models and Advances

Updated 8 February 2026
  • Temporal point processes are stochastic models that represent sequences of discrete events in continuous time using conditional intensity functions.
  • They evolve from classical models like Poisson and Hawkes to advanced neural and nonparametric approaches, enhancing forecasting and uncertainty quantification.
  • Recent advances include parallel sampling, intensity-free methods, and joint mark-time modeling to improve interpretability and scalability.

A temporal point process (TPP) is a stochastic process whose realizations are sequences of discrete events localized in continuous time. TPPs provide a foundational framework for modeling asynchronous event data across diverse scientific, engineering, and social domains. Their mathematical core is the event occurrence mechanism described by the conditional intensity function, which quantifies the instantaneous rate of event arrivals conditioned on the complete event history. Over the past decade, TPPs have evolved from classical models (such as Poisson and Hawkes processes) to sophisticated neural and nonparametric representations. The resulting models now underpin state-of-the-art methods in event sequence forecasting, generative modeling, uncertainty quantification, and adaptivity to complex temporal patterns.

1. Mathematical Foundations of Temporal Point Processes

A TPP defined on $[0,T]$ generates a realization, a sequence of event times $0 < t_1 < t_2 < \cdots < t_N \le T$, and, in the marked case, associated marks $m_i$ drawn from a finite or structured set. The process history up to time $t$ is $\mathcal{H}_t = \{(t_j, m_j) : t_j < t\}$.

The canonical characterizing function is the conditional intensity

$$\lambda^*(t \mid \mathcal{H}_t) = \lim_{\Delta t \to 0} \frac{\mathbb{P}(\text{event in } [t, t+\Delta t) \mid \mathcal{H}_t)}{\Delta t},$$

or, in multivariate (marked) settings,

$$\lambda^*_k(t \mid \mathcal{H}_t) = \lim_{\Delta t \to 0} \frac{\mathbb{P}(\text{type-}k \text{ event in } [t, t+\Delta t) \mid \mathcal{H}_t)}{\Delta t}.$$

The log-likelihood for an observed sequence $\{(t_i, m_i)\}_{i=1}^N$ is

$$\mathcal{L} = \sum_{i=1}^N \log \lambda^*_{m_i}(t_i \mid \mathcal{H}_{t_i}) - \sum_{k=1}^K \int_0^T \lambda^*_k(u \mid \mathcal{H}_u)\, du$$

(Shchur et al., 2021, Zhou et al., 24 Jan 2025). The survival function, hazard function, and CDF/density of the next event time are interrelated via integral transforms, forming the basis for both theory and inference.
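To make the likelihood concrete, the sketch below evaluates $\mathcal{L}$ for a univariate Hawkes process with exponential kernel $\phi(t) = \alpha\beta e^{-\beta t}$, for which the compensator integral has a closed form; the function name, parameter values, and event times are illustrative choices, not taken from the cited works.

```python
import numpy as np

def hawkes_log_likelihood(event_times, T, mu, alpha, beta):
    """Log-likelihood of a univariate Hawkes process with kernel
    phi(t) = alpha * beta * exp(-beta * t) on the window [0, T]."""
    t = np.asarray(event_times, dtype=float)
    log_intensities = 0.0
    decay = 0.0   # running value of sum_{j<i} exp(-beta (t_i - t_j))
    prev = 0.0
    for ti in t:
        decay = decay * np.exp(-beta * (ti - prev))
        log_intensities += np.log(mu + alpha * beta * decay)
        decay += 1.0
        prev = ti
    # Compensator: integral_0^T lambda*(u) du = mu*T + alpha * sum_i (1 - exp(-beta (T - t_i)))
    compensator = mu * T + alpha * np.sum(1.0 - np.exp(-beta * (T - t)))
    return log_intensities - compensator

# Illustrative usage with made-up event times and parameters.
print(hawkes_log_likelihood([0.5, 1.2, 1.3, 2.7], T=5.0, mu=0.8, alpha=0.5, beta=1.5))
```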

2. Classical and Neural Model Classes

Classical TPPs employ hand-crafted intensities:

  • Poisson: constant intensity, no history dependence.
  • Hawkes: intensity excited by past events, typically via decaying kernels: $\lambda^*(t) = \mu + \sum_{t_i < t} \phi(t - t_i)$ (see the simulation sketch after this list).
  • Self-correcting, power-law, and mixture-kernel variants: capture further forms of temporal dependence (Potter et al., 20 Mar 2025).
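The self-exciting structure of the Hawkes bullet above is easiest to see by simulation. Below is a minimal sketch of Ogata's thinning algorithm for the exponential-kernel case; the function name, horizon, and parameter values are illustrative.

```python
import numpy as np

def simulate_hawkes_thinning(mu, alpha, beta, T, rng=None):
    """Sample a univariate Hawkes process with kernel phi(t) = alpha*beta*exp(-beta*t)
    on [0, T] via Ogata's thinning algorithm."""
    rng = np.random.default_rng() if rng is None else rng
    events, t = [], 0.0

    def intensity(s):
        return mu + sum(alpha * beta * np.exp(-beta * (s - ti)) for ti in events)

    while t < T:
        lam_bar = intensity(t)                 # intensity only decays until the next accepted event
        t += rng.exponential(1.0 / lam_bar)    # candidate from a dominating Poisson process
        if t < T and rng.uniform() * lam_bar <= intensity(t):
            events.append(t)                   # accept with probability lambda*(t) / lam_bar
    return events

events = simulate_hawkes_thinning(mu=0.5, alpha=0.6, beta=1.2, T=20.0)
print(len(events), "events simulated")
```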

Neural TPPs generalize this by parameterizing the history-to-intensity mapping via deep networks:

  • RNN-based: a hidden state $h_i$ encodes the history; $\lambda^*(t) = \mathrm{MLP}(h_{i-1},\, t - t_{i-1})$ (Shchur et al., 2021); a minimal sketch follows this list.
  • Transformer/Attention-based: Self-attention layers aggregate past events; decoder MLP outputs intensity/density (Xue et al., 2023, Meng et al., 2024).
  • Neural ODE/SDE TPPs: an ODE/SDE governs the continuous-time hidden trajectory, with discrete updates at event arrivals (Zhou et al., 24 Jan 2025).
  • Intensity-free models: Conditional density of next inter-event time is modeled directly via flows or mixtures, bypassing explicit intensity parameterization (Shchur et al., 2019, Mehrasa et al., 2019).
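The RNN-based design from the list above can be sketched as follows, assuming a GRU history encoder and a softplus-activated MLP head; the class name, layer sizes, and wiring are illustrative assumptions rather than a reference implementation of the cited models.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RNNIntensityTPP(nn.Module):
    """Minimal RNN-based TPP: a GRU summarizes the event history and an MLP
    maps (previous hidden state, elapsed time) to a positive conditional intensity."""

    def __init__(self, hidden_dim=32):
        super().__init__()
        self.gru = nn.GRU(input_size=1, hidden_size=hidden_dim, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(hidden_dim + 1, hidden_dim), nn.Tanh(), nn.Linear(hidden_dim, 1)
        )

    def forward(self, inter_times):
        # inter_times: (batch, seq_len) positive inter-event times tau_i = t_i - t_{i-1}
        x = inter_times.unsqueeze(-1)                      # (batch, seq_len, 1)
        h, _ = self.gru(x)                                 # h_i summarizes events up to i
        h_prev = torch.cat([torch.zeros_like(h[:, :1]), h[:, :-1]], dim=1)  # h_{i-1}
        raw = self.mlp(torch.cat([h_prev, x], dim=-1))     # intensity evaluated at each event time
        return F.softplus(raw).squeeze(-1)                 # (batch, seq_len), lambda* > 0

# Illustrative forward pass on random inter-event times.
model = RNNIntensityTPP()
taus = torch.rand(4, 10) + 0.1
print(model(taus).shape)  # torch.Size([4, 10])
```

Training with the log-likelihood of Section 1 additionally requires the compensator integral, which for neural intensities is typically approximated numerically (see Section 4.1).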

Recent progress includes flow-based non-autoregressive sampling and continuous-time Markov process formulations (Lüdke et al., 7 Oct 2025, Shchur et al., 2020), as well as benchmark suites for reproducible evaluation (Xue et al., 2023).

3. Beyond Intensity: Generative, Uncertainty, and Interpretability Advances

3.1. Non-Autoregressive and Flow-Based Models

Recent models bypass sequential sampling bottlenecks by learning generative flows over event sequences. In Edit-Based Flow Matching (EdiTPP), sequence generation proceeds via a continuous-time Markov chain of insert-delete-substitute edits, parameterized by learned edit rates. This non-autoregressive process allows sampling complete sequences in parallel, yielding statistically and computationally superior generation compared to autoregressive neural TPPs. EdiTPP achieves the best sample quality and 2–5× speedup on benchmarks, and further allows tuning the compute-accuracy tradeoff via the CTMC resolution (Lüdke et al., 7 Oct 2025).

TriTPP parameterizes the TPP density as a triangular normalizing flow from homogeneous Poisson reference times to real events, enabling parallelized likelihood evaluation and fast sampling. Maximum-likelihood estimation remains exact, and the framework supports variational inference in latent continuous-time discrete-state models (Shchur et al., 2020).

3.2. Intensity-Free and Distributional Learning

Intensity-free approaches model the conditional probability density of inter-event times directly using universal one-dimensional normalizing flows or log-normal mixtures, allowing for exact likelihoods, efficient ancestral sampling, and closed-form moments. This paradigm provides state-of-the-art predictive accuracy and unique capabilities such as missing-data imputation and sequence embedding (Shchur et al., 2019, Mehrasa et al., 2019, Subramanian et al., 27 Nov 2025).
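A minimal sketch of the log-normal mixture idea, assuming a fixed-size history embedding produced by some encoder; the mixture size, class name, and head layout are illustrative and not the exact parameterization of the cited papers.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical, LogNormal, MixtureSameFamily

class LogNormMixDecoder(nn.Module):
    """Maps a history embedding to a log-normal mixture over the next inter-event time."""

    def __init__(self, context_dim=32, n_components=8):
        super().__init__()
        # One linear head produces mixture logits, component means, and log-scales.
        self.head = nn.Linear(context_dim, 3 * n_components)

    def forward(self, context):
        logits, mu, log_sigma = self.head(context).chunk(3, dim=-1)
        mix = Categorical(logits=logits)
        comp = LogNormal(mu, log_sigma.exp())
        return MixtureSameFamily(mix, comp)

decoder = LogNormMixDecoder()
ctx = torch.randn(16, 32)            # stand-in history embeddings from an encoder
dist = decoder(ctx)
tau = torch.rand(16) + 0.1           # observed inter-event times
nll = -dist.log_prob(tau).mean()     # exact likelihood, no intensity integral needed
sample = dist.sample()               # efficient ancestral sampling
print(nll.item(), sample.shape)
```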

CDF-based methods such as CuFun sidestep the numerical and modeling constraints of explicit intensity forms by parameterizing the cumulative hazard with a monotonic neural network, leveraging RNN-based history encoding and multiplicative scaling for long-range dependencies. This yields both stable likelihood training and improved capture of periodic structure relative to intensity-based TPPs (Wang et al., 2024).
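The cumulative-hazard route can be illustrated generically: if a monotone network outputs $\Lambda(\tau)$, the next-event density follows as $f(\tau) = \Lambda'(\tau)\, e^{-\Lambda(\tau)}$, with the derivative obtained by automatic differentiation. The sketch below uses a toy monotone parameterization (positive weights via softplus) and is not the CuFun architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MonotoneCumHazard(nn.Module):
    """Toy monotone network for the cumulative hazard Lambda(tau): positive weights
    and increasing activations keep the output non-decreasing in tau, with Lambda(0) = 0."""

    def __init__(self, hidden=16):
        super().__init__()
        self.w1 = nn.Parameter(torch.randn(hidden, 1))
        self.b1 = nn.Parameter(torch.zeros(hidden))
        self.w2 = nn.Parameter(torch.randn(1, hidden))
        self.b2 = nn.Parameter(torch.zeros(1))

    def _g(self, x):
        h = torch.tanh(x @ F.softplus(self.w1).T + self.b1)      # increasing in x
        return F.softplus(h @ F.softplus(self.w2).T + self.b2)   # positive, still increasing

    def forward(self, tau):
        return self._g(tau) - self._g(torch.zeros_like(tau))     # anchor Lambda(0) = 0

net = MonotoneCumHazard()
tau = torch.rand(8, 1) + 0.1
tau.requires_grad_(True)
Lam = net(tau)
# Hazard lambda(tau) = dLambda/dtau via autograd; log f(tau) = log lambda(tau) - Lambda(tau).
hazard = torch.autograd.grad(Lam.sum(), tau, create_graph=True)[0]
log_density = torch.log(hazard + 1e-12) - Lam
print(log_density.shape)  # torch.Size([8, 1])
```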

3.3. Joint Mark-Time and Multivariate Modeling

Classical neural TPPs often posit conditional independence between event time and mark. Recent work directly links the distribution of inter-arrival times to the next event’s mark, modeling their joint distribution either via multivariate intensity parameterization or, more tractably, with a distinct per-type density. This approach yields improved fit and micro-F1 scores across synthetic and real data (Waghmare et al., 2022). Moreover, distribution-free conformal prediction methods now enable construction of joint prediction regions for next time and mark with provable finite-sample marginal coverage (Dheur et al., 2024).
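One tractable realization of the per-type-density idea is to factor the joint distribution as $p(m, \tau \mid \mathcal{H}) = p(m \mid \mathcal{H})\, p(\tau \mid m, \mathcal{H})$, with a separate inter-event-time distribution per mark. The sketch below pairs a categorical mark head with per-mark log-normal time densities; the class name, sizes, and parameterization are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torch.distributions import LogNormal

class JointMarkTimeHead(nn.Module):
    """Joint next-event head: p(m | h) * p(tau | m, h) with per-mark log-normal times."""

    def __init__(self, context_dim=32, n_marks=5):
        super().__init__()
        self.mark_logits = nn.Linear(context_dim, n_marks)
        self.time_params = nn.Linear(context_dim, 2 * n_marks)  # (mu, log_sigma) per mark

    def log_prob(self, context, mark, tau):
        logp_mark = torch.log_softmax(self.mark_logits(context), dim=-1)
        mu, log_sigma = self.time_params(context).chunk(2, dim=-1)
        # Select the time-distribution parameters belonging to the observed mark.
        idx = mark.unsqueeze(-1)
        mu_m = mu.gather(-1, idx).squeeze(-1)
        sig_m = log_sigma.gather(-1, idx).squeeze(-1).exp()
        logp_time = LogNormal(mu_m, sig_m).log_prob(tau)
        return logp_mark.gather(-1, idx).squeeze(-1) + logp_time

head = JointMarkTimeHead()
ctx = torch.randn(16, 32)
marks = torch.randint(0, 5, (16,))
taus = torch.rand(16) + 0.1
print((-head.log_prob(ctx, marks, taus).mean()).item())  # joint NLL
```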

4. Model Training, Benchmarking, and Theory

4.1. Supervised, Semi-Supervised, and Meta-Learning

Standard neural TPP models are trained by maximum likelihood, requiring evaluation or approximation of the integral term in the log-likelihood (Shchur et al., 2021, Xue et al., 2023). Semi-supervised extensions incorporate unlabeled sequences via auxiliary reconstruction objectives, robustifying marker prediction under partial annotation regimes (Reddy et al., 2021). Meta-learning approaches reframe each event sequence as a distinct task, leveraging permutation-invariant context aggregation and latent-variable hierarchies (e.g., neural process or attentive neural process analogs), which improves generalization, especially in nonstationary or partially observed environments (Bae et al., 2023).
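The integral term is rarely available in closed form for neural intensities; a common workaround is Monte Carlo integration over the observation window. A minimal sketch, assuming an arbitrary callable intensity (the toy intensity below stands in for a neural one):

```python
import torch

def mc_compensator(intensity_fn, T, n_samples=1000):
    """Monte Carlo estimate of the compensator integral_0^T lambda*(u) du.

    intensity_fn: callable mapping a tensor of times (n,) to intensities (n,).
    The estimate T * mean(lambda*(u_j)) with u_j ~ Uniform(0, T) is unbiased.
    """
    u = torch.rand(n_samples) * T
    return T * intensity_fn(u).mean()

# Illustrative check against a closed-form integral of a toy decaying intensity.
toy_intensity = lambda u: 1.0 + torch.exp(-u)
approx = mc_compensator(toy_intensity, T=5.0)
exact = 5.0 + (1.0 - torch.exp(torch.tensor(-5.0)))
print(approx.item(), exact.item())
```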

4.2. Benchmarks and Evaluation

Open-source benchmarking tools such as EasyTPP provide standardized data interfaces, evaluation suites, reference implementations, and reproducibility infrastructure for neural TPPs. Key evaluation metrics include negative log-likelihood (NLL), time- and mark-RMSE, event-type error rate, optimal transport distances on long-horizon sequences, and significance assessment via permutation tests (Xue et al., 2023). Unified experimental protocols have clarified the importance of time and mark embedding choice, history encoder structure, and decoder parameterization: vectorial and learnable time embeddings combined with log-normal mixture decoders generally yield the best predictive and calibration performance (Bosser et al., 2023).

4.3. Theoretical Guarantees

Rigorous analyses have recently established approximation and generalization guarantees for RNN-TPPs. Multi-layer tanh networks with bounded spectral norm approximate a broad class of intensity functions (Poisson, (non)linear Hawkes, self-correcting) to arbitrary precision over bounded-length histories. Excess risk bounds for the NLL loss scale as $\tilde{O}(n^{-s/2(s+1)})$ in sample size $n$ under regularity assumptions, and an RNN depth of $L \le 4$ suffices for universal TPP approximation (Chen et al., 2024). Tail-truncation and covering-number arguments address the challenge of an unbounded event count per window.

5. Extensions: Covariates, Interpretability, and LLM Integration

5.1. Covariate-Augmented TPPs and Interpretability

Transformer-based covariate TPPs such as TransFeat-TPP encode time, mark, and feature/covariate vectors into joint embeddings, learning the conditional density of the next event via a log-normal mixture. An attention-based feature-importance module simultaneously recovers interpretable, unsupervised rankings of covariate relevance, per event or globally, and the learned importances match domain knowledge on real datasets (e.g., weather covariates for pollution, clinical measurements for disease) (Meng et al., 2024).

Hybrid-rule TPPs incorporate symbolic temporal-logic rules (interpreted as decayed history contributions) and continuous covariate intensities, discovered via Bayesian-optimized rule mining. This compositional design provably improves both predictive accuracy and interpretability over existing symbolic or neural TPPs in medical event modeling (Cao et al., 15 Apr 2025).

5.2. Continual, Federated, and LLM-Based TPPs

Prompt-based TPPs (PromptTPP) tackle distributional drift and catastrophic forgetting in streaming-data scenarios using small, learnable prompt pools for continual learning, retrieved and fused with event encodings, enabling adaptation under privacy and memory constraints (Xue et al., 2023).

Models such as TPP-LLM integrate pretrained LLMs as semantic decoders, leveraging marker textual descriptions and temporal encodings, with parameter-efficient fine-tuning (e.g., LoRA) for event forecasting. These models outperform classical and transformer-based neural TPP baselines on log-likelihood, accuracy, and RMSE, and facilitate event sequence modeling in settings with rich semantic structure in event types (Liu et al., 2024). LLM-powered TPPs open new directions for causal, multimodal, and few-shot event modeling (Zhou et al., 24 Jan 2025).

6. Open Problems and Future Directions

Key open avenues for TPP research include scaling non-autoregressive and parallel samplers to long horizons, principled uncertainty quantification for event forecasts, and deeper integration of LLMs for causal, multimodal, and few-shot event modeling (Zhou et al., 24 Jan 2025).

Together, these developments establish TPPs as a core mathematical object for modern event sequence machine learning, integrating statistical rigor, neural and symbolic expressivity, interpretability, and computational tractability.
