Temporal Point Processes in Theory & Practice

Updated 2 May 2026

Temporal Point Processes are stochastic models characterized by sequences of discrete events in continuous time, defined via a conditional intensity function.
They encompass varied modeling approaches including intensity-based, intensity-free, and neural sequence models to capture complex event dynamics.
Applications span finance, healthcare, and social networks, with recent advances focusing on improved computational efficiency and long-range dependency modeling.

A temporal point process (TPP) is a stochastic process whose realizations are sequences of discrete events localized in continuous time. TPPs provide a mathematical framework for modeling, analyzing, and forecasting event dynamics across domains such as social networks, e-commerce, medicine, finance, and neuroscience. The distinguishing characteristic of a TPP is its continuous-time formulation, in contrast to discretized time-series models. The conditional intensity function is central to the theory and application of TPPs and forms the foundation of both classical and modern neural TPP modeling.

1. Mathematical Foundations of Temporal Point Processes

Given an observation window $[0,T]$, let $\mathcal{T} = \{t_i\}_{i=1}^N$ denote the increasing sequence of event times. The (unmarked) conditional intensity function is defined as

$\lambda^*(t) = \mathbb{P}(\text{event in } [t, t+\mathrm{d}t) \mid \mathcal{H}(t)) / \mathrm{d}t,$

where $\mathcal{H}(t)$ is the history up to but not including $t$.

For marked processes, each event also carries an associated mark $m_i\in \mathcal{M}$, and the intensity generalizes to a collection of mark-specific intensities $\lambda^*_k(t)$, one for each mark type $k$. The joint likelihood of observing $\{(t_i, m_i)\}_{i=1}^N$ is

$p(\mathcal{T}, \mathbf{m}) = \prod_{i=1}^N \lambda_{m_i}^*(t_i) \exp\left(-\int_{0}^{T} \sum_k \lambda_k^*(u) du\right).$
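As a minimal sketch of how this likelihood is evaluated, consider the simplest special case in which every mark type has a constant intensity, so the integral has a closed form. The function name `marked_poisson_loglik`, the toy events, and the rate values are illustrative assumptions, not part of any cited model.

```python
import math

def marked_poisson_loglik(events, rates, T):
    """Joint log-likelihood of (time, mark) events observed on [0, T].

    Assumption: each mark k has a constant intensity rates[k] (a toy special case).
    """
    if any(not 0.0 <= t <= T for t, _ in events):
        raise ValueError("event times must lie in [0, T]")
    # Sum of log-intensities of the observed mark at each event time.
    log_intensity_sum = sum(math.log(rates[m]) for _, m in events)
    # Integral over [0, T] of the total intensity; closed form for constant rates.
    compensator = T * sum(rates.values())
    return log_intensity_sum - compensator

# Hypothetical data: three events with marks "a" and "b".
events = [(0.5, "a"), (1.2, "b"), (3.0, "a")]
rates = {"a": 0.5, "b": 0.25}
loglik = marked_poisson_loglik(events, rates, T=4.0)
print(round(loglik, 4))
```

For history-dependent intensities the same two terms appear, but the integral usually has to be computed per model, either analytically or numerically.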

The cumulative distribution function (CDF) for the inter-event interval $\tau$, measured from the most recent event $t_i$, is

$F^*(\tau) = 1 - \exp\left(-\int_{t_i}^{t_i+\tau} \lambda^*(u)\, \mathrm{d}u\right).$

The corresponding density, survival, and hazard (intensity) functions are: $f^*(\tau) = \mathrm{d}F^*(\tau)/\mathrm{d}\tau$, $S^*(\tau) = 1 - F^*(\tau)$, and $\lambda^*(t_i+\tau) = f^*(\tau)/S^*(\tau)$. For multivariate or marked TPPs, the joint conditional is typically modeled as $p^*(\tau, m) = p^*(\tau)\, p^*(m)$, or explicitly coupled as $p^*(\tau, m) = p^*(m)\, p^*(\tau \mid m)$, properly accounting for mark-conditioned timing dynamics (Waghmare et al., 2022).
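The relationship between the CDF, density, survival, and hazard functions can be checked numerically. The sketch below assumes a Weibull inter-event distribution purely for illustration (the shape and scale values are arbitrary). It verifies that the hazard equals density divided by survival, and that exponentiating the negative integrated hazard recovers the CDF.

```python
import math

def weibull_cdf(tau, shape, scale):
    return 1.0 - math.exp(-((tau / scale) ** shape))

def weibull_pdf(tau, shape, scale):
    z = (tau / scale) ** shape
    return (shape / scale) * (tau / scale) ** (shape - 1.0) * math.exp(-z)

def weibull_hazard(tau, shape, scale):
    # Closed-form hazard of the Weibull distribution.
    return (shape / scale) * (tau / scale) ** (shape - 1.0)

shape, scale, tau = 1.5, 2.0, 0.8   # illustrative values only
F = weibull_cdf(tau, shape, scale)
f = weibull_pdf(tau, shape, scale)
S = 1.0 - F

# Hazard recovered as density / survival.
hazard_from_ratio = f / S

# CDF recovered from the integrated hazard (midpoint rule on [0, tau]).
n = 20000
step = tau / n
cum_hazard = sum(weibull_hazard((j + 0.5) * step, shape, scale) for j in range(n)) * step
F_from_hazard = 1.0 - math.exp(-cum_hazard)

print(hazard_from_ratio, weibull_hazard(tau, shape, scale))
print(F_from_hazard, F)
```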

2. Core Model Classes and Parameterization Strategies

2.1. Intensity-Based and Intensity-Free Models

Intensity-Based Models: Specify or learn the functional form of $\lambda^*(t)$, often through hand-crafted kernels (e.g. Hawkes, self-correcting, renewal) or learnable parameterizations. Classical methods include linear Hawkes with $\lambda^*(t) = \mu + \sum_{t_i < t} \phi(t - t_i)$, where $\mu \ge 0$ is a base rate and $\phi \ge 0$ is a triggering kernel. Neural approaches embed history using RNNs, Transformers, or ODEs and map hidden representations to intensity through parametric or semi-parametric forms (Omi et al., 2019, Shchur et al., 2021).
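To make the intensity-based approach concrete, here is a minimal sketch of a linear Hawkes process with an exponential triggering kernel, $\alpha e^{-\beta (t - t_i)}$. The kernel choice, the parameter names `mu`, `alpha`, `beta`, and the toy event times are assumptions for illustration. With this kernel the integral of the intensity has a closed form, and the sum over past events can be updated recursively.

```python
import math

def hawkes_intensity(t, history, mu, alpha, beta):
    """Conditional intensity at time t given strictly earlier events."""
    return mu + sum(alpha * math.exp(-beta * (t - s)) for s in history if s < t)

def hawkes_loglik(times, T, mu, alpha, beta):
    """Log-likelihood on [0, T] for sorted event times, in O(N) time."""
    decay_sum = 0.0   # sum over earlier events of exp(-beta * (t_i - t_j))
    prev = None
    log_term = 0.0
    for t in times:
        if prev is not None:
            # Recursive update: decay the old sum and add the previous event.
            decay_sum = math.exp(-beta * (t - prev)) * (decay_sum + 1.0)
        log_term += math.log(mu + alpha * decay_sum)
        prev = t
    # Closed-form integral of the intensity over [0, T].
    compensator = mu * T + (alpha / beta) * sum(
        1.0 - math.exp(-beta * (T - t)) for t in times
    )
    return log_term - compensator

times = [0.3, 1.1, 1.4, 2.7]          # hypothetical event times
mu, alpha, beta, T = 0.8, 0.5, 1.2, 3.0
print(hawkes_intensity(1.5, times, mu, alpha, beta))
print(hawkes_loglik(times, T, mu, alpha, beta))
```

Neural intensity models replace the fixed kernel with a learned function of a history embedding, which generally removes the closed-form integral.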

Intensity-Free Models: Directly parameterize the conditional CDF or inter-event density, circumventing the need for numerical integration of $\lambda^*(t)$. Notable frameworks include:

Monotonic Neural Networks for CDFs (CuFun): The conditional CDF $F^*(\tau)$ of the inter-event time $\tau$ is parameterized by a monotonic neural network with positive weights and a sigmoid nonlinearity, so that automatic differentiation yields the density and hazard needed for log-likelihood computation (Wang et al., 2024).
Normalizing Flows: Temporal normalizing flows learn invertible transforms from base noise to inter-arrival times, allowing expressiveness for highly nonparametric laws (Mehrasa et al., 2019, Shchur et al., 2020).
Mixture Density Models: Use mixtures of tractable densities (log-normal, Weibull, etc.) with history-dependent parameters as in (Waghmare et al., 2022, Subramanian et al., 27 Nov 2025).
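As a small illustration of the intensity-free idea, the sketch below evaluates the log-density of an inter-event time under a mixture of log-normals, so the sequence likelihood needs no integral. In an actual model the mixture parameters would be produced from the event history by a neural network; here they are fixed, hypothetical constants, and all function names are illustrative.

```python
import math

def lognormal_logpdf(tau, mu, sigma):
    z = (math.log(tau) - mu) / sigma
    return -math.log(tau * sigma * math.sqrt(2.0 * math.pi)) - 0.5 * z * z

def mixture_logpdf(tau, weights, mus, sigmas):
    """Log-density of a log-normal mixture, using log-sum-exp for stability."""
    terms = [
        math.log(w) + lognormal_logpdf(tau, m, s)
        for w, m, s in zip(weights, mus, sigmas)
    ]
    peak = max(terms)
    return peak + math.log(sum(math.exp(v - peak) for v in terms))

def sequence_nll(taus, weights, mus, sigmas):
    """Negative log-likelihood of inter-event times; no quadrature required."""
    return -sum(mixture_logpdf(t, weights, mus, sigmas) for t in taus)

# Hypothetical fixed parameters standing in for history-dependent outputs.
weights, mus, sigmas = [0.3, 0.7], [-1.0, 1.0], [0.5, 0.8]
taus = [0.2, 1.5, 3.1, 0.4]
print(sequence_nll(taus, weights, mus, sigmas))
```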

2.2. Neural Sequence Models

RNNs/LSTMs/GRUs: Encode event history into summarizing states $h_i$; used in RMTPP, FullyNN, and numerous hybrid approaches (Shchur et al., 2021, Omi et al., 2019). These encoders take event times and, optionally, marks as input (Waghmare et al., 2022, Reddy et al., 2021).

Transformers and Attention: Allow parallel, long-range, and contextual encoding of event history and marks, often providing performance gains in multi-type and high-mark cardinality settings (Shchur et al., 2021, Bae et al., 2023).

ODE/Continuous-Time Models: Capture between-event evolution for continuous-time latent states $h(t)$, enabling more granular modeling of temporal drift (Lin et al., 2021, Shchur et al., 2021).

Meta-Learning (Neural Processes): Framing TPPs as sequences of tasks and learning rapid adaptation to new sequences through context aggregation and latent-variable modeling (Bae et al., 2023).

3. Specialized TPP Frameworks and Inference Algorithms

3.1. Cumulative Distribution Function-Based TPPs (CuFun)

CuFun leverages direct CDF modeling of $F(\tau \mid h)$ via a monotonic neural network, where $\tau$ is the inter-event time and $h$ is the history embedding. The network structure ensures that, for each fixed $h$, $F(\tau \mid h)$ is non-decreasing in $\tau$, with enforced bounds at $\tau = 0$ and $\tau \to \infty$. The architecture combines an RNN history encoder with two positive-weight single-layer networks for $\tau$ and $h$, merged by an element-wise product before further monotone transformation (Wang et al., 2024). The log-likelihood is computed via

$\log p(\mathcal{T}) = \sum_{i=1}^{N} \log \left.\frac{\partial F(\tau \mid h_{i-1})}{\partial \tau}\right|_{\tau = \tau_i}, \quad \tau_i = t_i - t_{i-1},\ t_0 = 0,$

with $h_{i-1}$ the embedding of the events before $t_i$ and the derivative obtained by automatic differentiation, eliminating the need for numerical quadrature. This methodology is reported to achieve state-of-the-art performance in both synthetic and real-world settings, excelling in long-range temporal dependency capture and numerical stability.
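The idea of a monotone CDF network with an integral-free likelihood can be sketched in a few lines. The code below is not the CuFun architecture: it is a deliberately simplified, hypothetical one-hidden-layer network in which positivity of the weights is obtained with a softplus reparameterization, the history embedding enters only through the biases, and the output is rescaled so that the CDF is 0 at zero and tends to 1 for large inter-event times. The derivative is written analytically here, where a deep-learning framework would use automatic differentiation.

```python
import math

def softplus(x):
    # Numerically stable log(1 + exp(x)); maps any real number to a positive one.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def _hidden_terms(tau, h, params):
    """Per hidden unit: (v, w, pre-activation at tau, pre-activation at 0)."""
    terms = []
    for raw_v, raw_w, bias_row in params:
        v, w = softplus(raw_v), softplus(raw_w)          # positive => monotone in tau
        bias = sum(a * x for a, x in zip(bias_row, h))   # history shifts the bias only
        terms.append((v, w, w * tau + bias, bias))
    return terms

def cdf(tau, h, params):
    terms = _hidden_terms(tau, h, params)
    raw = sum(v * math.tanh(z) for v, _, z, _ in terms)
    low = sum(v * math.tanh(b) for v, _, _, b in terms)  # network value at tau = 0
    high = sum(v for v, _, _, _ in terms)                # limit as tau -> infinity
    return (raw - low) / (high - low)

def pdf(tau, h, params):
    terms = _hidden_terms(tau, h, params)
    low = sum(v * math.tanh(b) for v, _, _, b in terms)
    high = sum(v for v, _, _, _ in terms)
    # Analytic derivative of the CDF with respect to tau.
    slope = sum(v * w * (1.0 - math.tanh(z) ** 2) for v, w, z, _ in terms)
    return slope / (high - low)

def log_likelihood(taus, embeddings, params):
    """Sum of log-densities of inter-event times; no numerical integration."""
    return sum(math.log(pdf(t, h, params)) for t, h in zip(taus, embeddings))

# Hypothetical parameters: (raw output weight, raw slope, bias weights) per unit.
params = [(0.3, 0.0, [0.5, -0.2]), (-0.4, 1.0, [-0.3, 0.1]), (0.1, -1.0, [0.2, 0.4])]
h = [0.7, -1.1]
print(cdf(0.0, h, params), cdf(1.0, h, params), cdf(1e3, h, params))
print(log_likelihood([0.4, 1.3], [h, [0.0, 0.3]], params))
```

Reparameterizing the weights through softplus is one way to satisfy the positivity constraint without projection or clamping during optimization.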

3.2. Flow-Based and Generative Models

Flows—either triangular or conditional—capture the sequential nature of event times using invertible mappings, enabling exact likelihoods and ODE-based sampling. Examples include Point Process Flows and TriTPP (Mehrasa et al., 2019, Shchur et al., 2020). These models admit efficient, parallelizable likelihood evaluations and avoid recursive dependence on event history at sampling time.

Conditional generators, such as CEG and variational/diffusion-based autoregressive nets, sample (time, mark) pairs conditioned on learned embeddings without explicitly representing intensity functions (Dong et al., 2023). These are especially advantageous for high-dimensional mark spaces.

3.3. Semi-Supervised and Latent Variable Models

Models such as SSL-MTPP combine supervised and unsupervised branches to learn from both labeled (time, mark) and unlabeled (time-only) sequences, yielding robustness in data-scarce regimes (Reddy et al., 2021).

Variational Neural TPPs introduce global sequence-level latent variables $z$ with neural inference networks and optimize the ELBO, capturing uncertainty and heterogeneity (Eom et al., 2022). Such latent variable models improve rare-event coverage and event-type diversity in predictions.

4. Application Domains and Empirical Results

TPPs are applied in:

Social Networking and E-Commerce: User interactions, content dissemination, and transactional streams, where TPPs facilitate behavioral analysis and trend forecasting (Wang et al., 2024, Zhou et al., 24 Jan 2025).
Healthcare: Electronic health records and intensive care unit data, where interpretability and accurate event timing (e.g. disease progression) are required. Hybrid-Rule TPPs integrating temporal logic rules and numerical features achieve state-of-the-art predictive and interpretative results in clinical event modeling (Cao et al., 15 Apr 2025).
Finance: Trading activity and transaction modeling (e.g. NYSE trades), where fine temporal granularity and robust extrapolation are fundamental (Wang et al., 2024).
Human Activity and Sports Analytics: Joint estimation of when, what, and where for video or trajectory analysis (e.g. Time Perception Machine; Zhong et al., 2018).
Neuroscience and Signal Processing: Spike train modeling and physiological signals (e.g. heartbeat modeling with density-based TPPs; Subramanian et al., 27 Nov 2025).
Discrete Sampling and Neural Computation: Multivariate TPP-based samplers for distributions on discrete space, with connections to queueing networks and neural sampling (Stewart et al., 10 Mar 2026).

Empirically, modern neural and flow-based models outperform parametric and mixture-based baselines on multiple metrics: negative log-likelihood (NLL), root mean squared error (RMSE), F1-score for mark prediction, and metrics of synthetic/realistic process behavior (e.g. effective sample size, count MAE, Wasserstein distance). Integral-free methods such as CuFun consistently realize robust improvements in both accuracy and computational efficiency (Wang et al., 2024).

5. Key Methodological and Computational Advances

Integral-Free Likelihoods: Direct density or CDF modeling enables auto-differentiated, closed-form likelihoods, eschewing the need for numerical quadrature and enhancing numerical stability (Wang et al., 2024, Omi et al., 2019).
Flexible, Universal Approximators: Modern neural architectures (deep monotonic nets, normalizing flows, or mixture-density networks) are universal function approximators for continuous, non-parametric inter-event distributions, overcoming the limitations of classical exponential or sum-kernel forms.
Efficient and Parallelizable Sampling: Triangular maps (TriTPP) and speculative sampling schemes enable parallel likelihood evaluation and parallel simulation of event sequences, overcoming the intrinsic sequentialism of classical thinning or autoregressive sampling (Biloš et al., 22 Oct 2025, Shchur et al., 2020).
Long-Range and Periodic Dependency Modeling: Architectures that capture explicit memory (RNN, attention) and model cumulative distributions enable direct learning of long-range, periodic, or multimodal event patterns (Wang et al., 2024, Waghmare et al., 2022).
Interpretability and Rule Integration: Hybrid logic–numeric intensity models, such as HRTPP, explicitly integrate temporal logic predicates and numerical features, achieving high rule validity, superior predictive accuracy, and stable, interpretable clinical rule discovery (Cao et al., 15 Apr 2025).
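For reference, the classical thinning procedure that parallel samplers aim to improve on can be sketched as follows. This is a minimal version of Ogata-style thinning for a Hawkes process with an exponential kernel; the kernel choice, parameter names, and seed handling are assumptions for illustration. Each candidate time depends on all previously accepted events, which is the sequential bottleneck mentioned above.

```python
import math
import random

def sample_hawkes_thinning(mu, alpha, beta, T, seed=0):
    """Sample event times on [0, T] from an exponential-kernel Hawkes process."""
    rng = random.Random(seed)
    events = []
    t = 0.0
    excitation = 0.0   # sum over past events of alpha * exp(-beta * (t - t_i))
    while True:
        # Between events the intensity only decays, so its current value
        # is a valid upper bound until the next accepted event.
        upper = mu + excitation
        wait = rng.expovariate(upper)
        candidate = t + wait
        if candidate > T:
            break
        decayed = excitation * math.exp(-beta * wait)
        intensity = mu + decayed
        if rng.random() * upper <= intensity:
            events.append(candidate)        # accept: the new event adds a jump
            excitation = decayed + alpha
        else:
            excitation = decayed            # reject: only the decay applies
        t = candidate
    return events

sample = sample_hawkes_thinning(mu=1.0, alpha=0.5, beta=1.0, T=50.0, seed=42)
print(len(sample), sample[:3])
```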

6. Limitations, Open Challenges, and Future Directions

Scalability and Memory: RNN encoders may fail to capture extremely long histories; attention or transformer-based encoders provide partial remedies at increased computational cost (Wang et al., 2024, Shchur et al., 2021). The $O(N^2)$ complexity of transformers in the sequence length $N$ remains a bottleneck for large event streams (Zhou et al., 24 Jan 2025).
Positivity/Monotonicity Constraints: Neural CDF/hazard modeling imposes nonnegativity and monotonicity on network weights and activations—this requires projection, clamping, or specialized optimization (Wang et al., 2024).
Mark and Feature Complexity: Extending to high-dimensional or structured marks (e.g. images, spatio-temporal, text) or continuous/covariate marks is active research (Dong et al., 2023).
Interpretability: Deeper neural and flow-based models often lack the intrinsic interpretability of kernel-based or rule-augmented systems. There is continued need for architectures supporting causality and model introspection (Cao et al., 15 Apr 2025).
Evaluation and Benchmarking: Negative log-likelihood remains standard, but improved scoring rules and goodness-of-fit diagnostics (e.g. rescaling-transform/KS distance) are necessary for robust model comparison, particularly as expressive models can "overfit" NLL without capturing correct process dynamics (Subramanian et al., 27 Nov 2025, Shchur et al., 2021).
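The rescaling-based diagnostic mentioned above can be sketched as follows. Under a correctly specified model, the increments of the integrated intensity between consecutive events are independent unit-rate exponentials, so their Kolmogorov-Smirnov (KS) distance to that distribution should be small. The helper names and the synthetic Poisson data are illustrative assumptions; the example compares a correct and a deliberately mis-specified constant-rate model.

```python
import math
import random

def rescaled_intervals(times, cumulative_intensity):
    """Increments of the integrated intensity between consecutive events."""
    out = []
    prev = cumulative_intensity(0.0)
    for t in times:
        cur = cumulative_intensity(t)
        out.append(cur - prev)
        prev = cur
    return out

def ks_distance_to_unit_exponential(samples):
    """KS distance between the empirical CDF and the Exp(1) CDF."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        model_cdf = 1.0 - math.exp(-x)
        d = max(d, abs(model_cdf - i / n), abs((i + 1) / n - model_cdf))
    return d

# Synthetic data: homogeneous Poisson process with rate 2 (an assumption).
rng = random.Random(1)
t, times = 0.0, []
for _ in range(500):
    t += rng.expovariate(2.0)
    times.append(t)

ks_good = ks_distance_to_unit_exponential(rescaled_intervals(times, lambda s: 2.0 * s))
ks_bad = ks_distance_to_unit_exponential(rescaled_intervals(times, lambda s: 0.5 * s))
print(ks_good, ks_bad)
```

A model can attain a good NLL while still showing a large KS distance on rescaled intervals, which is why such diagnostics complement likelihood-based comparison.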

Future research directions include adaptive rule mining and composition, meta-learning for few-shot sequence adaptation, efficient parallel inference, integration with large multimodal models, and online/adaptive refinement for streaming or concept-drifting data (Zhou et al., 24 Jan 2025, Wang et al., 2024, Bae et al., 2023). The theoretical characterization of universal approximators for marked and multivariate point processes, together with scalable, interpretable, and uncertainty-sensitive modeling, remains a central focus for advancing the field.