Hyper Hawkes Process (HHP): Interpretable Event Modeling
- The Hyper Hawkes Process (HHP) is a marked temporal point process model that extends classical Hawkes processes with a latent state and a history-dependent hypernetwork.
- It introduces a latent dimensional lifting mechanism that decouples the latent dimension from the number of marks, enabling either greater expressivity or parameter-efficient compression when capturing complex temporal dependencies.
- The model offers transparent event-level interpretability through conditionally linear recurrences while achieving efficient parameter usage and high predictive performance.
The Hyper Hawkes Process (HHP) is a class of marked temporal point process (MTPP) models that simultaneously addresses the interpretability limitations of neural MTPPs and the rigidity of classical Hawkes processes. HHP achieves this by expanding the dynamics into a latent space and introducing a history-dependent hypernetwork, yielding models that are both highly expressive and amenable to rigorous, event-level interpretability. The model exhibits piecewise, conditionally linear recurrences in the latent state, enabling both transparent prediction mechanisms and high predictive performance characteristic of neural models.
1. Model Specification and Latent Dynamics
Let the event history be $\mathcal{H} = \{(t_i, k_i)\}_{i \ge 1}$ with marks $k_i \in \{1, \dots, M\}$. HHP models a $D$-dimensional latent state $\mathbf{z}(t) \in \mathbb{R}^D$, whose time evolution determines the vector of event intensities $\boldsymbol{\lambda}(t) \in \mathbb{R}_{\ge 0}^M$. The coupled system is:

$$
\mathrm{d}\mathbf{z}(t) = \mathbf{A}(t)\,\mathbf{z}(t)\,\mathrm{d}t + \sum_{k=1}^{M} \mathbf{b}_k\,\mathrm{d}N_k(t), \qquad \boldsymbol{\lambda}(t) = f\big(\mathbf{C}\,\mathbf{z}(t) + \boldsymbol{\mu}\big),
$$

where:
- $N_k(t)$ is the counting process for mark $k$, so $\mathrm{d}N_k(t)$ is the indicator of a mark-$k$ event at $t$;
- $\mathbf{B} = [\mathbf{b}_1, \dots, \mathbf{b}_M] \in \mathbb{R}^{D \times M}$ collects mark-specific impulse vectors;
- $\mathbf{C} \in \mathbb{R}^{M \times D}$, $\boldsymbol{\mu} \in \mathbb{R}^M$, and an elementwise nonnegative transfer function $f$ (e.g., softplus) ensure nonnegative intensities;
- $\mathbf{A}(t) \in \mathbb{R}^{D \times D}$ is produced by a history-encoded hypernetwork and provides the (piecewise constant) decay dynamics.
Between events, i.e., for $t \in (t_i, t_{i+1}]$, $\mathbf{A}(t)$ is fixed at $\mathbf{A}_i$, permitting the closed-form state update:

$$
\mathbf{z}(t) = \exp\big((t - t_i)\,\mathbf{A}_i\big)\,\mathbf{z}(t_i^{+}).
$$

At each event $t_i$ of type $k_i$, the latent state is updated by:

$$
\mathbf{z}(t_i^{+}) = \mathbf{z}(t_i^{-}) + \mathbf{b}_{k_i}.
$$

Across the whole trajectory, the latent process is thus governed by a piecewise, conditionally linear recurrence.
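To make the recurrence concrete, here is a minimal sketch (not the authors' implementation) of the inter-event flow and event update, assuming softplus for $f$ and random placeholder values for $\mathbf{A}_i$, $\mathbf{B}$, $\mathbf{C}$, and $\boldsymbol{\mu}$:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
D, M = 8, 3                      # latent dimension D, number of marks M

# Placeholder parameters; in HHP, A_i comes from the hypernetwork (Section 3).
A_i = -0.5 * np.eye(D)           # stable decay on the current interval
B = rng.normal(size=(D, M))      # columns are the impulse vectors b_k
C = rng.normal(size=(M, D))      # latent-to-intensity projection
mu = np.zeros(M)                 # intensity bias

def softplus(x):
    return np.logaddexp(0.0, x)  # numerically stable log(1 + exp(x))

def propagate(z, dt, A):
    """Inter-event flow in closed form: z(t) = expm((t - t_i) A_i) z(t_i+)."""
    return expm(dt * A) @ z

def jump(z, k):
    """Event update: inject the impulse vector of mark k."""
    return z + B[:, k]

def intensity(z):
    """lambda(t) = softplus(C z(t) + mu), elementwise nonnegative."""
    return softplus(C @ z + mu)

# One step of the recurrence: decay for dt = 0.7, then an event of mark 1.
z = np.zeros(D)
z = propagate(z, dt=0.7, A=A_i)
z = jump(z, k=1)
print(intensity(z))              # intensities just after the event
```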
2. Latent Dimensional Lifting and Expressivity
In the classical linear Hawkes process, the latent and mark dimensions coincide ($D = M$), with excitation parameters tied one-to-one to mark pairs. HHP lifts this rigidity, allowing $D > M$ for extra expressivity or $D < M$ for compression:

$$
\mathbf{B} \in \mathbb{R}^{D \times M}, \qquad \mathbf{C} \in \mathbb{R}^{M \times D}.
$$

Each event of mark $k$ injects a vector $\mathbf{b}_k \in \mathbb{R}^D$ into the latent state, and $\mathbf{C}$ projects the $D$-dimensional state to the $M$-dimensional intensity. This decoupling enables HHP to model dependencies unapproachable by standard Hawkes models, while retaining the analytic tractability of the latent process.
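As a back-of-the-envelope illustration of the compression regime (sizes chosen here for illustration, not taken from the paper): a classical multivariate Hawkes model needs on the order of $M^2$ pairwise excitation parameters, while HHP's factored $\mathbf{B}$ and $\mathbf{C}$ need only $2DM$:

```python
M, D = 100, 16                    # illustrative sizes, not from the paper
hawkes_params = M * M             # pairwise excitation matrix in classical Hawkes
hhp_params = D * M + M * D        # impulse matrix B plus readout C in HHP
print(hawkes_params, hhp_params)  # 10000 vs. 3200: compression when D < M / 2
```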
3. Hypernetwork Dynamics and Architecture
The decay/control matrix $\mathbf{A}(t)$ is history- and time-adaptive via a neural hypernetwork based on a GRU. For each event index $i$, the hypernetwork maintains a hidden state $\mathbf{h}_i$:

$$
\mathbf{h}_i = \mathrm{GRU}\big(\mathbf{h}_{i-1}, \mathrm{embed}(t_i, k_i)\big).
$$

From $\mathbf{h}_i$, two output heads produce:
- eigenvalues $\boldsymbol{\omega}_i \in \mathbb{C}^D$ with $\mathrm{Re}(\boldsymbol{\omega}_i) \le 0$, guaranteeing stable inter-event decay;
- a unitary eigenvector matrix $\mathbf{Q}_i$ (using a standard parameterization to output unitary matrices, a la Jing et al. 2017).

Thus, $\mathbf{A}_i = \mathbf{Q}_i\,\mathrm{diag}(\boldsymbol{\omega}_i)\,\mathbf{Q}_i^{H}$ forms the eigendecomposition of $\mathbf{A}(t)$ over $(t_i, t_{i+1}]$.
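Below is a minimal sketch of one hypernetwork step, with a hand-rolled GRU cell and random, untrained weights. For simplicity it swaps the unitary parameterization of Jing et al. (2017) for the matrix exponential of a skew-symmetric map, another standard way to obtain an orthogonal (real unitary) $\mathbf{Q}_i$; all weight matrices and the event embedding are placeholder assumptions:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
D, H, E = 8, 32, 16              # latent dim D, GRU hidden size H, embedding size E

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Minimal GRU cell with random placeholder weights (illustrative, untrained).
Wz, Wr, Wn = (rng.normal(scale=0.1, size=(H, H + E)) for _ in range(3))

def gru_step(h, x):
    hx = np.concatenate([h, x])
    z = sigmoid(Wz @ hx)                          # update gate
    r = sigmoid(Wr @ hx)                          # reset gate
    n = np.tanh(Wn @ np.concatenate([r * h, x])) # candidate state
    return (1.0 - z) * h + z * n

# Heads mapping the hidden state h_i to the eigendecomposition of A_i.
W_omega = rng.normal(scale=0.1, size=(D, H))     # eigenvalue head
W_q = rng.normal(scale=0.1, size=(D * D, H))     # eigenvector head

def decay_matrix(h):
    # Eigenvalues with nonpositive real part, so the inter-event flow decays.
    omega = -np.logaddexp(0.0, W_omega @ h)      # -softplus(.) <= 0
    # Orthogonal Q_i via expm of a skew-symmetric map: a simpler stand-in
    # for the unitary parameterization of Jing et al. (2017).
    S = (W_q @ h).reshape(D, D)
    Q = expm(S - S.T)
    return Q @ np.diag(omega) @ Q.T              # A_i = Q diag(omega) Q^{-1}

h = np.zeros(H)
x = rng.normal(size=E)       # stand-in embedding of the event (t_i, k_i)
h = gru_step(h, x)
A_i = decay_matrix(h)        # piecewise-constant dynamics for (t_i, t_{i+1}]
print(np.linalg.eigvals(A_i).real.max() <= 1e-9) # True: stable decay
```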
4. Interpretability and Linear Attribution Mechanisms
The conditional linearity of the update law enables decomposition of the latent state into per-event "particles" $\mathbf{p}_j(t)$ for events $j$ with $t_j \le t$: for any $t \in (t_n, t_{n+1}]$,

$$
\mathbf{z}(t) = \sum_{j:\, t_j \le t} \mathbf{p}_j(t), \qquad \mathbf{p}_j(t) = \exp\big((t - t_n)\,\mathbf{A}_n\big) \prod_{i=n-1}^{j} \exp\big((t_{i+1} - t_i)\,\mathbf{A}_i\big)\,\mathbf{b}_{k_j},
$$

with the matrix product taken in decreasing order of $i$ (the $\mathbf{A}_i$ need not commute). In the limit where $\mathbf{A}(t)$ is constant, $\mathbf{A}_i \equiv -\beta\,\mathbf{I}$ (classical Hawkes), this reduces to the well-known exponential decay form:

$$
\mathbf{p}_j(t) = e^{-\beta (t - t_j)}\,\mathbf{b}_{k_j}.
$$

This structure permits precise attribution of instantaneous and cumulative influence for each event via leave-one-out probes:

$$
\Delta_j \boldsymbol{\lambda}(t) = f\big(\mathbf{C}\,\mathbf{z}(t) + \boldsymbol{\mu}\big) - f\big(\mathbf{C}\,(\mathbf{z}(t) - \mathbf{p}_j(t)) + \boldsymbol{\mu}\big),
$$

i.e., the change in intensity when event $j$'s particle is removed.
Such closed-form probes can determine the degree to which each past event excites or inhibits the process, generalizing the transparency of classical Hawkes models to the more expressive HHP framework.
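The probe can be implemented directly from the particle decomposition. The following sketch uses an illustrative three-event sequence, random placeholder parameters, and diagonal stand-ins for the hypernetwork's $\mathbf{A}_i$:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(2)
D, M = 8, 3
B = rng.normal(size=(D, M))      # impulse vectors b_k (columns)
C = rng.normal(size=(M, D))      # latent-to-intensity projection
mu = np.zeros(M)

def softplus(x):
    return np.logaddexp(0.0, x)

# Placeholder event sequence (t_j, k_j) and per-interval decay matrices A_j.
events = [(0.2, 0), (0.9, 2), (1.5, 1)]
A = [-(0.3 + 0.2 * j) * np.eye(D) for j in range(len(events))]

def particles(t):
    """Carry each event's impulse b_{k_j} forward through the shared flow maps."""
    ps = []
    for j, (t_j, k_j) in enumerate(events):
        if t_j > t:
            break
        p = B[:, k_j]
        for i in range(j, len(events)):
            t_start = events[i][0]
            t_end = events[i + 1][0] if i + 1 < len(events) and events[i + 1][0] < t else t
            p = expm((t_end - t_start) * A[i]) @ p
            if t_end == t:
                break
        ps.append(p)
    return ps

t = 2.0
ps = particles(t)
z = np.sum(ps, axis=0)                        # z(t) is the sum of particles
lam = softplus(C @ z + mu)
for j, p in enumerate(ps):
    lam_minus_j = softplus(C @ (z - p) + mu)  # intensity with event j removed
    print(f"event {j}: Delta lambda = {lam - lam_minus_j}")
```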
5. Training Procedure and Inference Workflow
HHP is trained by maximizing the standard log-likelihood for MTPPs:

$$
\log p(\mathcal{H}) = \sum_{i} \log \lambda_{k_i}(t_i) - \int_0^T \sum_{k=1}^{M} \lambda_k(s)\,\mathrm{d}s.
$$

The time integral term is approximated via uniform sampling in each inter-event interval, and the hypernetwork is re-evaluated only at event times. The parameter set is optimized end-to-end with Adam; regularization is limited to early stopping and weight decay, with a hyperparameter search over the latent dimension $D$, GRU hidden size, and related settings.
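A sketch of the Monte Carlo likelihood under these assumptions, with `intensity_fn` a placeholder for the HHP intensity of Section 1; the constant-intensity toy check has the closed form $\sum_i \log \mu_{k_i} - T \sum_k \mu_k$:

```python
import numpy as np

rng = np.random.default_rng(3)

def log_likelihood(event_times, event_marks, intensity_fn, T, n_mc=20):
    """Standard MTPP log-likelihood with a Monte Carlo integral per interval.

    intensity_fn(t) -> length-M vector of mark intensities; here a stand-in
    for the HHP intensity computed from the latent state.
    """
    # Event term: sum_i log lambda_{k_i}(t_i).
    event_term = sum(np.log(intensity_fn(t)[k])
                     for t, k in zip(event_times, event_marks))
    # Integral term: uniform samples in each inter-event interval.
    integral = 0.0
    bounds = [0.0] + list(event_times) + [T]
    for a, b in zip(bounds[:-1], bounds[1:]):
        s = rng.uniform(a, b, size=n_mc)
        integral += (b - a) * np.mean([intensity_fn(u).sum() for u in s])
    return event_term - integral

# Toy check with a constant-intensity process.
mu = np.array([0.5, 0.2])
print(log_likelihood([0.4, 1.1], [0, 1], lambda t: mu, T=2.0))  # approx -3.70
```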
6. Benchmarking and Empirical Performance
HHP was evaluated across diverse real-world datasets: Amazon reviews, Retweet cascades, NY Taxi pickups, Taobao purchases, StackOverflow posts, Last.fm listening logs, and MIMIC-II medical events. Metrics included per-event log-likelihood (time- and mark-decomposed), next-time RMSE, next-mark accuracy, and calibration (PCE for time, ECE for marks), aggregated by a composite rank.
The principal baselines were: RMTPP, NHP, SAHP, THP, IFTPP, AttNHP, and S2P2. HHP achieved a composite average rank of 2.6 (placing as best or second-best on 4 of 6 metrics), with particular strength in time RMSE (1.4) and mark accuracy (1.7), and matched state-of-the-art log-likelihood (rank 2.0 against S2P2’s 1.9). Notably, HHP required on average 54% fewer parameters than S2P2, while maintaining top-tier predictive performance.
| Dataset | Best/Second-Best Metrics | Parameter Efficiency |
|---|---|---|
| Amazon, Retweet, ... | 4/6 metrics (#1 or #2) | 54% fewer than S2P2 |
7. Synthesis: Flexibility, Interpretability, and Research Context
HHP fundamentally bridges the dichotomy between classical and neural MTPP models. By maintaining the linear Hawkes recurrence, HHP preserves closed-form, per-event attribution, enabling rigorous interpretability probes for direct inspection of model predictions at the event level. Simultaneously, the hypernetwork that generates piecewise constant, history-conditioned decay dynamics ($\mathbf{A}_i$), together with the lifted latent dimension, ameliorates the expressivity limitations of standard Hawkes frameworks. The model thus exhibits non-stationary, adaptive temporal memory, combining the transparent structure of Hawkes with the flexibility and performance previously characteristic only of neural MTPPs. The empirical results demonstrate that HHP's interpretability does not come at the expense of predictive power, offering a route towards interpretable, high-capacity event modeling in real-world temporal domains.