Hyper Hawkes Process (HHP): Interpretable Event Modeling

Updated 9 November 2025
  • The Hyper Hawkes Process (HHP) is a marked temporal point process model that extends classical Hawkes processes with a latent-state formulation and a history-dependent hypernetwork.
  • It decouples the latent dimension from the number of marks, enabling either dimensional lifting for greater expressivity or dimensional compression, so as to capture complex temporal dependencies.
  • The model offers transparent event-level interpretability through conditionally linear recurrences while achieving efficient parameter usage and strong predictive performance.

The Hyper Hawkes Process (HHP) is a class of marked temporal point process (MTPP) models that simultaneously addresses the interpretability limitations of neural MTPPs and the rigidity of classical Hawkes processes. HHP achieves this by expanding the dynamics into a latent space and introducing a history-dependent hypernetwork, yielding models that are both highly expressive and amenable to rigorous, event-level interpretability. The model exhibits piecewise, conditionally linear recurrences in the latent state, enabling both transparent prediction mechanisms and high predictive performance characteristic of neural models.

1. Model Specification and Latent Dynamics

Let the event history be $\mathcal{H}_t = \{(t_i, k_i)\}_{i=1}^{N_t}$ with marks $k_i \in \{1, \dots, K\}$. HHP models a $d$-dimensional latent state $\mathbf{x}_t \in \mathbb{R}^d$, whose time evolution determines the vector of event intensities $\boldsymbol\lambda_t = [\lambda_t^1, \ldots, \lambda_t^K]^\top$. The coupled system is:

$$d\mathbf{x}_t = -\boldsymbol\beta_t\, \mathbf{x}_{t-}\, dt + \boldsymbol\alpha\, d\mathbf{N}_t, \qquad \boldsymbol\beta_t = f_\theta(\mathcal{H}_{t-}), \qquad \boldsymbol\lambda_t = \sigma\bigl(\boldsymbol\mu + W\,\mathbf{x}_{t-}\bigr)$$

where:

  • $\mathbf{N}_t$ is the $K$-dimensional counting process, so the increment $d\mathbf{N}_t \in \{0,1\}^K$ indicates which mark (if any) occurs at $t$;
  • $\boldsymbol\alpha \in \mathbb{R}^{d\times K}$ collects the mark-specific impulse vectors;
  • $W \in \mathbb{R}^{K \times d}$, $\boldsymbol\mu \in \mathbb{R}^K$, and the softplus $\sigma(z) = \log(1 + e^z)$ ensure nonnegative intensities;
  • $f_\theta$ is a hypernetwork that encodes the history and outputs the decay dynamics.

Between events, i.e. for $t_i < t < t_{i+1}$, $f_\theta(\mathcal{H}_{t_i})$ fixes $\boldsymbol\beta_t = \beta_i = -V_i D_i V_i^*$ (with $V_i$ unitary and $D_i$ diagonal with negative real part; see Section 3), permitting the closed-form state update:

$$\mathbf{x}_{t} = V_i\, e^{D_i (t-t_i)}\, V_i^*\, \mathbf{x}_{t_i}$$

At each event $i+1$ of type $k_{i+1}$, the latent state is updated by:

$$\mathbf{x}_{t_{i+1}} = \mathbf{x}_{t_{i+1}-} + \boldsymbol\alpha_{k_{i+1}}$$

Across the whole trajectory, the latent process is thus governed by a piecewise, conditionally linear recurrence.
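
The recurrence can be made concrete with a short simulation. The following is a minimal NumPy sketch, not the paper's implementation: the hypernetwork outputs $(V_i, D_i)$ are stubbed with random values, the toy event history is invented, and taking the real part of $W\mathbf{x}$ is an assumption standing in for the paper's handling of complex-valued states.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K = 8, 3                      # latent dimension, number of marks

def softplus(z):
    return np.log1p(np.exp(z))

def random_unitary(n):
    # QR of a complex Gaussian gives a random unitary matrix (stub for V_i).
    q, _ = np.linalg.qr(rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n)))
    return q

mu = rng.normal(size=K)          # background logits mu
W = rng.normal(size=(K, d))      # latent -> intensity projection
alpha = rng.normal(size=(d, K))  # columns are impulse vectors alpha_k

x = np.zeros(d, dtype=complex)
t_prev = 0.0
for t_i, k_i in [(0.5, 0), (1.2, 2), (2.0, 1)]:    # toy history (t_i, k_i)
    V = random_unitary(d)                          # stubbed f_theta output
    D = -(softplus(rng.normal(size=d)) + 1j * rng.normal(size=d))
    # Decay over (t_prev, t_i):  x <- V exp(D dt) V^* x
    x = V @ (np.exp(D * (t_i - t_prev)) * (V.conj().T @ x))
    lam = softplus(mu + (W @ x).real)              # intensity just before t_i
    x = x + alpha[:, k_i]                          # jump by alpha_{k_i}
    t_prev = t_i
print("intensities just before the last event:", lam)
```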

2. Latent Dimensional Lifting and Expressivity

In the classical linear Hawkes process, the latent and mark dimensions coincide ($d = K$), with parameters $\beta, \alpha \in \mathbb{R}^{K \times K}$. HHP lifts this rigidity, allowing $d \gg K$ for expressivity or $d < K$ for compression:

$$d\mathbf{x}_t = -\boldsymbol\beta_t\, \mathbf{x}_{t-}\, dt + \sum_{k=1}^K \boldsymbol\alpha_k\, dN_t^k, \qquad \boldsymbol\lambda_t = \sigma\bigl(\boldsymbol\mu + W\,\mathbf{x}_{t-}\bigr)$$

Each event of mark $k$ injects a vector $\boldsymbol\alpha_k$, and $W$ projects the $d$-dimensional latent state to the $K$-dimensional intensity. This decoupling enables HHP to model dependencies inaccessible to standard Hawkes models while retaining the analytic tractability of the latent process.
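
A quick shape check (toy dimensions, all names illustrative) makes the decoupling concrete: only $\boldsymbol\alpha$ and $W$ change size with $d$, the intensity stays $K$-dimensional, and the $2dK$ parameters of $(\boldsymbol\alpha, W)$ replace the $2K^2$ of classical Hawkes:

```python
import numpy as np

K = 10                              # number of marks
for d in (4, 64):                   # d < K: compression; d >> K: lifting
    alpha = np.random.randn(d, K)   # mark impulses live in R^d
    W = np.random.randn(K, d)       # projection back to K intensities
    x = np.random.randn(d)          # latent state
    lam = np.log1p(np.exp(W @ x))   # softplus; shape (K,) regardless of d
    print(f"d={d:>2}: lambda shape {lam.shape}, "
          f"alpha+W params {alpha.size + W.size} (vs 2*K^2 = {2*K*K})")
```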

3. Hypernetwork Dynamics and Architecture

The decay/control matrix $\boldsymbol\beta_t$ is made history- and time-adaptive via a neural hypernetwork built on a GRU. For each event index $i$, the hypernetwork maintains a hidden state $z_i \in \mathbb{R}^h$:

$$z_i = \mathrm{GRU}_\phi\bigl(z_{i-1},\, [\log(t_i - t_{i-1}),\, e_{k_i}]\bigr), \qquad z_0 = \mathbf{0},$$

where $e_{k_i}$ denotes a learned embedding of the mark $k_i$.

From $z_i$, the hypernetwork outputs:

  • $d_i = W_d z_i + b_d \in \mathbb{R}^d$;
  • $D_i = -\mathrm{diag}(\mathrm{softplus}(d_i) \odot u)$, so that $\Re(D_i) < 0$;
  • $v_i = W_v z_i + b_v \in \mathbb{R}^{2dr}$;
  • $V_i = \mathrm{unitary}(v_i)$, using a standard parameterization that maps $v_i$ to a unitary matrix (à la Jing et al., 2017).

Thus $\{V_i, D_i\} = f_\theta(\mathcal{H}_{t_i})$ diagonalizes the inter-event dynamics, fixing $\boldsymbol\beta_t = -V_i D_i V_i^*$ over $(t_i, t_{i+1}]$.
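
Below is a compact PyTorch sketch of such a hypernetwork; class and variable names are illustrative. For brevity, $D_i$ is kept real and negative (omitting the $\odot\, u$ factor and imaginary parts), and $V_i$ is produced as the matrix exponential of a skew-Hermitian matrix, which is exactly unitary but is a stand-in for the Jing et al. (2017) parameterization used in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HyperNet(nn.Module):
    """Illustrative GRU hypernetwork: (z_{i-1}, dt_i, k_i) -> (z_i, D_i, V_i)."""

    def __init__(self, K, d, h):
        super().__init__()
        self.embed = nn.Embedding(K, h)        # mark embedding e_k
        self.gru = nn.GRUCell(h + 1, h)        # input: [log dt, e_k]
        self.head_d = nn.Linear(h, d)          # -> d_i in R^d
        self.head_v = nn.Linear(h, 2 * d * d)  # -> real/imag parts for V_i
        self.d = d

    def forward(self, z, log_dt, k):
        # z: (1, h); log_dt: scalar tensor; k: (1,) long tensor
        inp = torch.cat([log_dt.view(1, 1), self.embed(k)], dim=-1)
        z = self.gru(inp, z)
        # D_i = -softplus(d_i): diagonal with strictly negative real part
        D = -F.softplus(self.head_d(z)).squeeze(0)
        # V_i = exp(S) with S skew-Hermitian (S^* = -S), hence V_i unitary
        v = self.head_v(z).squeeze(0)
        A = torch.complex(v[: self.d * self.d],
                          v[self.d * self.d :]).reshape(self.d, self.d)
        S = A - A.conj().T
        V = torch.linalg.matrix_exp(S)
        return z, D, V

# Usage on a toy event (dt = 0.7, mark 2):
net = HyperNet(K=3, d=8, h=16)
z = torch.zeros(1, 16)
z, D, V = net(z, torch.log(torch.tensor(0.7)), torch.tensor([2]))
```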

4. Interpretability and Linear Attribution Mechanisms

The conditional linearity of the update law enables decomposition of the latent state into per-event "particles": for any $t \in (t_i, t_{i+1}]$ and any past event $j \leq i$,

$$\mathbf{x}_t^{(j)} = \left( \prod_{k=j}^{i} V_k\, e^{D_k(\min\{t,\, t_{k+1}\} - t_k)}\, V_k^* \right) \boldsymbol\alpha_{k_j},$$

with the non-commuting factors ordered from $k = i$ on the left down to $k = j$ on the right, so that

$$\boldsymbol\lambda_t = \sigma\left(\boldsymbol\mu + W \sum_{j=1}^{i} \mathbf{x}_t^{(j)}\right)$$

In the limit where $\boldsymbol\beta$ is constant (the classical Hawkes case), this reduces to the familiar exponential decay form:

$$\mathbf{x}_t^{(j)} = e^{-\beta (t-t_j)}\, \boldsymbol\alpha_{k_j}$$
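
To see the reduction, note that constant dynamics $V_k = V$, $D_k = D$ (so $\beta = -VDV^*$) make all factors share eigenvectors, and the exponents telescope across the inter-event intervals:

$$\prod_{k=j}^{i} V e^{D(\min\{t,\, t_{k+1}\} - t_k)} V^* = V\, e^{D \sum_{k=j}^{i} (\min\{t,\, t_{k+1}\} - t_k)}\, V^* = V e^{D(t - t_j)} V^* = e^{-\beta (t - t_j)}$$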

This structure permits precise attribution of instantaneous and cumulative influence for each event via leave-one-out probes:

$$\mathrm{DF}\lambda_t^{(j)} = \boldsymbol\lambda_t - \sigma\left( \boldsymbol\mu + W \sum_{m \neq j} \mathbf{x}_t^{(m)} \right)$$

$$\mathrm{DF}\Lambda_t^{(j)} = \int_0^t \mathrm{DF}\lambda_s^{(j)}\, ds$$

Such closed-form probes can determine the degree to which each past event excites or inhibits the process, generalizing the transparency of classical Hawkes models to the more expressive HHP framework.
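
A minimal NumPy sketch of these probes (names illustrative, not the paper's code), assuming the per-interval dynamics $(V_k, D_k)$ have already been produced by the hypernetwork and that every listed event precedes the query time $t$:

```python
import numpy as np

def softplus(z):
    return np.log1p(np.exp(z))

def particles(events, dyn, alpha, t):
    """Per-event particles x_t^{(j)} for a query time t after all events.

    events: list of (t_i, k_i); dyn: list of (V_i, D_i), one per interval
    opening at t_i (assumed precomputed by the hypernetwork).
    """
    xs = []
    for j, (_, k_j) in enumerate(events):
        x = alpha[:, k_j].astype(complex)
        for k in range(j, len(events)):
            V, D = dyn[k]
            t_k = events[k][0]
            t_end = events[k + 1][0] if k + 1 < len(events) else t
            # decay the particle across the interval (t_k, min(t, t_{k+1}))
            x = V @ (np.exp(D * (min(t, t_end) - t_k)) * (V.conj().T @ x))
        xs.append(x)
    return xs

def df_lambda(xs, mu, W, j):
    """Leave-one-out probe DF-lambda_t^{(j)}: intensity change without event j."""
    # .real stands in for the paper's real-output parameterization (assumption)
    full = softplus(mu + (W @ sum(xs)).real)
    loo = softplus(mu + (W @ (sum(xs) - xs[j])).real)
    return full - loo  # positive entries: event j excites; negative: inhibits
```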

5. Training Procedure and Inference Workflow

HHP is trained by maximizing the standard log-likelihood for MTPPs:

$$\mathcal{L}(\mathcal{H}_T) = \sum_{i=1}^{N_T} \log \lambda_{t_i}^{k_i} - \int_0^T \sum_{k=1}^K \lambda_s^k\, ds$$

The integral term is approximated by Monte Carlo with uniform samples in each inter-event interval, and the hypernetwork is re-evaluated only at event times. The parameter set $\theta = \{\phi, \boldsymbol\alpha, W, \boldsymbol\mu\}$ is optimized end-to-end with Adam; regularization is limited to early stopping and weight decay, with hyperparameters (latent dimension $d$, GRU hidden size, etc.) chosen by search.
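
A sketch of this Monte Carlo estimator, assuming the intensity evaluations have already been collected from the model (array names are illustrative):

```python
import numpy as np

def mc_loglik(log_lam_events, lam_total_samples, interval_lengths):
    """Monte Carlo estimate of the MTPP log-likelihood of Section 5.

    log_lam_events:    (N,)   log lambda_{t_i}^{k_i} at each observed event
    lam_total_samples: (M, S) sum_k lambda_s^k at S uniform samples drawn
                       from each of the M inter-event intervals
    interval_lengths:  (M,)   lengths of those intervals
    """
    event_term = np.sum(log_lam_events)
    # int_0^T sum_k lambda_s^k ds  ~  sum_i |I_i| * mean over samples in I_i
    integral_term = np.sum(interval_lengths * lam_total_samples.mean(axis=1))
    return event_term - integral_term
```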

6. Benchmarking and Empirical Performance

HHP was evaluated across diverse real-world datasets: Amazon reviews, Retweet cascades, NY Taxi pickups, Taobao purchases, StackOverflow posts, Last.fm listening logs, and MIMIC-II medical events. Metrics included per-event log-likelihood (time- and mark-decomposed), next-time RMSE, next-mark accuracy, and calibration (PCE for time, ECE for marks), aggregated by a composite rank.

The principal baselines were: RMTPP, NHP, SAHP, THP, IFTPP, AttNHP, and S2P2. HHP achieved a composite average rank of 2.6 (placing as best or second-best on 4 of 6 metrics), with particular strength in time RMSE (1.4) and mark accuracy (1.7), and matched state-of-the-art log-likelihood (rank 2.0 against S2P2’s 1.9). Notably, HHP required on average 54% fewer parameters than S2P2, while maintaining top-tier predictive performance.

| Dataset | Best/Second-Best Metrics | Parameter Efficiency |
|---|---|---|
| Amazon, Retweet, ... | 4/6 metrics (#1 or #2) | 54% fewer than S2P2 |

7. Synthesis: Flexibility, Interpretability, and Research Context

HHP bridges the dichotomy between classical and neural MTPP models. By maintaining a linear Hawkes-style recurrence, it preserves closed-form, per-event attribution, enabling rigorous interpretability probes that directly inspect model predictions at the event level. At the same time, the hypernetwork that generates piecewise-constant, history-conditioned decay dynamics $\boldsymbol\beta_t$, together with the enlarged latent state dimension, addresses the expressivity limitations of standard Hawkes frameworks. The model thus exhibits non-stationary, adaptive temporal memory, combining the transparent structure of Hawkes processes with the flexibility and performance previously characteristic only of neural MTPPs. The empirical results demonstrate that HHP's interpretability does not come at the expense of predictive power, offering a route toward interpretable, high-capacity event modeling in real-world temporal domains.
