FIM-PP: Foundation Inference Model for Point Processes
- FIM-PP is a rigorous statistical framework for Bayesian sequential inference in point processes, jointly estimating latent trajectories and static parameters.
- It utilizes sequential Monte Carlo samplers enhanced by state-space saturation and data-point tempering to overcome challenges like weight and path degeneracy.
- Originally designed for financial applications such as ultra-high frequency trading, the methodology also extends to neuroscience, social systems, and other dynamic domains.
A Foundation Inference Model for Point Processes (FIM-PP) refers to a rigorous statistical and computational modeling framework for sequential inference in point process systems, particularly under conditions of partial or discrete observation. The term arises from work addressing the realistic setting where both the latent trajectory of a point process and key static parameters are jointly unknown, and must be inferred as data arrives over time. FIM-PP integrates Bayesian modeling, sequential Monte Carlo (SMC) methods, and specialized strategies to manage the expanding state space and degeneracy issues intrinsic to dynamically evolving point processes. It is particularly motivated by financial applications where marked point processes are observed at discrete times, but its methodology extends to other domains (e.g., neuroscience, social systems) involving partially observed or complex temporal structures.
1. Bayesian Formulation and Posterior Structure
FIM-PP builds on the Bayesian approach to inference for point process models observed only partially and discretely, incorporating static parameter uncertainty via prior distributions. The latent process trajectory—comprising the sequence of event times and marks—is unobserved and dynamically expanding. In canonical financial modeling settings, such as ultra-high frequency trading, one models the latent event rate λₜ as a doubly stochastic (Cox) process, often driven by a compound Poisson process with latent changepoints, for example in shot-noise form with decay rate κ > 0:

λₜ = Σ_{j: φⱼ ≤ t} ζⱼ exp(−κ(t − φⱼ)),

where each event of the driving process corresponds to an unknown jump time φⱼ and jump size ζⱼ.
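As a minimal sketch, one can simulate such a Cox intensity and draw events by Ogata-style thinning. The shot-noise form (exponentially decaying jumps with decay rate kappa) and all parameter values below are illustrative assumptions, not prescribed by FIM-PP:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters: changepoint rate nu, mean jump size 1/mu_z,
# intensity decay kappa, horizon T. None of these values come from FIM-PP.
nu, mu_z, kappa, T = 0.5, 0.5, 1.0, 10.0

# Compound-Poisson driver: a Poisson number of jump times phi_j on [0, T],
# with exponentially distributed jump sizes zeta_j.
n_jumps = rng.poisson(nu * T)
phi = np.sort(rng.uniform(0.0, T, n_jumps))
zeta = rng.exponential(1.0 / mu_z, n_jumps)

def intensity(t):
    """Shot-noise Cox intensity: sum of exponentially decaying jumps."""
    active = phi <= t
    return float(np.sum(zeta[active] * np.exp(-kappa * (t - phi[active]))))

# Ogata thinning: lambda_t never exceeds sum(zeta), so it is a valid bound.
events, t, lam_bar = [], 0.0, float(zeta.sum()) + 1e-9
while t < T:
    t += rng.exponential(1.0 / lam_bar)
    if t < T and rng.uniform() < intensity(t) / lam_bar:
        events.append(t)
```

Thinning accepts each candidate arrival with probability λₜ/λ̄, which yields exact draws from the inhomogeneous process given the latent jumps.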
The joint posterior at time tₙ over the static parameters μ, σ and the latent trajectory x_[0,tₙ], given the observed marks and times y₁:ₙ, has the form:

π(μ, σ, x_[0,tₙ] | y₁:ₙ) ∝ p(μ) p(σ) p(x_[0,tₙ] | μ, σ) L(y₁:ₙ | x_[0,tₙ], μ, σ).

The likelihood term involves the product over observed events, mark densities, and exponential compensator parts:

L(y₁:ₙ | x_[0,tₙ], μ, σ) = [∏ᵢ λ_sᵢ f(mᵢ | μ, σ)] exp(−∫₀^tₙ λ_s ds),

where sᵢ are the observed event times and mᵢ the marks with density f.
Such theoretical structure enables principled sequential updating as new data are observed.
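The likelihood structure above (log-intensity at each event, plus log mark densities, minus the compensator integral) can be evaluated numerically with a minimal sketch; the function names are illustrative and the compensator is approximated by a trapezoidal rule:

```python
import numpy as np

def pp_loglik(event_times, marks, lam, log_mark_density, T, n_grid=2001):
    """Marked point process log-likelihood on [0, T]:
    sum of log-intensities at event times, plus log mark densities,
    minus the compensator integral of lam (trapezoidal approximation)."""
    grid = np.linspace(0.0, T, n_grid)
    vals = np.array([lam(t) for t in grid])
    compensator = float(np.sum((vals[1:] + vals[:-1]) * np.diff(grid)) / 2.0)
    return (sum(np.log(lam(s)) for s in event_times)
            + sum(log_mark_density(m) for m in marks)
            - compensator)

# Sanity check: constant intensity 2 on [0, 2], two events, flat mark density.
ll = pp_loglik([0.5, 1.5], [0.2, 0.7], lambda t: 2.0, lambda m: 0.0, T=2.0)
# → 2*log(2) - 4 ≈ -2.6137
```

For a constant intensity the trapezoidal compensator is exact, so the check recovers 2·log 2 − 4 exactly.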
2. Sequential Monte Carlo Sampling Framework
Exact filtering is computationally infeasible; therefore, FIM-PP employs particle-based SMC samplers for posterior approximation. N particles representing process histories are propagated and weighted at each data arrival step. Propagating xₙ₋₁ → xₙ through a forward kernel Kₙ, the kernel-based incremental weight update is:

wₙ(xₙ₋₁, xₙ) = πₙ(xₙ) Lₙ₋₁(xₙ, xₙ₋₁) / [πₙ₋₁(xₙ₋₁) Kₙ(xₙ₋₁, xₙ)],

with the backward kernel Lₙ₋₁ optimally chosen as the time reversal of Kₙ with respect to the current posteriors.
The Effective Sample Size (ESS) is monitored, and resampling occurs when the ESS falls below a threshold. MCMC rejuvenation sweeps are optionally included after resampling to mitigate particle degeneracy.
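The ESS monitor and resampling trigger can be sketched as follows; multinomial resampling is used for simplicity, and the function names are illustrative:

```python
import numpy as np

def ess(log_w):
    """Effective sample size from unnormalised log-weights:
    1 / sum of squared normalised weights."""
    w = np.exp(log_w - np.max(log_w))
    w /= w.sum()
    return 1.0 / float(np.sum(w ** 2))

def maybe_resample(particles, log_w, rng, threshold=0.5):
    """Multinomial resampling when ESS drops below threshold * N;
    resampled particles restart with uniform (zero) log-weights."""
    n = len(log_w)
    if ess(log_w) < threshold * n:
        w = np.exp(log_w - np.max(log_w))
        w /= w.sum()
        idx = rng.choice(n, size=n, p=w)
        return particles[idx], np.zeros(n)
    return particles, log_w
```

Uniform weights give ESS = N; a single dominant weight drives ESS toward 1, which is exactly the degeneracy the rejuvenation sweeps are meant to counter.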
3. State-Space Expansion, Proposal Mismatch, and Weight Degeneracy
Crucial challenges originate in the necessary expansion of the state space as new events (changepoints) are sampled. If proposals for new latent states are drawn only from the prior, mismatch with the likelihood terms (especially the compensator and the observed-event factors) induces high variance in the importance weights. This can cause:
- Severe weight degeneracy, where most particles have negligible weight and resampling dominates (resampling rates above 90% in naive implementations).
- Path degeneracy, where particle diversity is lost over sequential updates, leading to inaccurate smoothing and filtering.
Variance lower bounds for importance sampling estimators worsen as the observation count grows, underscoring the inefficiency of straightforward proposals in a high-dimensional latent space.
4. Efficient Solutions: State-Space Saturation and Data-Point Tempering
The paper introduces two solutions to control weight variance and degeneracy:
A. State-Space Saturation
When the full observation interval [0, T] is known, the state space is saturated by fixing the latent domain at initialization: every particle samples its complete set of changepoints (φⱼ, ζⱼ) with φⱼ ∈ [0, T] up front. Likelihood terms are "switched on" only over the currently observed interval [0, tₙ]. Incremental SMC weights then simplify to ratios of saturated posteriors:

wₙ(x) ∝ πₙ(x) / πₙ₋₁(x).
This approach eliminates new dimension proposals, reducing variance, though computational cost remains high due to maintenance of full-particle histories.
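Under saturation, the posterior ratio reduces to the likelihood terms newly switched on over (tₙ₋₁, tₙ]: the log-intensities at newly observed events minus the compensator over the newly revealed interval. A minimal sketch (illustrative names, trapezoidal compensator approximation):

```python
import numpy as np

def saturated_log_increment(lam, new_events, t_prev, t_now, n_grid=201):
    """Log incremental weight under state-space saturation:
    log-intensities at events in (t_prev, t_now] minus the compensator
    over that newly revealed interval (trapezoidal approximation)."""
    grid = np.linspace(t_prev, t_now, n_grid)
    vals = np.array([lam(t) for t in grid])
    comp = float(np.sum((vals[1:] + vals[:-1]) * np.diff(grid)) / 2.0)
    return sum(np.log(lam(s)) for s in new_events) - comp
```

With constant intensity 2 and one event on (0, 1], this gives log 2 − 2; because the particle's trajectory is fixed in advance, no new latent dimensions are proposed at this step.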
B. Data-Point Tempering
When the full observation interval is unknown, or when data should be integrated gradually, tempering adds new likelihood terms sequentially for each incoming observation. The latent extension is performed via the prior, and the likelihood term for each new observation is introduced incrementally, so that the incremental weight at step n is the likelihood contribution of the newly added data point:

wₙ(x) ∝ gₙ(yₙ | x, μ, σ).
Under uniformity and mixing conditions, the variance of the resulting estimates remains bounded independently of the number of data points: for any bounded test function h,

Var[π̂ₙᴺ(h)] ≤ C ‖h‖² / N,

with constant C independent of n.
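The tempering recursion can be illustrated on a deliberately simple conjugate toy model (a Poisson rate with a Gamma prior, not the paper's marked-point-process model): introducing one observation's likelihood at a time yields exactly the incremental-weight structure described above, and the conjugate posterior provides an exact check:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy data-point tempering: the posterior of a Poisson rate theta is built
# up by adding one observation's likelihood at a time. Each new data point
# contributes an incremental log-weight; a full sampler would interleave
# resampling and MCMC rejuvenation moves between tempering steps.
data = rng.poisson(3.0, size=5)
n_part = 5000
theta = rng.gamma(2.0, 2.0, n_part)      # prior particles: Gamma(shape 2, scale 2)
log_w = np.zeros(n_part)

for y in data:                            # temper in one data point at a time
    log_w += y * np.log(theta) - theta    # Poisson log-likelihood, up to a constant

w = np.exp(log_w - log_w.max())
w /= w.sum()
post_mean = float(np.sum(w * theta))

# Conjugate check: exact posterior is Gamma(2 + sum(data), rate 0.5 + 5).
exact = (2.0 + data.sum()) / (0.5 + 5.0)
```

The weighted particle mean should agree closely with the exact conjugate posterior mean; in the full FIM-PP setting the same recursion runs over point-process likelihood terms rather than Poisson counts.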
5. Theoretical Guarantees and Empirical Diagnostics
The framework’s efficacy is ensured both in theory and practice:
- Proposition 1 establishes a non-asymptotic error bound for particle approximations in the relevant models.
- Without tempering/saturation, error growth with time is demonstrated theoretically and empirically (via effective sample size and RMSE).
- Saturated and tempered samplers exhibit markedly lower resampling rates (as low as 2% for tempered kernels vs 90% naive), improved RMSE in intensity estimates, and lower computational overhead.
Reduced Computational Complexity (RCC) variants limit MCMC rejuvenation sweeps to the 20 most recent changepoints, trading smoothing accuracy for constant per-iteration cost.
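The fixed-lag idea behind RCC can be sketched in a few lines (the helper name is illustrative; the window size mirrors the 20-changepoint limit above): each particle's history is split into a frozen prefix and a mutable suffix, and rejuvenation sweeps touch only the suffix, so per-iteration cost does not grow with the history length:

```python
def rcc_window(changepoints, lag=20):
    """Split a particle's changepoint history into a frozen prefix and a
    mutable suffix of at most `lag` recent changepoints; MCMC rejuvenation
    sweeps update only the suffix."""
    frozen, mutable = changepoints[:-lag], changepoints[-lag:]
    return frozen, mutable
```

Freezing the prefix is exactly what sacrifices retrospective smoothing accuracy: earlier changepoints can no longer be revised in light of later data.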
6. Application Domains and Extensions
The FIM-PP methodology was originally developed for financial applications—tracking latent trade intensities in ultra-high frequency trading. Sequential Bayesian inference enables real-time filtering and retrospective smoothing as trades occur. However, the methodology is generically applicable wherever latent marked point processes are sequentially updated, including:
- Event-stream modeling in computational neuroscience
- Queueing/risk/event analysis in engineering
- Marked process inference in social or network systems
Key features extend directly to these contexts: Bayesian updating of static parameters, particle propagation under expanding latent histories, and modular strategies for combating degeneracy.
7. Summary and Significance
FIM-PP formalizes simulation-based Bayesian sequential inference for partially observed marked point processes with static parameters, grounded in SMC methodology enhanced by state-space saturation and data-point tempering. The framework balances statistical rigor (non-asymptotic error bounds, control of degeneracy) with computational feasibility (stable ESS, low resampling rates under saturation or tempering) for practically challenging inference problems. Inference adapts as new data arrive, maintaining consistency and efficiency while accommodating a dynamically expanding latent structure—a foundation for robust Bayesian analysis of point process systems across scientific and engineering domains.