- The paper introduces a novel deep generative framework for modeling point processes using Wasserstein GANs, which is intensity-free and likelihood-free.
- Empirical results show the framework outperforms traditional likelihood methods on synthetic and real data, demonstrating robustness to model mis-specification.
- Using the Wasserstein distance within the GAN framework improves training stability and mitigates the mode-dropping behavior the authors attribute to likelihood-based (KL-divergence) estimation.
A Technical Overview of "Wasserstein Learning of Deep Generative Point Process Models"
The paper "Wasserstein Learning of Deep Generative Point Process Models" introduces a novel methodology for modeling point processes by leveraging deep generative models and the Wasserstein distance. Point processes, particularly temporal point processes, are integral in modeling sequential event data across various domains, such as e-commerce, healthcare, and social networks. Traditional approaches often rely heavily on the assumption of a specific parametric intensity function, which may not adequately capture the complexities of real-world sequential data due to inherent parametric constraints and assumptions.
Methodological Contributions
The paper's primary contribution is a generative framework for point processes that dispenses with conditional intensity functions. Instead, the authors propose a deep generative model trained with a Wasserstein Generative Adversarial Network (WGAN) objective. This shift is notable for several reasons:
- Intensity-Free Modeling: The intensity-free formulation bypasses the restrictive assumptions typically imposed on the functional form of intensity-based models. Instead, it learns a transformation that maps samples from a simple point process (e.g., a homogeneous Poisson process) directly to event sequences from the target distribution, allowing more flexible modeling of complex sequential behavior.
- Likelihood-Free Training: The model eschews conventional likelihood-based estimation, which amounts to minimizing a Kullback-Leibler divergence and can suffer from mode dropping under mis-specification, in favor of minimizing a Wasserstein distance between generated and observed event sequences.
- Use of Recurrent Neural Networks (RNNs): The model employs RNNs to capture the sequential structure inherent in point processes, building on earlier work that applies neural networks to temporal dynamics but training them adversarially rather than by maximum likelihood. A minimal sketch combining these three ingredients follows this list.
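To make the recipe concrete, here is a minimal PyTorch sketch: a recurrent generator maps inter-arrival gaps drawn from a homogeneous Poisson "noise" process to event times, and a recurrent critic scores sequences for a Wasserstein objective. The module names, layer sizes, fixed sequence length, and the weight-clipping Lipschitz heuristic are illustrative assumptions, not the authors' exact architecture or training procedure.

```python
# Minimal sketch of an intensity-free, likelihood-free point-process WGAN (PyTorch).
# Names, sizes, and hyper-parameters are illustrative assumptions.
import torch
import torch.nn as nn

SEQ_LEN = 50      # events per (padded) sequence; assumed fixed for simplicity
NOISE_RATE = 1.0  # rate of the homogeneous Poisson "noise" process


def sample_noise_sequences(batch_size):
    """Inter-arrival gaps of a homogeneous Poisson process, used as input noise."""
    return torch.distributions.Exponential(NOISE_RATE).sample((batch_size, SEQ_LEN, 1))


class Generator(nn.Module):
    """RNN mapping noise gaps to strictly increasing event times (no intensity)."""
    def __init__(self, hidden=64):
        super().__init__()
        self.rnn = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Sequential(nn.Linear(hidden, 1), nn.Softplus())  # positive gaps

    def forward(self, noise_gaps):
        h, _ = self.rnn(noise_gaps)
        return torch.cumsum(self.head(h), dim=1)  # t_1 < t_2 < ... per sequence


class Critic(nn.Module):
    """RNN scoring a whole event sequence with a real number (no sigmoid)."""
    def __init__(self, hidden=64):
        super().__init__()
        self.rnn = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, event_times):
        h, _ = self.rnn(event_times)
        return self.head(h[:, -1]).squeeze(-1)


def train_step(gen, critic, real_times, g_opt, c_opt, clip=0.01, n_critic=5):
    """One adversarial update: no likelihood appears, only critic scores."""
    batch = real_times.size(0)
    for _ in range(n_critic):
        fake = gen(sample_noise_sequences(batch)).detach()
        c_loss = critic(fake).mean() - critic(real_times).mean()  # negative Wasserstein estimate
        c_opt.zero_grad()
        c_loss.backward()
        c_opt.step()
        for p in critic.parameters():  # crude 1-Lipschitz enforcement by weight clipping
            p.data.clamp_(-clip, clip)
    g_loss = -critic(gen(sample_noise_sequences(batch))).mean()
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
    return c_loss.item(), g_loss.item()
```

Here `real_times` would be mini-batches of observed event-time sequences padded to `SEQ_LEN`; a gradient penalty is a common alternative to weight clipping for enforcing the Lipschitz constraint.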
Empirical Validation
The proposed framework demonstrates robust performance on both synthetic and real-world datasets. Noteworthy experiments include:
- Synthetic Data Trials: The model consistently outperforms maximum likelihood estimation (MLE) baselines whose parametric form is not correctly specified. This finding underscores the model's robustness to mis-specification, attributable to its intensity-free, adversarial architecture (a minimal simulation of one such ground-truth process is sketched after this list).
- Real-World Applications: Across datasets ranging from healthcare to finance, the model captures the underlying event dynamics more accurately than parametrically constrained approaches, suggesting practical utility in settings where no precise parametric form is known.
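As an illustration of the kind of ground-truth generator used in such synthetic studies (the exact processes and parameters in the paper may differ), the sketch below simulates a self-exciting (Hawkes) process with Ogata's thinning algorithm; fitting, say, a homogeneous Poisson model to this data by MLE would constitute a mis-specified baseline.

```python
import numpy as np


def simulate_hawkes(mu=0.5, alpha=0.8, beta=1.0, T=100.0, seed=0):
    """Simulate a self-exciting (Hawkes) process with exponential kernel,
    lambda(t) = mu + alpha * sum_{t_i < t} exp(-beta * (t - t_i)),
    via Ogata's thinning algorithm. All parameter values are illustrative."""
    rng = np.random.default_rng(seed)
    events = []

    def intensity(t):
        past = np.asarray(events)
        return mu + alpha * np.sum(np.exp(-beta * (t - past)))

    t = 0.0
    while t < T:
        lam_bar = intensity(t)               # upper bound: intensity decays between events
        t += rng.exponential(1.0 / lam_bar)  # propose the next candidate time
        if t < T and rng.uniform() <= intensity(t) / lam_bar:
            events.append(t)                 # accept with prob lambda(t) / lam_bar (thinning)
    return np.array(events)
```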
Theoretical and Practical Implications
This work argues that, within the GAN framework, the Wasserstein distance is a better training objective than the Kullback-Leibler divergence implicitly minimized by MLE-based models. The change of objective is pivotal in addressing issues such as mode dropping and unstable training dynamics, and it aligns with broader trends in GAN optimization.
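The contrast can be written compactly. Maximum-likelihood estimation minimizes a KL divergence from the data distribution P_r to the model P_theta, whereas the adversarial objective estimates the Wasserstein-1 distance through its Kantorovich-Rubinstein dual, with the critic playing the role of the 1-Lipschitz function f over event sequences (the particular ground metric between sequences is left abstract here):

```latex
% Maximum likelihood is (up to a constant) KL minimization:
\hat{\theta}_{\mathrm{MLE}}
  = \arg\max_{\theta} \; \mathbb{E}_{\xi \sim P_r}\!\left[\log p_{\theta}(\xi)\right]
  = \arg\min_{\theta} \; \mathrm{KL}\!\left(P_r \,\|\, P_{\theta}\right)

% Wasserstein-1 distance in its Kantorovich--Rubinstein dual form,
% optimized adversarially with a 1-Lipschitz critic f over event sequences:
W_1(P_r, P_g)
  = \sup_{\|f\|_{L} \le 1}
    \mathbb{E}_{\xi \sim P_r}\!\left[f(\xi)\right]
    - \mathbb{E}_{\xi \sim P_g}\!\left[f(\xi)\right]
```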
In practice, practitioners who need accurate event modeling but lack a well-understood intensity function can deploy this model to better understand temporal dynamics. The methodological ideas may also inspire adaptations to other classes of stochastic processes beyond temporal point processes.
Future Directions
The paper leaves open several avenues for future research:
- Extension to Marked Point Processes: Handling marked point processes would enrich the model's applicability, allowing it to account not only for event timing but also for categorical or continuous labels attached to events (a minimal sketch of such an extension follows this list).
- Exploration in Structured Spaces: Investigating adaptations of the model in structured spaces beyond the real half-line, such as spatial point processes, would broaden its utility in geography and ecology.
- Embedding Alternative Distance Metrics: Exploring alternative geometrically or topologically meaningful distance metrics could yield further improvements in training stability and model fidelity.
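On the first of these directions, one plausible, purely illustrative extension of the recurrent generator is to emit a mark distribution alongside each inter-arrival gap; the class and parameter names below are assumptions, not anything proposed in the paper.

```python
import torch
import torch.nn as nn


class MarkedGenerator(nn.Module):
    """Hypothetical extension of the RNN generator to marked events:
    each step emits a positive inter-arrival gap and a categorical mark.
    Architecture and sizes are illustrative assumptions."""
    def __init__(self, num_marks=5, hidden=64):
        super().__init__()
        self.rnn = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.gap_head = nn.Sequential(nn.Linear(hidden, 1), nn.Softplus())
        self.mark_head = nn.Linear(hidden, num_marks)  # logits over mark categories

    def forward(self, noise_gaps):
        h, _ = self.rnn(noise_gaps)
        times = torch.cumsum(self.gap_head(h), dim=1)  # increasing event times
        mark_logits = self.mark_head(h)                # per-event mark distribution
        return times, mark_logits
```

Sampling discrete marks is non-differentiable, so adversarial training of such a generator would need a relaxation such as Gumbel-softmax or a score-function estimator; that design choice is outside this sketch.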
In conclusion, the "Wasserstein Learning of Deep Generative Point Process Models" paper represents a significant step forward in the modeling of complex sequential phenomena, free from many of the traditional constraints imposed by intensity functions and likelihood-based training methodologies. Its promising results across synthetic and real-world datasets make it a compelling option for researchers and practitioners in temporal data analysis.