Deep Reinforcement Learning of Marked Temporal Point Processes (1805.09360v2)

Published 23 May 2018 in cs.LG, cs.SI, and stat.ML

Abstract: In a wide variety of applications, humans interact with a complex environment by means of asynchronous stochastic discrete events in continuous time. Can we design online interventions that will help humans achieve certain goals in such asynchronous setting? In this paper, we address the above problem from the perspective of deep reinforcement learning of marked temporal point processes, where both the actions taken by an agent and the feedback it receives from the environment are asynchronous stochastic discrete events characterized using marked temporal point processes. In doing so, we define the agent's policy using the intensity and mark distribution of the corresponding process and then derive a flexible policy gradient method, which embeds the agent's actions and the feedback it receives into real-valued vectors using deep recurrent neural networks. Our method does not make any assumptions on the functional form of the intensity and mark distribution of the feedback and it allows for arbitrarily complex reward functions. We apply our methodology to two different applications in personalized teaching and viral marketing and, using data gathered from Duolingo and Twitter, we show that it may be able to find interventions to help learners and marketers achieve their goals more effectively than alternatives.

Citations (105)

Summary

  • The paper introduces a novel deep reinforcement learning framework specifically designed for marked temporal point processes (MTPPs) to handle asynchronous stochastic events without restrictive assumptions on event intensities or distributions.
  • It proposes a flexible policy gradient method, built on deep recurrent neural networks, that learns policies defined by intensity and mark distribution functions and allows arbitrary reward functions to be maximized through a general derivation of the policy gradient.
  • Numerical validation demonstrates the method's effectiveness in personalized teaching and viral marketing applications, showing improved performance over baselines by dynamically optimizing event timing for objectives like memory recall and content visibility.

Deep Reinforcement Learning of Marked Temporal Point Processes

This research presents a deep reinforcement learning (RL) approach for marked temporal point processes (MTPPs). Asynchronous stochastic discrete events are common across real-world applications, and designing timely interventions in continuous time for such settings is a significant open problem. This work provides a framework for solving such problems through a novel RL paradigm that does not make restrictive assumptions about the functional forms of event intensities and mark distributions.

Key Methodological Insights

This research introduces a reinforcement learning problem in which both agent actions and environmental feedback are asynchronous stochastic events characterized by MTPPs. To address it, the authors develop a flexible policy gradient method that uses deep recurrent neural networks (RNNs) to learn policies defined by the intensity and mark distribution functions of the processes. This modeling enables the representation of complex temporal dependencies without specifying the structure of the underlying distributions.
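
As a rough illustration of this kind of parameterization (not the authors' exact architecture), the sketch below embeds the history of past action and feedback events with a recurrent cell and maps the hidden state to a conditional intensity and a categorical mark distribution. The class, layer sizes, and the piecewise-constant intensity are hypothetical simplifications; the paper's implementation may differ, for example in how the intensity depends on the time elapsed since the last event.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MTPPPolicy(nn.Module):
    """Hypothetical RNN parameterization of an MTPP policy: the hidden state
    embeds the history of past events and is mapped to a conditional
    intensity and a distribution over marks."""

    def __init__(self, num_marks, hidden_size=32):
        super().__init__()
        self.num_marks = num_marks
        # Input at each event: one inter-event time plus a one-hot mark vector.
        self.rnn = nn.GRUCell(input_size=num_marks + 1, hidden_size=hidden_size)
        self.intensity_head = nn.Linear(hidden_size, 1)
        self.mark_head = nn.Linear(hidden_size, num_marks)

    def update(self, h, dt, mark):
        """Fold one observed event (inter-event time dt, integer mark id)
        into the hidden state h of shape (1, hidden_size)."""
        mark_onehot = F.one_hot(torch.tensor(mark), self.num_marks).float()
        x = torch.cat([torch.tensor([float(dt)]), mark_onehot]).unsqueeze(0)
        return self.rnn(x, h)

    def intensity(self, h):
        """Conditional intensity implied by the current hidden state (held
        piecewise constant between events in this simplified sketch)."""
        return F.softplus(self.intensity_head(h)).squeeze()

    def mark_distribution(self, h):
        """Distribution over marks for the agent's next action."""
        return torch.distributions.Categorical(logits=self.mark_head(h))
```

In use, one would initialize `h = torch.zeros(1, hidden_size)`, call `update` each time an action or feedback event is observed, and read off the current intensity and mark distribution between events.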

The proposed policy gradient method allows for the maximization of arbitrary reward functions. The gradients required for policy optimization are derived without assuming a specific form for the intensity and mark distribution functions, thus generalizing beyond traditional stochastic optimal control frameworks. This adaptability is complemented by an efficient mechanism for sampling action event times, which works with the cumulative distribution function of the time to the agent's next action and can incorporate newly arriving feedback events on the fly.
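
Concretely, a score-function (REINFORCE-style) form of such a gradient, written here in standard MTPP likelihood notation rather than the paper's exact derivation, is

$$\nabla_\theta \, \mathbb{E}_{\tau \sim p_\theta}\big[R(\tau)\big] \;=\; \mathbb{E}_{\tau \sim p_\theta}\big[R(\tau)\, \nabla_\theta \log p_\theta(\tau)\big],$$

where, for an action sequence $\tau = \{(t_i, m_i)\}_{i=1}^{N}$ on a horizon $[0, T]$ with history $\mathcal{H}_t$,

$$\log p_\theta(\tau) \;=\; \sum_{i=1}^{N} \Big[ \log \lambda_\theta(t_i \mid \mathcal{H}_{t_i}) + \log p_\theta(m_i \mid t_i, \mathcal{H}_{t_i}) \Big] \;-\; \int_{0}^{T} \lambda_\theta(s \mid \mathcal{H}_s)\, ds.$$

Only the agent's own intensity and mark distribution need to be differentiated here; the likelihood terms contributed by feedback events do not depend on $\theta$ and drop out of the gradient, as in standard policy-gradient derivations, which is what frees the method from modeling assumptions on the environment's dynamics.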

Applications and Numerical Validation

The paper demonstrates the applicability of the proposed method in two domains: personalized teaching and viral marketing. In personalized teaching, the objective is to optimize the timing of content presentation to aid long-term memorization, using data from the Duolingo platform. In viral marketing, the methodology seeks to optimize the posting schedule to enhance content visibility on social media platforms like Twitter.
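
To make the teaching objective concrete, here is a toy reward in the spirit of the exponential-forgetting (half-life) recall models commonly fit to Duolingo data; the constants and the half-life update rule are purely illustrative and are not the paper's environment model.

```python
import math

def recall_probability(elapsed, half_life):
    """Probability of recalling an item `elapsed` days after the last review,
    under an exponential forgetting curve with the given half-life (days)."""
    return 2.0 ** (-elapsed / half_life)

def schedule_reward(review_times, test_time, initial_half_life=1.0, boost=2.0):
    """Toy reward: recall probability at `test_time` given reviews at
    `review_times`. Each review multiplies the half-life by `boost`;
    richer models make this strengthening depend on how much was forgotten."""
    half_life = initial_half_life
    last_review = 0.0
    for t in sorted(review_times):
        half_life *= boost
        last_review = t
    return recall_probability(test_time - last_review, half_life)

# Spreading three reviews over a week vs. clustering them on day one:
print(schedule_reward([1.0, 3.0, 6.0], test_time=7.0))   # ~0.92
print(schedule_reward([1.0, 1.1, 1.2], test_time=7.0))   # ~0.60
```

An RL agent in this setting chooses the review times; the reward only arrives at test time, which is why a framework that handles arbitrary, delayed reward functions is needed.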

For the personalized teaching application, the RL model notably improved recall probabilities at test times compared to baseline methods, including a stochastic optimal control-based alternative. This demonstrates the model's capacity to derive effective learning interventions without access to a precise student model or a hand-crafted objective. In the viral marketing scenario, the model outperformed existing algorithms that assume inverse-chronological feed sorting, dynamically choosing posting times to keep the broadcaster's content visible in followers' feeds.

Implications and Future Directions

This research elevates the potential of MTPPs within the reinforcement learning landscape. By successfully applying this methodology to domains where interactions are asynchronous and continuous, it opens new avenues for developing RL solutions that operate under less restrictive assumptions about environmental dynamics and reward functions.

From a practical perspective, the integration of MTPP models with deep RL holds promise for applications in healthcare, finance, and network systems, where event timing greatly influences outcomes. Theoretically, these findings prompt further exploration into more sophisticated RL algorithms, such as actor-critic or multi-agent frameworks, specifically tailored for MTPPs.

The open-source release of the implementation and datasets provides a valuable resource for the research community, enabling further exploration and application of RL in event-driven processes. As interest in applications with complex, asynchronously timed interactions continues to rise, this research marks a pivotal contribution towards advancing the state-of-the-art in reinforcement learning for such environments.
