
Off‑policy learning for continuous‑time mean–variance reinforcement learning

Develop and analyze off‑policy learning theory and algorithms for continuous‑time mean–variance reinforcement learning, including policy evaluation and policy gradient methods when training data are generated under behavior policies different from the target policies to be executed.


Background

The paper trains with stochastic policies but recommends deterministic execution, noting that this setup is itself a form of off‑policy learning: data are collected under a stochastic behavior policy while the policy ultimately deployed is different. Although this gap is motivated and discussed, the authors explicitly list off‑policy learning as an open question, indicating the need for formal theory and guarantees in the continuous‑time MV context.
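To make the behavior/target mismatch concrete, the sketch below shows the standard importance‑sampling correction for off‑policy evaluation in a toy one‑step setting. Everything here is illustrative and not from the paper: the Gaussian behavior and target policies, the quadratic reward, and all variable names are assumptions chosen only to demonstrate the reweighting idea that an off‑policy MV theory would need to justify in continuous time.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy one-step setting (illustrative only):
#   behavior policy b: a ~ N(mu_b, s_b^2)   -- generates the data
#   target policy  pi: a ~ N(mu_t, s_t^2)   -- policy we want to evaluate
#   reward r(a) = -(a - 1)^2
mu_b, s_b = 0.0, 1.0
mu_t, s_t = 0.5, 0.5

def log_pdf(a, mu, s):
    """Log-density of N(mu, s^2) at a."""
    return -0.5 * ((a - mu) / s) ** 2 - np.log(s * np.sqrt(2.0 * np.pi))

# Actions and rewards observed under the behavior policy only.
a = rng.normal(mu_b, s_b, size=100_000)
r = -(a - 1.0) ** 2

# Importance ratios pi(a) / b(a) reweight behavior-policy samples
# so their weighted average estimates the target-policy value.
w = np.exp(log_pdf(a, mu_t, s_t) - log_pdf(a, mu_b, s_b))
est = np.mean(w * r)

# Closed-form value under pi for this quadratic reward:
# E[-(a-1)^2] = -((mu_t - 1)^2 + s_t^2)
truth = -((mu_t - 1.0) ** 2 + s_t ** 2)
print(est, truth)  # the estimate should be close to -0.5
```

In this toy case the target policy is narrower than the behavior policy, so the importance ratios stay bounded and the estimator is well behaved; the open question in the MV setting is precisely what replaces such ratios, and with what guarantees, for continuous‑time policy evaluation and policy gradients.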

References

In the MV setting, important open questions include performance guarantees of modified online algorithms, improvement of regret bound, off-policy learning, and large investors whose actions impact the asset prices (so counterfactuals become unobservable by mere “paper portfolios”).