Improving regret bounds for continuous‑time mean–variance RL

Improve the cumulative Sharpe‑ratio regret bound of the baseline stochastic approximation algorithm for continuous‑time mean–variance portfolio selection in a multi‑stock Black–Scholes market. The current bound is sublinear, of order √(N (log N)^p log log N); the goal is a sharper, ideally optimal, rate.

Background

The baseline algorithm achieves a sublinear cumulative regret (in Sharpe ratio) of order √(N (log N)^p log log N), so the regret averaged over N vanishes as N grows. The authors explicitly list improving this regret bound as an open question.
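To make the word "sublinear" concrete, the following minimal sketch evaluates the growth term √(N (log N)^p log log N) and its average over N for a few values of N. The exponent p is not specified here, so p = 1 is an illustrative assumption only, constants are omitted, and the function names are hypothetical.

```python
import math


def regret_bound(n: int, p: float = 1.0) -> float:
    """Growth term sqrt(n * (log n)^p * log log n), constants omitted.

    p = 1.0 is an illustrative assumption, not a value taken from the paper.
    """
    if n < 3:
        raise ValueError("n must be at least 3 so that log log n > 0")
    log_n = math.log(n)
    return math.sqrt(n * log_n**p * math.log(log_n))


def average_regret_bound(n: int, p: float = 1.0) -> float:
    """Bound divided by n; it shrinks toward zero when the bound is sublinear."""
    return regret_bound(n, p) / n


if __name__ == "__main__":
    for n in (10**2, 10**4, 10**6, 10**8):
        print(f"N={n:>9}  bound={regret_bound(n):12.2f}  "
              f"bound/N={average_regret_bound(n):.6f}  "
              f"bound/sqrt(N)={regret_bound(n) / math.sqrt(n):.3f}")
```

The last column shows the slowly growing logarithmic factor that separates this rate from a pure √N rate; removing or reducing that factor is one reading of what a sharper bound would mean.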

References

In the MV setting, important open questions include performance guarantees of modified online algorithms, improvement of regret bound, off-policy learning, and large investors whose actions impact the asset prices (so counterfactuals become unobservable by mere “paper portfolios”).

Source: Mean–Variance Portfolio Selection by Continuous-Time Reinforcement Learning: Algorithms, Regret Analysis, and Empirical Study (2412.16175, Huang et al., 8 Dec 2024), Section 6 (Conclusions)
