Improving regret bounds for continuous‑time mean–variance RL
Improve the cumulative Sharpe‑ratio regret bound of the baseline stochastic approximation algorithm for continuous‑time mean–variance portfolio selection in a multi‑stock Black–Scholes market. The current bound is sublinear, O(√(N (log N)^p log log N)); the goal is to tighten it toward a sharper, ideally optimal, rate.
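To make "sublinear" concrete, the following minimal sketch evaluates the bound with constants dropped and divides it by N, which should shrink toward zero as N grows. It is an illustration only: the exponent p is not specified here, so p = 1.0 is a placeholder assumption, and the function names and the chosen values of N are hypothetical.

```python
import math


def regret_bound(n: int, p: float = 1.0) -> float:
    """Evaluate sqrt(N (log N)^p log log N) with all constants dropped.

    p = 1.0 is an assumed placeholder, not a value taken from the source.
    Requires n >= 3 so that log log n is positive.
    """
    if n < 3:
        raise ValueError("n must be at least 3 so that log log n is positive")
    log_n = math.log(n)
    return math.sqrt(n * log_n ** p * math.log(log_n))


def average_bound(n: int, p: float = 1.0) -> float:
    """Bound divided by N; a sublinear bound makes this tend to zero."""
    return regret_bound(n, p) / n


# Hypothetical values of N, chosen only for illustration.
horizons = [10 ** k for k in range(2, 8)]
averages = [average_bound(n) for n in horizons]

for n, avg in zip(horizons, averages):
    # The ratio to sqrt(N) isolates the contribution of the log factors.
    log_factor = regret_bound(n) / math.sqrt(n)
    print(f"N={n:>10,d}  bound/N={avg:.5f}  bound/sqrt(N)={log_factor:.3f}")
```

The second printed column isolates the logarithmic factors, which are the part of the bound that lies above a plain √N rate.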
References
In the MV setting, important open questions include performance guarantees of modified online algorithms, improvement of regret bound, off-policy learning, and large investors whose actions impact the asset prices (so counterfactuals become unobservable by mere “paper portfolios”).
Huang et al., Mean–Variance Portfolio Selection by Continuous-Time Reinforcement Learning: Algorithms, Regret Analysis, and Empirical Study, arXiv:2412.16175 (8 Dec 2024), Section 6 (Conclusions).