Performance guarantees for modified online continuous‑time RL algorithms
Establish theoretical performance guarantees, including convergence rates and regret bounds, for the modified online variants of the continuous-time reinforcement learning algorithms for mean–variance portfolio selection, namely the vCTRL, pCTRL, c-mCTRL, and c-dCTRL strategies. The existing analysis covers only the baseline algorithm in the multi-stock Black–Scholes market.
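For concreteness, here is a minimal LaTeX sketch of the quantities such guarantees would control, assuming the standard continuous-time mean–variance formulation and its quadratic (Lagrangian) reformulation; the notation (x_T, w, J, u*) is illustrative and is not taken verbatim from the paper.

\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}
% Illustrative sketch only: the standard MV objective and an episodic
% regret definition, not the paper's exact notation or bounds.
The MV criterion over terminal wealth $x_T$ with target mean $z$,
\[
  \min_{u}\ \operatorname{Var}(x_T)
  \quad\text{subject to}\quad \mathbb{E}[x_T] = z,
\]
is commonly handled through the quadratic reformulation
\[
  \min_{u}\ \mathbb{E}\!\left[(x_T - w)^2\right] - (w - z)^2,
\]
where the multiplier $w$ is learned alongside the policy. A regret bound for
an online algorithm producing policies $u^{(1)},\dots,u^{(N)}$ over $N$
episodes would then control
\[
  \mathrm{Regret}(N) \;=\; \sum_{n=1}^{N}
  \Bigl( J\bigl(u^{(n)}\bigr) - J\bigl(u^{\star}\bigr) \Bigr),
\]
with $J$ the episode objective and $u^{\star}$ the oracle MV policy, while a
convergence-rate guarantee would bound $J\bigl(u^{(N)}\bigr) - J\bigl(u^{\star}\bigr)$
as a function of $N$.
\end{document}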
References
In the MV setting, important open questions include performance guarantees of modified online algorithms, improvement of regret bound, off-policy learning, and large investors whose actions impact the asset prices (so counterfactuals become unobservable by mere “paper portfolios”).
                — Mean–Variance Portfolio Selection by Continuous-Time Reinforcement Learning: Algorithms, Regret Analysis, and Empirical Study
                
                (Huang et al., arXiv:2412.16175, 8 Dec 2024), Section 6 (Conclusions)