Performance guarantees for modified online continuous‑time RL algorithms
Establish theoretical performance guarantees, including convergence rates and regret bounds, for the modified online variants of the continuous-time reinforcement learning algorithms for mean–variance portfolio selection, namely the vCTRL, pCTRL, c-mCTRL, and c-dCTRL strategies. The existing analysis covers only the baseline algorithm in the multi-stock Black–Scholes market.
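For concreteness, here is a minimal LaTeX sketch of the quantities such guarantees would control, assuming the standard continuous-time mean–variance formulation and its quadratic (Lagrangian) reformulation; the notation (x_T, w, J, u*) is illustrative and is not taken verbatim from the paper.

\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}
% Illustrative sketch only: the standard MV objective and an episodic
% regret definition, not the paper's exact notation or bounds.
The MV criterion over terminal wealth $x_T$ with target mean $z$,
\[
  \min_{u}\ \operatorname{Var}(x_T)
  \quad\text{subject to}\quad \mathbb{E}[x_T] = z,
\]
is commonly handled through the quadratic reformulation
\[
  \min_{u}\ \mathbb{E}\!\left[(x_T - w)^2\right] - (w - z)^2,
\]
where the multiplier $w$ is learned alongside the policy. A regret bound for
an online algorithm producing policies $u^{(1)},\dots,u^{(N)}$ over $N$
episodes would then control
\[
  \mathrm{Regret}(N) \;=\; \sum_{n=1}^{N}
  \Bigl( J\bigl(u^{(n)}\bigr) - J\bigl(u^{\star}\bigr) \Bigr),
\]
with $J$ the episode objective and $u^{\star}$ the oracle MV policy, while a
convergence-rate guarantee would bound $J\bigl(u^{(N)}\bigr) - J\bigl(u^{\star}\bigr)$
as a function of $N$.
\end{document}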
References
In the MV setting, important open questions include performance guarantees of modified online algorithms, improvement of regret bound, off-policy learning, and large investors whose actions impact the asset prices (so counterfactuals become unobservable by mere “paper portfolios”).
                — Mean–Variance Portfolio Selection by Continuous-Time Reinforcement Learning: Algorithms, Regret Analysis, and Empirical Study
                
                (Huang et al., arXiv:2412.16175, 8 Dec 2024), Section 6 (Conclusions)