
Improving regret bounds for continuous‑time mean–variance RL

Improve the cumulative Sharpe‑ratio regret bound for the baseline stochastic approximation algorithm for continuous‑time mean–variance portfolio selection in a multi‑stock Black–Scholes market, tightening the current sublinear O(√(N (log N)^p log log N)) rate toward sharper or optimal bounds.


Background

The baseline algorithm achieves a sublinear cumulative regret (in Sharpe ratio) of order √(N (log N)^p log log N), ensuring near‑optimal performance asymptotically. The authors explicitly identify the improvement of this regret bound as an open question, aiming for tighter rates and potentially optimal constants.
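To make the "sublinear" claim concrete, the short calculation below divides the stated bound by N. The symbols R(N) for the cumulative Sharpe‑ratio regret and C for a generic constant are notation assumed here for illustration, not taken from the source; p is the exponent appearing in the stated rate.

```latex
% Assumed notation: R(N) = cumulative Sharpe-ratio regret, C > 0 a generic constant.
\[
  R(N) \;\le\; C\,\sqrt{N\,(\log N)^{p}\,\log\log N}
  \quad\Longrightarrow\quad
  \frac{R(N)}{N} \;\le\; C\,\sqrt{\frac{(\log N)^{p}\,\log\log N}{N}}
  \;\xrightarrow[N\to\infty]{}\; 0 .
\]
% The stated rate exceeds a pure square-root rate by the factor below.
\[
  \frac{\sqrt{N\,(\log N)^{p}\,\log\log N}}{\sqrt{N}}
  \;=\; \sqrt{(\log N)^{p}\,\log\log N}.
\]
```

The average regret therefore vanishes as N grows, and the stated rate differs from a pure √N rate only by the poly‑logarithmic factor in the second line. Reducing or removing that factor would be one form of the tightening asked for here. Whether √N is itself the optimal rate in this setting is not stated in the source.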

References

In the MV setting, important open questions include performance guarantees of modified online algorithms, improvement of regret bound, off-policy learning, and large investors whose actions impact the asset prices (so counterfactuals become unobservable by mere “paper portfolios”).