Large‑investor mean–variance RL with price impact and unobservable counterfactuals

Develop continuous‑time reinforcement learning frameworks, with theoretical guarantees, for mean–variance portfolio selection by large investors whose trades move asset prices and factors. The central difficulty is that, once price impact is endogenous, wealth trajectories under alternative portfolios can no longer be inferred from observed price paths.

Background

The paper assumes a small investor whose trades do not move prices, so the wealth that any alternative portfolio would have produced can be computed from the observed price paths. The authors explicitly note that extending the framework to large investors with price impact is an open question: the observed prices then depend on the trades actually made, so counterfactuals become unobservable. New RL formulations and theory are therefore needed for policy evaluation and policy improvement in endogenous environments.
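The contrast can be illustrated with a small simulation. The sketch below is not the paper's model: the linear impact term `kappa * (u - u_prev)`, the parameter values, and the function names `simulate`, `paper_wealth`, and `counterfactual_gap` are all assumptions chosen for illustration. It replays an alternative policy on the prices observed under a behaviour policy (a "paper portfolio") and compares the result with the wealth that policy would truly have earned under the same noise.

```python
import numpy as np


def simulate(policy, kappa, seed=0, n_steps=250, dt=1 / 250,
             mu=0.08, sigma=0.2, r=0.02, x0=1.0, s0=1.0):
    """Euler simulation of prices and wealth under `policy`.

    `policy(t, x)` returns the dollar amount held in the risky asset.
    Assumption: a hypothetical linear impact, where the one-step return
    is shifted by kappa times the change in the investor's holdings.
    kappa = 0 recovers the small-investor (exogenous price) case.
    """
    rng = np.random.default_rng(seed)
    dW = rng.normal(0.0, np.sqrt(dt), size=n_steps)
    prices = np.empty(n_steps + 1)
    wealth = np.empty(n_steps + 1)
    prices[0], wealth[0] = s0, x0
    u_prev = 0.0  # the investor starts fully in cash
    for k in range(n_steps):
        u = policy(k * dt, wealth[k])
        ret = mu * dt + sigma * dW[k] + kappa * (u - u_prev)
        prices[k + 1] = prices[k] * (1.0 + ret)
        wealth[k + 1] = wealth[k] + r * (wealth[k] - u) * dt + u * ret
        u_prev = u
    return prices, wealth


def paper_wealth(policy, prices, dt=1 / 250, r=0.02, x0=1.0):
    """Replay `policy` on a fixed, already observed price path."""
    wealth = np.empty(len(prices))
    wealth[0] = x0
    for k in range(len(prices) - 1):
        u = policy(k * dt, wealth[k])
        ret = prices[k + 1] / prices[k] - 1.0
        wealth[k + 1] = wealth[k] + r * (wealth[k] - u) * dt + u * ret
    return wealth


def counterfactual_gap(kappa):
    """Absolute terminal-wealth error of the paper portfolio."""
    behaviour = lambda t, x: 0.5 * x    # policy actually traded
    alternative = lambda t, x: 1.5 * x  # policy we want to evaluate
    observed_prices, _ = simulate(behaviour, kappa)
    _, true_wealth = simulate(alternative, kappa)  # same seed, same noise
    replayed = paper_wealth(alternative, observed_prices)
    return abs(true_wealth[-1] - replayed[-1])


if __name__ == "__main__":
    print("gap without impact:", counterfactual_gap(0.0))
    print("gap with impact:   ", counterfactual_gap(0.05))
```

With `kappa = 0` the replayed wealth matches the true wealth up to floating-point error, which is what makes paper portfolios usable for the small investor. With `kappa > 0` the two diverge, because the observed price path already embeds the behaviour policy's own impact.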

References

In the MV setting, important open questions include performance guarantees of modified online algorithms, improvement of regret bound, off-policy learning, and large investors whose actions impact the asset prices (so counterfactuals become unobservable by mere “paper portfolios”).

Source: Mean–Variance Portfolio Selection by Continuous‑Time Reinforcement Learning: Algorithms, Regret Analysis, and Empirical Study (2412.16175, Huang et al., 8 Dec 2024), Section 6 (Conclusions)
