Large‑investor mean–variance RL with price impact and unobservable counterfactuals

Develop continuous‑time reinforcement learning frameworks, with theoretical guarantees, for mean–variance portfolio selection by large investors whose trades move asset prices and factors. The central difficulty is that price impact is endogenous, so wealth trajectories under alternative portfolios cannot be inferred from observed price paths.

Background

The paper’s small‑investor assumption enables counterfactual inference of wealth under alternative portfolios from observed price paths. The authors explicitly note that extending the framework to large investors with price impact is an open question because counterfactuals become unobservable, necessitating new RL formulations and theory for policy evaluation and improvement in endogenous environments.
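
The following toy simulation is a minimal sketch of why "paper portfolio" evaluation breaks down under price impact. It is not the paper's model: the discrete‑time dynamics, the linear permanent impact term, and all names and parameter values (`simulate_prices`, `wealth`, `counterfactual_gap`, `impact`, the two holding schedules) are assumptions made for illustration only. The same noise drives both runs, so any difference comes purely from the investor's own trades.

```python
import numpy as np

def simulate_prices(holdings, impact, seed=0, s0=100.0, mu=0.0005, sigma=0.01):
    """Price path when an investor follows `holdings` (shares held each period).

    `impact` is a hypothetical linear coefficient: each share traded shifts
    the period return by `impact`. impact=0 corresponds to a small investor.
    """
    rng = np.random.default_rng(seed)
    prices = np.empty(len(holdings) + 1)
    prices[0] = s0
    prev = 0.0
    for t, h in enumerate(holdings):
        trade = h - prev                      # shares bought (+) or sold (-)
        shock = rng.normal(mu, sigma)         # exogenous return, same for a given seed
        prices[t + 1] = prices[t] * (1.0 + shock + impact * trade)
        prev = h
    return prices

def wealth(holdings, prices):
    """Cumulative gains from holding holdings[t] shares over period t."""
    return np.cumsum(holdings * np.diff(prices))

def counterfactual_gap(impact, n=50):
    """Largest error made by evaluating policy B on prices observed under policy A."""
    policy_a = np.full(n, 10.0)               # policy actually executed
    policy_b = np.linspace(0.0, 100.0, n)     # alternative we would like to evaluate
    observed = simulate_prices(policy_a, impact)
    paper_wealth = wealth(policy_b, observed)                            # "paper portfolio"
    true_wealth = wealth(policy_b, simulate_prices(policy_b, impact))    # what B would really earn
    return float(np.max(np.abs(paper_wealth - true_wealth)))

if __name__ == "__main__":
    print("gap without impact:", counterfactual_gap(0.0))
    print("gap with impact:   ", counterfactual_gap(1e-4))
```

With `impact=0.0` the price path does not depend on the policy, so the paper portfolio reproduces the alternative policy's wealth. With a positive `impact` the two price paths diverge and the paper portfolio no longer matches what the alternative policy would have earned.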

References

In the MV setting, important open questions include performance guarantees of modified online algorithms, improvement of regret bound, off-policy learning, and large investors whose actions impact the asset prices (so counterfactuals become unobservable by mere “paper portfolios”).