Regret and stability guarantees for average‑reward reinforcement learning under nonstationarity
Derive regret or stability bounds for average-reward reinforcement learning applied to portfolio control with Lipschitz rewards in nonstationary environments that satisfy mixing conditions, in order to provide principled guidance on the sample complexity and robustness of the RL layer used in the RL-BHRP framework.
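One possible formalization, offered only as a sketch: the variation budget V_T, the exponential mixing rate, and the conjectured rate on the right-hand side are illustrative assumptions chosen by analogy with nonstationary bandit and MDP results, not part of the problem statement.

% Hedged formalization (requires amsmath); V_T, the mixing rate, and the
% target rate below are assumptions, not claims of the original statement.
\begin{align*}
  & |r_t(w) - r_t(w')| \le L \, \|w - w'\|_1
      && \text{(Lipschitz reward in the portfolio weights $w$)} \\
  & V_T = \sum_{t=2}^{T} \bigl|\rho_t^* - \rho_{t-1}^*\bigr|
      && \text{(variation budget on the per-step optimal gain)} \\
  & \beta(k) \le C e^{-c k}
      && \text{(exponential $\beta$-mixing of the market process)} \\
  & \mathrm{Reg}_T = \sum_{t=1}^{T} \bigl(\rho_t^* - r_t(w_t)\bigr)
      \overset{?}{\le} \widetilde{O}\!\bigl(V_T^{1/3} T^{2/3}\bigr)
      && \text{(dynamic regret; conjectured rate)}
\end{align*}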
References
While we establish the feasibility of the two-level weight construction and state a policy-gradient identity under standard regularity conditions, several theoretical questions remain open. For the RL layer, generalization guarantees under nonstationarity are not yet available; deriving regret or stability bounds for average-reward RL with Lipschitz rewards and mixing conditions would provide principled guidance on sample complexity and robustness.
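The policy-gradient identity referred to above is presumably of the standard average-reward form; the sketch below is the textbook statement, not the paper's exact identity or its regularity conditions.

% Textbook average-reward policy-gradient identity (requires amsmath, amssymb);
% the paper's exact statement and regularity conditions are not reproduced here.
\begin{align*}
  \rho(\theta) &= \lim_{T \to \infty} \frac{1}{T}\,
      \mathbb{E}_{\pi_\theta}\!\Bigl[\sum_{t=1}^{T} r(s_t, a_t)\Bigr], \\
  \nabla_\theta \rho(\theta) &= \sum_{s} d_{\pi_\theta}(s)
      \sum_{a} \nabla_\theta \pi_\theta(a \mid s)\, Q_{\pi_\theta}(s, a),
\end{align*}

where d_{\pi_\theta} is the stationary state distribution under \pi_\theta and Q_{\pi_\theta} is the differential action-value function. A stability analysis along these lines would ask how this gradient degrades when d_{\pi_\theta} drifts under the nonstationarity and mixing assumptions sketched above.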