Develop an O(1/n) inverse CDF estimator integrable into the ACVaR algorithm

Develop an estimation scheme for the inverse cumulative distribution function of the stationary reward distribution evaluated at c that achieves convergence rate O(1/n), and integrate this estimator concurrently into the two-time-scale stochastic approximation algorithm after the warm start, in a manner compatible with the analytical framework based on Borkar–Juneja–Kherani (2004) and Kontoyiannis–Meyn (2003).

Background

The current implementation estimates $F^{-1}(c)$ via Gaussian kernel density estimation as a preprocessing step, then treats it as fixed during the main stochastic approximation iterations. A faster, concurrent estimator with $\mathcal{O}(1/n)$ decay would enable updating the threshold while running the algorithm.
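To make the contrast concrete, the sketch below places a one-off Gaussian KDE estimate of $F^{-1}(c)$, computed from warm-start samples, next to a Robbins-Monro style recursion $q_{k+1} = q_k + a_k\,(c - \mathbf{1}\{r_{k+1} \le q_k\})$ that could run alongside the main iterations. It is only an illustration under stated assumptions. The recursion is a standard candidate and not the authors' method. The rewards are i.i.d. standard normal draws standing in for a Markov chain. The bandwidth rule, the step sizes `gain / (k + offset)`, and all function names are hypothetical choices. Whether such a recursion attains $\mathcal{O}(1/n)$ decay under Markovian sampling, and fits the intended analysis, is exactly what remains open.

```python
import random
from statistics import NormalDist, stdev


def kde_quantile(samples, c, bandwidth=None, tol=1e-8):
    """Invert the CDF of a Gaussian KDE at level c by bisection."""
    n = len(samples)
    if bandwidth is None:
        # Normal-reference rule of thumb (an assumption; the source does
        # not state which bandwidth the implementation uses).
        bandwidth = 1.06 * stdev(samples) * n ** (-1 / 5)
    std_normal = NormalDist()

    def kde_cdf(x):
        # The CDF of a Gaussian KDE is the average of the kernel CDFs.
        return sum(std_normal.cdf((x - s) / bandwidth) for s in samples) / n

    lo = min(samples) - 10 * bandwidth
    hi = max(samples) + 10 * bandwidth
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kde_cdf(mid) < c:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def quantile_sa_step(q, reward, c, step):
    """One stochastic approximation step: q + step * (c - 1{reward <= q})."""
    indicator = 1.0 if reward <= q else 0.0
    return q + step * (c - indicator)


def run_demo(c=0.9, warm=200, n=20000, gain=5.0, offset=50, seed=0):
    """Warm-start with the KDE estimate, then update the threshold online."""
    rng = random.Random(seed)
    # Stand-in rewards: i.i.d. N(0, 1) draws, not a Markov chain.
    warm_samples = [rng.gauss(0.0, 1.0) for _ in range(warm)]
    q = kde_quantile(warm_samples, c)
    for k in range(1, n + 1):
        reward = rng.gauss(0.0, 1.0)
        q = quantile_sa_step(q, reward, c, step=gain / (k + offset))
    return q


if __name__ == "__main__":
    print("online estimate:", run_demo())
    print("true 0.9-quantile of N(0, 1):", NormalDist().inv_cdf(0.9))
```

In the preprocessing design only `kde_quantile` would be called and its output held fixed. In a concurrent design, `quantile_sa_step` would be invoked once per observed reward inside the main loop, on a time scale chosen relative to the other iterates.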

The authors explicitly seek an estimator that can be integrated into the two-time-scale scheme without breaking the applicability of the large-deviations conditioning results they rely on, thus enabling a unified analysis.

References

Some technical issues that remain are as follows.

  1. Is there an estimation scheme for estimating the inverse CDF evaluated at $c$ with $\mathcal{O}\left(\frac{1}{n}\right)$ decay so that the estimation can be made a concurrent part of the algorithm after the warm start, in a manner that allows us to leverage the results of Borkar–Juneja–Kherani (2004) and Kontoyiannis–Meyn (2003) to include it in the overall analysis?
An Asymptotic CVaR Measure of Risk for Markov Chains (2405.13513 - Patel et al., 22 May 2024) in Section 5 (Conclusion), Item 4