Extend average-case guarantees to large action spaces
Develop an efficient version of Value-Guided Backtracking (VGB) for large action spaces under only the average-case value-function error assumption, namely that for each $h$, $\mathbb{E}_{y_{1:h}\sim\pi(\cdot\mid x)}\big[\hat V(x,y_{1:h})/V^\star(x,y_{1:h})\big] \le 1+V$ and $\mathbb{E}_{y_{1:h}\sim\pi(\cdot\mid x)}\big[V^\star(x,y_{1:h})/\hat V(x,y_{1:h})\big] \le 1+V$, without imposing additional uniform bounds. Design and analyze a transition mechanism (for example, rejection sampling with controlled failure) that yields provable coverage or accuracy guarantees comparable to those established for the small-action regime.
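To make the gap between average-case and uniform control concrete, the Python sketch below evaluates both average-case ratios, together with the uniform ratio $\max_{y}\max\{\hat V/V^\star, V^\star/\hat V\}$, at a single fixed depth $h$ whose prefixes are listed explicitly. The toy numbers are hypothetical and chosen only for illustration: 1000 equally likely prefixes, $V^\star = 0.5$ everywhere, and $\hat V$ exact except on one prefix where it is 100 times too large. The function names `average_case_ratios` and `uniform_ratio` are illustrative, not from the source.

```python
from typing import Sequence, Tuple


def average_case_ratios(
    probs: Sequence[float], v_hat: Sequence[float], v_star: Sequence[float]
) -> Tuple[float, float]:
    """Return (E[V_hat/V_star], E[V_star/V_hat]) under the prefix distribution `probs`.

    Hypothetical toy setting: each index is one prefix y_{1:h} at a fixed depth h,
    and `probs` plays the role of pi(y_{1:h} | x). All values must be positive.
    """
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probs must sum to 1")
    over = sum(p * vh / vs for p, vh, vs in zip(probs, v_hat, v_star))
    under = sum(p * vs / vh for p, vh, vs in zip(probs, v_hat, v_star))
    return over, under


def uniform_ratio(v_hat: Sequence[float], v_star: Sequence[float]) -> float:
    """Worst-case multiplicative error: max over prefixes of max(V_hat/V_star, V_star/V_hat)."""
    return max(max(vh / vs, vs / vh) for vh, vs in zip(v_hat, v_star))


# Hypothetical toy: 1000 equally likely prefixes, V_star = 0.5 everywhere,
# and V_hat equal to V_star except on one prefix where it is 100x too large.
K = 1000
probs = [1.0 / K] * K
v_star = [0.5] * K
v_hat = [0.5] * K
v_hat[0] = 50.0

over, under = average_case_ratios(probs, v_hat, v_star)
worst = uniform_ratio(v_hat, v_star)
print(f"E[V_hat/V_star] = {over:.5f}, E[V_star/V_hat] = {under:.5f}, uniform = {worst}")
```

Here the two expectations are $1.099$ and $0.99901$, so both average-case conditions hold with a small constant, while the uniform ratio is $100$. A single rare prefix with a badly overestimated value is invisible to the average-case assumption but dominates any uniform bound.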
References
However, with only the average-case bound \cref{assump:avg-mgf}, it is unclear how to make a similar argument work. It may also be possible to avoid extra assumptions via a more delicate average-case argument that allows the rejection sampling procedure to occasionally fail. We leave this question for future work.
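When the children of a node cannot be enumerated, one natural way to realize a forward transition is rejection sampling with the base policy as the proposal. The sketch below assumes, hypothetically, that the desired move draws a next action $y$ with probability proportional to $\pi(y\mid x,y_{1:h})\cdot\hat V(x,y_{1:h},y)$; this is an illustrative target, not necessarily the exact VGB transition, and the names `tilted_rejection_sample`, `failure_probability`, and `envelope` (the constant $M$) are hypothetical. Accepting a proposal with probability $\min\{1,\hat V/M\}$ yields exact samples only when $\hat V \le M$ for every proposal, which is precisely the kind of uniform control the average-case assumption does not provide; otherwise accepted draws follow $\pi\cdot\min\{\hat V, M\}$ instead. Capping the number of tries makes failure explicit, and with independent tries the failure probability after $T$ tries is $\big(1-\mathbb{E}_{y\sim\pi}[\min\{1,\hat V(y)/M\}]\big)^T$.

```python
import random
from typing import Callable, Optional, Sequence, TypeVar

A = TypeVar("A")


def tilted_rejection_sample(
    propose: Callable[[random.Random], A],
    weight: Callable[[A], float],
    envelope: float,
    max_tries: int,
    rng: random.Random,
) -> Optional[A]:
    """Try to draw y with probability proportional to pi(y) * weight(y).

    `propose` samples y ~ pi; each proposal is accepted with probability
    min(1, weight(y) / envelope). If weight(y) <= envelope for every y, accepted
    draws follow the tilted target exactly; otherwise the clip biases them.
    Returns None (a controlled failure) after `max_tries` rejections.
    """
    for _ in range(max_tries):
        y = propose(rng)
        if rng.random() < min(1.0, weight(y) / envelope):
            return y
    return None


def failure_probability(
    probs: Sequence[float], weights: Sequence[float], envelope: float, max_tries: int
) -> float:
    """Exact probability that all `max_tries` independent proposals are rejected."""
    accept = sum(p * min(1.0, w / envelope) for p, w in zip(probs, weights))
    return (1.0 - accept) ** max_tries
```

For example, with a uniform proposal over three actions, weights $0.1, 0.2, 0.7$, and envelope $M = 0.7$, each try accepts with probability $10/21$, so 50 tries all fail with probability $(11/21)^{50}$, which is below $10^{-14}$. Under only the average-case assumption, no fixed $M$ need bound $\hat V$ on every proposal, so an analysis along these lines would have to trade the bias from clipping against the probability of failure.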