Extend average-case guarantees to large action spaces

Develop an efficient version of Value-Guided Backtracking (VGB) for large action spaces using only the average-case value-function error assumption, namely that for each h,

E_{y_{1:h}∼π(·|x)}[V̂(x,y_{1:h})/V*(x,y_{1:h})] ≤ 1+V and E_{y_{1:h}∼π(·|x)}[V*(x,y_{1:h})/V̂(x,y_{1:h})] ≤ 1+V,

without imposing any additional uniform bounds. Design and analyze a transition mechanism (for example, rejection sampling that is allowed to fail in a controlled way) that yields provable coverage or accuracy guarantees comparable to those established for the small-action regime.
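To make the assumption concrete, the sketch below estimates both expectations by Monte Carlo at a single depth h. It assumes hypothetical arrays `v_hat` and `v_star` holding V̂(x,y_{1:h}) and V*(x,y_{1:h}) for prefixes y_{1:h} sampled from π(·|x), and a parameter `err_bound` standing in for V in the bound 1+V. Since V* is unknown in practice, this is only a diagnostic for synthetic or oracle data, not part of VGB itself.

```python
import numpy as np


def average_case_ratios(v_hat, v_star):
    """Monte Carlo estimates of E[V̂/V*] and E[V*/V̂] at one depth h.

    v_hat, v_star: hypothetical arrays of strictly positive values
    V̂(x, y_{1:h}) and V*(x, y_{1:h}) for prefixes y_{1:h} drawn from pi(.|x).
    """
    v_hat = np.asarray(v_hat, dtype=float)
    v_star = np.asarray(v_star, dtype=float)
    if np.any(v_hat <= 0) or np.any(v_star <= 0):
        raise ValueError("ratios require strictly positive values")
    return float(np.mean(v_hat / v_star)), float(np.mean(v_star / v_hat))


def satisfies_average_case(v_hat, v_star, err_bound):
    """Check both empirical averages against 1 + err_bound (the '1+V' above)."""
    over, under = average_case_ratios(v_hat, v_star)
    return over <= 1 + err_bound and under <= 1 + err_bound


# Example: V̂ overestimates by a factor of 10 on one prefix out of 1000.
v_star = np.ones(1000)
v_hat = np.ones(1000)
v_hat[0] = 10.0
print(satisfies_average_case(v_hat, v_star, err_bound=0.01))  # True
print(np.max(v_hat / v_star))  # 10.0: no uniform ratio bound near 1 holds
```

In this example both averages stay within 1+0.01 even though the worst-case ratio is 10, which is exactly the kind of estimator the average-case assumption admits but a uniform bound rules out.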

Background

In the paper’s analysis of VGB under average-case value-function error (Assumption 4.2), theoretical guarantees are given only for the small-action setting, where enumerating all actions is tractable. For large action spaces, the proof under uniform error implements transitions by rejection sampling, which relies on bounded density ratios; this step does not carry over directly when the error bound holds only on average.
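For intuition about the uniform-error step, here is a generic rejection sampler, offered as a sketch rather than the paper's exact procedure. To draw an action with probability proportional to p(a)·w(a) when only sampling from p is cheap, it proposes a ∼ p and accepts with probability w(a)/M, which is valid only if w(a) ≤ M for every a. The names `propose`, `weight` and `bound` are hypothetical; in a VGB transition, p would be π(·|x,y_{1:h}) and `weight` would presumably be built from V̂, with a uniform error assumption supplying a finite `bound`.

```python
import random
from collections import Counter


def rejection_sample(propose, weight, bound, rng):
    """Draw a ~ q(a) proportional to p(a) * weight(a), where propose(rng) samples p.

    Requires 0 <= weight(a) <= bound for every a; each proposal is accepted
    with probability E_p[weight] / bound.
    """
    while True:
        a = propose(rng)
        w = weight(a)
        if w < 0 or w > bound:
            raise ValueError("weight outside [0, bound]: density ratio not bounded")
        if rng.random() < w / bound:  # accept with probability w / bound
            return a


# Usage: p uniform over 4 actions, weights 1..4, so q = (0.1, 0.2, 0.3, 0.4).
rng = random.Random(0)
actions = [0, 1, 2, 3]
weights = [1.0, 2.0, 3.0, 4.0]


def propose_uniform(r):
    return r.choice(actions)


def action_weight(a):
    return weights[a]


counts = Counter(
    rejection_sample(propose_uniform, action_weight, 4.0, rng) for _ in range(20000)
)
print({a: counts[a] / 20000 for a in actions})  # close to 0.1, 0.2, 0.3, 0.4
```

The expected number of proposals per sample is M/E_p[w], so a single very large weight forces a large M and makes every transition slow; this is where a uniform bound on the density ratio does the work.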

The authors note that with only average-case bounds, it is unclear how to implement transitions efficiently and suggest that more delicate arguments—such as allowing rejection sampling to occasionally fail—might remove the need for extra assumptions. Formalizing such an approach and proving guarantees analogous to the small-action case remains unresolved.
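One way to make "allowing rejection sampling to occasionally fail" concrete is sketched below, purely as an illustrative assumption and not as the authors' construction. It clips weights at a cap M (a truncation device assumed here, not described in the source) and gives up after K proposals, returning a failure flag. Clipping biases the output only on actions whose weight exceeds M; under an average-case bound, Markov's inequality limits how much π-mass such events can carry, e.g. P_{y∼π}[V̂/V* ≥ t] ≤ (1+V)/t. A proof would have to charge both this bias and the failure probability (1 − E_p[min(w, M)]/M)^K against the final coverage or accuracy guarantee. The function names and parameters are hypothetical.

```python
import random


def capped_rejection_sample(propose, weight, cap, max_attempts, rng):
    """Rejection sampling that may fail (hypothetical sketch).

    Targets q(a) proportional to p(a) * min(weight(a), cap), with propose(rng)
    sampling p. Returns (action, True) on acceptance, or (None, False) after
    max_attempts rejections.
    """
    if cap <= 0:
        raise ValueError("cap must be positive")
    for _ in range(max_attempts):
        a = propose(rng)
        w = min(weight(a), cap)  # clip instead of requiring weight(a) <= cap
        if w < 0:
            raise ValueError("weights must be nonnegative")
        if rng.random() < w / cap:
            return a, True
    return None, False


def failure_probability(mean_clipped_weight, cap, max_attempts):
    """Exact failure probability: (1 - E_p[min(w, cap)] / cap) ** max_attempts."""
    return (1.0 - mean_clipped_weight / cap) ** max_attempts


# Usage: one rare action with a huge weight among 100 uniformly proposed actions.
rng = random.Random(0)
weights = [1000.0] + [1.0] * 99


def propose_uniform(r):
    return r.randrange(100)


def action_weight(a):
    return weights[a]


mean_clipped = sum(min(w, 2.0) for w in weights) / 100  # 1.01
trials = [
    capped_rejection_sample(propose_uniform, action_weight, 2.0, 2, rng)
    for _ in range(20000)
]
empirical_fail = sum(not ok for _, ok in trials) / len(trials)
print(empirical_fail, failure_probability(mean_clipped, 2.0, 2))
```

Here a uniform bound would need M = 1000 and about 91 proposals per sample on average (1000/10.99), whereas the cap of 2 accepts about half of all proposals but heavily under-samples action 0. Turning this trade-off into a coverage guarantee is the open part of the problem.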

References

However, with only the average-case bound (Assumption 4.2), it is unclear how to make a similar argument work. It may also be possible to avoid extra assumptions via a more delicate average-case argument that allows the rejection sampling procedure to occasionally fail. We leave this question for future work.

— Taming Imperfect Process Verifiers: A Sampling Perspective on Backtracking (2510.03149 - Rohatgi et al., 3 Oct 2025) in Remark: Large-|A| regime with average-case error (Section 4.2)
