Optimal horizon dependence for SMC with a perfect process reward model

Determine the optimal asymptotic dependence on the horizon H of the minimum number of particles N required by Sequential Monte Carlo (SMC) to achieve non-trivial sampling accuracy when the process reward model equals the true value function V*, i.e., when \hat V = V*.

Background

Golowich et al. show that even when the process reward model is perfect (\hat V = V*), SMC requires at least \Omega(\sqrt{H}) particles to obtain non-trivial sampling accuracy, even though a simple one-particle exact sampler exists in this special case. The authors introduce SMC-RS to address a related pathology, but note that the tight dependence on H for SMC itself remains unresolved.
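
To make the one-particle exact sampler concrete, here is a minimal toy sketch. It assumes the target is the base model reweighted by a strictly positive terminal reward, and that V* of a prefix is the expected terminal reward under the base model given that prefix. These definitions, along with the names base_prob, reward, and the horizon H = 4, are illustrative assumptions and are not taken from the paper. Under them, sampling each token with probability proportional to base_prob times V* of the extended prefix reproduces the target exactly, because the V* ratios telescope.

```python
import itertools
import random

H = 4  # toy horizon (assumed value, small enough for exhaustive enumeration)

def base_prob(prefix, token):
    # Hypothetical base model: token 1 has probability 0.3 regardless of prefix.
    return 0.3 if token == 1 else 0.7

def reward(seq):
    # Hypothetical strictly positive terminal reward favouring sequences with more ones.
    return 1.0 + sum(seq)

def v_star(prefix):
    # V*(prefix): expected terminal reward under the base model, by exhaustive recursion.
    if len(prefix) == H:
        return reward(prefix)
    return sum(base_prob(prefix, t) * v_star(prefix + (t,)) for t in (0, 1))

def exact_step(prefix):
    # Next-token distribution of the one-particle sampler: base_prob * V*(child) / V*(parent).
    v = v_star(prefix)
    return {t: base_prob(prefix, t) * v_star(prefix + (t,)) / v for t in (0, 1)}

def sample_one(rng):
    # Draw a single full sequence with one particle and no resampling.
    prefix = ()
    for _ in range(H):
        dist = exact_step(prefix)
        token = rng.choices([0, 1], weights=[dist[0], dist[1]])[0]
        prefix = prefix + (token,)
    return prefix

def sampler_prob(seq):
    # Probability that sample_one returns seq (product of the per-step conditionals).
    p = 1.0
    for h in range(H):
        p *= exact_step(seq[:h])[seq[h]]
    return p

def target_prob(seq):
    # Target: base probability of seq times reward(seq), normalised by V* of the empty prefix.
    p = 1.0
    for h in range(H):
        p *= base_prob(seq[:h], seq[h])
    return p * reward(seq) / v_star(())

seqs = list(itertools.product((0, 1), repeat=H))
# Largest pointwise gap between the sampler's law and the target (zero up to rounding).
max_gap = max(abs(sampler_prob(s) - target_prob(s)) for s in seqs)
example = sample_one(random.Random(0))
```

In this toy setting the check is exhaustive, so max_gap vanishes up to floating-point rounding. The exhaustive recursion for V* costs time exponential in H and is only meant to illustrate the identity, not to suggest a practical procedure.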

This problem asks for a precise characterization of how the required number of particles for SMC scales with the horizon H in the idealized setting where the process reward model is exact, which would clarify whether the \Omega(\sqrt{H}) lower bound is tight or can be improved.

References

We leave the problem of determining the optimal H-dependence as an open question.

— Reject, Resample, Repeat: Understanding Parallel Reasoning in Language Model Inference (2603.07887 - Golowich et al., 9 Mar 2026) in Section 3.3 (Beyond SMC: Near-perfect PRM)
