Optimal horizon dependence for SMC with a perfect process reward model
Determine the optimal asymptotic scaling, as a function of the horizon H, of the minimum number of particles N that Sequential Monte Carlo (SMC) requires to achieve non-trivial sampling accuracy when the process reward model equals the true value function V*, i.e., when \hat V = V*.
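To make the setting concrete, here is a minimal sketch of SMC driven by an exact value function on a toy problem. Everything in it is an illustrative assumption rather than the paper's construction: the uniform base policy over two tokens, the product-form reward r(y) = prod_t w[y_t] (chosen so that V* has a closed form), the incremental weight V*(y_{1:h}) / V*(y_{1:h-1}), multinomial resampling at every step, and the function name `smc_with_value_function`.

```python
import random


def smc_with_value_function(H, N, w=(1.0, 2.0), seed=0):
    """Toy SMC with a perfect process reward model (hypothetical setup).

    Base policy: uniform over len(w) tokens at each of H steps.
    Reward: r(y) = prod_t w[y_t], so the target is proportional to base(y) * r(y).
    """
    rng = random.Random(seed)
    mean_w = sum(w) / len(w)

    def value(prefix):
        # Exact value function for this toy reward:
        # V*(prefix) = E_base[r(y) | prefix]
        #            = prod_{t <= h} w[y_t] * mean_w ** (H - h)
        v = mean_w ** (H - len(prefix))
        for tok in prefix:
            v *= w[tok]
        return v

    particles = [[] for _ in range(N)]
    for _ in range(H):
        # Propose the next token from the base policy (not from the target).
        extended = [p + [rng.randrange(len(w))] for p in particles]
        # Reweight each particle by the ratio of successive values.
        weights = [value(e) / value(p) for e, p in zip(extended, particles)]
        # Multinomial resampling; copy lists so particles do not share state.
        chosen = rng.choices(extended, weights=weights, k=N)
        particles = [list(p) for p in chosen]
    return particles
```

In this toy the target is a product distribution that puts probability 2/3 on token 1 at every position, so the empirical token frequencies of the returned particles give a rough accuracy check. Rerunning with larger H at fixed N is one way to watch the approximation degrade, though the sketch makes no claim about the actual rate, which is the open question.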
References
"We leave the problem of determining the optimal H-dependence as an open question."
Golowich et al., "Reject, Resample, Repeat: Understanding Parallel Reasoning in Language Model Inference" (2603.07887, 9 Mar 2026), Section 3.3 (Beyond SMC: Near-perfect PRM)