- The paper presents a reversible quantum circuit construction for coherent rollouts in finite-horizon decision problems, addressing branch-dependent action selection.
- It introduces coherent rank-select primitives in sequential and blocked constructions, with a gate-optimal scan under bounded-span connectivity and ancilla-efficient Qiskit implementations.
- The work establishes classical lower bounds and demonstrates a near-quadratic quantum query speedup by combining maximum finding with amplitude estimation.
Coherent Rollout Oracles for Finite-Horizon Sequential Decision Problems
The paper addresses the construction of explicit unitary quantum circuits for simulating rollouts in finite-horizon sequential decision problems, particularly where valid actions are branch-dependent and stochastic dynamics require randomness to live in quantum registers. The motivation follows the oracularization barrier described by Dunjko et al., which stipulates that quantum algorithms interacting with classically stochastic environments cannot rely on implicit randomness; instead, randomness must be encoded unitarily to ensure coherent simulation and reversibility. This paper formalizes the requirements for quantum rollouts and develops primitives for branch-dependent action selection via totalized coherent rank-select, enabling application to domains where rollout-based planning is standard.
Coherent Rank-Select: Circuit Complexity and Construction
The central technical advancement is a reversible-circuit complexity analysis of coherent rank-select, the primitive required to decode a random selector into a valid action index when validity is determined by a branch-dependent, entangled N-bit mask. For selector width w = ⌈log₂(N+1)⌉, two constructions are provided:
- Sequential Scan (Bounded-Span Layouts): This scan uses O(Nw) gates and O(w) clean ancillae, maintaining optimality under bounded-span connectivity, as proved by prefix/suffix communication lower bounds. The scan method sequentially updates a prefix counter and conditionally writes into an output register preloaded with a sentinel value, achieving gate-optimality.
- Blocked Construction (Long-Range Gates): When circuit layouts allow long-range gates, a blocked construction achieves O(N log w) gates with O(w) ancillae. The N action bits are partitioned into blocks, and only the block containing the indexed valid action is selected. This is shown to respect the Ω(N) unconditional lower bound.
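The control flow of the two constructions can be mirrored classically. The following is a minimal Python sketch, not the paper's circuit: it assumes the sentinel value is N (which fits in w bits), that a selector at or beyond the number of valid actions leaves the sentinel in place (one reading of "totalized"), and a hypothetical `block_size` parameter. All function names are illustrative, and the sketch does not reproduce the gate counts quoted above.

```python
def selector_width(n):
    """w = ceil(log2(n + 1)): bits needed to hold the values 0..n."""
    return n.bit_length()


def rank_select_scan(mask, r):
    """Sequential scan: index of the r-th set bit (0-based), else sentinel."""
    n = len(mask)
    sentinel = n      # assumed sentinel, preloaded into the output
    out = sentinel
    count = 0         # prefix counter of valid bits seen so far
    for i, bit in enumerate(mask):
        if bit and count == r:
            # XOR-style write, as a controlled reversible update would do.
            # It turns the sentinel into i and fires at most once per scan.
            out ^= sentinel ^ i
        count += bit
    return out


def rank_select_blocked(mask, r, block_size):
    """Blocked variant: skip whole blocks by popcount, then scan one block."""
    n = len(mask)
    remaining = r
    for start in range(0, n, block_size):
        block = mask[start:start + block_size]
        valid_in_block = sum(block)
        if remaining < valid_in_block:
            # The target lies in this block; resolve it with a local scan.
            return start + rank_select_scan(block, remaining)
        remaining -= valid_in_block
    return n          # selector out of range: sentinel
```

For the mask `[0, 1, 1, 0, 1]`, the scan maps selectors 0, 1, 2 to indices 1, 2, 4 and any larger selector to the sentinel 5; the blocked variant returns the same values for any block size.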
No prior work supplies a reversible circuit for rank-select indexing under branch-dependent validity predicates where the input mask is entangled with the computation. The classical alternatives materialize all valid indices, which is infeasible for quantum simulation due to astronomical ancilla costs.
Oracle Construction: Three-Phase Decomposition
The quantum rollout oracle is decomposed into three reversible phases:
- Coherent Rank-Select Indexing: Decodes the r-th valid action under each branch-dependent mask, as detailed above.
- Reversible Stochastic Transition: Explicit randomness is stored in quantum dice registers, enabling unitary evolution of the configuration. Each cell update is controlled by local neighborhood information and resolved via comparisons against dice registers, ensuring full reversibility.
- Coherent Terminal Evaluation: Evaluates terminal rewards using a reversible predicate, typically writing a binary payoff to a single flag qubit. All intermediate computations are uncomputed post-evaluation.
This three-phase unitary is realized as polynomial-size, ancilla-efficient circuits in Qiskit. Unitarity and full reversibility are maintained throughout, as required by the quantum amplitude estimation and maximum finding primitives from Wang et al.
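The key idea of the decomposition is that a rollout becomes a deterministic function of the selector and the dice values, so it can in principle be applied to a superposition of both. A classical toy version illustrates this. Everything in the sketch below is an assumption made for illustration: the one-dimensional line of cells, the single synchronous spread step, the `threshold` comparison, and the `max_infected` payoff rule are not taken from the paper.

```python
from itertools import product


def rank_select(mask, r):
    """Index of the r-th set bit of mask, or len(mask) as a sentinel."""
    count, out = 0, len(mask)
    for i, bit in enumerate(mask):
        if bit and count == r:
            out = i
        count += bit
    return out


def rollout_flag(infected, selector, dice, threshold, max_infected):
    """Deterministic three-phase rollout on a line of cells (toy model)."""
    n = len(infected)
    # Phase 1: rank-select over the branch-dependent validity mask.
    mask = [1 - x for x in infected]          # healthy cells are valid actions
    protected = rank_select(mask, selector)   # sentinel n protects nothing
    # Phase 2: stochastic transition, with all randomness read from dice.
    nxt = list(infected)
    for i in range(n):
        exposed = (i > 0 and infected[i - 1]) or (i + 1 < n and infected[i + 1])
        if not infected[i] and i != protected and exposed and dice[i] < threshold:
            nxt[i] = 1
    # Phase 3: terminal evaluation into a single binary flag.
    return int(sum(nxt) <= max_infected)


def mean_payoff(infected, selector, dice_values, threshold, max_infected):
    """Exact average of the flag over all uniformly weighted dice settings."""
    settings = list(product(range(dice_values), repeat=len(infected)))
    total = sum(rollout_flag(infected, selector, d, threshold, max_infected)
                for d in settings)
    return total / len(settings)
```

Calling `mean_payoff([0, 1, 0, 0], selector, 2, 1, 2)` enumerates every dice setting, which is the classical counterpart of the quantity amplitude estimation would estimate from the flag qubit.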
Classical Lower Bound and Bounded-Influence Lifting
The paper establishes a classical lower bound of Ω(k/ε²) rollout queries for ε-correct arm selection, where k is the number of arms and ε the accuracy target. The lower bound is derived via a transportation lemma that specifies the information gain per rollout, matching the standard best-arm identification literature.
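For comparison with the lower bound, a standard uniform-sampling baseline uses a number of rollouts of the same order up to a logarithmic factor. The sketch below is not from the paper: it assumes binary payoffs, introduces a failure probability `delta`, and takes a hypothetical `rollout(arm, rng)` callable. The per-arm sample count follows from Hoeffding's inequality with a union bound over the k arms.

```python
import math
import random


def samples_per_arm(k, eps, delta):
    """Rollouts per arm so every estimate is within eps/2 w.p. >= 1 - delta."""
    return math.ceil((2.0 / eps ** 2) * math.log(2.0 * k / delta))


def select_arm(rollout, k, eps, delta, rng):
    """Sample each arm equally and return (best arm, total rollouts used)."""
    m = samples_per_arm(k, eps, delta)
    means = [sum(rollout(arm, rng) for _ in range(m)) / m for arm in range(k)]
    best = max(range(k), key=means.__getitem__)
    return best, k * m   # total cost is O((k / eps^2) * log(k / delta))


if __name__ == "__main__":
    probs = [0.2, 0.5, 0.8]   # hypothetical success probability of each arm

    def bernoulli_rollout(arm, rng):
        return int(rng.random() < probs[arm])

    print(select_arm(bernoulli_rollout, 3, 0.2, 0.05, random.Random(0)))
```

The total of k times `samples_per_arm` rollouts scales as (k/ε²) log(k/δ), so the baseline sits within a logarithmic factor of the Ω(k/ε²) bound.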
To generalize this lower bound beyond contrived instances, a bounded-influence lifting theorem is proved, rooted in stability and modularity conditions. It demonstrates that if local factor modifications have only limited influence, the classical rollout cost persists across an exponential family of configurations in realistic spatial planning domains. This is non-trivial for dynamics where peripheral changes can propagate, a case handled by subcritical spatial-decay arguments.
Quantum Upper Bound: Query Complexity Separation
By composing the constructed oracle with quantum maximum finding and amplitude estimation, the quantum query complexity for best-arm selection in the coherent oracle access model improves near-quadratically over the classical Ω(k/ε²) bound, that is, to the order of √k/ε up to logarithmic factors. The speedup stems from the ability to query actions and stochastic outcomes in superposition, which the constructed oracle enables by design.
Gate-level costs are treated separately: classical and quantum algorithms both pay a polynomial circuit cost per rollout, but quantum algorithms achieve near-quadratic reductions in the number of rollout queries required.
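To give a feel for what a near-quadratic query reduction means numerically, the following sketch compares the two scalings. It assumes the quantum count scales like the square root of the classical Ω(k/ε²) bound, which is how "near-quadratic" is read here; constants and logarithmic factors are dropped, so the numbers indicate scaling only and are not results from the paper.

```python
from math import sqrt


def classical_query_scale(k, eps):
    """Classical rollout-query scaling k / eps^2 (constants dropped)."""
    return k / eps ** 2


def quantum_query_scale(k, eps):
    """Assumed quantum scaling sqrt(k) / eps (constants and logs dropped)."""
    return sqrt(k) / eps


if __name__ == "__main__":
    for k, eps in [(16, 0.1), (64, 0.05), (256, 0.01)]:
        c = classical_query_scale(k, eps)
        q = quantum_query_scale(k, eps)
        print(f"k={k:4d} eps={eps:<5} classical~{c:12.0f} quantum~{q:8.0f}")
```

Because the per-rollout circuit cost is the same order on both sides, the ratio of these two query scales is the part of the comparison that the separation addresses.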
Instantiation and Empirical Validation
Two domain instantiations are supplied:
- Epidemic Intervention (SIR Model): Action selection corresponds to vaccination placement; disease spread is modeled stochastically on a grid. The full three-phase oracle is realized, and classical lower bounds are lifted using subcritical spatial-decay.
- Sway Placement Game: A two-player stochastic placement game stresses branch-dependent valid-action indexing. Branchwise correctness is verified via comparison to exhaustive classical rollouts.
Resource counts (qubits and gates) before decomposition and error-corrected overheads are provided for both models, demonstrating the feasibility and correctness of the Qiskit implementation against classical rollouts.
Implications and Future Directions
The research constructively addresses the oracularization barrier for quantum rollouts in implicit-state sequential decision problems. Practically, this enables coherent simulation whenever state-dependent validity predicates, reversible transition dynamics, and polynomial-size terminal evaluation can be realized. The theoretical implication is a near-quadratic separation in rollout query complexity between classical and quantum planning, extended to exponential configuration families by bounded-influence lifting.
Extensions to long-range dynamics failing the spatial-decay condition require alternative coupling arguments. Closed-loop policies with mid-circuit recomputation present challenges for reversibility and composition, potentially necessitating new quantum design principles.
Conclusion
The paper demonstrates that coherent rollouts for sequential decision problems with branch-dependent valid actions are achievable with explicit polynomial-size quantum circuits. The reversible rank-select primitive is analyzed and realized, forming the foundation for coherent rollout oracles in implicit-state settings. The classical lower bound is generalized across exponential configuration families, and the oracle supports a near-quadratically smaller quantum query complexity. The approach is validated on epidemic and placement-game domains, highlighting the practical and theoretical relevance for quantum planning algorithms (2604.25962).