Coherent Rollout Oracles for Finite-Horizon Sequential Decision Problems

Published 28 Apr 2026 in quant-ph and cs.DS | (2604.25962v1)

Abstract: Coherent quantum rollout for sequential decision problems requires a unitary simulator: randomness must live in explicit quantum registers, and basis-state selectors must be mapped to actions reversibly. With branch-dependent valid actions, this mapping is totalized coherent rank-select over an entangled $N$-bit validity mask: return the position of the $r$-th valid bit, or a sentinel if $r$ is out of range. We give the first reversible-circuit complexity analysis of this primitive. For selector width $w = \lceil \log_2(N+1) \rceil$, rank-select admits an $O(Nw)$-gate low-ancilla bounded-span scan, proved gate-optimal in its model, and an $O(N\log w)$-gate low-ancilla blocked construction when long-range gates are available; across all bounded-fan-in layouts, the unconditional gate lower bound is $\Omega(N)$. Composing rank-select with reversible transition and predicate-evaluation circuits gives an explicit polynomial-size coherent rollout oracle for finite-horizon planning problems satisfying these primitive assumptions. The resulting oracle satisfies the access model of the best-arm pipeline of Wang et al., yielding $\widetilde{O}(\sqrt{k}/\varepsilon)$ coherent oracle calls against the standard classical $\Omega(k/\varepsilon^2)$ arm-pull lower bound. We give a bounded-influence lifting theorem that extends this lower-bound construction from a base configuration to an exponential family of configurations. We instantiate the construction on SIR epidemic intervention, with a stochastic placement-game sanity check, and machine-check the main results in Lean 4. Code and proofs: https://github.com/BinRoot/b01t/tree/main/demos/rollout.

Authors (1)

Summary

  • The paper presents a reversible quantum circuit construction for coherent rollouts in finite-horizon decision problems, addressing branch-dependent action selection.
  • It introduces coherent rank-select primitives in two forms: a sequential scan that is proved gate-optimal in the bounded-span model, and a blocked construction for layouts with long-range gates. Both use few ancillae and are implemented in Qiskit.
  • The work establishes a classical lower bound on rollout queries and demonstrates a near-quadratic quantum query speedup by combining quantum maximum finding with amplitude estimation.

Problem Formulation and Motivation

The paper addresses the construction of explicit unitary quantum circuits for simulating rollouts in finite-horizon sequential decision problems, particularly where valid actions are branch-dependent and stochastic dynamics require randomness to live in quantum registers. The motivation follows the oracularization barrier described by Dunjko et al., which stipulates that quantum algorithms interacting with classically stochastic environments cannot rely on implicit randomness; instead, randomness must be encoded unitarily to ensure coherent simulation and reversibility. This paper formalizes the requirements for quantum rollouts and develops primitives for branch-dependent action selection via totalized coherent rank-select, enabling application to domains where rollout-based planning is standard.

Coherent Rank-Select: Circuit Complexity and Construction

The central technical advancement is a reversible-circuit complexity analysis of coherent rank-select, the primitive required to decode a random selector into a valid action index when validity is determined by a branch-dependent entangled $N$-bit mask. For selector width $w = \lceil\log_2(N + 1)\rceil$, two constructions are provided:

  • Sequential Scan (Bounded-Span Layouts): This scan uses $O(Nw)$ gates and $O(w)$ clean ancillae, and is optimal under bounded-span connectivity, as proved by prefix/suffix communication lower bounds. The scan sequentially updates a prefix counter and conditionally writes into an output register preloaded with a sentinel value.
  • Blocked Construction (Long-Range Gates): When circuit layouts allow long-range gates, a blocked construction achieves $O(N\log w)$ gates with $O(w)$ ancillae. The $N$ action bits are partitioned into blocks, and only the block containing the indexed valid action is selected. This is consistent with the unconditional $\Omega(N)$ gate lower bound.
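The input/output behavior of totalized rank-select, and the scan pattern of a prefix counter plus a sentinel-preloaded output, can be illustrated classically. The following is a minimal sketch, not the paper's circuit: the function names, the 0-based rank convention, and the choice of $N$ as the sentinel value are assumptions made for illustration.

```python
def selector_width(n: int) -> int:
    """Bits needed to hold values 0..n, i.e. ceil(log2(n + 1))."""
    return n.bit_length()


def rank_select(mask: list[int], r: int) -> int:
    """Return the position of the r-th set bit of `mask` (r counted from 0),
    or the sentinel len(mask) if fewer than r + 1 bits are set.

    Assumptions for this sketch: 0-based rank, sentinel encoded as N.
    """
    n = len(mask)
    sentinel = n
    out = sentinel      # output register preloaded with the sentinel
    count = 0           # prefix counter: number of set bits seen so far
    for i, bit in enumerate(mask):
        if bit and count == r:
            # The condition holds for at most one i, so this XOR fires at
            # most once and turns the sentinel into i, mimicking a
            # controlled write in a reversible circuit.
            out ^= i ^ sentinel
        count += bit
    return out


mask = [0, 1, 1, 0, 1]
print([rank_select(mask, r) for r in range(4)])  # [1, 2, 4, 5]
print(selector_width(len(mask)))                 # 3
```

Each loop iteration touches only the counter and output, both of width $w$, which is the intuition behind a gate count proportional to $Nw$ for the scan. A reversible version would additionally have to uncompute the counter, which this classical sketch does not model.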

No prior work supplies a reversible circuit for rank-select indexing under branch-dependent validity predicates where the input mask is entangled with the computation. The classical alternatives materialize all valid indices, which is infeasible for quantum simulation due to astronomical ancilla costs.

Oracle Construction: Three-Phase Decomposition

The quantum rollout oracle is decomposed into three reversible phases:

  1. Coherent Rank-Select Indexing: Decodes the rr-th valid action under each branch-dependent mask, as detailed above.
  2. Reversible Stochastic Transition: Explicit randomness is stored in quantum dice registers, enabling unitary evolution of the configuration. Each cell update is controlled by local neighborhood information and resolved via comparisons against dice registers, ensuring full reversibility.
  3. Coherent Terminal Evaluation: Evaluates terminal rewards using a reversible predicate, typically writing a binary payoff to a single flag qubit. All intermediate computations are uncomputed post-evaluation.

This three-phase unitary is realized with polynomial size and ancilla-efficient circuits in Qiskit, ensuring unitarity and full reversibility per the requirements of quantum amplitude estimation and maximum finding primitives from Wang et al.
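To make the three phases concrete, the sketch below treats a rollout as a deterministic function of the state, the selector, and explicit dice values, which is the classical shadow of a unitary simulator. It is a toy under stated assumptions, not the paper's Qiskit code: the one-dimensional SIR line, the single transition step, the 2-bit dice, the infection threshold, the payoff predicate, and all names are hypothetical.

```python
from itertools import product

S, I, R = 0, 1, 2     # susceptible, infected, removed (vaccinated)
DICE_BITS = 2         # hypothetical width of each dice register
THRESHOLD = 1         # infection fires when dice < THRESHOLD (prob. 1/4)


def rank_select(mask, r):
    """Position of the r-th True entry (0-based), or len(mask) as sentinel."""
    out, count = len(mask), 0
    for i, bit in enumerate(mask):
        if bit and count == r:
            out = i
        count += bit
    return out


def rollout(state, r, dice, cap=1):
    """Deterministic payoff bit for one (state, selector, dice) branch."""
    cells = list(state)
    n = len(cells)
    # Phase 1: decode the selector into a valid action (vaccinate an S cell).
    action = rank_select([c == S for c in cells], r)
    if action < n:                      # sentinel means "no valid action"
        cells[action] = R
    # Phase 2: one synchronous step; all randomness comes from `dice`.
    nxt = cells[:]
    for i, c in enumerate(cells):
        exposed = any(0 <= j < n and cells[j] == I for j in (i - 1, i + 1))
        if c == S and exposed and dice[i] < THRESHOLD:
            nxt[i] = I
    # Phase 3: terminal predicate written to a single payoff bit.
    return int(sum(c == I for c in nxt) <= cap)


def success_probability(state, r):
    """Average payoff over all dice values (uniform dice distribution)."""
    outcomes = list(product(range(2 ** DICE_BITS), repeat=len(state)))
    return sum(rollout(state, r, d) for d in outcomes) / len(outcomes)


start = (S, I, S)
print(success_probability(start, 0))  # 0.75   (vaccinate cell 0)
print(success_probability(start, 2))  # 0.5625 (selector out of range)
```

Averaging over every dice value gives the probability that the payoff flag reads 1 when the dice registers are prepared in a uniform superposition, which is the quantity amplitude estimation would target. Enumerating all dice values is only feasible for toy sizes like this one.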

Classical Lower Bound and Bounded-Influence Lifting

The paper establishes a classical lower bound of $\Omega(k/\varepsilon^2)$ rollout queries for $\varepsilon$-correct arm selection, where $k$ is the number of arms and $\varepsilon$ the accuracy target. The lower bound is derived via a transportation lemma that bounds the information gain per rollout, matching the standard best-arm identification literature.

To generalize this lower bound beyond contrived instances, a bounded-influence lifting theorem is proved, rooted in stability and modularity conditions. It demonstrates that if local factor modifications have only limited influence, the classical rollout cost persists across an exponential family of configurations in realistic spatial planning domains. This is non-trivial for dynamics where peripheral changes can propagate; such cases are handled by subcritical spatial-decay arguments.

Quantum Upper Bound: Query Complexity Separation

By composing the constructed oracle with quantum maximum finding and amplitude estimation, the quantum query complexity for best-arm selection becomes $\widetilde{O}(\sqrt{k}/\varepsilon)$ in the coherent oracle access model. The speedup stems from the ability to query actions and stochastic outcomes in superposition, which the constructed oracle enables by design.

Gate-level costs are accounted for separately: classical and quantum algorithms both pay a polynomial circuit cost per rollout, but quantum algorithms achieve near-quadratic reductions in the number of rollout queries required.
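A quick arithmetic comparison shows the scale of the gap between the classical $\Omega(k/\varepsilon^2)$ bound and the quantum $\widetilde{O}(\sqrt{k}/\varepsilon)$ bound. The sketch below drops all constants and polylogarithmic factors, so the numbers indicate scaling only. The function names and the sample values of $k$ and $1/\varepsilon$ are illustrative assumptions.

```python
from math import sqrt


def classical_pulls(k: int, inv_eps: int) -> int:
    """Scaling of the classical bound k / eps^2, with eps = 1 / inv_eps."""
    return k * inv_eps ** 2


def quantum_calls(k: int, inv_eps: int) -> float:
    """Scaling of the quantum bound sqrt(k) / eps, log factors dropped."""
    return sqrt(k) * inv_eps


k, inv_eps = 64, 100                     # 64 arms, eps = 0.01
print(classical_pulls(k, inv_eps))       # 640000
print(quantum_calls(k, inv_eps))         # 800.0
```

The ratio between the two is $\sqrt{k}/\varepsilon$, so the advantage in query count grows with both the number of arms and the required accuracy. The per-rollout circuit cost is not part of this comparison.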

Instantiation and Empirical Validation

Two domain instantiations are supplied:

  • Epidemic Intervention (SIR Model): Action selection corresponds to vaccination placement; disease spread is modeled stochastically on a grid. The full three-phase oracle is realized, and classical lower bounds are lifted using subcritical spatial-decay.
  • Sway Placement Game: A two-player stochastic placement game stresses branch-dependent valid-action indexing. Branchwise correctness is verified via comparison to exhaustive classical rollouts.

Resource counts (qubits and gates) before decomposition and error-corrected overheads are provided for both models, demonstrating the feasibility and correctness of the Qiskit implementation against classical rollouts.

Implications and Future Directions

The research constructively addresses the oracularization barrier for quantum rollouts in implicit-state sequential decision problems. Practically, this enables coherent simulation whenever state-dependent validity predicates, reversible transition dynamics, and polynomial-size terminal evaluation can be realized. The theoretical implication is a near-quadratic separation in rollout query complexity between classical and quantum planning, extended to exponential configuration families by bounded-influence lifting.

Extensions to long-range dynamics failing the spatial-decay condition require alternative coupling arguments. Closed-loop policies with mid-circuit recomputation present challenges for reversibility and composition, potentially necessitating new quantum design principles.

Conclusion

The paper demonstrates that coherent rollouts for sequential decision problems with branch-dependent valid actions are achievable with explicit polynomial-size quantum circuits. The reversible rank-select primitive is analyzed and realized, forming the foundation for coherent rollout oracles in implicit-state settings. The classical lower bound is generalized across exponential configuration families, and the oracle achieves $\widetilde{O}(\sqrt{k}/\varepsilon)$ quantum query complexity. The approach is validated on epidemic and placement-game domains, highlighting the practical and theoretical relevance for quantum planning algorithms (2604.25962).
