Approximability of discounted ℓ-step transition look-ahead planning

Determine whether a polynomial-time approximation scheme (PTAS) exists for the discounted ℓ-step transition look-ahead planning problem in finite tabular Markov Decision Processes, or, alternatively, establish inapproximability results showing that even a constant-factor approximation cannot be achieved in this setting.

Background

The paper studies reinforcement learning with transition look-ahead, where the agent observes the states that would be visited upon playing any sequence of actions up to a fixed horizon ℓ before acting. It establishes a sharp computational boundary: planning with one-step transition look-ahead (ℓ=1) is solvable in polynomial time via a linear programming formulation, while planning with multi-step look-ahead (ℓ≥2) is NP-hard for both discounted and average-reward objectives.
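
To make the ℓ=1 planning problem concrete, the sketch below implements a look-ahead Bellman backup under the assumption that, before acting in state s, the agent observes one realized successor per action, drawn independently across actions. The brute-force enumeration is exponential in the number of actions and is only an illustration of the look-ahead value recursion, not the paper's polynomial-time linear programming formulation; the function name and the toy MDP are hypothetical.

import itertools
import numpy as np

def lookahead_value_iteration(P, r, gamma=0.9, iters=500, tol=1e-8):
    # P: (S, A, S) transition probabilities, r: (S, A) rewards (assumed model).
    S, A, _ = P.shape
    V = np.zeros(S)
    for _ in range(iters):
        V_new = np.zeros(S)
        for s in range(S):
            total = 0.0
            # Enumerate every joint realization (s'_a)_a of the one-step look-ahead.
            for succ in itertools.product(range(S), repeat=A):
                prob = np.prod([P[s, a, succ[a]] for a in range(A)])
                if prob == 0.0:
                    continue
                # Having seen all realized successors, the agent picks the best action.
                total += prob * max(r[s, a] + gamma * V[succ[a]] for a in range(A))
            V_new[s] = total
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V

# Toy example: 2 states, 2 actions.
P = np.array([[[0.8, 0.2], [0.3, 0.7]],
              [[0.5, 0.5], [0.1, 0.9]]])
r = np.array([[1.0, 0.0],
              [0.0, 2.0]])
print(lookahead_value_iteration(P, r))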

Given the NP-hardness for ℓ≥2, exact optimal planning is intractable in general unless P = NP, which motivates questions about approximation. The authors explicitly raise the open problem of whether efficient approximation schemes exist for the discounted ℓ-look-ahead planning problem, or whether strong inapproximability barriers (e.g., no polynomial-time constant-factor approximation) hold. Resolving this would clarify the algorithmic landscape beyond exact optimization for transition look-ahead in tabular MDPs.

References

On the approximation side, it remains open whether polynomial-time approximation schemes (PTAS) exist for discounted ℓ-look-ahead planning, or conversely, whether even constant-factor approximation is impossible.

On the hardness of RL with Lookahead (Pla et al., arXiv:2510.19372, 22 Oct 2025), Section 6: Conclusion and future work.