Practical seriality of sequential decision problems in RL

Determine whether computing optimal policies in practical sequential decision problems, modeled as Markov Decision Processes (MDPs), is inherently serial, that is, whether it lies outside the threshold-circuit class TC and therefore cannot be efficiently parallelized.

Background

The authors prove that there exist Markov Decision Processes for which computing the optimal policy is inherently serial. The proof is a reduction from problems that resist parallelization, which ties the result to P-completeness.
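
To make the serial/parallel distinction concrete, here is a minimal sketch of tabular value iteration on a small MDP. It is not the authors' construction and proves nothing about hardness. The function name `value_iteration`, the table layout `P[s][a]` / `R[s][a]`, and the default parameters are assumptions chosen for illustration. The point is only that the backups inside one sweep are independent of each other, while each sweep must wait for the previous one.

```python
def value_iteration(P, R, gamma=0.9, sweeps=50):
    """Tabular value iteration (illustrative layout, not from the paper).

    P[s][a] is a list of (probability, next_state) pairs.
    R[s][a] is the immediate reward for taking action a in state s.
    """
    n = len(P)
    V = [0.0] * n

    def q(s, a, values):
        # One-step lookahead value of action a in state s.
        return R[s][a] + gamma * sum(p * values[s2] for p, s2 in P[s][a])

    for _ in range(sweeps):
        # Serial across sweeps: sweep k+1 reads the values produced by sweep k.
        # Parallel within a sweep: each state's backup is independent.
        V = [max(q(s, a, V) for a in range(len(P[s]))) for s in range(n)]

    # Greedy policy with respect to the final value estimates.
    policy = [max(range(len(P[s])), key=lambda a: q(s, a, V)) for s in range(n)]
    return V, policy
```

For example, with two states where state 0 can either stay (action 0) or move to state 1 (action 1), and state 1 loops on itself with reward 1, the greedy policy in state 0 is to move. The number of sweeps needed for convergence is what sets the serial depth of this particular method.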

However, they note that it is uncertain whether this inherent seriality holds broadly for real-world RL tasks, and they offer heuristic arguments (e.g., about return estimation) suggesting that serial bottlenecks do arise in practice.
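
One possible reading of the return-estimation heuristic is sketched below. This reading is an assumption, not the authors' stated argument. Estimating a policy's return by rollout requires stepping the environment one transition at a time, because each state depends on the previous one, whereas separate rollouts are independent and can run in parallel. The names `rollout_return`, `estimate_return`, and the `step(state, action, rng)` environment interface are hypothetical.

```python
import random


def rollout_return(step, policy, s0, horizon, gamma=0.99, seed=0):
    """Discounted return of one rollout (hypothetical interface).

    step(state, action, rng) must return (next_state, reward).
    policy(state) must return an action.
    """
    rng = random.Random(seed)
    s, total, discount = s0, 0.0, 1.0
    for _ in range(horizon):
        # Serial: the state at step t+1 is unknown until step t has run.
        a = policy(s)
        s, r = step(s, a, rng)
        total += discount * r
        discount *= gamma
    return total


def estimate_return(step, policy, s0, horizon, n_rollouts=100, gamma=0.99):
    """Monte Carlo average over independent rollouts."""
    # Parallelizable: rollouts share nothing except their inputs.
    returns = [
        rollout_return(step, policy, s0, horizon, gamma, seed=i)
        for i in range(n_rollouts)
    ]
    return sum(returns) / n_rollouts
```

In this sketch, adding more parallel workers shortens the outer loop over rollouts but not the inner loop over the horizon, which is the kind of bottleneck the heuristic points at.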

References

However, in practice, it remains unclear whether this problem is likely serial.

— The Serial Scaling Hypothesis (2507.12549 - Liu et al., 16 Jul 2025) in Section 4.4, Sequential Decision Problems
