Conjecture: Inherent seriality is common in reinforcement learning

Prove or refute the conjecture that inherent seriality is common in reinforcement learning (RL), beyond the specific DO1 (depth-of-1) construction. A proof would identify broad classes of RL environments or objectives that cannot be solved by L-uniform threshold circuits (i.e., that lie outside TC) and hence require serial computation.

Background

In the appendix of The Serial Scaling Hypothesis, Liu et al. construct DO1 environments tied to a P-complete approximation problem. They prove that parallel algorithms cannot, in the worst case, produce approximately optimal value functions or policies, which implies inherent seriality in these settings.

They relate this to other P-complete approximation problems and, based on these connections, explicitly conjecture that such seriality extends more broadly across RL.
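
As a rough intuition for what "requiring serial computation" can look like, the sketch below counts synchronous value-iteration sweeps on a hypothetical toy chain MDP. It is not the authors' DO1 construction and it does not demonstrate inherent seriality: a chain this simple has a closed-form solution that is easy to parallelize. It only illustrates how, for one particular algorithm, the number of dependent steps can grow with problem size even when every step is fully parallel internally. The function name, the chain environment, and the discount value are all assumptions made for illustration.

```python
def sweeps_until_start_has_value(n_states: int, gamma: float = 0.9) -> int:
    """Count synchronous value-iteration sweeps before state 0 sees the reward.

    Hypothetical toy chain (an illustration only): state i moves
    deterministically to state i + 1, and the only reward (1.0) is paid on
    the transition into the last state, which is terminal with value 0.
    """
    if n_states < 2:
        raise ValueError("the chain needs at least two states")

    values = [0.0] * n_states
    for sweep in range(1, n_states):
        # One synchronous Bellman backup: every state is updated from the
        # previous sweep's values, so updates within a sweep are independent.
        new_values = [0.0] * n_states
        for s in range(n_states - 1):
            reward = 1.0 if s + 1 == n_states - 1 else 0.0
            new_values[s] = reward + gamma * values[s + 1]
        values = new_values
        if values[0] != 0.0:
            # Reward information has finally reached the start state.
            return sweep
    raise RuntimeError("reward never reached state 0 (value may have underflowed)")


if __name__ == "__main__":
    for n in (2, 5, 10):
        print(n, sweeps_until_start_has_value(n))
```

In this toy setting the reward information moves back one state per sweep, so the sweep count grows linearly with the chain length. The conjecture concerns a much stronger claim: that for broad classes of RL problems no parallel algorithm at all, not just this one, can avoid such depth.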

References

"This leads us to conjecture that inherent seriality may be a fairly common phenomenon in RL, not particular to the DO1 setup."

Liu et al., The Serial Scaling Hypothesis (arXiv:2507.12549, 16 Jul 2025), Appendix, "Inherently Serial Problems in RL"
