Role of BCFM regularization as an anchor for learning return distributions

Establish whether the bootstrapped conditional flow matching (BCFM) regularization term functions as an efficient anchor for learning the full return distribution in Value Flows, and characterize the conditions under which this anchoring effect holds.

Background

The practical training objective of Value Flows augments the distributional conditional flow matching (DCFM) loss with a bootstrapped CFM (BCFM) regularization term, because optimizing the DCFM loss alone was observed to be unstable.
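As a rough sketch of what "augmenting" the loss means, the combined objective can be read as a weighted sum of the two terms. The weight \lambda and the parameter symbol \theta are assumptions introduced here for illustration; the summary above does not state how the two terms are balanced.

```latex
% Hypothetical form of the practical objective (\lambda \ge 0 is an assumed weight).
\mathcal{L}(\theta) \;=\; \mathcal{L}_{\mathrm{DCFM}}(\theta) \;+\; \lambda \, \mathcal{L}_{\mathrm{BCFM}}(\theta)
```

Under this reading, \lambda = 0 recovers the DCFM-only objective that was reported to be unstable, and the conjecture concerns why a positive weight on the BCFM term helps.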

Empirically, adding BCFM substantially improves performance and sample efficiency in ablations, motivating a conjecture about its anchoring role in learning the return distribution.

References

We conjecture that the BCFM regularization serves as an efficient anchor for learning the full return distribution.

— Value Flows (2510.07650 - Dong et al., 9 Oct 2025) in Section 5: The key components of Value Flows (Ablation)
