Role of BCFM regularization as an anchor for learning return distributions
Establish whether the bootstrapped conditional flow matching (BCFM) regularization term functions as an efficient anchor for learning the full return distribution in Value Flows, and characterize the conditions under which this anchoring effect holds.
References
"We conjecture that the BCFM regularization serves as an efficient anchor for learning the full return distribution."
— Value Flows
(Dong et al., arXiv:2510.07650, 9 Oct 2025), Section 5: The key components of Value Flows (Ablation)