Persistence of stochastic iterative computation benefits beyond behavior cloning

Investigate whether the performance benefits of combining stochasticity injection and supervised iterative computation persist outside behavior cloning, specifically in reinforcement learning fine-tuning, large-scale pretraining, and long-horizon planning settings.

Background

The paper systematically analyzes why flow-based generative control policies outperform regression policies in behavior cloning and identifies the key mechanism as the combination of stochasticity injection and supervised iterative computation. A minimal iterative policy (MIP) is introduced to validate this mechanism, and it achieves performance competitive with flow-based policies across diverse behavior cloning tasks.

While these findings are established within behavior cloning, the authors explicitly leave open whether the same advantages carry over to other important decision-making regimes, including RL-based fine-tuning, large-scale multi-task pretraining, and long-horizon planning.
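To make the mechanism in question concrete, the following is a minimal sketch of a policy that combines stochasticity injection (sampling an initial action from noise) with supervised iterative computation (a learned, step-conditioned refinement loop trained with a flow-matching-style regression target). The class and function names, network sizes, and the exact refinement rule are illustrative assumptions, not the authors' MIP implementation.

```python
import torch
import torch.nn as nn

class IterativePolicy(nn.Module):
    """Maps (observation, current action iterate, step fraction) to a velocity-like update."""
    def __init__(self, obs_dim, act_dim, n_steps=8, hidden=256):
        super().__init__()
        self.act_dim, self.n_steps = act_dim, n_steps
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim),
        )

    def step(self, obs, a, frac):
        # frac: per-sample step fraction in [0, 1), shape (batch, 1)
        return self.net(torch.cat([obs, a, frac], dim=-1))

    @torch.no_grad()
    def act(self, obs):
        a = torch.randn(obs.shape[0], self.act_dim)            # stochasticity injection
        for k in range(self.n_steps):                           # iterative computation
            frac = torch.full((obs.shape[0], 1), k / self.n_steps)
            a = a + self.step(obs, a, frac) / self.n_steps      # Euler-style refinement
        return a

def bc_loss(policy, obs, expert_action):
    """Supervise the per-step update so a noisy iterate is pushed toward the expert action."""
    noise = torch.randn_like(expert_action)
    frac = torch.rand(obs.shape[0], 1)
    a_k = (1 - frac) * noise + frac * expert_action             # interpolated intermediate iterate
    target = expert_action - noise                              # constant-velocity regression target
    return ((policy.step(obs, a_k, frac) - target) ** 2).mean()

# Usage on random data, purely to show the interfaces fit together.
policy = IterativePolicy(obs_dim=10, act_dim=4)
obs, expert = torch.randn(32, 10), torch.randn(32, 4)
loss = bc_loss(policy, obs, expert)
loss.backward()
action = policy.act(obs[:1])
```

The open question above concerns whether this noise-plus-iterative-refinement recipe, trained here with a behavior cloning loss, retains its advantages when the training signal comes from RL fine-tuning, large-scale multi-task pretraining, or long-horizon planning.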

References

Finally, our analysis focuses on behavior cloning. It remains an open question whether the benefits of the stochasticity injection + supervised iterative computation paradigm persist in other settings, such as RL-finetuning, large-scale pretraining, or long-horizon planning.

Much Ado About Noising: Dispelling the Myths of Generative Robotic Control (Pan et al., arXiv:2512.01809, 1 Dec 2025), Discussion section.