Scaling behavior of SOP to significantly larger robot fleets

Determine whether the near-linear scaling of wall-clock training efficiency with the number of robot actors, observed under the Scalable Online Post-Training (SOP) framework, continues to hold for significantly larger robot fleets during online, distributed, multi-task post-training of generalist Vision-Language-Action policies in the physical world.

Background

The paper introduces SOP, a closed-loop actor–learner system that couples distributed real-world data collection with centralized online learning for generalist Vision-Language-Action (VLA) models. Empirically, SOP yields near-linear speedups in the wall-clock time needed to reach a target success rate as the number of robot actors grows from 1 to 4 on grocery restocking tasks.
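One way to make "near-linear" precise is to compare time-to-target across fleet sizes. This is offered as a sketch and is not necessarily the paper's own metric definition. Let T(N) be the wall-clock time for N actors to reach the target success rate. Speedup is then S(N) = T(1)/T(N), and parallel efficiency is E(N) = S(N)/N. Near-linear scaling means E(N) stays close to 1. The function name `scaling_report` and the timings below are hypothetical and are not measurements from the paper.

```python
from typing import Dict, Tuple


def scaling_report(time_to_target: Dict[int, float]) -> Dict[int, Tuple[float, float]]:
    """Map fleet size N -> (speedup S(N), parallel efficiency E(N)).

    time_to_target[N] is the wall-clock time a fleet of N actors needs to first
    reach the target success rate. The single-actor run (N = 1) is the baseline.
    """
    if 1 not in time_to_target:
        raise ValueError("need a single-actor baseline (N = 1)")
    t1 = time_to_target[1]
    report = {}
    for n, tn in sorted(time_to_target.items()):
        speedup = t1 / tn          # S(N) = T(1) / T(N)
        efficiency = speedup / n   # E(N) = S(N) / N; 1.0 means perfectly linear scaling
        report[n] = (speedup, efficiency)
    return report


# Hypothetical timings in hours (illustrative only, NOT results from the SOP paper).
hypothetical = {1: 8.0, 2: 4.2, 4: 2.2}
for n, (s, e) in scaling_report(hypothetical).items():
    print(f"N={n}: speedup={s:.2f}x, efficiency={e:.2f}")
```

Under this definition, the open question amounts to asking whether E(N) stays near 1 as N grows well beyond 4.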

The authors note that while these results demonstrate favorable scaling within the tested regime, it remains unknown whether similar near-linear scaling persists for significantly larger fleets, where system bottlenecks (e.g., communication, learner throughput, synchronization latency) might emerge. This uncertainty motivates a formal investigation of scaling limits under real-world deployment conditions.
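The bottlenecks listed above can be illustrated with a deliberately simple throughput model. This is a hypothetical sketch and is not part of SOP. It makes two assumptions. First, the learner needs a fixed number of episodes to reach the target, regardless of fleet size. Second, data reaches the learner at the slowest of three rates: fleet collection, network uplink, and learner training. The model ignores synchronization latency and policy staleness, both of which would tend to lower the curve further. The class `ToyFleetModel` and all parameter values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class ToyFleetModel:
    """Hypothetical actor-learner throughput model (an assumption, not from the SOP paper)."""
    episodes_to_target: float  # episodes the learner must consume to hit the target (assumed independent of N)
    actor_rate: float          # episodes per hour produced by a single robot actor
    learner_capacity: float    # max episodes per hour the learner can train on
    uplink_capacity: float     # max episodes per hour the network can deliver to the learner

    def effective_rate(self, n_actors: int) -> float:
        # Data reaches the learner at the slowest of: fleet collection, network uplink, learner throughput.
        return min(n_actors * self.actor_rate, self.uplink_capacity, self.learner_capacity)

    def time_to_target(self, n_actors: int) -> float:
        # Wall-clock hours to consume the required number of episodes at the effective rate.
        return self.episodes_to_target / self.effective_rate(n_actors)

    def speedup(self, n_actors: int) -> float:
        # S(N) = T(1) / T(N), relative to a single actor.
        return self.time_to_target(1) / self.time_to_target(n_actors)


# Made-up parameters: one actor yields 25 episodes/h; the learner caps at 500 episodes/h.
model = ToyFleetModel(episodes_to_target=400, actor_rate=25,
                      learner_capacity=500, uplink_capacity=2000)
for n in (1, 2, 4, 8, 16, 32, 64):
    print(n, round(model.speedup(n), 2))
```

With these numbers, speedup is exactly linear until the fleet's collection rate exceeds the learner's capacity (20 actors here). After that point, speedup stays flat at 20x. A real deployment would likely depart from linearity earlier and more gradually, because the overheads this sketch leaves out grow with fleet size.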

References

Whether near-linear scaling extends to significantly larger fleets, and how to support continual acquisition of new skills without catastrophic forgetting, are open questions.

SOP: A Scalable Online Post-Training System for Vision-Language-Action Models (2601.03044 - Pan et al., 6 Jan 2026) in Discussion and Future Work