Continual skill acquisition without catastrophic forgetting in SOP

Develop methods that let a single shared generalist Vision-Language-Action (VLA) policy, trained via online, distributed, multi-task interaction within the Scalable Online Post-training (SOP) framework, continually acquire new robotic manipulation skills without catastrophically forgetting previously learned ones.

Background

SOP trains a single generalist VLA policy across multiple tasks using on-policy experience streamed from a robot fleet, preserving generality while improving task proficiency. The system employs task-balanced adaptive sampling to mix online and offline buffers during updates.
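The idea of mixing online and offline buffers under a task-balanced scheme can be illustrated with a minimal sketch. This is not the SOP implementation: the function name `sample_task_balanced_batch`, the per-task `online_fraction` parameter, and the round-robin task assignment are all assumptions made for illustration, and the rule by which SOP adapts the mix is not described in this section, so the mix is passed in as a fixed input here.

```python
import random


def sample_task_balanced_batch(online, offline, batch_size, online_fraction, rng=None):
    """Draw a batch with an (almost) equal number of samples per task.

    online, offline: dict mapping task name -> list of stored samples.
    online_fraction: dict mapping task name -> probability of drawing from
        the online buffer when both buffers hold data for that task
        (hypothetical knob; an adaptive scheme would update it over time).
    """
    if rng is None:
        rng = random.Random()
    # Only tasks that have data in at least one buffer are eligible.
    tasks = sorted(t for t in set(online) | set(offline)
                   if online.get(t) or offline.get(t))
    if not tasks:
        raise ValueError("both buffers are empty")

    batch = []
    for i in range(batch_size):
        # Round-robin over tasks so no task dominates the update.
        task = tasks[i % len(tasks)]
        on, off = online.get(task, []), offline.get(task, [])
        frac = online_fraction.get(task, 0.5)
        # Fall back to whichever buffer has data if the other is empty.
        use_online = bool(on) and (not off or rng.random() < frac)
        batch.append((task, rng.choice(on if use_online else off)))
    return batch


if __name__ == "__main__":
    online = {"fold": ["on_f1", "on_f2"], "pick": ["on_p1"]}
    offline = {"fold": ["off_f1"], "pick": ["off_p1"], "pour": ["off_r1"]}
    mix = {"fold": 0.8, "pick": 0.5}
    for task, sample in sample_task_balanced_batch(online, offline, 6, mix):
        print(task, sample)
```

In a continual-learning setting, a newly added task would typically start with only offline demonstrations (like `pour` above), while balancing across tasks keeps older tasks represented in every update. Whether such balancing alone is enough to prevent forgetting is exactly the open question stated here.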

As deployments grow in scope and duration, the policy must continually acquire new skills and adapt to evolving environments. The authors explicitly note that enabling such continual learning without catastrophic forgetting of previously acquired capabilities remains an open question in this framework.

References

Whether near-linear scaling extends to significantly larger fleets, and how to support continual acquisition of new skills without catastrophic forgetting, are open questions.

SOP: A Scalable Online Post-Training System for Vision-Language-Action Models (2601.03044 - Pan et al., 6 Jan 2026) in Discussion and Future Work