Generalization of SPEAR-1 at larger task and environment scales

Determine how well SPEAR-1 generalizes when evaluated on orders of magnitude more tasks and environments, particularly in comparison to the π_0.5 vision-language-action model, which is trained on significantly more diverse robot data.

Background

The paper demonstrates that SPEAR-1 matches or exceeds competitive baselines, including π_0.5, on a set of real-world tasks while using substantially less robot data.

However, the authors explicitly acknowledge uncertainty about SPEAR-1's performance when scaled to many more tasks and environments, especially relative to π_0.5, which benefits from far more diverse training data.

References

It also remains to be seen how well SPEAR-1 generalizes to orders of magnitude more tasks and environments against models such as $\pi_{0.5}$ trained on significantly more diverse robot data.

SPEAR-1: Scaling Beyond Robot Demonstrations via 3D Understanding (arXiv:2511.17411, Nikolov et al., 21 Nov 2025), Section: Discussion and Limitations