Develop robust multimodal fusion for generalist robotic manipulation policies
Develop principled multimodal fusion techniques that reliably integrate visual, proprioceptive, and linguistic inputs to improve performance and generalization of generalist robotic manipulation policies.
References
Despite progress in training generalist policies, challenges such as catastrophic forgetting, data heterogeneity, scarcity of high-quality data, multimodal fusion, handling dexterity, and maintaining real-time inference speed remain open research problems.
                — A Careful Examination of Large Behavior Models for Multitask Dexterous Manipulation
                
                (2507.05331 - Team et al., 7 Jul 2025) in Section 2.1, Related Work: Robot Learning at Scale