Papers

Topics

Authors

Recent

View all

Assistant

AI Research Assistant

Well-researched responses based on relevant abstracts and paper content.

Custom Instructions Pro

Preferences or requirements that you'd like Emergent Mind to consider when generating responses.

Gemini 2.5 Flash

Gemini 2.5 Flash 62 tok/s

Gemini 2.5 Pro 51 tok/s Pro

GPT-5 Medium 20 tok/s Pro

GPT-5 High 24 tok/s Pro

GPT-4o 75 tok/s Pro

Kimi K2 206 tok/s Pro

GPT OSS 120B 457 tok/s Pro

Claude Sonnet 4.5 35 tok/s Pro

2000 character limit reached

Object-Centric Latent Action Learning (2502.09680v2)

Published 13 Feb 2025 in cs.CV and cs.AI

Abstract: Leveraging vast amounts of unlabeled internet video data for embodied AI is currently bottlenecked by the lack of action labels and the presence of action-correlated visual distractors. Although recent latent action policy optimization (LAPO) has shown promise in inferring proxy-action labels from visual observations, its performance degrades significantly when distractors are present. To address this limitation, we propose a novel object-centric latent action learning framework that centers on objects rather than pixels. We leverage self-supervised object-centric pretraining to disentangle action-related and distracting dynamics. This allows LAPO to focus on task-relevant interactions, resulting in more robust proxy-action labels, enabling better imitation learning and efficient adaptation of the agent with just a few action-labeled trajectories. We evaluated our method in eight visually complex tasks across the Distracting Control Suite (DCS) and Distracting MetaWorld (DMW). Our results show that object-centric pretraining mitigates the negative effects of distractors by 50%, as measured by downstream task performance: average return (DCS) and success rate (DMW).