Papers
Topics
Authors
Recent
Search
2000 character limit reached

EgoAgent: A Joint Predictive Agent Model in Egocentric Worlds

Published 9 Feb 2025 in cs.CV, cs.AI, and cs.LG | (2502.05857v2)

Abstract: This paper addresses the task of learning an agent model behaving like humans, which can jointly perceive, predict, and act in egocentric worlds. Previous methods usually train separate models for these three abilities, which prevents them from learning from each other. In this paper, we propose a joint predictive agent model, named EgoAgent, that simultaneously learns to represent the world, predict future states, and take reasonable actions within a single transformer. EgoAgent introduces two innovations to learn from the causal and temporally intertwined nature of these abilities: (1) Interleaved sequential modeling of states and actions with the causal attention mechanism, and (2) A joint embedding-action-prediction architecture featuring temporal asymmetric predictor-observer branches. Integrating these designs based on JEPA, EgoAgent unifies these capabilities in a cohesive learning framework. Comprehensive evaluations of EgoAgent on representative tasks such as image classification, egocentric future state prediction, and 3D human motion prediction tasks demonstrate the superiority of our method. The code and trained model will be released for reproducibility.

Summary

No one has generated a summary of this paper yet.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.