• Researchers present a comprehensive empirical study of pre-trained visual representations (PVRs) for Embodied AI.
  • They find that scaling dataset size and diversity does not universally improve performance, but their largest model, VC-1, outperforms all prior PVRs on average.

Key terms:

  • Artificial Visual Cortex: A model or system designed to mimic the function of the human brain's visual cortex in processing visual information.
  • Embodied AI: Artificial intelligence that interacts with the environment through a physical body or robotic system.
  • Pre-trained Visual Representations (PVRs): Visual foundation models used as a starting point for further training on specific tasks.
  • CortexBench: A curated benchmark suite of 17 tasks spanning locomotion, navigation, dexterous manipulation, and mobile manipulation.
  • VC-1: A large-scale model developed in the study that outperforms all prior PVRs on average and shows potential for task-specific adaptation.

