
Learning to Act from Actionless Videos through Dense Correspondences (2310.08576v1)

Published 12 Oct 2023 in cs.RO, cs.CV, cs.LG, and stat.ML

Abstract: In this work, we present an approach to construct a video-based robot policy capable of reliably executing diverse tasks across different robots and environments from few video demonstrations, without using any action annotations. Our method leverages images as a task-agnostic representation, encoding both state and action information, and text as a general representation for specifying robot goals. By synthesizing videos that "hallucinate" a robot executing actions, in combination with dense correspondences between frames, our approach can infer the closed-form action to execute in an environment without the need for any explicit action labels. This unique capability allows us to train the policy solely from RGB videos and deploy the learned policies on various robotic tasks. We demonstrate the efficacy of our approach in learning policies for table-top manipulation and navigation tasks. Additionally, we contribute an open-source framework for efficient video modeling, enabling the training of high-fidelity policy models with four GPUs within a single day.
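The action-inference step the abstract describes, recovering the motion between consecutive synthesized frames from dense correspondences, can be sketched as a closed-form least-squares rigid-transform fit (Kabsch/Procrustes). This is a minimal illustration under assumptions, not the paper's actual implementation; the function name and the 2D setting are hypothetical simplifications:

```python
import numpy as np

def infer_action_from_correspondences(pts_t: np.ndarray, pts_t1: np.ndarray):
    """Estimate the rigid 2D motion (rotation R, translation t) that maps
    corresponding points in frame t onto frame t+1, in closed form via the
    Kabsch least-squares solution. pts_t, pts_t1: arrays of shape (N, 2)."""
    # Center both point sets on their centroids.
    mu_t, mu_t1 = pts_t.mean(axis=0), pts_t1.mean(axis=0)
    A, B = pts_t - mu_t, pts_t1 - mu_t1
    # SVD of the cross-covariance yields the optimal rotation.
    U, _, Vt = np.linalg.svd(A.T @ B)
    # Guard against reflections (det = -1 solutions).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, d]) @ U.T
    t = mu_t1 - R @ mu_t
    return R, t
```

Given dense per-pixel correspondences (e.g., from optical flow on the hallucinated video), such a fit extracts an executable relative motion without any action labels, which is the key property the abstract highlights.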

Authors (5)
  1. Po-Chen Ko (2 papers)
  2. Jiayuan Mao (55 papers)
  3. Yilun Du (113 papers)
  4. Shao-Hua Sun (22 papers)
  5. Joshua B. Tenenbaum (257 papers)
Citations (46)