PlaTe: Visually-Grounded Planning with Transformers in Procedural Tasks (2109.04869v2)

Published 10 Sep 2021 in cs.RO

Abstract: In this work, we study how to leverage instructional videos to facilitate the understanding of human decision-making processes, focusing on training a model that can plan a goal-directed procedure from real-world videos. Learning structured and plannable state and action spaces directly from unstructured videos is the key technical challenge of our task. This raises two problems: first, the appearance gap between the training and validation datasets can be large for unstructured videos; second, these gaps lead to decision errors that compound over the steps. We address these limitations with Planning Transformer (PlaTe), which has the advantage of circumventing the compounding prediction errors that occur with single-step models during long model-based rollouts. Our method simultaneously learns the latent state and action information of assigned tasks and the representations of the decision-making process from human demonstrations. Experiments conducted on real-world instructional videos and an interactive environment show that our method can achieve better performance in reaching the indicated goal than previous algorithms. We also validate the feasibility of executing procedural tasks on a UR-5 robot platform. Our code is publicly available for academic research purposes.
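The compounding-error problem that the abstract attributes to single-step models can be illustrated with a toy example. This is a minimal sketch under stated assumptions and not PlaTe's method: it uses a hypothetical 1-D linear "dynamics model" whose coefficient is off by 1%. When the model is given the true state at each step, its error stays constant. When it is rolled out on its own predictions, the error grows with the horizon. The function names and numbers are illustrative only.

```python
def one_step_errors(true_a: float, model_a: float, x0: float, horizon: int) -> list[float]:
    """Error when the (hypothetical) model predicts from the TRUE state at every step."""
    errors = []
    x = x0
    for _ in range(horizon):
        errors.append(abs(true_a * x - model_a * x))  # single-step prediction error
        x = true_a * x  # advance along the ground-truth trajectory
    return errors


def rollout_errors(true_a: float, model_a: float, x0: float, horizon: int) -> list[float]:
    """Error when the model is rolled out autoregressively on its OWN predictions."""
    errors = []
    x_true = x_model = x0
    for _ in range(horizon):
        x_true = true_a * x_true
        x_model = model_a * x_model  # model consumes its previous prediction, so errors compound
        errors.append(abs(x_true - x_model))
    return errors


# Toy setting (assumed): true coefficient 1.0, learned coefficient 1.01, 20-step horizon.
one = one_step_errors(1.0, 1.01, 1.0, 20)
roll = rollout_errors(1.0, 1.01, 1.0, 20)
print(f"max one-step error: {max(one):.4f}")
print(f"rollout error at step 20: {roll[-1]:.4f}")
```

Each single-step error is the same 1% here, but the rollout error grows every step (roughly 1.01^t - 1). Long model-based rollouts amplify this kind of drift, which is the effect the abstract says PlaTe is designed to avoid.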

Authors (6)
  1. Jiankai Sun (53 papers)
  2. De-An Huang (45 papers)
  3. Bo Lu (79 papers)
  4. Yun-Hui Liu (61 papers)
  5. Bolei Zhou (134 papers)
  6. Animesh Garg (129 papers)
Citations (47)