
Structured Object-Aware Physics Prediction for Video Modeling and Planning (1910.02425v2)

Published 6 Oct 2019 in cs.LG, cs.CV, and stat.ML

Abstract: When humans observe a physical system, they can easily locate objects, understand their interactions, and anticipate future behavior, even in settings with complicated and previously unseen interactions. For computers, however, learning such models from videos in an unsupervised fashion is an unsolved research problem. In this paper, we present STOVE, a novel state-space model for videos, which explicitly reasons about objects and their positions, velocities, and interactions. It is constructed by combining an image model and a dynamics model in a compositional manner and improves on previous work by reusing the dynamics model for inference, which accelerates and regularizes training. STOVE predicts videos with convincing physical behavior over hundreds of timesteps, outperforms previous unsupervised models, and even approaches the performance of supervised baselines. We further demonstrate the strength of our model as a simulator for sample-efficient model-based control in a task with heavily interacting objects.

Authors (5)
  1. Jannik Kossen (14 papers)
  2. Karl Stelzner (8 papers)
  3. Marcel Hussing (12 papers)
  4. Claas Voelcker (8 papers)
  5. Kristian Kersting (205 papers)
Citations (68)
