Towards Physically Interpretable World Models: Meaningful Weakly Supervised Representations for Visual Trajectory Prediction (2412.12870v3)
Abstract: Deep learning models are increasingly employed for perception, prediction, and control in robotic systems. To achieve realistic and consistent outputs, it is crucial to embed physical knowledge into their learned representations. However, doing so is difficult due to high-dimensional observation data, such as images, particularly under incomplete system knowledge and imprecise state sensing. To address this, we propose Physically Interpretable World Models, a novel architecture that aligns learned latent representations with real-world physical quantities. To this end, our architecture combines three key elements: (1) a vector-quantized image autoencoder, (2) a transformer-based physically interpretable autoencoder, and (3) a partially known dynamical model. The training incorporates weak interval-based supervision to eliminate the impractical reliance on ground-truth physical knowledge. Three case studies demonstrate that our approach achieves physical interpretability and accurate state predictions, thus advancing representation learning for robotics.
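The abstract does not specify the exact form of the interval-based weak supervision, so the following is only a minimal PyTorch sketch of one plausible variant: a hinge-style penalty on the physically interpretable latent estimates that is zero inside the supervision interval and grows with the violation outside it. The function name `interval_supervision_loss` and the tensor shapes are illustrative assumptions, not the paper's implementation.

```python
import torch


def interval_supervision_loss(z_phys: torch.Tensor,
                              lower: torch.Tensor,
                              upper: torch.Tensor) -> torch.Tensor:
    """Hypothetical interval-based weak supervision penalty.

    Zero when each latent physical estimate lies inside its
    supervision interval [lower, upper]; grows linearly with the
    amount of violation otherwise. All tensors have shape
    (batch, state_dim).
    """
    below = torch.relu(lower - z_phys)   # violation of the lower bound
    above = torch.relu(z_phys - upper)   # violation of the upper bound
    return (below + above).mean()


# Example: a 2-D physical state (e.g., position and velocity) labeled
# only with intervals rather than exact ground-truth values.
z = torch.tensor([[0.4, 1.7], [2.3, -0.2]])
lo = torch.tensor([[0.0, 1.0], [1.0, 0.0]])
hi = torch.tensor([[1.0, 2.0], [2.0, 1.0]])
print(interval_supervision_loss(z, lo, hi))  # penalizes 2.3 > 2.0 and -0.2 < 0.0
```

Such a term could be added to the reconstruction and dynamics losses during training, letting coarse interval labels stand in for exact physical ground truth; whether the paper uses this exact formulation is not stated in the abstract.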