Towards Physically Interpretable World Models: Meaningful Weakly Supervised Representations for Visual Trajectory Prediction (2412.12870v3)
Abstract: Deep learning models are increasingly employed for perception, prediction, and control in robotic systems. To achieve realistic and consistent outputs, it is crucial to embed physical knowledge into their learned representations. However, doing so is difficult due to high-dimensional observation data, such as images, particularly under incomplete system knowledge and imprecise state sensing. To address this, we propose Physically Interpretable World Models, a novel architecture that aligns learned latent representations with real-world physical quantities. To this end, our architecture combines three key elements: (1) a vector-quantized image autoencoder, (2) a transformer-based physically interpretable autoencoder, and (3) a partially known dynamical model. The training incorporates weak interval-based supervision to eliminate the impractical reliance on ground-truth physical knowledge. Three case studies demonstrate that our approach achieves physical interpretability and accurate state predictions, thus advancing representation learning for robotics.
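The abstract does not specify the exact form of the interval-based weak supervision, so the following is only a minimal PyTorch sketch of one plausible variant: a hinge-style penalty on the physically interpretable latent estimates that is zero inside the supervision interval and grows with the violation outside it. The function name `interval_supervision_loss` and the tensor shapes are illustrative assumptions, not the paper's implementation.

```python
import torch


def interval_supervision_loss(z_phys: torch.Tensor,
                              lower: torch.Tensor,
                              upper: torch.Tensor) -> torch.Tensor:
    """Hypothetical interval-based weak supervision penalty.

    Zero when each latent physical estimate lies inside its
    supervision interval [lower, upper]; grows linearly with the
    amount of violation otherwise. All tensors have shape
    (batch, state_dim).
    """
    below = torch.relu(lower - z_phys)   # violation of the lower bound
    above = torch.relu(z_phys - upper)   # violation of the upper bound
    return (below + above).mean()


# Example: a 2-D physical state (e.g., position and velocity) labeled
# only with intervals rather than exact ground-truth values.
z = torch.tensor([[0.4, 1.7], [2.3, -0.2]])
lo = torch.tensor([[0.0, 1.0], [1.0, 0.0]])
hi = torch.tensor([[1.0, 2.0], [2.0, 1.0]])
print(interval_supervision_loss(z, lo, hi))  # penalizes 2.3 > 2.0 and -0.2 < 0.0
```

Such a term could be added to the reconstruction and dynamics losses during training, letting coarse interval labels stand in for exact physical ground truth; whether the paper uses this exact formulation is not stated in the abstract.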