Visual Physics: Discovering Physical Laws from Videos (1911.11893v1)

Published 27 Nov 2019 in cs.CV

Abstract: In this paper, we teach a machine to discover the laws of physics from video streams. We assume no prior knowledge of physics, beyond a temporal stream of bounding boxes. The problem is very difficult because a machine must learn not only a governing equation (e.g. projectile motion) but also the existence of governing parameters (e.g. velocities). We evaluate our ability to discover physical laws on videos of elementary physical phenomena, such as projectile motion or circular motion. These elementary tasks have textbook governing equations and enable ground truth verification of our approach.

Summary

The paper introduces a novel framework that autonomously discovers symbolic equations from visual data.
It employs a three-part methodology: object detection with Mask R-CNN, latent physics via a modified β-VAE, and equation discovery through genetic programming.
Results on synthetic and real datasets validate its robustness in recovering physics laws, even under significant noise.

The paper "Visual Physics: Discovering Physical Laws from Videos" (1911.11893) proposes a framework for discovering governing physical laws directly from video data. The method combines representation learning with genetic programming to identify both the symbolic form of the equations and the governing parameters they depend on.

Introduction and Methodology

The primary goal of this work is to give machines the ability to infer physical laws through observation, much as humans do. This involves two tasks: identifying the mathematical form of a physical law, and uncovering the governing parameters, such as velocities, that are unknown at the outset. The framework is demonstrated on elementary physics tasks such as projectile and circular motion, whose textbook equations provide ground truth for evaluating the discovered laws.

The proposed Visual Physics framework has three main components:

Position Detection Module: Utilizes a pretrained Mask R-CNN to extract object positions from video frames. The precision of object localization is crucial as it directly affects subsequent parameter discovery.
Latent Physics Module: Implements a modified $\beta$-VAE to uncover latent representations that correspond to physical parameters. This step requires no prior knowledge of parameters such as velocity; they are inferred from the input data through the learned representation.
Equation Discovery Module: Employs genetic programming to derive closed-form symbolic expressions. The genetic approach reconciles the learned latent parameters with the observed positional data to discover equations that describe the physical phenomenon.
Figure 1: An overview of the proposed Visual Physics framework.
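
The objective behind the Latent Physics Module can be illustrated with the standard $\beta$-VAE loss. The following is a minimal NumPy sketch, assuming a diagonal-Gaussian posterior and a squared-error reconstruction term. The function name `beta_vae_loss`, the default `beta`, and the array shapes are illustrative assumptions, and the paper's specific modifications to the $\beta$-VAE are not reproduced here.

```python
import numpy as np

def beta_vae_loss(x, x_recon, mu, logvar, beta=4.0):
    """Standard beta-VAE loss (not the paper's modified variant).

    x, x_recon : arrays of shape (batch, features)
    mu, logvar : arrays of shape (batch, latent_dim), parameters of q(z|x)
    beta       : weight on the KL term (illustrative default, an assumption)
    """
    # Reconstruction error: squared error summed over features, averaged over the batch.
    recon = np.mean(np.sum((x - x_recon) ** 2, axis=1))
    # Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) ) for each sample.
    kl = -0.5 * np.sum(1.0 + logvar - mu ** 2 - np.exp(logvar), axis=1)
    return recon + beta * np.mean(kl)
```

A larger `beta` pushes each latent dimension toward the unit-Gaussian prior, which is the usual motivation for expecting individual latent units to align with separate factors such as a velocity component.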

Synthetic and Real Data Evaluation

Synthetic Data

The framework was tested on synthetic datasets simulating several physics tasks.

Free Fall: Demonstrates the capability to distinguish between horizontal and vertical components of velocity. The discovered governing equation closely aligns with textbook kinematic equations.
Constant Acceleration Motion: Shows that, even when the governing relationship is non-linear, the framework identifies acceleration as the sole governing parameter.
Uniform Circular Motion: Discovers the angular frequency, $\omega$. The recovered symbolic form agrees with the sinusoidal equations of circular motion.
Figure 2: Discovered physical equations from Visual Physics framework, on simulated videos.
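
For reference, the textbook equations that serve as ground truth for these three tasks can be written as short trajectory generators. This is a sketch under stated assumptions: the function names, the default value of `g`, and the inclusion of initial-position and initial-velocity terms are illustrative choices, not the paper's simulation settings.

```python
import numpy as np

G = 9.81  # gravitational acceleration in m/s^2 (assumed value)

def free_fall(t, x0, y0, vx, vy, g=G):
    """Projectile under gravity: constant horizontal velocity, accelerated vertical motion."""
    x = x0 + vx * t
    y = y0 + vy * t - 0.5 * g * t ** 2
    return x, y

def constant_acceleration(t, x0, v0, a):
    """One-dimensional motion with constant acceleration a."""
    return x0 + v0 * t + 0.5 * a * t ** 2

def uniform_circular(t, r, omega, phase=0.0):
    """Circular motion of radius r at angular frequency omega."""
    x = r * np.cos(omega * t + phase)
    y = r * np.sin(omega * t + phase)
    return x, y
```

Sampling `t` at the video frame times, for example with `np.linspace`, yields position sequences of the kind a discovered equation can be compared against.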

Real Data

Tests on real-world data, such as basketball tosses, indicate that a model trained on synthetic data can transfer to real, unseen videos. In these tests, the symbolic form of the discovered equations and the mapping to governing parameters are largely preserved.

Figure 3: Evaluating performance on real data with both real and synthetic training sets.

Robustness and Performance Analysis

The approach tolerates substantial noise in the input data: tests at several noise levels show that the discovered parameters remain interpretable and accurate. Performance does degrade at extremely high noise levels, as expected once the noise overwhelms the underlying signal.

Figure 4: Robustness to noise tested on synthetic trajectory data.
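
The effect of trajectory noise on parameter recovery can be illustrated with a toy experiment. The sketch below adds Gaussian noise to a constant-acceleration trajectory and recovers the acceleration with an ordinary least-squares polynomial fit. The fit is a stand-in chosen for brevity, not the paper's pipeline, and the function name, noise levels, frame count, and time range are all assumptions.

```python
import numpy as np

def estimate_acceleration(noise_std, a_true=2.0, n_frames=50, seed=0):
    """Fit x(t) = c2*t^2 + c1*t + c0 to noisy positions and return 2*c2."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 2.0, n_frames)
    x_clean = 0.5 * a_true * t ** 2
    # Additive Gaussian noise models localization error in the detected positions.
    x_noisy = x_clean + rng.normal(0.0, noise_std, size=n_frames)
    # np.polyfit returns coefficients from the highest degree down.
    c2, _c1, _c0 = np.polyfit(t, x_noisy, deg=2)
    return 2.0 * c2

if __name__ == "__main__":
    for sigma in (0.0, 0.01, 0.1, 1.0):
        a_hat = estimate_acceleration(sigma)
        print(f"noise std {sigma:>5}: estimated a = {a_hat:.3f}")
```

With zero noise the fit returns the true acceleration up to floating-point error; as `noise_std` grows relative to the scale of the motion, the estimate becomes increasingly unreliable.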

The trade-off between equation complexity and accuracy is handled with Pareto-optimal selection: a candidate equation is kept only if no other candidate is both simpler and more accurate. This guards against overfitting while keeping the selected equations interpretable.

Figure 5: Trade-off between equation complexity and accuracy.
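
Pareto-optimal selection over complexity and error can be sketched in a few lines. The `Candidate` type, the complexity scores, and the error values below are hypothetical placeholders chosen for illustration; they are not results from the paper, and the paper's own complexity measure may differ.

```python
from typing import List, NamedTuple

class Candidate(NamedTuple):
    expression: str   # symbolic form, stored as a plain string
    complexity: int   # e.g. number of nodes in the expression tree (assumed measure)
    error: float      # fitting error on the data (lower is better)

def dominates(p: Candidate, q: Candidate) -> bool:
    """p dominates q if it is no worse on both criteria and strictly better on one."""
    no_worse = p.complexity <= q.complexity and p.error <= q.error
    strictly_better = p.complexity < q.complexity or p.error < q.error
    return no_worse and strictly_better

def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Return the non-dominated candidates, ordered by increasing complexity."""
    front = [c for c in candidates
             if not any(dominates(other, c) for other in candidates)]
    return sorted(front, key=lambda c: c.complexity)

# Hypothetical candidates with made-up scores, for illustration only.
toy_candidates = [
    Candidate("t", 1, 0.90),
    Candidate("a*t", 3, 0.50),
    Candidate("a*t**2", 3, 0.20),
    Candidate("a*t**2 + t", 5, 0.25),
    Candidate("0.5*a*t**2", 7, 0.05),
]
```

Calling `pareto_front(toy_candidates)` keeps `t`, `a*t**2`, and `0.5*a*t**2`; the other two entries are dropped because `a*t**2` is at least as simple as each of them and more accurate.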

Implications and Future Directions

This research underlines the potential for AI to autonomously delineate fundamental physics from observable phenomena, without predefined models. Practically, this method could extend to domains involving partial or unknown physical laws, such as high-energy astrophysics or complex biological systems.

Finally, open challenges include generalizing this discovery framework to multi-object dynamics and incorporating learned equations into additional computational tasks, such as enhanced predictive modelling.

Conclusion

The Visual Physics framework is a step toward automating the discovery of natural laws from raw observational data. Its evaluation on synthetic and real datasets supports the view that machines can recover and express, from visual input alone, the equations governing elementary physical phenomena.