Learning Physically Consistent Geometry at Scale

Develop learning-based techniques for estimating physically consistent 3D geometry at scale, yielding reliable depth and camera-pose predictions across large, complex scenes and long trajectories.

Background

The paper contrasts classical geometric pipelines (e.g., SfM/SLAM) with learning-based methods, noting that while learning approaches promise scalability and end-to-end optimization, maintaining physical consistency of geometry across large-scale environments remains difficult.

Transformer-based frameworks such as VGGT provide global attention over long-range dependencies, but without strong supervision or structured losses they tend to overfit to local cues in self-supervised settings. This motivates the question of how to learn physically consistent geometry at scale.
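
The source does not prescribe a particular loss, but as one illustration of what a "structured" geometric constraint could look like, the sketch below (plain PyTorch; the tensor shapes and function names such as `reprojection_consistency_loss` are assumptions, not VGGT's or the paper's API) penalizes disagreement between depth back-projected from one view and the depth predicted in another view under the predicted relative pose.

```python
# A minimal sketch, assuming per-frame depth maps, relative poses, and shared
# intrinsics are available; this is NOT the paper's or VGGT's actual loss.
import torch
import torch.nn.functional as F


def backproject(depth, K_inv):
    """Lift a depth map (B, H, W) to camera-space points (B, H*W, 3)."""
    B, H, W = depth.shape
    v, u = torch.meshgrid(
        torch.arange(H, dtype=depth.dtype, device=depth.device),
        torch.arange(W, dtype=depth.dtype, device=depth.device),
        indexing="ij",
    )
    pix = torch.stack([u, v, torch.ones_like(u)], dim=-1).reshape(1, -1, 3)
    rays = pix @ K_inv.transpose(-1, -2)          # unit-depth rays per pixel
    return rays * depth.reshape(B, -1, 1)         # scale each ray by its depth


def reprojection_consistency_loss(depth_i, depth_j, T_ij, K):
    """Penalize disagreement between frame i's geometry, warped into frame j
    by the predicted relative pose T_ij (B, 4, 4), and frame j's own
    predicted depth. K is the shared (B, 3, 3) intrinsics matrix."""
    B, H, W = depth_i.shape
    pts_i = backproject(depth_i, torch.inverse(K))                    # frame-i coords
    pts_h = torch.cat([pts_i, torch.ones_like(pts_i[..., :1])], -1)   # homogeneous
    pts_j = (pts_h @ T_ij.transpose(-1, -2))[..., :3]                 # frame-j coords

    proj = pts_j @ K.transpose(-1, -2)                                # project into image j
    z = proj[..., 2]                                                  # warped depth
    uv = proj[..., :2] / z.unsqueeze(-1).clamp(min=1e-6)              # pixel coords

    # Normalized sampling grid for frame j's depth map, (B, H, W, 2) in [-1, 1].
    grid = torch.stack(
        [2.0 * uv[..., 0] / (W - 1) - 1.0, 2.0 * uv[..., 1] / (H - 1) - 1.0], -1
    ).reshape(B, H, W, 2)
    depth_j_at_uv = F.grid_sample(
        depth_j.unsqueeze(1), grid, align_corners=True
    ).reshape(B, -1)

    # Only points that land inside frame j and in front of its camera count.
    valid = (grid.reshape(B, -1, 2).abs() <= 1.0).all(-1) & (z > 1e-6)
    residual = (z - depth_j_at_uv).abs()
    return (residual * valid.float()).sum() / valid.float().sum().clamp(min=1.0)
```

In a training loop, a term like this would typically be weighted against photometric or supervised objectives; the point of the sketch is only that such a constraint couples depth and pose predictions across views, rather than relying on local appearance cues alone.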

References

Learning-based approaches aim to overcome the limitations of classical geometric pipelines by enabling end-to-end training and improved generalization; however, learning physically consistent geometry at scale remains a challenging open problem.