- The paper proposes a weakly supervised framework that leverages binary segmentation masks and ego-motion for 3D scene flow estimation.
- It employs a deep learning architecture with background ego-motion estimation and DBSCAN clustering to model rigid body dynamics.
- Experiments on LiDAR datasets show competitive performance, lowering annotation costs and enhancing generalization in dynamic environments.
Overview of "Weakly Supervised Learning of Rigid 3D Scene Flow"
The paper "Weakly Supervised Learning of Rigid 3D Scene Flow" proposes a novel approach for estimating 3D scene flow using weak supervision. The authors have developed a method that operates under the assumption that dynamic 3D scenes can be understood as a collection of rigidly moving objects. Key to this approach is a deep learning architecture that allows for object-level reasoning, reducing the need for dense annotations typically required in scene flow estimation. Instead, the method leverages binary background segmentation masks and ego-motion information, which can be easily obtained from large-scale autonomous driving datasets.
Methodology
The authors propose a deep architecture that takes two successive point cloud frames as input and outputs a set of transformation parameters for each segmented rigid object. The model abstracts the scene in terms of rigid body motions as its foundational elements: the scene is decomposed into a foreground of movable objects and a background comprising the static parts of the scene.
- Background Segmentation: A binary segmentation loss separates background from foreground, yielding a coarse decomposition of the scene into regions that can be treated as rigid bodies (first sketch after this list).
- Ego-motion Estimation: For the background, whose apparent motion is due solely to the sensor's own movement, the model estimates ego-motion using a differentiable solver that combines optimal-transport-based soft correspondences with the weighted Kabsch algorithm (second sketch below).
- Foreground Rigid Body Motion: Foreground motion is explained by clustering points into rigidly moving entities with DBSCAN (third sketch below). No instance segmentation labels are required, which significantly reduces annotation cost.
- Test-Time Optimization: This component refines the predicted scene flow by adjusting the estimated transformations of the background and the foreground objects, improving the alignment between the two frames (fourth sketch below).
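To make these components concrete, the sketches below illustrate each step in Python. First, the background segmentation supervision reduces to a standard per-point binary classification problem; the function below is a minimal sketch assuming per-point logits from the backbone and the weak binary masks as labels, not the paper's exact loss weighting.

```python
import torch.nn.functional as F

def background_loss(logits, bg_mask):
    # logits: (N,) per-point background scores from the backbone;
    # bg_mask: (N,) weak labels with 1 = background, 0 = foreground.
    # A plain BCE loss; any class re-weighting the paper may use is omitted.
    return F.binary_cross_entropy_with_logits(logits, bg_mask.float())
```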
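Next, the ego-motion solver. The following is a minimal sketch of a differentiable weighted Kabsch fit; the weights `w` stand in for soft correspondence scores from an optimal-transport matching layer, and the function signature is illustrative rather than the paper's implementation.

```python
import torch

def weighted_kabsch(src, tgt, w):
    # src, tgt: (N, 3) softly matched point pairs; w: (N,) non-negative
    # correspondence weights (assumed to come from an optimal-transport
    # matching layer upstream).
    w = w / w.sum()
    src_c = (w[:, None] * src).sum(dim=0)          # weighted source centroid
    tgt_c = (w[:, None] * tgt).sum(dim=0)          # weighted target centroid
    # 3x3 weighted cross-covariance of the centered point sets.
    H = (w[:, None] * (src - src_c)).T @ (tgt - tgt_c)
    U, _, Vt = torch.linalg.svd(H)
    # Flip the last axis if needed so det(R) = +1 (proper rotation, no reflection).
    d = torch.det(Vt.T @ U.T)
    S = torch.diag(torch.stack([torch.ones_like(d), torch.ones_like(d), d]))
    R = Vt.T @ S @ U.T
    t = tgt_c - R @ src_c
    return R, t                                    # apply as: src @ R.T + t
```

Because every step is a differentiable tensor operation, gradients can flow through the solver to the networks that produce the correspondences.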
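For the foreground, a sketch of the clustering-and-fitting step. The `eps` and `min_samples` values are placeholders rather than the paper's settings, and the per-cluster fit simply reuses `weighted_kabsch` from the previous sketch with uniform weights.

```python
import torch
from sklearn.cluster import DBSCAN

def foreground_rigid_motions(fg_src, fg_flow, eps=0.75, min_samples=10):
    # fg_src: (N, 3) predicted-foreground points in frame 1 (NumPy array);
    # fg_flow: (N, 3) initial per-point flow predictions for those points.
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(fg_src)
    motions = {}
    for k in set(labels) - {-1}:            # -1 marks DBSCAN noise points
        idx = labels == k
        src = torch.from_numpy(fg_src[idx]).float()
        tgt = torch.from_numpy(fg_src[idx] + fg_flow[idx]).float()
        w = torch.ones(int(idx.sum()))      # uniform weights within a cluster
        motions[k] = weighted_kabsch(src, tgt, w)   # from the previous sketch
    return labels, motions
```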
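Finally, a minimal sketch of the idea behind test-time optimization: gradient descent on an alignment objective over one object's rigid parameters. The one-sided chamfer loss and the small-angle rotation update are simplifications for illustration, not the paper's exact objective or parameterization.

```python
import torch

def skew(v):
    # Differentiable 3x3 skew-symmetric matrix [v]_x from a 3-vector.
    z = torch.zeros((), dtype=v.dtype)
    return torch.stack([
        torch.stack([z, -v[2], v[1]]),
        torch.stack([v[2], z, -v[0]]),
        torch.stack([-v[1], v[0], z]),
    ])

def refine_motion(src, tgt, R0, t0, steps=100, lr=1e-2):
    # src: (N, 3) object points in frame 1; tgt: (M, 3) points in frame 2.
    # R0, t0: initial rigid motion from the forward pass.
    omega = torch.zeros(3, requires_grad=True)   # incremental axis-angle rotation
    t = t0.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([omega, t], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        # First-order rotation update: R ~= (I + [omega]_x) @ R0 for small omega.
        R = (torch.eye(3) + skew(omega)) @ R0
        moved = src @ R.T + t
        # One-sided chamfer: each moved point to its nearest target neighbor.
        loss = torch.cdist(moved, tgt).min(dim=1).values.mean()
        loss.backward()
        opt.step()
    with torch.no_grad():
        return (torch.eye(3) + skew(omega)) @ R0, t.clone()
```

In the paper's framing, this refinement adjusts both the background ego-motion and the foreground object transformations; a loop like the one above would be run per segment with the corresponding point subsets.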
Results and Performance
The proposed methodology achieves strong results on several benchmarks, outperforming existing state-of-the-art methods on LiDAR-based datasets such as lidarKITTI without requiring dense supervision. The method is particularly effective where traditional scene flow methods fail to generalize due to domain gaps, since it can be trained directly on available large-scale autonomous driving datasets such as semanticKITTI.
Implications and Future Directions
This paper contributes to the shift towards more pragmatic learning paradigms by applying weak supervision to a traditionally heavily supervised problem. By reducing reliance on dense, and often costly, annotation, the approach aligns well with the needs of real-world autonomous driving applications. It also opens avenues for similar weakly supervised strategies in other computer vision tasks, suggesting that high-level scene abstraction may be broadly useful in dynamic environments.
Looking forward, further developments could incorporate temporal consistency over multiple frames, potentially improving the robustness and accuracy of the inferred scene flow. There is also room to improve performance in highly dense or cluttered scenes, where the rigid body assumption is harder to apply directly.
In conclusion, this research innovatively alleviates the annotation burden associated with 3D scene flow estimation by combining geometric reasoning with learning-based predictions, setting a precedent for weakly supervised approaches in dynamic 3D perception.