- The paper introduces an end-to-end neural network that learns 3D scene flow from point clouds, significantly advancing motion estimation.
- Its novel flow embedding and set upconv layers capture spatial relations and refine predictions, achieving superior accuracy on datasets like KITTI.
- Trained on synthetic FlyingThings3D data, FlowNet3D demonstrates robust generalization to real-world scenarios, supporting applications in autonomous driving and robotics.
Overview of FlowNet3D: Learning Scene Flow in 3D Point Clouds
Introduction
Scene flow, the 3D motion field of points in a scene, is a key quantity for robotics and human-computer interaction, yet it remains difficult to estimate. Traditional approaches recover it from stereo or RGB-D images and therefore cannot operate directly on 3D point clouds. FlowNet3D addresses this gap with an end-to-end deep neural network that learns scene flow directly from point clouds, whether captured by LiDAR or other 3D sensors.
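As a shapes-only illustration of the task's input/output contract (the variable names and point counts below are ours, not the paper's):

```python
import numpy as np

# Two point clouds sampled at consecutive time steps. The clouds need not
# be the same size and carry no point-to-point correspondence.
n1, n2 = 2048, 2048
frame1 = np.random.randn(n1, 3).astype(np.float32)  # points at time t
frame2 = np.random.randn(n2, 3).astype(np.float32)  # points at time t+1

# Scene flow: one 3D translation vector per point of the *first* frame,
# so that frame1 + flow approximates the corresponding surface in frame2.
flow = np.zeros((n1, 3), dtype=np.float32)
assert flow.shape == frame1.shape
```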
Network Architecture
FlowNet3D is built around three stages that drive its ability to learn from point clouds: hierarchical point feature learning, point mixture via a new flow embedding layer, and flow refinement via a new set upconv layer. The two novel layers, sketched in code after this list, are:
- Flow Embedding Layer: For each point in the first frame, this layer aggregates the features and relative displacements of nearby points in the second frame, learning to infer motion from geometric feature similarity and spatial offsets rather than from hard, static feature matching.
- Set Upconv Layer: This layer up-samples features from a coarse point set to a denser one in a learned, context-aware manner, refining scene flow predictions beyond what fixed 3D interpolation can achieve.
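The following minimal NumPy sketch shows the core computation of both layers. The single-layer `mlp`, the radii, and the feature widths are illustrative stand-ins, not the paper's actual multi-layer perceptrons or hyperparameters, and the brute-force neighbor loop replaces the efficient grouping a real implementation would use:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, w):
    """One ReLU layer standing in for a trained multi-layer perceptron."""
    return np.maximum(x @ w, 0.0)

def flow_embedding(p1, f1, p2, f2, w, radius=0.5):
    """For each frame-1 point, aggregate frame-2 neighbors.

    Each neighbor contributes concat(frame-1 feature, frame-2 feature,
    displacement); a shared MLP plus max-pooling turns these into a
    per-point 'flow embedding' that encodes likely motion.
    """
    n1, c_out = p1.shape[0], w.shape[1]
    out = np.zeros((n1, c_out), dtype=np.float32)
    for i in range(n1):
        d = p2 - p1[i]                      # displacements to frame-2 points
        mask = np.linalg.norm(d, axis=1) < radius
        if not mask.any():
            continue
        h = np.concatenate(
            [np.repeat(f1[i][None], mask.sum(), 0), f2[mask], d[mask]], axis=1)
        out[i] = mlp(h, w).max(axis=0)      # max-pool over neighbors
    return out

def set_upconv(src_xyz, src_feat, dst_xyz, w, radius=1.0):
    """Propagate features from a coarse (src) to a dense (dst) point set
    with learned, distance-aware aggregation instead of fixed interpolation."""
    n_dst, c_out = dst_xyz.shape[0], w.shape[1]
    out = np.zeros((n_dst, c_out), dtype=np.float32)
    for i in range(n_dst):
        d = src_xyz - dst_xyz[i]
        mask = np.linalg.norm(d, axis=1) < radius
        if not mask.any():
            continue
        h = np.concatenate([src_feat[mask], d[mask]], axis=1)
        out[i] = mlp(h, w).max(axis=0)
    return out

# Toy shapes: 64 points per frame, 16-dim input features, 32-dim outputs.
p1, p2 = rng.normal(size=(64, 3)), rng.normal(size=(64, 3))
f1, f2 = rng.normal(size=(64, 16)), rng.normal(size=(64, 16))
emb = flow_embedding(p1, f1, p2, f2, rng.normal(size=(16 + 16 + 3, 32)))
dense = set_upconv(p1[:32], emb[:32], p1, rng.normal(size=(32 + 3, 32)))
print(emb.shape, dense.shape)  # (64, 32) (64, 32)
```

Max-pooling over a learned function of neighbor features and displacements is what lets both layers remain invariant to point ordering while still capturing local geometry.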
Training and Evaluation
Trained mainly on the synthetic FlyingThings3D dataset, FlowNet3D generalizes remarkably well to real-world data, in particular the challenging KITTI dataset. The results show a substantial improvement over baselines and prior methods, with lower 3D end-point error and higher flow accuracy.
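The evaluation metrics are easy to state in code. The sketch below uses common scene flow definitions: 3D end-point error (EPE) as the mean Euclidean distance between predicted and ground-truth flow, and accuracy as the fraction of points whose error falls below an absolute or relative threshold. The specific thresholds (0.05 m, 5%) follow one commonly reported accuracy variant and should be treated as illustrative:

```python
import numpy as np

def scene_flow_metrics(pred, gt, abs_thresh=0.05, rel_thresh=0.05):
    """3D end-point error (EPE) and a thresholded accuracy.

    EPE: mean Euclidean distance between predicted and ground-truth flow.
    Accuracy: fraction of points whose error is below abs_thresh meters
    OR below rel_thresh relative to the ground-truth flow magnitude.
    """
    err = np.linalg.norm(pred - gt, axis=1)     # per-point error (meters)
    gt_norm = np.linalg.norm(gt, axis=1)
    rel = err / np.maximum(gt_norm, 1e-8)       # relative error
    epe = float(err.mean())
    acc = float(((err < abs_thresh) | (rel < rel_thresh)).mean())
    return epe, acc

pred = np.array([[0.10, 0.0, 0.0], [0.0, 0.52, 0.0]])
gt   = np.array([[0.12, 0.0, 0.0], [0.0, 0.50, 0.0]])
print(scene_flow_metrics(pred, gt))  # (~0.02, 1.0)
```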
Results
The quantitative results show clear advantages:
- FlowNet3D outperforms conventional methods such as ICP and image-based adaptations like FlowNet in terms of accuracy and efficiency.
- Ablation studies underscore the efficacy of the flow embedding layer and the set upconv layer, validating their contribution to improved scene flow estimation.
Implications and Future Work
FlowNet3D’s success emphasizes the potential of end-to-end learning for complex 3D vision tasks. As scene flow estimation becomes more integral to applications such as autonomous driving and robotics, the techniques introduced by FlowNet3D can pave the way for innovations in real-time 3D data processing.
The paper also demonstrates two downstream applications, 3D scan registration and motion segmentation, underscoring the practical value of scene flow predictions in operational contexts; a registration sketch follows below.
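For the registration use case, one natural recipe is to treat each point and its flowed position as a correspondence pair and fit the best rigid transform in closed form (Kabsch/SVD). The sketch below is our own minimal version of that idea, not code from the paper:

```python
import numpy as np

def rigid_from_flow(points, flow):
    """Fit the least-squares rigid transform (R, t) mapping `points` to
    `points + flow` using the Kabsch/SVD closed-form solution."""
    src, dst = points, points + flow
    src_c, dst_c = src.mean(0), dst.mean(0)
    h = (src - src_c).T @ (dst - dst_c)         # 3x3 cross-covariance
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))      # guard against reflection
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    t = dst_c - r @ src_c
    return r, t

# Sanity check: recover a known rotation + translation from exact flow.
rng = np.random.default_rng(1)
pts = rng.normal(size=(100, 3))
theta = 0.3
r_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.5, -0.2, 0.1])
flow = pts @ r_true.T + t_true - pts
r_est, t_est = rigid_from_flow(pts, flow)
print(np.allclose(r_est, r_true), np.allclose(t_est, t_true))  # True True
```

In practice one would fit such transforms per segment (or robustly, e.g. with RANSAC) since real scenes contain multiple independently moving objects.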
Future developments might focus on improving robustness across diverse environments, potentially incorporating multi-frame input to further improve prediction fidelity. Connecting learned point features with higher-level semantic understanding could also bridge the gap between low-level scene flow and high-level task comprehension.
Conclusion
FlowNet3D marks a significant step forward in scene flow estimation from 3D point clouds, providing a robust framework built on novel deep learning components. Its architectural advances, combined with strong generalization, set the stage for further research in 3D vision that applies deep neural networks to intricate spatial motion tasks.