DeepI2P: Image-to-Point Cloud Registration via Deep Classification
The paper presents an approach to the challenging problem of cross-modality registration between images and point clouds, termed DeepI2P. The method is notable because it moves away from the traditional reliance on learned feature descriptors, a strategy that struggles across these heterogeneous modalities. Instead, the authors recast registration as a combination of deep classification and continuous optimization, exploring the comparatively unexplored territory of correspondence-free registration.
Problem Context and Methodology
Image-to-point cloud registration requires estimating the relative rigid transformation, comprising both rotation and translation, that aligns a 3D point cloud with a 2D image. Establishing robust correspondences has conventionally been the main bottleneck, because the two modalities share little direct geometric or appearance correlation. Existing strategies, such as learning cross-modal descriptors or matching point features, suffer from high computational cost and sensitivity to specific environmental conditions.
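To make the geometry concrete, the sketch below assumes a standard pinhole camera: a point in the Lidar frame is mapped into the camera frame by the unknown rotation R and translation t, projected through the intrinsic matrix K, and considered in-frustum if it lies in front of the camera and lands inside the image bounds. The function names and the pinhole assumption are illustrative, not taken from the paper's code.

```python
import numpy as np

def project_points(points_lidar, R, t, K):
    """Project Nx3 Lidar-frame points into the image with extrinsics (R, t) and intrinsics K."""
    pts_cam = points_lidar @ R.T + t      # rigid transform into the camera frame
    uvw = pts_cam @ K.T                   # apply intrinsics
    uv = uvw[:, :2] / uvw[:, 2:3]         # perspective divide -> pixel coordinates
    return uv, pts_cam[:, 2]              # pixel coordinates and depths

def in_frustum(uv, depth, width, height):
    """True where a point lies in front of the camera and projects inside the image."""
    return (depth > 0) & \
           (uv[:, 0] >= 0) & (uv[:, 0] < width) & \
           (uv[:, 1] >= 0) & (uv[:, 1] < height)
```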
The DeepI2P framework sidesteps these constraints by reformulating registration around deep classification. The approach is structured as a two-stage pipeline:
- Classification: A neural network classifies each 3D point in the point cloud, predicting whether its projection falls inside or outside the camera's frustum; the ground-truth label for training can be generated with a projection check like the one sketched above. The network uses dual branches with attention modules to fuse image and point cloud features.
- Inverse Camera Projection: Using the predicted labels, the camera pose is recovered by solving an optimization problem the authors call inverse camera projection. The cost measures how far points labeled as in-frustum project outside the image boundary, and it is minimized with a Gauss-Newton solver over a continuous, unconstrained pose parameterization; a simplified sketch follows this list.
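A minimal sketch of that second stage, assuming an axis-angle pose parameterization and a hinge-style boundary-distance cost on points predicted to be in the frustum. Here scipy's least_squares stands in for the paper's Gauss-Newton solver, and the cost details are illustrative rather than the authors' exact formulation.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def residuals(pose, points_in, K, width, height):
    """pose = [rx, ry, rz, tx, ty, tz]: axis-angle rotation followed by translation.
    For each point predicted to be in the frustum, the residual is how far its
    projection falls outside the image (zero when it projects inside)."""
    R = Rotation.from_rotvec(pose[:3]).as_matrix()
    t = pose[3:]
    pts_cam = points_in @ R.T + t
    uvw = pts_cam @ K.T
    uv = uvw[:, :2] / np.clip(uvw[:, 2:3], 1e-6, None)   # guard against division by ~0
    # hinge-style boundary distance: positive only when the projection is outside the image
    du = np.maximum(0.0, -uv[:, 0]) + np.maximum(0.0, uv[:, 0] - (width - 1))
    dv = np.maximum(0.0, -uv[:, 1]) + np.maximum(0.0, uv[:, 1] - (height - 1))
    return np.concatenate([du, dv])

# usage: refine an initial pose guess against the points labeled "in frustum"
# pose0 = np.zeros(6)
# result = least_squares(residuals, pose0, args=(points_in, K, 640, 480))
```

Because a cost of this shape is non-convex, such a solver is typically run from several initial guesses and the lowest-cost solution is kept.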
Experimental Evaluations
Extensive experiments on the Oxford RobotCar and KITTI datasets demonstrate the practical applicability and efficacy of DeepI2P. The results show competitive relative translation error (RTE) and relative rotation error (RRE), even under large 3D rotations and occlusions. The frustum classification network reaches accuracies of 98% and 94% on the two datasets, underscoring its robustness. Performance gains over baseline methods, particularly those relying on monocular depth prediction, further support the viability of a classification-plus-optimization approach to cross-modality registration.
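For reference, RTE and RRE are commonly computed from an estimated pose and a ground-truth pose as below; the paper's exact evaluation protocol may differ slightly, so treat this as a generic sketch.

```python
import numpy as np

def registration_errors(R_est, t_est, R_gt, t_gt):
    """Relative translation error (meters) and relative rotation error (degrees)."""
    rte = np.linalg.norm(t_est - t_gt)                        # Euclidean distance of translations
    R_delta = R_est.T @ R_gt                                  # residual rotation
    cos_angle = np.clip((np.trace(R_delta) - 1.0) / 2.0, -1.0, 1.0)
    rre = np.degrees(np.arccos(cos_angle))                    # angle of residual rotation
    return rte, rre
```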
Implications and Future Directions
The implications of DeepI2P are manifold. Practically, the framework can be applied in augmented and virtual reality, autonomous driving, and robotics, wherever camera and Lidar point cloud data must be fused. It also suggests a way to localize inexpensive cameras against existing Lidar point cloud maps instead of equipping every platform with costly Lidar hardware, reducing overall system cost and complexity in large-scale deployments.
Theoretically, DeepI2P encourages a reevaluation of how modality differences can be reconciled without resorting to descriptor-based correspondence, presenting an alternative paradigm grounded in classification and optimization. It invites future research into refining and extending classification-driven registration frameworks, for example with more sophisticated attention mechanisms and network architectures to tackle additional cross-sensor integration challenges.
In conclusion, DeepI2P makes a substantial contribution by redefining the approach to image-to-point cloud registration. As AI and sensor technologies evolve, expanding upon this classification and optimization methodology could address emerging challenges and scale to more diverse applications.