DVLO: Deep Visual-LiDAR Odometry with Local-to-Global Feature Fusion and Bi-Directional Structure Alignment (2403.18274v3)

Published 27 Mar 2024 in cs.CV

Abstract: Visual and LiDAR data are highly complementary: images provide fine-grained texture, while point clouds carry massive geometric information. However, it remains challenging to explore effective visual-LiDAR fusion, mainly due to the intrinsic data structure inconsistency between the two modalities: image pixels are regular and dense, whereas LiDAR points are unordered and sparse. To address this problem, we propose a local-to-global fusion network (DVLO) with bi-directional structure alignment. To obtain locally fused features, we project points onto the image plane as cluster centers and cluster image pixels around each center. Image pixels are pre-organized as pseudo points for image-to-point structure alignment. Then, we convert points to pseudo images by cylindrical projection (point-to-image structure alignment) and perform adaptive global feature fusion between point features and locally fused features. Our method achieves state-of-the-art performance on the KITTI odometry and FlyingThings3D scene flow datasets compared to both single-modal and multi-modal methods. Code is released at https://github.com/IRMVLab/DVLO.

Authors (7)
  1. Jiuming Liu (19 papers)
  2. Dong Zhuo (1 paper)
  3. Zhiheng Feng (4 papers)
  4. Siting Zhu (8 papers)
  5. Chensheng Peng (14 papers)
  6. Zhe Liu (236 papers)
  7. Hesheng Wang (87 papers)
Citations (7)

Summary

  • The paper introduces a novel deep visual-LiDAR odometry method that integrates local-to-global feature fusion with bi-directional structure alignment.
  • It employs innovative clustering-based local fusion and adaptive global fusion to effectively combine dense image textures with sparse LiDAR geometries.
  • DVLO demonstrates superior pose accuracy and robust performance on KITTI and FlyingThings3D datasets, underscoring its potential in autonomous systems.

DVLO: A Novel Approach for Deep Visual-LiDAR Odometry through Local-to-Global Feature Fusion

Introduction to Visual-LiDAR Odometry Challenges

The integration of visual and LiDAR sensors in odometry tasks offers a compelling avenue for leveraging the complementary strengths of dense texture information from images and the substantial geometric data from point clouds. However, exploiting this synergy effectively poses a significant challenge, primarily due to the intrinsic structural inconsistencies between these two data modalities. Whereas images are characterized by their regular, dense structure, LiDAR point clouds are inherently unordered and sparse. This paper introduces DVLO, an innovative approach to address this challenge through a local-to-global fusion network complemented by a bi-directional structure alignment. This methodology not only bridges the gap between the two modalities but also sets a new benchmark in performance on the KITTI odometry and FlyingThings3D scene flow datasets.

Deep Dive into DVLO: Architecture and Components

Feature Extraction and Structure Alignment

The DVLO model begins with a structured feature extraction process for both modalities, converting each into a format conducive to fusion: pseudo points for images and pseudo images for point clouds. The architecture is built around two pivotal components, the Local Fuser and Global Fuser modules, described below. The point-to-image half of the alignment relies on cylindrical projection, which reorganizes the unordered point cloud into a regular pseudo-image grid.
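Before turning to those modules, the projection step can be made concrete. Below is a minimal sketch of a cylindrical (range-image) projection; the resolution, the HDL-64E-style vertical field of view typical of KITTI, and the per-pixel channels are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def cylindrical_projection(points, h=64, w=1800, fov_up=2.0, fov_down=-24.8):
    """Project an (N, 3) LiDAR point cloud onto an (h, w) pseudo image.

    Each pixel stores the 3D coordinates of the point that lands there;
    further channels (range, intensity, ...) could be added the same way.
    FOV bounds are assumed HDL-64E values, in degrees.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1) + 1e-8

    yaw = np.arctan2(y, x)        # azimuth in [-pi, pi] -> column
    pitch = np.arcsin(z / r)      # elevation -> row

    fov_up_r, fov_down_r = np.radians(fov_up), np.radians(fov_down)
    u = 0.5 * (1.0 - yaw / np.pi) * w
    v = (1.0 - (pitch - fov_down_r) / (fov_up_r - fov_down_r)) * h

    u = np.clip(np.floor(u), 0, w - 1).astype(np.int32)
    v = np.clip(np.floor(v), 0, h - 1).astype(np.int32)

    pseudo_image = np.zeros((h, w, 3), dtype=np.float32)
    pseudo_image[v, u] = points   # later points overwrite earlier collisions
    return pseudo_image
```

Once points live on a regular grid, standard 2D convolutions can extract features from them just as from an image, which is what makes the subsequent fusion tractable.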

  • Local Fusion: At the heart of local fusion lies a clustering-based mechanism that projects LiDAR points onto the image plane, where they serve as dynamic cluster centers aggregating the surrounding image pixels (pseudo points). This yields fine-grained local feature integration, capturing detailed textural and geometric cues together.
  • Global Fusion: Building on the locally fused features, the Global Fuser adaptively fuses them with the pseudo-image representation of the LiDAR data, incorporating broader contextual information into a comprehensive fused feature (see the sketch after this list).
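To make the two fusers concrete, here is a minimal PyTorch sketch of the local-to-global fusion idea. The class name `LocalGlobalFuser`, the PointNet-style max-pool aggregation, and the sigmoid gate are assumptions about one plausible realization of the described behavior, not the authors' exact layers.

```python
import torch
import torch.nn as nn

class LocalGlobalFuser(nn.Module):
    """Illustrative sketch of local-to-global fusion (not DVLO's exact code).

    Hypothetical shapes: N projected points, K pixels gathered per point,
    C-dimensional features throughout.
    """

    def __init__(self, c: int):
        super().__init__()
        self.pixel_mlp = nn.Sequential(nn.Linear(c, c), nn.ReLU(), nn.Linear(c, c))
        self.gate = nn.Sequential(nn.Linear(2 * c, c), nn.Sigmoid())

    def local_fuse(self, point_feats, pixel_feats):
        # point_feats: (N, C) features of points projected onto the image plane
        # pixel_feats: (N, K, C) features of the K pixels clustered around each point
        # Shared MLP + max pool aggregates each cluster (PointNet-style),
        # pooling fine image texture around every LiDAR cluster center.
        agg = self.pixel_mlp(pixel_feats).max(dim=1).values   # (N, C)
        return point_feats + agg                              # locally fused feature

    def global_fuse(self, local_feats, pseudo_img_feats):
        # pseudo_img_feats: (N, C) point features sampled from the cylindrical
        # pseudo image. A learned gate decides, per channel, how much of each
        # stream to keep -- one simple form of "adaptive" fusion.
        g = self.gate(torch.cat([local_feats, pseudo_img_feats], dim=-1))
        return g * local_feats + (1 - g) * pseudo_img_feats

    def forward(self, point_feats, pixel_feats, pseudo_img_feats):
        local = self.local_fuse(point_feats, pixel_feats)
        return self.global_fuse(local, pseudo_img_feats)
```

The gating design lets the network weigh texture-rich local features against broader pseudo-image context on a per-channel basis rather than committing to a fixed blend.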

Iterative Pose Estimation

From the fused features, DVLO estimates pose hierarchically: starting with coarse-level feature associations, an iterative refinement process progressively improves pose accuracy, demonstrating the value of multi-scale information for precise odometry. A generic sketch of such a coarse-to-fine loop follows.
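The sketch below illustrates the general coarse-to-fine refinement pattern rather than DVLO's exact pipeline; `estimate_delta` is a hypothetical callback standing in for whatever correlation-based pose regressor operates at each pyramid level.

```python
import numpy as np

def refine_pose(estimate_delta, levels, T_init=np.eye(4), iters_per_level=2):
    """Generic coarse-to-fine pose refinement loop (a sketch of the idea).

    `levels` is ordered coarse -> fine; `estimate_delta(level_feats, T)` is a
    hypothetical callback returning a small corrective 4x4 transform derived
    from feature associations at the current pose estimate.
    """
    T = T_init
    for level_feats in levels:                    # coarse to fine
        for _ in range(iters_per_level):
            dT = estimate_delta(level_feats, T)   # residual pose correction
            T = dT @ T                            # left-compose the correction
    return T
```

Because corrections at coarse levels handle large motions and fine levels only polish residual error, each `estimate_delta` call can stay small and local.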

Empirical Validation

DVLO's efficacy is demonstrated through extensive evaluations on the KITTI odometry dataset, where it consistently outperforms existing single-modal and multi-modal methods across sequences. Its strong performance further generalizes to scene flow estimation on the FlyingThings3D dataset, underscoring DVLO's robustness and versatility across multi-modal fusion tasks.

Implications and Future Directions

The introduction of DVLO represents a significant step forward in visual-LiDAR odometry, offering a robust solution to the challenges posed by data modality inconsistencies. Its local-to-global fusion strategy, coupled with bi-directional structure alignment, exemplifies a promising direction for future research in multimodal sensor fusion. Potential extensions of this work could apply these principles to other domains, such as autonomous navigation in complex environments, where leveraging multimodal data is crucial for comprehensively understanding and interacting with the surroundings.

In summary, DVLO not only addresses current limitations in visual-LiDAR odometry but also opens avenues for future innovations in the integration of diverse sensory data for autonomous systems. The methodologies and insights presented in this work have the potential to significantly influence the development of more sophisticated and efficient multimodal fusion techniques in the field.