Overview of NeuralFusion: Online Depth Fusion in Latent Space
NeuralFusion introduces an approach to online depth map fusion that aggregates incoming depth maps in a learned latent feature space. This contrasts with classical techniques that rely on explicit scene representations such as signed distance functions (SDFs). The primary innovation is the separation of the fusion process from the output scene representation via a translator network. This design choice enables cleaner and more accurate real-time surface reconstruction and effectively handles high noise levels and outliers, particularly those encountered in photometric stereo-based depth maps.
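To make this decoupling concrete, the update can be written schematically (generic notation, not necessarily the paper's symbols): for each grid location $v$ and incoming depth map $D_t$,

$$\mathbf{f}_t(v) \;=\; g_\theta\!\big(\mathbf{f}_{t-1}(v),\, e_\psi(D_t, v)\big), \qquad \hat{s}(v) \;=\; d_\phi\!\big(\mathbf{f}_t(v)\big),$$

where $e_\psi$ extracts per-view features from the current depth map, $g_\theta$ is the fusion network that updates the latent feature $\mathbf{f}$, and $d_\phi$ is the translator that decodes a truncated signed distance $\hat{s}$ only when explicit geometry is required. Because fusion never commits to SDF values directly, noise and outliers can be suppressed in feature space before any geometry is produced.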
Methodology
The paper proposes a dual-module architecture (a minimal sketch of both modules follows this list):
- Depth and Feature Fusion Network: aggregates incoming depth maps in a latent feature space that encodes complex scene information such as confidence and local geometry. Fusing in this space helps tackle common failure modes like surface thickening and outlier blobs, and improves performance on noisy and incomplete data.
- Translator Network: decodes the latent features into explicit representations such as a truncated signed distance function (TSDF) for visualization or application-specific tasks. Because the full pipeline is trainable end to end, filtering of geometric outliers and decoding of the latent representation into a refined output are learned jointly.
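The following PyTorch sketch illustrates the two-module split under assumed shapes and layer sizes (a dense per-voxel feature grid, a small MLP fusion network, and an MLP translator). It is not the authors' implementation, which additionally handles view-aligned feature extraction, ray-based grid updates, and end-to-end training; names such as `FeatureFusionNet` and `TranslatorNet` are illustrative.

```python
import torch
import torch.nn as nn


class FeatureFusionNet(nn.Module):
    """Fuses per-view features into a persistent latent feature grid.

    Illustrative only: the actual method extracts and scatters features along
    camera rays; here the per-voxel view features are assumed to be given.
    """

    def __init__(self, feat_dim: int = 8, view_dim: int = 8, hidden: int = 32):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + view_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, feat_dim),
        )

    def forward(self, grid_feat: torch.Tensor, view_feat: torch.Tensor) -> torch.Tensor:
        # grid_feat: (N_voxels, feat_dim), view_feat: (N_voxels, view_dim)
        update = self.mlp(torch.cat([grid_feat, view_feat], dim=-1))
        return grid_feat + update  # residual update keeps fusion incremental


class TranslatorNet(nn.Module):
    """Decodes accumulated latent features into a TSDF plus a soft validity mask."""

    def __init__(self, feat_dim: int = 8, hidden: int = 32):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2),  # [tsdf, occupancy logit]
        )

    def forward(self, grid_feat: torch.Tensor) -> torch.Tensor:
        out = self.mlp(grid_feat)
        tsdf = torch.tanh(out[..., 0])     # truncated signed distance in [-1, 1]
        occ = torch.sigmoid(out[..., 1])   # soft occupancy / validity
        return torch.stack([tsdf, occ], dim=-1)


if __name__ == "__main__":
    # Toy online loop: fuse several synthetic views, decode only when output is needed.
    n_voxels, feat_dim = 32 ** 3, 8
    grid = torch.zeros(n_voxels, feat_dim)               # latent scene state
    fuse, translate = FeatureFusionNet(), TranslatorNet()

    with torch.no_grad():
        for _ in range(5):                               # incoming depth maps
            view_feat = torch.randn(n_voxels, feat_dim)  # stand-in for extracted view features
            grid = fuse(grid, view_feat)

        tsdf_and_occ = translate(grid)                   # explicit TSDF only at readout time
    print(tsdf_and_occ.shape)                            # torch.Size([32768, 2])
```

The key property mirrored here is that per-frame work only updates the latent grid, while the translator runs once at readout, so the cost of producing explicit geometry does not grow with the number of fused views.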
Numerical Results and Claims
NeuralFusion demonstrates significant improvements over state-of-the-art depth map fusion methods, specifically in scenarios with substantial noise and many outliers. Quantitative metrics show improved mesh accuracy and completeness on both synthetic and real-world data. The method maintains real-time processing even with larger latent feature dimensions, demonstrating its practical applicability in dynamic environments.
Implications and Future Directions
Theoretical Implications: By learning feature aggregation in a latent space, NeuralFusion paves the way for more robust 3D reconstruction in complex environments. Decoupling the fused scene state from the output representation opens the door to finer-grained depth map integration strategies that adapt to varied and challenging data conditions.
Practical Implications: This approach benefits robotic navigation, augmented reality applications, and multi-view stereo scenarios, where real-time processing and adaptability to noisy inputs are crucial. The robustness against outliers makes it a promising candidate for deployment in less controlled environments where traditional methods falter.
Speculation on Future Developments: NeuralFusion's architecture could inspire future work on learned depth map fusion, for example by using higher-resolution latent features or adapting the translator module to different downstream tasks. Extending the approach to other sensor modalities could also widen its applicability to domains that require scene reconstruction, such as autonomous driving or drone navigation.
In summary, NeuralFusion presents a significant step forward in depth map fusion: it addresses inherent limitations of traditional methods while balancing accuracy against computational cost, and it opens pathways for future work on integrating richer geometric reasoning into efficient, learning-based 3D scene reconstruction.