- The paper introduces StreetSurf, which partitions unbounded street scenes into close-range, distant-view, and sky segments to achieve high-quality neural surface reconstruction.
- It employs aligned cuboid boundaries, adapted hash-grids, and a multi-stage ray marching strategy to overcome challenges from textureless regions and limited viewing angles.
- Experiments on autonomous driving datasets show that StreetSurf attains superior geometry and appearance fidelity with training times as low as one to two hours on a single RTX3090 GPU.
Extending Multi-view Implicit Surface Reconstruction to Street Views with StreetSurf
Introduction
The field of neural rendering has seen significant advances through the introduction and continued development of Neural Radiance Fields (NeRF). This progress has catalyzed research efforts toward applying NeRF in diverse settings, including street views, a domain with largely untapped potential for autonomous driving and city planning. StreetSurf addresses the challenges peculiar to street-view imagery, such as the absence of LiDAR data and the long, narrow camera trajectories of vehicle-mounted rigs, by extending prior object-centric neural surface reconstruction methods. Its novelty lies in partitioning unbounded street scenes into tractable sub-spaces and adopting data representations adapted to them, achieving state-of-the-art results in both geometry and appearance reconstruction within limited training time.
Methodology
The core approach of StreetSurf is to partition the unbounded space of a street view into three distinct parts: close-range, distant-view, and sky. This division is realized with aligned cuboid boundaries and cuboid/hyper-cuboid hash-grid representations adapted to the elongated shape of street scenes, yielding a finer allocation of model capacity. Geometric priors play a key role in handling textureless regions and compensating for the insufficient viewing angles common in unbounded street scenes. In addition, an efficient, fine-grained multi-stage ray marching strategy lets the model handle complex scenes with detailed occlusions, as sketched below.
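To make the ray partitioning concrete, the following minimal sketch splits each camera ray against an axis-aligned close-range cuboid and assigns dense samples inside it to the close-range model and sparse far samples to the distant-view model. The cuboid extents, sample counts, and the `partition_ray` helper are illustrative assumptions rather than StreetSurf's actual implementation, which uses pose-aligned cuboids, adapted hash-grids, and occupancy-accelerated multi-stage marching.

```python
# Illustrative sketch: split rays against an assumed axis-aligned close-range
# cuboid; this is not the paper's implementation.
import numpy as np

def ray_aabb_intersect(origins, dirs, box_min, box_max, eps=1e-8):
    """Slab test: return (t_near, t_far, hit_mask) for each ray."""
    inv_d = 1.0 / np.where(np.abs(dirs) < eps, eps, dirs)
    t0 = (box_min - origins) * inv_d
    t1 = (box_max - origins) * inv_d
    t_near = np.minimum(t0, t1).max(axis=-1)
    t_far = np.maximum(t0, t1).min(axis=-1)
    hit = t_far > np.maximum(t_near, 0.0)
    return np.maximum(t_near, 0.0), t_far, hit

def partition_ray(origin, direction, box_min, box_max, far=2000.0):
    """Assign a close-range segment (inside the cuboid) and a distant-view
    segment (beyond it); rays missing the cuboid go to distant-view/sky."""
    t_near, t_far, hit = ray_aabb_intersect(origin[None], direction[None],
                                            box_min, box_max)
    segments = {}
    if hit[0]:
        # Close-range model: dense, fine-grained samples inside the cuboid.
        segments["close_range"] = np.linspace(t_near[0], t_far[0], 128)
        # Distant-view model: sparse samples spaced out to the far plane.
        segments["distant_view"] = np.geomspace(max(t_far[0], 1e-3), far, 32)
    else:
        segments["distant_view"] = np.geomspace(1.0, far, 32)
    return segments

# Example: a ray leaving the ego vehicle roughly along the road direction.
box_min = np.array([-20.0, -20.0, -5.0])
box_max = np.array([200.0, 20.0, 15.0])
segs = partition_ray(np.zeros(3), np.array([1.0, 0.05, 0.0]), box_min, box_max)
print({k: (v[0], v[-1], len(v)) for k, v in segs.items()})
```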
Disentangled Representation and Initialisation Schemes
Delimiting the close-range and distant-view spaces is an inherently under-constrained problem: no supervision specifies where one ends and the other begins. StreetSurf therefore introduces a road-surface initialization scheme together with an entropy regularization term that encourages the two representations to remain distinct. This is particularly advantageous in unbounded street scenes, where it yields a cleaner separation between the close-range and distant-view models.
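The sketch below illustrates both ideas under simplifying assumptions: the road is approximated by a horizontal plane at an assumed height (the `road_surface_sdf_target` and `pretrain_sdf` helpers are hypothetical), and the entropy term acts on each ray's accumulated close-range opacity. Neither function is taken from the StreetSurf codebase.

```python
# Illustrative sketch of road-surface SDF initialization and a binary entropy
# regularizer; shapes, helpers, and the planar road proxy are assumptions.
import torch

def road_surface_sdf_target(points, road_height=0.0):
    """Pretraining target for the close-range SDF: signed height above an
    assumed planar road proxy at z = road_height."""
    return points[..., 2] - road_height

def pretrain_sdf(sdf_net, box_min, box_max, steps=1000, lr=1e-3, batch=4096):
    """Fit sdf_net(x) to the planar road target over the close-range cuboid,
    giving the road a sensible initial shape before joint training."""
    opt = torch.optim.Adam(sdf_net.parameters(), lr=lr)
    lo, hi = torch.as_tensor(box_min), torch.as_tensor(box_max)
    for _ in range(steps):
        x = lo + (hi - lo) * torch.rand(batch, 3)
        loss = torch.nn.functional.mse_loss(sdf_net(x).squeeze(-1),
                                            road_surface_sdf_target(x))
        opt.zero_grad()
        loss.backward()
        opt.step()

def close_distant_entropy(w_close, eps=1e-6):
    """Binary entropy on each ray's accumulated close-range opacity in [0, 1];
    minimizing it pushes every ray to be explained either by the close-range
    model or by the distant-view/sky models, not a blend of both."""
    w = w_close.clamp(eps, 1.0 - eps)
    return -(w * w.log() + (1.0 - w) * (1.0 - w).log()).mean()

# Example usage with a tiny MLP standing in for the close-range SDF network.
mlp = torch.nn.Sequential(torch.nn.Linear(3, 64), torch.nn.ReLU(),
                          torch.nn.Linear(64, 1))
pretrain_sdf(mlp, [-20.0, -20.0, -5.0], [200.0, 20.0, 15.0], steps=10)
print(close_distant_entropy(torch.rand(1024)))
```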
Addressing Geometric Errors
A common challenge in the reconstruction of street-view images is geometric inaccuracy stemming from textureless regions (such as road surfaces) and insufficient coverage of viewing angles. To address this, StreetSurf leverages monocular normal and depth estimations to guide the rendered normals and depths of the close-range scene, significantly improving the geometric fidelity of the reconstruction.
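A minimal sketch of how such monocular cues can supervise the rendered geometry is given below. The least-squares scale/shift alignment for depth and the L1-plus-angular normal loss follow common practice for monocular-cue supervision and are assumptions here, not StreetSurf's exact loss formulation.

```python
# Illustrative monocular-cue losses; weights and alignment details are assumed.
import torch

def aligned_depth_loss(depth_render, depth_mono, mask=None):
    """Align the scale/shift-ambiguous monocular depth to the rendered depth
    with per-batch least squares, then penalize the remaining discrepancy."""
    if mask is None:
        mask = torch.ones_like(depth_render, dtype=torch.bool)
    d_r, d_m = depth_render[mask], depth_mono[mask]
    # Solve min_{s, t} || s * d_m + t - d_r ||^2 in closed form.
    A = torch.stack([d_m, torch.ones_like(d_m)], dim=-1)       # (N, 2)
    sol = torch.linalg.lstsq(A, d_r.unsqueeze(-1)).solution    # (2, 1)
    s, t = sol[0, 0], sol[1, 0]
    return ((s * d_m + t - d_r) ** 2).mean()

def normal_loss(normal_render, normal_mono):
    """L1 plus angular (1 - cosine) consistency between rendered normals and
    monocular normal estimates, both assumed to be unit vectors of shape (N, 3)."""
    l1 = (normal_render - normal_mono).abs().sum(dim=-1).mean()
    cos = (normal_render * normal_mono).sum(dim=-1)
    return l1 + (1.0 - cos).mean()

# Example with random stand-in tensors for one batch of rays.
d_r, d_m = torch.rand(1024) * 50.0, torch.rand(1024)
n_r = torch.nn.functional.normalize(torch.randn(1024, 3), dim=-1)
n_m = torch.nn.functional.normalize(torch.randn(1024, 3), dim=-1)
print(aligned_depth_loss(d_r, d_m), normal_loss(n_r, n_m))
```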
Experiments and Results
StreetSurf is rigorously evaluated on real-world autonomous driving datasets, such as the Waymo-Perception sequences. The experiments demonstrate the model's superiority in reconstructing both the geometry and appearance of street views without relying on dense LiDAR data. Notably, the system achieves remarkable training efficiency, requiring only one to two hours on a single RTX3090 GPU for each sequence, setting a new benchmark in the field.
Implications and Future Directions
The research presented in StreetSurf opens promising avenues for applying neural rendering in urban environments. By achieving accurate reconstructions with minimal reliance on LiDAR data, the approach paves the way for enhanced virtual simulation and planning in urban development and autonomous driving. Future work could explore dynamic object reconstruction and improved robustness to varied environmental conditions.
Conclusion
StreetSurf represents a significant step forward in the application of multi-view implicit surface reconstruction techniques to street views. The model's innovative approach to scene delimitation, coupled with its use of disentangled representations and geometric priors, enables highly accurate reconstructions of urban landscapes. The findings from this research not only underscore the potential of neural rendering in new domains but also highlight the path for future advancements in the field.