- The paper introduces StreetSurf, which partitions unbounded street scenes into close-range, distant-view, and sky segments to achieve high-quality neural surface reconstruction.
- It employs aligned cuboid boundaries, adapted hash-grids, and a multi-stage ray marching strategy to overcome challenges from textureless regions and limited viewing angles.
- Experiments on autonomous driving datasets show that StreetSurf attains superior geometry and appearance fidelity with training times as low as one to two hours on a single RTX3090 GPU.
Extending Multi-view Implicit Surface Reconstruction to Street Views with StreetSurf
Introduction
The field of neural rendering has seen significant advances through the introduction and continued development of Neural Radiance Fields (NeRF). This progress has catalyzed research efforts toward applying NeRF in diverse settings, including street views, a domain with largely untapped potential for autonomous driving and city planning. StreetSurf addresses the challenges peculiar to street-view imagery, such as the absence of LiDAR data and the long, narrow camera trajectories of vehicle-mounted rigs, by extending prior object-centric neural surface reconstruction methods. Its novelty lies in partitioning unbounded street scenes into tractable sub-spaces and adopting data representations adapted to them, achieving state-of-the-art results in both geometry and appearance reconstruction within limited training time.
Methodology
The core approach of StreetSurf is to partition the unbounded space of a street view into three distinct parts: close-range, distant-view, and sky. This division is realized with aligned cuboid boundaries and cuboid/hyper-cuboid hash-grid representations adapted to the elongated shape of street scenes, yielding a finer allocation of model capacity. Geometric priors play a key role in handling textureless regions and compensating for the insufficient viewing angles common in unbounded street scenes. In addition, an efficient, fine-grained multi-stage ray marching strategy lets the model handle complex scenes with detailed occlusions, as sketched below.
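To make the ray partitioning concrete, the following minimal sketch splits each camera ray against an axis-aligned close-range cuboid and assigns dense samples inside it to the close-range model and sparse far samples to the distant-view model. The cuboid extents, sample counts, and the `partition_ray` helper are illustrative assumptions rather than StreetSurf's actual implementation, which uses pose-aligned cuboids, adapted hash-grids, and occupancy-accelerated multi-stage marching.

```python
# Illustrative sketch: split rays against an assumed axis-aligned close-range
# cuboid; this is not the paper's implementation.
import numpy as np

def ray_aabb_intersect(origins, dirs, box_min, box_max, eps=1e-8):
    """Slab test: return (t_near, t_far, hit_mask) for each ray."""
    inv_d = 1.0 / np.where(np.abs(dirs) < eps, eps, dirs)
    t0 = (box_min - origins) * inv_d
    t1 = (box_max - origins) * inv_d
    t_near = np.minimum(t0, t1).max(axis=-1)
    t_far = np.maximum(t0, t1).min(axis=-1)
    hit = t_far > np.maximum(t_near, 0.0)
    return np.maximum(t_near, 0.0), t_far, hit

def partition_ray(origin, direction, box_min, box_max, far=2000.0):
    """Assign a close-range segment (inside the cuboid) and a distant-view
    segment (beyond it); rays missing the cuboid go to distant-view/sky."""
    t_near, t_far, hit = ray_aabb_intersect(origin[None], direction[None],
                                            box_min, box_max)
    segments = {}
    if hit[0]:
        # Close-range model: dense, fine-grained samples inside the cuboid.
        segments["close_range"] = np.linspace(t_near[0], t_far[0], 128)
        # Distant-view model: sparse samples spaced out to the far plane.
        segments["distant_view"] = np.geomspace(max(t_far[0], 1e-3), far, 32)
    else:
        segments["distant_view"] = np.geomspace(1.0, far, 32)
    return segments

# Example: a ray leaving the ego vehicle roughly along the road direction.
box_min = np.array([-20.0, -20.0, -5.0])
box_max = np.array([200.0, 20.0, 15.0])
segs = partition_ray(np.zeros(3), np.array([1.0, 0.05, 0.0]), box_min, box_max)
print({k: (v[0], v[-1], len(v)) for k, v in segs.items()})
```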
Disentangled Representation and Initialisation Schemes
Delimiting the close-range and distant-view spaces is an inherently under-constrained problem: no supervision specifies where one ends and the other begins. StreetSurf therefore introduces a road-surface initialization scheme together with an entropy regularization term that encourages the two representations to remain distinct. This is particularly advantageous in unbounded street scenes, where it yields a cleaner separation between the close-range and distant-view models.
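The sketch below illustrates both ideas under simplifying assumptions: the road is approximated by a horizontal plane at an assumed height (the `road_surface_sdf_target` and `pretrain_sdf` helpers are hypothetical), and the entropy term acts on each ray's accumulated close-range opacity. Neither function is taken from the StreetSurf codebase.

```python
# Illustrative sketch of road-surface SDF initialization and a binary entropy
# regularizer; shapes, helpers, and the planar road proxy are assumptions.
import torch

def road_surface_sdf_target(points, road_height=0.0):
    """Pretraining target for the close-range SDF: signed height above an
    assumed planar road proxy at z = road_height."""
    return points[..., 2] - road_height

def pretrain_sdf(sdf_net, box_min, box_max, steps=1000, lr=1e-3, batch=4096):
    """Fit sdf_net(x) to the planar road target over the close-range cuboid,
    giving the road a sensible initial shape before joint training."""
    opt = torch.optim.Adam(sdf_net.parameters(), lr=lr)
    lo, hi = torch.as_tensor(box_min), torch.as_tensor(box_max)
    for _ in range(steps):
        x = lo + (hi - lo) * torch.rand(batch, 3)
        loss = torch.nn.functional.mse_loss(sdf_net(x).squeeze(-1),
                                            road_surface_sdf_target(x))
        opt.zero_grad()
        loss.backward()
        opt.step()

def close_distant_entropy(w_close, eps=1e-6):
    """Binary entropy on each ray's accumulated close-range opacity in [0, 1];
    minimizing it pushes every ray to be explained either by the close-range
    model or by the distant-view/sky models, not a blend of both."""
    w = w_close.clamp(eps, 1.0 - eps)
    return -(w * w.log() + (1.0 - w) * (1.0 - w).log()).mean()

# Example usage with a tiny MLP standing in for the close-range SDF network.
mlp = torch.nn.Sequential(torch.nn.Linear(3, 64), torch.nn.ReLU(),
                          torch.nn.Linear(64, 1))
pretrain_sdf(mlp, [-20.0, -20.0, -5.0], [200.0, 20.0, 15.0], steps=10)
print(close_distant_entropy(torch.rand(1024)))
```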
Addressing Geometric Errors
A common challenge in the reconstruction of street-view images is geometric inaccuracy stemming from textureless regions (such as road surfaces) and insufficient coverage of viewing angles. To address this, StreetSurf leverages monocular normal and depth estimations to guide the rendered normals and depths of the close-range scene, significantly improving the geometric fidelity of the reconstruction.
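A minimal sketch of how such monocular cues can supervise the rendered geometry is given below. The least-squares scale/shift alignment for depth and the L1-plus-angular normal loss follow common practice for monocular-cue supervision and are assumptions here, not StreetSurf's exact loss formulation.

```python
# Illustrative monocular-cue losses; weights and alignment details are assumed.
import torch

def aligned_depth_loss(depth_render, depth_mono, mask=None):
    """Align the scale/shift-ambiguous monocular depth to the rendered depth
    with per-batch least squares, then penalize the remaining discrepancy."""
    if mask is None:
        mask = torch.ones_like(depth_render, dtype=torch.bool)
    d_r, d_m = depth_render[mask], depth_mono[mask]
    # Solve min_{s, t} || s * d_m + t - d_r ||^2 in closed form.
    A = torch.stack([d_m, torch.ones_like(d_m)], dim=-1)       # (N, 2)
    sol = torch.linalg.lstsq(A, d_r.unsqueeze(-1)).solution    # (2, 1)
    s, t = sol[0, 0], sol[1, 0]
    return ((s * d_m + t - d_r) ** 2).mean()

def normal_loss(normal_render, normal_mono):
    """L1 plus angular (1 - cosine) consistency between rendered normals and
    monocular normal estimates, both assumed to be unit vectors of shape (N, 3)."""
    l1 = (normal_render - normal_mono).abs().sum(dim=-1).mean()
    cos = (normal_render * normal_mono).sum(dim=-1)
    return l1 + (1.0 - cos).mean()

# Example with random stand-in tensors for one batch of rays.
d_r, d_m = torch.rand(1024) * 50.0, torch.rand(1024)
n_r = torch.nn.functional.normalize(torch.randn(1024, 3), dim=-1)
n_m = torch.nn.functional.normalize(torch.randn(1024, 3), dim=-1)
print(aligned_depth_loss(d_r, d_m), normal_loss(n_r, n_m))
```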
Experiments and Results
StreetSurf is rigorously evaluated on real-world autonomous driving datasets, such as the Waymo-Perception sequences. The experiments demonstrate the model's superiority in reconstructing both the geometry and appearance of street views without relying on dense LiDAR data. Notably, the system achieves remarkable training efficiency, requiring only one to two hours on a single RTX3090 GPU for each sequence, setting a new benchmark in the field.
Implications and Future Directions
The research presented in StreetSurf opens promising avenues for applying neural rendering in urban environments. By achieving accurate reconstructions with minimal reliance on LiDAR data, the approach paves the way for enhanced virtual simulation and planning in urban development and autonomous driving. Future work could explore dynamic object reconstruction and improved robustness to varied environmental conditions.
Conclusion
StreetSurf represents a significant step forward in the application of multi-view implicit surface reconstruction techniques to street views. The model's innovative approach to scene delimitation, coupled with its use of disentangled representations and geometric priors, enables highly accurate reconstructions of urban landscapes. The findings from this research not only underscore the potential of neural rendering in new domains but also highlight the path for future advancements in the field.