LidaRF: Delving into Lidar for Neural Radiance Field on Street Scenes (2405.00900v2)

Published 1 May 2024 in cs.CV

Abstract: Photorealistic simulation plays a crucial role in applications such as autonomous driving, where advances in neural radiance fields (NeRFs) may allow better scalability through the automatic creation of digital 3D assets. However, reconstruction quality suffers on street scenes due to largely collinear camera motions and sparser samplings at higher speeds. On the other hand, the application often demands rendering from camera views that deviate from the inputs to accurately simulate behaviors like lane changes. In this paper, we propose several insights that allow a better utilization of Lidar data to improve NeRF quality on street scenes. First, our framework learns a geometric scene representation from Lidar, which is fused with the implicit grid-based representation for radiance decoding, thereby supplying stronger geometric information offered by explicit point cloud. Second, we put forth a robust occlusion-aware depth supervision scheme, which allows utilizing densified Lidar points by accumulation. Third, we generate augmented training views from Lidar points for further improvement. Our insights translate to largely improved novel view synthesis under real driving scenes.

References (52)
  1. lxxue/FRNN: Fixed Radius Nearest Neighbor Search on GPU. https://github.com/lxxue/FRNN. [Accessed 18-11-2023].
  2. Interstate Highway standards. Wikipedia. https://en.wikipedia.org/wiki/Interstate_Highway_standards. [Accessed 18-11-2023].
  3. Neural point-based graphics. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXII 16, pages 696–712. Springer, 2020.
  4. Mip-nerf: A multiscale representation for anti-aliasing neural radiance fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 5855–5864, 2021.
  5. Mip-nerf 360: Unbounded anti-aliased neural radiance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022.
  6. nuscenes: A multimodal dataset for autonomous driving. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 11621–11631, 2020.
  7. Cloner: Camera-lidar fusion for occupancy grid-aided neural representations. IEEE Robotics and Automation Letters, 2023.
  8. Pointersect: Neural rendering with cloud-ray intersection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8359–8369, 2023a.
  9. Neural radiance field with lidar maps. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 17914–17923, 2023b.
  10. 4d spatio-temporal convnets: Minkowski convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3075–3084, 2019.
  11. Kangle Deng et al. Depth-supervised NeRF: Fewer views and faster training for free. In CVPR, 2022.
  12. Streetsurf: Extending multi-view implicit surface reconstruction to street views. arXiv preprint arXiv:2306.04988, 2023.
  13. Mask r-cnn. In Proceedings of the IEEE international conference on computer vision, pages 2961–2969, 2017.
  14. Rama C Hoetzlein. Fast fixed-radius nearest neighbors: interactive million-particle fluids. In GPU Technology Conference, page 2, 2014.
  15. Trivol: Point cloud rendering via triple volumes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20732–20741, 2023a.
  16. Point2pix: Photo-realistic point cloud rendering via neural radiance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8349–8358, 2023b.
  17. Ray tracing volume densities. ACM SIGGRAPH computer graphics, 1984.
  18. Direct visibility of point sets. In ACM SIGGRAPH 2007 papers, pages 24–es. 2007.
  19. Panoptic neural fields: A semantic object-aware neural scene representation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12871–12881, 2022.
  20. Real-time neural rasterization for large scenes. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 8416–8427, 2023a.
  21. On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265, 2019.
  22. Bevfusion: Multi-task multi-sensor fusion with unified bird’s-eye view representation. In 2023 IEEE International Conference on Robotics and Automation (ICRA), 2023b.
  23. Nerf: Representing scenes as neural radiance fields for view synthesis. In European conference on computer vision, 2020.
  24. Thomas Müller. tiny-cuda-nn, 2021.
  25. Instant neural graphics primitives with a multiresolution hash encoding. ACM Trans. Graph., 2022.
  26. Neural scene graphs for dynamic scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2856–2865, 2021.
  27. Neural point light fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 18419–18429, 2022.
  28. Pointnet++: Deep hierarchical feature learning on point sets in a metric space. Advances in neural information processing systems, 30, 2017.
  29. Accelerating 3d deep learning with pytorch3d. arXiv preprint arXiv:2007.08501, 2020.
  30. Urban radiance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022.
  31. Block-nerf: Scalable large scene neural view synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8248–8258, 2022.
  32. Nerfstudio: A modular framework for neural radiance field development. In ACM SIGGRAPH 2023 Conference Proceedings, 2023.
  33. Torchsparse++: Efficient training and inference framework for sparse convolution on gpus. In IEEE/ACM International Symposium on Microarchitecture (MICRO), 2023.
  34. Moving forward in structure from motion. In 2007 IEEE Conference on Computer Vision and Pattern Recognition, pages 1–7. IEEE, 2007.
  35. Digging into depth priors for outdoor neural radiance fields. In ACMMM, 2023a.
  36. Planerf: Svd unsupervised 3d plane regularization for nerf large-scale scene reconstruction. arXiv preprint arXiv:2305.16914, 2023b.
  37. Sparsenerf: Distilling depth ranking for few-shot novel view synthesis. In ICCV, 2023c.
  38. Neural fields meet explicit geometric representations for inverse rendering of urban scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8370–8380, 2023d.
  39. Depth-guided optimization of neural radiance fields for indoor multi-view stereo. PAMI, 2023.
  40. Argoverse 2: Next generation datasets for self-driving perception and forecasting. arXiv preprint arXiv:2301.00493, 2023.
  41. Behind the scenes: Density fields for single view reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9076–9086, 2023.
  42. Pandaset: Advanced sensor suite dataset for autonomous driving. In 2021 IEEE International Intelligent Transportation Systems Conference (ITSC), pages 3095–3101. IEEE, 2021.
  43. S-nerf: Neural radiance fields for street views. In The Eleventh International Conference on Learning Representations, 2022.
  44. Point-nerf: Point-based neural radiance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5438–5448, 2022.
  45. Nerfvs: Neural radiance fields for free view synthesis via geometry scaffolds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16549–16558, 2023a.
  46. Unisim: A neural closed-loop sensor simulator. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023b.
  47. Center-based 3d object detection and tracking. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2021.
  48. Monosdf: Exploring monocular geometric cues for neural implicit surface reconstruction. NeurIPS, 2022.
  49. Nerflets: Local radiance fields for efficient structure-aware 3d scene representation from 2d supervision. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8274–8284, 2023a.
  50. Frequency-modulated point cloud rendering with easy editing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 119–129, 2023b.
  51. Open3D: A modern library for 3D data processing. arXiv:1801.09847, 2018.
  52. Sampling: Scene-adaptive hierarchical multiplane images representation for novel view synthesis from a single image. In ICCV, 2023.

Summary

  • The paper demonstrates LidaRF's integration of Lidar data with NeRF, significantly improving 3D scene reconstruction for dynamic street environments.
  • It introduces a robust method to densify sparse Lidar points and applies occlusion-aware depth supervision to enhance view extrapolation.
  • Quantitative evaluations show improved PSNR and SSIM metrics, underpinning its potential for safer, more reliable autonomous driving simulations.

Enhancing NeRF with Lidar for Realistic Street Scene Rendering

Introduction to NeRF and its Challenges in Street Scenes

Neural Radiance Fields (NeRFs) have transformed photorealistic simulation by synthesizing novel views of complex scenes with a neural network. Despite their success in controlled settings, applying NeRF to dynamic street scenes, especially for applications like autonomous driving, presents unique challenges:

  • Collinear Camera Movements: Typically, street scene data is captured from vehicles moving primarily forward, resulting in collinear camera movement. This limits the available geometric information crucial for 3D reconstruction.
  • Sparse Sampling at Higher Speeds: At higher driving speeds, fewer images are captured per unit of distance, leading to sparser data and, consequently, lower reconstruction quality.
  • Demand for Off-trajectory Views: Simulating maneuvers like lane changes requires rendering views that deviate significantly from the captured trajectory, which stresses NeRF's view-extrapolation capabilities.

Addressing NeRF Challenges with Lidar in LidaRF

The paper introduces LidaRF, a framework that tightly integrates Lidar data to address the limitations of applying NeRF to dynamic street scenes. Here's how LidaRF tackles these challenges:

  1. Lidar-Enhanced Geometric Scene Representation:
    • LidaRF learns a geometric scene representation from Lidar data and fuses it with the implicit grid-based representation used for radiance decoding, so the model benefits from the stronger geometric cues of the explicit point cloud (a minimal fusion sketch follows this list).
  2. Robust Occlusion-aware Depth Supervision:
    • To combat Lidar sparsity, LidaRF densifies Lidar points by accumulating them across frames, creating denser depth maps. A robust depth-supervision mechanism then filters out unreliable, occluded depth values during training (see the robust-loss sketch after this list).
  3. Augmented Training Views from Lidar Points:
    • LidaRF synthesizes additional training views by projecting accumulated Lidar points into novel viewpoints. Although this projection can introduce occlusion artifacts, the same robust supervision mitigates them (a projection sketch follows this list).
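
To make step 1 concrete, here is a minimal PyTorch sketch of decoding density and color from concatenated grid and Lidar features. The module name, feature dimensions, and the use of simple concatenation are illustrative assumptions, not the paper's exact architecture.

```python
# Hedged sketch: fuse Lidar-derived point features with implicit grid
# features before radiance decoding. All names and sizes are illustrative.
import torch
import torch.nn as nn

class FusedRadianceDecoder(nn.Module):
    def __init__(self, grid_dim=32, lidar_dim=32, hidden=64):
        super().__init__()
        # Small MLP that decodes density and color from the fused feature.
        self.mlp = nn.Sequential(
            nn.Linear(grid_dim + lidar_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1 + 3),  # 1 density logit + 3 RGB channels
        )

    def forward(self, grid_feat, lidar_feat):
        # grid_feat:  (N, grid_dim)  feature from an implicit hash/voxel grid
        # lidar_feat: (N, lidar_dim) feature interpolated from the point cloud
        fused = torch.cat([grid_feat, lidar_feat], dim=-1)
        out = self.mlp(fused)
        sigma = torch.relu(out[..., :1])   # non-negative density
        rgb = torch.sigmoid(out[..., 1:])  # colors in [0, 1]
        return sigma, rgb
```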
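
For step 2, a hedged sketch of a robust depth loss: because depths from accumulated Lidar points may belong to occluded surfaces, the largest residuals are trimmed rather than penalized. The trimming heuristic is one plausible scheme, not necessarily the paper's exact occlusion-aware formulation.

```python
# Hedged sketch: down-weight depth residuals that look like occlusions
# from accumulated Lidar points. Trimming ratio is an illustrative choice.
import torch

def robust_depth_loss(rendered_depth, lidar_depth, valid_mask, trim_ratio=0.2):
    # rendered_depth, lidar_depth: (N,) per-ray depths; valid_mask: (N,) bool
    resid = (rendered_depth - lidar_depth).abs()[valid_mask]
    if resid.numel() == 0:
        return rendered_depth.new_zeros(())
    # Trim the largest residuals, which are most likely occluded or otherwise
    # unreliable accumulated points rather than genuine reconstruction errors.
    k = max(1, int((1.0 - trim_ratio) * resid.numel()))
    trimmed, _ = torch.topk(resid, k, largest=False)
    return trimmed.mean()
```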
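
For step 3, a sketch of projecting accumulated Lidar points into a novel camera to form an augmented training image, with painter's-order z-buffering so nearer points win. The pinhole model is standard; the intrinsics, resolution, and per-point loop are illustrative rather than the paper's implementation.

```python
# Hedged sketch: splat accumulated Lidar points into a novel view.
import numpy as np

def render_points(points_world, colors, K, T_cam_world, H, W):
    # points_world: (N, 3); colors: (N, 3); K: (3, 3) intrinsics;
    # T_cam_world: (4, 4) world-to-camera extrinsics.
    pts_h = np.concatenate([points_world, np.ones((len(points_world), 1))], axis=1)
    pts_cam = (T_cam_world @ pts_h.T).T[:, :3]
    in_front = pts_cam[:, 2] > 1e-3
    pts_cam, colors = pts_cam[in_front], colors[in_front]
    uv = (K @ pts_cam.T).T
    u = np.round(uv[:, 0] / uv[:, 2]).astype(int)
    v = np.round(uv[:, 1] / uv[:, 2]).astype(int)
    inside = (u >= 0) & (u < W) & (v >= 0) & (v < H)
    u, v, z, c = u[inside], v[inside], pts_cam[inside, 2], colors[inside]
    image = np.zeros((H, W, 3), dtype=colors.dtype)
    zbuf = np.full((H, W), np.inf)
    # Paint far-to-near so the nearest point per pixel survives; remaining
    # holes and occlusion errors are handled by the robust supervision above.
    for i in np.argsort(-z):
        image[v[i], u[i]] = c[i]
        zbuf[v[i], u[i]] = z[i]
    return image, zbuf
```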

Performance and Evaluation

Leveraging Lidar data, LidaRF delivers significant improvements in rendering quality, particularly under challenging real-world conditions. In quantitative evaluations, it achieves higher PSNR and SSIM across various street scenes than existing methods such as UniSim.
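
As a side note on the metrics, PSNR and SSIM can be computed per rendered frame as in the short snippet below. Using scikit-image and the `evaluate` helper are assumptions for illustration; the paper's evaluation tooling is not specified here.

```python
# Hedged sketch: per-frame PSNR/SSIM between a rendered and ground-truth image.
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(pred, gt):
    # pred, gt: (H, W, 3) float arrays in [0, 1]
    psnr = peak_signal_noise_ratio(gt, pred, data_range=1.0)
    ssim = structural_similarity(gt, pred, channel_axis=-1, data_range=1.0)
    return psnr, ssim
```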

Implications and Future Directions

  • Practical Implications: By enhancing NeRF's ability to exploit Lidar data, LidaRF paves the way for more accurate and reliable simulations in autonomous driving, where handling dynamic, complex street scenes is crucial for training and testing autonomous systems.
  • Theoretical Implications: This approach pushes the boundaries of integrating explicit geometric data (from Lidar) with implicit models (like NeRF), which could lead to further research in hybrid modeling techniques in computer vision and graphics.
  • Future Developments: Integrating dynamic-object handling into the LidaRF framework could further broaden its utility and applicability in real-world scenarios.

Conclusion

LidaRF marks a significant step toward resolving the specific challenges of applying NeRF to dynamic street scenes, leveraging the depth and geometric precision of Lidar data to improve the quality and reliability of photorealistic simulation. This advance opens new possibilities for safely and efficiently training and testing autonomous vehicles in simulated environments that closely mimic the complexities of real-world driving.