MV-Map: Offboard HD-Map Generation with Multi-view Consistency (2305.08851v3)
Abstract: While bird's-eye-view (BEV) perception models can be useful for building high-definition maps (HD-Maps) with less human labor, their results are often unreliable and exhibit noticeable inconsistencies among the HD-Maps predicted from different viewpoints. This is because BEV perception is typically set up in an 'onboard' manner, which restricts computation and consequently prevents algorithms from reasoning over multiple views simultaneously. This paper overcomes these limitations and advocates a more practical 'offboard' HD-Map generation setup that removes the computation constraints, based on the fact that HD-Maps are commonly reusable infrastructure built offline in data centers. To this end, we propose a novel offboard pipeline called MV-Map that capitalizes on multi-view consistency and can handle an arbitrary number of frames through the key design of a 'region-centric' framework. In MV-Map, the target HD-Maps are created by aggregating all frames of onboard predictions, weighted by the confidence scores assigned by an 'uncertainty network'. To further enhance multi-view consistency, we augment the uncertainty network with the global 3D structure optimized by a voxelized neural radiance field (Voxel-NeRF). Extensive experiments on nuScenes show that MV-Map significantly improves the quality of HD-Maps, further highlighting the importance of offboard methods for HD-Map generation.
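The core operation described above is a confidence-weighted aggregation of per-frame onboard predictions into one global HD-Map. The sketch below is a minimal illustration of such a fusion step, not the paper's actual implementation: it assumes rasterized per-frame BEV probability maps, per-pixel confidence scores standing in for the output of the uncertainty network, and a hypothetical `ego_to_global` callable that maps frame-local BEV pixels to global map-grid indices. All names and shapes are illustrative assumptions.

```python
import numpy as np

def fuse_onboard_predictions(bev_probs, confidences, ego_to_global, global_shape, eps=1e-6):
    """Fuse per-frame onboard BEV map predictions into a single global HD-Map raster.

    bev_probs:     list of (H, W) arrays with per-frame map-element probabilities.
    confidences:   list of (H, W) arrays with per-pixel confidence scores
                   (standing in for the uncertainty network's output).
    ego_to_global: list of callables (hypothetical interface) that map flattened
                   frame-local pixel indices to integer (row, col) indices in the
                   global map grid.
    global_shape:  (H_g, W_g) size of the global HD-Map raster.
    """
    weighted_sum = np.zeros(global_shape, dtype=np.float64)
    weight_total = np.zeros(global_shape, dtype=np.float64)

    for probs, conf, to_global in zip(bev_probs, confidences, ego_to_global):
        rows, cols = np.indices(probs.shape)
        g_rows, g_cols = to_global(rows.ravel(), cols.ravel())
        # Keep only pixels that land inside the global raster.
        valid = (g_rows >= 0) & (g_rows < global_shape[0]) & \
                (g_cols >= 0) & (g_cols < global_shape[1])
        # Accumulate confidence-weighted probabilities and the weights themselves.
        np.add.at(weighted_sum, (g_rows[valid], g_cols[valid]),
                  (conf.ravel() * probs.ravel())[valid])
        np.add.at(weight_total, (g_rows[valid], g_cols[valid]),
                  conf.ravel()[valid])

    # Confidence-weighted average; cells never observed remain zero.
    return weighted_sum / np.maximum(weight_total, eps)
```

Dividing by the accumulated confidence yields a weighted average, so where many views overlap, the frames deemed more reliable dominate the fused map, which is the intuition behind weighting the aggregation by the uncertainty network's scores.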