Stream Query Denoising for Vectorized HD Map Construction
Abstract: To enhance perception performance in complex and extensive scenarios within the realm of autonomous driving, there has been a noteworthy focus on temporal modeling, with a particular emphasis on streaming methods. The prevailing trend in streaming models involves the utilization of stream queries for the propagation of temporal information. Despite the prevalence of this approach, the direct application of the streaming paradigm to the construction of vectorized high-definition maps (HD-maps) fails to fully harness the inherent potential of temporal information. This paper introduces the Stream Query Denoising (SQD) strategy as a novel approach for temporal modeling in high-definition map (HD-map) construction. SQD is designed to facilitate the learning of temporal consistency among map elements within the streaming model. The methodology involves denoising the queries that have been perturbed by the addition of noise to the ground-truth information from the preceding frame. This denoising process aims to reconstruct the ground-truth information for the current frame, thereby simulating the prediction process inherent in stream queries. The SQD strategy can be applied to those streaming methods (e.g., StreamMapNet) to enhance the temporal modeling. The proposed SQD-MapNet is the StreamMapNet equipped with SQD. Extensive experiments on nuScenes and Argoverse2 show that our method is remarkably superior to other existing methods across all settings of close range and long range. The code will be available soon.
- Fast and robust detection of fallen people from a mobile robot. In 2017 IEEE/RSJ international conference on intelligent robots and systems (IROS), pages 4159–4166. IEEE, 2017.
- Esmat Bekir. Introduction to modern navigation systems. World scientific, 2007.
- nuscenes: A multimodal dataset for autonomous driving. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 11621–11631, 2020.
- End-to-end object detection with transformers. In European conference on computer vision, pages 213–229. Springer, 2020.
- Detrdistill: A universal knowledge distillation framework for detr-families. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 6898–6908, 2023.
- Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555, 2014.
- Pivotnet: Vectorized pivot learning for end-to-end hd map construction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 3672–3682, 2023.
- Denoising-mot: Towards multiple object tracking with severe occlusions. In Proceedings of the 31st ACM International Conference on Multimedia, pages 2734–2743, 2023.
- Exploring recurrent long-term temporal fusion for multi-view 3d perception. arXiv preprint arXiv:2303.05970, 2023.
- A review of navigation systems (integration and algorithms). Australian journal of basic and applied sciences, 3(2):943–959, 2009.
- Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
- Bevdet4d: Exploit temporal cues in multi-camera 3d object detection. arXiv preprint arXiv:2203.17054, 2022.
- An energy and gpu-computation efficient backbone network for real-time object detection. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops, pages 0–0, 2019.
- Dn-detr: Accelerate detr training by introducing query denoising. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13619–13627, 2022a.
- Mask dino: Towards a unified transformer-based framework for object detection and segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3041–3050, 2023.
- Hdmapnet: An online hd map construction and evaluation framework. In 2022 International Conference on Robotics and Automation (ICRA), pages 4628–4634. IEEE, 2022b.
- Bevformer: Learning bird’s-eye-view representation from multi-camera images via spatiotemporal transformers. In European conference on computer vision, pages 1–18. Springer, 2022c.
- Maptr: Structured modeling and learning for online vectorized hd map construction. arXiv preprint arXiv:2208.14437, 2022.
- Feature pyramid networks for object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2117–2125, 2017.
- Sparse4d v2: Recurrent temporal fusion with sparse model. arXiv preprint arXiv:2305.14018, 2023.
- Dab-detr: Dynamic anchor boxes are better queries for detr. arXiv preprint arXiv:2201.12329, 2022a.
- Petr: Position embedding transformation for multi-view 3d object detection. In European Conference on Computer Vision, pages 531–548. Springer, 2022b.
- Petrv2: A unified framework for 3d perception from multi-camera images. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 3262–3272, 2023a.
- Vectormapnet: End-to-end vectorized hd map learning. In International Conference on Machine Learning, pages 22352–22369. PMLR, 2023b.
- Fixing weight decay regularization in adam. 2018.
- Is pseudo-lidar needed for monocular 3d object detection? In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 3142–3152, 2021.
- Bevsegformer: Bird’s eye view semantic segmentation from arbitrary camera rigs. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 5935–5943, 2023.
- Lift, splat, shoot: Encoding images from arbitrary camera rigs by implicitly unprojecting to 3d. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XIV 16, pages 194–210. Springer, 2020.
- End-to-end vectorized hd-map construction with piecewise bezier curve. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13218–13228, 2023.
- Unifusion: Unified multi-view fusion transformer for spatial-temporal representation in bird’s-eye-view. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 8690–8699, 2023.
- Lego-loam: Lightweight and ground-optimized lidar odometry and mapping on variable terrain. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 4758–4765. IEEE, 2018.
- Exploring object-centric temporal modeling for efficient multi-view 3d object detection. arXiv preprint arXiv:2303.11926, 2023a.
- Towards domain generalization for multi-view 3d object detection in bird-eye-view. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13333–13342, 2023b.
- Multi-modal 3d object detection in autonomous driving: a survey. International Journal of Computer Vision, pages 1–31, 2023c.
- Argoverse 2: Next generation datasets for self-driving perception and forecasting. arXiv preprint arXiv:2301.00493, 2023.
- The 1st-place solution for cvpr 2023 openlane topology in autonomous driving challenge. arXiv preprint arXiv:2306.09590, 2023.
- Bevformer v2: Adapting modern image backbones to bird’s-eye-view recognition via perspective supervision. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 17830–17839, 2023.
- Streammapnet: Streaming mapping network for vectorized online hd map construction. arXiv preprint arXiv:2308.12570, 2023.
- A survey of autonomous driving: Common practices and emerging technologies. IEEE access, 8:58443–58469, 2020.
- Dino: Detr with improved denoising anchor boxes for end-to-end object detection. arXiv preprint arXiv:2203.03605, 2022a.
- Loam: Lidar odometry and mapping in real-time. In Robotics: Science and systems, pages 1–9. Berkeley, CA, 2014.
- Beverse: Unified perception and prediction in birds-eye-view for vision-centric autonomous driving. arXiv preprint arXiv:2205.09743, 2022b.
- Deformable detr: Deformable transformers for end-to-end object detection. arXiv preprint arXiv:2010.04159, 2020.
Paper Prompts
Sign up for free to create and run prompts on this paper using GPT-5.
Top Community Prompts
Collections
Sign up for free to add this paper to one or more collections.