Bootstrapping Autonomous Driving Radars with Self-Supervised Learning (2312.04519v3)
Abstract: Radar-based perception for autonomous vehicles has attracted increased research interest due to its ability to operate in fog and bad weather. However, training radar models is hindered by the cost and difficulty of annotating large-scale radar data. To overcome this bottleneck, we propose a self-supervised learning framework that leverages the large amount of unlabeled radar data to pre-train radar-only embeddings for self-driving perception tasks. The proposed method combines radar-to-radar and radar-to-vision contrastive losses to learn a general representation from unlabeled radar heatmaps paired with their corresponding camera images. When used for downstream object detection, we demonstrate that the proposed self-supervision framework can improve the accuracy of state-of-the-art supervised baselines by $5.8\%$ in mAP. Code is available at \url{https://github.com/yiduohao/Radical}.
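The combination of a radar-to-radar and a radar-to-vision contrastive term can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes an InfoNCE-style loss, precomputed embedding vectors, and a simple weighted sum, and the names `info_nce`, `combined_loss`, `weight`, and `temperature` are hypothetical choices made for illustration only.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch.

    anchors[i] is expected to match positives[i]; every other entry of
    `positives` acts as a negative for that anchor.
    """
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity of anchor i to every candidate, scaled by temperature.
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(a)


def combined_loss(radar_view1, radar_view2, image_emb, weight=1.0, temperature=0.1):
    """Assumed combination: intra-modal term plus a weighted cross-modal term."""
    # Radar-to-radar: two augmented views of the same radar heatmap.
    intra = info_nce(radar_view1, radar_view2, temperature)
    # Radar-to-vision: a radar heatmap and its paired camera image.
    cross = info_nce(radar_view1, image_emb, temperature)
    return intra + weight * cross


if __name__ == "__main__":
    # Toy 2-D embeddings standing in for encoder outputs.
    radar_a = [[1.0, 0.0], [0.0, 1.0]]
    radar_b = [[0.9, 0.1], [0.1, 0.9]]
    images = [[1.0, 0.2], [0.2, 1.0]]
    print(combined_loss(radar_a, radar_b, images))
```

In this sketch the loss is small when each radar embedding is closest to its own second view and its own paired image, and grows when the pairings are shuffled. The relative weighting of the two terms is a free parameter here, not a value taken from the paper.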