
Fusion is Not Enough: Single Modal Attacks on Fusion Models for 3D Object Detection (2304.14614v3)

Published 28 Apr 2023 in cs.CV and cs.CR

Abstract: Multi-sensor fusion (MSF) is widely used in autonomous vehicles (AVs) for perception, particularly for 3D object detection with camera and LiDAR sensors. The purpose of fusion is to capitalize on the advantages of each modality while minimizing their weaknesses. Advanced deep neural network (DNN)-based fusion techniques have demonstrated exceptional, industry-leading performance. Because of the redundant information across modalities, MSF is also recognized as a general defence strategy against adversarial attacks. In this paper, we attack fusion models from the camera modality, which is considered to be of lesser importance in fusion but is more affordable for attackers. We argue that the weakest link of a fusion model is its most vulnerable modality, and propose an attack framework that targets advanced camera-LiDAR fusion-based 3D object detection models through camera-only adversarial attacks. Our approach employs a two-stage optimization-based strategy that first thoroughly evaluates vulnerable image areas under adversarial attacks, and then applies dedicated attack strategies to different fusion models to generate deployable patches. Evaluations with six advanced camera-LiDAR fusion models and one camera-only model show that our attacks successfully compromise all of them. Our approach can either decrease the mean average precision (mAP) of detection from 0.824 to 0.353, or degrade the detection score of a target object from 0.728 to 0.156, demonstrating the efficacy of the proposed attack framework. Code is available.
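The two-stage strategy described above can be illustrated with a minimal, self-contained sketch. This is not the paper's implementation: the scalar `detection_score` is a toy stand-in for a fusion model's camera-branch confidence, the candidate regions are made up, and the finite-difference sign-descent optimizer is an illustrative assumption. Stage 1 probes candidate image regions to rank their vulnerability; stage 2 optimizes a patch inside the most vulnerable region to suppress the score.

```python
import numpy as np

def detection_score(img):
    # Toy stand-in for a detector's confidence on a target object:
    # dominated by the centre region, with a small global term.
    h, w = img.shape
    centre = img[h // 3: 2 * h // 3, w // 3: 2 * w // 3]
    return float(0.9 * centre.mean() + 0.1 * img.mean())

def stage1_rank_regions(img, regions):
    # Stage 1 (vulnerability probe): blank out each candidate region and
    # measure how far the score falls; a larger drop marks a more
    # vulnerable area for the patch.
    base = detection_score(img)
    drops = []
    for (r0, r1, c0, c1) in regions:
        probe = img.copy()
        probe[r0:r1, c0:c1] = 0.0
        drops.append(base - detection_score(probe))
    return regions[int(np.argmax(drops))]

def stage2_optimize_patch(img, region, steps=30, lr=0.1, delta=1e-4):
    # Stage 2 (patch optimization): sign-of-gradient descent on the patch
    # pixels, with the gradient estimated by finite differences, keeping
    # pixel values in [0, 1] so the patch stays physically printable.
    r0, r1, c0, c1 = region
    adv = img.copy()
    for _ in range(steps):
        grad = np.zeros((r1 - r0, c1 - c0))
        for i in range(r1 - r0):
            for j in range(c1 - c0):
                bumped = adv.copy()
                bumped[r0 + i, c0 + j] += delta
                grad[i, j] = (detection_score(bumped) - detection_score(adv)) / delta
        adv[r0:r1, c0:c1] = np.clip(
            adv[r0:r1, c0:c1] - lr * np.sign(grad), 0.0, 1.0
        )
    return adv

img = np.full((12, 12), 0.7)                 # uniform toy "camera image"
regions = [(0, 4, 0, 4), (4, 8, 4, 8)]       # corner vs. centre candidates
target = stage1_rank_regions(img, regions)   # centre region is more sensitive
adv = stage2_optimize_patch(img, target)
print(detection_score(img), detection_score(adv))
```

Against a real fusion model the score would come from the network, stage 1 would use adversarial (not blanking) perturbations, and stage 2 would backpropagate exact gradients, but the division of labour between the two stages is the same.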
