On the Adversarial Robustness of Camera-based 3D Object Detection (2301.10766v2)

Published 25 Jan 2023 in cs.CV and cs.AI

Abstract: In recent years, camera-based 3D object detection has gained widespread attention for its ability to achieve high performance at low computational cost. However, the robustness of these methods to adversarial attacks has not been thoroughly examined, especially considering their deployment in safety-critical domains such as autonomous driving. In this study, we conduct the first comprehensive investigation of the robustness of leading camera-based 3D object detection approaches under various adversarial conditions. We systematically analyze the resilience of these models under two attack settings, white-box and black-box, focusing on two primary objectives, classification and localization. Additionally, we examine two types of adversarial attack techniques: pixel-based and patch-based. Our experiments yield four interesting findings: (a) bird's-eye-view-based representations exhibit stronger robustness against localization attacks; (b) depth-estimation-free approaches have the potential to show stronger robustness; (c) accurate depth estimation effectively improves robustness for depth-estimation-based methods; (d) incorporating multi-frame benign inputs can effectively mitigate adversarial attacks. We hope our findings can steer the development of future camera-based object detection models with enhanced adversarial robustness.
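The pixel-based and patch-based attacks contrasted above share the same white-box recipe: take the gradient of the model's loss with respect to the input pixels and step in the sign of that gradient (FGSM), optionally confining the perturbation to a small mask region to emulate a patch. The following is a minimal sketch on a toy logistic "classifier"; the model, names, and parameters are illustrative stand-ins, not the detectors or attack settings used in the paper:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_attack(x, y, w, b, eps=0.1, mask=None):
    """One-step FGSM: perturb x in the gradient-sign direction to raise the loss.

    If `mask` is given, the perturbation is confined to the masked pixels,
    a crude stand-in for a patch-based attack.
    """
    p = sigmoid(w @ x + b)                # model confidence for class 1
    grad = (p - y) * w                    # d(cross-entropy)/dx for a logistic model
    step = eps * np.sign(grad)            # FGSM: move each pixel by +/- eps
    if mask is not None:
        step = step * mask                # patch attack: only touch masked pixels
    return np.clip(x + step, 0.0, 1.0)    # keep pixels in the valid range

def xent_loss(x, y, w, b):
    p = sigmoid(w @ x + b)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

rng = np.random.default_rng(0)
w = rng.normal(size=16)                   # toy "detector" weights
b = 0.0
x = rng.uniform(0.2, 0.8, size=16)        # an "image" as a flat pixel vector
y = 1.0                                   # true label

x_pixel = fgsm_attack(x, y, w, b, eps=0.1)            # pixel-based: perturb everything
mask = np.zeros(16)
mask[:4] = 1.0                                        # "patch" covering 4 of 16 pixels
x_patch = fgsm_attack(x, y, w, b, eps=0.1, mask=mask) # patch-based: perturb the patch only
```

Both variants increase the model's loss on the perturbed input; the patch variant does so while leaving all pixels outside the mask untouched, which is what makes such attacks physically realizable.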

Authors (4)
  1. Shaoyuan Xie
  2. Zichao Li
  3. Zeyu Wang
  4. Cihang Xie
Citations (14)