Multimodal Attack Detection for Action Recognition Models (2404.10790v1)

Published 13 Apr 2024 in cs.CR, cs.AI, cs.CV, and cs.LG

Abstract: Adversarial machine learning attacks on video action recognition models are a growing research area, and many effective attacks have been introduced in recent years. These attacks show that action recognition models can be breached in many ways, so using these models in practice raises significant security concerns. However, very few works focus on defending against or detecting such attacks. In this work, we propose a novel universal detection method that is compatible with any action recognition model. In extensive experiments, we show that our method consistently detects various attacks against different target models with high true positive rates while maintaining very low false positive rates. Tested against four state-of-the-art attacks targeting four action recognition models, the proposed detector achieves an average AUC of 0.911 over 16 test cases, while the best performance achieved by existing detectors is an average AUC of 0.645. This 41.2% improvement is enabled by the robustness of the proposed detector to varying attack methods and target models. The lowest AUC achieved by our detector across the 16 test cases is 0.837, whereas the competing detectors' performance drops as low as 0.211. We also show that the proposed detector is robust to varying attack strengths. In addition, we analyze our method's real-time performance on different hardware setups to demonstrate its potential as a practical defense mechanism.
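As a quick illustration of the evaluation metric behind these numbers, the following Python sketch (not the authors' implementation; the detector scores here are synthetic placeholders) shows how a detector's AUC is computed from its scores on clean versus attacked videos, and verifies that the reported 41.2% figure is the relative gain of 0.911 over 0.645 in average AUC.

```python
# Illustrative sketch only -- not the paper's code.
# Demonstrates AUC-based scoring of an adversarial-input detector.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Hypothetical detector outputs: higher score = "more likely adversarial".
clean_scores = rng.normal(loc=0.3, scale=0.10, size=100)        # benign videos
adversarial_scores = rng.normal(loc=0.7, scale=0.15, size=100)  # attacked videos

y_true = np.concatenate([np.zeros(100), np.ones(100)])  # 0 = clean, 1 = attack
y_score = np.concatenate([clean_scores, adversarial_scores])

print(f"AUC: {roc_auc_score(y_true, y_score):.3f}")

# The abstract's 41.2% figure is the relative gain in average AUC:
print(f"relative gain: {(0.911 - 0.645) / 0.645:.1%}")  # -> 41.2%
```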
