Temporal Shuffling for Defending Deep Action Recognition Models against Adversarial Attacks (2112.07921v2)

Published 15 Dec 2021 in cs.CV

Abstract: Recently, video-based action recognition methods using convolutional neural networks (CNNs) have achieved remarkable recognition performance. However, there is still a lack of understanding of the generalization mechanism of action recognition models. In this paper, we suggest that action recognition models rely on motion information less than expected, and thus they are robust to randomization of frame order. Furthermore, we find that the motion monotonicity remaining after randomization also contributes to this robustness. Based on these observations, we develop a novel defense method that uses temporal shuffling of input videos to protect action recognition models against adversarial attacks. Another observation enabling our defense is that adversarial perturbations on videos are sensitive to temporal destruction. To the best of our knowledge, this is the first attempt to design a defense method that requires no additional training for 3D CNN-based video action recognition models.
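
The defense idea described in the abstract (randomly permuting the frame order of an input clip before classification, with no retraining) can be illustrated with a minimal PyTorch sketch. Everything below is assumed for illustration only: the helper names `temporal_shuffle` and `defended_predict`, the (C, T, H, W) clip layout, the number of shuffled copies, and the averaging of logits are not taken from the paper, whose exact shuffling scheme and aggregation may differ.

```python
import torch

def temporal_shuffle(video, generator=None):
    # Randomly permute the temporal (frame) dimension of a single clip.
    # Assumes the clip is laid out as (C, T, H, W), a common input format
    # for 3D-CNN action recognition models; adjust the dim if yours differs.
    num_frames = video.shape[1]
    perm = torch.randperm(num_frames, generator=generator)
    return video[:, perm]

def defended_predict(model, video, n_shuffles=5):
    # Illustrative, training-free defense: classify several temporally
    # shuffled copies of the clip and average the class scores.
    # The paper's actual procedure may aggregate differently.
    model.eval()
    with torch.no_grad():
        logits = torch.stack([
            model(temporal_shuffle(video).unsqueeze(0)).squeeze(0)
            for _ in range(n_shuffles)
        ])
    return logits.mean(dim=0)
```

Because the shuffling happens only at inference time, such a wrapper can be placed around any pretrained 3D-CNN classifier without modifying its weights, which matches the abstract's claim of a defense that needs no additional training.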
