Hierarchical Invariance for Robust and Interpretable Vision Tasks at Larger Scales (2402.15430v2)

Published 23 Feb 2024 in cs.CV and cs.LG

Abstract: Developing robust and interpretable vision systems is a crucial step towards trustworthy artificial intelligence. In this regard, a promising paradigm embeds task-required invariant structures, e.g., geometric invariance, in the fundamental image representation. However, such invariant representations typically exhibit limited discriminability, which restricts their use in larger-scale trustworthy vision tasks. To address this open problem, we conduct a systematic investigation of hierarchical invariance from theoretical, practical, and application perspectives. At the theoretical level, we show how to construct over-complete invariants with a Convolutional Neural Network (CNN)-like hierarchical architecture, yet in a fully interpretable manner; the general blueprint, specific definitions, invariant properties, and numerical implementations are provided. At the practical level, we discuss how to adapt this theoretical framework to a given task. Thanks to the over-completeness, discriminative features for the task can be formed adaptively, in a Neural Architecture Search (NAS)-like manner. We support these arguments with accuracy, invariance, and efficiency results on texture, digit, and parasite classification experiments. Furthermore, at the application level, our representations are explored in real-world forensic tasks involving adversarial perturbations and Artificial Intelligence Generated Content (AIGC). These applications show that the proposed strategy not only realizes the theoretically promised invariance but also exhibits competitive discriminability, even in the era of deep learning. For robust and interpretable vision tasks at larger scales, hierarchical invariant representations can be considered an effective alternative to traditional CNNs and invariants.
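
The abstract describes two mechanisms: a CNN-like cascade of interpretable invariant layers producing an over-complete descriptor, and a NAS-like selection of task-discriminative features from that descriptor. Below is a minimal, self-contained Python sketch of this general idea, using a scattering-style cascade (fixed Gabor-like filters, modulus nonlinearity, spatial averaging) and a crude variance-based ranking in place of the search. All names, filter choices, and parameters here are illustrative assumptions, not the authors' implementation.

```python
# Toy sketch of a hierarchical invariant representation (NOT the paper's code).
# Each layer convolves with a fixed, interpretable filter bank, takes the
# modulus, and pools the spatial mean (an approximately shift-invariant
# statistic). Stacking all layers' statistics yields an over-complete
# descriptor; a variance ranking stands in for the NAS-like feature selection.
import numpy as np

def gabor_bank(size=7, n_orient=4):
    """Fixed, interpretable filter bank: complex Gabor-like filters at several orientations."""
    ys, xs = np.mgrid[-(size // 2):size // 2 + 1, -(size // 2):size // 2 + 1]
    filters = []
    for k in range(n_orient):
        theta = np.pi * k / n_orient
        u = xs * np.cos(theta) + ys * np.sin(theta)            # oriented coordinate
        env = np.exp(-(xs**2 + ys**2) / (2 * (size / 4) ** 2))  # Gaussian envelope
        filters.append(env * np.exp(1j * 2 * np.pi * u / size))
    return filters

def conv2_circ(img, kern):
    """Circular 2-D convolution via FFT (adequate for this toy example)."""
    H, W = img.shape
    return np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(kern, s=(H, W)))

def hierarchical_invariants(img, depth=2):
    """Cascade filter -> modulus, pooling each layer's spatial mean as an invariant."""
    layers, feats = [img.astype(complex)], []
    bank = gabor_bank()
    for _ in range(depth):
        nxt = []
        for x in layers:
            for f in bank:
                m = np.abs(conv2_circ(x, f))  # modulus: discards local phase
                feats.append(m.mean())        # spatial average: ~shift-invariant
                nxt.append(m.astype(complex))
        layers = nxt
    return np.array(feats)  # over-complete invariant descriptor

# Usage: rank features by variance across a small (synthetic) image set,
# a crude stand-in for the paper's task-adaptive, NAS-like selection.
rng = np.random.default_rng(0)
imgs = [rng.random((32, 32)) for _ in range(8)]
F = np.stack([hierarchical_invariants(im) for im in imgs])
top = np.argsort(F.var(axis=0))[::-1][:10]  # keep the 10 most variable features
print("selected feature indices:", top)
```

The design choice mirrored here is that every operation (fixed filters, modulus, averaging) is analytically specified rather than learned, so the invariance of each feature can be reasoned about directly; only the final selection step is data-driven.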

