Towards Explaining Hypercomplex Neural Networks

Published 26 Mar 2024 in cs.CV (arXiv:2403.17929v1)

Abstract: Hypercomplex neural networks are gaining increasing interest in the deep learning community. The attention directed towards hypercomplex models stems from several aspects, ranging from purely theoretical and mathematical characteristics to the practical advantage of lightweight models over conventional networks, and their unique ability to capture both global and local relations. In particular, a branch of these architectures, parameterized hypercomplex neural networks (PHNNs), has gained popularity due to its versatility across a multitude of application domains. Nonetheless, only a few attempts have been made to explain or interpret their intricacies. In this paper, we propose inherently interpretable PHNNs and quaternion-like networks, thus requiring no post-hoc explanation method. To achieve this, we define a type of cosine-similarity transform within the parameterized hypercomplex domain. This PHB-cos transform induces weight alignment with relevant input features and allows the network to be reduced to a single linear transform, rendering it directly interpretable. In this work, we begin to draw insights into how this unique branch of neural models operates. We observe that hypercomplex networks tend to concentrate on the shape surrounding the main object of interest, in addition to the shape of the object itself. We provide a thorough analysis, studying single neurons of different layers and comparing them against how real-valued networks learn. The code of the paper is available at https://github.com/ispamm/HxAI.
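To make the alignment idea concrete, the sketch below implements a real-valued B-cos-style linear unit, following the general form of B-cos networks (Böhle et al., CVPR 2022): each unit's weight vector is normalized to unit length, and the ordinary dot product is rescaled by |cos∠(x, w)|^(B−1), so the unit only produces large outputs when its weight direction aligns with the input. This is a minimal real-valued illustration only; the paper's actual PHB-cos transform operates in the parameterized hypercomplex domain, which this sketch does not implement, and the function name and parameters are our own.

```python
import numpy as np

def bcos_linear(x, W, B=2.0, eps=1e-9):
    """Real-valued B-cos-style linear transform (illustrative sketch).

    x : (d,) input vector.
    W : (k, d) weight matrix; each row is one output unit's weights.
    B : alignment pressure; B=1 recovers a plain (normalized) linear map.
    """
    # Normalize each weight row to unit length.
    W_hat = W / (np.linalg.norm(W, axis=1, keepdims=True) + eps)
    lin = W_hat @ x                         # \hat{w}^T x per output unit
    cos = lin / (np.linalg.norm(x) + eps)   # cosine between x and each \hat{w}
    # Downweight misaligned inputs; keep the sign via the linear term.
    return np.abs(cos) ** (B - 1.0) * lin
```

Because the output is the linear term times an input-dependent scalar, a stack of such units collapses, for a fixed input, into a single linear transform whose weights can be visualized directly, which is the mechanism behind the "inherently interpretable" claim above.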
