OrthCaps: An Orthogonal CapsNet with Sparse Attention Routing and Pruning (2403.13351v1)
Abstract: Redundancy is a persistent challenge in Capsule Networks (CapsNet), leading to high computational cost and large parameter counts. Although previous works have introduced pruning after the initial capsule layer, the fully connected nature of dynamic routing and its non-orthogonal weight matrices reintroduce redundancy in deeper layers. Moreover, dynamic routing must iterate to converge, further increasing computational demands. In this paper, we propose an Orthogonal Capsule Network (OrthCaps) to reduce redundancy, improve routing performance, and decrease parameter counts. First, an efficient pruned capsule layer is introduced to discard redundant capsules. Second, dynamic routing is replaced with orthogonal sparse attention routing, eliminating the need for iterations and fully connected structures. Third, the weight matrices used during routing are orthogonalized to sustain low capsule similarity; to the best of our knowledge, this is the first approach to introduce orthogonality into CapsNet. Our experiments on baseline datasets confirm the efficiency and robustness of OrthCaps in classification tasks, and ablation studies validate the importance of each component. Remarkably, OrthCaps-Shallow outperforms other Capsule Network benchmarks on four datasets while using only 110k parameters, a mere 1.25% of a standard Capsule Network's total; to the best of our knowledge, this is the smallest parameter count among existing Capsule Networks. Similarly, OrthCaps-Deep achieves competitive performance across four datasets with only 1.2% of the parameters required by its counterparts.
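To make the three components concrete, here is a minimal NumPy sketch of the ideas the abstract describes: pruning capsules by pairwise similarity, orthogonalizing a routing weight matrix, and a single non-iterative sparse routing pass. All function names, the QR-based orthogonalization, the greedy cosine-similarity pruning criterion, and the top-k sparsification are illustrative assumptions, not the paper's actual implementation (which uses its own pruning rule and sparse attention formulation).

```python
import numpy as np

def prune_similar(caps, thresh=0.9):
    """Greedy pruning: drop capsules whose cosine similarity to an
    already-kept capsule exceeds `thresh` (illustrative criterion,
    not the paper's exact pruning rule)."""
    kept = []
    for c in caps:
        c_n = c / np.linalg.norm(c)
        if all(abs(c_n @ (k / np.linalg.norm(k))) < thresh for k in kept):
            kept.append(c)
    return np.stack(kept)

def orthogonalize(W):
    """Replace W with a nearby orthogonal matrix via QR decomposition
    (a stand-in for the paper's orthogonalization of routing weights)."""
    Q, R = np.linalg.qr(W)
    Q *= np.sign(np.diag(R))  # sign convention: make diag(R) positive
    return Q

def sparse_attention_route(lower_caps, W, k=2):
    """One non-iterative routing pass: project lower capsules through an
    orthogonal weight matrix, score each vote by agreement with the mean
    vote, keep only the top-k scores (a simple top-k sparsification in
    place of the paper's sparse attention), and return the weighted sum."""
    votes = lower_caps @ W                   # (n_in, d_out)
    scores = votes @ votes.mean(axis=0)      # agreement with mean vote
    mask = np.zeros_like(scores)
    mask[np.argsort(scores)[-k:]] = 1.0      # zero out all but top-k
    weights = np.exp(scores) * mask
    weights /= weights.sum()
    return weights @ votes                   # (d_out,)

rng = np.random.default_rng(0)
W = orthogonalize(rng.standard_normal((8, 8)))
caps = prune_similar(rng.standard_normal((6, 8)), thresh=0.9)
out = sparse_attention_route(caps, W, k=2)
print(np.allclose(W.T @ W, np.eye(8)))  # True: W is orthogonal
```

Because the routing is a single feed-forward pass with most attention weights masked to zero, it avoids both the convergence iterations and the dense capsule-to-capsule connections that the abstract identifies as sources of redundancy.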