Polynomial Width is Sufficient for Set Representation with High-dimensional Features (2307.04001v3)
Abstract: Set representation has become ubiquitous in deep learning for modeling the inductive bias of neural networks that are insensitive to the input order. DeepSets is the most widely used neural network architecture for set representation. It involves embedding each set element into a latent space with dimension $L$, followed by a sum pooling to obtain a whole-set embedding, and finally mapping the whole-set embedding to the output. In this work, we investigate the impact of the dimension $L$ on the expressive power of DeepSets. Previous analyses either oversimplified high-dimensional features to one-dimensional features or were limited to analytic activations, thereby diverging from practical use or resulting in $L$ that grows exponentially with the set size $N$ and feature dimension $D$. To investigate the minimal value of $L$ that achieves sufficient expressive power, we present two set-element embedding layers: (a) linear + power activation (LP) and (b) linear + exponential activation (LE). We demonstrate that $L$ being poly$(N, D)$ is sufficient for set representation using both embedding layers. We also provide a lower bound of $L$ for the LP embedding layer. Furthermore, we extend our results to permutation-equivariant set functions and the complex field.
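To make the architecture under study concrete, the following is a minimal sketch (not the authors' code) of DeepSets with the two set-element embedding layers described in the abstract: a shared linear map per element followed by either a power activation (LP) or an exponential activation (LE), then sum pooling, then a readout. The names `lp_embed`, `le_embed`, and `deepsets`, and the choice of a plain linear readout for $\rho$, are illustrative assumptions rather than the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, L = 5, 3, 16            # set size, feature dimension, latent (embedding) dimension
X = rng.normal(size=(N, D))   # one input set: N elements, each a D-dimensional feature

W = rng.normal(size=(D, L))   # linear map shared across all set elements
b = rng.normal(size=(L,))

def lp_embed(X, p=2):
    """LP embedding layer: linear map followed by an element-wise power activation."""
    return (X @ W + b) ** p

def le_embed(X):
    """LE embedding layer: linear map followed by an element-wise exponential activation."""
    return np.exp(X @ W + b)

def deepsets(X, embed, readout_w):
    """phi -> sum pooling -> rho; the output is invariant to permutations of the rows of X."""
    pooled = embed(X).sum(axis=0)   # sum pooling removes the dependence on element order
    return pooled @ readout_w       # rho: a linear readout to a scalar, chosen here for brevity

readout_w = rng.normal(size=(L,))
perm = rng.permutation(N)
out = deepsets(X, lp_embed, readout_w)
out_perm = deepsets(X[perm], lp_embed, readout_w)
assert np.allclose(out, out_perm)   # reordering the set elements leaves the output unchanged
```

The permutation check at the end illustrates the invariance property; the paper's question is how large the embedding dimension `L` must be for such a pipeline to represent all set functions.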