
Permutation-Invariant Set Functions

Updated 9 February 2026
  • Permutation-invariant set functions are mappings defined on sets whose output does not change when the inputs are reordered, ensuring that aggregation in deep learning models is independent of input order.
  • They underpin architectures like Deep Sets, attention-based models, Janossy pooling, and dot-product decomposition, offering robustness and scalability for unordered data.
  • These functions enable efficient processing in various applications such as point cloud segmentation, sensor networks, and multi-agent systems by maintaining invariance to input order.

A permutation-invariant set function is a mapping defined on finite or countably infinite sets such that its output is unaffected by any reordering of its inputs. In contrast to functions defined on ordered vectors or sequences, such functions are central to machine learning settings where set structure, not order, is fundamental—examples include processing unordered point clouds, multi-instance learning, molecular graphs, and variable sensor configurations. The field encompasses mathematical characterization, neural architectures, optimization techniques, and applications in settings that demand strict or partial invariance to permutation.

1. Formal Definitions and Theoretical Foundations

A function $f: \mathcal{X}^n \to \mathcal{Y}$ is permutation-invariant if, for every permutation $\pi$ of $\{1,\ldots,n\}$,

$$f(x_{\pi(1)}, \ldots, x_{\pi(n)}) = f(x_1, \ldots, x_n).$$

Permutation equivariance is related: a function $g: \mathcal{X}^n \to \mathcal{Z}^n$ is permutation-equivariant if permuting the inputs permutes the outputs in the same way:

$$g(x_{\pi(1)}, \ldots, x_{\pi(n)}) = \left( (g(x))_{\pi(1)}, \ldots, (g(x))_{\pi(n)} \right).$$

These definitions extend naturally to multisets and to variable-size input domains (Lee et al., 2018, Tabaghi et al., 2023).
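As a concrete check of the two definitions, the following minimal NumPy sketch (the specific choices of pooled map and element-wise map are purely illustrative) verifies that a sum-pooled function is permutation-invariant while an element-wise map is permutation-equivariant:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))      # a set of n=5 elements in R^3
perm = rng.permutation(5)        # a random permutation pi

# Permutation-invariant f: sum-pool the elements, then apply a fixed map.
def f(x):
    return np.tanh(x.sum(axis=0))

# Permutation-equivariant g: apply the same map to every element.
def g(x):
    return np.tanh(x)

assert np.allclose(f(x[perm]), f(x))        # invariance: output unchanged
assert np.allclose(g(x[perm]), g(x)[perm])  # equivariance: output permuted the same way
print("invariance and equivariance checks passed")
```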

The foundational universality result states that every continuous permutation-invariant function $f$ (on a compact domain) can be represented as

$$f(\{x_1, \ldots, x_n\}) = \rho\left(\sum_{i=1}^n \phi(x_i)\right)$$

for suitable continuous $\phi$ and $\rho$ (Lee et al., 2018, Kimura et al., 2024). This sum-decomposition principle underlies most modern set neural architectures. Importantly, for functions on sets of $D$-dimensional vectors, universality holds with latent dimension scaling as $O(N^D)$, where $N$ bounds the set size. However, for identifiable multisets (those where an element-wise identifier can separate element identities), this can be reduced to $2DN$ (Tabaghi et al., 2023).

The expressiveness of max- or sum-decomposable set functions is limited by the latent dimension; stricter lower bounds show exact representation requires the latent size to scale with set cardinality (Kimura et al., 2024).

2. Architectural Frameworks for Permutation-Invariance

Deep Sets

The Deep Sets architecture (Kimura et al., 2024) embodies the sum-decomposition

$$f(X) = \rho\left(\sum_{x \in X} \phi(x)\right),$$

where $\phi$ and $\rho$ are neural networks, with sum- or mean-pooling providing invariance. Theoretical results guarantee this form is universal for continuous set functions (Lee et al., 2018, Tabaghi et al., 2023, Kimura et al., 2024).
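A minimal PyTorch sketch of this sum-decomposition; the layer sizes and the invariance check are illustrative rather than taken from any particular implementation:

```python
import torch
import torch.nn as nn

class DeepSets(nn.Module):
    """f(X) = rho(sum_x phi(x)) with phi and rho as small MLPs."""
    def __init__(self, in_dim=3, hidden=128, out_dim=10):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden))
        self.rho = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, out_dim))

    def forward(self, X):              # X: (batch, n, in_dim); n may vary between batches
        z = self.phi(X).sum(dim=1)     # permutation-invariant pooling over the set axis
        return self.rho(z)

model = DeepSets()
X = torch.randn(4, 20, 3)              # 4 sets of 20 elements each
perm = torch.randperm(20)
assert torch.allclose(model(X), model(X[:, perm]), atol=1e-5)  # order does not matter
```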

Variants generalize the aggregation by replacing summation with a quasi-arithmetic or Hölder mean for improved expressiveness:

$$M_p(x_1,\dots,x_n) = \left(\frac{1}{n}\sum_{i=1}^n x_i^p\right)^{1/p}.$$

Letting $p$ be learnable yields "Hölder's Power Deep Sets" (Kimura et al., 2024).
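A hedged sketch of a learnable power-mean pooling layer; the softplus shift used to keep features strictly positive (so that fractional powers are well defined) is an assumption of this sketch, not part of the formula above:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PowerMeanPool(nn.Module):
    """M_p(x_1,...,x_n) = ((1/n) * sum_i x_i^p)^(1/p), with a learnable scalar p."""
    def __init__(self, p_init=1.0):
        super().__init__()
        self.p = nn.Parameter(torch.tensor(p_init))

    def forward(self, X):                    # X: (batch, n, d)
        Xpos = F.softplus(X) + 1e-6          # keep features strictly positive
        return Xpos.pow(self.p).mean(dim=1).pow(1.0 / self.p)

pool = PowerMeanPool(p_init=2.0)
X = torch.randn(2, 7, 16)
print(pool(X).shape)                         # torch.Size([2, 16])
```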

Attention-based Architectures

Attention mechanisms, as used in Set Transformer and related models, encode cross-element interactions through self-attention blocks that are permutation-equivariant (Lee et al., 2018). The architecture consists of:

  • Stacked self-attention layers (Set Attention Blocks, SAB or Induced Set Attention Blocks, ISAB) to model pairwise or higher-order interactions.
  • A permutation-invariant pooling (Pooling by Multihead Attention, PMA).

The induced attention scheme reduces computational complexity from $O(n^2)$ to $O(nm)$, with $m$ inducing points, enabling scalability to large sets.
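A minimal sketch of induced attention and attention pooling built on torch.nn.MultiheadAttention; the block structure and dimensions are illustrative and omit the residual and feed-forward details of the published SAB/ISAB/PMA blocks:

```python
import torch
import torch.nn as nn

class InducedSetAttention(nn.Module):
    """ISAB-style block: attend n elements to m learned inducing points and back, O(n*m)."""
    def __init__(self, dim=64, num_heads=4, num_inducing=16):
        super().__init__()
        self.inducing = nn.Parameter(torch.randn(1, num_inducing, dim))
        self.att1 = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.att2 = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, X):                      # X: (batch, n, dim)
        I = self.inducing.expand(X.size(0), -1, -1)
        H, _ = self.att1(I, X, X)              # inducing points attend to the set
        Y, _ = self.att2(X, H, H)              # set attends back: permutation-equivariant
        return Y

class AttentionPool(nn.Module):
    """PMA-style pooling: learned seed vectors attend to the set, giving an invariant output."""
    def __init__(self, dim=64, num_heads=4, num_seeds=1):
        super().__init__()
        self.seeds = nn.Parameter(torch.randn(1, num_seeds, dim))
        self.att = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, X):
        S = self.seeds.expand(X.size(0), -1, -1)
        out, _ = self.att(S, X, X)
        return out                              # (batch, num_seeds, dim)

X = torch.randn(2, 100, 64)
pooled = AttentionPool()(InducedSetAttention()(X))
print(pooled.shape)                             # torch.Size([2, 1, 64])
```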

Janossy Pooling

Janossy pooling expresses any permutation-invariant function as an explicit average over all $n!$ input orderings:

$$f(X) = \frac{1}{n!} \sum_{\pi \in S_n} g(x_{\pi(1)}, \ldots, x_{\pi(n)}),$$

where $g$ is an arbitrary function (e.g., a sequential neural network). Computationally tractable variants include canonical orderings, $k$-order interactions (subset-wise aggregation), and stochastic averages over random permutations ($\pi$-SGD). Janossy pooling offers arbitrarily rich expressiveness at the cost of factorial or polynomial complexity, bridging Deep Sets and attention models (Murphy et al., 2018, Kimura et al., 2024).
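A hedged sketch of the stochastic ($\pi$-SGD-style) variant: during training a single random ordering is fed through an order-sensitive sequence model, and at inference several sampled orderings are averaged; the GRU-based $g$ and all sizes are illustrative choices for this sketch:

```python
import torch
import torch.nn as nn

class StochasticJanossy(nn.Module):
    """Approximate (1/n!) * sum_pi g(x_pi) by sampling random permutations."""
    def __init__(self, in_dim=3, hidden=64, out_dim=10, num_eval_samples=20):
        super().__init__()
        self.g = nn.GRU(in_dim, hidden, batch_first=True)   # order-sensitive g
        self.head = nn.Linear(hidden, out_dim)
        self.num_eval_samples = num_eval_samples

    def forward(self, X):                                    # X: (batch, n, in_dim)
        n = X.size(1)
        if self.training:                                    # pi-SGD: one sampled ordering
            perms = [torch.randperm(n)]
        else:                                                # test time: average a few orderings
            perms = [torch.randperm(n) for _ in range(self.num_eval_samples)]
        outs = []
        for perm in perms:
            _, h = self.g(X[:, perm])                        # final hidden state of the sequence
            outs.append(self.head(h[-1]))
        return torch.stack(outs).mean(dim=0)

model = StochasticJanossy().eval()
X = torch.randn(2, 12, 3)
print(model(X).shape)                                        # torch.Size([2, 10])
```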

Dot-Product Decomposition

The DuMLP-Pin architecture demonstrates that any permutation-invariant function $f: \mathbb{R}^{N \times p} \to \mathbb{R}^{s \times t}$ can be realized as the dot-product of two permutation-equivariant functions,

$$f(X) = [g^{(1)}(X)]^T g^{(2)}(X),$$

for $N \geq \min\{s,t\}$, where $g^{(1)}, g^{(2)}$ are row-wise MLPs (Fei et al., 2022). This yields a constrained Deep Sets form with superior parameter efficiency.
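A sketch of this dot-product decomposition using two row-wise (hence permutation-equivariant) MLPs; the dimensions are illustrative, and the exact layer design of DuMLP-Pin is not reproduced here:

```python
import torch
import torch.nn as nn

class DotProductPool(nn.Module):
    """f(X) = g1(X)^T g2(X): an (s x t) permutation-invariant output from an (N x p) set."""
    def __init__(self, p=3, s=8, t=16, hidden=64):
        super().__init__()
        self.g1 = nn.Sequential(nn.Linear(p, hidden), nn.ReLU(), nn.Linear(hidden, s))
        self.g2 = nn.Sequential(nn.Linear(p, hidden), nn.ReLU(), nn.Linear(hidden, t))

    def forward(self, X):                 # X: (batch, N, p)
        A = self.g1(X)                    # (batch, N, s), row-wise -> permutation-equivariant
        B = self.g2(X)                    # (batch, N, t), row-wise -> permutation-equivariant
        return A.transpose(1, 2) @ B      # (batch, s, t): the sum over N makes it invariant

pool = DotProductPool()
X = torch.randn(4, 50, 3)
perm = torch.randperm(50)
assert torch.allclose(pool(X), pool(X[:, perm]), atol=1e-4)
```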

Partial Permutation Invariance

For structured input domains, such as heterogeneous graphs, it is often desirable that a function be invariant within certain partitions of the input but not globally. The PINE framework formalizes "partial permutation invariance": a function $f$ is invariant to the ordering within each group (e.g., neighbor type) but not to the order of the groups (Gui et al., 2019). The universal approximator then becomes

$$f(X_1,\ldots,X_K) = \phi\left(\sum_{n=1}^{N_1} \rho_1(x_{1,n}), \ldots, \sum_{n=1}^{N_K} \rho_K(x_{K,n})\right),$$

where $X_k$ is the set for block $k$.
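A minimal sketch of partial permutation invariance with per-group encoders, within-group sum-pooling, and a group-order-sensitive combination; the group count and layer sizes are illustrative:

```python
import torch
import torch.nn as nn

class PartiallyInvariant(nn.Module):
    """Invariant to ordering within each group X_k, but not to the order of the groups."""
    def __init__(self, in_dim=3, hidden=32, num_groups=2, out_dim=5):
        super().__init__()
        self.rhos = nn.ModuleList([
            nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, hidden))
            for _ in range(num_groups)])
        self.phi = nn.Sequential(nn.Linear(num_groups * hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, out_dim))

    def forward(self, groups):                      # groups: list of (batch, N_k, in_dim) tensors
        pooled = [rho(Xk).sum(dim=1) for rho, Xk in zip(self.rhos, groups)]
        return self.phi(torch.cat(pooled, dim=-1))  # concatenation keeps group order relevant

model = PartiallyInvariant()
X1, X2 = torch.randn(2, 6, 3), torch.randn(2, 9, 3)
print(model([X1, X2]).shape)                        # torch.Size([2, 5])
```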

3. Polynomial and Sum-Decomposition Representations

The structure of symmetric (permutation-invariant) polynomial functions enables rigorous approximation error analysis, parameter counting, and complexity bounds. Polynomial approximation theorems establish that for $f$ symmetric on $\mathbb{R}^{N \times d}$, the "pooled basis" consisting of sums $\sum_{j=1}^N \phi_v(x_j)$, where $\phi_v$ ranges over monomials, supports uniform approximation rates that nearly eliminate the curse of dimensionality (Bachmayr et al., 2021). Explicit parameter and error bounds are available:

  • The number of pooled features required can be nearly independent of $N$ for fixed degree.
  • Symmetry reduces both parameterization and evaluation cost.

This analysis justifies the mathematical backbone of Deep Sets and its descendants.
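As a concrete illustration of the pooled basis, the following NumPy sketch computes the sums $\sum_j \phi_v(x_j)$ over all monomials $\phi_v$ up to a fixed degree and checks their permutation invariance (the helper name and degree cutoff are illustrative):

```python
import numpy as np
from itertools import combinations_with_replacement

def pooled_monomial_features(X, degree=2):
    """Pooled basis sum_j phi_v(x_j): sums over the set of all monomials up to a given degree."""
    N, d = X.shape
    feats = []
    for deg in range(1, degree + 1):
        for idx in combinations_with_replacement(range(d), deg):      # multi-index v with |v| = deg
            feats.append(np.prod(X[:, list(idx)], axis=1).sum())      # sum_j prod_k x_{j, idx_k}
    return np.array(feats)

X = np.random.default_rng(0).normal(size=(10, 3))                     # a set of N=10 points in R^3
perm = np.random.default_rng(1).permutation(10)
assert np.allclose(pooled_monomial_features(X), pooled_monomial_features(X[perm]))
print(pooled_monomial_features(X).shape)                              # (9,): 3 degree-1 + 6 degree-2 monomials
```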

4. Permutation-Invariant Architectures in Practice

Operator Networks and Variable-Input Settings

Permutation-invariant set encodings enable operator networks (e.g., SetONet) to generalize DeepONet to variable input sampling, missing data, and irregular grids by processing the input as an unordered set of location–value pairs and aggregating through Deep Set blocks (Tretiakov et al., 7 May 2025). This design achieves:

  • Robustness to missing sensors.
  • Improved accuracy on strongly nonlinear PDE operators.
  • Elimination of the need for input interpolation during inference.
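A hedged sketch of the set-encoding idea behind such operator networks (not the actual SetONet implementation): each sensor reading is treated as a location–value pair, embedded element-wise, and mean-pooled, so missing or irregularly placed sensors simply change the set size:

```python
import torch
import torch.nn as nn

class SetEncoder(nn.Module):
    """Encode an unordered, variable-size set of (location, value) pairs into a fixed vector."""
    def __init__(self, loc_dim=1, hidden=64, latent=128):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(loc_dim + 1, hidden), nn.ReLU(),
                                 nn.Linear(hidden, latent))

    def forward(self, locs, vals):               # locs: (batch, n, loc_dim), vals: (batch, n, 1)
        pairs = torch.cat([locs, vals], dim=-1)  # each element is a location-value pair
        return self.phi(pairs).mean(dim=1)       # mean-pooling is robust to varying sensor count

enc = SetEncoder()
locs, vals = torch.rand(2, 37, 1), torch.randn(2, 37, 1)   # 37 irregularly placed sensors
print(enc(locs, vals).shape)                                # torch.Size([2, 128])
```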

Autoencoders and Multi-Agent Systems

Set autoencoders (e.g., PISA) leverage sum-decomposition with latent "key–value" tricks to obtain a fixed-size set representation whose dimensionality does not grow with cardinality, enabling exact permutation invariance and scalability to large sets (Kortvelesy et al., 2023). PISA is demonstrated to achieve nearly lossless set reconstruction and efficient inference, and it supports insertion and deletion in latent space, with applications to multi-agent communication in graph neural networks.

Permutation-Invariant Language Modeling

Set-LLM adapts LLMs to be permutation-invariant over set-valued segments of input (e.g., unordered multiple-choice options), using attention masks (SetMask) and set-specific positional encodings (SetPE). This delivers provable invariance with exact recovery under all answer orderings, eliminating the need for majority-voting across permutations at inference (Egressy et al., 21 May 2025).

Efficient and Scalable Global Aggregation

DuMLP-Pin achieves close-to-state-of-the-art classification and segmentation on point cloud and attribute datasets with up to 95% parameter reduction and linear inference complexity, demonstrating that dot-product decomposition is sufficiently expressive for demanding global aggregation tasks while retaining efficiency over local attention models (Fei et al., 2022).

5. Advanced Topics: Limitations, Expressiveness, and Open Problems

The universality of Deep Sets and similar architectures is conditioned on the latent dimension scaling at least linearly or polynomially with input set size and ambient dimension (Tabaghi et al., 2023, Kimura et al., 2024). Comprehensive polynomial representations can require exponential latent dimension unless one restricts to identifiable multisets or tolerates approximate representations (Bachmayr et al., 2021).

Current limitations include:

  • Latent bottleneck: For very large $N$, model parameter and memory costs become prohibitive.
  • Weakness in high-order correlations: Simple sum-pooling cannot capture, e.g., the median or majority-of-$n$ without very large latent features or explicit high-order pooling.
  • Computational cost of full Janossy pooling: Exact symmetrization is intractable beyond $N \sim 10$; $k$-order or attention-based approximations are the practical path.
  • Dataset scarcity: Large-scale, standardized set-structured datasets lag behind image or sequence domains.

Empirical and theoretical characterization of recurrent attention aggregators, optimal aggregation functions, and trade-offs between complexity, expressivity, and generalization capacity remain active areas of investigation (Kimura et al., 2024, Murphy et al., 2018).

6. Applications and Impact

Permutation-invariant set functions underpin a wide range of applications, including point cloud classification and segmentation, operator learning on irregular sensor grids, multi-agent communication, and order-invariant handling of set-valued inputs in LLMs.

In computational physics, chemistry, and materials science, high-dimensional symmetric polynomial approximations are critical for cluster expansion and Jastrow–Slater wavefunction modeling (Bachmayr et al., 2021).

7. Comparative Summary of Key Architectures

| Architecture | Aggregation | Interaction Modeling | Complexity | Universality |
|---|---|---|---|---|
| Deep Sets | Sum, mean, quasi-arithmetic mean | None (per-element only) | $O(n)$ | Universal* |
| Set Transformer | Attention, PMA | Pairwise/high-order | $O(n^2)$ or $O(nm)$ | Universal |
| Janossy Pooling | All permutations; $k$-subset | High-order (tunable $k$) | $O(n!)$ / $O(n^k)$ | Universal |
| DuMLP-Pin | Dot-product of equivariant MLPs | Dot-product | $O(n)$ | Universal for $N \geq \min\{s,t\}$ |
| PICASO | Cascaded attention | Dynamic, high-order | $O(Lknd)$ | Empirically robust |
| PISA | Sum of key–value embeddings | None (decoder recovers by keys) | $O(n)$ | Empirically lossless (high $d_z$) |
| Set-LLM | Attention + set masking/PE | Cross-set crosstalk disabled | Linear in sequence length | Complete invariance for choices |

*Universal for continuous functions with sufficient latent dimension.


Permutation-invariant set functions form the core mathematical and algorithmic motif in set-structured deep learning, yielding expressive, robust, and scalable models for unordered data domains. Their principled design, enabled by sum-/attention-based aggregation and deep theoretical analysis, continues to fuel progress in set reasoning, generative modeling, cooperative multi-agent systems, and beyond. Recent technical advances address both practical computational barriers and the theoretical frontiers of universal approximation and latent complexity (Lee et al., 2018, Murphy et al., 2018, Tabaghi et al., 2023, Kimura et al., 2024, Fei et al., 2022, Tretiakov et al., 7 May 2025, Egressy et al., 21 May 2025).
