
Permutation-Invariant Set Functions

Updated 9 February 2026
  • Permutation-invariant set functions are mappings defined on sets whose output does not change when the inputs are reordered, ensuring that aggregation in deep learning models is independent of input order.
  • They underpin architectures like Deep Sets, attention-based models, Janossy pooling, and dot-product decomposition, offering robustness and scalability for unordered data.
  • These functions enable efficient processing in various applications such as point cloud segmentation, sensor networks, and multi-agent systems by maintaining invariance to input order.

A permutation-invariant set function is a mapping defined on finite or countably infinite sets such that its output is unaffected by any reordering of its inputs. In contrast to functions defined on ordered vectors or sequences, such functions are central to machine learning settings where set structure, not order, is fundamental—examples include processing unordered point clouds, multi-instance learning, molecular graphs, and variable sensor configurations. The field encompasses mathematical characterization, neural architectures, optimization techniques, and applications in settings that demand strict or partial invariance to permutation.

1. Formal Definitions and Theoretical Foundations

A function $f: \mathcal{X}^n \to \mathcal{Y}$ is permutation-invariant if, for every permutation $\pi$ of $\{1,\ldots,n\}$,

$$f(x_{\pi(1)}, \ldots, x_{\pi(n)}) = f(x_1, \ldots, x_n).$$

Permutation equivariance is related: a function $g: \mathcal{X}^n \to \mathcal{Z}^n$ is permutation-equivariant if permuting the inputs permutes the outputs in the same way:

$$g(x_{\pi(1)}, \ldots, x_{\pi(n)}) = \left( (g(x))_{\pi(1)}, \ldots, (g(x))_{\pi(n)} \right).$$

These definitions extend naturally to multisets and to variable-size input domains (Lee et al., 2018, Tabaghi et al., 2023).
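As a concrete check of the two definitions, the following minimal NumPy sketch (the specific choices of pooled map and element-wise map are purely illustrative) verifies that a sum-pooled function is permutation-invariant while an element-wise map is permutation-equivariant:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))      # a set of n=5 elements in R^3
perm = rng.permutation(5)        # a random permutation pi

# Permutation-invariant f: sum-pool the elements, then apply a fixed map.
def f(x):
    return np.tanh(x.sum(axis=0))

# Permutation-equivariant g: apply the same map to every element.
def g(x):
    return np.tanh(x)

assert np.allclose(f(x[perm]), f(x))        # invariance: output unchanged
assert np.allclose(g(x[perm]), g(x)[perm])  # equivariance: output permuted the same way
print("invariance and equivariance checks passed")
```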

The foundational universality result states that every continuous permutation-invariant function $f$ (on a compact domain) can be represented as

$$f(\{x_1, \ldots, x_n\}) = \rho\left(\sum_{i=1}^n \phi(x_i)\right)$$

for suitable continuous $\phi$ and $\rho$ (Lee et al., 2018, Kimura et al., 2024). This sum-decomposition principle underlies most modern set neural architectures. Importantly, for functions on sets of $D$-dimensional vectors, universality holds with latent dimension scaling as $O(N^D)$, where $N$ bounds the set size. However, for identifiable multisets (those where an element-wise identifier can separate element identities), this can be reduced to $2DN$ (Tabaghi et al., 2023).

The expressiveness of max- or sum-decomposable set functions is limited by the latent dimension; stricter lower bounds show exact representation requires the latent size to scale with set cardinality (Kimura et al., 2024).

2. Architectural Frameworks for Permutation-Invariance

Deep Sets

The Deep Sets architecture (Kimura et al., 2024) embodies the sum-decomposition

$$f(X) = \rho\left(\sum_{x \in X} \phi(x)\right),$$

where $\phi$ and $\rho$ are neural networks, with sum- or mean-pooling providing invariance. Theoretical results guarantee this form is universal for continuous set functions (Lee et al., 2018, Tabaghi et al., 2023, Kimura et al., 2024).
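A minimal PyTorch sketch of this sum-decomposition; the layer sizes and the invariance check are illustrative rather than taken from any particular implementation:

```python
import torch
import torch.nn as nn

class DeepSets(nn.Module):
    """f(X) = rho(sum_x phi(x)) with phi and rho as small MLPs."""
    def __init__(self, in_dim=3, hidden=128, out_dim=10):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden))
        self.rho = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, out_dim))

    def forward(self, X):              # X: (batch, n, in_dim); n may vary between batches
        z = self.phi(X).sum(dim=1)     # permutation-invariant pooling over the set axis
        return self.rho(z)

model = DeepSets()
X = torch.randn(4, 20, 3)              # 4 sets of 20 elements each
perm = torch.randperm(20)
assert torch.allclose(model(X), model(X[:, perm]), atol=1e-5)  # order does not matter
```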

Variants generalize the aggregation by replacing summation with a quasi-arithmetic or Hölder mean for improved expressiveness:

$$M_p(x_1,\dots,x_n) = \left(\frac{1}{n}\sum_{i=1}^n x_i^p\right)^{1/p}.$$

Letting $p$ be learnable yields "Hölder's Power Deep Sets" (Kimura et al., 2024).
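A hedged sketch of a learnable power-mean pooling layer; the softplus shift used to keep features strictly positive (so that fractional powers are well defined) is an assumption of this sketch, not part of the formula above:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PowerMeanPool(nn.Module):
    """M_p(x_1,...,x_n) = ((1/n) * sum_i x_i^p)^(1/p), with a learnable scalar p."""
    def __init__(self, p_init=1.0):
        super().__init__()
        self.p = nn.Parameter(torch.tensor(p_init))

    def forward(self, X):                    # X: (batch, n, d)
        Xpos = F.softplus(X) + 1e-6          # keep features strictly positive
        return Xpos.pow(self.p).mean(dim=1).pow(1.0 / self.p)

pool = PowerMeanPool(p_init=2.0)
X = torch.randn(2, 7, 16)
print(pool(X).shape)                         # torch.Size([2, 16])
```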

Attention-based Architectures

Attention mechanisms, as used in Set Transformer and related models, encode cross-element interactions through self-attention blocks that are permutation-equivariant (Lee et al., 2018). The architecture consists of:

  • Stacked self-attention layers (Set Attention Blocks, SAB or Induced Set Attention Blocks, ISAB) to model pairwise or higher-order interactions.
  • A permutation-invariant pooling (Pooling by Multihead Attention, PMA).

The induced attention scheme reduces computational complexity from $O(n^2)$ to $O(nm)$, with $m$ inducing points, enabling scalability to large sets.
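A minimal sketch of induced attention and attention pooling built on torch.nn.MultiheadAttention; the block structure and dimensions are illustrative and omit the residual and feed-forward details of the published SAB/ISAB/PMA blocks:

```python
import torch
import torch.nn as nn

class InducedSetAttention(nn.Module):
    """ISAB-style block: attend n elements to m learned inducing points and back, O(n*m)."""
    def __init__(self, dim=64, num_heads=4, num_inducing=16):
        super().__init__()
        self.inducing = nn.Parameter(torch.randn(1, num_inducing, dim))
        self.att1 = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.att2 = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, X):                      # X: (batch, n, dim)
        I = self.inducing.expand(X.size(0), -1, -1)
        H, _ = self.att1(I, X, X)              # inducing points attend to the set
        Y, _ = self.att2(X, H, H)              # set attends back: permutation-equivariant
        return Y

class AttentionPool(nn.Module):
    """PMA-style pooling: learned seed vectors attend to the set, giving an invariant output."""
    def __init__(self, dim=64, num_heads=4, num_seeds=1):
        super().__init__()
        self.seeds = nn.Parameter(torch.randn(1, num_seeds, dim))
        self.att = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, X):
        S = self.seeds.expand(X.size(0), -1, -1)
        out, _ = self.att(S, X, X)
        return out                              # (batch, num_seeds, dim)

X = torch.randn(2, 100, 64)
pooled = AttentionPool()(InducedSetAttention()(X))
print(pooled.shape)                             # torch.Size([2, 1, 64])
```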

Janossy Pooling

Janossy pooling expresses any permutation-invariant function as an explicit average over all $n!$ input orderings:

$$f(X) = \frac{1}{n!} \sum_{\pi \in S_n} g(x_{\pi(1)}, \ldots, x_{\pi(n)}),$$

where $g$ is an arbitrary function (e.g., a sequential neural network). Computationally tractable variants include canonical orderings, $k$-order interactions (subset-wise aggregation), and stochastic averages over random permutations ($\pi$-SGD). Janossy pooling offers arbitrarily rich expressiveness at the cost of factorial or polynomial complexity, bridging Deep Sets and attention models (Murphy et al., 2018, Kimura et al., 2024).
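A hedged sketch of the stochastic ($\pi$-SGD-style) variant: during training a single random ordering is fed through an order-sensitive sequence model, and at inference several sampled orderings are averaged; the GRU-based $g$ and all sizes are illustrative choices for this sketch:

```python
import torch
import torch.nn as nn

class StochasticJanossy(nn.Module):
    """Approximate (1/n!) * sum_pi g(x_pi) by sampling random permutations."""
    def __init__(self, in_dim=3, hidden=64, out_dim=10, num_eval_samples=20):
        super().__init__()
        self.g = nn.GRU(in_dim, hidden, batch_first=True)   # order-sensitive g
        self.head = nn.Linear(hidden, out_dim)
        self.num_eval_samples = num_eval_samples

    def forward(self, X):                                    # X: (batch, n, in_dim)
        n = X.size(1)
        if self.training:                                    # pi-SGD: one sampled ordering
            perms = [torch.randperm(n)]
        else:                                                # test time: average a few orderings
            perms = [torch.randperm(n) for _ in range(self.num_eval_samples)]
        outs = []
        for perm in perms:
            _, h = self.g(X[:, perm])                        # final hidden state of the sequence
            outs.append(self.head(h[-1]))
        return torch.stack(outs).mean(dim=0)

model = StochasticJanossy().eval()
X = torch.randn(2, 12, 3)
print(model(X).shape)                                        # torch.Size([2, 10])
```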

Dot-Product Decomposition

The DuMLP-Pin architecture demonstrates that any permutation-invariant function $f: \mathbb{R}^{N \times p} \to \mathbb{R}^{s \times t}$ can be realized as the dot-product of two permutation-equivariant functions,

$$f(X) = [g^{(1)}(X)]^T g^{(2)}(X),$$

for $N \geq \min\{s,t\}$, where $g^{(1)}, g^{(2)}$ are row-wise MLPs (Fei et al., 2022). This yields a constrained Deep Sets form with superior parameter efficiency.
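A sketch of this dot-product decomposition using two row-wise (hence permutation-equivariant) MLPs; the dimensions are illustrative, and the exact layer design of DuMLP-Pin is not reproduced here:

```python
import torch
import torch.nn as nn

class DotProductPool(nn.Module):
    """f(X) = g1(X)^T g2(X): an (s x t) permutation-invariant output from an (N x p) set."""
    def __init__(self, p=3, s=8, t=16, hidden=64):
        super().__init__()
        self.g1 = nn.Sequential(nn.Linear(p, hidden), nn.ReLU(), nn.Linear(hidden, s))
        self.g2 = nn.Sequential(nn.Linear(p, hidden), nn.ReLU(), nn.Linear(hidden, t))

    def forward(self, X):                 # X: (batch, N, p)
        A = self.g1(X)                    # (batch, N, s), row-wise -> permutation-equivariant
        B = self.g2(X)                    # (batch, N, t), row-wise -> permutation-equivariant
        return A.transpose(1, 2) @ B      # (batch, s, t): the sum over N makes it invariant

pool = DotProductPool()
X = torch.randn(4, 50, 3)
perm = torch.randperm(50)
assert torch.allclose(pool(X), pool(X[:, perm]), atol=1e-4)
```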

Partial Permutation Invariance

For structured input domains, such as heterogeneous graphs, it is often desirable that a function be invariant within certain partitions of the input but not globally. The PINE framework formalizes "partial permutation invariance": a function $f$ is invariant to the ordering within each group (e.g., neighbor type) but not to the order of the groups (Gui et al., 2019). The universal approximator then becomes

$$f(X_1,\ldots,X_K) = \phi\left(\sum_{n=1}^{N_1} \rho_1(x_{1,n}), \ldots, \sum_{n=1}^{N_K} \rho_K(x_{K,n})\right),$$

where $X_k$ is the set for block $k$.
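A minimal sketch of partial permutation invariance with per-group encoders, within-group sum-pooling, and a group-order-sensitive combination; the group count and layer sizes are illustrative:

```python
import torch
import torch.nn as nn

class PartiallyInvariant(nn.Module):
    """Invariant to ordering within each group X_k, but not to the order of the groups."""
    def __init__(self, in_dim=3, hidden=32, num_groups=2, out_dim=5):
        super().__init__()
        self.rhos = nn.ModuleList([
            nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, hidden))
            for _ in range(num_groups)])
        self.phi = nn.Sequential(nn.Linear(num_groups * hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, out_dim))

    def forward(self, groups):                      # groups: list of (batch, N_k, in_dim) tensors
        pooled = [rho(Xk).sum(dim=1) for rho, Xk in zip(self.rhos, groups)]
        return self.phi(torch.cat(pooled, dim=-1))  # concatenation keeps group order relevant

model = PartiallyInvariant()
X1, X2 = torch.randn(2, 6, 3), torch.randn(2, 9, 3)
print(model([X1, X2]).shape)                        # torch.Size([2, 5])
```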

3. Polynomial and Sum-Decomposition Representations

The structure of symmetric (permutation-invariant) polynomial functions enables rigorous approximation error analysis, parameter counting, and complexity bounds. Polynomial approximation theorems establish that for $f$ symmetric on $\mathbb{R}^{N \times d}$, the "pooled basis" consisting of sums $\sum_{j=1}^N \phi_v(x_j)$, where $\phi_v$ ranges over monomials, supports uniform approximation rates that nearly eliminate the curse of dimensionality (Bachmayr et al., 2021). Explicit parameter and error bounds are available:

  • The number of pooled features required can be nearly independent of $N$ for fixed degree.
  • Symmetry reduces both parameterization and evaluation cost.

This analysis justifies the mathematical backbone of Deep Sets and its descendants.
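As a concrete illustration of the pooled basis, the following NumPy sketch computes the sums $\sum_j \phi_v(x_j)$ over all monomials $\phi_v$ up to a fixed degree and checks their permutation invariance (the helper name and degree cutoff are illustrative):

```python
import numpy as np
from itertools import combinations_with_replacement

def pooled_monomial_features(X, degree=2):
    """Pooled basis sum_j phi_v(x_j): sums over the set of all monomials up to a given degree."""
    N, d = X.shape
    feats = []
    for deg in range(1, degree + 1):
        for idx in combinations_with_replacement(range(d), deg):      # multi-index v with |v| = deg
            feats.append(np.prod(X[:, list(idx)], axis=1).sum())      # sum_j prod_k x_{j, idx_k}
    return np.array(feats)

X = np.random.default_rng(0).normal(size=(10, 3))                     # a set of N=10 points in R^3
perm = np.random.default_rng(1).permutation(10)
assert np.allclose(pooled_monomial_features(X), pooled_monomial_features(X[perm]))
print(pooled_monomial_features(X).shape)                              # (9,): 3 degree-1 + 6 degree-2 monomials
```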

4. Permutation-Invariant Architectures in Practice

Operator Networks and Variable-Input Settings

Permutation-invariant set encodings enable operator networks (e.g., SetONet) to generalize DeepONet to variable input sampling, missing data, and irregular grids by processing the input as an unordered set of location–value pairs and aggregating through Deep Set blocks (Tretiakov et al., 7 May 2025). This design achieves:

  • Robustness to missing sensors.
  • Improved accuracy on strongly nonlinear PDE operators.
  • Elimination of the need for input interpolation during inference.
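A hedged sketch of the set-encoding idea behind such operator networks (not the actual SetONet implementation): each sensor reading is treated as a location–value pair, embedded element-wise, and mean-pooled, so missing or irregularly placed sensors simply change the set size:

```python
import torch
import torch.nn as nn

class SetEncoder(nn.Module):
    """Encode an unordered, variable-size set of (location, value) pairs into a fixed vector."""
    def __init__(self, loc_dim=1, hidden=64, latent=128):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(loc_dim + 1, hidden), nn.ReLU(),
                                 nn.Linear(hidden, latent))

    def forward(self, locs, vals):               # locs: (batch, n, loc_dim), vals: (batch, n, 1)
        pairs = torch.cat([locs, vals], dim=-1)  # each element is a location-value pair
        return self.phi(pairs).mean(dim=1)       # mean-pooling is robust to varying sensor count

enc = SetEncoder()
locs, vals = torch.rand(2, 37, 1), torch.randn(2, 37, 1)   # 37 irregularly placed sensors
print(enc(locs, vals).shape)                                # torch.Size([2, 128])
```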

Autoencoders and Multi-Agent Systems

Set autoencoders (e.g., PISA) leverage sum-decomposition with latent "key–value" tricks to obtain a fixed-size set representation whose dimensionality does not grow with cardinality, enabling exact permutation invariance and scalability to large sets (Kortvelesy et al., 2023). PISA is demonstrated to achieve nearly lossless set reconstruction and efficient inference, and it supports insertion and deletion in latent space, with applications to multi-agent communication in graph neural networks.

Permutation-Invariant Language Modeling

Set-LLM adapts LLMs to be permutation-invariant over set-valued segments of input (e.g., unordered multiple-choice options), using attention masks (SetMask) and set-specific positional encodings (SetPE). This delivers provable invariance with exact recovery under all answer orderings, eliminating the need for majority-voting across permutations at inference (Egressy et al., 21 May 2025).

Efficient and Scalable Global Aggregation

DuMLP-Pin achieves close-to-state-of-the-art classification and segmentation on point cloud and attribute datasets with up to 95% parameter reduction and linear inference complexity, demonstrating that dot-product decomposition is sufficiently expressive for demanding global aggregation tasks while retaining efficiency over local attention models (Fei et al., 2022).

5. Advanced Topics: Limitations, Expressiveness, and Open Problems

The universality of Deep Sets and similar architectures is conditioned on the latent dimension scaling at least linearly or polynomially with input set size and ambient dimension (Tabaghi et al., 2023, Kimura et al., 2024). Comprehensive polynomial representations can require exponential latent dimension unless one restricts to identifiable multisets or tolerates approximate representations (Bachmayr et al., 2021).

Current limitations include:

  • Latent bottleneck: For very large $N$, model parameter and memory costs become prohibitive.
  • Weakness in high-order correlations: Simple sum-pooling cannot capture, e.g., the median or majority-of-$n$ without very large latent features or explicit high-order pooling.
  • Computational cost of full Janossy pooling: Exact symmetrization is intractable beyond $N \sim 10$; $k$-order or attention-based approximations are the practical path.
  • Dataset scarcity: Large-scale, standardized set-structured datasets lag behind image or sequence domains.

Empirical and theoretical characterization of recurrent attention aggregators, optimal aggregation functions, and trade-offs between complexity, expressivity, and generalization capacity remain active areas of investigation (Kimura et al., 2024, Murphy et al., 2018).

6. Applications and Impact

Permutation-invariant set functions underpin a wide range of applications, including point cloud classification and segmentation, operator learning on irregular sensor grids, multi-agent communication, and order-invariant handling of set-valued inputs in LLMs.

In computational physics, chemistry, and materials science, high-dimensional symmetric polynomial approximations are critical for cluster expansion and Jastrow–Slater wavefunction modeling (Bachmayr et al., 2021).

7. Comparative Summary of Key Architectures

| Architecture | Aggregation | Interaction Modeling | Complexity | Universality |
|---|---|---|---|---|
| Deep Sets | Sum, mean, quasi-arithmetic mean | None (per-element only) | $O(n)$ | Universal* |
| Set Transformer | Attention, PMA | Pairwise/high-order | $O(n^2)$ or $O(nm)$ | Universal |
| Janossy Pooling | All permutations; $k$-subset | High-order (tunable $k$) | $O(n!)$ / $O(n^k)$ | Universal |
| DuMLP-Pin | Dot-product of equivariant MLPs | Dot-product | $O(n)$ | Universal for $N \geq \min\{s,t\}$ |
| PICASO | Cascaded attention | Dynamic, high-order | $O(Lknd)$ | Empirically robust |
| PISA | Sum of key–value embeddings | None (decoder recovers by keys) | $O(n)$ | Empirically lossless (high $d_z$) |
| Set-LLM | Attention + set masking/PE | Cross-set crosstalk disabled | Linear in sequence length | Complete invariance for choices |

*Universal for continuous functions with sufficient latent dimension.


Permutation-invariant set functions form the core mathematical and algorithmic motif in set-structured deep learning, yielding expressive, robust, and scalable models for unordered data domains. Their principled design, enabled by sum-/attention-based aggregation and deep theoretical analysis, continues to fuel progress in set reasoning, generative modeling, cooperative multi-agent systems, and beyond. Recent technical advances address both practical computational barriers and the theoretical frontiers of universal approximation and latent complexity (Lee et al., 2018, Murphy et al., 2018, Tabaghi et al., 2023, Kimura et al., 2024, Fei et al., 2022, Tretiakov et al., 7 May 2025, Egressy et al., 21 May 2025).
