
Deep Sets Theorem

Updated 17 January 2026
  • The Deep Sets theorem is a rigorous characterization of permutation-invariant set functions in terms of sum pooling and continuous mappings.
  • The theorem decomposes any invariant function into per-element embeddings aggregated by summation, establishing both uniqueness of the pooled signature and universality.
  • It also characterizes permutation-equivariant functions, motivating parameter-sharing schemes in modern deep learning architectures.

In contemporary machine learning, numerous tasks involve inputs that are most naturally represented as finite sets rather than as ordered, fixed-length vectors. Such domains include point-cloud classification, estimation of population statistics, set expansion, outlier detection, and related scientific and industrial applications. The defining requirement for any function applied to such sets is permutation invariance: its output must not depend on the order of elements within the set. The Deep Sets theorem, introduced by Zaheer et al. (2017), offers a full characterization of permutation-invariant set functions and prescribes universal neural network architectures that exactly respect this symmetry. In parallel, the theorem delineates the structure of permutation-equivariant functions, which produce per-element outputs that transform compatibly under input re-ordering.

1. Permutation-Invariant Set Functions

Let $\mathfrak{X}$ denote the ground set or universe of possible elements (e.g., $\mathfrak{X} = \mathbb{R}^d$), and let $X = \{x_1, \ldots, x_m\} \subset \mathfrak{X}$ be a finite input set. Define $2^{\mathfrak{X}}$ as the collection of all finite subsets of $\mathfrak{X}$. A function $f : 2^{\mathfrak{X}} \to \mathcal{Y}$ is called permutation invariant if for every finite set $X$ and every permutation $\pi$ of $\{1, \ldots, m\}$, it holds that

$$f(\{x_1, \ldots, x_m\}) = f(\{x_{\pi(1)}, \ldots, x_{\pi(m)}\}).$$

The Deep Sets theorem states that any continuous permutation-invariant function $f : [0,1]^M \to \mathbb{R}$ can be decomposed as

$$f(\{x_1, \ldots, x_M\}) = \rho\left( \sum_{m=1}^{M} \phi(x_m) \right),$$

where $\phi : \mathbb{R} \to \mathbb{R}^{M+1}$ and $\rho : \mathbb{R}^{M+1} \to \mathbb{R}$ are continuous mappings. In the more general countable-universe case, this representation is exact for all set functions $f : 2^{\mathfrak{X}} \to \mathbb{R}$ (without requiring continuity), with suitable choices of $\phi$ and $\rho$.
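The decomposition can be sanity-checked numerically. The sketch below uses an arbitrary illustrative choice of $\phi$ and $\rho$ (random features and a squared norm, not the constructions from the proof) and confirms that $\rho\left(\sum_m \phi(x_m)\right)$ is unchanged under reordering:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 2))  # illustrative fixed embedding weights

def phi(x):
    # Per-element embedding phi : R^2 -> R^3 (arbitrary choice for the demo).
    return np.tanh(W @ x)

def rho(z):
    # Readout on the pooled representation (arbitrary choice for the demo).
    return float(np.sum(z ** 2))

def f(X):
    # f(X) = rho(sum_m phi(x_m)): permutation invariant by construction,
    # since summation ignores the order of its terms.
    return rho(sum(phi(x) for x in X))

X = [rng.standard_normal(2) for _ in range(5)]
perm = rng.permutation(5)
assert np.isclose(f(X), f([X[i] for i in perm]))
```

Any reordering of the five elements yields the same output up to floating-point summation order.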

2. Proof Structure and Key Ingredients

The proof comprises two critical components:

A. Uniqueness of the 'sum-of-embeddings' signature:

— For countable universes, one can construct $\phi(x)$ as, e.g., $4^{-c(x)}$ for a fixed enumeration $c : \mathfrak{X} \to \mathbb{N}$. The sum $\sum_{x \in X} \phi(x)$ uniquely encodes the set $X$, allowing $\rho$ to reconstruct $f(X)$.

— For fixed-size sets over continuous domains, the embedding is $\phi(x) = [1, x, x^2, \ldots, x^M] \in \mathbb{R}^{M+1}$. The pooled vector $E(X)$,

$$E(X) := \sum_{m=1}^{M} \phi(x_m) = \left[M, \sum_m x_m, \sum_m x_m^2, \ldots, \sum_m x_m^M\right],$$

uniquely determines the multiset $\{x_1, \ldots, x_M\}$ up to permutation by the Newton–Girard identities and, restricted to sorted inputs, is a continuous bijection with continuous inverse.
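The countable-universe construction above works because $\sum_{x \in X} 4^{-c(x)}$ acts as a base-4 indicator code: each element toggles one "digit", so no two subsets collide. The following sketch (toy universe and names are illustrative) verifies this exhaustively for a four-element universe:

```python
from itertools import combinations

universe = ["a", "b", "c", "d"]                  # toy countable universe
c = {x: k + 1 for k, x in enumerate(universe)}   # fixed enumeration c : X -> N

def signature(S):
    # phi(x) = 4^{-c(x)}; the pooled signature is the sum over the set.
    return sum(4.0 ** (-c[x]) for x in S)

subsets = [frozenset(s) for r in range(len(universe) + 1)
           for s in combinations(universe, r)]
sigs = [signature(S) for S in subsets]
# All 2^4 = 16 subsets receive distinct signatures, so rho can in
# principle recover the set (and hence f(X)) from the sum alone.
assert len(set(sigs)) == len(subsets)
```

The base 4 (rather than 2) leaves slack so the argument extends to multisets of bounded multiplicity; any base strictly greater than the maximum multiplicity plus one works.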

B. Construction of $\rho$:

With a continuous bijection $E : \mathrm{SortedSets} \to \mathbb{R}^{M+1}$, define $\rho(z) := f(E^{-1}(z))$, which is automatically continuous, yielding $f(X) = \rho\left(\sum_m \phi(x_m)\right)$ for any set $X$.
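The inverse $E^{-1}$ can be computed explicitly: the Newton–Girard recursion converts the power sums in $E(X)$ into elementary symmetric polynomials, whose monic polynomial has the original elements as roots. A minimal NumPy sketch (function names are illustrative):

```python
import numpy as np

def pooled_embedding(xs):
    # E(X) = [M, sum x, sum x^2, ..., sum x^M] for M = len(xs).
    M = len(xs)
    return np.array([sum(x ** k for x in xs) for k in range(M + 1)])

def recover_multiset(E):
    # Invert E: power sums -> elementary symmetric polynomials (Newton-Girard)
    # -> roots of the monic polynomial they define.
    M = int(round(E[0]))
    p = E[1:]              # p[k-1] = sum_m x_m^k
    e = [1.0]              # e_0 = 1
    for k in range(1, M + 1):
        # Newton-Girard: k e_k = sum_{i=1}^{k} (-1)^{i-1} e_{k-i} p_i
        s = sum((-1) ** (i - 1) * e[k - i] * p[i - 1] for i in range(1, k + 1))
        e.append(s / k)
    # prod_m (x - x_m) = sum_k (-1)^k e_k x^{M-k}, coefficients highest first.
    coeffs = [(-1) ** k * e[k] for k in range(M + 1)]
    return np.sort(np.roots(coeffs).real)

xs = [0.3, -1.1, 0.5, 2.0]
assert np.allclose(recover_multiset(pooled_embedding(xs)), np.sort(xs), atol=1e-6)
```

Given this inverse, $\rho$ is simply $f$ composed with `recover_multiset`; root-finding is numerically delicate for large $M$, which is why practical Deep Sets models learn $\rho$ rather than construct it.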

3. Permutation-Equivariant Functions

For mappings $f : \mathfrak{X}^M \to \mathcal{Y}^M$ that commute with input permutations, i.e.,

$$f([x_{\pi(1)}, \ldots, x_{\pi(M)}]) = [f(x)_{\pi(1)}, \ldots, f(x)_{\pi(M)}],$$

the necessary and sufficient condition (for a single layer of the form $f_\Theta(x) = \sigma(\Theta x)$, with $\sigma$ an elementwise activation and $x \in \mathbb{R}^M$) is that $\Theta \in \mathbb{R}^{M \times M}$ commute with all permutation matrices. By group theory, this restricts $\Theta$ to the form

$$\Theta = \lambda I + \gamma \mathbf{1}\mathbf{1}^{\mathsf{T}},$$

where $\lambda, \gamma \in \mathbb{R}$, $I$ is the identity, and $\mathbf{1}\mathbf{1}^{\mathsf{T}}$ is the all-ones matrix. This yields

$$f(x)_i = \sigma\left(\lambda x_i + \gamma \sum_{j=1}^{M} x_j\right).$$

For general $x \in \mathbb{R}^{M \times D}$ and output in $\mathbb{R}^{M \times D'}$, the layer generalizes to

$$f(X)_m = \sigma\left( x_m \Lambda + \Big(\sum_{j=1}^{M} x_j\Big) \Gamma \right),$$

where $\Lambda, \Gamma \in \mathbb{R}^{D \times D'}$.
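The general layer is a few lines of NumPy; the sketch below (random weights, $\sigma = \tanh$ as an arbitrary choice) checks that permuting the input rows permutes the output rows identically:

```python
import numpy as np

rng = np.random.default_rng(1)
M, D, Dp = 6, 3, 4
Lam = rng.standard_normal((D, Dp))   # Lambda in R^{D x D'}
Gam = rng.standard_normal((D, Dp))   # Gamma  in R^{D x D'}

def equivariant_layer(X):
    # f(X)_m = sigma(x_m Lambda + (sum_j x_j) Gamma); sigma = tanh here.
    return np.tanh(X @ Lam + X.sum(axis=0, keepdims=True) @ Gam)

X = rng.standard_normal((M, D))
perm = rng.permutation(M)
# Equivariance: permuting rows of the input permutes rows of the output.
assert np.allclose(equivariant_layer(X[perm]), equivariant_layer(X)[perm])
```

The sum over rows is the only interaction between set elements, which is exactly what the $\lambda I + \gamma \mathbf{1}\mathbf{1}^{\mathsf{T}}$ structure permits.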

4. Neural Architectures for Sets

Invariant models:

— Apply a shared feedforward ($\phi$-)network, $\phi : \mathfrak{X} \to \mathbb{R}^k$, to each set element.

— Aggregate via sum-pooling, $s = \sum_{m=1}^{M} \phi(x_m)$.

— A second ($\rho$-)network, $\rho : \mathbb{R}^k \to \mathcal{Y}$, processes the pooled vector.

Given sufficient expressivity of $\phi$ and $\rho$ (e.g., multilayer MLPs), the architecture can approximate any continuous invariant function.
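The three steps above assemble into a compact model. The following sketch (untrained random-weight MLPs, sizes chosen arbitrarily for illustration) wires up $\phi$, sum-pooling, and $\rho$, and checks invariance:

```python
import numpy as np

rng = np.random.default_rng(2)

def mlp(sizes):
    # Illustrative helper: a random-weight MLP with tanh hidden layers.
    Ws = [rng.standard_normal((a, b)) / np.sqrt(a)
          for a, b in zip(sizes, sizes[1:])]
    def apply(x):
        for W in Ws[:-1]:
            x = np.tanh(x @ W)
        return x @ Ws[-1]
    return apply

phi = mlp([2, 32, 8])   # shared per-element network phi : R^2 -> R^8
rho = mlp([8, 32, 1])   # readout network rho : R^8 -> R

def deep_sets(X):
    # X has shape (M, 2); the row order of X does not affect the output.
    return rho(phi(X).sum(axis=0))

X = rng.standard_normal((5, 2))
assert np.allclose(deep_sets(X), deep_sets(X[::-1]))
```

In a real model `phi` and `rho` would be trained end-to-end; the symmetry holds regardless of the weights, which is the architectural point.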

Equivariant models:

— Use layers of the form

$$h_m^{(\ell+1)} = \sigma\left(\lambda^{(\ell)} h_m^{(\ell)} + \gamma^{(\ell)} \sum_j h_j^{(\ell)} + b^{(\ell)}\right),$$

where $\lambda, \gamma, b$ are scalars or small matrices; equivariance is preserved across stacked layers. While mean and max pooling are also permutation-invariant and used in practice, only sum pooling guarantees universality per the main theorem.
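Because each layer is equivariant, a stack of them is too, and a final sum-pool turns the whole network into an invariant model. A minimal sketch with scalar parameters (random values, $\sigma = \tanh$, all illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

def make_layer(params):
    lam, gam, b = params
    def apply(H):
        # h_m <- sigma(lam * h_m + gam * sum_j h_j + b); sigma = tanh.
        return np.tanh(lam * H + gam * H.sum(axis=0, keepdims=True) + b)
    return apply

layers = [make_layer(rng.standard_normal(3)) for _ in range(3)]

def stack(H):
    for layer in layers:
        H = layer(H)
    return H

H = rng.standard_normal((7, 4))
perm = rng.permutation(7)
# Equivariance survives composition ...
assert np.allclose(stack(H[perm]), stack(H)[perm])
# ... and a final sum-pool makes the full network permutation invariant.
assert np.allclose(stack(H[perm]).sum(axis=0), stack(H).sum(axis=0))
```

Swapping the final `sum` for `mean` or `max` would keep invariance but, per the discussion above, forfeit the universality guarantee.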

5. Universality and Architectural Necessity

The theorem uniquely identifies sum-pooling over elementwise embeddings as the only mechanism—modulo mild technicalities—for universal function approximation subject to permutation symmetry. Approaches such as RNN processing, arbitrary post-pooling with non-shared weights, or applying fully connected layers before pooling either violate permutation invariance or fail to be universal. The parameter-sharing structure $\lambda I + \gamma \mathbf{1}\mathbf{1}^{\mathsf{T}}$ for equivariant outputs underpins not only Deep Sets architectures but has also motivated subsequent frameworks such as graph neural networks (which aggregate via neighbor summations) and point-cloud networks (e.g., PointNet).

In summary, the Deep Sets theorem establishes that permutation symmetry is a necessary and sufficient condition for universal set processing, with immediate consequences for deep network design in set-based learning tasks (Zaheer et al., 2017).

References

1. Zaheer, M., Kottur, S., Ravanbakhsh, S., Póczos, B., Salakhutdinov, R., and Smola, A. J. (2017). Deep Sets. Advances in Neural Information Processing Systems 30.
