
Symmetric Deep Neural Networks

Updated 23 November 2025
  • Symmetric deep neural networks are architectures that enforce permutation invariance, enabling effective high-dimensional function approximation.
  • They utilize symmetric Korobov spaces and squared-ReLU subnets to achieve dimension-free approximation rates and mitigate the curse of dimensionality.
  • The design improves computational efficiency and generalization in fields like physics and finance by integrating sparse grid symmetrization and Vandermonde-inverse aggregation.

Symmetric deep neural networks are architectures designed to exploit permutation symmetry inherent in function classes encountered in scientific and mathematical modeling, particularly for high-dimensional tasks. These models enforce invariance under permutations of input coordinates, leading to substantial computational advantages and rigorous improvements in both approximation and generalization for functions possessing such symmetry. The paradigm offers dimension-free rates, avoiding the curse of dimensionality previously endemic to neural approximations of symmetric functions, as established by the dimension-free approximation and learning guarantees for symmetric Korobov spaces (Lu et al., 16 Nov 2025).

1. Symmetric Korobov Spaces and Function Classes

Symmetric Korobov spaces are a central construct for analyzing permutation-symmetric functions in multiple dimensions. Let $r \geq 1$ and $d \geq 1$; the periodic Korobov space $H^r_{\mathrm{Kor}}(d)$ is defined on $[0,1]^d$ as the set of periodic functions $f$ admitting a Fourier expansion $f(x) = \sum_{k \in \mathbb{Z}^d} \hat{f}_k e^{2\pi i k \cdot x}$, equipped with norm

$$\|f\|_{H^r_{\mathrm{Kor}}}^2 = \sum_{k \in \mathbb{Z}^d} |\hat{f}_k|^2 \prod_{j=1}^d (1 + |k_j|^2)^r.$$

Equivalently, the zero-boundary "hat-basis" formulation establishes

$$X^{2,2}(\Omega) := \{ f \in L^2(\Omega) : f|_{\partial\Omega} = 0,\ D^\alpha f \in L^2(\Omega) \;\forall\, |\alpha|_\infty \leq 2 \},$$

with semi-norm $|f|_{2,2} := \bigl\| \partial_{x_1}^2 \cdots \partial_{x_d}^2 f \bigr\|_{L^2(\Omega)}$. Functions $f$ are called symmetric if $f(x_{\sigma(1)}, \ldots, x_{\sigma(d)}) = f(x_1, \ldots, x_d)$ for all $\sigma \in S_d$; accordingly, the symmetric subspace $X^{2,2}_{\mathrm{sym}}(\Omega)$ is defined by restriction to symmetric functions, and in Fourier coordinates requires $\hat{f}_k = \hat{f}_{k_\sigma}$ for all coordinate permutations.
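To make the coefficient-level symmetry condition concrete, the short NumPy sketch below (an illustration only; the small dimension $d$, cutoff $K$, and random coefficients are assumptions, not taken from the source) evaluates the truncated Korobov norm and symmetrizes Fourier coefficients by averaging each one over its permutation orbit.

```python
# Illustrative sketch: truncated Korobov norm and coefficient symmetrization
# fhat_k = fhat_{k_sigma}. Dimension d, cutoff K, and random data are hypothetical.
import itertools
import numpy as np

d, K, r = 3, 2, 2                                  # dimension, frequency cutoff, smoothness
freqs = list(itertools.product(range(-K, K + 1), repeat=d))

rng = np.random.default_rng(0)
fhat = {k: rng.standard_normal() + 1j * rng.standard_normal() for k in freqs}

def korobov_norm_sq(coeffs, r):
    """sum_k |fhat_k|^2 * prod_j (1 + |k_j|^2)^r over the truncated index set."""
    return sum(abs(c) ** 2 * np.prod([(1 + kj ** 2) ** r for kj in k])
               for k, c in coeffs.items())

def symmetrize(coeffs):
    """Average each coefficient over its S_d-orbit, enforcing fhat_k = fhat_{k_sigma}."""
    return {k: np.mean([coeffs[tuple(k[i] for i in perm)]
                        for perm in itertools.permutations(range(d))])
            for k in coeffs}

fsym = symmetrize(fhat)
print("Korobov norm^2 (original):   ", korobov_norm_sq(fhat, r))
print("Korobov norm^2 (symmetrized):", korobov_norm_sq(fsym, r))
```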

2. Dimension-Free Approximation by Deep Symmetric Networks

The main theorem establishes that any $f \in X^{2,2}_{\mathrm{sym}}(\Omega)$ can be approximated, for integer $n \geq 1$, by a symmetric squared-ReLU network $\varphi_{\mathrm{ReLU}_2}$ of the form

$$\varphi_{\mathrm{ReLU}_2}(x) = \sum_{i=1}^m s_i(x),$$

where each $s_i$ is itself a squared-ReLU network of width $\leq 3d^2$ and depth $L_0 = \lfloor\log_2 d\rfloor + 2$. The total number $m$ of summands satisfies

$$m = O\left( d\,2^{n+d-1} e^{\pi\sqrt{5d/3}} \right).$$

The energy-norm error satisfies

$$\|f - \varphi_{\mathrm{ReLU}_2}\|_E = \left( \int_\Omega \sum_{j=1}^d |\partial_{x_j}(f - \varphi)(x)|^2 \, dx \right)^{1/2} \leq C\, |f|_{2,2}\, m^{-1},$$

where $C$ depends polynomially on $d$ but not exponentially. The approximation rate $O(m^{-1})$ is thus dimension-free; to drive the energy-norm error below $\varepsilon$ requires $m = O(\varepsilon^{-1})$, achievable with network depth $O(\log d)$, width $O(d^3 2^d e^{\pi\sqrt{5d/3}})$, and weights bounded by $O(e^{\pi\sqrt{5d/3}})$.
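The following PyTorch sketch illustrates the sum-of-subnets form $\varphi_{\mathrm{ReLU}_2}(x) = \sum_i s_i(x)$ with squared-ReLU activations; the widths, depths, and subnet count are illustrative placeholders rather than the $3d^2$, $\lfloor\log_2 d\rfloor + 2$, and $m$ values from the theorem.

```python
# Sketch of a sum of squared-ReLU subnets; sizes are placeholders, not the theorem's values.
import torch
import torch.nn as nn

def relu2(x):
    """Squared ReLU activation: max(x, 0)^2."""
    return torch.clamp(x, min=0.0) ** 2

class SquaredReLUSubnet(nn.Module):
    def __init__(self, d, width, depth):
        super().__init__()
        dims = [d] + [width] * depth + [1]
        self.layers = nn.ModuleList([nn.Linear(a, b) for a, b in zip(dims[:-1], dims[1:])])

    def forward(self, x):
        for layer in self.layers[:-1]:
            x = relu2(layer(x))
        return self.layers[-1](x)

class SumOfSubnets(nn.Module):
    """phi(x) = sum_{i=1}^m s_i(x), each s_i a small squared-ReLU network."""
    def __init__(self, d, m, width=8, depth=3):
        super().__init__()
        self.subnets = nn.ModuleList([SquaredReLUSubnet(d, width, depth) for _ in range(m)])

    def forward(self, x):
        return sum(s(x) for s in self.subnets)

x = torch.rand(16, 4)                      # batch of points in [0,1]^d with d = 4
print(SumOfSubnets(d=4, m=10)(x).shape)    # torch.Size([16, 1])
```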

3. Permutation-Invariant Network Architecture

Symmetry is imposed by grouping tensor-product sparse grid basis functions $\varphi_{\ell,i}(x)$ into symmetrized blocks

$$\psi_{\ell,i}(x) = \sum_{\sigma \in S_d} \varphi_{\ell,i}(x_{\sigma(1)}, \ldots, x_{\sigma(d)}).$$

While direct summation over $d!$ permutations is intractable, Lemma 4.1 represents $\psi_{\ell,i}$ as a linear combination of only $O(d\,2^{d-1})$ exponentials of inner-product features $G_\xi(x) = \prod_{s=1}^d \xi^{2^{s-1} \varphi_{\ell_s, i_s}(x_s)}$, $\xi = 1, \ldots, D$, with $D \leq d\,2^{d-1}$. Recovery of the symmetrized output is performed through a Vandermonde-inverse linear layer.
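The sketch below is a generic NumPy illustration of the Vandermonde-inverse idea, not the specific construction of Lemma 4.1: if the $D$ channel outputs depend polynomially on the channel index $\xi = 1, \ldots, D$, the combination weights can be recovered by solving a $D \times D$ Vandermonde system, which can then be frozen as a fixed linear layer.

```python
# Generic Vandermonde-inverse recovery (illustration, not the paper's Lemma 4.1).
import numpy as np

D = 6
xi = np.arange(1, D + 1, dtype=float)          # channel indices xi = 1, ..., D
V = np.vander(xi, N=D, increasing=True)        # V[i, p] = xi_i^p (D x D Vandermonde matrix)

rng = np.random.default_rng(1)
c = rng.standard_normal(D)                     # hypothetical symmetrized-block weights
g = V @ c                                      # aggregated channel outputs
c_recovered = np.linalg.solve(V, g)            # fixed "Vandermonde-inverse" linear map
print(np.allclose(c, c_recovered))             # True: the aggregation is exactly invertible
```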

Each $G_\xi$ is approximated by feeding the scalar hat function $\varphi_{\ell,j}(\cdot)$ into a product-of-exponentials, using shallow squared-ReLU subnets and an $O(\log d)$-deep binary tree of ReLU-based bilinear blocks for the $d$-fold product, requiring $O(d)$ neurons. A final linear layer of width $D$ combines these channels, with global weight-sharing across combinatorial block types to ensure permutation invariance.
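A minimal Python sketch of the binary-tree product structure follows (exact multiplication stands in for the squared-ReLU bilinear blocks): $d$ factors are combined pairwise over $\lceil\log_2 d\rceil$ stages, which is the source of the $O(\log d)$ depth.

```python
# Balanced binary-tree product of d per-coordinate factors in ceil(log2 d) stages.
import math
import numpy as np

def tree_product(factors):
    """Multiply the factors level by level; each level is one stage of pairwise products."""
    level, depth = list(factors), 0
    while len(level) > 1:
        nxt = [level[i] * level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:                # odd leftover passes through unchanged
            nxt.append(level[-1])
        level, depth = nxt, depth + 1
    return level[0], depth

d = 10
factors = np.random.default_rng(2).uniform(0.5, 1.5, size=d)
prod, depth = tree_product(factors)
print(np.isclose(prod, np.prod(factors)), depth == math.ceil(math.log2(d)))   # True True
```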

4. Mathematical Framework Underpinning Dimension-Free Rates

Key ingredients for dimension-free results include:

  • The energy-based sparse grid index set $X_n$, which replaces total-degree sets to reduce the dominant $(\log m)^{d-1}$ term in the error estimate. This produces cardinality $O(2^n e^d)$.
  • Exploiting permutation symmetry by aligning with ordered multi-indices ($\ell_1 \leq \cdots \leq \ell_d$) and symmetrizing bases, resulting in $C_s\, 2^n \exp(\pi\sqrt{5d/3})$ distinct symmetric blocks (exponential in $\sqrt{d}$ only).
  • Realizing each symmetrized block $\psi_{\ell,i}$ via a squared-ReLU subnet of width $3d^3(2^{d-1}-1)$, depth $\lfloor \log_2 d\rfloor + 2$, and $O(d^3 2^{d-1} \log d)$ parameters, attaining $H^1$-accuracy $\delta$.
  • By truncating to $m \sim C_s\, 2^n e^{\pi\sqrt{5d/3}}$ blocks and approximating each to error $\delta \approx m^{-2}$, the total $H^1$ error is $O(m^{-1})$, yielding an algebraic, truly dimension-free rate.

The relevance lies in reducing the exponential cost normally expected in $d$ for generic approximators, making the approach scalable to high-dimensional symmetric problems.
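A back-of-the-envelope comparison of the two cardinalities quoted above (constants omitted, shared factor $2^n$ kept) shows how $\exp(\pi\sqrt{5d/3})$ eventually grows far more slowly than $e^d$ as $d$ increases; the specific values of $n$ and $d$ below are arbitrary.

```python
# Growth of the unsymmetrized count ~ 2^n e^d versus the symmetrized count ~ 2^n exp(pi sqrt(5d/3)).
import math

n = 5
for d in (4, 8, 16, 32, 64):
    unsym = 2 ** n * math.exp(d)
    sym = 2 ** n * math.exp(math.pi * math.sqrt(5 * d / 3))
    print(f"d={d:3d}  unsymmetrized ~ {unsym:.2e}  symmetrized ~ {sym:.2e}  ratio ~ {unsym / sym:.2e}")
```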

5. Sample Complexity and Generalization Guarantees

For supervised learning of symmetric Korobov functions, let the target $f_\varrho \in X^{2,2}_{\mathrm{sym}}(\Omega)$ satisfy $\|\nabla f_\varrho\|_{L^\infty} \leq L$. Observed i.i.d. samples $S = \{ (x_j, y_j) \}_{j=1}^M$ are distributed so that $\mathbb{E}[y \mid x] = \nabla f_\varrho(x)$ and $\|y\| \leq L$ almost surely, and the hypothesis class $F_{m, L}$ is the set of symmetric networks described above.

The empirical risk minimizer

$$\hat{f} = \arg\min_{f \in F_{m, L}} \frac{1}{M} \sum_{j = 1}^M \| \nabla f(x_j) - y_j \|^2$$

admits the bound

$$\mathbb{E}_S \| \hat{f} - f_\varrho \|_E^2 \leq C \left( \frac{(\log M)^2}{M} \right)^{2/3} + 2 \inf_{f \in F_{m,L}} \| f - f_\varrho \|_E^2,$$

where $C$ is polynomial in $d$, $L$, and $|f_\varrho|_{2,2}$. By choosing $m \sim \varepsilon^{-1}$ and $M \sim m^3 (\log m)^2 \sim \varepsilon^{-3} (\log 1/\varepsilon)^2$, one achieves $\mathbb{E}\| \hat{f} - f_\varrho \|_E^2 = O(\varepsilon^2)$. High-probability bounds are also established: for any $\delta > 0$, with probability at least $1 - \delta$,

$$\| \hat{f} - f_\varrho \|_E = O\left( (\log(1/\delta)/M)^{1/3} \right) = O(\varepsilon).$$
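For intuition, here is a simplified PyTorch sketch of empirical risk minimization on gradient observations. The DeepSets-style invariant network and the toy symmetric target are illustrative stand-ins (assumptions, not from the source) for the hypothesis class $F_{m,L}$ and $f_\varrho$; only the loss $\frac{1}{M}\sum_j \|\nabla f(x_j) - y_j\|^2$ mirrors the objective above.

```python
# Simplified gradient-matching ERM; the invariant net and toy target are hypothetical stand-ins.
import torch
import torch.nn as nn

class SymmetricNet(nn.Module):
    """DeepSets-style surrogate: per-coordinate features are summed, so the output is permutation-invariant."""
    def __init__(self, width=64):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(1, width), nn.ReLU(), nn.Linear(width, width))
        self.rho = nn.Sequential(nn.ReLU(), nn.Linear(width, 1))

    def forward(self, x):                                    # x: (M, d)
        return self.rho(self.phi(x.unsqueeze(-1)).sum(dim=1))

d, M = 4, 256
x = torch.rand(M, d, requires_grad=True)
f_true = lambda t: torch.sin(2 * torch.pi * t).sum(dim=1, keepdim=True)   # symmetric toy target
y = torch.autograd.grad(f_true(x).sum(), x)[0].detach()                   # y_j = grad f_rho(x_j)

net = SymmetricNet()
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(500):
    opt.zero_grad()
    xb = x.detach().clone().requires_grad_(True)
    grad_f = torch.autograd.grad(net(xb).sum(), xb, create_graph=True)[0]  # grad_x f(x_j), shape (M, d)
    loss = ((grad_f - y) ** 2).sum(dim=1).mean()                           # empirical risk
    loss.backward()
    opt.step()
print("final empirical risk:", float(loss))
```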

A plausible implication is that learning symmetric function classes with deep networks can achieve sample and approximation efficiency competitive with classical statistical rates, with dimension-independent leading factors.

6. Implications and Significance for High-Dimensional Learning

The dimension-free results obtained for symmetric deep neural networks represent a substantial advance over previous approximation and generalization bounds, as both the convergence rates and constant prefactors scale at most polynomially with ambient dimension, as opposed to classical exponential dependencies. This suggests a scalable pathway for approximating physically or mathematically symmetric models, such as those in computational physics, finance, and chemistry.

The architectural insights—enforcing permutation invariance via sparse grid symmetrization and Vandermonde-based aggregation—may generalize to other domains requiring strict invariance under variable permutation, such as set-based models or particle-interaction networks. Broadly, the approach expands the class of feasible problems for neural approximation and learning in high-dimensional symmetric settings, and demonstrates that by carefully matching neural architecture to underlying function symmetry, one can eliminate a principal bottleneck traditionally faced by generic deep learning models (Lu et al., 16 Nov 2025).
