
Symmetric Deep Neural Networks

Updated 23 November 2025
  • Symmetric deep neural networks are architectures that enforce permutation invariance, enabling effective high-dimensional function approximation.
  • They utilize symmetric Korobov spaces and squared-ReLU subnets to achieve dimension-free approximation rates and mitigate the curse of dimensionality.
  • The design improves computational efficiency and generalization in fields like physics and finance by integrating sparse grid symmetrization and Vandermonde-inverse aggregation.

Symmetric deep neural networks are architectures designed to exploit permutation symmetry inherent in function classes encountered in scientific and mathematical modeling, particularly for high-dimensional tasks. These models enforce invariance under permutations of input coordinates, leading to substantial computational advantages and rigorous improvements in both approximation and generalization for functions possessing such symmetry. The paradigm offers dimension-free rates, avoiding the curse of dimensionality previously endemic to neural approximations of symmetric functions, as established by the dimension-free approximation and learning guarantees for symmetric Korobov spaces (Lu et al., 16 Nov 2025).

1. Symmetric Korobov Spaces and Function Classes

Symmetric Korobov spaces are a central construct for analyzing permutation-symmetric functions in multiple dimensions. Let $r \geq 1$ and $d \geq 1$; the periodic Korobov space $H^r_{\mathrm{Kor}}(d)$ is defined on $[0,1]^d$ as the set of periodic functions $f$ admitting a Fourier expansion $f(x) = \sum_{k \in \mathbb{Z}^d} \hat{f}_k e^{2\pi i k \cdot x}$, equipped with norm

$$\|f\|_{H^r_{\mathrm{Kor}}}^2 = \sum_{k \in \mathbb{Z}^d} |\hat{f}_k|^2 \prod_{j=1}^d (1 + |k_j|^2)^r.$$

Equivalently, the zero-boundary "hat-basis" formulation establishes

$$X^{2,2}(\Omega) := \{ f \in L^2(\Omega) : f|_{\partial\Omega} = 0,\ D^\alpha f \in L^2(\Omega) \;\forall\, |\alpha|_\infty \leq 2 \},$$

with semi-norm $|f|_{2,2} := \bigl\| \partial_{x_1}^2 \cdots \partial_{x_d}^2 f \bigr\|_{L^2(\Omega)}$. Functions $f$ are called symmetric if $f(x_{\sigma(1)}, \ldots, x_{\sigma(d)}) = f(x_1, \ldots, x_d)$ for all $\sigma \in S_d$; accordingly, the symmetric subspace $X^{2,2}_{\mathrm{sym}}(\Omega)$ is defined by restriction to symmetric functions, and in Fourier coordinates requires $\hat{f}_k = \hat{f}_{k_\sigma}$ for all coordinate permutations.
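To make the coefficient-level symmetry condition concrete, the short NumPy sketch below (an illustration only; the small dimension $d$, cutoff $K$, and random coefficients are assumptions, not taken from the source) evaluates the truncated Korobov norm and symmetrizes Fourier coefficients by averaging each one over its permutation orbit.

```python
# Illustrative sketch: truncated Korobov norm and coefficient symmetrization
# fhat_k = fhat_{k_sigma}. Dimension d, cutoff K, and random data are hypothetical.
import itertools
import numpy as np

d, K, r = 3, 2, 2                                  # dimension, frequency cutoff, smoothness
freqs = list(itertools.product(range(-K, K + 1), repeat=d))

rng = np.random.default_rng(0)
fhat = {k: rng.standard_normal() + 1j * rng.standard_normal() for k in freqs}

def korobov_norm_sq(coeffs, r):
    """sum_k |fhat_k|^2 * prod_j (1 + |k_j|^2)^r over the truncated index set."""
    return sum(abs(c) ** 2 * np.prod([(1 + kj ** 2) ** r for kj in k])
               for k, c in coeffs.items())

def symmetrize(coeffs):
    """Average each coefficient over its S_d-orbit, enforcing fhat_k = fhat_{k_sigma}."""
    return {k: np.mean([coeffs[tuple(k[i] for i in perm)]
                        for perm in itertools.permutations(range(d))])
            for k in coeffs}

fsym = symmetrize(fhat)
print("Korobov norm^2 (original):   ", korobov_norm_sq(fhat, r))
print("Korobov norm^2 (symmetrized):", korobov_norm_sq(fsym, r))
```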

2. Dimension-Free Approximation by Deep Symmetric Networks

The main theorem establishes that any $f \in X^{2,2}_{\mathrm{sym}}(\Omega)$ can be approximated, for integer $n \geq 1$, by a symmetric squared-ReLU network $\varphi_{\mathrm{ReLU}_2}$ of the form

$$\varphi_{\mathrm{ReLU}_2}(x) = \sum_{i=1}^m s_i(x),$$

where each $s_i$ is itself a squared-ReLU network of width $\leq 3d^2$ and depth $L_0 = \lfloor\log_2 d\rfloor + 2$. The total number $m$ of summands satisfies

$$m = O\left( d\,2^{n+d-1} e^{\pi\sqrt{5d/3}} \right).$$

The energy-norm error satisfies

$$\|f - \varphi_{\mathrm{ReLU}_2}\|_E = \left( \int_\Omega \sum_{j=1}^d |\partial_{x_j}(f - \varphi)(x)|^2 \, dx \right)^{1/2} \leq C\, |f|_{2,2}\, m^{-1},$$

where $C$ depends polynomially on $d$ but not exponentially. The approximation rate $O(m^{-1})$ is thus dimension-free; to drive the energy-norm error below $\varepsilon$ requires $m = O(\varepsilon^{-1})$, achievable with network depth $O(\log d)$, width $O(d^3 2^d e^{\pi\sqrt{5d/3}})$, and weights bounded by $O(e^{\pi\sqrt{5d/3}})$.
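The following PyTorch sketch illustrates the sum-of-subnets form $\varphi_{\mathrm{ReLU}_2}(x) = \sum_i s_i(x)$ with squared-ReLU activations; the widths, depths, and subnet count are illustrative placeholders rather than the $3d^2$, $\lfloor\log_2 d\rfloor + 2$, and $m$ values from the theorem.

```python
# Sketch of a sum of squared-ReLU subnets; sizes are placeholders, not the theorem's values.
import torch
import torch.nn as nn

def relu2(x):
    """Squared ReLU activation: max(x, 0)^2."""
    return torch.clamp(x, min=0.0) ** 2

class SquaredReLUSubnet(nn.Module):
    def __init__(self, d, width, depth):
        super().__init__()
        dims = [d] + [width] * depth + [1]
        self.layers = nn.ModuleList([nn.Linear(a, b) for a, b in zip(dims[:-1], dims[1:])])

    def forward(self, x):
        for layer in self.layers[:-1]:
            x = relu2(layer(x))
        return self.layers[-1](x)

class SumOfSubnets(nn.Module):
    """phi(x) = sum_{i=1}^m s_i(x), each s_i a small squared-ReLU network."""
    def __init__(self, d, m, width=8, depth=3):
        super().__init__()
        self.subnets = nn.ModuleList([SquaredReLUSubnet(d, width, depth) for _ in range(m)])

    def forward(self, x):
        return sum(s(x) for s in self.subnets)

x = torch.rand(16, 4)                      # batch of points in [0,1]^d with d = 4
print(SumOfSubnets(d=4, m=10)(x).shape)    # torch.Size([16, 1])
```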

3. Permutation-Invariant Network Architecture

Symmetry is imposed by grouping tensor-product sparse grid basis functions $\varphi_{\ell,i}(x)$ into symmetrized blocks

$$\psi_{\ell,i}(x) = \sum_{\sigma \in S_d} \varphi_{\ell,i}(x_{\sigma(1)}, \ldots, x_{\sigma(d)}).$$

While direct summation over $d!$ permutations is intractable, Lemma 4.1 represents $\psi_{\ell,i}$ as a linear combination of only $O(d\,2^{d-1})$ exponentials of inner-product features $G_\xi(x) = \prod_{s=1}^d \xi^{2^{s-1} \varphi_{\ell_s, i_s}(x_s)}$, $\xi = 1, \ldots, D$, with $D \leq d\,2^{d-1}$. Recovery of the symmetrized output is performed through a Vandermonde-inverse linear layer.
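The sketch below is a generic NumPy illustration of the Vandermonde-inverse idea, not the specific construction of Lemma 4.1: if the $D$ channel outputs depend polynomially on the channel index $\xi = 1, \ldots, D$, the combination weights can be recovered by solving a $D \times D$ Vandermonde system, which can then be frozen as a fixed linear layer.

```python
# Generic Vandermonde-inverse recovery (illustration, not the paper's Lemma 4.1).
import numpy as np

D = 6
xi = np.arange(1, D + 1, dtype=float)          # channel indices xi = 1, ..., D
V = np.vander(xi, N=D, increasing=True)        # V[i, p] = xi_i^p (D x D Vandermonde matrix)

rng = np.random.default_rng(1)
c = rng.standard_normal(D)                     # hypothetical symmetrized-block weights
g = V @ c                                      # aggregated channel outputs
c_recovered = np.linalg.solve(V, g)            # fixed "Vandermonde-inverse" linear map
print(np.allclose(c, c_recovered))             # True: the aggregation is exactly invertible
```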

Each $G_\xi$ is approximated by feeding the scalar hat function $\varphi_{\ell,j}(\cdot)$ into a product-of-exponentials, using shallow squared-ReLU subnets and an $O(\log d)$-deep binary tree of ReLU-based bilinear blocks for the $d$-fold product, requiring $O(d)$ neurons. A final linear layer of width $D$ combines these channels, with global weight-sharing across combinatorial block types to ensure permutation invariance.
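A minimal Python sketch of the binary-tree product structure follows (exact multiplication stands in for the squared-ReLU bilinear blocks): $d$ factors are combined pairwise over $\lceil\log_2 d\rceil$ stages, which is the source of the $O(\log d)$ depth.

```python
# Balanced binary-tree product of d per-coordinate factors in ceil(log2 d) stages.
import math
import numpy as np

def tree_product(factors):
    """Multiply the factors level by level; each level is one stage of pairwise products."""
    level, depth = list(factors), 0
    while len(level) > 1:
        nxt = [level[i] * level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:                # odd leftover passes through unchanged
            nxt.append(level[-1])
        level, depth = nxt, depth + 1
    return level[0], depth

d = 10
factors = np.random.default_rng(2).uniform(0.5, 1.5, size=d)
prod, depth = tree_product(factors)
print(np.isclose(prod, np.prod(factors)), depth == math.ceil(math.log2(d)))   # True True
```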

4. Mathematical Framework Underpinning Dimension-Free Rates

Key ingredients for dimension-free results include:

  • The energy-based sparse grid index set $X_n$, which replaces total-degree sets to reduce the dominant $(\log m)^{d-1}$ term in the error estimate. This produces cardinality $O(2^n e^d)$.
  • Exploiting permutation symmetry by aligning with ordered multi-indices ($\ell_1 \leq \cdots \leq \ell_d$) and symmetrizing bases, resulting in $C_s\, 2^n \exp(\pi\sqrt{5d/3})$ distinct symmetric blocks (exponential in $\sqrt{d}$ only).
  • Realizing each symmetrized block $\psi_{\ell,i}$ via a squared-ReLU subnet of width $3d^3(2^{d-1}-1)$, depth $\lfloor \log_2 d\rfloor + 2$, and $O(d^3 2^{d-1} \log d)$ parameters, attaining $H^1$-accuracy $\delta$.
  • By truncating to $m \sim C_s\, 2^n e^{\pi\sqrt{5d/3}}$ blocks and approximating each to error $\delta \approx m^{-2}$, the total $H^1$ error is $O(m^{-1})$, yielding an algebraic, truly dimension-free rate.

The relevance lies in reducing the exponential cost normally expected in $d$ for generic approximators, making the approach scalable to high-dimensional symmetric problems.
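A back-of-the-envelope comparison of the two cardinalities quoted above (constants omitted, shared factor $2^n$ kept) shows how $\exp(\pi\sqrt{5d/3})$ eventually grows far more slowly than $e^d$ as $d$ increases; the specific values of $n$ and $d$ below are arbitrary.

```python
# Growth of the unsymmetrized count ~ 2^n e^d versus the symmetrized count ~ 2^n exp(pi sqrt(5d/3)).
import math

n = 5
for d in (4, 8, 16, 32, 64):
    unsym = 2 ** n * math.exp(d)
    sym = 2 ** n * math.exp(math.pi * math.sqrt(5 * d / 3))
    print(f"d={d:3d}  unsymmetrized ~ {unsym:.2e}  symmetrized ~ {sym:.2e}  ratio ~ {unsym / sym:.2e}")
```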

5. Sample Complexity and Generalization Guarantees

For supervised learning of symmetric Korobov functions, let the target $f_\varrho \in X^{2,2}_{\mathrm{sym}}(\Omega)$ satisfy $\|\nabla f_\varrho\|_{L^\infty} \leq L$. Observed i.i.d. samples $S = \{ (x_j, y_j) \}_{j=1}^M$ are distributed so that $\mathbb{E}[y \mid x] = \nabla f_\varrho(x)$ and $\|y\| \leq L$ almost surely, and the hypothesis class $F_{m, L}$ is the set of symmetric networks described above.

The empirical risk minimizer

$$\hat{f} = \arg\min_{f \in F_{m, L}} \frac{1}{M} \sum_{j = 1}^M \| \nabla f(x_j) - y_j \|^2$$

admits the bound

$$\mathbb{E}_S \| \hat{f} - f_\varrho \|_E^2 \leq C \left( \frac{(\log M)^2}{M} \right)^{2/3} + 2 \inf_{f \in F_{m,L}} \| f - f_\varrho \|_E^2,$$

where $C$ is polynomial in $d$, $L$, and $|f_\varrho|_{2,2}$. By choosing $m \sim \varepsilon^{-1}$ and $M \sim m^3 (\log m)^2 \sim \varepsilon^{-3} (\log 1/\varepsilon)^2$, one achieves $\mathbb{E}\| \hat{f} - f_\varrho \|_E^2 = O(\varepsilon^2)$. High-probability bounds are also established: for any $\delta > 0$, with probability at least $1 - \delta$,

$$\| \hat{f} - f_\varrho \|_E = O\left( (\log(1/\delta)/M)^{1/3} \right) = O(\varepsilon).$$
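For intuition, here is a simplified PyTorch sketch of empirical risk minimization on gradient observations. The DeepSets-style invariant network and the toy symmetric target are illustrative stand-ins (assumptions, not from the source) for the hypothesis class $F_{m,L}$ and $f_\varrho$; only the loss $\frac{1}{M}\sum_j \|\nabla f(x_j) - y_j\|^2$ mirrors the objective above.

```python
# Simplified gradient-matching ERM; the invariant net and toy target are hypothetical stand-ins.
import torch
import torch.nn as nn

class SymmetricNet(nn.Module):
    """DeepSets-style surrogate: per-coordinate features are summed, so the output is permutation-invariant."""
    def __init__(self, width=64):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(1, width), nn.ReLU(), nn.Linear(width, width))
        self.rho = nn.Sequential(nn.ReLU(), nn.Linear(width, 1))

    def forward(self, x):                                    # x: (M, d)
        return self.rho(self.phi(x.unsqueeze(-1)).sum(dim=1))

d, M = 4, 256
x = torch.rand(M, d, requires_grad=True)
f_true = lambda t: torch.sin(2 * torch.pi * t).sum(dim=1, keepdim=True)   # symmetric toy target
y = torch.autograd.grad(f_true(x).sum(), x)[0].detach()                   # y_j = grad f_rho(x_j)

net = SymmetricNet()
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(500):
    opt.zero_grad()
    xb = x.detach().clone().requires_grad_(True)
    grad_f = torch.autograd.grad(net(xb).sum(), xb, create_graph=True)[0]  # grad_x f(x_j), shape (M, d)
    loss = ((grad_f - y) ** 2).sum(dim=1).mean()                           # empirical risk
    loss.backward()
    opt.step()
print("final empirical risk:", float(loss))
```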

A plausible implication is that learning symmetric function classes with deep networks can achieve sample and approximation efficiency competitive with classical statistical rates, with dimension-independent leading factors.

6. Implications and Significance for High-Dimensional Learning

The dimension-free results obtained for symmetric deep neural networks represent a substantial advance over previous approximation and generalization bounds, as both the convergence rates and constant prefactors scale at most polynomially with ambient dimension, as opposed to classical exponential dependencies. This suggests a scalable pathway for approximating physically or mathematically symmetric models, such as those in computational physics, finance, and chemistry.

The architectural insights—enforcing permutation invariance via sparse grid symmetrization and Vandermonde-based aggregation—may generalize to other domains requiring strict invariance under variable permutation, such as set-based models or particle-interaction networks. Broadly, the approach expands the class of feasible problems for neural approximation and learning in high-dimensional symmetric settings, and demonstrates that by carefully matching neural architecture to underlying function symmetry, one can eliminate a principal bottleneck traditionally faced by generic deep learning models (Lu et al., 16 Nov 2025).
