
Clebsch–Gordan Transformer

Updated 29 November 2025
  • Clebsch–Gordan Transformer is a neural network architecture that integrates global attention with exact SO(3) equivariance and permutation symmetry for geometric deep learning.
  • It employs Clebsch–Gordan decomposition in both nonlinearity and attention mechanisms to achieve subquadratic computational complexity and support high-order rotation group representations.
  • It demonstrates state-of-the-art performance on tasks such as N-body simulation and molecular property prediction while offering notable improvements in speed and memory efficiency.

The Clebsch–Gordan Transformer (CGT) is a neural network architecture that unifies global attention, exact $\mathrm{SO}(3)$ equivariance, and permutation symmetry for geometric deep learning. By employing the Clebsch–Gordan decomposition in both its nonlinearity and attention mechanism, the CGT achieves subquadratic computational complexity in the number of tokens and supports high-order representations of the rotation group, outperforming previous equivariant transformers in diverse 3D learning tasks (Howell et al., 28 Sep 2025, Kondor et al., 2018).

1. Mathematical Foundation: Clebsch–Gordan Decomposition and $\mathrm{SO}(3)$ Equivariance

The core mathematical operation underlying the CGT is the Clebsch–Gordan (CG) decomposition. For irreducible representations (irreps) $V^{\ell_1}$ and $V^{\ell_2}$ of $\mathrm{SO}(3)$, the tensor (Kronecker) product decomposes as:

$$V^{\ell_1} \otimes V^{\ell_2} \cong \bigoplus_{\ell=|\ell_1-\ell_2|}^{\ell_1+\ell_2} V^\ell,$$

implemented by sparse CG matrices $C^{\ell}_{\ell_1 \ell_2}$ whose entries vanish unless the angular momentum selection rule $m = m_1 + m_2$ and parity constraints are satisfied. This structure enables construction of layers that are strictly equivariant under 3D rotations: each feature transforms according to a specific $V^\ell$, and all interactions among features are mediated by the algebra of irreps and the CG coefficients (Howell et al., 28 Sep 2025, Kondor et al., 2018).
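
As a concrete illustration of this coupling rule, the following minimal sketch decomposes the tensor product of two irrep features into its irreducible components. It assumes a precomputed CG coefficient lookup (e.g. tabulated or generated by an equivariant-learning library); the function and variable names are illustrative, not the authors' code.

```python
import numpy as np

def cg_couple(f1, f2, l1, l2, cg):
    """Couple two irrep features via the Clebsch-Gordan decomposition.

    f1 : (2*l1+1,) array transforming as V^{l1}
    f2 : (2*l2+1,) array transforming as V^{l2}
    cg : assumed precomputed lookup where cg[l1][l2][l] is the
         (2l+1, 2l1+1, 2l2+1) CG coefficient tensor (sparse in practice,
         with nonzeros only where m = m1 + m2).

    Returns {l: (2*l+1,) array} for |l1-l2| <= l <= l1+l2, each block
    transforming as the irrep V^{l}.
    """
    out = {}
    for l in range(abs(l1 - l2), l1 + l2 + 1):
        C = cg[l1][l2][l]
        out[l] = np.einsum("mab,a,b->m", C, f1, f2)
    return out
```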

2. Clebsch–Gordan Attention: Global $\mathrm{SO}(3)$-Equivariant Correlation

CGT replaces standard dot-product attention with "Clebsch–Gordan Convolution," a global equivariant correlation structured on spherical harmonic tensor fields. For $N$ tokens, each carrying features $f_i^\ell \in \mathbb{R}^{(2\ell+1)\times m_\ell}$ for $\ell = 0, \dots, L$, query, key, and value projections yield:

$$q_i^\ell,~k_i^\ell,~v_i^\ell = W_Q^\ell(f_i),~W_K^\ell(f_i),~W_V^\ell(f_i)$$

The key global correlation is computed as:

$$(q^\ell \star k^{\ell'})_i^J = C^J_{\ell\,\ell'} \sum_{j=1}^N q_j^\ell \otimes k_{i-j}^{\ell'}$$

and the total output at each order JJ is:

$$\hat u_i^J = \sum_{\ell=0}^{L} \sum_{\ell'=0}^{L} (q^\ell \star k^{\ell'})^J_i$$

To avoid $\mathcal{O}(N^2)$ cost, the token dimension is handled with a Fast Fourier Transform (FFT):

$$\hat q^\ell(\omega) = \sum_{i=1}^N e^{-2\pi i \omega i/N} q_i^\ell$$

with the batched CG-tensor operation and an inverse FFT restoring the spatial domain, achieving $\mathcal{O}(N \log N)$ scaling (Howell et al., 28 Sep 2025).
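
A hedged sketch of this FFT-accelerated correlation for a single $(\ell, \ell', J)$ triple follows. It assumes real-valued token features and a precomputed CG tensor `C_J` of shape $(2J+1, 2\ell+1, 2\ell'+1)$, and is an illustration of the mechanism rather than the reference implementation.

```python
import numpy as np

def cg_correlation(q, k, C_J):
    """FFT-based global Clebsch-Gordan correlation along the token axis.

    q   : (N, 2l+1)  query features of order l
    k   : (N, 2l'+1) key features of order l'
    C_J : (2J+1, 2l+1, 2l'+1) precomputed CG coefficients

    Returns (N, 2J+1): u_i^J = C_J [ sum_j q_j (x) k_{i-j} ], with the
    token sum taken circularly, in O(N log N) instead of O(N^2).
    """
    q_hat = np.fft.fft(q, axis=0)                      # transform token axis
    k_hat = np.fft.fft(k, axis=0)
    # Pointwise outer product per frequency == circular convolution in tokens.
    prod_hat = np.einsum("wa,wb->wab", q_hat, k_hat)   # (N, 2l+1, 2l'+1)
    prod = np.fft.ifft(prod_hat, axis=0).real          # back to token domain
    # Project the (2l+1)x(2l'+1) tensor product onto the order-J irrep.
    return np.einsum("Mab,iab->iM", C_J, prod)
```

Summing such outputs over all $(\ell, \ell')$ pairs with $|\ell - \ell'| \le J \le \ell + \ell'$ reproduces the $\hat u_i^J$ defined above.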

3. Nonlinearity: Clebsch–Gordan Product as Equivariant Activation

In contrast to pointwise nonlinearities that break Fourier domain structure, Clebsch–Gordan-based architectures—including CGT and Clebsch–Gordan Nets—employ the quadratic, equivariant tensor product followed by projection via CG coefficients. For fragments $f^{s}_{\ell_1} \in \mathbb{C}^{2\ell_1+1}$, $f^{s}_{\ell_2} \in \mathbb{C}^{2\ell_2+1}$, the coupled output is

$$[g^{s}_{\ell_1,\ell_2 \to L}]_M = \sum_{m_1 = -\ell_1}^{\ell_1} \sum_{m_2 = -\ell_2}^{\ell_2} \langle \ell_1 m_1; \ell_2 m_2 | L M \rangle f^{s}_{\ell_1, m_1} f^{s}_{\ell_2, m_2}$$

ensuring each output channel transforms as the irreducible representation of order $L$ and the mapping is exactly $\mathrm{SO}(3)$-equivariant (Kondor et al., 2018).
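
Applied channel-wise as the network's activation, this coupling can be sketched as below; a minimal illustration assuming a precomputed coefficient tensor `cg_L` holding $\langle \ell_1 m_1; \ell_2 m_2 | L M \rangle$, with illustrative names.

```python
import numpy as np

def cg_nonlinearity(f_l1, f_l2, cg_L):
    """Quadratic, exactly equivariant activation via the CG product.

    f_l1 : (channels, 2*l1+1) fragments transforming as order l1
    f_l2 : (channels, 2*l2+1) fragments transforming as order l2
    cg_L : (2*L+1, 2*l1+1, 2*l2+1) CG coefficients <l1 m1; l2 m2 | L M>

    Returns (channels, 2*L+1) features, each channel transforming as the
    order-L irrep, so equivariance is preserved exactly.
    """
    return np.einsum("Mab,sa,sb->sM", cg_L, f_l1, f_l2)
```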

4. Efficiency: Exploiting Sparsity and Computational Complexity

A naïve all-pairs tensor product would result in $\mathcal{O}(L^6)$ scaling for the harmonic degree, but the selection rules and sparsity of the CG matrices reduce this to $\mathcal{O}(L^3)$. Each $C^J_{\ell\ell'}$ contains only $(2\ell+1)(2\ell'+1)$ nonzero elements. Combined with the FFT on the token axis, the complete attention block achieves:

  • $\mathcal{O}(N\log N)$ in input size $N$
  • $\mathcal{O}(L^3)$ in maximum harmonic order $L$
  • Linear memory $\mathcal{O}(N+L^2)$ per layer

This efficiency enables practical deployment with high-order irreps ($L \geq 6$) on modern GPU hardware (Howell et al., 28 Sep 2025).
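
The sparsity argument can be made concrete with a small sketch that visits only the coefficients allowed by the selection rule; `cg_entry` stands in for an assumed scalar coefficient lookup and is not part of the paper's code.

```python
def sparse_cg_couple(f1, f2, l1, l2, J, cg_entry):
    """Couple f1 (len 2*l1+1) and f2 (len 2*l2+1) into an order-J output.

    Because <l1 m1; l2 m2 | J M> vanishes unless M = m1 + m2, only
    (2*l1+1)(2*l2+1) terms are visited, not (2*l1+1)(2*l2+1)(2*J+1).
    """
    out = [0.0] * (2 * J + 1)
    for m1 in range(-l1, l1 + 1):
        for m2 in range(-l2, l2 + 1):
            M = m1 + m2                      # the only surviving output index
            if -J <= M <= J:
                out[M + J] += cg_entry(l1, m1, l2, m2, J) * f1[m1 + l1] * f2[m2 + l2]
    return out
```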

5. Permutation Equivariance and Set Structure

FFT-based global attention inherently breaks the permutation symmetry required for processing sets. CGT addresses this with two strategies:

  • Filter Weight-Tying Across Frequencies: Ensures learned filters commute with arbitrary permutations.
  • Data Augmentation and Regularization: Incorporates random token permutations and permutation-equivariant losses (e.g., via Deep Sets or spectral graph attention formulations).

Alternatively, replacing the token-axis FFT with a Graph Fourier Transform on the Laplacian yields strict permutation equivariance, as Laplacian eigenvectors transform naturally under permutations (Howell et al., 28 Sep 2025).
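
A minimal sketch of this Graph Fourier Transform alternative is shown below, assuming a dense eigendecomposition of the combinatorial Laplacian for clarity (the paper may use a different Laplacian normalization or solver).

```python
import numpy as np

def graph_fourier_basis(adjacency):
    """Eigenvectors of the combinatorial Laplacian L = D - A.

    Permuting the nodes permutes the rows of these eigenvectors in the
    same way, which is what makes the downstream attention
    permutation-equivariant.
    """
    degree = np.diag(adjacency.sum(axis=1))
    laplacian = degree - adjacency
    _, eigvecs = np.linalg.eigh(laplacian)   # columns ordered by graph frequency
    return eigvecs

def gft(features, eigvecs):
    """Forward transform: (N, d) node features -> (N, d) spectral coefficients."""
    return eigvecs.T @ features

def igft(spectral, eigvecs):
    """Inverse transform back to the node (token) domain."""
    return eigvecs @ spectral
```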

6. Empirical Performance and Applications

CGT demonstrates state-of-the-art or superior results in domains requiring both $\mathrm{SO}(3)$-equivariance and global context:

| Task | Metric | CGT Result | Baseline Comparison |
|---|---|---|---|
| N-body Simulation | MSE$_x$ / MSE$_v$ | 0.0041 / 0.0065 | SE(3)-Transformer: 0.0076 / 0.075 |
| QM9 Molecular Properties | MAE$_\mu$ | 0.21 D | SE(3)-Transformer: 0.51 D |
| ModelNet40 Classification | Accuracy | 89.3% | SEGNN: 90.5%; SE(3)-Transformer: 88.1% |
| Robotic Grasping (2048 pts) | Rot. Error | 0.025 rad | DGCNN: 0.031 |

CGT further achieves significant memory and speed improvements versus local or low-order equivariant attention methods; for example, at $N=20$ tokens it uses 8 GB of GPU memory versus 12 GB for the SE(3)-Transformer, and delivers $1.8\times$ throughput at $N=40$. CGT also scales to 4096-point clouds, where the SE(3)-Transformer runs out of memory (Howell et al., 28 Sep 2025).

7. Limitations and Future Research Directions

The principal constraints of Clebsch–Gordan Transformers include:

  • The cubic $\mathcal{O}(L^3)$ scaling in harmonic order, which can be limiting if $L \gg 10$ is required.
  • Small numerical errors due to FFT-based attention may arise in very long token sequences.

Proposed research extensions involve:

  • Multi-scale schemes (e.g., fast multipole or Barnes–Hut style) to approach linear $\mathcal{O}(N)$ scaling.
  • Learned or sparse/low-rank CG convolutions to further reduce harmonic cost towards $\mathcal{O}(L^2 \log L)$.
  • Integrating continuous-depth or state-space layers for seamless modeling of long-range interactions (Howell et al., 28 Sep 2025).

8. Relation to Clebsch–Gordan Nets and Generalization

Clebsch–Gordan Nets extend the same tensor-product nonlinearity to fully Fourier-space, $\mathrm{SO}(3)$-equivariant spherical CNNs, using precomputed CG matrices for transformation between irreps and omitting forward/inverse Fourier transforms after initialization. This methodology generalizes to any compact group whose irreducible representations and CG coefficients are known, supporting construction of G-equivariant neural architectures with precise symmetry guarantees (Kondor et al., 2018).

In summary, the Clebsch–Gordan Transformer establishes a scalable, expressive, and symmetry-preserving foundation for deep learning on geometric and physical data, with efficient attention and nonlinearity mechanisms applicable to a wide range of scientific and computational domains (Howell et al., 28 Sep 2025, Kondor et al., 2018).
