
Tensor Field Networks (TFN) Overview

Updated 20 December 2025
  • Tensor Field Networks are neural architectures that achieve equivariance to 3D rotations, translations, and point permutations, enabling effective learning on arbitrarily oriented point clouds.
  • They construct equivariant filters using spherical harmonics and Clebsch–Gordan coefficients, ensuring that each layer processes features (scalars, vectors, tensors) in a geometrically consistent manner.
  • TFNs demonstrate robust performance in 3D shape classification, Newtonian physics tasks, and molecular chemistry predictions, often outperforming non-equivariant methods without rotational data augmentation.

Tensor Field Networks (TFNs) are neural network architectures designed to be locally equivariant to 3D rotations, translations, and point permutations at every layer. This geometric equivariance enables TFNs to identify features in arbitrary orientations without explicit data augmentation, providing a principled framework for learning on 3D point clouds. TFNs construct equivariant filters using spherical harmonics and guarantee that each layer processes and outputs features as scalars, vectors, or higher-order tensors in the geometric sense. These properties are utilized in diverse tasks spanning geometry, physics, and chemistry (Thomas et al., 2018).

1. Mathematical Foundations

A Tensor Field Network is fundamentally structured to respect the group of 3D Euclidean isometries $E(3)$, generated by rotations ($SO(3)$) and translations, together with permutations of the input points. A linear map $D(g): V \to V$ is a representation of a group $G$ if $D(g_1)D(g_2) = D(g_1 g_2)$ for all $g_1, g_2 \in G$. A TFN layer $L: \mathcal{X} \rightarrow \mathcal{Y}$ is equivariant if, for all $g \in G$,

$$L \circ D^{\mathcal{X}}(g) = D^{\mathcal{Y}}(g) \circ L$$

Translation equivariance is achieved by designing layers such that $L(x+t) = L(x) + t$, meaning translations act trivially on features not tied to coordinates. Rotation equivariance involves transforming the inputs $x$ by $R \in SO(3)$ and the features by the corresponding representation matrices $D^{\mathcal{X}}(R)$, causing the outputs to rotate by $D^{\mathcal{Y}}(R)$.

SO(3) possesses one irreducible representation (irrep) for each integer $\ell \geq 0$, of dimension $2\ell+1$. Here, $\ell$ is termed the "rotation order": $\ell=0$ (scalars), $\ell=1$ (3D vectors), $\ell=2$ (rank-2 symmetric traceless tensors), and so on. Features $f^{(\ell)}_m$ for $m=-\ell,\dots,+\ell$ transform under a rotation $R$ according to the real Wigner D-matrix $D^{(\ell)}(R)$:

$$f^{(\ell)}_m \to \sum_{m'=-\ell}^{\ell} D^{(\ell)}_{mm'}(R)\, f^{(\ell)}_{m'}$$

Spherical harmonics $Y^\ell_m(\theta,\phi)$ provide an orthonormal basis for $L^2(S^2)$. They satisfy:

$$Y^\ell_m(R\hat{r}) = \sum_{m'=-\ell}^{\ell} D^{(\ell)}_{mm'}(R)\, Y^\ell_{m'}(\hat{r}),$$

where $\hat{r}$ is the unit vector corresponding to the angles $\theta, \phi$.
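
As a concrete illustration of this transformation rule, the following minimal numerical check (a NumPy sketch, not from the original paper) verifies it for $\ell=1$, using the Cartesian basis in which the real $\ell=1$ spherical harmonics are proportional to the components of $\hat{r}$ and $D^{(1)}(R)$ is the rotation matrix itself:

```python
import numpy as np

def random_rotation(rng):
    # QR decomposition of a Gaussian matrix gives a random orthogonal matrix;
    # flip one column if needed so that det(Q) = +1 (a proper rotation).
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] *= -1
    return q

def sh_l1(r_hat):
    # l = 1 real spherical harmonics written in a Cartesian basis, where they
    # are simply proportional to the components of the unit vector itself.
    return r_hat

rng = np.random.default_rng(0)
R = random_rotation(rng)
r_hat = rng.normal(size=3)
r_hat /= np.linalg.norm(r_hat)

# Equivariance check: Y^1(R r_hat) equals D^(1)(R) Y^1(r_hat),
# and in the Cartesian basis D^(1)(R) is just R.
assert np.allclose(sh_l1(R @ r_hat), R @ sh_l1(r_hat))
```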

2. Filter Construction and Equivariant Convolutions

TFNs implement rotation-equivariant continuous convolutions on point clouds by constraining each filter of rotation order $\ell_f$ to the form:

$$F^{(\ell_f, \ell_i)}_{cm}(r, \theta, \phi) = R^{(\ell_f, \ell_i)}_c(r)\, Y^{\ell_f}_m(\theta, \phi)$$

Here, $\ell_i$ is the input feature's rotation order, $c$ indexes the learnable "channels," and $m=-\ell_f,\dots,+\ell_f$ specifies components in the filter's irrep. The scalar radial function $R^{(\ell_f,\ell_i)}_c(r)$ is shared across $m$, where $r=\|x_j - x_i\|$ and $(\theta, \phi)$ are the relative spherical coordinates of $x_j - x_i$.

For each layer $L$, features $f^{(\ell_i)}_{j,c,m}$ at point $x_j$ transform in the $\ell_i$ irrep. The output of order $\ell_o$ at point $i$ is:

$$(L f)^{(\ell_o)}_{i,c_o,m_o} = \sum_{\ell_i, \ell_f} \sum_{c_i, m_i} \sum_{c_f, m_f} C^{(\ell_o,m_o)}_{(\ell_f,m_f),(\ell_i,m_i)} \sum_{j \in \mathcal{N}(i)} F^{(\ell_f,\ell_i)}_{c_f,m_f}(x_j - x_i)\, f^{(\ell_i)}_{j, c_i, m_i}$$

$\mathcal{N}(i)$ denotes the neighbors of $i$ (within a cutoff). The Clebsch-Gordan coefficients $C^{(\ell_o, m_o)}_{(\ell_f, m_f),(\ell_i, m_i)}$ project the tensor product of the $\ell_f$ and $\ell_i$ irreps onto the irreducible $\ell_o$ component, enforcing the correct rotational transformation of the outputs.

This construction ensures that under a global rotation, the output features transform according to $D^{(\ell_o)}(R)$. The equivariance proof relies on the transformation properties of the filters, the input features, and the Clebsch-Gordan projectors.
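
The sketch below (a hypothetical NumPy implementation, not the reference code) illustrates the simplest nontrivial path, $\ell_i=0$, $\ell_f=1 \to \ell_o=1$, where the Clebsch-Gordan projection is trivial and the learned radial function is replaced by a fixed Gaussian; a numerical check confirms that rotating the points rotates the output vectors:

```python
import numpy as np

def radial(r, sigma=1.0):
    # Placeholder radial function; in a real TFN this would be a learned MLP.
    return np.exp(-r**2 / (2 * sigma**2))

def tfn_conv_0_to_1(points, scalars):
    """Special-case TFN convolution: scalar (l=0) inputs, l_f=1 filter,
    vector (l=1) outputs. The Clebsch-Gordan projection 0 x 1 -> 1 is trivial,
    so the output at point i is a radially weighted sum of unit directions."""
    n = len(points)
    out = np.zeros((n, 3))
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            rij = points[j] - points[i]
            r = np.linalg.norm(rij)
            out[i] += radial(r) * (rij / r) * scalars[j]
    return out

# Equivariance check: rotating the input points rotates the output vectors.
rng = np.random.default_rng(1)
pts = rng.normal(size=(5, 3))
feats = rng.normal(size=5)
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(Q) < 0:
    Q[:, 0] *= -1
assert np.allclose(tfn_conv_0_to_1(pts @ Q.T, feats), tfn_conv_0_to_1(pts, feats) @ Q.T)
```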

3. Feature Representations and Tensor Products

TFN features at each layer are organized as lists or dictionaries indexed by $\ell$, yielding arrays of shape $[\#\text{points}, \#\text{channels}_\ell, 2\ell+1]$. Under rotation, the feature vector $f^{(\ell)}_{c,m}$, $m=-\ell,\dots,\ell$, for each $\ell$ and channel $c$ transforms by $D^{(\ell)}(R)$.

Convolving $\ell_i$-order features with $\ell_f$-order filters performs a tensor product, which decomposes into output orders $\ell_o = |\ell_f - \ell_i|, \dots, \ell_f+\ell_i$. Clebsch-Gordan coefficients enforce this decomposition, ensuring that outputs consistently transform as irreducible representations dictated by the SO(3) structure.
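
A small worked example of this decomposition for $\ell_f = \ell_i = 1$, written in the Cartesian rather than the spherical ($m$-indexed) basis: the product of two vectors splits into a dot product ($\ell=0$), a cross product ($\ell=1$), and a symmetric traceless tensor ($\ell=2$), each transforming as expected under rotation. This NumPy sketch is illustrative only:

```python
import numpy as np

def decompose_1x1(u, v):
    """Decompose the tensor product of two l=1 (vector) features into its
    irreducible pieces, written in Cartesian form:
      l=0: the dot product (a scalar),
      l=1: the cross product (a vector),
      l=2: the symmetric traceless part of the outer product (a rank-2 tensor)."""
    scalar = u @ v
    vector = np.cross(u, v)
    outer = np.outer(u, v)
    sym_traceless = 0.5 * (outer + outer.T) - (np.trace(outer) / 3.0) * np.eye(3)
    return scalar, vector, sym_traceless

rng = np.random.default_rng(2)
u, v = rng.normal(size=3), rng.normal(size=3)
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(Q) < 0:
    Q[:, 0] *= -1

s, w, T = decompose_1x1(u, v)
s_rot, w_rot, T_rot = decompose_1x1(Q @ u, Q @ v)

assert np.isclose(s_rot, s)              # l=0 piece is invariant
assert np.allclose(w_rot, Q @ w)         # l=1 piece rotates as a vector
assert np.allclose(T_rot, Q @ T @ Q.T)   # l=2 piece rotates as a rank-2 tensor
```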

4. Network Architecture and Nonlinearities

TFN architectures stack multiple equivariant blocks, each comprising:

  • Equivariant point convolutions, combining $\ell_i \to \ell_o$ paths.
  • Self-interaction layers ("$1 \times 1$ convolutions"), where channels within each $\ell$ are mixed via learnable weights $W_\ell$ with no dependence on $m$ (see the sketch after this list).
  • Equivariant nonlinearities.
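
A minimal sketch of a self-interaction layer, assuming features are stored as arrays of shape $[\#\text{points}, \#\text{channels}_\ell, 2\ell+1]$ as described in Section 3 (illustrative NumPy, not the reference implementation):

```python
import numpy as np

def self_interaction(features, weights):
    """Mix channels within a single rotation order l.
    features: array of shape [n_points, c_in, 2l+1]
    weights:  array of shape [c_out, c_in] (shared across all m components)
    Because the weights never touch the m axis, the layer commutes with the
    Wigner D-matrix acting on that axis, so equivariance is preserved."""
    return np.einsum('oc,pcm->pom', weights, features)

# Example: mix 4 input channels of l=1 (vector) features into 8 output channels.
rng = np.random.default_rng(3)
feats = rng.normal(size=(10, 4, 3))   # 10 points, 4 channels, 2*1+1 = 3 components
W = rng.normal(size=(8, 4))
out = self_interaction(feats, W)      # shape (10, 8, 3)
```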

For $\ell=0$ (scalars), ordinary nonlinearities $\eta(\cdot + b)$ are used. For $\ell > 0$ (vectors and higher tensors), TFNs employ norm-based nonlinearities. For each feature channel $f^{(\ell)}_{c,m}$, compute the norm

$$\lVert f^{(\ell)}_{c} \rVert = \sqrt{\sum_{m=-\ell}^{\ell} \bigl|f^{(\ell)}_{c,m}\bigr|^2}
$$

apply a scalar nonlinearity $\eta$ to the biased norm, $\eta(\lVert f^{(\ell)}_{c} \rVert + b_c)$, and scale the original feature:

$$f^{(\ell)}_{c,m} \leftarrow \eta\bigl(\lVert f^{(\ell)}_{c} \rVert + b_c\bigr)\, \frac{f^{(\ell)}_{c,m}}{\lVert f^{(\ell)}_{c} \rVert}$$

Since $\eta$ acts only on the rotation-invariant norm and the bias is a scalar, SO(3)-equivariance is preserved.
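
A short sketch of this norm-based nonlinearity under the same array layout (the small epsilon in the denominator is an implementation detail for numerical stability, not part of the formula above):

```python
import numpy as np

def norm_nonlinearity(features, bias, eta=np.tanh, eps=1e-8):
    """Norm-based nonlinearity for l > 0 features.
    features: array of shape [n_points, channels, 2l+1]
    bias:     array of shape [channels]
    The nonlinearity only sees the rotation-invariant norm of each channel,
    so rescaling the feature by eta(norm + bias) preserves equivariance."""
    norms = np.linalg.norm(features, axis=-1, keepdims=True)       # [p, c, 1]
    scale = eta(norms + bias[None, :, None]) / (norms + eps)
    return scale * features

rng = np.random.default_rng(4)
feats = rng.normal(size=(10, 4, 3))          # l = 1 features, 4 channels
out = norm_nonlinearity(feats, bias=np.zeros(4))
```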

Translation equivariance is achieved by operating exclusively on relative vectors $r_{ij} = x_j - x_i$, never on absolute coordinates. Self-interactions and nonlinearities are pointwise and commute with translations, ensuring the entire network maintains translation equivariance.

5. Implementation Details

TFN equivariant convolution layers operate via the following steps for each point ii:

  • Identify the neighbor list $\mathcal{N}(i)$ within a cutoff.
  • For each neighbor $j \in \mathcal{N}(i)$:
    • Compute $r_{ij} = x_j - x_i$, its magnitude $r = \|r_{ij}\|$, and the unit direction $\hat{r}$.
    • Evaluate the radial functions $R^{(\ell_f, \ell_i)}_c(r)$ for each needed $\ell_f, \ell_i, c$.
    • Evaluate the spherical harmonics $Y^{\ell_f}_m(\hat{r})$ up to the maximal $\ell_f$.
    • Multiply to obtain $F^{(\ell_f, \ell_i)}_{c,m}(r_{ij}) = R^{(\ell_f, \ell_i)}_c(r)\, Y^{\ell_f}_m(\hat{r})$.
  • For each input order $\ell_i$, filter order $\ell_f$, and output order $\ell_o$:

    • Contract $f^{(\ell_i)}_{j, c_i, m_i}$ with $F^{(\ell_f, \ell_i)}_{c_f, m_f}$ and the Clebsch-Gordan coefficients:

$$\text{out}^{(\ell_o)}_{i, c_o, m_o} \mathrel{+}= \sum_{m_i, m_f} C^{(\ell_o, m_o)}_{(\ell_f, m_f), (\ell_i, m_i)}\, F^{(\ell_f, \ell_i)}_{c_f, m_f}\, f^{(\ell_i)}_{j, c_i, m_i}$$

This triple sum is batched and parallelized on GPU. Precomputing Clebsch-Gordan coefficient tables and spherical harmonics accelerates computations.
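
Schematically, the batched contraction for one $(\ell_i, \ell_f) \to \ell_o$ path can be written as a single einsum over precomputed arrays. The sketch below assumes the filter and the input feature share a channel index, with channel mixing left to self-interaction layers; array names and shapes are illustrative:

```python
import numpy as np

def tfn_contraction(cg, filt, feats):
    """Batched contraction for one (l_i, l_f) -> l_o path.
    cg:    Clebsch-Gordan table, shape [2*l_o+1, 2*l_f+1, 2*l_i+1]
    filt:  evaluated filters F,   shape [n_points, n_neighbors, channels, 2*l_f+1]
    feats: neighbor features,     shape [n_points, n_neighbors, channels, 2*l_i+1]
    Returns output features of shape [n_points, channels, 2*l_o+1].
    The sum runs over neighbors and the m indices of the filter and input."""
    return np.einsum('ofi,pncf,pnci->pco', cg, filt, feats)

# Example shapes for an l_i=1, l_f=1 -> l_o=2 path (dimensions 3, 3, 5).
cg = np.zeros((5, 3, 3))            # would hold precomputed Clebsch-Gordan values
filt = np.zeros((10, 8, 4, 3))      # 10 points, 8 neighbors, 4 channels
feats = np.zeros((10, 8, 4, 3))
out = tfn_contraction(cg, filt, feats)   # shape (10, 4, 5)
```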

Radial functions $R^{(\ell_f, \ell_i)}_c(r)$ are parameterized by small MLPs applied to a fixed basis $\phi_k(r)$, such as Gaussian radial basis functions. All radial functions share the same basis but have independent MLP weights.
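
A sketch of one such radial function, assuming a Gaussian basis on a fixed distance grid followed by a two-layer MLP (the basis size, hidden width, and ReLU choice are illustrative, not taken from the paper):

```python
import numpy as np

def gaussian_basis(r, centers, width):
    """Fixed radial basis phi_k(r): Gaussians centered on a grid of distances."""
    return np.exp(-((r[..., None] - centers) ** 2) / (2 * width**2))

def radial_mlp(r, centers, width, W1, b1, W2, b2):
    """One learned radial function R(r): a small MLP on top of the shared basis.
    Each (l_f, l_i, channel) combination would hold its own copy of W1, b1, W2, b2."""
    h = np.maximum(gaussian_basis(r, centers, width) @ W1 + b1, 0.0)  # ReLU hidden layer
    return h @ W2 + b2

# Example: 16 Gaussian basis functions spanning distances 0..5, 32 hidden units.
rng = np.random.default_rng(5)
centers = np.linspace(0.0, 5.0, 16)
W1, b1 = rng.normal(size=(16, 32)) * 0.1, np.zeros(32)
W2, b2 = rng.normal(size=(32, 1)) * 0.1, np.zeros(1)
r = np.array([1.2, 3.4])
values = radial_mlp(r, centers, width=0.3, W1=W1, b1=b1, W2=W2, b2=b2)  # shape (2, 1)
```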

6. Experimental Results

TFNs have been demonstrated across several domains:

  • Geometry (3D shape classification): On toy "3D Tetris" blocks, TFN achieves 100% accuracy without rotational augmentation. On ModelNet40, TFN achieves roughly 80–85% accuracy without rotational data augmentation, whereas standard non-equivariant networks degrade to random performance under arbitrary rotations.
  • Physics (Newtonian acceleration and moment of inertia): For Newtonian acceleration on random point masses $\{x_j, m_j\}$, TFN predicts the vector acceleration $a_i = -\sum_{j \neq i} m_j (x_i - x_j)/\|x_i - x_j\|^3$, learning the $1/r^2$ law precisely. For the moment of inertia, TFN predicts the symmetric tensor $I = \sum_j m_j \bigl[\|x_j\|^2 I_3 - x_j x_j^T\bigr]$ (see the sketch after this list); the learned radial channels $R^{(0)}(r) = 2r^2/3$ and $R^{(2)}(r) = -r^2$ match the analytical forms.
  • Chemistry (missing atom generation on QM9): For molecules with up to 29 atoms, the task is to infer a missing atom's type and position. Each atom emits a scalar probability $p_a$ and a vector $\delta_a$; the predicted position is $\sum_a p_a (x_a + \delta_a)$. After training on 1,000 molecules, TFN generalizes to larger molecules with roughly 95–97% of atoms predicted with correct type and position (within 0.5 Å) and less than 0.2 Å MAE in position.
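
For concreteness, the analytic moment-of-inertia target used in the physics task can be computed directly from the point masses; this NumPy sketch shows the regression target only, not TFN code:

```python
import numpy as np

def moment_of_inertia(points, masses):
    """Analytic target for the moment-of-inertia task:
    I = sum_j m_j (||x_j||^2 * I_3 - x_j x_j^T), a symmetric rank-2 tensor,
    i.e. an l=0 (trace) part plus an l=2 (symmetric traceless) part."""
    I = np.zeros((3, 3))
    for x, m in zip(points, masses):
        I += m * (np.dot(x, x) * np.eye(3) - np.outer(x, x))
    return I

rng = np.random.default_rng(6)
pts = rng.normal(size=(4, 3))
ms = rng.uniform(0.5, 2.0, size=4)
print(moment_of_inertia(pts, ms))
```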

TFNs outperform or match non-equivariant baselines across all experiments, with no rotational data augmentation and often fewer parameters (Thomas et al., 2018).


Domain    | Task                                      | TFN Performance
Geometry  | 3D Tetris, ModelNet40 classification      | 100% (toy), ~80–85% (ModelNet40)
Physics   | Newtonian acceleration, moment of inertia | Recovers analytic kernels
Chemistry | QM9 missing atom prediction               | 95–97% correct, <0.2 Å MAE
References

Thomas, N., Smidt, T., Kearnes, S., Yang, L., Li, L., Kohlhoff, K., & Riley, P. (2018). Tensor field networks: Rotation- and translation-equivariant neural networks for 3D point clouds. arXiv:1802.08219.
