
Tensor Field Networks (TFN) Overview

Updated 20 December 2025
  • Tensor Field Networks are neural architectures that achieve equivariance to 3D rotations, translations, and point permutations, enabling effective learning on arbitrarily oriented point clouds.
  • They construct equivariant filters using spherical harmonics and Clebsch–Gordan coefficients, ensuring that each layer processes features (scalars, vectors, tensors) in a geometrically consistent manner.
  • TFNs demonstrate robust performance in 3D shape classification, Newtonian physics tasks, and molecular chemistry predictions, often outperforming non-equivariant methods without rotational data augmentation.

Tensor Field Networks (TFNs) are neural network architectures designed to be locally equivariant to 3D rotations, translations, and point permutations at every layer. This geometric equivariance enables TFNs to identify features in arbitrary orientations without explicit data augmentation, providing a principled framework for learning on 3D point clouds. TFNs construct equivariant filters using spherical harmonics and guarantee that each layer processes and outputs features as scalars, vectors, or higher-order tensors in the geometric sense. These properties are utilized in diverse tasks spanning geometry, physics, and chemistry (Thomas et al., 2018).

1. Mathematical Foundations

A Tensor Field Network is fundamentally structured to respect the group of 3D Euclidean isometries $E(3)$, generated by rotations ($SO(3)$) and translations, together with permutations of the input points. A linear map $D(g): V \to V$ is a representation of a group $G$ if $D(g_1)D(g_2) = D(g_1 g_2)$ for all $g_1, g_2 \in G$. A TFN layer $L: \mathcal{X} \rightarrow \mathcal{Y}$ is equivariant if, for all $g \in G$,

$$L \circ D^{\mathcal{X}}(g) = D^{\mathcal{Y}}(g) \circ L$$

Translation equivariance is achieved by designing layers such that $L(x+t) = L(x) + t$, meaning translations act trivially on features not tied to coordinates. Rotation equivariance involves transforming the inputs $x$ by $R \in SO(3)$ and the features by the corresponding representation matrices $D^{\mathcal{X}}(R)$, causing the outputs to rotate by $D^{\mathcal{Y}}(R)$.

SO(3) possesses one irreducible representation (irrep) for each integer $\ell \geq 0$, of dimension $2\ell+1$. Here, $\ell$ is termed the "rotation order": $\ell=0$ (scalars), $\ell=1$ (3D vectors), $\ell=2$ (rank-2 symmetric traceless tensors), and so on. Features $f^{(\ell)}_m$ for $m=-\ell,\dots,+\ell$ transform under a rotation $R$ according to the real Wigner D-matrix $D^{(\ell)}(R)$:

$$f^{(\ell)}_m \to \sum_{m'=-\ell}^{\ell} D^{(\ell)}_{mm'}(R)\, f^{(\ell)}_{m'}$$

Spherical harmonics $Y^\ell_m(\theta,\phi)$ provide an orthonormal basis for $L^2(S^2)$. They satisfy:

$$Y^\ell_m(R\hat{r}) = \sum_{m'=-\ell}^{\ell} D^{(\ell)}_{mm'}(R)\, Y^\ell_{m'}(\hat{r}),$$

where $\hat{r}$ is the unit vector corresponding to the angles $\theta, \phi$.
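
As a concrete illustration of this transformation rule, the following minimal numerical check (a NumPy sketch, not from the original paper) verifies it for $\ell=1$, using the Cartesian basis in which the real $\ell=1$ spherical harmonics are proportional to the components of $\hat{r}$ and $D^{(1)}(R)$ is the rotation matrix itself:

```python
import numpy as np

def random_rotation(rng):
    # QR decomposition of a Gaussian matrix gives a random orthogonal matrix;
    # flip one column if needed so that det(Q) = +1 (a proper rotation).
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] *= -1
    return q

def sh_l1(r_hat):
    # l = 1 real spherical harmonics written in a Cartesian basis, where they
    # are simply proportional to the components of the unit vector itself.
    return r_hat

rng = np.random.default_rng(0)
R = random_rotation(rng)
r_hat = rng.normal(size=3)
r_hat /= np.linalg.norm(r_hat)

# Equivariance check: Y^1(R r_hat) equals D^(1)(R) Y^1(r_hat),
# and in the Cartesian basis D^(1)(R) is just R.
assert np.allclose(sh_l1(R @ r_hat), R @ sh_l1(r_hat))
```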

2. Filter Construction and Equivariant Convolutions

TFNs implement rotation-equivariant continuous convolutions on point clouds by constraining each filter of rotation order $\ell_f$ to the form:

$$F^{(\ell_f, \ell_i)}_{cm}(r, \theta, \phi) = R^{(\ell_f, \ell_i)}_c(r)\, Y^{\ell_f}_m(\theta, \phi)$$

Here, $\ell_i$ is the input feature's rotation order, $c$ indexes the learnable "channels," and $m=-\ell_f,\dots,+\ell_f$ specifies components in the filter's irrep. The scalar radial function $R^{(\ell_f,\ell_i)}_c(r)$ is shared across $m$, where $r=\|x_j - x_i\|$ and $(\theta, \phi)$ are the relative spherical coordinates of $x_j - x_i$.

For each layer $L$, features $f^{(\ell_i)}_{j,c,m}$ at point $x_j$ transform in the $\ell_i$ irrep. The output of order $\ell_o$ at point $i$ is:

$$(L f)^{(\ell_o)}_{i,c_o,m_o} = \sum_{\ell_i, \ell_f} \sum_{c_i, m_i} \sum_{c_f, m_f} C^{(\ell_o,m_o)}_{(\ell_f,m_f),(\ell_i,m_i)} \sum_{j \in \mathcal{N}(i)} F^{(\ell_f,\ell_i)}_{c_f,m_f}(x_j - x_i)\, f^{(\ell_i)}_{j, c_i, m_i}$$

$\mathcal{N}(i)$ denotes the neighbors of $i$ (within a cutoff). The Clebsch-Gordan coefficients $C^{(\ell_o, m_o)}_{(\ell_f, m_f),(\ell_i, m_i)}$ project the tensor product of the $\ell_f$ and $\ell_i$ irreps onto the irreducible $\ell_o$ component, enforcing the correct rotational transformation of the outputs.

This construction ensures that under a global rotation, the output features transform according to $D^{(\ell_o)}(R)$. The equivariance proof relies on the transformation properties of the filters, the input features, and the Clebsch-Gordan projectors.
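
The sketch below (a hypothetical NumPy implementation, not the reference code) illustrates the simplest nontrivial path, $\ell_i=0$, $\ell_f=1 \to \ell_o=1$, where the Clebsch-Gordan projection is trivial and the learned radial function is replaced by a fixed Gaussian; a numerical check confirms that rotating the points rotates the output vectors:

```python
import numpy as np

def radial(r, sigma=1.0):
    # Placeholder radial function; in a real TFN this would be a learned MLP.
    return np.exp(-r**2 / (2 * sigma**2))

def tfn_conv_0_to_1(points, scalars):
    """Special-case TFN convolution: scalar (l=0) inputs, l_f=1 filter,
    vector (l=1) outputs. The Clebsch-Gordan projection 0 x 1 -> 1 is trivial,
    so the output at point i is a radially weighted sum of unit directions."""
    n = len(points)
    out = np.zeros((n, 3))
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            rij = points[j] - points[i]
            r = np.linalg.norm(rij)
            out[i] += radial(r) * (rij / r) * scalars[j]
    return out

# Equivariance check: rotating the input points rotates the output vectors.
rng = np.random.default_rng(1)
pts = rng.normal(size=(5, 3))
feats = rng.normal(size=5)
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(Q) < 0:
    Q[:, 0] *= -1
assert np.allclose(tfn_conv_0_to_1(pts @ Q.T, feats), tfn_conv_0_to_1(pts, feats) @ Q.T)
```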

3. Feature Representations and Tensor Products

TFN features at each layer are organized as lists or dictionaries indexed by $\ell$, yielding arrays of shape $[\#\text{points}, \#\text{channels}_\ell, 2\ell+1]$. Under rotation, the feature vector $f^{(\ell)}_{c,m}$, $m=-\ell,\dots,\ell$, for each $\ell$ and channel $c$ transforms by $D^{(\ell)}(R)$.

Convolving $\ell_i$-order features with $\ell_f$-order filters performs a tensor product, which decomposes into output orders $\ell_o = |\ell_f - \ell_i|, \dots, \ell_f+\ell_i$. Clebsch-Gordan coefficients enforce this decomposition, ensuring that outputs consistently transform as irreducible representations dictated by the SO(3) structure.
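
A small worked example of this decomposition for $\ell_f = \ell_i = 1$, written in the Cartesian rather than the spherical ($m$-indexed) basis: the product of two vectors splits into a dot product ($\ell=0$), a cross product ($\ell=1$), and a symmetric traceless tensor ($\ell=2$), each transforming as expected under rotation. This NumPy sketch is illustrative only:

```python
import numpy as np

def decompose_1x1(u, v):
    """Decompose the tensor product of two l=1 (vector) features into its
    irreducible pieces, written in Cartesian form:
      l=0: the dot product (a scalar),
      l=1: the cross product (a vector),
      l=2: the symmetric traceless part of the outer product (a rank-2 tensor)."""
    scalar = u @ v
    vector = np.cross(u, v)
    outer = np.outer(u, v)
    sym_traceless = 0.5 * (outer + outer.T) - (np.trace(outer) / 3.0) * np.eye(3)
    return scalar, vector, sym_traceless

rng = np.random.default_rng(2)
u, v = rng.normal(size=3), rng.normal(size=3)
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(Q) < 0:
    Q[:, 0] *= -1

s, w, T = decompose_1x1(u, v)
s_rot, w_rot, T_rot = decompose_1x1(Q @ u, Q @ v)

assert np.isclose(s_rot, s)              # l=0 piece is invariant
assert np.allclose(w_rot, Q @ w)         # l=1 piece rotates as a vector
assert np.allclose(T_rot, Q @ T @ Q.T)   # l=2 piece rotates as a rank-2 tensor
```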

4. Network Architecture and Nonlinearities

TFN architectures stack multiple equivariant blocks, each comprising:

  • Equivariant point convolutions, combining $\ell_i \to \ell_o$ paths.
  • Self-interaction layers ("$1 \times 1$ convolutions"), where channels within each $\ell$ are mixed via learnable weights $W_\ell$ with no dependence on $m$ (see the sketch after this list).
  • Equivariant nonlinearities.
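
A minimal sketch of a self-interaction layer, assuming features are stored as arrays of shape $[\#\text{points}, \#\text{channels}_\ell, 2\ell+1]$ as described in Section 3 (illustrative NumPy, not the reference implementation):

```python
import numpy as np

def self_interaction(features, weights):
    """Mix channels within a single rotation order l.
    features: array of shape [n_points, c_in, 2l+1]
    weights:  array of shape [c_out, c_in] (shared across all m components)
    Because the weights never touch the m axis, the layer commutes with the
    Wigner D-matrix acting on that axis, so equivariance is preserved."""
    return np.einsum('oc,pcm->pom', weights, features)

# Example: mix 4 input channels of l=1 (vector) features into 8 output channels.
rng = np.random.default_rng(3)
feats = rng.normal(size=(10, 4, 3))   # 10 points, 4 channels, 2*1+1 = 3 components
W = rng.normal(size=(8, 4))
out = self_interaction(feats, W)      # shape (10, 8, 3)
```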

For $\ell=0$ (scalars), ordinary nonlinearities $\eta(\cdot + b)$ are used. For $\ell > 0$ (vectors and higher tensors), TFNs employ norm-based nonlinearities. For each feature channel $f^{(\ell)}_{c,m}$, compute the norm

$$\lVert f^{(\ell)}_{c} \rVert = \sqrt{\sum_{m=-\ell}^{\ell} \bigl|f^{(\ell)}_{c,m}\bigr|^2}
$$

apply a scalar nonlinearity $\eta$ to the biased norm, $\eta(\lVert f^{(\ell)}_{c} \rVert + b_c)$, and scale the original feature:

$$f^{(\ell)}_{c,m} \leftarrow \eta\bigl(\lVert f^{(\ell)}_{c} \rVert + b_c\bigr)\, \frac{f^{(\ell)}_{c,m}}{\lVert f^{(\ell)}_{c} \rVert}$$

Since $\eta$ acts only on the rotation-invariant norm and the bias is a scalar, SO(3)-equivariance is preserved.
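
A short sketch of this norm-based nonlinearity under the same array layout (the small epsilon in the denominator is an implementation detail for numerical stability, not part of the formula above):

```python
import numpy as np

def norm_nonlinearity(features, bias, eta=np.tanh, eps=1e-8):
    """Norm-based nonlinearity for l > 0 features.
    features: array of shape [n_points, channels, 2l+1]
    bias:     array of shape [channels]
    The nonlinearity only sees the rotation-invariant norm of each channel,
    so rescaling the feature by eta(norm + bias) preserves equivariance."""
    norms = np.linalg.norm(features, axis=-1, keepdims=True)       # [p, c, 1]
    scale = eta(norms + bias[None, :, None]) / (norms + eps)
    return scale * features

rng = np.random.default_rng(4)
feats = rng.normal(size=(10, 4, 3))          # l = 1 features, 4 channels
out = norm_nonlinearity(feats, bias=np.zeros(4))
```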

Translation equivariance is achieved by operating exclusively on relative vectors $r_{ij} = x_j - x_i$, never on absolute coordinates. Self-interactions and nonlinearities are pointwise and commute with translations, ensuring the entire network maintains translation equivariance.

5. Implementation Details

TFN equivariant convolution layers operate via the following steps for each point ii:

  • Identify the neighbor list $\mathcal{N}(i)$ within a cutoff.
  • For each neighbor $j \in \mathcal{N}(i)$:
    • Compute $r_{ij} = x_j - x_i$, its magnitude $r = \|r_{ij}\|$, and the unit direction $\hat{r}$.
    • Evaluate the radial functions $R^{(\ell_f, \ell_i)}_c(r)$ for each needed $\ell_f, \ell_i, c$.
    • Evaluate the spherical harmonics $Y^{\ell_f}_m(\hat{r})$ up to the maximal $\ell_f$.
    • Multiply to obtain $F^{(\ell_f, \ell_i)}_{c,m}(r_{ij}) = R^{(\ell_f, \ell_i)}_c(r)\, Y^{\ell_f}_m(\hat{r})$.
  • For each input order $\ell_i$, filter order $\ell_f$, and output order $\ell_o$:

    • Contract $f^{(\ell_i)}_{j, c_i, m_i}$ with $F^{(\ell_f, \ell_i)}_{c_f, m_f}$ and the Clebsch-Gordan coefficients:

$$\text{out}^{(\ell_o)}_{i, c_o, m_o} \mathrel{+}= \sum_{m_i, m_f} C^{(\ell_o, m_o)}_{(\ell_f, m_f), (\ell_i, m_i)}\, F^{(\ell_f, \ell_i)}_{c_f, m_f}\, f^{(\ell_i)}_{j, c_i, m_i}$$

This triple sum is batched and parallelized on GPU. Precomputing Clebsch-Gordan coefficient tables and spherical harmonics accelerates computations.
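
Schematically, the batched contraction for one $(\ell_i, \ell_f) \to \ell_o$ path can be written as a single einsum over precomputed arrays. The sketch below assumes the filter and the input feature share a channel index, with channel mixing left to self-interaction layers; array names and shapes are illustrative:

```python
import numpy as np

def tfn_contraction(cg, filt, feats):
    """Batched contraction for one (l_i, l_f) -> l_o path.
    cg:    Clebsch-Gordan table, shape [2*l_o+1, 2*l_f+1, 2*l_i+1]
    filt:  evaluated filters F,   shape [n_points, n_neighbors, channels, 2*l_f+1]
    feats: neighbor features,     shape [n_points, n_neighbors, channels, 2*l_i+1]
    Returns output features of shape [n_points, channels, 2*l_o+1].
    The sum runs over neighbors and the m indices of the filter and input."""
    return np.einsum('ofi,pncf,pnci->pco', cg, filt, feats)

# Example shapes for an l_i=1, l_f=1 -> l_o=2 path (dimensions 3, 3, 5).
cg = np.zeros((5, 3, 3))            # would hold precomputed Clebsch-Gordan values
filt = np.zeros((10, 8, 4, 3))      # 10 points, 8 neighbors, 4 channels
feats = np.zeros((10, 8, 4, 3))
out = tfn_contraction(cg, filt, feats)   # shape (10, 4, 5)
```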

Radial functions $R^{(\ell_f, \ell_i)}_c(r)$ are parameterized by small MLPs applied to a fixed basis $\phi_k(r)$, such as Gaussian radial basis functions. All radial functions share the same basis but have independent MLP weights.
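
A sketch of one such radial function, assuming a Gaussian basis on a fixed distance grid followed by a two-layer MLP (the basis size, hidden width, and ReLU choice are illustrative, not taken from the paper):

```python
import numpy as np

def gaussian_basis(r, centers, width):
    """Fixed radial basis phi_k(r): Gaussians centered on a grid of distances."""
    return np.exp(-((r[..., None] - centers) ** 2) / (2 * width**2))

def radial_mlp(r, centers, width, W1, b1, W2, b2):
    """One learned radial function R(r): a small MLP on top of the shared basis.
    Each (l_f, l_i, channel) combination would hold its own copy of W1, b1, W2, b2."""
    h = np.maximum(gaussian_basis(r, centers, width) @ W1 + b1, 0.0)  # ReLU hidden layer
    return h @ W2 + b2

# Example: 16 Gaussian basis functions spanning distances 0..5, 32 hidden units.
rng = np.random.default_rng(5)
centers = np.linspace(0.0, 5.0, 16)
W1, b1 = rng.normal(size=(16, 32)) * 0.1, np.zeros(32)
W2, b2 = rng.normal(size=(32, 1)) * 0.1, np.zeros(1)
r = np.array([1.2, 3.4])
values = radial_mlp(r, centers, width=0.3, W1=W1, b1=b1, W2=W2, b2=b2)  # shape (2, 1)
```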

6. Experimental Results

TFNs have been demonstrated across several domains:

  • Geometry (3D shape classification): On toy "3D Tetris" blocks, TFN achieves 100% accuracy without rotational augmentation. On ModelNet40, TFN achieves roughly 80–85% accuracy without rotational data augmentation, whereas standard non-equivariant networks degrade to random performance under arbitrary rotations.
  • Physics (Newtonian acceleration and moment of inertia): For Newtonian acceleration on random point masses $\{x_j, m_j\}$, TFN predicts the vector acceleration $a_i = -\sum_{j \neq i} m_j (x_i - x_j)/\|x_i - x_j\|^3$, learning the $1/r^2$ law precisely. For the moment of inertia, TFN predicts the symmetric tensor $I = \sum_j m_j \bigl[\|x_j\|^2 I_3 - x_j x_j^T\bigr]$ (see the sketch after this list); the learned radial channels $R^{(0)}(r) = 2r^2/3$ and $R^{(2)}(r) = -r^2$ match the analytical forms.
  • Chemistry (missing atom generation on QM9): For molecules with up to 29 atoms, the task is to infer a missing atom's type and position. Each atom emits a scalar probability $p_a$ and a vector $\delta_a$; the predicted position is $\sum_a p_a (x_a + \delta_a)$. After training on 1,000 molecules, TFN generalizes to larger molecules with roughly 95–97% of atoms predicted with correct type and position (within 0.5 Å) and less than 0.2 Å MAE in position.
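
For concreteness, the analytic moment-of-inertia target used in the physics task can be computed directly from the point masses; this NumPy sketch shows the regression target only, not TFN code:

```python
import numpy as np

def moment_of_inertia(points, masses):
    """Analytic target for the moment-of-inertia task:
    I = sum_j m_j (||x_j||^2 * I_3 - x_j x_j^T), a symmetric rank-2 tensor,
    i.e. an l=0 (trace) part plus an l=2 (symmetric traceless) part."""
    I = np.zeros((3, 3))
    for x, m in zip(points, masses):
        I += m * (np.dot(x, x) * np.eye(3) - np.outer(x, x))
    return I

rng = np.random.default_rng(6)
pts = rng.normal(size=(4, 3))
ms = rng.uniform(0.5, 2.0, size=4)
print(moment_of_inertia(pts, ms))
```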

TFNs outperform or match non-equivariant baselines across all experiments, with no rotational data augmentation and often fewer parameters (Thomas et al., 2018).


Domain    | Task                                      | TFN Performance
Geometry  | 3D Tetris, ModelNet40 classification      | 100% (toy), ~80–85% (ModelNet40)
Physics   | Newtonian acceleration, moment of inertia | Recovers analytic kernels
Chemistry | QM9 missing atom prediction               | 95–97% correct, <0.2 Å MAE
References

Thomas, N., Smidt, T., Kearnes, S., Yang, L., Li, L., Kohlhoff, K., & Riley, P. (2018). Tensor field networks: Rotation- and translation-equivariant neural networks for 3D point clouds. arXiv:1802.08219.
