Tensor Field Networks (TFN) Overview
- Tensor Field Networks are neural architectures that achieve equivariance to 3D rotations, translations, and point permutations, enabling effective learning on arbitrarily oriented point clouds.
- They construct equivariant filters using spherical harmonics and Clebsch–Gordan coefficients, ensuring that each layer processes features (scalars, vectors, tensors) in a geometrically consistent manner.
- TFNs demonstrate robust performance in 3D shape classification, Newtonian physics tasks, and molecular chemistry predictions, often outperforming non-equivariant methods without rotational data augmentation.
Tensor Field Networks (TFNs) are neural network architectures designed to be locally equivariant to 3D rotations, translations, and point permutations at every layer. This geometric equivariance enables TFNs to identify features in arbitrary orientations without explicit data augmentation, providing a principled framework for learning on 3D point clouds. TFNs construct equivariant filters using spherical harmonics and guarantee that each layer processes and outputs features as scalars, vectors, or higher-order tensors in the geometric sense. These properties are utilized in diverse tasks spanning geometry, physics, and chemistry (Thomas et al., 2018).
1. Mathematical Foundations
A Tensor Field Network is fundamentally structured to respect the group of 3D Euclidean isometries $E(3)$, which includes rotations ($SO(3)$) and translations, together with point permutations. A map $D: G \to GL(V)$ is a representation of a group $G$ if $D(g)\,D(h) = D(gh)$ for all $g, h \in G$. A TFN layer $\mathcal{L}$ with input representation $D^{\mathrm{in}}$ and output representation $D^{\mathrm{out}}$ is equivariant if for all $g \in G$,
$$
\mathcal{L} \circ D^{\mathrm{in}}(g) = D^{\mathrm{out}}(g) \circ \mathcal{L}.
$$
Translation equivariance is achieved by designing layers such that $\mathcal{L}\big(V, \{\vec{r}_a + \vec{t}\}\big) = \mathcal{L}\big(V, \{\vec{r}_a\}\big)$, meaning translations act trivially on features not tied to coordinates. Rotation equivariance involves transforming the input coordinates by $R$ and the input features by the corresponding representation matrices $D^{(\ell_i)}(R)$, causing outputs to rotate by $D^{(\ell_o)}(R)$.
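To make the group action concrete, here is a minimal NumPy sketch (illustrative, not from the paper) of how a translation and a rotation act on a point cloud carrying scalar ($\ell = 0$) and vector ($\ell = 1$) features; the array shapes and the use of SciPy's `Rotation` are assumptions for the example:

```python
import numpy as np
from scipy.spatial.transform import Rotation

rng = np.random.default_rng(0)
coords = rng.normal(size=(5, 3))      # point coordinates
scalars = rng.normal(size=(5, 2))     # l = 0 features: 2 channels per point
vectors = rng.normal(size=(5, 2, 3))  # l = 1 features: 2 channels per point

R = Rotation.from_rotvec([0.3, -0.2, 0.5]).as_matrix()  # an arbitrary rotation matrix
t = np.array([1.0, -2.0, 0.5])                          # an arbitrary translation

# Translation: coordinates shift, features are untouched.
coords_translated = coords + t

# Rotation: coordinates rotate; l = 0 features are invariant,
# l = 1 features rotate with D^(1)(R) = R (in the Cartesian basis).
coords_rotated = coords @ R.T
scalars_rotated = scalars
vectors_rotated = vectors @ R.T
```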
SO(3) possesses one irreducible representation (irrep) for each integer $\ell \geq 0$, of dimension $2\ell + 1$. Here, $\ell$ is termed the “rotation order”: $\ell = 0$ (scalars), $\ell = 1$ (3D vectors), $\ell = 2$ (rank-2 symmetric traceless tensors), etc. Features of order $\ell$ transform under a rotation $R$ according to the real Wigner D-matrix $D^{(\ell)}(R) \in \mathbb{R}^{(2\ell+1)\times(2\ell+1)}$:
$$
V^{(\ell)}_m \;\mapsto\; \sum_{m'} D^{(\ell)}_{m m'}(R)\, V^{(\ell)}_{m'}.
$$
Spherical harmonics $Y^{(\ell)}_m$ provide an orthonormal basis for functions on the sphere $S^2$. They satisfy the equivariance property
$$
Y^{(\ell)}_m(R\hat{r}) = \sum_{m'} D^{(\ell)}_{m m'}(R)\, Y^{(\ell)}_{m'}(\hat{r}),
$$
where $\hat{r}$ is the unit vector corresponding to angles $(\theta, \phi)$.
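As a quick numerical check (a sketch under the stated conventions, not the paper's code), the $\ell = 1$ and $\ell = 2$ content of a unit vector can be packed into Cartesian tensors, for which the equivariance property is easy to verify; the helper names `l1_part` and `l2_part` are illustrative:

```python
import numpy as np
from scipy.spatial.transform import Rotation

def l1_part(n):
    # l = 1 content of a unit vector: the vector itself (up to normalization).
    return n

def l2_part(n):
    # l = 2 content: symmetric traceless outer product, a Cartesian packing
    # of the five order-2 real spherical harmonics (up to normalization).
    return np.outer(n, n) - np.eye(3) / 3.0

rng = np.random.default_rng(1)
n = rng.normal(size=3)
n /= np.linalg.norm(n)
R = Rotation.from_rotvec(rng.normal(size=3)).as_matrix()

# Equivariance: evaluating at the rotated input equals rotating the output.
assert np.allclose(l1_part(R @ n), R @ l1_part(n))
assert np.allclose(l2_part(R @ n), R @ l2_part(n) @ R.T)
```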
2. Filter Construction and Equivariant Convolutions
TFNs implement rotation-equivariant continuous convolutions on point clouds by constraining each filter of rotation order $\ell_f$ to the form:
$$
F^{(\ell_f, \ell_i)}_{cm}(\vec{r}) = R^{(\ell_f, \ell_i)}_c(r)\, Y^{(\ell_f)}_m(\hat{r}).
$$
Here, $\ell_i$ is the input feature's rotation order, $c$ indexes the learnable “channels,” and $m$ specifies components in the filter's $\ell_f$ irrep. The scalar radial function $R^{(\ell_f, \ell_i)}_c(r)$ is shared across $m$, where $r = \|\vec{r}\|$, with $(r, \hat{r})$ denoting relative spherical coordinates.
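A minimal sketch of a single filter of this form, assuming $\ell_f = 1$ (so the angular part is the unit direction, up to normalization and ordering convention) and a hypothetical fixed Gaussian radial profile standing in for a learned one:

```python
import numpy as np

def radial(r, mu=1.0, gamma=4.0):
    # Stand-in for a learned radial function R(r); here a fixed Gaussian bump.
    return np.exp(-gamma * (r - mu) ** 2)

def filter_l1(r_vec):
    # F_m(r_vec) = R(|r_vec|) * Y^(1)_m(r_hat); for l_f = 1 the real spherical
    # harmonics are the components of the unit vector (up to normalization
    # and component ordering).
    r = np.linalg.norm(r_vec)
    r_hat = r_vec / r
    return radial(r) * r_hat

print(filter_l1(np.array([0.0, 0.0, 1.2])))  # ~[0, 0, 0.85]
```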
At each layer, input features $V^{(\ell_i)}_{bcm_i}$ at point $b$ transform in the $\ell_i$ irrep. The output of order $\ell_o$ at point $a$ is:
$$
\mathcal{L}^{(\ell_o)}_{acm_o}(\vec{r}_a, V) = \sum_{m_f, m_i} C^{(\ell_o, m_o)}_{(\ell_f, m_f)(\ell_i, m_i)} \sum_{b \in \mathcal{N}(a)} F^{(\ell_f, \ell_i)}_{cm_f}(\vec{r}_{ab})\, V^{(\ell_i)}_{bcm_i}.
$$
$\mathcal{N}(a)$ specifies the neighbors of $a$ (within a cutoff), and $\vec{r}_{ab} = \vec{r}_a - \vec{r}_b$. The Clebsch-Gordan coefficients $C^{(\ell_o, m_o)}_{(\ell_f, m_f)(\ell_i, m_i)}$ project the tensor product of the $\ell_f$ and $\ell_i$ irreps onto the $\ell_o$ irreducible component, enforcing correct rotational transformation of outputs.
The construction ensures that under a global rotation $R$, the output features transform according to $D^{(\ell_o)}(R)$. The equivariance proof relies on the transformation properties of filters, input features, and Clebsch-Gordan projectors.
3. Feature Representations and Tensor Products
TFN features at each layer are organized as lists or dictionaries indexed by rotation order $\ell$, yielding arrays of shape $[N_{\text{points}}, N_{\text{channels}}, 2\ell + 1]$. Under rotation, the $(2\ell + 1)$-dimensional feature vectors for each point and channel transform by $D^{(\ell)}(R)$.
Convolutions of $\ell_i$-order features with $\ell_f$-order filters perform tensor products, which decompose according to
$$
\ell_f \otimes \ell_i = |\ell_f - \ell_i| \oplus \cdots \oplus (\ell_f + \ell_i).
$$
Clebsch-Gordan coefficients enforce this decomposition, ensuring outputs consistently transform as irreducible representations dictated by the SO(3) structure.
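A small worked example of the decomposition $1 \otimes 1 = 0 \oplus 1 \oplus 2$ for two ordinary vectors (a sketch using familiar Cartesian operations, not the paper's code): the dot product carries the $\ell = 0$ part, the cross product the $\ell = 1$ part, and the symmetric traceless outer product the $\ell = 2$ part.

```python
import numpy as np

def tensor_product_parts(u, v):
    """Decompose the tensor product of two 3D vectors into irreducible parts."""
    l0 = np.dot(u, v)                                    # scalar (l = 0)
    l1 = np.cross(u, v)                                  # vector (l = 1)
    outer = np.outer(u, v)
    l2 = 0.5 * (outer + outer.T) - np.eye(3) * l0 / 3.0  # symmetric traceless (l = 2)
    return l0, l1, l2

u = np.array([1.0, 0.0, 0.0])
v = np.array([0.0, 2.0, 0.0])
l0, l1, l2 = tensor_product_parts(u, v)
# The parts carry 1 + 3 + 5 = 9 components, matching the 3 x 3 outer product.
```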
4. Network Architecture and Nonlinearities
TFN architectures stack multiple equivariant blocks, each comprising:
- Equivariant point convolutions, combining $(\ell_i, \ell_f) \to \ell_o$ paths.
- Self-interaction layers (“$1 \times 1$ convolution”), where channels within each rotation order $\ell$ are mixed via learnable weights (no dependence on $m$); see the sketch after this list.
- Equivariant nonlinearities.
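A minimal NumPy sketch of the self-interaction step referenced above, assuming features of a given order $\ell$ are stored as an array of shape $[points, channels, 2\ell + 1]$; because the mixing touches only the channel axis, it cannot break equivariance:

```python
import numpy as np

def self_interaction(features, weights):
    # features: [points, channels_in, 2l + 1]; weights: [channels_out, channels_in].
    # Mixes channels independently of the m index, so equivariance is preserved.
    return np.einsum('pcm,dc->pdm', features, weights)

rng = np.random.default_rng(2)
V = rng.normal(size=(10, 4, 3))   # 10 points, 4 channels of l = 1 features
W = rng.normal(size=(8, 4))       # map 4 channels to 8
out = self_interaction(V, W)      # shape (10, 8, 3)
```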
For $\ell = 0$ (scalars), ordinary pointwise nonlinearities are used. For $\ell > 0$ (vectors or higher tensors), TFNs employ norm-based nonlinearities. For each feature channel $c$ at point $a$, compute the norm
$$
\big\|V^{(\ell)}_{ac}\big\| = \sqrt{\textstyle\sum_m \big(V^{(\ell)}_{acm}\big)^2}.
$$
Apply a scalar nonlinearity $\eta$ to the biased norm, $\eta\big(\|V^{(\ell)}_{ac}\| + b^{(\ell)}_c\big)$, and scale the original feature:
$$
V^{(\ell)}_{acm} \;\mapsto\; \eta\big(\|V^{(\ell)}_{ac}\| + b^{(\ell)}_c\big)\, V^{(\ell)}_{acm}.
$$
Since $\eta$ acts only on the rotation-invariant norm and the bias $b^{(\ell)}_c$ is a scalar, SO(3)-equivariance is preserved.
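A minimal sketch of this norm-gated nonlinearity for $\ell > 0$ features, again assuming the shape $[points, channels, 2\ell + 1]$; the choice of $\eta = \tanh$ and a zero bias are illustrative:

```python
import numpy as np

def norm_nonlinearity(features, bias, eta=np.tanh):
    # features: [points, channels, 2l + 1]; bias: [channels].
    # The gate depends only on the rotation-invariant norm, so the output
    # still transforms with D^(l)(R).
    norms = np.linalg.norm(features, axis=-1, keepdims=True)  # [points, channels, 1]
    gate = eta(norms + bias[None, :, None])                   # one scalar per channel
    return gate * features

rng = np.random.default_rng(3)
V = rng.normal(size=(10, 4, 3))
b = np.zeros(4)
out = norm_nonlinearity(V, b)     # shape (10, 4, 3)
```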
Translation equivariance is achieved by exclusively operating on relative vectors $\vec{r}_{ab} = \vec{r}_a - \vec{r}_b$, never absolute coordinates. Self-interactions and nonlinearities are pointwise and commute with translations, ensuring the entire network maintains translation equivariance.
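A one-line numerical check of this fact (illustrative): pairwise relative vectors are unchanged by a global translation.

```python
import numpy as np

rng = np.random.default_rng(4)
coords = rng.normal(size=(5, 3))
t = np.array([3.0, -1.0, 2.0])

# Pairwise relative vectors r_a - r_b before and after translating all points by t.
rel = coords[:, None, :] - coords[None, :, :]
rel_translated = (coords + t)[:, None, :] - (coords + t)[None, :, :]
assert np.allclose(rel, rel_translated)
```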
5. Implementation Details
TFN equivariant convolution layers operate via the following steps for each point $a$:
- Identify the neighbor list $\mathcal{N}(a)$ within a cutoff radius.
- For each neighbor $b \in \mathcal{N}(a)$:
- Compute $\vec{r}_{ab}$, its magnitude $r_{ab} = \|\vec{r}_{ab}\|$, and unit direction $\hat{r}_{ab}$.
- Evaluate the radial functions $R^{(\ell_f, \ell_i)}_c(r_{ab})$ (small MLPs over a radial basis) for each needed $(\ell_f, \ell_i)$ pair.
- Evaluate spherical harmonics $Y^{(\ell_f)}_m(\hat{r}_{ab})$ up to the maximal $\ell_f$.
- Multiply them to obtain the filter values $F^{(\ell_f, \ell_i)}_{cm}(\vec{r}_{ab})$.
- For each input order $\ell_i$, filter order $\ell_f$, and output order $\ell_o$:
- Contract the filter values with the input features $V^{(\ell_i)}_{bcm_i}$ and the Clebsch-Gordan coefficients:
$$
\mathcal{L}^{(\ell_o)}_{acm_o} = \sum_{m_f, m_i} C^{(\ell_o, m_o)}_{(\ell_f, m_f)(\ell_i, m_i)} \sum_{b \in \mathcal{N}(a)} F^{(\ell_f, \ell_i)}_{cm_f}(\vec{r}_{ab})\, V^{(\ell_i)}_{bcm_i}.
$$
This triple sum is batched and parallelized on GPU. Precomputing Clebsch-Gordan coefficient tables and spherical harmonics accelerates computations.
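A minimal NumPy sketch of this contraction for a single path, assuming $\ell_i = 0$ and $\ell_f = \ell_o = 1$ (for which the Clebsch-Gordan tensor reduces to the identity) and dense all-to-all neighbors; the array names and sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)
n_points, n_channels = 6, 4

# Filter values F[a, b, c, m_f] for the l_f = 1 filter between points a and b,
# and input features V[b, c, m_i] of order l_i = 0 (a single m_i component).
F = rng.normal(size=(n_points, n_points, n_channels, 3))
V = rng.normal(size=(n_points, n_channels, 1))

# Clebsch-Gordan tensor C[m_o, m_f, m_i] for the path (l_f = 1) x (l_i = 0) -> (l_o = 1):
# coupling with a scalar leaves the l = 1 component untouched, so C is the identity.
C = np.eye(3)[:, :, None]

# Triple sum over m_f, m_i and neighbors b, batched over points a and channels c.
out = np.einsum('ofi,abcf,bci->aco', C, F, V)   # shape (n_points, n_channels, 3)
```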
Radial functions $R^{(\ell_f, \ell_i)}_c(r)$ are parameterized by small MLPs applied to a fixed basis $\{g_k(r)\}$, such as Gaussian radial basis functions. Each radial function shares the same basis but has independent MLP weights.
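A minimal sketch of this radial parameterization, assuming a Gaussian basis and a tiny dense network with random, untrained weights; all names and sizes are illustrative:

```python
import numpy as np

def gaussian_basis(r, centers, gamma=10.0):
    # Fixed basis g_k(r) = exp(-gamma * (r - mu_k)^2), shared by all radial functions.
    return np.exp(-gamma * (r[:, None] - centers[None, :]) ** 2)

def radial_mlp(r, centers, W1, b1, W2, b2):
    # One radial function; each (l_f, l_i, channel) would carry its own MLP weights.
    h = np.maximum(gaussian_basis(r, centers) @ W1 + b1, 0.0)   # hidden layer, ReLU
    return h @ W2 + b2                                          # scalar R(r) per distance

rng = np.random.default_rng(6)
centers = np.linspace(0.0, 5.0, 32)                  # basis centers spanning the cutoff
W1, b1 = rng.normal(size=(32, 16)) * 0.1, np.zeros(16)
W2, b2 = rng.normal(size=(16, 1)) * 0.1, np.zeros(1)

r = np.array([0.8, 1.5, 3.2])                        # neighbor distances
print(radial_mlp(r, centers, W1, b1, W2, b2).shape)  # (3, 1)
```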
6. Experimental Results
TFNs have been demonstrated across several domains:
- Geometry (3D shape classification): On toy “3D Tetris” blocks, TFN achieves 100% accuracy without rotational augmentation; on ModelNet40 it reaches 80–85% accuracy, likewise without rotational data augmentation. Standard non-equivariant networks degrade to random performance under arbitrary rotations.
- Physics (Newtonian acceleration and moment of inertia): For Newtonian acceleration, given random point masses $m_b$ at positions $\vec{r}_b$, TFN predicts the vector acceleration $\vec{a}_a = \sum_{b \neq a} G m_b (\vec{r}_b - \vec{r}_a)/\|\vec{r}_b - \vec{r}_a\|^3$, learning the inverse-square law precisely. For moment of inertia, TFN predicts the symmetric tensor $I = \sum_b m_b \big(\|\vec{r}_b\|^2 \mathbf{1} - \vec{r}_b \vec{r}_b^{\top}\big)$; the learned $\ell = 0$ and $\ell = 2$ channels match the analytical forms (see the sketch after this list).
- Chemistry (missing atom generation on QM9): For molecules (up to 29 atoms), the task is to infer a missing atom's type and position. Each atom $a$ emits a scalar probability $p_a$ and a displacement vector $\vec{\delta}_a$; the predicted position is $\vec{r}_a + \vec{\delta}_a$ for the most probable atom. After training on 1,000 molecules, TFN generalizes to larger molecules with 95–97% correct atom-type-and-position predictions (within 0.5 Å) and about 0.2 Å MAE in position.
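To make the moment-of-inertia target referenced above concrete, here is a short sketch (not the paper's code) that computes the inertia tensor of a point-mass system and splits it into its $\ell = 0$ (isotropic) and $\ell = 2$ (symmetric traceless) parts:

```python
import numpy as np

def inertia_tensor(masses, coords):
    # I = sum_b m_b * (|r_b|^2 * Id - r_b r_b^T)
    r2 = np.sum(coords ** 2, axis=1)
    return np.einsum('i,ijk->jk',
                     masses,
                     r2[:, None, None] * np.eye(3) - np.einsum('ij,ik->ijk', coords, coords))

rng = np.random.default_rng(7)
masses = rng.uniform(0.5, 2.0, size=10)
coords = rng.normal(size=(10, 3))

I = inertia_tensor(masses, coords)
l0_part = np.trace(I) / 3.0 * np.eye(3)   # scalar (l = 0) channel
l2_part = I - l0_part                     # symmetric traceless (l = 2) channel
assert np.allclose(I, l0_part + l2_part)
```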
TFNs outperform or match non-equivariant baselines across all experiments, with no rotational data augmentation and often fewer parameters (Thomas et al., 2018).
| Domain | Task | TFN Performance |
|---|---|---|
| Geometry | 3D Tetris, ModelNet40 classification | 100% (toy), ~80–85% (ModelNet40) |
| Physics | Newtonian acceleration, moment of inertia | Recovers analytic kernels |
| Chemistry | QM9 missing atom prediction | 95–97% correct, <0.2 Å MAE |