Geometric Algebra Transformer (GATr)
- GATr incorporates projective geometric algebra into the Transformer architecture to natively encode 3D primitives and maintain full Euclidean equivariance.
- It employs equivariant linear layers and multivector attention to ensure symmetry-preserving interactions and robust geometric reasoning.
- LaB-GATr extends this framework with geometric tokenization and learned interpolation, enabling scalable processing of high-fidelity biomedical meshes.
A Geometric Algebra Transformer (GATr) is a neural network architecture that integrates projective geometric algebra into the Transformer paradigm, enabling efficient, symmetry-equivariant modeling of geometric data. Its design allows native encoding and manipulation of 3D primitives such as points, planes, and transformations, and is structured to maintain equivariance under the full Euclidean group, including rotations, translations, and reflections. Recent extensions—including LaB-GATr—address scalability for extremely high-fidelity meshes in biomedical applications, combining geometric tokenization and equivariant interpolation without alignment pre-processing (Suk et al., 12 Mar 2024).
1. Mathematical Foundations and Geometric Algebra Representation
GATr operates on the projective geometric algebra $\mathbb{G}_{3,0,1}$, a 16-dimensional algebra generated by a 4D basis $e_0, e_1, e_2, e_3$ (with $e_0^2 = 0$ and $e_1^2 = e_2^2 = e_3^2 = 1$), supporting projective geometry for points, lines, planes, and geometric transformations. In this model:
- Multivectors encode geometric primitives:
- Points are embedded in trivector components.
- Planes, directions, translations, and rotations have distinct grade mappings.
- Projective coordinate ($e_0$): the null basis vector $e_0$ (with $e_0^2 = 0$) enables linear representation of translations via projective embeddings.
- Operators and group actions: Euclidean transformations are represented as versors (products of vectors), acting on geometric data via the sandwich product:
$\rho_u(x) = \begin{cases} u x u^{-1}, & \text{if } u \text{ is even} \\ u \hat{x} u^{-1}, & \text{if } u \text{ is odd} \end{cases}$
where $\hat{x}$ denotes the grade involution of $x$.
This algebraic scheme facilitates the encoding of both objects and their transformations within a single vector space, crucial for efficient and physically faithful geometric reasoning in learning tasks.
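To make the grade structure concrete, the following minimal sketch embeds points and planes into flat 16-component arrays. The basis ordering and the helpers `embed_point`/`embed_plane` are illustrative assumptions for exposition, not GATr's reference implementation.

```python
import numpy as np

# Illustrative basis ordering for G(3,0,1): scalar, 4 vectors, 6 bivectors,
# 4 trivectors, pseudoscalar -- 16 components in total.
BASIS = ["1", "e0", "e1", "e2", "e3",
         "e01", "e02", "e03", "e12", "e13", "e23",
         "e021", "e013", "e032", "e123",
         "e0123"]
IDX = {b: i for i, b in enumerate(BASIS)}

def embed_point(p):
    """Embed a 3D point as the PGA trivector e123 + x*e032 + y*e013 + z*e021."""
    mv = np.zeros(16)
    mv[IDX["e123"]] = 1.0            # homogeneous (projective) coordinate
    mv[IDX["e032"]] = p[0]
    mv[IDX["e013"]] = p[1]
    mv[IDX["e021"]] = p[2]
    return mv

def embed_plane(n, d):
    """Embed an oriented plane n.x + d = 0 as a grade-1 PGA vector."""
    mv = np.zeros(16)
    mv[IDX["e0"]] = d                # offset lives on the null direction e0
    mv[IDX["e1"]], mv[IDX["e2"]], mv[IDX["e3"]] = n
    return mv

point = embed_point([1.0, 2.0, 3.0])
plane = embed_plane([0.0, 0.0, 1.0], d=-1.0)   # the plane z = 1
```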
2. Transformer Architecture in $\mathbb{G}_{3,0,1}$
GATr adapts the standard Transformer framework to operate in multivector space. Its layers and operations are rigorously designed to preserve Euclidean symmetry:
- Equivariant Linear Layers:
  Parameterized as linear combinations of grade projections (plus multiplications by $e_0$), preserving the algebraic type of each feature (sketched below).
- Multivector Attention:
  Attention logits use only the $\mathrm{E}(3)$-invariant inner product, computed on the non-$e_0$ components of queries and keys.
- Geometric Bilinear and Join Layers:
  Compute interactions between geometric entities, such as intersections and distances, by concatenating geometric products with dual (join) operations.
- Equivariant LayerNorm and Nonlinearities:
  - LayerNorm: normalizes each multivector using the invariant inner product.
  - Scalar-gated GELU: nonlinearity gated by the scalar component of each multivector.
This strict algebraic discipline guarantees that every layer preserves equivariance, preventing leakage or destruction of geometric symmetry during learning.
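As a concrete illustration of the first and last items above, the sketch below implements a reduced equivariant linear map built from grade projections (GATr's full parameterization also includes multiplication by $e_0$, omitted here for brevity) and the scalar-gated GELU; the component ordering matches the earlier embedding sketch and the names are illustrative.

```python
import numpy as np

# Grades of the 16 components (same ordering as the embedding sketch):
# scalar, 4 vectors, 6 bivectors, 4 trivectors, pseudoscalar.
GRADE = np.array([0, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4])

def grade_project(x, k):
    """Zero out all components of multivector x except those of grade k."""
    return np.where(GRADE == k, x, 0.0)

def equivariant_linear(x, w):
    """Reduced equivariant linear map: sum_k w_k * <x>_k.
    (GATr's full map also includes e0-multiplication terms.)"""
    return sum(w[k] * grade_project(x, k) for k in range(5))

def gelu(s):
    """Tanh approximation of GELU on a scalar."""
    return 0.5 * s * (1.0 + np.tanh(np.sqrt(2 / np.pi) * (s + 0.044715 * s**3)))

def scalar_gated_gelu(x):
    """Gate the whole multivector by GELU of its invariant scalar part."""
    return gelu(x[0]) * x

rng = np.random.default_rng(0)
x = rng.normal(size=16)              # one multivector feature
w = rng.normal(size=5)               # one weight per grade
y = scalar_gated_gelu(equivariant_linear(x, w))
```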
3. Scalability: Geometric Tokenization and LaB-GATr
Direct application of GATr becomes computationally impractical for meshes on the order of $10^5$ vertices, since global attention scales as $\mathcal{O}(n^2)$ in sequence length. LaB-GATr extends GATr to large-scale biomedical meshes through:
- Geometric Tokenization (sketched after this list):
  - Farthest point sampling selects a subset of coarse token centers from the full vertex set, partitioning the mesh into clusters.
  - Each vertex is assigned to its closest coarse center by Euclidean distance.
- Clusterwise Feature Pooling:
  - Per cluster, the features of member vertices are aggregated into a single coarse token.
  - Features and positions relative to the cluster center are encoded as translation multivectors.
- Learned Barycentric Interpolation: upsampling by weighted aggregation of the nearest pooled tokens, with learned interpolation weights; a subsequent MLP produces the final per-vertex features.
- End-to-end pipeline: embedding → geometric tokenization → GATr transformer blocks → learned interpolation → per-vertex prediction.
This architecture achieves compression of the input sequences by up to two orders of magnitude with negligible loss, enabling tractable training and inference without mesh alignment or spherical resampling.
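A minimal NumPy sketch of the tokenization and interpolation steps follows. It replaces LaB-GATr's learned components with simple stand-ins (mean pooling per cluster, distance-softmax interpolation weights); all function names and the temperature `tau` are illustrative assumptions.

```python
import numpy as np

def farthest_point_sampling(pos, m):
    """Greedy FPS: pick m well-spread vertices as coarse-token centers."""
    centers = [0]
    d = np.linalg.norm(pos - pos[0], axis=1)
    for _ in range(m - 1):
        nxt = int(np.argmax(d))
        centers.append(nxt)
        d = np.minimum(d, np.linalg.norm(pos - pos[nxt], axis=1))
    return np.array(centers)

def tokenize(pos, feat, m):
    """Pool per-cluster features into m coarse tokens (mean pooling here;
    LaB-GATr uses a learned equivariant pooling)."""
    centers = farthest_point_sampling(pos, m)
    # assign every vertex to its nearest coarse center
    dist = np.linalg.norm(pos[:, None] - pos[centers][None], axis=-1)
    assign = dist.argmin(axis=1)
    tokens = np.stack([feat[assign == c].mean(axis=0) for c in range(m)])
    return tokens, centers, assign

def interpolate(pos, centers, tokens, k=3, tau=1.0):
    """Upsample token features back to all vertices with softmax weights over
    the k nearest tokens (a stand-in for the paper's learned interpolation)."""
    dist = np.linalg.norm(pos[:, None] - pos[centers][None], axis=-1)
    knn = np.argsort(dist, axis=1)[:, :k]
    logits = -np.take_along_axis(dist, knn, axis=1) / tau
    w = np.exp(logits)
    w /= w.sum(axis=1, keepdims=True)
    return (w[..., None] * tokens[knn]).sum(axis=1)

rng = np.random.default_rng(0)
pos = rng.normal(size=(1000, 3))     # mesh vertex positions
feat = rng.normal(size=(1000, 8))    # per-vertex features
tokens, centers, assign = tokenize(pos, feat, m=32)   # ~30x compression
dense = interpolate(pos, centers, tokens)             # back to 1000 vertices
```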
4. Equivariance and Generalization
All stages (embedding, tokenization, transformer, and interpolation) operate in $\mathbb{G}_{3,0,1}$, ensuring $\mathrm{E}(3)$-equivariance:
- Equivariance definition: $f(\rho_u(x)) = \rho_u(f(x))$ for every Euclidean transformation $u$ and input $x$ (see the numerical check after this list).
- Importance for biomedical meshes:
- Anatomical surfaces/volumes are not canonically aligned; orientations are arbitrary.
- Equivariant models generalize across patients/subjects without registration, essential for predicting physical or physiological properties from native-space meshes.
- All mathematical constructions (including the learned barycentric interpolation) are proven to preserve Euclidean equivariance.
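The equivariance claim can be checked numerically for the distance-based pooling step in isolation: the self-contained sketch below verifies $f(g \cdot x) = g \cdot f(x)$ for a random rigid motion, under the simplifying assumptions of vector-valued features and mean pooling (it does not reproduce the full GATr stack).

```python
import numpy as np

rng = np.random.default_rng(0)
pos = rng.normal(size=(200, 3))        # vertex positions
vec = rng.normal(size=(200, 3))        # vector-valued per-vertex features

def pool_nearest(pos, vec, centers_idx):
    """Assign each vertex to its nearest center and mean-pool the vectors."""
    d = np.linalg.norm(pos[:, None] - pos[centers_idx][None], axis=-1)
    assign = d.argmin(axis=1)
    return np.stack([vec[assign == c].mean(axis=0)
                     for c in range(len(centers_idx))])

# random rigid motion: rotation R (via QR) plus translation t
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
R = Q * np.sign(np.linalg.det(Q))      # ensure a proper rotation
t = rng.normal(size=3)

centers = np.array([0, 50, 100, 150])  # fixed token centers for the test
ref = pool_nearest(pos, vec, centers)
moved = pool_nearest(pos @ R.T + t, vec @ R.T, centers)

# Rigid motions preserve pairwise distances, so the cluster assignment is
# identical and the pooled vectors co-rotate: f(g . x) == g . f(x).
assert np.allclose(moved, ref @ R.T, atol=1e-10)
print("equivariance check passed")
```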
5. Empirical Validation and Applications
LaB-GATr demonstrates state-of-the-art performance in high-resolution biomedical mesh tasks:
| Task | Comparison model(s) | Metric | LaB-GATr | Prior SOTA |
|---|---|---|---|---|
| Coronary wall-shear stress (surface, ~7,000 vertices) | GATr | Error (%) | 5.5 | 5.5 |
| Velocity field estimation (volume, ~175,000 vertices) | SEGNN | Error (%) | 3.5 | 7.4 |
| Neurodevelopmental age prediction (cortical surface, ~82,000 vertices) | MS-SiT, SiT | MAE (weeks) | 0.54 | 0.59, 0.68, 0.54 |
- LaB-GATr matches or exceeds the previous SOTA while compressing input sequences by up to two orders of magnitude.
- Tractable training on meshes of up to 200,000 vertices, on which the original GATr is infeasible.
- Generalizes in native surface/volume space; no topological resampling required.
- Modeled applications include vessel wall-shear stress estimation, blood flow modelling, and phenotype prediction.
6. Mathematical Formulation Summary
| Concept | Formula | Role |
|---|---|---|
| Multivector encoding | $x \in \mathbb{G}_{3,0,1}$; primitives mapped to grades (e.g., points as trivectors) | Uniform geometric representation |
| Attention | $\mathrm{softmax}\left(\langle q, k \rangle / \sqrt{d}\right) v$ with the invariant inner product | Interaction of geometric tokens |
| Cluster pooling | $z_c = \mathrm{pool}\{x_v : v \in \mathcal{C}_c\}$ | Sequence compression by geometric relation |
| Interpolation | $f_v = \sum_i w_i(v)\, z_{c_i(v)}$ with learned weights $w_i$ | Learned upsampling |
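To unpack the attention row, here is a minimal sketch of multivector attention with the degenerate invariant inner product, which simply ignores every $e_0$-containing component; the $\sqrt{8 n_c}$ normalization and basis ordering follow our reading of the GATr construction and should be treated as illustrative.

```python
import numpy as np

BASIS = ["1", "e0", "e1", "e2", "e3",
         "e01", "e02", "e03", "e12", "e13", "e23",
         "e021", "e013", "e032", "e123",
         "e0123"]
# The PGA inner product is degenerate: blades containing e0 contribute 0,
# leaving 8 of the 16 components.
INNER_MASK = np.array([0.0 if "0" in b else 1.0 for b in BASIS])

def multivector_attention(q, k, v):
    """Scaled dot-product attention over multivector tokens, using the
    E(3)-invariant inner product (non-e0 components only).
    q, k, v: (tokens, channels, 16) arrays."""
    n_c = q.shape[1]
    # invariant inner products, summed over channels: (tokens, tokens)
    logits = np.einsum("icm,jcm->ij", q * INNER_MASK, k) / np.sqrt(8 * n_c)
    w = np.exp(logits - logits.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return np.einsum("ij,jcm->icm", w, v)

rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(5, 4, 16)) for _ in range(3))
out = multivector_attention(q, k, v)   # shape (5, 4, 16)
```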
7. Impact and Prospects
The GATr architecture—and its scalable extension, LaB-GATr—provides a principled, symmetry-respecting, and tractable approach for learning on complex geometric domains. Its use of projective geometric algebra guarantees full Euclidean equivariance and supports both global attention and mesh manipulation at scale. By removing the necessity for canonical alignment and supporting direct mesh-space inference, GATr is well-suited for next-generation biomedical modeling, physics, and engineering tasks involving 3D geometric data, with demonstrated empirical gains in accuracy and efficiency (Suk et al., 12 Mar 2024). Further applications may extend to mesh segmentation, anatomical disease localization, and any domain where geometric symmetry and scalability are critical.