REVNET: Rotation-Equivariant Anchor Transformer
- The paper presents REVNET’s main contribution: a framework ensuring strict SO(3)-equivariance for 3D point cloud completion through anchor-based geometric representations and VN neural primitives.
- REVNET leverages deterministic farthest-point sampling and Vector Neuron-based modules to encode local geometry and maintain rotation-equivariant feature transformations.
- Experimental results show REVNET outperforms prior methods on datasets like MVP and KITTI, achieving state-of-the-art metrics with stable, fine-detail reconstructions.
The Rotation-Equivariant Anchor Transformer (REVNET) is a framework for robust 3D point cloud completion under arbitrary rotations, achieving strict SO(3)-equivariance through a synergistic integration of anchor-based geometric representations, Vector Neuron (VN) neural primitives, and anchor-transformer architectures. REVNET addresses the limitations of canonical pose dependence in previous approaches and provides stable, fine-detail 3D shape reconstructions without requiring explicit pose normalization or augmentation (Ni et al., 13 Jan 2026, Bekci et al., 2024).
1. Core Principles of SO(3)-Equivariance
The principal objective of REVNET is to ensure that the completion mapping f from partial to completed 3D point clouds commutes with arbitrary rotations. For a rotation R ∈ SO(3), REVNET satisfies f(R·X) = R·f(X), where X is an input point cloud or feature. All feature representations and intermediate outputs are constrained to transform covariantly under this group action.
REVNET operationalizes equivariance at multiple levels:
- Feature representation: Features are encoded as lists of channels of 3D vectors in ℝ³, manipulated by layers that commute with the action of SO(3).
- Network operations: Key primitives (VN-Linear, VN-ReLU, VN-EdgeConv, and invariant projections) are specifically designed so that each operation maintains or produces the correct transformation behavior.
- Anchor point selection and manipulation: Both input sampling and generated coordinates are constrained to preserve equivariance through deterministic, geometry-based processes.
This architectural discipline allows direct deployment in settings where the input pose is unknown, unavailable, or variable, eliminating the need for pre-alignment or exhaustive augmentation (Ni et al., 13 Jan 2026).
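The commutation property above can be checked numerically. The sketch below implements two standard Vector Neuron primitives in NumPy — VN-Linear, which mixes channels but never the 3D axis, and the direction-based VN-ReLU — and verifies that rotating the input rotates the output identically. Layer sizes and random weights are illustrative, not REVNET's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_rotation(rng):
    # QR of a Gaussian matrix yields an orthogonal matrix; fix the signs
    # and the determinant to get a proper rotation (det = +1).
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))
    if np.linalg.det(q) < 0:
        q[:, 0] *= -1
    return q

def vn_linear(X, W):
    # X: (C, 3) vector channels; W: (C_out, C) mixes channels only,
    # so it commutes with any rotation acting on the 3D axis.
    return W @ X

def vn_relu(X, U):
    # Direction-based nonlinearity: per channel, remove the component of
    # the feature that points against a learned direction k.
    K = U @ X
    k = K / (np.linalg.norm(K, axis=1, keepdims=True) + 1e-9)
    dot = np.sum(X * k, axis=1, keepdims=True)
    return np.where(dot >= 0, X, X - dot * k)

C, C_out = 8, 16
X = rng.normal(size=(C, 3))
W = rng.normal(size=(C_out, C))
U = rng.normal(size=(C_out, C_out))
R = random_rotation(rng)

f = lambda X: vn_relu(vn_linear(X, W), U)

# Rotating the input (rows are 3D vectors) rotates the output identically.
err = np.abs(f(X @ R.T) - f(X) @ R.T).max()
assert err < 1e-9
```

Because every layer obeys this property, arbitrary compositions of such layers do too, which is what licenses deploying the full network on unaligned inputs.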
2. Anchor-Based Encoding of 3D Geometry
REVNET selects a set of representative anchor points from a partial input point cloud using deterministic farthest-point sampling (FPS), optionally refined to focus on key geometric landmarks (e.g., high curvature regions). Since FPS depends solely on pairwise Euclidean distances, it is itself rotation-equivariant (Bekci et al., 2024, Ni et al., 13 Jan 2026).
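A minimal FPS sketch illustrates why the selection is rotation-equivariant: it consults only Euclidean distances, so the chosen indices are identical for any rotated copy of the input. The seeding rule here (start from the point farthest from the centroid) is an assumption for determinism; the papers only specify that the sampling is deterministic.

```python
import numpy as np

def farthest_point_sampling(points, k):
    # Deterministic seed: the point farthest from the centroid (an
    # illustrative choice; any distance-based rule preserves equivariance).
    d_centroid = np.linalg.norm(points - points.mean(axis=0), axis=1)
    chosen = [int(np.argmax(d_centroid))]
    min_dist = np.linalg.norm(points - points[chosen[0]], axis=1)
    for _ in range(k - 1):
        # Greedily pick the point maximizing the minimum distance to the
        # anchors chosen so far — the classic FPS criterion.
        nxt = int(np.argmax(min_dist))
        chosen.append(nxt)
        min_dist = np.minimum(min_dist,
                              np.linalg.norm(points - points[nxt], axis=1))
    return np.array(chosen)

rng = np.random.default_rng(1)
pts = rng.normal(size=(500, 3))
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(q) < 0:
    q[:, 0] *= -1          # make it a proper rotation

idx = farthest_point_sampling(pts, 16)
idx_rot = farthest_point_sampling(pts @ q.T, 16)
assert np.array_equal(idx, idx_rot)   # same anchors regardless of pose
```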
The local geometry around each anchor is encoded as a vector neuron feature by a VN-based backbone comprising EdgeConv, set-abstraction, and relative position encoding layers. The resulting set of per-anchor features forms a rotation-equivariant, locally informative descriptor of the partial shape.
Alternatively, all points can be represented by a point-to-anchor distance matrix D with entries D_ij = ‖p_i − a_j‖₂. Since rotations preserve Euclidean distances (‖Rp − Ra‖ = ‖p − a‖), D is itself rotation-invariant—essential for transformer-based approaches that aggregate information in the space of distances (Bekci et al., 2024).
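The invariance of the distance representation is easy to confirm numerically. The snippet below builds the point-to-anchor matrix and checks it is unchanged when both points and anchors are rotated together; taking the anchors as the first few points is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
points = rng.normal(size=(100, 3))   # all points p_i
anchors = points[:8]                 # anchor subset a_j (illustrative)

def anchor_distances(pts, anc):
    # D[i, j] = ||p_i - a_j||: a rotation-invariant point descriptor
    return np.linalg.norm(pts[:, None, :] - anc[None, :, :], axis=-1)

q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(q) < 0:
    q[:, 0] *= -1          # proper rotation

D = anchor_distances(points, anchors)
D_rot = anchor_distances(points @ q.T, anchors @ q.T)
assert np.abs(D - D_rot).max() < 1e-9   # distances unchanged by rotation
```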
3. Rotation-Equivariant Anchor Transformer Architecture
3.1 VN Missing Anchor Transformer (VN-MAT) Module
The VN-MAT is a core architectural component that predicts the location and features of missing anchors via rotation-equivariant channel-wise attention:
- Observed anchor features serve as keys and values for VN-MAT.
- For each missing anchor, its coordinates are embedded by VN-EdgeConv over its nearest observed anchors.
- Attention weights are computed by taking the channelwise vector difference, projecting to rotation-invariant space (VN-Inv), applying an MLP and softmax, and using them to aggregate information.
- Decoder cross-attends from missing anchor queries to observed anchor keys/values, refining features for each missing anchor.
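The key structural idea — invariant attention scores steering equivariant value aggregation — can be sketched compactly. In this toy version, per-channel norms of the query–key vector differences stand in for the learned VN-Inv projection, and a single linear map replaces the scoring MLP; neither matches REVNET's exact parameterization, but the equivariance argument is the same: scalar weights times equivariant vectors remain equivariant.

```python
import numpy as np

rng = np.random.default_rng(3)
C = 4            # vector channels per anchor feature
N, M = 6, 2      # observed anchors (keys/values), missing anchors (queries)

K = rng.normal(size=(N, C, 3))    # observed anchor features (equivariant)
Q = rng.normal(size=(M, C, 3))    # missing-anchor query features
W_mlp = rng.normal(size=(C,))     # stand-in for the scoring MLP

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def vn_attention(Q, K):
    # Channelwise vector difference between each query and key: (M, N, C, 3)
    diff = Q[:, None] - K[None, :]
    # VN-Inv stand-in: per-channel norms are rotation-invariant scalars
    inv = np.linalg.norm(diff, axis=-1)       # (M, N, C)
    scores = inv @ W_mlp                      # (M, N)
    attn = softmax(scores, axis=1)            # weights over observed anchors
    # Scalar-weighted sums of equivariant vector features stay equivariant
    return np.einsum("mn,ncd->mcd", attn, K)

q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(q) < 0:
    q[:, 0] *= -1

out = vn_attention(Q, K)
out_rot = vn_attention(Q @ q.T, K @ q.T)
assert np.abs(out_rot - out @ q.T).max() < 1e-9
```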
3.2 Prediction Heads and Coordinate Decoding
The final dense point cloud is generated by first predicting missing anchor positions in an invariant frame, then transforming these predictions back to the input orientation using the inverse of the learned transformation T. Local offset decoders further populate points around each anchor, using the same equivariant-to-invariant projection strategy.
3.3 Distance-Based Transformer Variant
An alternative realization, as in ESCAPE, encodes all points via their anchor distances and processes the distance matrix through a stack of transformer encoder-decoder layers, with DGCNN extraction and attention applied in distance space. Decoded outputs are mapped back to coordinates by solving a nonlinear least-squares system anchored to the equivariant anchor locations (Bekci et al., 2024).
4. Equivariant Bias and ZCA-Based Normalization
Standard bias terms and naïve normalization strategies generally break equivariance in networks handling vector features. REVNET introduces:
- Rotation-Equivariant Bias: rather than adding a fixed bias vector (which would break equivariance), the bias is normalized and linearly combined with the existing equivariant feature, so the added term rotates with the feature; equivariance follows from the cyclic property of the trace under conjugation by R ∈ SO(3).
- VN-ZCA Layer Normalization: channelwise whitening is achieved by ZCA, normalizing each vector channel as x̂ = Σ^(−1/2)(x − μ), with the mean μ and covariance Σ derived from the batch. Under a rotation R, μ ↦ Rμ and Σ ↦ RΣRᵀ, hence Σ^(−1/2) ↦ RΣ^(−1/2)Rᵀ and x̂ ↦ Rx̂, so equivariance holds across all batch and feature statistics (Ni et al., 13 Jan 2026).
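A minimal sketch of ZCA whitening on a batch of 3D vectors makes the argument concrete: the symmetric inverse square root Σ^(−1/2) conjugates with the rotation, so the whitened output rotates with the input. This is the textbook ZCA construction applied to one vector channel, not the paper's exact VN-ZCA layer.

```python
import numpy as np

rng = np.random.default_rng(4)

def vn_zca(V, eps=1e-6):
    # V: (N, 3) batch of 3D vectors. Whitening x_hat = (x - mu) Sigma^{-1/2}
    # with the SYMMETRIC inverse square root commutes with rotations.
    mu = V.mean(axis=0)
    Xc = V - mu
    Sigma = Xc.T @ Xc / len(V) + eps * np.eye(3)
    w, U = np.linalg.eigh(Sigma)                 # Sigma = U diag(w) U^T
    Sigma_inv_sqrt = U @ np.diag(w ** -0.5) @ U.T
    return Xc @ Sigma_inv_sqrt

V = rng.normal(size=(256, 3)) * np.array([3.0, 1.0, 0.5])   # anisotropic
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(q) < 0:
    q[:, 0] *= -1

out = vn_zca(V)
out_rot = vn_zca(V @ q.T)
assert np.abs(out_rot - out @ q.T).max() < 1e-6   # whitening is equivariant
```

A batch-norm-style per-coordinate scaling would not commute with rotations; the full-matrix ZCA form is what restores equivariance.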
5. Equivariant–Invariant Feature Conversions and Coordinate Generation
VN-Inv enables conversion between rotation-equivariant (frame-dependent) and rotation-invariant (frame-free) features via a learned transformation T. All coordinate predictions (anchor positions, local offsets) are produced in the invariant frame with an MLP, then mapped back to the global orientation with T⁻¹. This provides consistent, stable outputs, mitigating the instability that arises when coordinates are regressed directly in a naive vector neuron space.
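One common way to realize such a conversion is sketched below: an equivariant 3×3 frame T is produced linearly from the vector features, projecting onto T yields invariant features, and multiplying the predicted invariant coordinates by T maps them back to the global orientation. The specific maps (a single linear frame generator, a linear "MLP", mapping back via T rather than a learned inverse) are illustrative assumptions; only the commutation structure mirrors the text.

```python
import numpy as np

rng = np.random.default_rng(5)
C = 8
X = rng.normal(size=(C, 3))     # equivariant anchor feature (C channels)
U = rng.normal(size=(3, C))     # learned map producing the frame T
W = rng.normal(size=(3, C * 3)) # stand-in "MLP" on invariant features

def predict_coords(X):
    T = U @ X                   # (3, 3) equivariant frame: T -> T R^T
    inv = X @ T.T               # (C, 3) rotation-INVARIANT features
    c_local = W @ inv.ravel()   # coordinates predicted in the invariant frame
    return c_local @ T          # map back: output rotates with the input

q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(q) < 0:
    q[:, 0] *= -1

p = predict_coords(X)
p_rot = predict_coords(X @ q.T)
assert np.abs(p_rot - p @ q.T).max() < 1e-6
```

Because the regression happens on invariant quantities, the MLP never has to model the rotation itself, which is the stability argument made above.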
This mechanism ensures that outputs remain equivariant, regardless of the input orientation. For anchor distance-transformer variants, coordinate recovery is performed by solving for the location x minimizing Σ_j (‖x − a_j‖ − d_j)² over the predicted distances d_j and anchor positions a_j, using the Levenberg–Marquardt algorithm (Bekci et al., 2024).
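The distance-to-coordinate recovery is ordinary trilateration. The sketch below solves the least-squares problem with a small hand-rolled Levenberg–Marquardt loop (fixed damping, fixed iteration count — a simplification of a production LM solver) and recovers a point exactly from its anchor distances.

```python
import numpy as np

rng = np.random.default_rng(6)
anchors = rng.normal(size=(8, 3))               # known anchor positions a_j
x_true = rng.normal(size=3)                     # ground-truth location
d = np.linalg.norm(anchors - x_true, axis=1)    # "predicted" distances d_j

def recover_point(anchors, d, n_iter=50, lam=1e-3):
    # Minimize sum_j (||x - a_j|| - d_j)^2 by damped Gauss-Newton steps
    # (a fixed-damping Levenberg-Marquardt simplification).
    x = anchors.mean(axis=0)                    # initialize at anchor centroid
    for _ in range(n_iter):
        diff = x - anchors                      # (J, 3)
        dist = np.linalg.norm(diff, axis=1)     # ||x - a_j||
        r = dist - d                            # residuals r_j
        J = diff / (dist[:, None] + 1e-12)      # dr_j/dx = (x - a_j)/||x - a_j||
        H = J.T @ J + lam * np.eye(3)           # damped normal equations
        x = x - np.linalg.solve(H, J.T @ r)
    return x

x_hat = recover_point(anchors, d)
assert np.linalg.norm(x_hat - x_true) < 1e-6
```

With noiseless distances and anchors surrounding the target, the centroid initialization converges reliably; noisy predicted distances simply yield the least-squares location instead.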
6. Experimental Results and Comparative Analysis
REVNET establishes new state-of-the-art results for rotation-equivariant point cloud completion. On MVP (None/SO(3)), REVNET achieves the best average Chamfer Distance and F-score@1%, outperforming EquivPCN (0.98, 67.5%) and AdaPoinTr (0.94, 68.0%). Output consistency under input rotations is near-exact, owing to the deterministic FPS anchor selection (Ni et al., 13 Jan 2026).
On KITTI (None/None), REVNET achieves Fidelity Distance and MMD competitive with non-equivariant methods that require input alignment. Qualitative examination shows preservation of fine detail under arbitrary input rotations and plausible shape completion on real-world LiDAR without any pose normalization (Ni et al., 13 Jan 2026).
Ablation studies confirm robustness to Gaussian noise (σ up to 0.004) and point removal (up to 50%). ESCAPE's distance-based variant achieves the best results on rotated-input PCN and OmniObject3D datasets, outperforming all alternative baselines under arbitrary SO(3) transformations (Bekci et al., 2024).
7. Extensions, Limitations, and Future Directions
REVNET’s design unifies anchor-based encoding, rotation-equivariant attention, and stable equivariant–invariant feature manipulations, and generalizes to other 3D tasks involving rigid transformations. The anchor encoding is readily adapted to point cloud classification, registration, segmentation, and tracking by selecting task-appropriate anchor points and targets.
Despite its advantages, REVNET is currently limited to rotation-equivariance and does not handle translations or the full SE(3) group. Under sparse input regimes, it may struggle to infer global structure compared to non-equivariant methods with strong canonical priors. Delicate or long-thin structures (e.g., cable-like objects) also remain challenging.
Future developments aim to extend to SE(3)-equivariance, incorporate symmetry and part-based priors for complex geometries, scale to full scene completion, and explore unsupervised or weakly supervised settings with equivariance guarantees (Ni et al., 13 Jan 2026). By integrating steerable or vector-valued features atop anchor-based representations, REVNET may further expand to vector- and orientation-sensitive tasks (e.g., normals, flow estimation) (Bekci et al., 2024).