
ComRoPE: Scalable Rotary Position Encoding

Updated 9 November 2025
  • ComRoPE is a framework for trainable positional encoding in Transformers that replaces fixed rotations with learnable, commuting skew-symmetric matrices.
  • It preserves relative offset invariance by enforcing the commutativity of rotation generators, ensuring robust performance across sequential and multidimensional data.
  • Empirical results demonstrate that variants like ComRoPE-LD outperform traditional methods, achieving higher accuracy and enhanced robustness in diverse applications.

ComRoPE (Commuting Rotary Position Embedding) is a framework for positional encoding in Transformers that generalizes Rotary Positional Encoding (RoPE) by replacing fixed, hand-designed rotations with trainable, higher-dimensional rotations represented by commuting skew-symmetric matrices. This approach creates a scalable, robust, and theoretically principled method for embedding positional information in models handling sequential, spatial, or general multidimensional data. ComRoPE preserves the crucial “relative offset” property underpinning RoPE’s robustness, while enabling greater expressiveness and improved empirical performance in high-dimensional contexts.

1. Motivation and Limitations of Prior Methods

Absolute Positional Encoding (APE), such as sinusoidal encoding, is fixed after initialization and cannot be adapted during training. APE's fixed spectrum prevents generalization to longer or shifted input sequences and does not support learning of positional frequency content. Standard RoPE, as introduced in RoFormer, encodes positional information by applying a 2D rotation $R(\theta)$ to each (typically 2-dimensional) slice of the query/key, where $\theta$ is a deterministic function of the position index. The corresponding rotation matrix is

$$R(\theta) = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$$

This method is efficient and robust to absolute position offsets, but it is fundamentally limited by:

  • The use of 2D rotations (low expressivity in higher dimensions),
  • Manually fixed, non-trainable angles,
  • Restricted ability to extend to general rotation groups without losing offset-robustness.

The motivation for ComRoPE is to devise a parameterization of RoPE that is i) fully trainable, ii) robust to input offsets (shift-invariance), and iii) scalable to higher-dimensional embeddings.
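For concreteness, vanilla RoPE's fixed 2D rotations can be sketched in a few lines of NumPy. This is an illustrative sketch using the standard RoFormer frequency schedule, not an official implementation; `rope_rotate` is a hypothetical helper name.

```python
import numpy as np

def rope_rotate(q, pos, base=10000.0):
    """Apply vanilla RoPE: rotate each 2D slice of q by pos * theta_j."""
    d = q.shape[-1]
    assert d % 2 == 0
    j = np.arange(d // 2)
    theta = base ** (-2.0 * j / d)          # fixed, non-trainable frequencies
    angles = pos * theta                    # one angle per 2D slice
    cos, sin = np.cos(angles), np.sin(angles)
    q2 = q.reshape(-1, 2)                   # pair up adjacent dimensions
    rotated = np.stack([q2[:, 0] * cos - q2[:, 1] * sin,
                        q2[:, 0] * sin + q2[:, 1] * cos], axis=-1)
    return rotated.reshape(d)

# Offset-robustness: <R(x)q, R(y)k> depends only on the offset y - x.
q, k = np.random.default_rng(0).normal(size=(2, 8))
s1 = rope_rotate(q, 3.0) @ rope_rotate(k, 5.0)
s2 = rope_rotate(q, 10.0) @ rope_rotate(k, 12.0)   # same offset of 2
assert np.allclose(s1, s2)
```

The assertion at the end exercises exactly the offset-invariance property that ComRoPE is designed to preserve under trainable rotations.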

2. Formalization: The RoPE Equation

ComRoPE is grounded in a formal definition of rotary positional encoding. Let $f: \mathbb{R}^d \times \mathbb{R}^N \to \mathbb{R}^d$ insert position $\mathbf{x} \in \mathbb{R}^N$ into query vector $\mathbf{q} \in \mathbb{R}^d$, and let $\rho(\mathbf{q}, \mathbf{k}) = \mathbf{q}^\top \mathbf{k}$ be the standard dot-product similarity. The model requires a matrix-valued function $\mathbf{R}_f : \mathbb{R}^N \to \mathbb{R}^{d \times d}$ such that:

  1. $f(\mathbf{q}, \mathbf{x}) = \mathbf{R}_f(\mathbf{x})\,\mathbf{q}$,
  2. $\rho(\mathbf{q}, \mathbf{k}) = \mathbf{q}^\top \mathbf{k}$,
  3. $g(\mathbf{q}, \mathbf{k}, \mathbf{y} - \mathbf{x}) = \rho\big(f(\mathbf{q}, \mathbf{x}), f(\mathbf{k}, \mathbf{y})\big) = \mathbf{q}^\top \mathbf{R}_f(\mathbf{y} - \mathbf{x})\,\mathbf{k}$ captures the relative-positional similarity.

For RoPE to provide offset-invariant attention, the following "RoPE Equation" must hold (Proposition 2.1):

$$\mathbf{R}_f(\mathbf{x})^\top \mathbf{R}_f(\mathbf{y}) = \mathbf{R}_f(\mathbf{y} - \mathbf{x}) \qquad \forall\, \mathbf{x}, \mathbf{y}$$

This guarantees that relative attention depends only on positional differences, not on absolute positions.
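The RoPE Equation can be checked numerically for the scalar 2D rotation used by vanilla RoPE. Below is a minimal NumPy check; `rot` is a hypothetical helper, not part of any library.

```python
import numpy as np

def rot(theta):
    """2D rotation matrix R(theta)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

# RoPE Equation: R(x)^T R(y) == R(y - x). Rotations about a common axis
# commute, so the product depends only on the offset y - x.
x, y = 0.7, 2.1
lhs = rot(x).T @ rot(y)
rhs = rot(y - x)
assert np.allclose(lhs, rhs)
```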

3. The Commutativity Constraint

ComRoPE parameterizes $\mathbf{R}_f(\mathbf{x})$ using $N$ trainable skew-symmetric matrices $\mathcal{A} = \{A_1, \dots, A_N\}$:

$$\mathbf{R}(\mathbf{x}; \mathcal{A}) = \exp\left(\sum_{i=1}^N x_i A_i\right)$$

The central result (Theorem 3.1) establishes that the RoPE Equation holds for all $\mathbf{x}, \mathbf{y}$ if and only if the $A_i$ pairwise commute:

$$\exp\left(-\sum_i x_i A_i\right) \exp\left(\sum_i y_i A_i\right) = \exp\left(\sum_i (y_i - x_i) A_i\right) \iff A_i A_j = A_j A_i \quad \forall\, i, j$$

This requirement is both necessary and sufficient: commuting generators ensure exact offset-robustness because the matrix exponential factorizes without higher-order cross-terms.
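Theorem 3.1 can be sanity-checked numerically. The sketch below uses SciPy's `expm`; the random generators are illustrative stand-ins for trained parameters, not the paper's construction.

```python
import numpy as np
from scipy.linalg import expm

def skew(p):
    """Skew-symmetric part P - P^T."""
    return p - p.T

rng = np.random.default_rng(0)
S = skew(rng.normal(size=(4, 4)))

# Commuting generators: scalar multiples of one skew-symmetric S.
A1, A2 = 0.5 * S, 1.7 * S

def R(x, gens):
    """R(x; A) = exp(sum_i x_i A_i)."""
    return expm(sum(xi * Ai for xi, Ai in zip(x, gens)))

x, y = np.array([1.0, -2.0]), np.array([0.3, 4.0])
lhs = R(x, (A1, A2)).T @ R(y, (A1, A2))
rhs = R(y - x, (A1, A2))
assert np.allclose(lhs, rhs)          # RoPE Equation holds

# Independent random skew-symmetric generators generically do NOT
# commute, and the RoPE Equation fails (BCH cross-terms appear).
B1, B2 = skew(rng.normal(size=(4, 4))), skew(rng.normal(size=(4, 4)))
lhs_bad = R(x, (B1, B2)).T @ R(y, (B1, B2))
rhs_bad = R(y - x, (B1, B2))
assert not np.allclose(lhs_bad, rhs_bad)
```

Note that $\mathbf{R}(\mathbf{x})^\top = \exp(-\sum_i x_i A_i)$ precisely because the $A_i$ are skew-symmetric, which is why the transpose in the RoPE Equation matches the negated exponent in Theorem 3.1.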

4. Parameterizations: Trainable Commuting Angle Matrices

Two distinct parameterizations are proposed to enforce the commutativity of skew-symmetric matrices:

4.1 Axial-Partition (ComRoPE-AP)

  • The embedding dimension $d$ is divided into $m$ blocks of size $b$ ($d = mb$).
  • Each block $j \in \{1, \dots, m\}$ is associated with a trainable matrix $P_j \in \mathbb{R}^{b \times b}$.
  • For each axis $i \in \{1, \dots, N\}$, $A_i$ is block-diagonal with blocks $$B_{ij} = \begin{cases} P_j - P_j^\top, & \text{if } j \equiv i \pmod N \\ 0, & \text{otherwise} \end{cases}$$ Because different axes occupy disjoint sets of nonzero skew-symmetric blocks, the $A_i$ have non-overlapping supports and therefore commute.
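The AP construction can be sketched as follows. This is illustrative NumPy code under assumed small sizes ($N = 2$ axes, $m = 4$ blocks, $b = 2$), not the reference implementation.

```python
import numpy as np
from scipy.linalg import expm

N, m, b = 2, 4, 2                      # axes, blocks, block size; d = m * b
rng = np.random.default_rng(1)
P = rng.normal(size=(m, b, b))         # trainable parameters P_j

def A(i):
    """A_i: block-diagonal, skew block P_j - P_j^T where j ≡ i (mod N), else 0."""
    d = m * b
    out = np.zeros((d, d))
    for j in range(m):
        if j % N == i:
            out[j*b:(j+1)*b, j*b:(j+1)*b] = P[j] - P[j].T
    return out

A0, A1 = A(0), A(1)
# Disjoint nonzero blocks => the generators commute...
assert np.allclose(A0 @ A1, A1 @ A0)
# ...hence the RoPE Equation holds exactly.
x, y = np.array([1.5, -0.5]), np.array([2.0, 3.0])
Rx = expm(x[0]*A0 + x[1]*A1)
Ry = expm(y[0]*A0 + y[1]*A1)
assert np.allclose(Rx.T @ Ry, expm((y[0]-x[0])*A0 + (y[1]-x[1])*A1))
```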

4.2 Linearly-Dependent (ComRoPE-LD)

  • Learn a single base skew-symmetric matrix $S = P - P^\top \in \mathbb{R}^{b \times b}$ and, for each axis $i$, a scalar $\theta_i$.
  • Set $A_i = \theta_i S$. Since all $A_i$ are scalar multiples of $S$, they trivially commute.
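The LD construction is even simpler to sketch. In the illustrative code below, `theta` and `P` stand in for trained parameters, and the dimensions are arbitrary.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(2)
P = rng.normal(size=(6, 6))
S = P - P.T                          # shared skew-symmetric base
theta = np.array([0.9, -1.3, 2.0])   # one scalar per axis (N = 3 here)

def R(x):
    """R(x) = exp(sum_i x_i * theta_i * S); all generators theta_i * S commute."""
    return expm((x @ theta) * S)

x, y = rng.normal(size=(2, 3))
assert np.allclose(R(x).T @ R(y), R(y - x))    # RoPE Equation
assert np.allclose(R(x).T @ R(x), np.eye(6))   # R(x) is orthogonal
```

Because the exponent collapses to a single scalar $(\mathbf{x}^\top \boldsymbol{\theta})\, S$, offset-robustness reduces to additivity of that scalar in $\mathbf{x}$.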

Both constructions solve the RoPE Equation and guarantee offset-robustness.

5. Theoretical Foundations

Supporting lemmas demonstrate that for square matrices $A, B$,

$$e^{Ax} e^{By} = e^{Ax + By} \quad \forall\, x, y \iff AB = BA$$

This generalizes to any set $\{A_i\}$. Therefore, any collection of pairwise commuting skew-symmetric matrices produces a position-dependent transformation $\mathbf{R}(\mathbf{x})$ that satisfies the RoPE Equation. Standard RoPE is the special case where all $A_i$ consist of fixed $2 \times 2$ blocks with hand-designed angles $\theta$.

The theoretical framework further shows that if $A_i = 0$ for all $i$, attention reduces to unrotated, standard dot-product attention, and that if $b = 2$ and each $P_j$ is fixed to the standard rotation generator, vanilla RoPE is recovered.

6. Empirical Performance

ComRoPE was evaluated on various benchmarks:

| Method | 2D Class. @224 (ImageNet-1K, ViT-B/16) | 2D Class. @512 | MS COCO Detection (ViT-S) | 3D Class. (UCF-101) |
|---|---|---|---|---|
| APE | ~58.8% | N/A | AP = 44.0 | Improved robustness |
| Vanilla RoPE | ~63.1% | N/A | N/A | Improved robustness |
| LieRE | 64.4% | 61.2% | AP = 44.5 | Improved robustness |
| ComRoPE-AP | 65.3% | N/A | N/A | Improved robustness |
| ComRoPE-LD | 65.5% | 62.6% | AP = 44.7 | Improved robustness |
  • ComRoPE-LD surpasses the previous state-of-the-art method by 1.6% at training resolution and by 2.9% at higher resolution.
  • For object detection (MS COCO), ComRoPE-LD yields +0.2 AP over LieRE.
  • For 3D classification (UCF-101), ComRoPE variants maintain improved robustness under varying resolution.

These results establish that ComRoPE’s learnable, commuting-rotation approach produces consistent accuracy gains and stabilization as input resolution increases.

7. Generalization, Practical Recommendations, and Resources

ComRoPE unifies multiple positional encoding schemes:

  • If all $A_i = 0$, it recovers standard dot-product attention.
  • Setting the block size to $b = 2$ and fixing each $P_j$ to the canonical rotation generator recovers original RoPE.
  • Allows richer, learnable feature rotations in higher dimensions, which are optimized via backpropagation.

Practical implementation considerations include:

  • For images, positional coordinates are best represented in relative, normalized scale.
  • Centering patch coordinates and introducing synthetic perturbations at training further enhance robustness.
  • Block size $b$ balances rotation expressiveness and computational cost; empirical evidence suggests $b = 8$ is effective.
  • An open-source reference implementation is available at https://github.com/Longin-Yu/ComRoPE.
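The coordinate-handling recommendations above can be sketched as a small preprocessing helper. This is a hypothetical illustration; the function name, jitter scale, and normalization convention are assumptions, not taken from the reference implementation.

```python
import numpy as np

def patch_coords(h, w, train=True, jitter=0.1, rng=None):
    """Centered, resolution-normalized 2D patch coordinates,
    with optional synthetic perturbation at training time."""
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords = np.stack([ys, xs], axis=-1).astype(float).reshape(-1, 2)
    coords -= coords.mean(axis=0)              # center the grid
    coords /= max(h, w)                        # relative, normalized scale
    if train:                                  # robustness perturbation (assumed scale)
        rng = rng or np.random.default_rng()
        coords += rng.normal(scale=jitter / max(h, w), size=coords.shape)
    return coords

c = patch_coords(14, 14, train=False)          # 14x14 patches for a 224px ViT-B/16
assert c.shape == (196, 2)
assert np.allclose(c.mean(axis=0), 0.0)
```

Because coordinates are centered and scaled relative to the grid size, the same trained rotations apply coherently when the evaluation resolution changes.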

ComRoPE offers a scalable, flexible, and rigorously justified method for positional encoding in Transformers, particularly beneficial for contexts requiring high-dimensional, trainable, and offset-robust representations.
