2D Rotary Positional Encodings (AS2DRoPE)

Updated 28 December 2025
  • The paper introduces AS2DRoPE as an extension of 1D RoPE to efficiently encode 2D spatial relationships using block-diagonal rotations.
  • It employs parameter-efficient rotation matrices that preserve relative positional invariance, periodicity, and memory efficiency in Transformers.
  • Reported benchmarks show AS2DRoPE using fewer FLOPs than explicit RPE methods while reaching leading accuracy in trajectory prediction; related 2D rotary schemes report gains on vision tasks.

Two-dimensional rotary positional encodings (AS2DRoPE) are a class of positional encoding strategies for Transformers that generalize the one-dimensional Rotary Position Embedding (RoPE) technique to structured, two-dimensional spatial domains. Originally motivated by the need for efficient agent interaction modeling in trajectory prediction, AS2DRoPE and its variants are mathematically grounded in the representation theory of the special orthogonal group and its Lie algebra, and are designed to preserve critical properties such as relative positional invariance, periodicity, and memory efficiency. Unlike explicit relative position embeddings, which incur quadratic memory overhead, these encodings apply parameter-efficient rotations to queries and keys, enabling space and time efficiency at scale while maintaining sensitivity to 2D geometric relationships.

1. Mathematical Formulation and Foundations

AS2DRoPE extends RoPE's formulation from sequences to 2D spatial settings, crucially allowing dot-product attention to depend only on relative spatial or angular displacement. Starting from one-dimensional RoPE, in which channel pairs are rotated by frequency-modulated phases (fixed or learnable), the key observation is that such rotations can be viewed as elements of $\mathrm{SO}(2)$ applied across embedding channel pairs. For scalar position $p$, the $2\times 2$ rotation block is

$$R(p\theta_\ell) = \begin{bmatrix} \cos(p\theta_\ell) & -\sin(p\theta_\ell) \\ \sin(p\theta_\ell) & \cos(p\theta_\ell) \end{bmatrix}.$$

This extends to tokens in $\mathbb{R}^2$. If $p_i=(x_i,y_i)$ and $p_j=(x_j,y_j)$ are 2D coordinates, define the relative offset $\Delta p = p_j - p_i = (\Delta x,\,\Delta y)$ with angle $\theta_{ij} = \mathrm{atan2}(\Delta y, \Delta x)$. The core transformation in AS2DRoPE introduces a uniform scalar $s$ (typically 1) within each $2\times 2$ rotation:

$$R(\theta,s) = s \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}.$$

By tiling identical rotation blocks across all $d_k$ embedding channel pairs, the entire feature is rotated in block-diagonal fashion. For angular encoding, this rotation is parameterized by agent headings $\phi_i$:

$$\bar Q_i = f^\angle(Q_i, \phi_i), \qquad \bar K_j = f^\angle(K_j, \phi_j),$$

yielding the dot product

$$\langle \bar Q_i, \bar K_j\rangle = Q_i^\top\,\mathrm{BlockDiag}\big(R(\phi_j - \phi_i \bmod 2\pi,\, s)\big)\,K_j,$$

ensuring that only the periodic relative angle $\phi_j - \phi_i$ influences the interaction. For general 2D RoPE, spatial distances are encoded by composing independent 1D RoPEs or, under a Lie-algebraic (maximal toral subalgebra) construction, by exponentiating independent rotations on orthogonal planes for the $x$ and $y$ coordinates. Extensions via orthogonal basis changes $Q$ allow for coupled rotations and richer geometric structure (Zhao et al., 19 Mar 2025, Liu et al., 7 Apr 2025).
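
The relative-angle identity above can be checked numerically. The following is a minimal NumPy sketch, not the authors' implementation: the helper name `rotate_pairs`, the interleaved (even, odd) channel-pair layout, and the sample headings are all assumptions made for illustration, and $s$ is fixed to 1.

```python
import numpy as np

def rotate_pairs(x, angle, s=1.0):
    """Apply the same 2x2 block R(angle, s) to every channel pair of x.

    Assumption: x has shape (2 * d_k,) and channels are laid out as
    consecutive (even, odd) pairs.
    """
    pairs = x.reshape(-1, 2)                    # one row per channel pair
    c, sn = np.cos(angle), np.sin(angle)
    rot = s * np.array([[c, -sn], [sn, c]])     # R(angle, s)
    return (pairs @ rot.T).reshape(-1)          # rot @ v for each pair v

rng = np.random.default_rng(0)
q, k = rng.normal(size=8), rng.normal(size=8)   # d_k = 4 channel pairs
phi_i, phi_j = 0.3, 2.1                         # illustrative headings (radians)

# <Qbar_i, Kbar_j> computed from absolute headings ...
lhs = rotate_pairs(q, phi_i) @ rotate_pairs(k, phi_j)
# ... equals Q_i^T BlockDiag(R(phi_j - phi_i)) K_j, which uses only the difference.
rhs = q @ rotate_pairs(k, phi_j - phi_i)

# Rotating both agents by a common offset leaves the score unchanged.
offset = 1.234
shifted = rotate_pairs(q, phi_i + offset) @ rotate_pairs(k, phi_j + offset)
print(np.isclose(lhs, rhs), np.isclose(lhs, shifted))
```

With $s = 1$ the two sides agree exactly up to floating-point error; for other values of $s$ the left-hand side is additionally scaled by $s^2$, since each of the two rotations contributes one factor of $s$.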

2. Implementation and Computational Properties

The AS2DRoPE mechanism is implemented as follows. For $N$ entities with queries $Q \in \mathbb{R}^{N \times 2d_k}$, keys $K \in \mathbb{R}^{N \times 2d_k}$, and headings $\phi \in \mathbb{R}^N$:

# Rotate each query and key by its own agent heading φ[i]
for i in 1..N:
    barQ[i] = BlockDiag(R(φ[i], s), ..., R(φ[i], s)) @ Q[i]
    barK[i] = BlockDiag(R(φ[i], s), ..., R(φ[i], s)) @ K[i]
# Standard scaled dot-product attention on the rotated features
for i in 1..N:
    for j in 1..N:
        score[i,j] = (barQ[i] @ barK[j]) / sqrt(d_k)
    α[i] = softmax(score[i, 1..N])
    O[i] = sum_j α[i,j] * V[j]

For multi-head attention, heads can specialize (e.g., angle-encoding versus position-encoding) or split their subspaces accordingly. The operation is $O(HN^2 d_k)$ in time and $O(HN(2d_k+d_v))$ in space (where $H$ is the number of heads and $d_v$ the value dimension), matching vanilla Transformers and using far less memory than explicit RPE, whose pairwise embeddings scale as $O(N^2)$ (Zhao et al., 19 Mar 2025).
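
The pseudocode can be turned into a short vectorized routine. The sketch below is one possible single-head NumPy rendering under stated assumptions: the function name `as2drope_attention`, the interleaved channel-pair layout, and the numerically stabilized softmax are illustrative choices, not details confirmed by the cited paper.

```python
import numpy as np

def as2drope_attention(Q, K, V, phi, s=1.0):
    """Single-head attention with heading-rotated queries and keys.

    Q, K: (N, 2*d_k) float arrays; V: (N, d_v); phi: (N,) headings in radians.
    Assumption: channel pairs are the interleaved columns (0,1), (2,3), ...
    """
    d_k = Q.shape[1] // 2
    cos, sin = np.cos(phi)[:, None], np.sin(phi)[:, None]

    def rotate(X):
        # Apply R(phi[i], s) to every channel pair of row i, without
        # materializing the block-diagonal matrix.
        even, odd = X[:, 0::2], X[:, 1::2]
        out = np.empty_like(X)
        out[:, 0::2] = s * (cos * even - sin * odd)
        out[:, 1::2] = s * (sin * even + cos * odd)
        return out

    bar_q, bar_k = rotate(Q), rotate(K)
    scores = bar_q @ bar_k.T / np.sqrt(d_k)           # (N, N) pairwise scores
    scores -= scores.max(axis=1, keepdims=True)       # stabilize the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)     # row-wise softmax
    return weights @ V                                # (N, d_v) outputs

rng = np.random.default_rng(0)
N, d_k, d_v = 5, 4, 3
Q, K = rng.normal(size=(N, 2 * d_k)), rng.normal(size=(N, 2 * d_k))
V = rng.normal(size=(N, d_v))
phi = rng.uniform(-np.pi, np.pi, size=N)

out = as2drope_attention(Q, K, V, phi)
out_shifted = as2drope_attention(Q, K, V, phi + 0.7)  # rotate the whole scene
print(out.shape, np.allclose(out, out_shifted))
```

Only per-token cosines and sines are stored, so no $N \times N$ positional tensor is created beyond the attention scores that a vanilla Transformer already computes.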

3. Theoretical Properties and Generalization

AS2DRoPE is constructed to guarantee three critical properties:

  • Relativity: The attention between positions $x_1, x_2$ depends only on $x_2 - x_1$ via the construction $R_{x_1}^\top R_{x_2} = R_{x_2 - x_1}$, so scores are unchanged when all positions are translated, or all headings rotated, by a common offset (Liu et al., 7 Apr 2025).
  • Reversibility: The mapping from spatial position to rotation is (locally) injective. For the map $x \mapsto R_x$, injectivity modulo periodicity preserves information content.
  • Periodicity: By using uniform scalar rotations, the encoding naturally handles $2\pi$ angular wraparound, preserving the periodicity needed for orientation in physical domains.

Together, these properties allow the encoding to be applied at unseen positions (arbitrary $(x, y)$ in $\mathbb{R}^2$) and make attention matrices shift-invariant.
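
The three properties can be illustrated with plain $2\times 2$ rotations. This is a small numerical sketch; the helper `rot` and the sample values of the positions, frequency, and heading are assumptions chosen only for the demonstration. It also shows why a uniform (unit-frequency) rotation suits headings: a frequency-scaled rotation generally does not wrap around at $2\pi$.

```python
import numpy as np

def rot(angle):
    """Plain 2x2 rotation matrix R(angle)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s], [s, c]])

# Relativity: R_{x1}^T R_{x2} depends only on x2 - x1 (shown for one frequency).
x1, x2, theta = 1.7, -0.4, 0.35
relativity_ok = np.allclose(rot(x1 * theta).T @ rot(x2 * theta),
                            rot((x2 - x1) * theta))

# Reversibility: a heading in (-pi, pi] can be read back from its rotation.
phi = 0.9
recovered = np.arctan2(rot(phi)[1, 0], rot(phi)[0, 0])

# Periodicity: a unit-frequency rotation wraps at 2*pi ...
wrap_uniform = np.allclose(rot(phi + 2 * np.pi), rot(phi))
# ... while a rotation scaled by a non-integer frequency does not.
wrap_scaled = np.allclose(rot((phi + 2 * np.pi) * theta), rot(phi * theta))

print(relativity_ok, np.isclose(recovered, phi), wrap_uniform, wrap_scaled)
```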

Lie-algebraic formulations clarify that, for standard 2D RoPE, the rotations generated by $xB_1 + yB_2$, with $[B_1, B_2]=0$, span an abelian subalgebra, and optional orthogonal basis mixing enables richer, symmetric coupling of the $x$ and $y$ dimensions (Liu et al., 7 Apr 2025).
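
The abelian case can be made concrete with a small worked example. The derivation below assumes a single 4-channel block and unit frequencies, with $J$ the generator of planar rotations; these specific choices are illustrative and are not taken from the cited papers.

```latex
J = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}, \qquad
B_1 = \begin{bmatrix} J & 0 \\ 0 & 0 \end{bmatrix}, \qquad
B_2 = \begin{bmatrix} 0 & 0 \\ 0 & J \end{bmatrix}, \qquad
[B_1, B_2] = B_1 B_2 - B_2 B_1 = 0.

% Because the generators commute, the exponential factorizes:
R_{(x,y)} = \exp(xB_1 + yB_2) = \exp(xB_1)\,\exp(yB_2)
          = \begin{bmatrix} R(x) & 0 \\ 0 & R(y) \end{bmatrix},
\qquad R(\theta) = \exp(\theta J).

% The generators are skew-symmetric, so the transpose is the inverse,
% and relativity follows blockwise:
R_{(x_1,y_1)}^\top R_{(x_2,y_2)}
  = \exp\big((x_2 - x_1)B_1 + (y_2 - y_1)B_2\big)
  = R_{(x_2 - x_1,\; y_2 - y_1)}.
```

Conjugating by a fixed orthogonal matrix $Q$, that is, using $Q R_{(x,y)} Q^\top$, mixes the two planes while keeping the same relativity identity, because $Q^\top Q = I$ cancels in the product.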

4. Comparison to Other 2D Rotary Encoding Schemes

| Method | Rotational Structure | Learnable Coupling | Parameter Count Growth | Core Limitation |
|---|---|---|---|---|
| AS2DRoPE | Block-diagonal, fixed angle | No | $O(1)$ | Encodes only 2D angle, not distance |
| LieRE | Blockwise, learnable linear | Yes | $O(d)$ | Requires trig per-token (Ostmeier et al., 14 Jun 2024) |
| GeoPE | Quaternion-based, 3D coupled | Implied (quaternion axis) | $O(1)$ | Non-commutativity in 3D; extra overhead |

AS2DRoPE closely parallels the standard 2D RoPE approach, but with a focus on angular relations (critical for agent interactions). LieRE generalizes to arbitrary learnable projections from $\mathbb{R}^2$ into $\mathfrak{so}(d)$ for each embedding block, allowing for higher expressive capacity, demonstrated by empirical improvements in accuracy and sample/data efficiency (Ostmeier et al., 14 Jun 2024). GeoPE introduces rotational coupling of axes via quaternion interpolation and averaging in the $\mathfrak{so}(3)$ Lie algebra, enhancing preservation of 2D geometric structure, particularly for vision tasks (Yao et al., 4 Dec 2025).

5. Empirical Results and Benchmark Performance

In agent-centric, query-centric trajectory prediction for autonomous driving, AS2DRoPE achieves the leading minimum average displacement error (minADE) and competitive realism scores on the Waymo SimAgent leaderboard:

  • Trajectory Accuracy: DRoPE-Traj (AS2DRoPE) minADE = 1.2626 (best), UniMM = 1.2947.
  • Realism Score: DRoPE-Traj = 0.7625 (close to best 0.7702).
  • Memory/FLOPs: Unlike RPE, whose memory and FLOPs overhead grows as $O(N^2)$ in the number of agents and increases with the feature dimension $d$, AS2DRoPE matches scene-centric models in scaling, with 4–6$\times$ lower FLOPs than RPE at large dimensions (Zhao et al., 19 Mar 2025).
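
The scaling gap in the last bullet can be illustrated with simple element counts. The sketch below only evaluates the asymptotic expressions quoted earlier, with constants dropped; the function names, the assumption that explicit RPE stores one `d_rpe`-dimensional vector per ordered token pair, and the example sizes are all hypothetical, and it does not reproduce the reported 4–6$\times$ FLOPs figure.

```python
def rotary_space_bound(n_tokens, n_heads, d_k, d_v):
    """Evaluate the stated rotary bound H * N * (2*d_k + d_v), constants dropped."""
    return n_heads * n_tokens * (2 * d_k + d_v)

def pairwise_rpe_elements(n_tokens, d_rpe):
    """Assumed explicit-RPE storage: one d_rpe-dim embedding per ordered pair."""
    return n_tokens * n_tokens * d_rpe

# Hypothetical model sizes, chosen only to show the trend in N.
n_heads, d_k, d_v, d_rpe = 8, 32, 64, 64
for n_tokens in (32, 128, 512):
    rotary = rotary_space_bound(n_tokens, n_heads, d_k, d_v)
    rpe = pairwise_rpe_elements(n_tokens, d_rpe)
    # The ratio grows linearly in N: the pairwise term is quadratic,
    # while the rotary term is linear.
    print(n_tokens, rotary, rpe, rpe / rotary)
```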

In image classification and object detection benchmarks (ImageNet-1K, COCO), more expressive 2D rotary schemes (LieRE, GeoPE) further improve accuracy, data efficiency, and shape bias, demonstrating that 2D rotary position encodings are beneficial across a range of high-dimensional structured domains (Yao et al., 4 Dec 2025, Ostmeier et al., 14 Jun 2024).

6. Limitations, Extensions, and Future Directions

While AS2DRoPE is highly parameter- and compute-efficient, its key limitation is that it encodes only angular (directional) differences and not the full 2D displacement vector. To recover general 2D structure, it is standard to combine angle encodings (e.g., agent headings) with separate spatial RoPE or explicit distance encodings. Potential enhancements include:

  • Learning block-dependent scalars $s_\ell$ instead of a fixed uniform $s$.
  • Joint 2D polar encoding: representing $(\|\Delta p\|, \theta)$ jointly in a complex-valued embedding.
  • Extension to 3D agent orientation by block-diagonal $\mathrm{SO}(3)$ representations, as in generalizations to higher dimensions discussed in (Liu et al., 7 Apr 2025).
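
As an illustration of the first enhancement, the sketch below applies a separate scalar $s_\ell$ to each channel pair and checks that the score still depends only on the heading difference, with pair $\ell$ weighted by $s_\ell^2$. This is a hypothetical sketch of the proposed direction, not a published method; the helper name `rotate_pairs_scaled`, the fixed (not learned) scale values, and the interleaved pair layout are assumptions.

```python
import numpy as np

def rotate_pairs_scaled(x, angle, scales):
    """Rotate every channel pair of x by `angle`, scaling pair l by scales[l].

    Assumption: x has shape (2 * d_k,) with interleaved pairs; scales has
    shape (d_k,) and stands in for parameters that would be learned.
    """
    pairs = x.reshape(-1, 2)
    c, sn = np.cos(angle), np.sin(angle)
    rot = np.array([[c, -sn], [sn, c]])
    return ((pairs @ rot.T) * scales[:, None]).reshape(-1)

rng = np.random.default_rng(1)
q, k = rng.normal(size=8), rng.normal(size=8)
scales = np.array([0.5, 1.0, 2.0, 1.5])        # illustrative s_l values
phi_i, phi_j, offset = -0.8, 1.9, 0.6

score = rotate_pairs_scaled(q, phi_i, scales) @ rotate_pairs_scaled(k, phi_j, scales)
score_shifted = (rotate_pairs_scaled(q, phi_i + offset, scales)
                 @ rotate_pairs_scaled(k, phi_j + offset, scales))
# Equivalent relative form: each pair contributes s_l^2 * q_l^T R(phi_j - phi_i) k_l.
relative_form = q @ rotate_pairs_scaled(k, phi_j - phi_i, scales ** 2)

print(np.isclose(score, score_shifted), np.isclose(score, relative_form))
```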

Schemes such as LieRE and GeoPE introduce learnable or geometrically coupled bases, offering increased expressiveness to capture complex interactions among spatial dimensions (Ostmeier et al., 14 Jun 2024, Yao et al., 4 Dec 2025).

7. Significance and Broader Context

AS2DRoPE lets parameter- and memory-efficient Transformer architectures model relative spatial relationships, particularly angular differences, in settings where spatial invariance and locality are paramount (e.g., autonomous driving, multi-agent systems, images). Its Lie-theoretic foundation keeps the construction mathematically tractable and supports extrapolation to unseen positions, while reported benchmarks show leading trajectory-prediction accuracy for AS2DRoPE and accuracy gains for related rotary schemes on vision tasks. The continued development of richer, higher-dimensional rotary encodings and their integration with learnable geometric structures highlight the ongoing convergence between geometric deep learning and positional encoding research (Zhao et al., 19 Mar 2025, Liu et al., 7 Apr 2025, Yao et al., 4 Dec 2025, Ostmeier et al., 14 Jun 2024).
