2D Rotary Positional Encodings (AS2DRoPE)

Updated 28 December 2025

The paper introduces AS2DRoPE as an extension of 1D RoPE to efficiently encode 2D spatial relationships using block-diagonal rotations.
It employs parameter-efficient rotation matrices that preserve relative positional invariance, periodicity, and memory efficiency in Transformers.
Empirical benchmarks demonstrate that AS2DRoPE achieves lower FLOPs and state-of-the-art accuracy in trajectory prediction and vision tasks compared to traditional RPE methods.

Two-dimensional rotary positional encodings (AS2DRoPE) are a class of positional encoding strategies for Transformers that generalize the one-dimensional Rotary Position Embedding (RoPE) technique to structured, two-dimensional spatial domains. Originally motivated by the need for efficient agent interaction modeling in trajectory prediction, AS2DRoPE and its variants are mathematically grounded in the representation theory of the special orthogonal group and its Lie algebra, and are designed to preserve critical properties such as relative positional invariance, periodicity, and memory efficiency. Unlike explicit relative position embeddings, which incur quadratic memory overhead, these encodings apply parameter-efficient rotations to queries and keys, enabling space and time efficiency at scale while maintaining sensitivity to 2D geometric relationships.

1. Mathematical Formulation and Foundations

AS2DRoPE extends RoPE's formulation from sequences to 2D spatial settings, crucially allowing dot-product attention to depend only on relative spatial or angular displacement. Starting from one-dimensional RoPE, in which channel pairs are rotated by frequency-modulated phases (fixed or learnable), the key observation is that such rotations can be viewed as elements of $\mathrm{SO}(2)$ applied across embedding channel pairs. For scalar position $p$ , the $2\times 2$ rotation block is

$R(p\theta_\ell) = \begin{bmatrix} \cos(p\theta_\ell) & -\sin(p\theta_\ell) \\ \sin(p\theta_\ell) & \cos(p\theta_\ell) \end{bmatrix}.$
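
As a minimal sketch of why this block yields relative encoding, the snippet below rotates one query pair and one key pair and checks that the score depends only on the position offset. The helper names `rotate_pair` and `dot`, the frequency value, and the vectors are illustrative assumptions, not taken from the cited papers.

```python
import math

def rotate_pair(v, angle):
    """Apply the 2x2 rotation R(angle) to a channel pair v = (v0, v1)."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

theta = 0.1                      # illustrative frequency theta_l (assumed value)
q, k = (1.0, 2.0), (0.5, -1.5)   # made-up query and key channel pairs

# Two (query position, key position) pairs with the same offset of 4.
score_a = dot(rotate_pair(q, 3 * theta), rotate_pair(k, 7 * theta))
score_b = dot(rotate_pair(q, 10 * theta), rotate_pair(k, 14 * theta))

# Both equal q^T R(4 * theta) k, the score at relative offset 4.
reference = dot(q, rotate_pair(k, 4 * theta))
assert math.isclose(score_a, score_b) and math.isclose(score_a, reference)
```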

This extends to tokens in $\mathbb{R}^2$. If $p_i=(x_i,y_i)$ and $p_j=(x_j,y_j)$ are 2D coordinates, define the relative offset $\Delta p = p_j - p_i = (\Delta x,\,\Delta y)$ with angle $\theta_{ij} = \mathrm{atan2}(\Delta y, \Delta x)$. The core transformation in AS2DRoPE introduces a uniform identity scalar $s$ (typically 1) within each $2\times 2$ rotation:

$R(\theta,s) = s \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}.$

By tiling identical rotation blocks across all $d_k$ embedding channel pairs, the entire feature is rotated in block-diagonal fashion. For angular encoding, this rotation is parameterized by agent headings $\phi_i$:

$\bar Q_i = f^\angle(Q_i, \phi_i), \qquad \bar K_j = f^\angle(K_j, \phi_j),$

yielding the dot product

$\langle \bar Q_i, \bar K_j\rangle = Q_i^\top\,\mathrm{BlockDiag}(R(\phi_j - \phi_i \bmod 2\pi, s))\,K_j,$

ensuring that only the periodic relative angle $\phi_j - \phi_i$ influences the interaction (strictly, for $s \neq 1$ the product of the two scaled rotations carries an overall factor $s^2$ rather than $s$; the two coincide for the typical choice $s = 1$). For general 2D RoPE, spatial distances are encoded by composing independent 1D RoPEs or, under a Lie-algebraic (maximal toral subalgebra) construction, by exponentiating independent rotations on orthogonal planes for the $x$ and $y$ coordinates. Extensions via an orthogonal change of basis $Q$ (an orthogonal matrix, not the query matrix) allow for coupled rotations and richer geometric structure (Zhao et al., 19 Mar 2025, Liu et al., 7 Apr 2025).
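
The heading-based rotation can be illustrated with a short sketch. It assumes $s = 1$, uses made-up query, key, and heading values, and the helper names `rotate_blocks` and `score` are hypothetical. It checks that the score is unchanged when both headings shift by the same amount and when one heading wraps by $2\pi$.

```python
import math

def rotate_blocks(vec, angle, s=1.0):
    """Apply BlockDiag(R(angle, s), ..., R(angle, s)) to an even-length vector."""
    c, sn = math.cos(angle), math.sin(angle)
    out = []
    for a, b in zip(vec[0::2], vec[1::2]):
        out.extend((s * (c * a - sn * b), s * (sn * a + c * b)))
    return out

def score(q, k, phi_q, phi_k):
    """Dot product of the heading-rotated query and key."""
    q_bar, k_bar = rotate_blocks(q, phi_q), rotate_blocks(k, phi_k)
    return sum(x * y for x, y in zip(q_bar, k_bar))

q = [1.0, 2.0, -0.5, 0.3]    # d_k = 2 channel pairs (made-up values)
k = [0.7, -1.1, 0.4, 0.9]

base = score(q, k, 0.2, 1.0)                     # relative heading 0.8
shifted = score(q, k, 0.2 + 1.5, 1.0 + 1.5)      # same relative heading
wrapped = score(q, k, 0.2, 1.0 + 2 * math.pi)    # key heading wrapped by 2*pi
assert math.isclose(base, shifted) and math.isclose(base, wrapped)
```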

2. Implementation and Computational Properties

The AS2DRoPE mechanism is implemented as follows. For $N$ entities with queries $Q \in \mathbb{R}^{N \times 2d_k}$ , keys $K \in \mathbb{R}^{N \times 2d_k}$ , and headings $\phi \in \mathbb{R}^N$ :

for i in 1..N:
    barQ[i] = BlockDiag(R(φ[i], s), ..., R(φ[i], s)) @ Q[i]
    barK[i] = BlockDiag(R(φ[i], s), ..., R(φ[i], s)) @ K[i]
for i in 1..N:
    for j in 1..N:
        score[i,j] = (barQ[i] @ barK[j]) / sqrt(d_k)
    α[i] = softmax(score[i, 1..N])
    O[i] = sum_j α[i,j] * V[j]
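
A runnable single-head version of the pseudocode above might look like the following pure-Python sketch. The function names and toy inputs are assumptions for illustration, and the scaling by `sqrt(d_k)` simply follows the pseudocode. A practical implementation would use batched tensor operations instead of Python loops.

```python
import math

def rotate_blocks(vec, angle, s=1.0):
    """Apply BlockDiag(R(angle, s), ..., R(angle, s)) to an even-length vector."""
    c, sn = math.cos(angle), math.sin(angle)
    out = []
    for a, b in zip(vec[0::2], vec[1::2]):
        out.extend((s * (c * a - sn * b), s * (sn * a + c * b)))
    return out

def as2drope_attention(Q, K, V, phi, s=1.0):
    """Q, K: N rows of length 2*d_k; V: N rows of length d_v; phi: N headings."""
    d_k = len(Q[0]) // 2
    q_bar = [rotate_blocks(q, a, s) for q, a in zip(Q, phi)]
    k_bar = [rotate_blocks(k, a, s) for k, a in zip(K, phi)]
    outputs = []
    for q_i in q_bar:
        scores = [sum(x * y for x, y in zip(q_i, k_j)) / math.sqrt(d_k)
                  for k_j in k_bar]
        peak = max(scores)                      # subtract max for a stable softmax
        weights = [math.exp(sc - peak) for sc in scores]
        total = sum(weights)
        outputs.append([sum(w / total * v[c] for w, v in zip(weights, V))
                        for c in range(len(V[0]))])
    return outputs

# Toy inputs (made up): N = 3 entities, d_k = 1, d_v = 2.
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[0.5, 0.5], [1.0, -1.0], [0.0, 2.0]]
V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
phi = [0.0, 0.5, 1.0]

out = as2drope_attention(Q, K, V, phi)
# Adding the same offset to every heading leaves the output unchanged.
out_shift = as2drope_attention(Q, K, V, [a + 0.7 for a in phi])
```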

For multi-head attention, heads can specialize (e.g., angle-encoding versus position-encoding) or split their subspaces accordingly. The operation is $O(HN^2 d_k)$ in time and $O(HN(2d_k+d_v))$ in space (where $H$ is the number of heads and $d_v$ the value dimension), matching vanilla Transformers and vastly outperforming explicit RPE in memory, which scales as $O(N^2)$ (Zhao et al., 19 Mar 2025).
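
To make the memory gap concrete, the following back-of-the-envelope sketch counts stored elements under two assumptions of ours: the rotary variant keeps $HN(2d_k+d_v)$ activations as in the stated bound, and an explicit RPE stores one `d_rel`-dimensional embedding per ordered pair and head. The function names and sizes are hypothetical.

```python
def rotary_activation_count(H, N, d_k, d_v):
    """Element count matching the stated O(H N (2 d_k + d_v)) space bound."""
    return H * N * (2 * d_k + d_v)

def explicit_rpe_count(H, N, d_rel):
    """Assumed model: one d_rel-dimensional embedding per ordered pair (i, j) per head."""
    return H * N * N * d_rel

H, N, d_k, d_v, d_rel = 8, 128, 32, 64, 64     # illustrative sizes
rotary = rotary_activation_count(H, N, d_k, d_v)   # 131072
rpe = explicit_rpe_count(H, N, d_rel)              # 8388608
ratio = rpe / rotary                               # 64.0; doubles when N doubles
```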

3. Theoretical Properties and Generalization

AS2DRoPE is constructed to guarantee three critical properties:

Relativity: The attention between positions $x_1, x_2$ depends only on $x_2 - x_1$ via the construction $R_{x_1}^\top R_{x_2} = R_{x_2 - x_1}$, ensuring relative positional encoding under translation and rotation (Liu et al., 7 Apr 2025).
Reversibility: The mapping from spatial position to rotation is (locally) injective. For $x \mapsto R_x$ , injectivity modulo periodicity preserves information content.
Periodicity: By using uniform scalar rotations, the encoding naturally handles $2\pi$ angular wraparound, preserving the periodicity needed for orientation in physical domains.

Together, these properties support extrapolation to positions not seen in training (arbitrary $(x, y)$ in $\mathbb{R}^2$) and make attention scores invariant to a common shift of all positions.
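
For a single $2\times 2$ block, the three properties can be checked directly. The derivation below is a sketch in our own notation, writing $R_x = R(x\theta)$ for a fixed frequency $\theta \neq 0$ and using $R(a)^\top = R(-a)$ and $R(a)R(b) = R(a+b)$:

```latex
\begin{aligned}
R_{x_1}^\top R_{x_2} &= R(-x_1\theta)\,R(x_2\theta) = R\big((x_2 - x_1)\theta\big) = R_{x_2 - x_1}, \\
\langle R_{x_1} q,\; R_{x_2} k \rangle &= q^\top R_{x_1}^\top R_{x_2}\, k = q^\top R_{x_2 - x_1}\, k, \\
R_{x + 2\pi/\theta} &= R(x\theta + 2\pi) = R_x .
\end{aligned}
```

The first two lines give relativity. The third shows that $x \mapsto R_x$ is injective only within one period of length $2\pi/\theta$, which is the sense in which reversibility holds modulo periodicity.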

Lie-algebraic formulations clarify that, for standard 2D RoPE, the rotations generated by $(xB_1 + yB_2)$ , with $[B_1, B_2]=0$ , span an abelian subalgebra, and optional orthogonal basis mixing enables richer, symmetric coupling of $x$ and $y$ dimensions (Liu et al., 7 Apr 2025).
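
A minimal sketch of the commuting (axial) case, assuming a 4-channel feature where channels (0, 1) encode $x$ and channels (2, 3) encode $y$ with a shared frequency. The helper names, the frequency, and the vectors are illustrative assumptions. Because the two generators act on disjoint planes, the exponential factorizes into two independent rotations, and the score is invariant to translating both positions.

```python
import math

def rot(pair, angle):
    """Rotate a 2-element list by the given angle."""
    c, s = math.cos(angle), math.sin(angle)
    return [c * pair[0] - s * pair[1], s * pair[0] + c * pair[1]]

def rope_2d(vec, x, y, theta=0.3):
    """Rotate channels (0, 1) by x*theta and channels (2, 3) by y*theta.

    This corresponds to exp(x*B1 + y*B2) for commuting generators B1, B2
    acting on disjoint planes, so the two rotations apply independently.
    """
    return rot(vec[0:2], x * theta) + rot(vec[2:4], y * theta)

def score(q, k, p_q, p_k):
    q_bar, k_bar = rope_2d(q, *p_q), rope_2d(k, *p_k)
    return sum(a * b for a, b in zip(q_bar, k_bar))

q = [1.0, 2.0, -0.5, 0.3]    # made-up values
k = [0.7, -1.1, 0.4, 0.9]

base = score(q, k, (1.0, 2.0), (4.0, -1.0))
# Translate both positions by (10, -3): the relative offset is unchanged.
moved = score(q, k, (11.0, -1.0), (14.0, -4.0))
assert math.isclose(base, moved)
```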

4. Comparison to Other 2D Rotary Encoding Schemes

Method	Rotational Structure	Learnable Coupling	Parameter Count Growth	Core Limitation
AS2DRoPE	Block-diagonal, fixed angle	No	$O(1)$	Encodes only 2D angle, not distance
LieRE	Blockwise, learnable linear	Yes	$O(d)$	Requires trig per-token (Ostmeier et al., 14 Jun 2024)
GeoPE	Quaternion-based, 3D coupled	Implied (quaternion axis)	$O(1)$	Non-commutativity in 3D; extra overhead

AS2DRoPE closely parallels the standard 2D RoPE approach, but with a focus on angular relations (critical for agent interactions). LieRE generalizes to arbitrary learnable projections from $\mathbb{R}^2$ into $\mathfrak{so}(d)$ for each embedding block, allowing for higher expressive capacity, as reflected in empirical improvements in accuracy and data efficiency (Ostmeier et al., 14 Jun 2024). GeoPE introduces rotational coupling of axes via quaternion interpolation and averaging in the $\mathfrak{so}(3)$ Lie algebra, enhancing preservation of 2D geometric structure, particularly for vision tasks (Yao et al., 4 Dec 2025).

5. Empirical Results and Benchmark Performance

In agent-centric, query-centric trajectory prediction for autonomous driving, AS2DRoPE achieves the lowest reported minimum average displacement error (minADE) and competitive realism scores on the Waymo SimAgent leaderboard:

Trajectory Accuracy: DRoPE-Traj (AS2DRoPE) minADE = 1.2626 (best), UniMM = 1.2947.
Realism Score: DRoPE-Traj = 0.7625 (close to best 0.7702).
Memory/FLOPs: Unlike explicit RPE, whose memory and FLOPs grow as $O(N^2)$ in the number of entities $N$ and increase further with the embedding dimension $d$, AS2DRoPE matches scene-centric models in scaling, with 4–6$\times$ lower FLOPs than RPE at large dimensions (Zhao et al., 19 Mar 2025).
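
For readers unfamiliar with the metric, the sketch below computes minADE in a simplified single-agent, 2D setting: the average Euclidean error of each candidate trajectory against the ground truth, minimized over candidates. The trajectories are made up, and the leaderboard's exact evaluation protocol (number of rollouts, aggregation over agents) is not reproduced here.

```python
import math

def ade(pred, truth):
    """Average displacement error: mean Euclidean distance over time steps."""
    return sum(math.dist(p, t) for p, t in zip(pred, truth)) / len(truth)

def min_ade(candidates, truth):
    """Minimum ADE over a set of candidate trajectories."""
    return min(ade(c, truth) for c in candidates)

truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
candidates = [
    [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)],   # off by 1 at every step: ADE = 1.0
    [(0.0, 0.0), (1.0, 0.0), (2.0, 3.0)],   # only the last step is off by 3: ADE = 1.0
    [(0.0, 0.0), (1.0, 0.5), (2.0, 1.0)],   # errors 0, 0.5, 1.0: ADE = 0.5
]
result = min_ade(candidates, truth)   # 0.5
```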

In image classification and object detection benchmarks (ImageNet-1K, COCO), more expressive 2D rotary schemes (LieRE, GeoPE) further improve accuracy, data efficiency, and shape bias, demonstrating that 2D rotary position encodings are beneficial across a range of high-dimensional structured domains (Yao et al., 4 Dec 2025, Ostmeier et al., 14 Jun 2024).

6. Limitations, Extensions, and Future Directions

While AS2DRoPE is highly parameter- and compute-efficient, its key limitation is that it encodes only angular (directional) differences and not the full 2D displacement vector. To recover general 2D structure, it is standard to combine angle encodings (e.g., agent headings) with separate spatial RoPE or explicit distance encodings. Potential enhancements include:

Learning block-dependent scalars $s_\ell$ instead of a fixed uniform $s$ .
Joint 2D polar encoding: representing $(\|\Delta p\|, \theta)$ jointly in a complex-valued embedding.
Extension to 3D agent orientation by block-diagonal $\mathrm{SO}(3)$ representations, as in the higher-dimensional generalizations discussed by Liu et al. (7 Apr 2025).
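
As a sketch of what the first enhancement would change, under our reading that each channel pair $\ell$ receives its own scalar $s_\ell$, write $q_{i,\ell}$ and $k_{j,\ell}$ for the $\ell$-th channel pairs of $Q_i$ and $K_j$ (notation assumed here, not from the source). The rotated dot product then becomes a weighted sum over blocks:

```latex
\langle \bar Q_i, \bar K_j \rangle
  = \sum_{\ell=1}^{d_k} \big(s_\ell R(\phi_i)\, q_{i,\ell}\big)^\top \big(s_\ell R(\phi_j)\, k_{j,\ell}\big)
  = \sum_{\ell=1}^{d_k} s_\ell^{2}\; q_{i,\ell}^\top\, R(\phi_j - \phi_i)\, k_{j,\ell}.
```

Under this assumption the scalars only reweight channel pairs, so the score still depends on the headings solely through $\phi_j - \phi_i$.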

Schemes such as LieRE and GeoPE introduce learnable or geometrically coupled bases, offering increased expressiveness to capture complex interactions among spatial dimensions (Ostmeier et al., 14 Jun 2024, Yao et al., 4 Dec 2025).

7. Significance and Broader Context

AS2DRoPE enables parameter- and memory-efficient Transformer architectures to model relative spatial relationships, particularly angular differences, in settings where spatial invariance and locality are paramount (e.g., autonomous driving, multi-agent systems, images). Its Lie-theoretic foundation provides mathematical tractability and supports extrapolation, while the reported benchmarks show state-of-the-art results in agent-centric trajectory prediction and gains for related rotary schemes in vision tasks. The continued development of richer, higher-dimensional rotary encodings and their integration with learnable geometric structures highlights the ongoing convergence between geometric deep learning and positional encoding research (Zhao et al., 19 Mar 2025, Liu et al., 7 Apr 2025, Yao et al., 4 Dec 2025, Ostmeier et al., 14 Jun 2024).