Cylindrical Rotary Position Embedding (CyRoPE)
- CyRoPE is a position encoding method designed for cylindrical or annular data, separating linear (temporal) and circular (angular) components.
- It applies multiplicative block-diagonal rotations to embed both absolute and relative positions, enhancing transformer self-attention mechanisms.
- Empirical results in sEMG decoding show that CyRoPE improves performance metrics, confirming its effectiveness in handling spatial-temporal data.
Cylindrical Rotary Position Embedding (CyRoPE) generalizes rotary position encoding mechanisms for transformer architectures to domains in which the underlying data exhibits cylindrical or annular topology. CyRoPE encodes both linear and angular positional components via multiplicative block-diagonal rotations in the embedding space, enabling self-attention layers to natively incorporate relative and absolute position cues for data distributed over a 2D cylindrical manifold. The approach inherits key mathematical and algorithmic properties from Rotary Position Embedding (RoPE) and leverages domain-specific geometric insights for applications including surface EMG sensor arrays and spatial-temporal geotoken modeling.
1. Theoretical Motivation and Geometric Domain
CyRoPE is motivated by limitations observed when applying conventional 1D position encodings (such as absolute sinusoids or vanilla RoPE) to observations sampled on cylindrical domains. Typical instances include surface electromyography (sEMG) arrays, which collect multi-channel time series data from sensors arranged annularly around a forearm. Treating each channel as a flat sequence disregards the wrap-around adjacency of channel indices (e.g., channels $C-1$ and $0$ are physically adjacent), impairing transformer cross-channel interaction and missing muscle-synergy signatures dependent on angular proximity (Weng et al., 27 Dec 2025). Cylindrical geometry naturally separates position into linear (e.g., time or height) and circular (azimuthal angle $\theta$) components, mandating explicit encoding of both for effective attention.
2. Mathematical Formulation of CyRoPE
The formalism for CyRoPE extends block-diagonal rotary encodings to two orthogonal dimensions:
Let $\mathbf{z}_{c,t} \in \mathbb{R}^{d}$ be the token embedding for channel $c$ at temporal patch index $t$. The vector is partitioned into two halves:

$$\mathbf{z}_{c,t} = [\mathbf{z}^{(t)};\, \mathbf{z}^{(c)}], \qquad \mathbf{z}^{(t)}, \mathbf{z}^{(c)} \in \mathbb{R}^{d/2}.$$

Each half is further split into consecutive 2-dimensional "complex" coordinates $(x_{2j-1}, x_{2j})$, enabling planar rotary operations as in RoFormer (Su et al., 2021).
Temporal (Linear) Rotary Encoding:
- Base $\beta_t$ (e.g., $\beta_t = 10000$, as in standard RoPE)
- Frequency for block $j$: $\omega^{(t)}_j = \beta_t^{-2(j-1)/(d/2)}$, for $j = 1, \dots, d/4$
- Apply block-diagonal rotation: $\tilde{\mathbf{z}}^{(t)} = \mathrm{diag}\big(R(t\,\omega^{(t)}_1), \dots, R(t\,\omega^{(t)}_{d/4})\big)\,\mathbf{z}^{(t)}$,
where each $R(\phi) = \begin{pmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{pmatrix}$.
Spatial (Annular) Rotary Encoding:
- Channels $c = 0, \dots, C-1$ spaced around $2\pi$, fundamental angular separation $\Delta\theta = 2\pi/C$
- Set the largest frequency scale to $\Delta\theta$; thus, the highest-frequency block advances by exactly one channel spacing per channel index
- Frequency for block $j$: $\omega^{(c)}_j = \Delta\theta \cdot \beta_c^{-2(j-1)/(d/2)}$
- Apply rotation: $\tilde{\mathbf{z}}^{(c)} = \mathrm{diag}\big(R(c\,\omega^{(c)}_1), \dots, R(c\,\omega^{(c)}_{d/4})\big)\,\mathbf{z}^{(c)}$
Concatenate: $\tilde{\mathbf{z}}_{c,t} = [\tilde{\mathbf{z}}^{(t)};\, \tilde{\mathbf{z}}^{(c)}]$.
Both query and key vectors in attention undergo this rotation, so the resulting dot-product depends only on the relative time and relative angular offset (Weng et al., 27 Dec 2025).
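This relative-offset property can be checked numerically. The following NumPy sketch (helper names `rot_block` and `rotate` are illustrative, not from the paper) applies a generic RoPE-style block-diagonal rotation to a query and a key, and confirms that their dot product depends only on the positional offset:

```python
import numpy as np

def rot_block(phi):
    """2x2 planar rotation matrix R(phi)."""
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[c, -s], [s, c]])

def rotate(v, pos, freqs):
    """Rotate each consecutive 2D block of v by pos * freqs[j]."""
    out = np.empty_like(v)
    for j, w in enumerate(freqs):
        out[2*j:2*j+2] = rot_block(pos * w) @ v[2*j:2*j+2]
    return out

rng = np.random.default_rng(0)
d = 8
freqs = 10000.0 ** (-2 * np.arange(d // 2) / d)  # log-uniform RoPE frequencies

q, k = rng.standard_normal(d), rng.standard_normal(d)

# Attention score depends only on the relative offset (here 3 in both cases):
s1 = rotate(q, 5, freqs) @ rotate(k, 2, freqs)
s2 = rotate(q, 103, freqs) @ rotate(k, 100, freqs)
assert np.allclose(s1, s2)
```

The same argument applies blockwise to the angular half, so the full CyRoPE score depends only on $(t_q - t_k)$ and $(c_q - c_k)$.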
3. Integration with Transformer Architectures
CyRoPE encodings are realized as in-place rotations within the query and key projections during multi-head self-attention, paralleling RoPE implementations in RoFormer (Su et al., 2021). For each token, the $d$-dimensional embedding is factorized into time and channel halves, each rotated independently. This yields attention scores that depend exclusively on relative positions, both linear and angular, and exhibit the distance-decay properties characteristic of RoPE frameworks. The resulting positional bias integrates seamlessly with standard attention, enabling relative-position awareness across both temporal and spatial dimensions.
Pseudocode (as quoted in (Weng et al., 27 Dec 2025)):
```
for c in 0 … C-1:
    for t in 0 … L/P-1:
        p_ct = X[c, t*P:(t+1)*P]
        z_ct = f_cnn(p_ct)               # embed patch
        z_t, z_c = split(z_ct)           # d/2 each
        z_t_rot = rotate(z_t, t, β_t)
        z_c_rot = rotate(z_c, c, β_c)
        z_ct = concat(z_t_rot, z_c_rot)
        # pass z_ct to Transformer layers
```
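The pseudocode can be fleshed out into a runnable NumPy sketch. Here the CNN embedder is replaced by a simple linear map, and the annular frequency scaling (largest frequency tied to $2\pi/C$) is an illustrative assumption rather than the paper's exact configuration:

```python
import numpy as np

def rot_half(v, pos, base, period=None):
    """Rotate consecutive 2D blocks of v by pos * omega_j, with RoPE-style
    frequencies omega_j = base**(-2j/len(v)). If `period` is given, scale
    frequencies so the largest one equals 2*pi/period (annular case)."""
    d = len(v)
    omega = base ** (-2 * np.arange(d // 2) / d)
    if period is not None:
        omega = omega * (2 * np.pi / period)
    c, s = np.cos(pos * omega), np.sin(pos * omega)
    x, y = v[0::2], v[1::2]
    out = np.empty_like(v)
    out[0::2] = c * x - s * y
    out[1::2] = s * x + c * y
    return out

def cyrope_embed(X, P, d, embed_fn, beta=10000.0):
    """Embed each (channel, patch) of X with shape [C, L] and apply CyRoPE:
    temporal rotary on the first d/2 dims, annular rotary on the rest."""
    C, L = X.shape
    tokens = []
    for c in range(C):
        for t in range(L // P):
            z = embed_fn(X[c, t*P:(t+1)*P])            # stand-in for f_cnn
            z_t = rot_half(z[:d//2], t, beta)          # linear (time) half
            z_c = rot_half(z[d//2:], c, beta, period=C)  # angular half
            tokens.append(np.concatenate([z_t, z_c]))
    return np.stack(tokens)

# toy usage: 8 channels, 64 samples, patch length 16, d = 8
rng = np.random.default_rng(1)
W = rng.standard_normal((8, 16))                       # toy linear "embedder"
emb = cyrope_embed(rng.standard_normal((8, 64)), P=16, d=8,
                   embed_fn=lambda p: W @ p)
print(emb.shape)  # (32, 8): C * L/P tokens of dimension d
```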
4. Extension to Cylindrical/2D Manifolds: Connections and Generalizations
The CyRoPE mechanism operationalizes the principle that each coordinate axis can be encoded via an independent planar rotation, a method extendable to higher-dimensional manifolds. In the context of cylindrical coordinates $(\rho, \theta, z)$, rotary encoding applies planar rotation by $\theta$ (azimuth), scaling by $\rho$ in the radial direction, and translation or scaling along the axial coordinate $z$ (Unlu, 2023). A block-diagonal construction in $\mathbb{R}^d$, with $3 \times 3$ blocks acting on coordinate triplets $(x_1, x_2, x_3)$, allows the implementation of:

$$\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} \mapsto \begin{pmatrix} \rho\cos\theta & -\rho\sin\theta & 0 \\ \rho\sin\theta & \rho\cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} + \begin{pmatrix} 0 \\ 0 \\ z \end{pmatrix},$$

so the embedding after rotation matches the structure of Euclidean distance in cylindrical coordinates:

$$\|p_1 - p_2\|^2 = \rho_1^2 + \rho_2^2 - 2\rho_1\rho_2\cos(\theta_1 - \theta_2) + (z_1 - z_2)^2.$$
Relative-position attention thus reflects both angular and axial displacements (Unlu, 2023). Frequency scaling per block can be added for multi-scale sensitivity, paralleling standard RoPE frequency bands.
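As a quick sanity check, the squared Euclidean distance between two points given in cylindrical coordinates $(\rho, \theta, z)$ expands to $\rho_1^2 + \rho_2^2 - 2\rho_1\rho_2\cos(\theta_1 - \theta_2) + (z_1 - z_2)^2$; a short NumPy verification:

```python
import numpy as np

def cyl_to_cart(rho, theta, z):
    """Convert cylindrical coordinates to Cartesian (x, y, z)."""
    return np.array([rho * np.cos(theta), rho * np.sin(theta), z])

rho1, th1, z1 = 1.5, 0.7, 2.0
rho2, th2, z2 = 0.9, 2.4, -1.0

# squared distance computed in Cartesian coordinates ...
lhs = np.sum((cyl_to_cart(rho1, th1, z1) - cyl_to_cart(rho2, th2, z2)) ** 2)
# ... equals the cylindrical-coordinate expansion
rhs = rho1**2 + rho2**2 - 2 * rho1 * rho2 * np.cos(th1 - th2) + (z1 - z2)**2
assert np.isclose(lhs, rhs)
```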
5. Group-Theoretical Foundations and Generalization
The GRAPE framework (Zhang et al., 8 Dec 2025) formalizes multiplicative rotary embeddings as one-parameter subgroups in $\mathrm{GL}(d, \mathbb{R})$, generalizing position encoding via group actions. RoPE arises as the canonical commuting instance, with planes rotated independently at log-uniform frequencies:

$$R(p) = \exp\!\Big(p \sum_{j} \omega_j B_j\Big),$$

where each $B_j$ is a rank-2 skew generator on a coordinate plane.
Cylindrical position encoding extends this principle by pairing planar rotational generators with unipotent (additive) translation generators, realizing rigid motions on a cylinder via block-upper-triangular group actions:

$$G(\theta, z) = \begin{pmatrix} \cos\theta & -\sin\theta & 0 & 0 \\ \sin\theta & \cos\theta & 0 & 0 \\ 0 & 0 & 1 & z \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad G(\theta_1, z_1)\,G(\theta_2, z_2) = G(\theta_1 + \theta_2,\, z_1 + z_2).$$

Such structures encode both angular displacement and axial translation in attention, enabling exact relative composition and streaming cacheability (Zhang et al., 8 Dec 2025).
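The exact-relative-composition property can be demonstrated with a toy group element (the construction `G` below is an illustrative sketch, not GRAPE's exact parameterization): an $SO(2)$ rotation block paired with a unipotent translation block composes additively in both $\theta$ and $z$:

```python
import numpy as np

def G(theta, z):
    """Rigid cylinder motion: planar rotation by theta (angular part)
    combined with a unipotent block encoding axial translation z."""
    c, s = np.cos(theta), np.sin(theta)
    M = np.zeros((4, 4))
    M[:2, :2] = [[c, -s], [s, c]]        # SO(2) rotation block
    M[2:, 2:] = [[1.0, z], [0.0, 1.0]]   # unipotent (upper-triangular) block
    return M

# Exact relative composition: G(a, u) @ G(b, v) == G(a + b, u + v)
a, u, b, v = 0.3, 1.2, -1.1, 0.4
assert np.allclose(G(a, u) @ G(b, v), G(a + b, u + v))
```

This additive composition law is what permits streaming-friendly caching: a cached key rotated to position $(b, v)$ can be compared against any later query without re-encoding.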
6. Empirical Performance and Ablation Insights
In the context of sEMG movement decoding, CyRoPE was evaluated within the SPECTRE self-supervised learning framework (Weng et al., 27 Dec 2025). Empirical ablation reveals substantial performance improvement over absolute positional encoding, both with and without pre-training (Table 7):
| PE Type | Pre-train Target | Pre-trained? | MSE ↓ | MAE ↓ | R² ↑ |
|---|---|---|---|---|---|
| Absolute PE | STFT Clusters | No | 0.0219±0.0140 | 0.0835±0.0314 | 0.7091±0.1985 |
| Absolute PE | STFT Clusters | Yes | 0.0206±0.0123 | 0.0812±0.0293 | 0.7252±0.1804 |
| CyRoPE | Raw Clusters | Yes | 0.0189±0.0117 | 0.0777±0.0297 | 0.7469±0.1746 |
| CyRoPE | STFT Clusters | No | 0.0196±0.0110 | 0.0794±0.0276 | 0.7380±0.1636 |
| CyRoPE | STFT Clusters | Yes | 0.0184±0.0114 | 0.0770±0.0287 | 0.7547±0.1657 |
Replacing absolute 1D PE by CyRoPE alone (no pre-training) raises $R^2$ from $0.7091$ to $0.7380$ (+4.0%), and CyRoPE plus spectral (STFT) clustering during pre-training yields $R^2 = 0.7547$ versus $0.7252$ for absolute PE (+3.0%). These improvements were robust across subjects.
7. Practical Implementation Details and Limitations
CyRoPE encoding can be instantiated within transformers using standard programming frameworks (e.g., PyTorch, TensorFlow, JAX) by extending existing RoPE code to factorize embeddings along time and space, or azimuthal and axial, coordinates. The rotation cost remains $O(Nd)$ per layer for $N$ tokens of dimension $d$, minimal compared to dense attention operations. For sEMG, typical configurations involve an annular multi-channel array, fixed-length sample patches, a CNN patch embedder, and transformer stacks of $18$ layers (Weng et al., 27 Dec 2025).
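In practice the block rotations are usually implemented with complex multiplication rather than explicit $2 \times 2$ matrices. A minimal NumPy sketch of this standard trick (illustrative, not the paper's code) pairs adjacent dimensions into complex numbers and multiplies by $e^{i\,\mathrm{pos}\,\omega_j}$; the same helper would be applied separately to the time and channel halves:

```python
import numpy as np

def rope_complex(x, pos, freqs):
    """Rotary encoding via complex multiplication: treat dims (2j, 2j+1)
    as one complex number and multiply by exp(i * pos * freqs[j])."""
    z = x[..., 0::2] + 1j * x[..., 1::2]
    z = z * np.exp(1j * pos * freqs)
    out = np.empty_like(x)
    out[..., 0::2], out[..., 1::2] = z.real, z.imag
    return out

d = 8
freqs = 10000.0 ** (-2 * np.arange(d // 2) / d)
x = np.random.default_rng(2).standard_normal(d)

y = rope_complex(x, pos=3, freqs=freqs)
# rotations are norm-preserving, so the embedding magnitude is unchanged
assert np.isclose(np.linalg.norm(y), np.linalg.norm(x))
```

The complex formulation vectorizes over all blocks at once, which is why the $O(Nd)$ rotation cost is negligible next to the $O(N^2 d)$ attention matmul.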
No formal $p$-values were reported for the empirical gains, but the observed improvement exceeded the across-subject standard deviation in $R^2$, indicating robust benefits.
This suggests CyRoPE's geometric inductive bias can be foundational for further architectures handling data on cylindrical or periodic manifolds. Future extensions may incorporate learned frequency scaling, non-commuting group actions, or helical extensions as outlined under GRAPE’s general group representational lens (Zhang et al., 8 Dec 2025).