Cylindrical Rotary Position Embedding (CyRoPE)

Updated 3 January 2026
  • CyRoPE is a position encoding method designed for cylindrical or annular data, separating linear (temporal) and circular (angular) components.
  • It applies multiplicative block-diagonal rotations to embed both absolute and relative positions, enhancing transformer self-attention mechanisms.
  • Empirical results in sEMG decoding show that CyRoPE improves performance metrics, confirming its effectiveness in handling spatial-temporal data.

Cylindrical Rotary Position Embedding (CyRoPE) generalizes rotary position encoding mechanisms for transformer architectures to domains in which the underlying data exhibits cylindrical or annular topology. CyRoPE encodes both linear and angular positional components via multiplicative block-diagonal rotations in the embedding space, enabling self-attention layers to natively incorporate relative and absolute position cues for data distributed over a 2D cylindrical manifold. The approach inherits key mathematical and algorithmic properties from Rotary Position Embedding (RoPE) and leverages domain-specific geometric insights for applications including surface EMG sensor arrays and spatial-temporal geotoken modeling.

1. Theoretical Motivation and Geometric Domain

CyRoPE is motivated by limitations observed when applying conventional 1D position encodings (such as absolute sinusoidal encodings or vanilla RoPE) to observations sampled on cylindrical domains. Typical instances include surface electromyography (sEMG) arrays, which collect multi-channel time series data from sensors arranged annularly around a forearm. Treating the channels as a flat sequence disregards the wrap-around adjacency of channel indices (e.g., $c=0$ and $c=C-1$ are physically adjacent), which impairs cross-channel interaction in the transformer and misses muscle synergy signatures that depend on angular proximity (Weng et al., 27 Dec 2025). Cylindrical geometry naturally separates position into linear (e.g., time or height, $z$) and circular (azimuthal, $\theta$) components, and effective attention requires encoding both explicitly.

2. Mathematical Formulation of CyRoPE

The formalism for CyRoPE extends block-diagonal rotary encodings to two orthogonal dimensions:

Let $z_{c,t}\in\mathbb{R}^d$ be the token embedding for channel $c$ at temporal patch index $t$. The vector is partitioned into two halves:

$$z_{c,t} = \left[z_t \;\|\; z_c\right],\quad z_t,z_c\in \mathbb{R}^{d/2}$$

Each half is further split into $d/4$ consecutive 2-dimensional "complex" coordinates, enabling planar rotary operations as in RoFormer (Su et al., 2021).

Temporal (Linear) Rotary Encoding:

  • Base $\beta_t=10^4$
  • Frequency for block $i$:

$$\theta_t^{(i)} = \frac{1}{\beta_t^{2i/(d/2)}}$$

  • Apply block-diagonal rotation:

$$R_{t} = \mathrm{diag}\left[R(t\theta_t^{(1)}), \ldots, R(t\theta_t^{(d/4)})\right]$$

where each $R(\alpha)=\left[\begin{smallmatrix}\cos\alpha & -\sin\alpha\\ \sin\alpha & \cos\alpha\end{smallmatrix}\right]$.
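The temporal rotation above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the helper names `block_frequencies` and `rotate_pairs`, the width `d = 16`, and the position `pos=5` are assumptions chosen for the example, and the block index runs from 1 to $d/4$ as in the formula.

```python
import numpy as np

def block_frequencies(half_dim, base):
    """theta^(i) = base ** (-2i / half_dim) for block index i = 1 .. half_dim / 2."""
    i = np.arange(1, half_dim // 2 + 1)
    return base ** (-2.0 * i / half_dim)

def rotate_pairs(x, pos, freqs):
    """Rotate each consecutive pair (x[2i], x[2i+1]) by the angle pos * freqs[i]."""
    angles = pos * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    out = np.empty_like(x)
    out[0::2] = cos * x[0::2] - sin * x[1::2]
    out[1::2] = sin * x[0::2] + cos * x[1::2]
    return out

d = 16                                          # full embedding width (illustrative)
z_t = np.arange(1.0, d // 2 + 1)                # temporal half with d/2 = 8 entries
theta_t = block_frequencies(d // 2, base=1e4)   # d/4 = 4 frequencies
z_rot = rotate_pairs(z_t, pos=5, freqs=theta_t)

print(theta_t)                                        # per-block frequencies
print(np.linalg.norm(z_t), np.linalg.norm(z_rot))     # rotation preserves the norm
```

Applying the rotation pair by pair avoids materializing the $d/2 \times d/2$ block-diagonal matrix, which is how RoPE-style encodings are usually computed in practice.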

Spatial (Annular) Rotary Encoding:

  • $C$ channels spaced around $2\pi$, with fundamental angular separation $\omega_0=2\pi/C$
  • Set the largest frequency scale to $\omega_0$; thus, $\beta_c=C/(2\pi)$
  • Frequency for block $i$:

$$\theta_c^{(i)} = \left(\frac{2\pi}{C}\right)^{2i/(d/2)}$$

  • Apply rotation:

$$R_c = \mathrm{diag}\left[R(c\theta_c^{(1)}), \ldots, R(c\theta_c^{(d/4)})\right]$$
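As a small numerical check of the annular frequencies, the sketch below evaluates $\theta_c^{(i)}$ for illustrative values $C = 12$ and $d = 16$ (both assumptions for the example). Under the formula as written, the block with $i = d/4$ has exponent 1 and therefore frequency exactly $\omega_0 = 2\pi/C$, so in that block the channel index $C$ lands back on channel 0 and channels $0$ and $C-1$ are one angular step apart. The other blocks are not exactly periodic in $C$ under this formula; how the source handles them is not stated here.

```python
import numpy as np

C = 12                      # channels around the ring (illustrative)
d = 16                      # full embedding width (illustrative)
half = d // 2               # width of the channel half z_c

i = np.arange(1, half // 2 + 1)                  # block index i = 1 .. d/4
theta_c = (2 * np.pi / C) ** (2.0 * i / half)    # annular frequencies
omega_0 = 2 * np.pi / C

def rotation(alpha):
    """2x2 planar rotation R(alpha)."""
    return np.array([[np.cos(alpha), -np.sin(alpha)],
                     [np.sin(alpha),  np.cos(alpha)]])

# Last block: rotating by channel index C is the same as rotating by index 0.
wraps = np.allclose(rotation(C * theta_c[-1]), rotation(0.0))

# Last block: channels 0 and C-1 differ by one step of omega_0 modulo 2*pi.
step = (0 * theta_c[-1] - (C - 1) * theta_c[-1]) % (2 * np.pi)

print(theta_c)
print(wraps, step, omega_0)
```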

Concatenate:

$$\mathrm{CyRoPE}(c,t) = [\,R_tz_t\;\|\; R_cz_c\,] \in \mathbb{R}^d$$

Both query and key vectors in attention undergo this rotation, so the resulting dot-product depends only on the relative time $(t_1-t_2)$ and relative angular offset $(c_1-c_2)$ (Weng et al., 27 Dec 2025).
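The relative-position property can be checked numerically. The sketch below is an illustration under stated assumptions, not the reference implementation: the function name `cyrope`, the random query and key vectors, and the sizes `d = 16`, `C = 12` are all chosen for the example. It rotates a query and a key at two different absolute positions that share the same offsets $(\Delta c, \Delta t) = (2, 5)$ and compares the resulting scores.

```python
import numpy as np

def rotate_pairs(x, pos, freqs):
    """Rotate each consecutive pair (x[2i], x[2i+1]) by the angle pos * freqs[i]."""
    angles = pos * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    out = np.empty_like(x)
    out[0::2] = cos * x[0::2] - sin * x[1::2]
    out[1::2] = sin * x[0::2] + cos * x[1::2]
    return out

def cyrope(z, c, t, num_channels, beta_t=1e4):
    """Rotate the first half of z by time index t and the second half by channel index c."""
    half = z.shape[0] // 2
    i = np.arange(1, half // 2 + 1)                         # block index 1 .. d/4
    theta_t = beta_t ** (-2.0 * i / half)
    theta_c = (2 * np.pi / num_channels) ** (2.0 * i / half)
    return np.concatenate([rotate_pairs(z[:half], t, theta_t),
                           rotate_pairs(z[half:], c, theta_c)])

rng = np.random.default_rng(0)
q, k = rng.normal(size=16), rng.normal(size=16)
C = 12

# Same offsets (delta_c = 2, delta_t = 5) at two different absolute positions.
score_a = cyrope(q, c=3, t=7, num_channels=C) @ cyrope(k, c=1, t=2, num_channels=C)
score_b = cyrope(q, c=8, t=25, num_channels=C) @ cyrope(k, c=6, t=20, num_channels=C)

print(score_a, score_b)   # the two scores agree up to floating-point error
```

The agreement follows from $R(\alpha)^\top R(\beta) = R(\beta-\alpha)$ holding independently in every 2-dimensional block.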

3. Integration with Transformer Architectures

CyRoPE encodings are realized as in-place rotations of the query and key projections during multi-head self-attention, paralleling RoPE implementations in RoFormer (Su et al., 2021). For each token, the $d$-dimensional embedding is factorized into time and channel halves, each rotated independently. The resulting attention scores depend on position only through relative offsets, both linear and angular, and exhibit the distance-decay behavior characteristic of RoPE. The positional bias thus enters attention directly through the rotated queries and keys, providing relative-position awareness across both temporal and spatial dimensions.

Pseudocode (as quoted in Weng et al., 27 Dec 2025):

for c in 0 ... C-1:
    for t in 0 ... L/P-1:
        p_ct = X[c, t*P:(t+1)*P]     # Patch of P samples from channel c
        z_ct = f_cnn(p_ct)           # Embed patch
        z_t, z_c = split(z_ct)       # d/2 each
        z_t_rot = rotate(z_t, t, β_t)
        z_c_rot = rotate(z_c, c, β_c)
        z_ct = concat(z_t_rot, z_c_rot)
        # pass z_ct to Transformer layers
This approach preserves the efficient $O(Nd)$ rotation cost of standard RoPE while adding positional expressivity across both dimensions.
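A runnable counterpart to the pseudocode might look like the sketch below. Several pieces are assumptions made only so the example executes: the signal `X` is synthetic noise, the sizes `C, L, P, d` are small illustrative values rather than the paper's configuration, and the random linear map `W` is a hypothetical stand-in for the CNN embedder `f_cnn`, whose architecture is not described here.

```python
import numpy as np

def rotate(x, pos, base):
    """Rotate consecutive pairs of x by pos * base ** (-2i / len(x)), i = 1 .. len(x) / 2."""
    i = np.arange(1, x.shape[0] // 2 + 1)
    angles = pos * base ** (-2.0 * i / x.shape[0])
    cos, sin = np.cos(angles), np.sin(angles)
    out = np.empty_like(x)
    out[0::2] = cos * x[0::2] - sin * x[1::2]
    out[1::2] = sin * x[0::2] + cos * x[1::2]
    return out

C, L, P, d = 4, 40, 10, 8             # small illustrative sizes (assumption)
rng = np.random.default_rng(0)
X = rng.normal(size=(C, L))           # synthetic multi-channel signal
W = rng.normal(size=(P, d))           # hypothetical linear stand-in for f_cnn

beta_t, beta_c = 1e4, C / (2 * np.pi)
tokens = np.empty((C, L // P, d))
for c in range(C):
    for t in range(L // P):
        patch = X[c, t * P:(t + 1) * P]        # P samples from channel c
        z = patch @ W                          # embed patch to d dimensions
        z_t, z_c = z[: d // 2], z[d // 2:]     # d/2 each
        tokens[c, t] = np.concatenate([rotate(z_t, t, beta_t),
                                       rotate(z_c, c, beta_c)])

print(tokens.shape)   # one rotated d-dimensional token per (channel, patch) pair
```

Each token is touched once with a constant amount of work per coordinate, which is where the $O(Nd)$ cost comes from for $N = C \cdot L/P$ tokens.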

4. Extension to Cylindrical/2D Manifolds: Connections and Generalizations

The CyRoPE mechanism operationalizes the principle that each coordinate axis can be encoded via an independent planar rotation, a method extendable to higher-dimensional manifolds. In the context of cylindrical coordinates $(r,\theta,z)$, rotary encoding applies planar rotation by $\theta$ (azimuth), scaling in the radial direction $r$, and translation or scaling along the axial $z$ (Unlu, 2023). A block-diagonal construction in $\mathbb{R}^d$, with $d/3$ blocks for triplets $(u,v,w)$, allows the implementation of:

$$R_{\text{cyl}}^{(j)}(r,\theta,z)=\begin{pmatrix} r\cos\theta & -r\sin\theta & 0 \\ r\sin\theta & r\cos\theta & 0 \\ 0 & 0 & z \end{pmatrix}$$

so the embedding after rotation matches the structure of Euclidean distance in cylindrical coordinates:

$$D^2 = r_1^2 + r_2^2 - 2r_1r_2\cos(\theta_1-\theta_2) + (z_1-z_2)^2$$

Relative-position attention thus reflects both angular and axial displacements (Unlu, 2023). Frequency scaling per block can be added for multi-scale sensitivity, paralleling standard RoPE frequency bands.
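To make the link between the $3\times 3$ block and the distance formula concrete, the sketch below applies $R_{\text{cyl}}$ to a fixed reference triplet and compares the squared distance between the two results with $D^2$. The reference triplet `e = (1, 0, 1)` and the two sample points are assumptions chosen for illustration; with that choice the block maps the triplet to the Cartesian point $(r\cos\theta, r\sin\theta, z)$.

```python
import numpy as np

def r_cyl(r, theta, z):
    """3x3 block: planar rotation by theta scaled by r, with axial scaling z."""
    return np.array([[r * np.cos(theta), -r * np.sin(theta), 0.0],
                     [r * np.sin(theta),  r * np.cos(theta), 0.0],
                     [0.0,                0.0,               z]])

def distance_sq(r1, th1, z1, r2, th2, z2):
    """Squared Euclidean distance expressed in cylindrical coordinates."""
    return r1**2 + r2**2 - 2 * r1 * r2 * np.cos(th1 - th2) + (z1 - z2)**2

e = np.array([1.0, 0.0, 1.0])            # assumed reference triplet (u, v, w)
p1 = (2.0, 0.3, -1.0)                    # (r, theta, z), illustrative
p2 = (1.5, 2.0, 0.5)

x1, x2 = r_cyl(*p1) @ e, r_cyl(*p2) @ e  # embedded points
d2_embedded = np.sum((x1 - x2) ** 2)
d2_formula = distance_sq(*p1, *p2)

print(d2_embedded, d2_formula)           # the two values agree
```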

5. Group-Theoretical Foundations and Generalization

The GRAPE framework (Zhang et al., 8 Dec 2025) formalizes multiplicative rotary embeddings as one-parameter subgroups in $\mathrm{SO}(d)$, generalizing position encoding via group actions. RoPE arises as the canonical commuting instance, with $d/2$ planes rotated independently at log-uniform frequencies:

$$G(n) = \exp(nL_{\mathrm{RoPE}}) = \prod_{i=1}^{d/2} \exp(n\theta_iL_i)$$

where each $L_i$ is a rank-2 skew generator on a coordinate plane.
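The one-parameter-subgroup view can be illustrated numerically. The sketch below builds a skew-symmetric generator, exponentiates it with a truncated Taylor series, and checks that the result equals the familiar block rotation and that $G(2)G(3) = G(5)$. The helper names, the width `d = 8`, and the frequency schedule `1e4 ** (-2i/d)` (the standard RoPE choice) are assumptions for the example; the series-based `expm_series` is a simple stand-in adequate only for the small matrix norms used here.

```python
import numpy as np

def rope_generator(freqs):
    """Skew-symmetric L_RoPE = sum_i theta_i * L_i, one 2x2 plane per frequency."""
    d = 2 * len(freqs)
    L = np.zeros((d, d))
    for i, theta in enumerate(freqs):
        L[2 * i, 2 * i + 1] = -theta
        L[2 * i + 1, 2 * i] = theta
    return L

def expm_series(A, terms=40):
    """Truncated Taylor series for exp(A); adequate for the small norms used here."""
    result = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k
        result = result + term
    return result

def block_rotation(freqs, n):
    """Closed form of G(n): planar rotation by n * theta_i in each coordinate plane."""
    d = 2 * len(freqs)
    G = np.zeros((d, d))
    for i, theta in enumerate(freqs):
        a = n * theta
        G[2 * i:2 * i + 2, 2 * i:2 * i + 2] = [[np.cos(a), -np.sin(a)],
                                               [np.sin(a),  np.cos(a)]]
    return G

d = 8
freqs = 1e4 ** (-2.0 * np.arange(d // 2) / d)    # log-uniform frequencies
L_rope = rope_generator(freqs)

G3 = expm_series(3 * L_rope)                      # exp(3 L) via the series
G5_composed = block_rotation(freqs, 2) @ block_rotation(freqs, 3)

print(np.allclose(G3, block_rotation(freqs, 3)))
print(np.allclose(G5_composed, block_rotation(freqs, 5)))
```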

Cylindrical position encoding extends this principle by pairing planar rotational generators with unipotent (additive) translation generators, realizing rigid motions on a cylinder via block-upper-triangular group actions:

$$\hat{G}(m) = \begin{pmatrix}\exp(mL) & m\omega u \\ 0 & 1\end{pmatrix}$$

Such structures encode both angular displacement and axial translation in attention, enabling exact relative composition and streaming cacheability (Zhang et al., 8 Dec 2025).
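One concrete way to read the block-upper-triangular form is as a screw motion in homogeneous coordinates. The sketch below is an interpretation under explicit assumptions, not the construction from the cited work: it takes $L$ to generate rotation about the $z$ axis in $\mathbb{R}^3$ and takes $u$ to be the unit vector along that axis, so that the rotation leaves $u$ fixed. With that choice the matrices compose additively in $m$, which is the "exact relative composition" property. The function name `g_hat` and the numeric values are hypothetical.

```python
import numpy as np

def g_hat(m, angle_per_step, omega, u):
    """4x4 homogeneous matrix [[exp(mL), m*omega*u], [0, 1]] with L rotating about z."""
    a = m * angle_per_step
    G = np.eye(4)
    G[:3, :3] = [[np.cos(a), -np.sin(a), 0.0],
                 [np.sin(a),  np.cos(a), 0.0],
                 [0.0,        0.0,       1.0]]
    G[:3, 3] = m * omega * u
    return G

u = np.array([0.0, 0.0, 1.0])          # axial direction, fixed by the rotation (assumption)
step, omega = 2 * np.pi / 12, 0.25     # illustrative angular and axial step sizes

composed = g_hat(2, step, omega, u) @ g_hat(5, step, omega, u)
direct = g_hat(7, step, omega, u)

# Relative transform between positions 2 and 7 depends only on the offset 5.
relative = np.linalg.inv(g_hat(2, step, omega, u)) @ g_hat(7, step, omega, u)

print(np.allclose(composed, direct))
print(np.allclose(relative, g_hat(5, step, omega, u)))
```

If $u$ were not fixed by the rotation, the translation column of the product would pick up an extra rotated term and the additive composition would no longer hold in this simple form.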

6. Empirical Performance and Ablation Insights

In the context of sEMG movement decoding, CyRoPE was evaluated within the SPECTRE self-supervised learning framework (Weng et al., 27 Dec 2025). Empirical ablation reveals substantial performance improvement over absolute positional encoding, both with and without pre-training (Table 7):

| PE Type | Pre-train Target | Pre-trained? | MSE | MAE | R² ↑ |
|---|---|---|---|---|---|
| Absolute PE | STFT Clusters | No | 0.0219±0.0140 | 0.0835±0.0314 | 0.7091±0.1985 |
| Absolute PE | STFT Clusters | Yes | 0.0206±0.0123 | 0.0812±0.0293 | 0.7252±0.1804 |
| CyRoPE | Raw Clusters | Yes | 0.0189±0.0117 | 0.0777±0.0297 | 0.7469±0.1746 |
| CyRoPE | STFT Clusters | No | 0.0196±0.0110 | 0.0794±0.0276 | 0.7380±0.1636 |
| CyRoPE | STFT Clusters | Yes | 0.0184±0.0114 | 0.0770±0.0287 | 0.7547±0.1657 |

Replacing absolute 1D PE with CyRoPE alone (no pre-training) raises $R^2$ from $0.7091$ to $0.7380$ (+0.0289 absolute, about 4.1% relative), and CyRoPE combined with spectral (STFT) clustering during pre-training yields $R^2=0.7547$ versus $0.7252$ for absolute PE (+0.0295 absolute, about 4.1% relative). The source describes these improvements as robust across subjects.
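Because absolute and relative gains are easy to conflate, the short sketch below recomputes both from the mean $R^2$ values in Table 7. Only the four means are taken from the table; the dictionary layout and variable names are illustrative.

```python
# Mean R^2 values copied from Table 7 (STFT Clusters rows).
r2 = {
    ("Absolute PE", "no pre-train"): 0.7091,
    ("CyRoPE", "no pre-train"): 0.7380,
    ("Absolute PE", "pre-trained"): 0.7252,
    ("CyRoPE", "pre-trained"): 0.7547,
}

gains = {}
for setting in ("no pre-train", "pre-trained"):
    base = r2[("Absolute PE", setting)]
    new = r2[("CyRoPE", setting)]
    absolute = new - base                    # difference in R^2 units
    relative = 100.0 * absolute / base       # percent of the baseline R^2
    gains[setting] = (absolute, relative)
    print(f"{setting}: +{absolute:.4f} absolute, {relative:.1f}% relative")
```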

7. Practical Implementation Details and Limitations

CyRoPE encoding can be instantiated within transformers using standard programming frameworks (e.g., PyTorch, TensorFlow, JAX) by extending existing RoPE code to factorize embeddings along time and space, or along azimuthal and axial coordinates. The rotation cost remains $O(Nd)$ per layer, which is small compared to the $O(N^2d)$ cost of dense attention. For sEMG, typical configurations involve $C=12$ channels, patches of $P=100$ samples, a CNN embedder producing $d=256$-dimensional tokens, and transformer stacks of 18 layers (Weng et al., 27 Dec 2025).

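No formal $p$-values were reported for the empirical gains. The observed improvement in $R^2$ (roughly 0.03) is smaller than the across-subject standard deviation of about $\pm0.16$ to $\pm0.20$ reported in Table 7, so the results indicate a consistent trend in favor of CyRoPE across all reported configurations rather than a statistically established effect.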

This suggests CyRoPE's geometric inductive bias can be foundational for further architectures handling data on cylindrical or periodic manifolds. Future extensions may incorporate learned frequency scaling, non-commuting group actions, or helical extensions as outlined under GRAPE’s general group representational lens (Zhang et al., 8 Dec 2025).
