Cylindrical Rotary Position Embedding (CyRoPE)
- CyRoPE is a position encoding method designed for cylindrical or annular data, separating linear (temporal) and circular (angular) components.
- It applies multiplicative block-diagonal rotations to embed both absolute and relative positions, enhancing transformer self-attention mechanisms.
- Empirical results in sEMG decoding show that CyRoPE improves performance metrics, confirming its effectiveness in handling spatial-temporal data.
Cylindrical Rotary Position Embedding (CyRoPE) generalizes rotary position encoding mechanisms for transformer architectures to domains in which the underlying data exhibits cylindrical or annular topology. CyRoPE encodes both linear and angular positional components via multiplicative block-diagonal rotations in the embedding space, enabling self-attention layers to natively incorporate relative and absolute position cues for data distributed over a 2D cylindrical manifold. The approach inherits key mathematical and algorithmic properties from Rotary Position Embedding (RoPE) and leverages domain-specific geometric insights for applications including surface EMG sensor arrays and spatial-temporal geotoken modeling.
1. Theoretical Motivation and Geometric Domain
CyRoPE is motivated by limitations observed when applying conventional 1D position encodings (such as absolute sinusoids or vanilla RoPE) to observations sampled on cylindrical domains. Typical instances include surface electromyography (sEMG) arrays, which collect multi-channel time series data from sensors arranged annularly around a forearm. Treating each channel as a flat sequence disregards the wrap-around adjacency of channel indices (e.g., channels $C-1$ and $0$ are physically adjacent), impairing transformer cross-channel interaction and missing muscle-synergy signatures dependent on angular proximity (Weng et al., 27 Dec 2025). Cylindrical geometry naturally separates position into linear (e.g., time or height) and circular (azimuthal angle $\theta$) components, mandating explicit encoding of both for effective attention.
2. Mathematical Formulation of CyRoPE
The formalism for CyRoPE extends block-diagonal rotary encodings to two orthogonal dimensions:
Let $\mathbf{z}_{c,t} \in \mathbb{R}^{d}$ be the token embedding for channel $c$ at temporal patch index $t$. The vector is partitioned into two halves:

$$\mathbf{z}_{c,t} = [\mathbf{z}^{(t)};\, \mathbf{z}^{(c)}], \qquad \mathbf{z}^{(t)}, \mathbf{z}^{(c)} \in \mathbb{R}^{d/2}.$$

Each half is further split into consecutive 2-dimensional "complex" coordinates $(x_{2j-1}, x_{2j})$, enabling planar rotary operations as in RoFormer (Su et al., 2021).
Temporal (Linear) Rotary Encoding:
- Base $\beta_t$ (e.g., $\beta_t = 10000$, as in standard RoPE)
- Frequency for block $j$: $\omega^{(t)}_j = \beta_t^{-2(j-1)/(d/2)}$, for $j = 1, \dots, d/4$
- Apply block-diagonal rotation: $\tilde{\mathbf{z}}^{(t)} = \mathrm{diag}\big(R(t\,\omega^{(t)}_1), \dots, R(t\,\omega^{(t)}_{d/4})\big)\,\mathbf{z}^{(t)}$,
where each $R(\phi) = \begin{pmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{pmatrix}$.
Spatial (Annular) Rotary Encoding:
- Channels $c = 0, \dots, C-1$ spaced around $2\pi$, fundamental angular separation $\Delta\theta = 2\pi/C$
- Set the largest frequency scale to $\Delta\theta$; thus, the highest-frequency block advances by exactly one channel spacing per channel index
- Frequency for block $j$: $\omega^{(c)}_j = \Delta\theta \cdot \beta_c^{-2(j-1)/(d/2)}$
- Apply rotation: $\tilde{\mathbf{z}}^{(c)} = \mathrm{diag}\big(R(c\,\omega^{(c)}_1), \dots, R(c\,\omega^{(c)}_{d/4})\big)\,\mathbf{z}^{(c)}$
Concatenate: $\tilde{\mathbf{z}}_{c,t} = [\tilde{\mathbf{z}}^{(t)};\, \tilde{\mathbf{z}}^{(c)}]$.
Both query and key vectors in attention undergo this rotation, so the resulting dot-product depends only on the relative time and relative angular offset (Weng et al., 27 Dec 2025).
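This relative-offset property can be checked numerically. The following NumPy sketch (helper names `rot_block` and `rotate` are illustrative, not from the paper) applies a generic RoPE-style block-diagonal rotation to a query and a key, and confirms that their dot product depends only on the positional offset:

```python
import numpy as np

def rot_block(phi):
    """2x2 planar rotation matrix R(phi)."""
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[c, -s], [s, c]])

def rotate(v, pos, freqs):
    """Rotate each consecutive 2D block of v by pos * freqs[j]."""
    out = np.empty_like(v)
    for j, w in enumerate(freqs):
        out[2*j:2*j+2] = rot_block(pos * w) @ v[2*j:2*j+2]
    return out

rng = np.random.default_rng(0)
d = 8
freqs = 10000.0 ** (-2 * np.arange(d // 2) / d)  # log-uniform RoPE frequencies

q, k = rng.standard_normal(d), rng.standard_normal(d)

# Attention score depends only on the relative offset (here 3 in both cases):
s1 = rotate(q, 5, freqs) @ rotate(k, 2, freqs)
s2 = rotate(q, 103, freqs) @ rotate(k, 100, freqs)
assert np.allclose(s1, s2)
```

The same argument applies blockwise to the angular half, so the full CyRoPE score depends only on $(t_q - t_k)$ and $(c_q - c_k)$.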
3. Integration with Transformer Architectures
CyRoPE encodings are realized as in-place rotations within the query and key projections during multi-head self-attention, paralleling RoPE implementations in RoFormer (Su et al., 2021). For each token, the $d$-dimensional embedding is factorized into time and channel halves, each rotated independently. This yields attention scores that depend exclusively on relative positions, both linear and angular, and exhibit the distance-decay properties characteristic of RoPE frameworks. The resulting positional bias integrates seamlessly with standard attention, enabling relative-position awareness across both temporal and spatial dimensions.
Pseudocode (as quoted in (Weng et al., 27 Dec 2025)):
```
for c in 0 … C-1:
    for t in 0 … L/P-1:
        p_ct = X[c, t*P:(t+1)*P]
        z_ct = f_cnn(p_ct)               # embed patch
        z_t, z_c = split(z_ct)           # d/2 each
        z_t_rot = rotate(z_t, t, β_t)
        z_c_rot = rotate(z_c, c, β_c)
        z_ct = concat(z_t_rot, z_c_rot)
        # pass z_ct to Transformer layers
```
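The pseudocode can be fleshed out into a runnable NumPy sketch. Here the CNN embedder is replaced by a simple linear map, and the annular frequency scaling (largest frequency tied to $2\pi/C$) is an illustrative assumption rather than the paper's exact configuration:

```python
import numpy as np

def rot_half(v, pos, base, period=None):
    """Rotate consecutive 2D blocks of v by pos * omega_j, with RoPE-style
    frequencies omega_j = base**(-2j/len(v)). If `period` is given, scale
    frequencies so the largest one equals 2*pi/period (annular case)."""
    d = len(v)
    omega = base ** (-2 * np.arange(d // 2) / d)
    if period is not None:
        omega = omega * (2 * np.pi / period)
    c, s = np.cos(pos * omega), np.sin(pos * omega)
    x, y = v[0::2], v[1::2]
    out = np.empty_like(v)
    out[0::2] = c * x - s * y
    out[1::2] = s * x + c * y
    return out

def cyrope_embed(X, P, d, embed_fn, beta=10000.0):
    """Embed each (channel, patch) of X with shape [C, L] and apply CyRoPE:
    temporal rotary on the first d/2 dims, annular rotary on the rest."""
    C, L = X.shape
    tokens = []
    for c in range(C):
        for t in range(L // P):
            z = embed_fn(X[c, t*P:(t+1)*P])            # stand-in for f_cnn
            z_t = rot_half(z[:d//2], t, beta)          # linear (time) half
            z_c = rot_half(z[d//2:], c, beta, period=C)  # angular half
            tokens.append(np.concatenate([z_t, z_c]))
    return np.stack(tokens)

# toy usage: 8 channels, 64 samples, patch length 16, d = 8
rng = np.random.default_rng(1)
W = rng.standard_normal((8, 16))                       # toy linear "embedder"
emb = cyrope_embed(rng.standard_normal((8, 64)), P=16, d=8,
                   embed_fn=lambda p: W @ p)
print(emb.shape)  # (32, 8): C * L/P tokens of dimension d
```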
4. Extension to Cylindrical/2D Manifolds: Connections and Generalizations
The CyRoPE mechanism operationalizes the principle that each coordinate axis can be encoded via an independent planar rotation, a method extendable to higher-dimensional manifolds. In the context of cylindrical coordinates $(\rho, \theta, z)$, rotary encoding applies planar rotation by $\theta$ (azimuth), scaling by $\rho$ in the radial direction, and translation or scaling along the axial coordinate $z$ (Unlu, 2023). A block-diagonal construction in $\mathbb{R}^d$, with $3 \times 3$ blocks acting on coordinate triplets $(x_1, x_2, x_3)$, allows the implementation of:

$$\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} \mapsto \begin{pmatrix} \rho\cos\theta & -\rho\sin\theta & 0 \\ \rho\sin\theta & \rho\cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} + \begin{pmatrix} 0 \\ 0 \\ z \end{pmatrix},$$

so the embedding after rotation matches the structure of Euclidean distance in cylindrical coordinates:

$$\|p_1 - p_2\|^2 = \rho_1^2 + \rho_2^2 - 2\rho_1\rho_2\cos(\theta_1 - \theta_2) + (z_1 - z_2)^2.$$
Relative-position attention thus reflects both angular and axial displacements (Unlu, 2023). Frequency scaling per block can be added for multi-scale sensitivity, paralleling standard RoPE frequency bands.
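As a quick sanity check, the squared Euclidean distance between two points given in cylindrical coordinates $(\rho, \theta, z)$ expands to $\rho_1^2 + \rho_2^2 - 2\rho_1\rho_2\cos(\theta_1 - \theta_2) + (z_1 - z_2)^2$; a short NumPy verification:

```python
import numpy as np

def cyl_to_cart(rho, theta, z):
    """Convert cylindrical coordinates to Cartesian (x, y, z)."""
    return np.array([rho * np.cos(theta), rho * np.sin(theta), z])

rho1, th1, z1 = 1.5, 0.7, 2.0
rho2, th2, z2 = 0.9, 2.4, -1.0

# squared distance computed in Cartesian coordinates ...
lhs = np.sum((cyl_to_cart(rho1, th1, z1) - cyl_to_cart(rho2, th2, z2)) ** 2)
# ... equals the cylindrical-coordinate expansion
rhs = rho1**2 + rho2**2 - 2 * rho1 * rho2 * np.cos(th1 - th2) + (z1 - z2)**2
assert np.isclose(lhs, rhs)
```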
5. Group-Theoretical Foundations and Generalization
The GRAPE framework (Zhang et al., 8 Dec 2025) formalizes multiplicative rotary embeddings as one-parameter subgroups in $\mathrm{GL}(d, \mathbb{R})$, generalizing position encoding via group actions. RoPE arises as the canonical commuting instance, with planes rotated independently at log-uniform frequencies:

$$R(p) = \exp\!\Big(p \sum_{j} \omega_j B_j\Big),$$

where each $B_j$ is a rank-2 skew generator on a coordinate plane.
Cylindrical position encoding extends this principle by pairing planar rotational generators with unipotent (additive) translation generators, realizing rigid motions on a cylinder via block-upper-triangular group actions:

$$G(\theta, z) = \begin{pmatrix} \cos\theta & -\sin\theta & 0 & 0 \\ \sin\theta & \cos\theta & 0 & 0 \\ 0 & 0 & 1 & z \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad G(\theta_1, z_1)\,G(\theta_2, z_2) = G(\theta_1 + \theta_2,\, z_1 + z_2).$$

Such structures encode both angular displacement and axial translation in attention, enabling exact relative composition and streaming cacheability (Zhang et al., 8 Dec 2025).
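The exact-relative-composition property can be demonstrated with a toy group element (the construction `G` below is an illustrative sketch, not GRAPE's exact parameterization): an $SO(2)$ rotation block paired with a unipotent translation block composes additively in both $\theta$ and $z$:

```python
import numpy as np

def G(theta, z):
    """Rigid cylinder motion: planar rotation by theta (angular part)
    combined with a unipotent block encoding axial translation z."""
    c, s = np.cos(theta), np.sin(theta)
    M = np.zeros((4, 4))
    M[:2, :2] = [[c, -s], [s, c]]        # SO(2) rotation block
    M[2:, 2:] = [[1.0, z], [0.0, 1.0]]   # unipotent (upper-triangular) block
    return M

# Exact relative composition: G(a, u) @ G(b, v) == G(a + b, u + v)
a, u, b, v = 0.3, 1.2, -1.1, 0.4
assert np.allclose(G(a, u) @ G(b, v), G(a + b, u + v))
```

This additive composition law is what permits streaming-friendly caching: a cached key rotated to position $(b, v)$ can be compared against any later query without re-encoding.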
6. Empirical Performance and Ablation Insights
In the context of sEMG movement decoding, CyRoPE was evaluated within the SPECTRE self-supervised learning framework (Weng et al., 27 Dec 2025). Empirical ablation reveals substantial performance improvement over absolute positional encoding, both with and without pre-training (Table 7):
| PE Type | Pre-train Target | Pre-trained? | MSE ↓ | MAE ↓ | R² ↑ |
|---|---|---|---|---|---|
| Absolute PE | STFT Clusters | No | 0.0219±0.0140 | 0.0835±0.0314 | 0.7091±0.1985 |
| Absolute PE | STFT Clusters | Yes | 0.0206±0.0123 | 0.0812±0.0293 | 0.7252±0.1804 |
| CyRoPE | Raw Clusters | Yes | 0.0189±0.0117 | 0.0777±0.0297 | 0.7469±0.1746 |
| CyRoPE | STFT Clusters | No | 0.0196±0.0110 | 0.0794±0.0276 | 0.7380±0.1636 |
| CyRoPE | STFT Clusters | Yes | 0.0184±0.0114 | 0.0770±0.0287 | 0.7547±0.1657 |
Replacing absolute 1D PE by CyRoPE alone (no pre-training) raises $R^2$ from $0.7091$ to $0.7380$ (+4.0%), and CyRoPE plus spectral (STFT) clustering during pre-training yields $R^2 = 0.7547$ versus $0.7252$ for absolute PE (+3.0%). These improvements were robust across subjects.
7. Practical Implementation Details and Limitations
CyRoPE encoding can be instantiated within transformers using standard programming frameworks (e.g., PyTorch, TensorFlow, JAX) by extending existing RoPE code to factorize embeddings along time and space, or azimuthal and axial, coordinates. The rotation cost remains $O(Nd)$ per layer for $N$ tokens of dimension $d$, minimal compared to dense attention operations. For sEMG, typical configurations involve an annular multi-channel array, fixed-length sample patches, a CNN patch embedder, and transformer stacks of $18$ layers (Weng et al., 27 Dec 2025).
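In practice the block rotations are usually implemented with complex multiplication rather than explicit $2 \times 2$ matrices. A minimal NumPy sketch of this standard trick (illustrative, not the paper's code) pairs adjacent dimensions into complex numbers and multiplies by $e^{i\,\mathrm{pos}\,\omega_j}$; the same helper would be applied separately to the time and channel halves:

```python
import numpy as np

def rope_complex(x, pos, freqs):
    """Rotary encoding via complex multiplication: treat dims (2j, 2j+1)
    as one complex number and multiply by exp(i * pos * freqs[j])."""
    z = x[..., 0::2] + 1j * x[..., 1::2]
    z = z * np.exp(1j * pos * freqs)
    out = np.empty_like(x)
    out[..., 0::2], out[..., 1::2] = z.real, z.imag
    return out

d = 8
freqs = 10000.0 ** (-2 * np.arange(d // 2) / d)
x = np.random.default_rng(2).standard_normal(d)

y = rope_complex(x, pos=3, freqs=freqs)
# rotations are norm-preserving, so the embedding magnitude is unchanged
assert np.isclose(np.linalg.norm(y), np.linalg.norm(x))
```

The complex formulation vectorizes over all blocks at once, which is why the $O(Nd)$ rotation cost is negligible next to the $O(N^2 d)$ attention matmul.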
No formal $p$-values were reported for the empirical gains, but the observed improvement exceeded the across-subject standard deviation in $R^2$, indicating robust benefits.
This suggests CyRoPE's geometric inductive bias can be foundational for further architectures handling data on cylindrical or periodic manifolds. Future extensions may incorporate learned frequency scaling, non-commuting group actions, or helical extensions as outlined under GRAPE’s general group representational lens (Zhang et al., 8 Dec 2025).