- The paper introduces RoTE, which decomposes timestamps into year, month, and day components and incorporates them using a rotary embedding approach for enhanced temporal modeling.
- The method achieves significant performance gains, such as a +20.11% improvement in NDCG@5 and +17.51% in Recall@5, while adding minimal computational overhead.
- RoTE's plug-and-play design integrates directly with Transformer-based models, offering robust improvements in capturing long- and short-term user preferences.
Coarse-to-Fine Multi-Level Rotary Time Embedding for Sequential Recommendation
Introduction
The paper "RoTE: Coarse-to-Fine Multi-Level Rotary Time Embedding for Sequential Recommendation" (2604.13389) addresses fundamental limitations in temporal modeling within sequential recommender system frameworks, particularly those based on Transformer architectures. Standard sequential recommendation approaches rely heavily on order-preserving positional embeddings, which neglect the non-uniform and heterogeneous temporal intervals between user interactions. This reduces temporal dynamics to a coarse, order-only signal and inhibits the model's capacity to capture nuanced user preference evolution across multi-scale temporal gaps.
The RoTE module introduces a hierarchical approach to timestamp encoding, decomposing interaction time into year, month, and day components, and leveraging these granularities in a rotary embedding framework. This method enables attention mechanisms to perceive and exploit temporal distances and multi-level temporal patterns without requiring architectural changes to backbone models.
Methodology
Problem Definition
Sequential recommendation is formalized as $p(i_{L_u} \mid S_u)$, where $S_u$ denotes a user's chronologically ordered interaction history with items and corresponding timestamps. Accurate modeling requires capturing dependencies and evolution within such sequences, taking into account irregular inter-event intervals.
Temporal Feature Construction
RoTE decomposes Unix timestamps for each interaction into three ordinal calendar components: years ($y_k$), months ($m_k$), and days ($d_k$) since the epoch. This structured triplet preserves chronological order, aligns more closely with natural human temporal reasoning, and creates a foundation for multi-scale temporal modeling.
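The decomposition described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `decompose_timestamp` is hypothetical, timestamps are assumed to be in seconds and interpreted in UTC, and "months since the epoch" is assumed to be a cumulative count rather than a month-of-year index.

```python
from datetime import datetime, timezone

# Unix epoch as a timezone-aware reference point.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def decompose_timestamp(ts: int) -> tuple[int, int, int]:
    """Map a Unix timestamp (seconds) to (years, months, days) since the epoch.

    Assumption: all three components are cumulative ordinals, so each one
    on its own preserves chronological order.
    """
    dt = datetime.fromtimestamp(ts, tz=timezone.utc)
    years = dt.year - EPOCH.year
    months = years * 12 + (dt.month - 1)
    days = (dt - EPOCH).days
    return years, months, days

# 1_000_000_000 seconds falls on 2001-09-09 (UTC).
print(decompose_timestamp(1_000_000_000))  # (31, 380, 11574)
```

Because every component grows monotonically with time, two interactions one day apart differ only at the day level, while interactions years apart differ at all three levels.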
Rotary Time Embedding (RoTE)
RoTE injects temporal signals into the query and key vectors of the Transformer's multi-head self-attention mechanism by applying rotary transformations at each temporal granularity. Specifically:
- For each temporal level $l \in \{y, m, d\}$, rotation angles $\theta_k^{(l)}$ are computed using a fixed inverse frequency spectrum controlled by level-specific base scalars.
- Query and key vectors undergo 2D rotational transformations over each even-odd dimension pair, encoding time-awareness directly in the angular relationships used for computing attention scores.
- Three temporal representations per interaction are obtained (year, month, day) and combined via a weighted fusion with weights $\alpha_y$, $\alpha_m$, and $\alpha_d$, so that the coarser levels emphasize long-term preferences and the finer levels emphasize short-term dynamics.
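To see why rotating queries and keys makes attention sensitive to temporal distance, consider a single even-odd dimension pair at a single level $l$. The worked identity below assumes the standard rotary form in which the angle is the ordinal time value times a fixed frequency; the symbols $t_a^{(l)}$, $\omega^{(l)}$, and $b_l$ are notation introduced here for illustration and may differ from the paper's.

```latex
% 2D rotation applied to one (even, odd) pair
R(\theta) =
\begin{pmatrix}
\cos\theta & -\sin\theta \\
\sin\theta & \cos\theta
\end{pmatrix},
\qquad
\theta_a^{(l)} = t_a^{(l)} \, \omega^{(l)},
\qquad
\omega^{(l)} = b_l^{-2i/d}

% Inner product of a rotated query (time a) and a rotated key (time b)
\bigl(R(\theta_a^{(l)})\, q\bigr)^{\top} \bigl(R(\theta_b^{(l)})\, k\bigr)
= q^{\top} R\bigl(\theta_b^{(l)} - \theta_a^{(l)}\bigr)\, k
= q^{\top} R\bigl((t_b^{(l)} - t_a^{(l)})\, \omega^{(l)}\bigr)\, k
```

The second line uses $R(\theta)^{\top} R(\phi) = R(\phi - \theta)$, so within one level the attention score depends only on the time gap $t_b^{(l)} - t_a^{(l)}$ and not on the absolute timestamps.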
RoTE does not modify the value representations, the core attention formulation, or training objectives; it is entirely plug-and-play with negligible computational overhead.
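The mechanism described in this section can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the paper's implementation: the base scalars in `BASES`, the fusion weights in `ALPHAS`, the RoPE-style spectrum `base**(-2i/dim)`, and the choice to fuse the rotated vectors (rather than the per-level attention scores) are all assumptions, and every function name is hypothetical.

```python
import math

# Hypothetical level-specific bases and fusion weights (not taken from the paper).
BASES = {"year": 100.0, "month": 1_000.0, "day": 10_000.0}
ALPHAS = {"year": 0.2, "month": 0.3, "day": 0.5}

def rotation_angles(t, dim, base):
    """Angle t * base**(-2i/dim) for each even-odd pair i (RoPE-style spectrum)."""
    return [t * base ** (-2.0 * i / dim) for i in range(dim // 2)]

def rotate(vec, angles):
    """Apply a 2D rotation to each (even, odd) dimension pair of vec."""
    out = []
    for i, theta in enumerate(angles):
        x, y = vec[2 * i], vec[2 * i + 1]
        c, s = math.cos(theta), math.sin(theta)
        out.extend([x * c - y * s, x * s + y * c])
    return out

def rote_embed(vec, times):
    """Fuse the year-, month-, and day-rotated copies of a query or key vector.

    `times` maps each level name to its ordinal value (years, months, or
    days since the epoch). `vec` is assumed to have even length.
    """
    fused = [0.0] * len(vec)
    for level, alpha in ALPHAS.items():
        rotated = rotate(vec, rotation_angles(times[level], len(vec), BASES[level]))
        fused = [f + alpha * r for f, r in zip(fused, rotated)]
    return fused

def attention_logit(q, k, times_q, times_k):
    """Scaled dot product of time-rotated query and key; values are untouched."""
    q_t, k_t = rote_embed(q, times_q), rote_embed(k, times_k)
    return sum(a * b for a, b in zip(q_t, k_t)) / math.sqrt(len(q))
```

Only the query and key pass through `rote_embed`; the value vectors and the softmax over logits stay exactly as in the backbone, which is what makes the module a drop-in replacement for a positional embedding table.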
Experimental Evaluation
Datasets and Baselines
Experiments are conducted on three Amazon Reviews datasets (Sports and Outdoors, Beauty, and Toys and Games), using 5-core splits and fixed-sequence-length preprocessing. Baselines span both traditional (GRU4Rec, Caser, SASRec, BERT4Rec) and generative (VQRec, TIGER, HSTU, RPG) sequential recommendation paradigms. RoTE modules are integrated into SASRec (traditional) and RPG (generative) as representative backbones.
Numerical Results
RoTE yields consistent enhancements across all metrics and models. Notably:
- On Toys and Games, RoTE improves RPG's NDCG@5 by +20.11%, and Recall@5 by +17.51%, demonstrating strong performance advantages.
- Improvements are statistically significant under paired t-tests.
Ablation studies show incremental gains when introducing structured calendar components (year, month, day) compared to pure timestamp rotary encoding or positional-only baselines. The full year-month-day combination (Y+M+D) delivers the largest benefit.
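For readers less familiar with the reported metrics, the sketch below shows how Recall@5 and NDCG@5 are commonly computed when each test case has a single held-out target item, and how a relative gain is derived from two metric values. The single-target protocol and the reading of the reported percentages as relative gains are assumptions here, the function names are hypothetical, and the numbers in the usage lines are made up for illustration.

```python
import math

def recall_at_k(ranked_items, target, k=5):
    """1.0 if the held-out target appears in the top-k list, else 0.0."""
    return 1.0 if target in ranked_items[:k] else 0.0

def ndcg_at_k(ranked_items, target, k=5):
    """NDCG@k with one relevant item: 1 / log2(rank + 1), or 0.0 if missed."""
    top_k = ranked_items[:k]
    if target not in top_k:
        return 0.0
    rank = top_k.index(target) + 1  # 1-based position in the ranking
    return 1.0 / math.log2(rank + 1)

def relative_gain(baseline, improved):
    """Percentage improvement of `improved` over `baseline`."""
    return 100.0 * (improved - baseline) / baseline

# Illustrative ranking: the target "b" sits at position 2.
ranking = ["a", "b", "c", "d", "e", "f"]
print(recall_at_k(ranking, "b"))         # 1.0
print(round(ndcg_at_k(ranking, "b"), 4)) # 0.6309
```

Averaging these per-user values over the test set gives the dataset-level scores, and `relative_gain` applied to a backbone's score with and without the temporal module gives percentages of the kind quoted above.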
Efficiency Analysis
RoTE introduces only marginal increases in FLOPs (e.g., SASRec: +110K) and inference latency (SASRec: +0.7ms; RPG: +1.9ms), with a slight reduction in parameter count due to the elimination of the positional embedding table. These results validate the practical applicability of RoTE for real-time recommender scenarios.
Implications and Future Directions
Theoretical implications of RoTE suggest that multi-level temporal encoding strengthens the inductive bias within the attention mechanism for temporal reasoning, allowing for more accurate modeling of preference drift and multi-scale dynamics. Practically, RoTE's plug-and-play design offers immediate integration potential for a wide range of sequential models, facilitating performance upgrades with minimal system cost.
Future research may investigate:
- Expanding the granularity spectrum (e.g., week, hour, minute) and dynamically learning optimal temporal fusion weights.
- Extending RoTE to session-based, group, or multi-modal recommendation contexts.
- Incorporating more advanced temporal regularization strategies or higher-order temporal interactions within rotary embeddings.
Conclusion
RoTE presents a practical, principled solution to limitations in temporal modeling for sequential recommendation. By decomposing timestamps into hierarchical components and injecting them via multi-level rotary embeddings in Transformers, RoTE enables superior temporal sensitivity and achieves robust performance gains across diverse baselines and datasets. Its lightweight and generic design positions it as a compelling temporal modeling module for future sequential recommendation research and production systems.