SynerGen: Unified Generative Recommender
- The paper introduces a unified decoder-only Transformer that performs both personalized search and recommendation by fusing semantic, collaborative, and temporal signals.
- It employs a joint optimization framework with InfoNCE loss for retrieval and a hybrid ranking loss for click prediction, reducing traditional pipeline overhead.
- A novel time-aware rotary positional embedding models continuous temporal dynamics, enhancing user personalization and scalability in industrial deployments.
SynerGen is a contextualized generative recommender system whose primary contribution is to unify personalized search and recommendation under a single decoder-only Transformer backbone, obviating the traditional retrieve-then-rank split and leveraging joint optimization over behavioral sequences for both retrieval and ranking (Gao et al., 26 Sep 2025). It advances recommender system architecture by fusing semantic, collaborative, and temporal signals, enabling simultaneous, high-performing search and recommendation in large-scale domains.
1. Unified Generative Architecture
SynerGen’s foundational design is a decoder-only Transformer applied to behavioral event sequences, with each sequence comprising items, actions, timestamps, and queries (if available). This architecture yields a unified treatment of search and recommendation, allowing both to be performed by the same model and with shared optimization objectives. The system utilizes four token types:
- Context token: for query-free recommendation (feed).
- Retrieval token: for query-aware search, using a masked item embedding to enforce prediction based on intent rather than identity.
- Ranking token: for explicit candidate evaluation and click probability estimation, using item identity, query, and action.
- Auxiliary task tokens: for internal signaling (modulating whether the model operates in context modeling, relevance estimation, or click prediction mode).
All behavioral tokens are represented by concatenated embeddings of:
- Semantic signals, obtained from a frozen pretrained LLM applied to item metadata and queries.
- Collaborative signals, consisting of randomly initialized embeddings for item IDs and actions.
- Temporal and other contextual features.
Embedding fusion is performed through a dedicated MLP layer, and input sequences are processed left-to-right with autoregressive masking.
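As a concrete illustration, the PyTorch sketch below shows one way such an embedding-fusion layer could be assembled. All module names, dimensions, and the two-layer MLP are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

class TokenFusion(nn.Module):
    """Fuses semantic, collaborative, and contextual signals into one token embedding.

    Layer sizes and vocabulary counts are placeholders for illustration only.
    """
    def __init__(self, llm_dim=768, n_items=100_000, n_actions=8,
                 id_dim=64, ctx_dim=16, d_model=512):
        super().__init__()
        # Collaborative signals: randomly initialized embeddings, learned during training.
        self.item_emb = nn.Embedding(n_items, id_dim)
        self.action_emb = nn.Embedding(n_actions, id_dim)
        # Fusion MLP maps the concatenated signals into the Transformer model width.
        self.fuse = nn.Sequential(
            nn.Linear(llm_dim + 2 * id_dim + ctx_dim, d_model),
            nn.GELU(),
            nn.Linear(d_model, d_model),
        )

    def forward(self, llm_vec, item_id, action_id, ctx_feats):
        # llm_vec: semantic vector from a frozen pretrained LLM applied to
        # item metadata or query text; ctx_feats: temporal / contextual features.
        x = torch.cat([
            llm_vec,                     # semantic signal (frozen LLM)
            self.item_emb(item_id),      # collaborative: item ID
            self.action_emb(action_id),  # collaborative: action type
            ctx_feats,                   # temporal and other contextual features
        ], dim=-1)
        return self.fuse(x)
```

The fused token sequence is then fed to the decoder-only Transformer under a standard causal (autoregressive) attention mask.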
2. Joint Retrieval and Ranking Optimization
SynerGen jointly trains for retrieval and ranking within a single optimization framework:
- Retrieval is cast as a generative sequence modeling problem using an InfoNCE contrastive loss, where the model predicts the next relevant token from masked item inputs. Let $h$ be the retrieval representation, $e^{+}$ the positive item embedding, and $e^{-}_{j}$ a negative sample. The loss is

  $$\mathcal{L}_{\text{retrieval}} = -\log \frac{\exp\!\big(\mathrm{sim}(h, e^{+})/\tau\big)}{\exp\!\big(\mathrm{sim}(h, e^{+})/\tau\big) + \sum_{j}\exp\!\big(\mathrm{sim}(h, e^{-}_{j})/\tau\big)},$$

  where $\mathrm{sim}(\cdot,\cdot)$ is a similarity function, $\tau$ a temperature, and negatives are sampled both in-batch and as “impressed-but-not-clicked” hard negatives.
- Ranking employs a hybrid loss: pointwise binary cross-entropy for click probability estimation and pairwise margin ranking for preference ordering:

  $$\mathcal{L}_{\text{CE}} = -\big[\,y\log\hat{y} + (1-y)\log(1-\hat{y})\,\big], \qquad \mathcal{L}_{\text{pair}} = \max\!\big(0,\; m - (\hat{y}^{+} - \hat{y}^{-})\big),$$

  where $y$ is the observed click label and $\hat{y}$ the model score. Both losses are combined as $\mathcal{L}_{\text{rank}} = \mathcal{L}_{\text{CE}} + \lambda\,\mathcal{L}_{\text{pair}}$ with a weighting coefficient $\lambda$.
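A minimal PyTorch sketch of these two objectives follows. The similarity function, temperature, margin, and loss weighting are placeholder choices, not values reported in the paper.

```python
import torch
import torch.nn.functional as F

def infonce_loss(h, pos, negs, tau=0.05):
    """InfoNCE retrieval loss.
    h: (B, d) retrieval representations; pos: (B, d) positive item embeddings;
    negs: (B, N, d) in-batch and impressed-but-not-clicked negatives."""
    pos_sim = F.cosine_similarity(h, pos, dim=-1) / tau                # (B,)
    neg_sim = F.cosine_similarity(h.unsqueeze(1), negs, dim=-1) / tau  # (B, N)
    logits = torch.cat([pos_sim.unsqueeze(1), neg_sim], dim=1)         # (B, 1+N)
    targets = torch.zeros(h.size(0), dtype=torch.long, device=h.device)
    return F.cross_entropy(logits, targets)  # positive sits in column 0

def hybrid_ranking_loss(score_pos, score_neg, margin=0.5, lam=1.0):
    """Pointwise BCE on clicked / non-clicked scores plus a pairwise margin term."""
    bce = (F.binary_cross_entropy_with_logits(score_pos, torch.ones_like(score_pos))
           + F.binary_cross_entropy_with_logits(score_neg, torch.zeros_like(score_neg)))
    pairwise = F.relu(margin - (torch.sigmoid(score_pos) - torch.sigmoid(score_neg))).mean()
    return bce + lam * pairwise
```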
The unified optimization allows semantic signals acquired during search to directly enhance recommendation quality and vice versa, reducing calibration mismatch and architectural overhead.
3. Time-Aware Rotary Positional Embedding
Temporal context, critical for user modeling, is encoded via a novel time-aware rotary positional embedding (RoPE). Instead of static or bucketed positional encoding, SynerGen applies block-diagonal rotary transformations parameterized directly by Unix timestamps. If an input embedding $x$ is observed at time $t$, its query and key are projected as $q = R(t)\,W_Q x$ and $k = R(t)\,W_K x$, where $R(t)$ is a block-diagonal rotation matrix whose $2\times 2$ blocks rotate by angles proportional to $t$. For cross-token attention between tokens observed at times $t_i$ and $t_j$, the scoring function becomes

$$q_i^{\top} k_j = (W_Q x_i)^{\top} R(t_j - t_i)\,(W_K x_j),$$

so the attention score depends only on the elapsed time $t_j - t_i$. This mechanism models continuous and irregular time gaps with shift-invariance and extrapolation capability, directly encoding time-based relevance patterns in Transformer attention scores.
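The sketch below illustrates one way to apply a timestamp-parameterized rotary transform in PyTorch. The frequency schedule mirrors standard RoPE; feeding raw timestamps (which may need rescaling or offsetting in practice) is an assumption rather than the paper's exact recipe.

```python
import torch

def timestamp_rope(x, timestamps, base=10000.0):
    """Applies a block-diagonal rotary transform parameterized by timestamps.
    x: (B, T, d) projected queries or keys, d even; timestamps: (B, T) in seconds."""
    d = x.size(-1)
    # One rotation frequency per 2-D block, as in standard RoPE.
    inv_freq = base ** (-torch.arange(0, d, 2, device=x.device).float() / d)  # (d/2,)
    angles = timestamps.unsqueeze(-1).float() * inv_freq                      # (B, T, d/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    # Rotate each (x1, x2) pair by its timestamp-dependent angle.
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out
```

Applying `timestamp_rope` to both queries and keys yields dot products that depend only on the time difference between tokens, matching the shift-invariance property described above.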
4. Evaluation and Metrics
Empirical validation demonstrates SynerGen’s superiority on standard benchmarks:
| Task | SynerGen Improvement | Strongest Baselines |
|---|---|---|
| Book Review | Higher Recall@K; competitive NDCG | SASRec, BERT4Rec, HLLM-1B |
| eBook Search | Highest Recall@1, improved NDCG@10 | UnifiedSSR, UniSAR, TEM, CoPPS |
Additional ablation studies confirm that collaborative embeddings, InfoNCE-driven retrieval optimization, targeted ranking head, and time-aware RoPE all contribute to both early precision and overall ranking quality, without trading off between search and recommendation accuracy.
5. Practical Implications and Industrial Deployment
By combining search and recommendation in a single backbone and loss, SynerGen reduces the misalignment and engineering overhead historically associated with separate retrieve-then-rank pipelines. The model’s architecture is latency-efficient and suitable for industrial-scale deployment where personalization must respond to varied and dynamic interaction modalities (e.g., real-time commerce, content feeds, or query-driven catalog browsing). Its ability to incorporate semantic, collaborative, and temporal cues in a single pass improves the consistency of the user experience and the diversity of personalization.
6. Conceptual Significance and Future Directions
SynerGen validates the theoretical promise of generator-based sequence models as unifiers for retrieval and ranking, demonstrating that a shared optimization objective and backbone can excel at both tasks simultaneously, with performance gains traceable to architectural and loss design choices rather than increased parameter volume. The approach opens opportunities for further unified recommendation architectures, integrating cross-modal behavioral signals and bounding engineering complexity for extensible information access.
7. Summary Table of Key Innovations
| Feature | Description | Impact |
|---|---|---|
| Decoder-only Transformer | Single backbone for search & recommendation | Reduces architectural overhead; improves calibration |
| InfoNCE + Hybrid Ranking | Unified joint training for retrieval and ranking | Bridges semantic and behavioral signals |
| Time-aware RoPE | Precise encoding of irregular time gaps in attention scores | Shift-invariance; temporal extrapolation |
| Collaborative/semantic fusion | Combined frozen LLM, item, action, and context embeddings | Higher recall/precision; robust to cold start |
| Empirical improvement | Outperforms strong baselines on feed and query tasks | Demonstrates scalability and generality |
In conclusion, SynerGen is a contextualized generative recommender model that bridges the architectural and optimization gap between search and recommendation systems, achieving state-of-the-art performance through unified modeling and novel temporal context encoding (Gao et al., 26 Sep 2025).