Dual-Flow Orthogonal Semantic IDs (DOS)
- Dual-Flow Orthogonal Semantic IDs (DOS) is a novel framework that resolves codebook misalignment and semantic loss in generative recommendation systems.
- Its dual-flow architecture processes user and item data in parallel using transformer encoders and orthogonal residual quantization to align semantic representations.
- Empirical evaluations show DOS outperforms baselines with significant gains in offline CTR prediction and online gross merchandise value in large-scale deployments.
Dual-Flow Orthogonal Semantic IDs (DOS) is a semantic coding and quantization framework for generative recommendation (GR) systems designed to address key limitations of prior Semantic ID (SID) solutions. DOS couples a dual-flow user–item architecture for collaborative alignment with an orthogonal residual quantization (ORQ) mechanism that maximizes semantic preservation of LLM embeddings. DOS has demonstrated superior offline and online performance and is in operational use at massive scale within Meituan’s recommendation infrastructure (Yin et al., 4 Feb 2026).
1. Motivation and Problem Formulation
Generative Recommendation (GR) systems use compact Semantic IDs (SIDs) to encode item information, leveraging LLMs both to inject open-world semantics and to compress output spaces for tractable sequence generation. Despite their utility, existing SID designs are hindered by two systemic problems: (1) a codebook-generation gap due to codebooks learned outside the context of the downstream generation task, and (2) semantic degradation induced by naive discretization schemes such as k-means or vanilla VQ-VAE that introduce substantial information loss during quantization. These drawbacks result in diminished recommendation performance and poor downstream LLM token comprehension.
DOS is explicitly constructed to resolve both issues. Its dual objectives are: (a) align the semantic codebook with the collaborative user–item signal of the GR task, and (b) preserve LLM semantic fidelity through orthogonally optimized residual quantization.
2. Dual-Flow Architecture
The DOS architecture processes user sequences and candidate items symmetrically via a parallel two-flow structure that interfaces at a shared, multi-layer codebook.
- Embedding Pool Construction: All items are embedded by a Qwen3-0.6B LLM, yielding a pool of semantic item embeddings.
- User Flow: The user's click history is encoded as a sequence of item embeddings carrying behavioral signal.
- Item Flow: The candidate item's embedding is replicated to the sequence length for symmetry with the user flow.
- Transformer Encoders: Each flow is processed by a single-layer Transformer encoder to produce contextual representations.
- ORQ Quantization: The contextual user and item representations are quantized via the same multi-layer ORQ module, with codebook entries shared across both flows at every layer.
- Semantic Alignment: Codebook-sharing forces contextual user and item representations into an aligned latent space, closing the codebook-generation gap.
- Aggregation and Prediction: The layerwise quantized outputs are aggregated and concatenated into final user and item representations; a cross-entropy loss supervises the recommendation objective.
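The dual-flow structure above can be sketched in PyTorch. Everything here is an illustrative assumption: the module names, toy dimensions, and the single nearest-neighbor codebook standing in for the full multi-layer ORQ module; the Qwen3-0.6B item embeddings are replaced by random vectors.

```python
# Minimal sketch of the DOS dual-flow forward pass (not the authors' code).
import torch
import torch.nn as nn

D, SEQ_LEN, N_ITEMS = 64, 8, 100  # toy sizes, not the paper's

class Flow(nn.Module):
    """Single-layer Transformer encoder; same structure for both flows."""
    def __init__(self, d_model: int):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.encoder(x)

item_pool = torch.randn(N_ITEMS, D)          # stand-in for LLM embeddings
user_flow, item_flow = Flow(D), Flow(D)

# User flow: the sequence of clicked-item embeddings.
clicks = item_pool[torch.randint(0, N_ITEMS, (1, SEQ_LEN))]
# Item flow: the candidate embedding replicated to the sequence length.
candidate = item_pool[0].expand(1, SEQ_LEN, D)

h_user = user_flow(clicks)       # (1, SEQ_LEN, D)
h_item = item_flow(candidate)    # (1, SEQ_LEN, D)

# A shared codebook quantizes both flows into one aligned latent space;
# a plain nearest-neighbor codebook stands in for ORQ here.
codebook = nn.Embedding(256, D)

def quantize(h: torch.Tensor):
    dists = torch.cdist(h.reshape(-1, D), codebook.weight)  # pairwise distances
    idx = dists.argmin(dim=-1)                              # nearest code index
    return codebook(idx).reshape(h.shape), idx

q_user, _ = quantize(h_user)
q_item, _ = quantize(h_item)
score = (q_user.mean(1) * q_item.mean(1)).sum(-1)  # toy matching score
```

The key design point reproduced here is codebook sharing: both flows index the same `codebook`, which is what forces user and item representations into an aligned latent space.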
3. Orthogonal Residual Quantization (ORQ) Mechanism
The ORQ module addresses quantization distortion by multi-stage, orthogonally-regularized dimensional selection and residual propagation:
- Orthogonal Rotation: A trainable orthogonal matrix $R$ (constrained so that $R^\top R = I$) rotates input activations into orientations better suited to quantization.
- Saliency-Based Split:
- Dimension salience is scored by a learned per-dimension saliency measure.
- The top-$k$ most salient dimensions form a binary mask, splitting the rotated representation into primary and secondary dimension groups.
- Primary channels capture the most task-relevant semantic components; the primary and secondary dimension sets are disjoint by construction.
- Task-Relevance Regularization: A regularizer maximizes the mutual information between the quantized primary representation and the task labels, ensuring codebook entries are semantically discriminative.
- Residual Quantization and Multi-Layer Stacking: Each primary block is quantized to the nearest codebook vector; the residual and secondary dimensions are aggregated and passed recursively to the next quantizer layer.
- VQ-VAE Loss per Layer: Each layer is trained with the standard VQ-VAE loss using the stop-gradient operator $\mathrm{sg}[\cdot]$: $\|\mathrm{sg}[z] - e\|_2^2 + \beta\,\|z - \mathrm{sg}[e]\|_2^2$, where $z$ is the encoder output and $e$ the selected codeword.
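One ORQ layer, as described above, can be sketched as follows. The saliency score, the top-$k$ split rule, and the toy dimensions are illustrative assumptions; only the overall shape (orthogonal rotation, primary/secondary split, quantize, propagate residual) follows the description in the text.

```python
# Sketch of a single ORQ layer: orthogonal rotation, saliency-based split,
# nearest-codeword quantization of the primary block, residual propagation.
import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import orthogonal

D, K, CODES = 16, 8, 32  # total dims, primary dims, codebook size (toy)

class ORQLayer(nn.Module):
    def __init__(self):
        super().__init__()
        # Parametrized so self.rot.weight always satisfies R^T R = I.
        self.rot = orthogonal(nn.Linear(D, D, bias=False))
        self.saliency = nn.Parameter(torch.randn(D))  # learned per-dim score
        self.codebook = nn.Embedding(CODES, K)

    def forward(self, z: torch.Tensor):
        z_rot = self.rot(z)                             # orthogonal rotation
        mask = torch.zeros(D, dtype=torch.bool)
        mask[self.saliency.topk(K).indices] = True      # top-k salient dims
        primary, secondary = z_rot[:, mask], z_rot[:, ~mask]
        # Quantize the primary block to its nearest codeword.
        dist = torch.cdist(primary, self.codebook.weight)
        codes = dist.argmin(dim=-1)
        q = self.codebook(codes)
        residual = primary - q
        # Residual + secondary dims are passed to the next quantizer layer.
        nxt = torch.cat([residual, secondary], dim=-1)
        return q, codes, nxt

layer = ORQLayer()
q, codes, nxt = layer(torch.randn(4, D))
R = layer.rot.weight  # orthogonal by construction
```

Using `torch.nn.utils.parametrizations.orthogonal` keeps the rotation exactly orthogonal throughout training, so no separate penalty is needed in this sketch; the paper's regularized formulation could instead penalize $\|R^\top R - I\|$ softly.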
4. Training Objective and Optimization Strategies
DOS is trained end-to-end with a weighted combination of recommendation, orthogonality, mutual-information, and reconstruction objectives; the reconstruction term penalizes deviation from the original LLM embeddings. Reported hyperparameters include multiple stacked ORQ codebook layers, batch size $1024$, the Adam optimizer, and early stopping with patience $5$. Training and evaluation employ CTR-AUC, F1, and next-token Hit@10 metrics. Baselines used for comparison are RQ-KMeans, RQ-VAE, DAS, and HSTU.
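A hedged sketch of how the four loss terms might be combined, including the per-layer VQ-VAE loss with stop-gradient (implemented via `detach`). The loss weights `lam_*` and the placeholder MI/reconstruction terms are illustrative assumptions, not the paper's reported values.

```python
# Illustrative combination of the DOS training objectives.
import torch
import torch.nn.functional as F

def vq_loss(z, q, beta=0.25):
    # ||sg[z] - q||^2 pulls codewords toward encodings;
    # the beta-weighted commitment term pulls encodings toward codewords.
    return F.mse_loss(z.detach(), q) + beta * F.mse_loss(z, q.detach())

def orthogonality_penalty(R):
    # Penalize deviation of R from orthogonality: ||R^T R - I||_F^2.
    I = torch.eye(R.shape[0])
    return ((R.transpose(0, 1) @ R - I) ** 2).sum()

def total_loss(rec_loss, z, q, R, recon_loss, mi_loss,
               lam_vq=1.0, lam_orth=0.1, lam_mi=0.1, lam_recon=0.1):
    return (rec_loss + lam_vq * vq_loss(z, q)
            + lam_orth * orthogonality_penalty(R)
            + lam_mi * mi_loss + lam_recon * recon_loss)

z, q = torch.randn(4, 8), torch.randn(4, 8)
loss = total_loss(torch.tensor(1.0), z, q, torch.eye(8),
                  recon_loss=torch.tensor(0.0), mi_loss=torch.tensor(0.0))
```

In a real training loop `rec_loss` would be the cross-entropy recommendation loss, `recon_loss` the distance to the original LLM embeddings, and `mi_loss` a (negated) mutual-information surrogate, each differentiable with respect to the shared parameters.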
5. Empirical Evaluation and Comparative Results
Offline Metrics
DOS demonstrates improved performance relative to several strong baseline systems. On downstream CTR prediction:
- RQ-KMeans: AUC 0.8363 / F1 0.7641
- RQ-VAE: AUC 0.8526 / F1 0.7739
- DAS: AUC 0.8539 / F1 0.7869
- DOS: AUC 0.8763 / F1 0.8057 (approximately +2.2 AUC points and +1.9 F1 points over DAS)
For next-token sequence generation (Hit@10, all business types):
- HSTU-RQ-KMeans: 0.0410
- HSTU-DAS: 0.0511
- HSTU-DOS: 0.0676 (a +32% relative improvement over HSTU-DAS)
Online A/B Test
A week-long A/B test on 30% of traffic in Meituan's production system yielded a +1.15% increase in gross merchandise value (GMV), with corresponding CTR gains, attributable solely to replacing the prior SID module with DOS.
6. System Deployment and Scalability
Meituan integrates DOS into large-scale recommender infrastructure as follows:
- Semantic embeddings are precomputed with a Qwen3-0.6B LLM and stored in a key-value store.
- At inference, two shallow Transformer encoders and three lightweight ORQ layers process inputs on GPU-enabled servers.
- Fused kernel implementations for codebook lookup and MLP inference ensure overall latency remains under 20 ms.
- The DOS SID module is deployed in Meituan’s Android/iOS mobile applications, serving hundreds of millions of daily active users.
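The serving path described above can be approximated with a toy sketch: item embeddings are precomputed offline and fetched from a key-value store, so the request-time work reduces to shallow encoding and batched codebook lookups. The dict standing in for the KV store, the single-matrix distance computation, and the timing harness are all illustrative assumptions.

```python
# Toy sketch of the low-latency serving path for SID assignment.
import time
import torch

D, CODES = 64, 256
# Offline: embeddings precomputed with the LLM and stored in a KV store
# (a plain dict here).
kv_store = {item_id: torch.randn(D) for item_id in range(1000)}
codebook = torch.randn(CODES, D)

def serve(item_ids):
    emb = torch.stack([kv_store[i] for i in item_ids])   # KV lookup
    # Nearest-code assignment as one batched distance computation,
    # loosely mimicking a fused-kernel codebook lookup.
    return torch.cdist(emb, codebook).argmin(dim=-1)

start = time.perf_counter()
codes = serve(list(range(32)))
latency_ms = (time.perf_counter() - start) * 1000
```

In production the same idea would run on GPU with fused kernels and the full encoder stack in front of the lookup; the point of the sketch is only that the expensive LLM pass is entirely offline.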
7. Significance and Implications
The DOS framework provides an explicit resolution to the task-agnostic codebook misalignment and the semantic loss from conventional quantization. By enforcing a shared, collaboratively supervised codebook and an orthogonally regularized, salience-preserving quantization pipeline, DOS sets a new standard for generative recommendation SID representations. The observed production online lift and robust offline generalization reflect the method’s architectural and algorithmic advancements in both semantic retention and efficiency (Yin et al., 4 Feb 2026). A plausible implication is the extensibility of dual-flow and ORQ principles to other LLM-driven discrete representation learning tasks beyond the scope of recommendation.