Dual-Flow Orthogonal Semantic IDs (DOS)
- Dual-Flow Orthogonal Semantic IDs (DOS) is a novel framework that resolves codebook misalignment and semantic loss in generative recommendation systems.
- Its dual-flow architecture processes user and item data in parallel using transformer encoders and orthogonal residual quantization to align semantic representations.
- Empirical evaluations show DOS outperforms baselines with significant gains in offline CTR prediction and online gross merchandise value in large-scale deployments.
Dual-Flow Orthogonal Semantic IDs (DOS) is a semantic coding and quantization framework for generative recommendation (GR) systems designed to address key limitations of prior Semantic ID (SID) solutions. DOS couples a dual-flow user–item architecture for collaborative alignment with an orthogonal residual quantization (ORQ) mechanism that maximizes semantic preservation of LLM embeddings. DOS has demonstrated superior offline and online performance and is in operational use at massive scale within Meituan’s recommendation infrastructure (Yin et al., 4 Feb 2026).
1. Motivation and Problem Formulation
Generative Recommendation (GR) systems use compact Semantic IDs (SIDs) to encode item information, leveraging LLMs both to inject open-world semantics and to compress output spaces for tractable sequence generation. Despite their utility, existing SID designs are hindered by two systemic problems: (1) a codebook-generation gap due to codebooks learned outside the context of the downstream generation task, and (2) semantic degradation induced by naive discretization schemes such as k-means or vanilla VQ-VAE that introduce substantial information loss during quantization. These drawbacks result in diminished recommendation performance and poor downstream LLM token comprehension.
DOS is explicitly constructed to resolve both issues. Its dual objectives are: (a) align the semantic codebook with the collaborative user–item signal of the GR task, and (b) preserve LLM semantic fidelity through orthogonally optimized residual quantization.
2. Dual-Flow Architecture
The DOS architecture processes user sequences and candidate items symmetrically via a parallel two-flow structure that interfaces at a shared, multi-layer codebook.
- Embedding Pool Construction: All items are embedded by a Qwen3-0.6B LLM, yielding a pool of semantic item embeddings.
- User Flow: The user's click history is encoded as a sequence of item embeddings carrying behavioral signal.
- Item Flow: The candidate item's embedding is replicated to the sequence length for symmetry with the user flow.
- Transformer Encoders: Each flow is processed by a single-layer Transformer encoder to produce contextual representations.
- ORQ Quantization: The contextual user and item representations are quantized via the same multi-layer ORQ module, with codebook entries shared across both flows at every layer.
- Semantic Alignment: Codebook-sharing forces contextual user and item representations into an aligned latent space, closing the codebook-generation gap.
- Aggregation and Prediction: The layerwise quantized outputs are aggregated and concatenated into final user and item representations; a cross-entropy loss supervises the recommendation objective.
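The dual-flow structure above can be sketched in PyTorch. Everything here is an illustrative assumption: the module names, toy dimensions, and the single nearest-neighbor codebook standing in for the full multi-layer ORQ module; the Qwen3-0.6B item embeddings are replaced by random vectors.

```python
# Minimal sketch of the DOS dual-flow forward pass (not the authors' code).
import torch
import torch.nn as nn

D, SEQ_LEN, N_ITEMS = 64, 8, 100  # toy sizes, not the paper's

class Flow(nn.Module):
    """Single-layer Transformer encoder; same structure for both flows."""
    def __init__(self, d_model: int):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.encoder(x)

item_pool = torch.randn(N_ITEMS, D)          # stand-in for LLM embeddings
user_flow, item_flow = Flow(D), Flow(D)

# User flow: the sequence of clicked-item embeddings.
clicks = item_pool[torch.randint(0, N_ITEMS, (1, SEQ_LEN))]
# Item flow: the candidate embedding replicated to the sequence length.
candidate = item_pool[0].expand(1, SEQ_LEN, D)

h_user = user_flow(clicks)       # (1, SEQ_LEN, D)
h_item = item_flow(candidate)    # (1, SEQ_LEN, D)

# A shared codebook quantizes both flows into one aligned latent space;
# a plain nearest-neighbor codebook stands in for ORQ here.
codebook = nn.Embedding(256, D)

def quantize(h: torch.Tensor):
    dists = torch.cdist(h.reshape(-1, D), codebook.weight)  # pairwise distances
    idx = dists.argmin(dim=-1)                              # nearest code index
    return codebook(idx).reshape(h.shape), idx

q_user, _ = quantize(h_user)
q_item, _ = quantize(h_item)
score = (q_user.mean(1) * q_item.mean(1)).sum(-1)  # toy matching score
```

The key design point reproduced here is codebook sharing: both flows index the same `codebook`, which is what forces user and item representations into an aligned latent space.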
3. Orthogonal Residual Quantization (ORQ) Mechanism
The ORQ module addresses quantization distortion by multi-stage, orthogonally-regularized dimensional selection and residual propagation:
- Orthogonal Rotation: A trainable orthogonal matrix $R$ (constrained so that $R^\top R = I$) rotates input activations into orientations better suited to quantization.
- Saliency-Based Split:
- Dimension salience is scored by a learned per-dimension saliency measure.
- The top-$k$ most salient dimensions form a binary mask, splitting the rotated representation into primary and secondary dimension groups.
- Primary channels capture the most task-relevant semantic components; the primary and secondary dimension sets are disjoint by construction.
- Task-Relevance Regularization: A regularizer maximizes the mutual information between the quantized primary representation and the task labels, ensuring codebook entries are semantically discriminative.
- Residual Quantization and Multi-Layer Stacking: Each primary block is quantized to the nearest codebook vector; the residual and secondary dimensions are aggregated and passed recursively to the next quantizer layer.
- VQ-VAE Loss per Layer: Each layer is trained with the standard VQ-VAE loss using the stop-gradient operator $\mathrm{sg}[\cdot]$: $\|\mathrm{sg}[z] - e\|_2^2 + \beta\,\|z - \mathrm{sg}[e]\|_2^2$, where $z$ is the encoder output and $e$ the selected codeword.
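One ORQ layer, as described above, can be sketched as follows. The saliency score, the top-$k$ split rule, and the toy dimensions are illustrative assumptions; only the overall shape (orthogonal rotation, primary/secondary split, quantize, propagate residual) follows the description in the text.

```python
# Sketch of a single ORQ layer: orthogonal rotation, saliency-based split,
# nearest-codeword quantization of the primary block, residual propagation.
import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import orthogonal

D, K, CODES = 16, 8, 32  # total dims, primary dims, codebook size (toy)

class ORQLayer(nn.Module):
    def __init__(self):
        super().__init__()
        # Parametrized so self.rot.weight always satisfies R^T R = I.
        self.rot = orthogonal(nn.Linear(D, D, bias=False))
        self.saliency = nn.Parameter(torch.randn(D))  # learned per-dim score
        self.codebook = nn.Embedding(CODES, K)

    def forward(self, z: torch.Tensor):
        z_rot = self.rot(z)                             # orthogonal rotation
        mask = torch.zeros(D, dtype=torch.bool)
        mask[self.saliency.topk(K).indices] = True      # top-k salient dims
        primary, secondary = z_rot[:, mask], z_rot[:, ~mask]
        # Quantize the primary block to its nearest codeword.
        dist = torch.cdist(primary, self.codebook.weight)
        codes = dist.argmin(dim=-1)
        q = self.codebook(codes)
        residual = primary - q
        # Residual + secondary dims are passed to the next quantizer layer.
        nxt = torch.cat([residual, secondary], dim=-1)
        return q, codes, nxt

layer = ORQLayer()
q, codes, nxt = layer(torch.randn(4, D))
R = layer.rot.weight  # orthogonal by construction
```

Using `torch.nn.utils.parametrizations.orthogonal` keeps the rotation exactly orthogonal throughout training, so no separate penalty is needed in this sketch; the paper's regularized formulation could instead penalize $\|R^\top R - I\|$ softly.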
4. Training Objective and Optimization Strategies
DOS is trained end-to-end with a weighted combination of recommendation, orthogonality, mutual-information, and reconstruction objectives; the reconstruction term penalizes deviation from the original LLM embeddings. Reported hyperparameters include multiple stacked ORQ codebook layers, batch size $1024$, the Adam optimizer, and early stopping with patience $5$. Training and evaluation employ CTR-AUC, F1, and next-token Hit@10 metrics. Baselines used for comparison are RQ-KMeans, RQ-VAE, DAS, and HSTU.
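A hedged sketch of how the four loss terms might be combined, including the per-layer VQ-VAE loss with stop-gradient (implemented via `detach`). The loss weights `lam_*` and the placeholder MI/reconstruction terms are illustrative assumptions, not the paper's reported values.

```python
# Illustrative combination of the DOS training objectives.
import torch
import torch.nn.functional as F

def vq_loss(z, q, beta=0.25):
    # ||sg[z] - q||^2 pulls codewords toward encodings;
    # the beta-weighted commitment term pulls encodings toward codewords.
    return F.mse_loss(z.detach(), q) + beta * F.mse_loss(z, q.detach())

def orthogonality_penalty(R):
    # Penalize deviation of R from orthogonality: ||R^T R - I||_F^2.
    I = torch.eye(R.shape[0])
    return ((R.transpose(0, 1) @ R - I) ** 2).sum()

def total_loss(rec_loss, z, q, R, recon_loss, mi_loss,
               lam_vq=1.0, lam_orth=0.1, lam_mi=0.1, lam_recon=0.1):
    return (rec_loss + lam_vq * vq_loss(z, q)
            + lam_orth * orthogonality_penalty(R)
            + lam_mi * mi_loss + lam_recon * recon_loss)

z, q = torch.randn(4, 8), torch.randn(4, 8)
loss = total_loss(torch.tensor(1.0), z, q, torch.eye(8),
                  recon_loss=torch.tensor(0.0), mi_loss=torch.tensor(0.0))
```

In a real training loop `rec_loss` would be the cross-entropy recommendation loss, `recon_loss` the distance to the original LLM embeddings, and `mi_loss` a (negated) mutual-information surrogate, each differentiable with respect to the shared parameters.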
5. Empirical Evaluation and Comparative Results
Offline Metrics
DOS demonstrates improved performance relative to several strong baseline systems. On downstream CTR prediction:
- RQ-KMeans: AUC 0.8363 / F1 0.7641
- RQ-VAE: AUC 0.8526 / F1 0.7739
- DAS: AUC 0.8539 / F1 0.7869
- DOS: AUC 0.8763 / F1 0.8057 (approximately +2.2 AUC points and +1.9 F1 points over DAS)
For next-token sequence generation (Hit@10, all business types):
- HSTU-RQ-KMeans: 0.0410
- HSTU-DAS: 0.0511
- HSTU-DOS: 0.0676 (a +32% relative improvement over HSTU-DAS)
Online A/B Test
A week-long A/B test on 30% of traffic in Meituan's production system yielded a +1.15% increase in gross merchandise value (GMV), with corresponding CTR gains, attributable solely to replacing the prior SID module with DOS.
6. System Deployment and Scalability
Meituan integrates DOS into large-scale recommender infrastructure as follows:
- Semantic embeddings are precomputed with a Qwen3-0.6B LLM and stored in a key-value store.
- At inference, two shallow Transformer encoders and three lightweight ORQ layers process inputs on GPU-enabled servers.
- Fused kernel implementations for codebook lookup and MLP inference ensure overall latency remains under 20 ms.
- The DOS SID module is deployed in Meituan’s Android/iOS mobile applications, serving hundreds of millions of daily active users.
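The serving path described above can be approximated with a toy sketch: item embeddings are precomputed offline and fetched from a key-value store, so the request-time work reduces to shallow encoding and batched codebook lookups. The dict standing in for the KV store, the single-matrix distance computation, and the timing harness are all illustrative assumptions.

```python
# Toy sketch of the low-latency serving path for SID assignment.
import time
import torch

D, CODES = 64, 256
# Offline: embeddings precomputed with the LLM and stored in a KV store
# (a plain dict here).
kv_store = {item_id: torch.randn(D) for item_id in range(1000)}
codebook = torch.randn(CODES, D)

def serve(item_ids):
    emb = torch.stack([kv_store[i] for i in item_ids])   # KV lookup
    # Nearest-code assignment as one batched distance computation,
    # loosely mimicking a fused-kernel codebook lookup.
    return torch.cdist(emb, codebook).argmin(dim=-1)

start = time.perf_counter()
codes = serve(list(range(32)))
latency_ms = (time.perf_counter() - start) * 1000
```

In production the same idea would run on GPU with fused kernels and the full encoder stack in front of the lookup; the point of the sketch is only that the expensive LLM pass is entirely offline.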
7. Significance and Implications
The DOS framework provides an explicit resolution to the task-agnostic codebook misalignment and the semantic loss from conventional quantization. By enforcing a shared, collaboratively supervised codebook and an orthogonally regularized, salience-preserving quantization pipeline, DOS sets a new standard for generative recommendation SID representations. The observed production online lift and robust offline generalization reflect the method’s architectural and algorithmic advancements in both semantic retention and efficiency (Yin et al., 4 Feb 2026). A plausible implication is the extensibility of dual-flow and ORQ principles to other LLM-driven discrete representation learning tasks beyond the scope of recommendation.