ENCODE: Efficient Two-Stage CTR Modeling
- The paper introduces ENCODE, a two-stage framework that leverages full-sequence clustering and metric learning to extract relevant user interests for accurate CTR prediction.
- Its offline extraction stage uses metric learning-based dimensionality reduction and KMeans clustering to form multi-interest representations from long user behavior sequences.
- The online inference stage uses a unified attention mechanism to rapidly generate target-aligned interest vectors, significantly reducing computational latency.
EfficieNt Clustering based twO-stage interest moDEling (ENCODE) is a two-stage framework designed to efficiently model long-term user behavior sequences for click-through rate (CTR) prediction. ENCODE addresses the two primary challenges in long-term sequence modeling: utilizing the entire user history (R1) and extracting interests highly relevant to the current target item (R2). The pipeline comprises an offline extraction stage, which discovers multi-faceted user interests via clustering and metric learning-based dimensionality reduction, followed by an online inference stage that rapidly computes target-aligned interest representations, with a single attention-based relevance metric shared across both stages.
1. Theoretical Foundations and Motivation
ENCODE is motivated by limitations in prior models that either sample only part of the sequence (resulting in information loss) or employ target-attention over the full sequence (yielding high accuracy but prohibitive inference cost for online serving). Existing retrieval-based methods often break alignment between the extraction of interests and the subsequent relevance calculation with target items, negatively affecting predictive performance (Zhou et al., 19 Aug 2025).
ENCODE’s two guiding requirements are:
- R1: Leverage the entire behavior sequence so that no information is discarded.
- R2: Ensure high relevance between the extracted interests and the current target item by using a consistent relevance metric across both stages.
The method is designed to break the trade-off between full-information modeling and online serving efficiency that constrains previous systems.
2. Offline Extraction: Metric Learning and Clustering
In the offline phase, ENCODE operates on a user behavior sequence of substantial length (often hundreds to thousands of events). User interests are operationalized as sub-interests, assumed to reside in clusters of similar behaviors. The full sequence is encoded as behavior embeddings $\{e_1, \dots, e_L\}$ with $e_i \in \mathbb{R}^d$.
Metric Learning-Based Dimensionality Reduction
Given the high-dimensional nature of typical behavior embeddings, ENCODE reduces clustering overhead by learning a linear projection
$$\tilde{e}_i = W e_i, \qquad W \in \mathbb{R}^{d' \times d},\ d' \ll d \quad \text{[Equation (A)]}$$
A metric learning approach optimizes $W$ to preserve the relative pairwise distances between the projected embeddings, maintaining the semantic structure required for clustering.
For each behavior, positive/negative samples are selected by the distance relationships in the original space, and the following triplet loss is minimized:
$$\mathcal{L}_{\text{tri}} = \sum_{i} \max\bigl(0,\; D(\tilde{e}_i, \tilde{e}_i^{+}) - D(\tilde{e}_i, \tilde{e}_i^{-}) + m_i\bigr) \quad \text{[Equation (B)]}$$
with a dynamic margin
$$m_i = D(e_i, e_i^{-}) - D(e_i, e_i^{+}) \quad \text{[Equation (C)]}$$
where $D(\cdot, \cdot)$ is a cosine or Euclidean distance and $\tilde{e}_i^{+}, \tilde{e}_i^{-}$ are the positive/negative projected samples.
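A minimal PyTorch sketch of this projection learning follows; the dimensions, the distance-based sampling of positives/negatives, and the optimizer settings are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn.functional as F

d, d_proj, L = 64, 16, 1000                      # original dim, reduced dim, sequence length (assumed)
W = torch.randn(d, d_proj, requires_grad=True)   # learned projection matrix
opt = torch.optim.Adam([W], lr=1e-3)

E = F.normalize(torch.randn(L, d), dim=-1)       # stand-in behavior embeddings

for step in range(200):
    # Sample anchors; pick positives/negatives by distance in the ORIGINAL space.
    idx = torch.randint(0, L, (256,))
    anchors = E[idx]
    dists = torch.cdist(anchors, E)                       # (256, L) original-space distances
    pos = E[dists.topk(2, largest=False).indices[:, 1]]   # nearest neighbor (index 0 is the anchor itself)
    neg = E[dists.topk(1, largest=True).indices[:, 0]]    # farthest behavior

    # Triplet loss in projected space, with a dynamic margin taken from the
    # original-space distance gap so relative distances are preserved.
    a_p, p_p, n_p = anchors @ W, pos @ W, neg @ W
    margin = ((anchors - neg).norm(dim=-1) - (anchors - pos).norm(dim=-1)).detach()
    loss = F.relu((a_p - p_p).norm(dim=-1) - (a_p - n_p).norm(dim=-1) + margin).mean()

    opt.zero_grad()
    loss.backward()
    opt.step()
```

In this sketch the margin grows with how separated the triplet already is in the original space, pushing the projection to respect those relative distances rather than a fixed constant gap.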
Clustering for Multi-Interest Extraction
The reduced representations $\{\tilde{e}_i\}$ are clustered using a standard algorithm such as KMeans to produce $K$ clusters. For cluster $k$ with center $c_k$ and member indices $\mathcal{I}_k$, the cluster's multi-interest representation is computed by a weighted aggregation:
$$v_k = \sum_{i \in \mathcal{I}_k} \frac{\exp\bigl(\mathrm{sim}(\tilde{e}_i, c_k)\bigr)}{\sum_{j \in \mathcal{I}_k} \exp\bigl(\mathrm{sim}(\tilde{e}_j, c_k)\bigr)}\, e_i \quad \text{[Equation (D)]}$$
The similarity function is a scaled dot product:
$$\mathrm{sim}(x, y) = \frac{x^{\top} y}{\sqrt{d'}} \quad \text{[Equation (E)]}$$
This preserves high-order information and yields a relevance-weighted interest representation, aligned with the metric later used for target matching, rather than a simple unweighted cluster centroid.
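Continuing the sketch, here is a hypothetical extraction step using scikit-learn's KMeans; the scaled dot-product similarity is the same assumption as above, standing in for whichever metric Equation (E) specifies.

```python
import numpy as np
from sklearn.cluster import KMeans

K, d, d_proj, L = 8, 64, 16, 1000
E_full = np.random.randn(L, d)       # original behavior embeddings (stand-in data)
E_proj = np.random.randn(L, d_proj)  # projected embeddings from the step above

km = KMeans(n_clusters=K, n_init=10).fit(E_proj)

interests = []
for k in range(K):
    members = np.where(km.labels_ == k)[0]
    center = km.cluster_centers_[k]
    # Softmax over similarity to the cluster center, so behaviors closer
    # to the center contribute more to the cluster's interest vector.
    sims = E_proj[members] @ center / np.sqrt(d_proj)     # assumed scaled dot product
    w = np.exp(sims - sims.max()); w /= w.sum()
    interests.append(w @ E_full[members])                 # aggregate in the ORIGINAL space
interests = np.stack(interests)                           # (K, d) multi-interest set
```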
3. Online Inference: Attention-Based Interest Matching
During online serving, the previously extracted multi-interest set $\{v_1, \dots, v_K\}$ and a target item embedding $e_t$ are available for rapid matching.
Target-Aware Attention Mechanism
ENCODE computes the final user interest vector $u$ as a weighted sum over the multi-interests, with weights derived from the same relevance metric used in the offline stage:
$$u = \sum_{k=1}^{K} \frac{\exp\bigl(\mathrm{sim}(v_k, e_t)\bigr)}{\sum_{k'=1}^{K} \exp\bigl(\mathrm{sim}(v_{k'}, e_t)\bigr)}\, v_k \quad \text{[Equation (F)]}$$
This architecture guarantees that interests discovered offline align with the relevance computation in real-time prediction, directly satisfying requirement R2.
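The online step then reduces to a softmax-weighted sum over the $K$ cached interests, as in this small sketch (same illustrative similarity as above):

```python
import numpy as np

def match_interests(interests: np.ndarray, target: np.ndarray) -> np.ndarray:
    """interests: (K, d) cached offline; target: (d,) target item embedding."""
    scores = interests @ target / np.sqrt(target.shape[0])  # same assumed similarity as offline
    w = np.exp(scores - scores.max()); w /= w.sum()          # softmax attention weights
    return w @ interests                                     # final target-aligned user vector

user_vec = match_interests(np.random.randn(8, 64), np.random.randn(64))
```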
4. Computational Complexity and Efficiency
The method is designed for scalability:
- Offline stage:
  - Dimensionality reduction: $O(L d d')$
  - Clustering: $O(T L K d')$ ($T$ = iterations)
  - Interest extraction: $O(L d)$
- Online stage:
  - For $B$ candidate items, attention computation is $O(B K d)$
By reducing the long sequence of $L$ behaviors into just $K$ attentively-aggregated interest vectors (typically $K \ll L$), ENCODE achieves inference latency dramatically lower than full-sequence attention models, with computation now linear in $K$ rather than $L$, as the quick calculation below illustrates.
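The numbers below are illustrative assumptions, not figures from the paper:

```python
# Illustrative numbers only, not results reported in the paper.
L, K, d, B = 1000, 8, 64, 100        # sequence length, clusters, embedding dim, candidates
full_attention = B * L * d           # target attention over every behavior: O(B L d)
encode_online = B * K * d            # attention over K cached interests: O(B K d)
print(full_attention / encode_online)  # 125.0 -> ~125x fewer multiply-adds per request
```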
5. Empirical Performance and Comparative Analysis
ENCODE was benchmarked against state-of-the-art methods (including SIM (Pi et al., 2020), SDIM (Cao et al., 2022), and others) on both industrial-scale datasets (hundreds of millions of records) and large public datasets (Amazon Books; MovieLens 32M).
- Metrics: CTR AUC, Group AUC (GAUC), online inference latency
- Findings:
- ENCODE matches or slightly trails the “upper bound” established by models with full target-attention over all historical behaviors (e.g., DIN-L), but at a fraction of the computational cost.
- On all datasets tested, ENCODE outperforms retrieval/sampling methods in CTR AUC and GAUC, attributed to its full-sequence utilization and consistent relevance metric alignment.
- Inference latency is significantly reduced relative to full attention models.
This demonstrates that ENCODE breaks the prevailing performance-efficiency trade-off by attaining near-optimal CTR prediction with production-grade latency.
6. Significance and Relation to Prior Art
ENCODE distinguishes itself from prior retrieval-based multi-interest frameworks (e.g., SIM Hard, SIM Soft, SDIM (Cao et al., 2022)) by deploying a unified relevance metric at both extraction and matching stages, ensuring that the interests surfaced offline are directly compatible with target-specific attention during inference.
Other contemporary approaches such as TWIN (Chang et al., 2023) and RimiRec (Pei et al., 2 Feb 2024) share elements of multi-interest extraction or two-stage architectures, but ENCODE’s metric learning plus clustering strategy achieves both maximal information retention and computational tractability. Unlike holistic interest compression methods (e.g., CHIME (Bai et al., 9 Apr 2025)), ENCODE clusters and then attentively aggregates, maintaining interpretability and online efficiency.
7. Practical Considerations and Future Directions
ENCODE is designed for real-world recommender systems requiring both high-fidelity interest modeling and fast response times. The clustering process can adapt to behavioral growth and shifts by periodically retraining on fresh data. Metric learning-based reduction can be further extended with advanced contrastive or self-supervised objectives.
This suggests ENCODE may serve as a blueprint for future hybrid models combining offline multi-interest extraction via clustering and efficient online target-attention fusion, applicable to domains beyond CTR, such as search ranking, personalized advertising, or content recommendation at scale.