Discrete Semantic Tokenization for Deep CTR Prediction (2403.08206v2)
Abstract: Incorporating item content information into click-through rate (CTR) prediction models remains a challenge, especially with the time and space constraints of industrial scenarios. The content-encoding paradigm, which integrates user and item encoders directly into CTR models, prioritizes space over time. In contrast, the embedding-based paradigm transforms item and user semantics into latent embeddings, subsequently caching them to optimize processing time at the expense of space. In this paper, we introduce a new semantic-token paradigm and propose a discrete semantic tokenization approach, namely UIST, for user and item representation. UIST facilitates swift training and inference while maintaining a conservative memory footprint. Specifically, UIST quantizes dense embedding vectors into discrete tokens with shorter lengths and employs a hierarchical mixture inference module to weigh the contribution of each user-item token pair. Our experimental results on news recommendation showcase the effectiveness and efficiency (about 200-fold space compression) of UIST for CTR prediction.
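To make the quantization step concrete, the following is a minimal sketch of how a dense embedding could be turned into a short sequence of discrete tokens. The abstract does not say which quantizer UIST uses; residual quantization is assumed here purely for illustration, and the function names, codebook sizes, and randomly initialized codebooks are all hypothetical (a real system would learn the codebooks).

```python
import numpy as np

def residual_quantize(x, codebooks):
    """Map a dense vector to one token per level (assumed residual quantization)."""
    residual = np.asarray(x, dtype=np.float64).copy()
    tokens = []
    for codebook in codebooks:  # each codebook has shape (num_codes, dim)
        # Pick the code closest to what is still unexplained.
        dists = np.sum((codebook - residual) ** 2, axis=1)
        idx = int(np.argmin(dists))
        tokens.append(idx)
        # The next level quantizes the remaining error.
        residual = residual - codebook[idx]
    return tokens

def reconstruct(tokens, codebooks):
    """Approximate the original vector by summing the selected codes."""
    return sum(cb[t] for cb, t in zip(codebooks, tokens))

# Hypothetical sizes: 64-dim embedding, 4 levels, 256 codes per level.
rng = np.random.default_rng(0)
dim, levels, num_codes = 64, 4, 256
codebooks = [rng.normal(size=(num_codes, dim)) * (0.5 ** level)
             for level in range(levels)]

item_embedding = rng.normal(size=dim)
tokens = residual_quantize(item_embedding, codebooks)
approx = reconstruct(tokens, codebooks)
```

With these illustrative sizes, each item is stored as 4 one-byte tokens instead of 64 float32 values (256 bytes). That ratio follows only from the hypothetical sizes chosen here and is not the compression figure reported in the abstract.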