
Better Generalization with Semantic IDs: A Case Study in Ranking for Recommendations (2306.08121v2)

Published 13 Jun 2023 in cs.IR and cs.LG

Abstract: Randomly-hashed item IDs are used ubiquitously in recommendation models. However, the representations learned from random hashing prevent generalization across similar items, making it hard to learn unseen and long-tail items, especially when the item corpus is large, power-law distributed, and evolving dynamically. In this paper, we propose using content-derived features as a replacement for random IDs. We show that simply replacing ID features with content-based embeddings can cause a drop in quality due to reduced memorization capability. To strike a good balance between memorization and generalization, we propose to use Semantic IDs -- a compact discrete item representation learned from frozen content embeddings using RQ-VAE that captures the hierarchy of concepts in items -- as a replacement for random item IDs. As with content embeddings, the compactness of Semantic IDs poses the challenge of how to adapt them in recommendation models. We propose novel methods for adapting Semantic IDs in industry-scale ranking models by hashing sub-pieces of the Semantic ID sequences. In particular, we find that the SentencePiece model commonly used in LLM tokenization outperforms manually crafted pieces such as N-grams. Finally, we evaluate our approaches in a real-world ranking model for YouTube recommendations. Our experiments demonstrate that Semantic IDs can replace the direct use of video IDs while improving generalization on new and long-tail item slices without sacrificing overall model quality.
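The Semantic ID itself comes from residual quantization of a frozen content embedding. The sketch below is a minimal illustration, not the paper's implementation: the codebooks are random placeholders, the sizes (4 levels, 256 codewords, 64 dimensions) are assumptions, and the RQ-VAE encoder/decoder that learns the codebooks is omitted. It only shows how greedy residual quantization maps an embedding to a short, coarse-to-fine tuple of codeword indices.

```python
# A minimal sketch, NOT the paper's code: greedy residual quantization of a frozen
# content embedding into a Semantic ID. RQ-VAE learns the codebooks jointly with an
# encoder/decoder; here they are random placeholders with assumed sizes.
import numpy as np

rng = np.random.default_rng(0)
NUM_LEVELS, CODEBOOK_SIZE, DIM = 4, 256, 64          # assumed sizes for illustration
codebooks = rng.normal(size=(NUM_LEVELS, CODEBOOK_SIZE, DIM))

def semantic_id(content_embedding: np.ndarray) -> tuple[int, ...]:
    """Quantize level by level: pick the nearest codeword, then quantize the residual."""
    residual = content_embedding
    ids = []
    for level in range(NUM_LEVELS):
        dists = np.linalg.norm(codebooks[level] - residual, axis=1)  # L2 to every codeword
        idx = int(np.argmin(dists))
        ids.append(idx)
        residual = residual - codebooks[level][idx]   # quantize what is left at the next level
    return tuple(ids)

video_embedding = rng.normal(size=DIM)                # stand-in for a frozen content embedding
print(semantic_id(video_embedding))                   # e.g. a 4-tuple of codeword indices
```

To feed these compact IDs into a ranking model, the paper adapts them by hashing sub-pieces of the Semantic ID sequence into embedding-table rows. Below is a hypothetical N-gram variant of that idea; the bucket count, the position-tagged key, and the helper name are illustrative assumptions, and the SentencePiece-learned pieces the paper finds to work best would replace the fixed N-grams.

```python
# A hypothetical sketch of the "hash sub-pieces" adaptation: form N-grams over the
# Semantic ID's codeword indices and hash each one into a shared embedding table.
# NUM_BUCKETS and the position-tagged key are assumptions, not the paper's settings.
NUM_BUCKETS = 2**20  # assumed embedding-table size

def ngram_buckets(sem_id: tuple[int, ...], n: int = 2) -> list[int]:
    buckets = []
    for start in range(len(sem_id) - n + 1):
        piece = (start,) + sem_id[start:start + n]    # keep the level position in the key
        buckets.append(hash(piece) % NUM_BUCKETS)     # row to look up in the embedding table
    return buckets

print(ngram_buckets((137, 42, 201, 9), n=2))          # three bigram rows for one item
```

Because semantically similar items share prefix codewords, their sub-pieces collide into the same embedding rows, which is what lets the ranking model generalize to new and long-tail videos while still memorizing popular ones.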

Authors (12)
  1. Anima Singh
  2. Trung Vu
  3. Raghunandan Keshavan
  4. Nikhil Mehta
  5. Xinyang Yi
  6. Lichan Hong
  7. Lukasz Heldt
  8. Li Wei
  9. Maheswaran Sathiamoorthy
  10. Yilin Zheng
  11. Devansh Tandon
  12. Ed H. Chi
Citations (12)