
Improving Out-of-Vocabulary Handling in Recommendation Systems (2403.18280v1)

Published 27 Mar 2024 in cs.IR

Abstract: Recommendation systems (RS) are an increasingly relevant area for both academic and industry researchers, given their widespread impact on the daily online experiences of billions of users. One common issue in real RS is the cold-start problem, where users and items may not have enough information to produce high-quality recommendations. This work focuses on a complementary problem: making recommendations for new users and items that were unseen (out-of-vocabulary, or OOV) at training time. This setting is known as the inductive setting and is especially problematic for factorization-based models, which encode only the users/items seen at training time, each with a fixed parameter vector. Many solutions applied in practice are naive, such as assigning OOV users/items to random buckets. In this work, we tackle this problem and propose approaches that better leverage available user/item features to improve OOV handling at the embedding table level. We discuss general-purpose, plug-and-play approaches that are easily applicable to most RS models and improve inductive performance without negatively impacting transductive performance. We extensively evaluate 9 OOV embedding methods on 5 models across 4 datasets spanning different domains. One of these datasets is a proprietary production dataset from a prominent RS employed by a large social platform serving hundreds of millions of daily active users. In our experiments, we find that several proposed methods that exploit feature similarity using LSH consistently outperform alternatives on most model-dataset combinations, with the best method showing a mean improvement of 3.74% over the industry-standard baseline in inductive performance. We release our code and hope our work helps practitioners make more informed decisions when handling OOV for their RS and further inspires academic research into improving OOV support in RS.
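The abstract contrasts the common baseline of routing OOV users/items into random buckets with methods that exploit feature similarity via locality-sensitive hashing (LSH). The sketch below illustrates that contrast under stated assumptions: it uses random-hyperplane (SimHash-style) LSH over a dense feature vector, which is one standard LSH family and not necessarily the variant the paper evaluates. The class `OOVEmbeddingTable`, its methods, and the SHA-256 ID hash used for the random-bucket baseline are hypothetical choices for illustration, and the embeddings are randomly initialized rather than trained.

```python
import hashlib
import random
from typing import Iterable, List, Sequence


class OOVEmbeddingTable:
    """Hypothetical toy embedding table: one vector per ID seen at training
    time, plus a small pool of shared OOV bucket vectors for unseen IDs."""

    def __init__(self, known_ids: Iterable[str], dim: int,
                 num_oov_buckets: int, feature_dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.num_oov_buckets = num_oov_buckets
        # One vector per in-vocabulary ID (random init stands in for training).
        self.table = {i: [rng.gauss(0, 1) for _ in range(dim)] for i in known_ids}
        # Shared vectors that every unseen ID is routed into.
        self.oov = [[rng.gauss(0, 1) for _ in range(dim)]
                    for _ in range(num_oov_buckets)]
        # Assumption: bucket count is a power of two so each LSH code
        # of num_bits bits maps one-to-one onto a bucket.
        self.num_bits = num_oov_buckets.bit_length() - 1
        if 2 ** self.num_bits != num_oov_buckets:
            raise ValueError("num_oov_buckets must be a power of two")
        # One random hyperplane (normal vector) per hash bit.
        self.planes = [[rng.gauss(0, 1) for _ in range(feature_dim)]
                       for _ in range(self.num_bits)]

    def random_bucket(self, entity_id: str) -> int:
        # Baseline: hash the raw ID; features are ignored entirely.
        digest = hashlib.sha256(entity_id.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.num_oov_buckets

    def lsh_bucket(self, features: Sequence[float]) -> int:
        # Each bit records which side of a random hyperplane the feature
        # vector lies on, so vectors with a small angle tend to collide.
        code = 0
        for plane in self.planes:
            if len(plane) != len(features):
                raise ValueError("feature length does not match feature_dim")
            dot = sum(p * f for p, f in zip(plane, features))
            code = (code << 1) | (1 if dot >= 0 else 0)
        return code

    def lookup(self, entity_id: str, features: Sequence[float],
               method: str = "lsh") -> List[float]:
        # Seen IDs keep their own vector; unseen IDs share a bucket vector.
        if entity_id in self.table:
            return self.table[entity_id]
        if method == "random":
            return self.oov[self.random_bucket(entity_id)]
        return self.oov[self.lsh_bucket(features)]


table = OOVEmbeddingTable(["u1", "u2"], dim=4, num_oov_buckets=8,
                          feature_dim=3, seed=42)
new_user_features = [0.9, -0.2, 0.4]
print(table.lookup("new_user", new_user_features, method="random"))
print(table.lookup("new_user", new_user_features, method="lsh"))
```

Because `lsh_bucket` depends only on the direction of the feature vector, two unseen users whose features point in similar directions are likely to share an OOV embedding, whereas `random_bucket` places them independently of their features. In a real model the `oov` vectors would be trained jointly with the rest of the embedding table.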

