Retrieval and Distill: A Temporal Data Shift-Free Paradigm for Online Recommendation System

Published 24 Apr 2024 in cs.IR and cs.AI | (2404.15678v4)

Abstract: Current recommendation systems are significantly affected by temporal data shift, the inconsistency between the distribution of historical data and that of online data. Most existing models focus on utilizing updated data and overlook the transferable, shift-free information that can be learned from shifting data. We propose the Temporal Invariance of Association theorem, which suggests that, given a fixed search space, the relationship between a data sample and the data in the search space remains invariant over time. Leveraging this principle, we design a retrieval-based recommendation framework that trains a data shift-free relevance network using shifting data, significantly enhancing the predictive performance of the original model. However, retrieval-based recommendation models incur substantial inference-time costs when deployed online. To address this, we further design a distillation framework that distills information from the relevance network into a parameterized module using shifting data. The distilled model can be deployed online alongside the original model with only a minimal increase in inference time. Extensive experiments on multiple real datasets demonstrate that our framework significantly improves the performance of the original model by utilizing shifting data.
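The retrieve-then-distill idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the relevance network, the retrieval method, or the distillation loss, so everything below is an assumption made for illustration. The "teacher" retrieves the top-k neighbours from a fixed search space by dot product and averages their labels with softmax weights; the "student" is a small logistic model fitted to the teacher's soft scores so that no retrieval is needed at inference time. All names (`retrieve`, `teacher_score`, `distill`, `student_score`) are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def retrieve(query, search_space, k):
    """Return the k entries of the fixed search space most similar to the query."""
    return sorted(search_space, key=lambda e: dot(query, e[0]), reverse=True)[:k]

def teacher_score(query, search_space, k=3):
    """Softmax-weighted average of retrieved labels.
    Assumption: a simple stand-in for a learned relevance network."""
    neighbors = retrieve(query, search_space, k)
    sims = [dot(query, feats) for feats, _ in neighbors]
    top = max(sims)
    weights = [math.exp(s - top) for s in sims]
    return sum(w * label for w, (_, label) in zip(weights, neighbors)) / sum(weights)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def distill(samples, search_space, epochs=200, lr=0.5):
    """Fit a logistic student to the teacher's soft scores (assumed loss: cross-entropy)."""
    w, b = [0.0] * len(samples[0]), 0.0
    targets = [teacher_score(x, search_space) for x in samples]
    for _ in range(epochs):
        for x, t in zip(samples, targets):
            g = sigmoid(dot(w, x) + b) - t  # gradient of cross-entropy w.r.t. the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def student_score(x, w, b):
    """Retrieval-free score from the distilled parametric module."""
    return sigmoid(dot(w, x) + b)

# Toy fixed search space: a 9x9 grid of 2-D points, label 1.0 when x0 > x1.
search_space = [([i / 4, j / 4], 1.0 if i > j else 0.0)
                for i in range(-4, 5) for j in range(-4, 5)]

# Toy training samples standing in for the shifting data.
samples = [[1.0, -1.0], [-1.0, 1.0], [0.5, -0.5], [-0.5, 0.5]]
w, b = distill(samples, search_space)

print(teacher_score([1.0, -1.0], search_space))  # teacher: retrieval at query time
print(student_score([1.0, -1.0], w, b))          # student: no retrieval needed
```

In this sketch the cost difference is visible in the code itself: `teacher_score` sorts the whole search space for every query, while `student_score` is a single dot product. How the student's output is combined with the original model online is not described in the abstract and is left out here.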

Authors (4)
