Mamba4Rec: Towards Efficient Sequential Recommendation with Selective State Space Models (2403.03900v2)

Published 6 Mar 2024 in cs.IR

Abstract: Sequential recommendation aims to estimate dynamic user preferences and sequential dependencies among historical user behaviors. Although Transformer-based models have proven effective for sequential recommendation, they suffer from inference inefficiency stemming from the quadratic computational complexity of attention operators, especially for long behavior sequences. Inspired by the recent success of state space models (SSMs), we propose Mamba4Rec, the first work to explore the potential of selective SSMs for efficient sequential recommendation. Built upon the basic Mamba block, a selective SSM with an efficient hardware-aware parallel algorithm, we design a series of sequential modeling techniques to further improve model performance while maintaining inference efficiency. Through experiments on public datasets, we demonstrate that Mamba4Rec effectively tackles the effectiveness-efficiency dilemma, outperforming both RNN- and attention-based baselines in terms of both effectiveness and efficiency. The code is available at https://github.com/chengkai-liu/Mamba4Rec.
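To make the architecture in the abstract concrete, below is a minimal sketch of a Mamba4Rec-style layer, not the authors' implementation (see the linked repository for that). It assumes the `mamba_ssm` package for the Mamba block and PyTorch for the surrounding components (item embedding, LayerNorm, dropout, and a position-wise feed-forward network, all of which the paper's references suggest are in play); the class name `Mamba4RecSketch` and the hyperparameter defaults are illustrative.

```python
# Hypothetical sketch of a Mamba4Rec-style sequential recommender.
# Assumes: pip install torch mamba-ssm
import torch
import torch.nn as nn
from mamba_ssm import Mamba  # selective SSM block (Gu & Dao, 2023)

class Mamba4RecSketch(nn.Module):
    def __init__(self, num_items, d_model=64, d_state=16, d_conv=4,
                 expand=2, dropout=0.2):
        super().__init__()
        # Index 0 is reserved for padding in the behavior sequence.
        self.item_emb = nn.Embedding(num_items + 1, d_model, padding_idx=0)
        self.emb_dropout = nn.Dropout(dropout)
        self.emb_norm = nn.LayerNorm(d_model)
        # Selective SSM: linear in sequence length, unlike self-attention.
        self.mamba = Mamba(d_model=d_model, d_state=d_state,
                           d_conv=d_conv, expand=expand)
        # Position-wise feed-forward network, applied per position.
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(4 * d_model, d_model),
        )
        self.out_norm = nn.LayerNorm(d_model)

    def forward(self, item_seq):
        # item_seq: (batch, seq_len) of item ids.
        h = self.emb_norm(self.emb_dropout(self.item_emb(item_seq)))
        h = h + self.mamba(h)              # residual around the Mamba block
        h = self.out_norm(h + self.ffn(h)) # residual around the FFN
        # Score all items from the last position's hidden state
        # by sharing the item embedding matrix as the output layer.
        return h[:, -1, :] @ self.item_emb.weight.T  # (batch, num_items + 1)
```

Because the Mamba block scales linearly with sequence length, a model of this shape avoids the quadratic attention cost that the abstract identifies as the bottleneck for long behavior sequences.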
