RL-MPCA: A Reinforcement Learning Based Multi-Phase Computation Allocation Approach for Recommender Systems (2401.01369v1)

Published 27 Dec 2023 in cs.IR, cs.AI, and cs.LG

Abstract: Recommender systems aim to recommend the most suitable items to users from a large number of candidates. Their computation cost grows as the number of user requests and the complexity of services (or models) increase. Under limited computation resources (CRs), how to trade off computation cost against business revenue becomes an essential question. Existing studies focus on dynamically allocating CRs in queue truncation scenarios (i.e., allocating the size of the candidate set) and formulate CR allocation as a constrained optimization problem. Some address single-phase CR allocation; others address multi-phase CR allocation but introduce assumptions specific to queue truncation scenarios. However, these assumptions do not hold in other scenarios, such as retrieval channel selection and prediction model selection. Moreover, existing studies ignore the state transition process of requests between phases, which limits the effectiveness of their approaches. This paper proposes a Reinforcement Learning (RL) based Multi-Phase Computation Allocation approach (RL-MPCA), which aims to maximize total business revenue under CR constraints. RL-MPCA formulates the CR allocation problem as a Weakly Coupled Markov Decision Process (MDP) and solves it with an RL-based approach. Specifically, RL-MPCA designs a novel deep Q-network to adapt to various CR allocation scenarios, and calibrates the Q-value by introducing multiple adaptive Lagrange multipliers (adaptive-$\lambda$) to avoid violating the global CR constraints. Finally, experiments in an offline simulation environment and on an online real-world recommender system validate the effectiveness of our approach.
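The abstract's central mechanism is calibrating the Q-value with multiple adaptive Lagrange multipliers so that per-request decisions do not, in aggregate, violate the global CR constraints. As a rough illustration of that general idea (a minimal sketch, not the paper's actual algorithm), the code below selects actions by the calibrated value $Q(s,a) - \sum_j \lambda_j c_j(s,a)$ and adjusts each $\lambda_j$ with a dual (sub)gradient step on the constraint violation; all function names, cost vectors, and budgets here are hypothetical.

```python
import numpy as np

# Illustrative sketch of Lagrange-calibrated Q-value action selection with an
# adaptive dual update. This is NOT the paper's implementation; the names and
# numbers below are assumptions chosen for demonstration only.

def select_action(q_values, costs, lambdas):
    """Pick the action maximizing Q(s, a) - sum_j lambda_j * c_j(s, a)."""
    calibrated = q_values - costs @ lambdas
    return int(np.argmax(calibrated))

def update_lambdas(lambdas, used_costs, budgets, lr=0.05):
    """Dual (sub)gradient step: raise lambda_j when the j-th resource budget is
    exceeded, lower it otherwise; multipliers are kept non-negative."""
    return np.maximum(0.0, lambdas + lr * (used_costs - budgets))

# Toy example: 4 candidate actions, 2 global computation-resource constraints.
q_values = np.array([1.0, 1.4, 1.8, 2.1])   # predicted business revenue per action
costs = np.array([[0.1, 0.0],               # per-action cost on each resource
                  [0.3, 0.1],
                  [0.6, 0.4],
                  [1.0, 0.9]])
budgets = np.array([0.5, 0.3])              # average per-request budgets
lambdas = np.zeros(2)

for _ in range(500):
    a = select_action(q_values, costs, lambdas)
    lambdas = update_lambdas(lambdas, costs[a], budgets)

print("selected action:", select_action(q_values, costs, lambdas))
print("multipliers:", lambdas)
```

With $\lambda = 0$ the highest-revenue (and most expensive) action is always chosen; as the multipliers grow in response to budget violations, the selection shifts toward cheaper actions, which is the revenue-versus-cost trade-off the abstract describes.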
