
MA4DIV: Multi-Agent Reinforcement Learning for Search Result Diversification (2403.17421v2)

Published 26 Mar 2024 in cs.IR and cs.AI

Abstract: The objective of search result diversification (SRD) is to ensure that the selected documents cover as many different subtopics as possible. Existing methods primarily follow a "greedy selection" paradigm, i.e., selecting the document with the highest diversity score one at a time. These approaches tend to be inefficient and are easily trapped in suboptimal states. Some other methods aim to approximately optimize a diversity metric such as $\alpha$-NDCG, but their results also remain suboptimal. To address these challenges, we introduce Multi-Agent reinforcement learning for search result DIVersification (MA4DIV). In this approach, each document is an agent, and search result diversification is modeled as a cooperative task among multiple agents. This formulation allows diversity metrics such as $\alpha$-NDCG to be optimized directly while achieving high training efficiency. We conduct preliminary experiments on public TREC datasets to demonstrate the effectiveness and potential of MA4DIV. Given the limited number of queries in the public TREC datasets, we also construct a large-scale dataset from industry sources and show that MA4DIV achieves substantial improvements in both effectiveness and efficiency over existing baselines on this industrial-scale dataset.
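The $\alpha$-NDCG metric that MA4DIV optimizes rewards rankings that cover many distinct subtopics early, discounting documents whose subtopics have already been seen. The sketch below is not the authors' code, only an illustration of the metric itself; as an assumption for this tiny example it computes the ideal ranking by brute force over permutations, whereas in practice a greedy ideal is used because the exact ideal is NP-hard.

```python
import math
from itertools import permutations

def alpha_dcg(ranking, doc_subtopics, alpha=0.5):
    """alpha-DCG of a ranked list.

    ranking: list of document ids in rank order.
    doc_subtopics: dict mapping document id -> set of subtopic ids it covers.
    A subtopic seen r times earlier contributes a discounted gain (1-alpha)^r.
    """
    seen = {}  # subtopic -> number of earlier documents covering it
    score = 0.0
    for k, doc in enumerate(ranking, start=1):
        gain = sum((1 - alpha) ** seen.get(t, 0) for t in doc_subtopics[doc])
        score += gain / math.log2(k + 1)  # standard rank discount
        for t in doc_subtopics[doc]:
            seen[t] = seen.get(t, 0) + 1
    return score

def alpha_ndcg(ranking, doc_subtopics, alpha=0.5):
    # Brute-force ideal ranking: fine for toy inputs only (illustrative
    # assumption); real evaluators approximate the ideal greedily.
    ideal = max(alpha_dcg(list(p), doc_subtopics, alpha)
                for p in permutations(doc_subtopics))
    return alpha_dcg(ranking, doc_subtopics, alpha) / ideal
```

For example, with documents `a` and `b` covering subtopic 1 and `c` covering subtopic 2, the ranking `['a', 'c', 'b']` scores higher than `['a', 'b', 'c']` because it covers both subtopics within the first two positions.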

