Agent4Ranking: Semantic Robust Ranking via Personalized Query Rewriting Using Multi-agent LLM (2312.15450v1)

Published 24 Dec 2023 in cs.IR

Abstract: Search engines are crucial as they provide an efficient and easy way to access vast amounts of information on the internet for diverse information needs. User queries, even with a specific need, can differ significantly. Prior research has explored the resilience of ranking models against typical query variations like paraphrasing, misspellings, and order changes. Yet, these works overlook how diverse demographics uniquely formulate identical queries. For instance, older individuals tend to construct queries more naturally and in varied order compared to other groups. This demographic diversity necessitates enhancing the adaptability of ranking models to diverse query formulations. To this end, in this paper, we propose a framework that integrates a novel rewriting pipeline that rewrites queries from various demographic perspectives and a novel framework to enhance ranking robustness. To be specific, we use Chain of Thought (CoT) technology to utilize LLMs as agents to emulate various demographic profiles, then use them for efficient query rewriting, and we innovate a robust Multi-gate Mixture of Experts (MMoE) architecture coupled with a hybrid loss function, collectively strengthening the ranking models' robustness. Our extensive experimentation on both public and industrial datasets assesses the efficacy of our query rewriting approach and the enhanced accuracy and robustness of the ranking model. The findings highlight the sophistication and effectiveness of our proposed model.

An Analytical Overview of "Agent4Ranking: Semantic Robust Ranking via Personalized Query Rewriting Using Multi-agent LLM"

This paper addresses a pressing concern in information retrieval: making search engine ranking models robust and effective when handling semantically similar queries that users of different demographic backgrounds formulate in different ways. The authors present a novel framework, Agent4Ranking, which combines a query rewriting pipeline built on LLMs with a robust model architecture trained under a hybrid loss function.

Core Components and Methodology

The paper introduces a two-stage methodology. The first stage employs LLMs to simulate various demographic profiles for query rewriting, tackling the query diversity that arises when different demographic groups express identical information needs differently. Using Chain-of-Thought (CoT) prompting, the LLMs operate as agents that rewrite an original query into multiple variants. Each rewrite is then evaluated so that semantic deviations can be corrected, guarding against a common failure mode in which LLMs hallucinate irrelevant outputs. A minimal sketch of this stage appears below.
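To make the rewriting stage concrete, the sketch below shows one way such demographic agents could be wired up. The agent profiles, the CoT prompt template, the `llm_complete` helper, and the 0.7 similarity threshold are all illustrative assumptions rather than the paper's implementation; the semantic-deviation check is approximated here with an off-the-shelf sentence encoder.

```python
# A minimal sketch of the demographic-agent rewriting stage. The profiles,
# prompt template, `llm_complete` helper, and threshold are illustrative
# assumptions, not the paper's exact implementation.
from sentence_transformers import SentenceTransformer, util

PROFILES = ["a teenager", "an elderly user", "a non-native English speaker"]

COT_TEMPLATE = (
    "You are {profile} using a search engine.\n"
    "Think step by step about how you would phrase this information need,\n"
    "then output only the rewritten query.\n"
    "Original query: {query}\n"
    "Rewritten query:"
)

def llm_complete(prompt: str) -> str:
    """Placeholder for any chat-completion API call (hypothetical)."""
    raise NotImplementedError

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence encoder works

def rewrite_with_agents(query: str, min_sim: float = 0.7) -> list[str]:
    """Rewrite `query` once per demographic profile, keeping only rewrites
    that stay semantically close to the original, a simple guard against
    hallucinated, off-topic outputs."""
    q_emb = encoder.encode(query, convert_to_tensor=True)
    rewrites = []
    for profile in PROFILES:
        prompt = COT_TEMPLATE.format(profile=profile, query=query)
        candidate = llm_complete(prompt).strip()
        c_emb = encoder.encode(candidate, convert_to_tensor=True)
        if util.cos_sim(q_emb, c_emb).item() >= min_sim:
            rewrites.append(candidate)
    return rewrites
```

The rewrites that survive the similarity check are what feed the ranking stage described next.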

In the subsequent stage, the researchers develop a robust Multi-gate Mixture of Experts (MMoE) architecture. This is complemented by a hybrid loss function incorporating Jensen-Shannon divergence to ensure consistent ranking outcomes across the spectrum of rewritten queries. The MMoE setup dynamically identifies and capitalizes on shared semantic content inherent in the varied queries produced by the LLM agents, thus enhancing the ranking model's resilience.
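A toy PyTorch sketch of this idea follows. The expert and gate sizes, the pointwise binary cross-entropy ranking term, and the mixing weight `alpha` are illustrative assumptions, not the paper's configuration; the point is the shape of the architecture and of the Jensen-Shannon consistency term that ties together the score distributions produced for the original query and a rewrite.

```python
# A toy MMoE scorer plus a hybrid loss with a Jensen-Shannon consistency
# term. Sizes, the BCE ranking term, and `alpha` are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MMoEScorer(nn.Module):
    def __init__(self, d_in: int, d_expert: int = 128,
                 n_experts: int = 4, n_views: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(d_in, d_expert), nn.ReLU())
             for _ in range(n_experts)]
        )
        # one gate per query view (original vs. rewritten)
        self.gates = nn.ModuleList(
            [nn.Linear(d_in, n_experts) for _ in range(n_views)]
        )
        self.scorer = nn.Linear(d_expert, 1)

    def forward(self, x: torch.Tensor, view: int) -> torch.Tensor:
        expert_out = torch.stack([e(x) for e in self.experts], dim=1)  # (B, E, d)
        weights = F.softmax(self.gates[view](x), dim=-1)               # (B, E)
        mixed = (weights.unsqueeze(-1) * expert_out).sum(dim=1)        # (B, d)
        return self.scorer(mixed).squeeze(-1)                          # (B,) scores

def js_divergence(p_logits: torch.Tensor, q_logits: torch.Tensor) -> torch.Tensor:
    """Jensen-Shannon divergence between two score distributions."""
    p, q = F.softmax(p_logits, dim=-1), F.softmax(q_logits, dim=-1)
    m = 0.5 * (p + q)
    return 0.5 * (F.kl_div(m.log(), p, reduction="batchmean")
                  + F.kl_div(m.log(), q, reduction="batchmean"))

def hybrid_loss(scores_orig, scores_rewrite, labels, alpha: float = 0.5):
    """Ranking loss on the original query's candidate list, plus a JS term
    that pushes the original and rewritten views to rank candidates alike."""
    rank_loss = F.binary_cross_entropy_with_logits(scores_orig, labels)
    consistency = js_divergence(scores_orig.unsqueeze(0),
                                scores_rewrite.unsqueeze(0))
    return rank_loss + alpha * consistency
```

In this sketch the shared experts can specialize in the semantics common to all formulations, while separate gates let each query view weight those experts differently; the JS term then penalizes the two views for scoring the same candidate list inconsistently, which is the robustness behavior the paper's hybrid loss aims to enforce.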

Experimental Evaluation

Extensive experimental validation was conducted on both public and industrial datasets. The results demonstrate the efficacy of the LLM-driven rewriting approach and the robustness gains achieved by the new ranking framework. Quantitative assessments show that combining demographic-aware query rewriting with the MMoE architecture yields consistent ranking performance across semantically similar input variations.

Implications and Future Directions

The integration of diverse demographic perspectives via LLMs, as demonstrated in Agent4Ranking, carries significant implications for personalized information retrieval. The approach opens avenues for employing LLMs in multi-agent settings to tackle domain-specific challenges in search engines and beyond. The robust MMoE architecture, coupled with targeted loss functions, contributes to a broader understanding of how robustness in ranking models can be systematically achieved and measured.

Future research could focus on refining the LLM agent framework to further reduce hallucination and improve prompt engineering strategies. The application of such robust ranking systems could also be explored in other domains, such as recommender systems or conversational agents, where comparable semantic variation in user inputs poses similar robustness challenges.

Overall, the proposed research enriches the domain of information retrieval by integrating cutting-edge language generation techniques with robust modeling strategies, steering the field toward adaptable search systems capable of handling the nuanced diversity of user inputs.

Authors (7)
  1. Xiaopeng Li
  2. Lixin Su
  3. Pengyue Jia
  4. Xiangyu Zhao
  5. Suqi Cheng
  6. Junfeng Wang
  7. Dawei Yin
Citations (10)