Multi-Conditional Ranking with Large Language Models (2404.00211v3)
Abstract: Utilizing LLMs to rank a set of items has become a common approach in recommendation and retrieval systems. Typically, these systems focus on ordering a substantial number of documents in a monotonic order based on a given query. However, real-world scenarios often present a different challenge: ranking a comparatively smaller set of items, but according to a variety of diverse and occasionally conflicting conditions. In this paper, we define and explore the task of multi-conditional ranking by introducing MCRank, a benchmark tailored for assessing multi-conditional ranking across various item types and conditions. Our analysis of LLMs using MCRank indicates a significant decrease in performance as the number and complexity of items and conditions grow. To overcome this limitation, we propose a novel decomposed reasoning method, consisting of EXtracting and Sorting the conditions, and then Iteratively Ranking the items (EXSIR). Our extensive experiments show that this decomposed reasoning method enhances LLMs' performance significantly, achieving up to a 14.4% improvement over existing LLMs. We also provide a detailed analysis of LLMs' performance across various condition categories and examine the effectiveness of the decomposition step. Furthermore, we compare our method with existing approaches such as Chain-of-Thought and existing ranking models, demonstrating the superiority of our approach and the complexity of the MCR task. We release our dataset and code.
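The decomposed pipeline named in the abstract (extract and sort the conditions, then iteratively re-rank the items) can be sketched deterministically. This is a minimal illustration, not the paper's implementation: toy key functions stand in for the LLM's handling of each condition, whereas EXSIR prompts an LLM at every step. The function name `exsir_rank` and the `(priority, key_fn)` condition encoding are assumptions made for this sketch.

```python
def exsir_rank(items, conditions):
    """Sketch of EXSIR-style decomposed ranking.

    conditions: list of (priority, key_fn) pairs; a higher priority means
    the condition is applied later and therefore dominates the ordering.
    (In the actual method, an LLM extracts, sorts, and applies conditions.)
    """
    # Step 1-2: extract the conditions and sort them from lowest to
    # highest priority.
    ordered = sorted(conditions, key=lambda c: c[0])
    ranking = list(items)
    # Step 3: iteratively re-rank, one condition per pass. Python's sort
    # is stable, so earlier (lower-priority) orderings survive as
    # tie-breakers under later (higher-priority) passes.
    for _, key_fn in ordered:
        ranking.sort(key=key_fn)
    return ranking

items = ["apple", "fig", "banana", "kiwi"]
conditions = [
    (2, len),             # high priority: shorter words first
    (1, lambda w: w),     # low priority: alphabetical tie-break
]
print(exsir_rank(items, conditions))  # ['fig', 'kiwi', 'apple', 'banana']
```

The stable-sort trick mirrors the intuition behind the iterative step: applying the highest-priority condition last ensures it controls the final order, while lower-priority conditions only break ties.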