APEER: Automatic Prompt Engineering Enhances LLM Reranking
The paper "APEER: Automatic Prompt Engineering Enhances LLM Reranking" presents a sophisticated approach to improving prompt engineering for LLMs in the context of information retrieval (IR). The focus is on enhancing the reranking module, which plays a crucial role in determining the relevance of information to user queries.
The authors identify a significant gap in the current landscape: zero-shot LLM relevance ranking depends heavily on manual prompt engineering, a process that demands extensive human expertise and time yet lacks objective guidelines. To address this, they introduce APEER, an automatic prompt engineering algorithm that iteratively refines prompts through feedback and preference optimization, reducing the need for human intervention.
APEER's training process involves two key optimization steps: feedback optimization and preference optimization. In feedback optimization, the LLM iteratively reviews performance-based feedback on the current prompt and generates a refined version. Preference optimization then leverages positive and negative prompt demonstrations to refine the prompt further, steering it toward desired outcomes. This two-step process aims to push prompt quality in reranking tasks beyond what manual crafting achieves; a minimal sketch of the loop follows.
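The sketch below illustrates the two-step refinement loop under stated assumptions: a generic `llm(text) -> str` completion function and a `score(prompt) -> float` evaluator (e.g., nDCG@10 on a held-out query set). The helper names and prompt templates are hypothetical and not the paper's exact implementation.

```python
def feedback_step(llm, prompt, score):
    """Feedback optimization: the LLM critiques the current prompt given
    its measured performance, then proposes a refined candidate."""
    critique = llm(
        f"The following reranking prompt scored {score(prompt):.4f}:\n"
        f"{prompt}\n"
        "Explain its weaknesses for ranking passages by query relevance."
    )
    return llm(
        f"Prompt:\n{prompt}\nCritique:\n{critique}\n"
        "Write an improved reranking prompt that addresses the critique."
    )

def preference_step(llm, prompt, positives, negatives):
    """Preference optimization: refine the prompt using positive and
    negative prompt demonstrations collected so far."""
    demos = "\n".join(
        [f"GOOD PROMPT:\n{p}" for p in positives] +
        [f"BAD PROMPT:\n{n}" for n in negatives]
    )
    return llm(
        f"{demos}\nCurrent prompt:\n{prompt}\n"
        "Rewrite the current prompt so it resembles the good prompts "
        "and avoids the failure modes of the bad prompts."
    )

def apeer(llm, seed_prompt, score, iterations=5):
    """Iteratively refine a seed prompt, keeping the best-scoring one."""
    best, best_score = seed_prompt, score(seed_prompt)
    positives, negatives = [seed_prompt], []
    for _ in range(iterations):
        candidate = feedback_step(llm, best, score)
        candidate = preference_step(llm, candidate, positives, negatives)
        s = score(candidate)
        # Successful candidates become positive demonstrations; the rest
        # become negative demonstrations for later preference steps.
        (positives if s > best_score else negatives).append(candidate)
        if s > best_score:
            best, best_score = candidate, s
    return best
```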
To demonstrate the efficacy of APEER, the authors conducted extensive experiments using four distinct LLM architectures: GPT3.5-Turbo-0301, GPT4-0613, LLaMA3-70B, and Qwen2-72B, across ten datasets. APEER consistently outperformed state-of-the-art manual prompts; on standard benchmarks such as TREC-DL19 and TREC-DL20, it yielded significant gains in nDCG, showing that improved prompt quality translates directly into improved search relevance.
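For context, nDCG@k, the metric these results are reported in, compares a system's ranking against the ideal ordering of the same judged passages. A minimal implementation of one common formulation is sketched below; actual TREC evaluation typically relies on tooling such as trec_eval or pytrec_eval.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over a ranked list of relevance grades.
    Ranks are 0-indexed, hence the log2(rank + 2) discount."""
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=10):
    """DCG of the system's ranking, normalized by the ideal (sorted) DCG."""
    ideal_dcg = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# Example: a reranked list with graded labels (3 = highly relevant).
print(ndcg_at_k([3, 2, 0, 1], k=10))  # ~0.985 relative to the ideal order
```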
One of the key insights from the research is the transferability of APEER-generated prompts across tasks and LLMs. The findings suggest that prompts optimized via APEER not only perform well in in-domain scenarios but also adapt effectively to out-of-domain tasks and to different model architectures. This transferability underscores the robustness and practical utility of APEER-generated prompts, offering a flexible and scalable solution for varied IR tasks.
The implications of this research are substantial both theoretically and practically. Theoretically, it advances our understanding of automatic prompt engineering and its capacity to enhance LLM-based reranking. Practically, it promises significant efficiency improvements in IR systems, reducing the reliance on manual prompt crafting and enabling LLMs to adapt more readily to evolving data.
Looking ahead, APEER opens several avenues for future research: integrating it with other advanced retrieval systems, refining the feedback and preference optimization processes, and extending the method to domains beyond traditional IR. The advance in automatic prompt engineering for LLMs presented by APEER thus represents a meaningful stride toward more intelligent and autonomous language processing systems.