
ProRefine: Inference-time Prompt Refinement with Textual Feedback (2506.05305v1)

Published 5 Jun 2025 in cs.CL, cs.AI, and cs.LG

Abstract: Agentic workflows, where multiple AI agents collaborate to accomplish complex tasks like reasoning or planning, are becoming increasingly prevalent. However, these workflows often suffer from error propagation and sub-optimal performance, largely due to poorly designed prompts that fail to effectively guide individual agents. This is a critical problem because it limits the reliability and scalability of these powerful systems. We introduce ProRefine, an innovative inference-time prompt optimization method that leverages textual feedback from LLMs to address this challenge. ProRefine dynamically refines prompts for multi-step reasoning tasks without additional training or ground truth labels. Evaluated on five benchmark mathematical reasoning datasets, ProRefine significantly surpasses zero-shot Chain-of-Thought baselines by 3 to 37 percentage points. This approach not only boosts accuracy but also allows smaller models to match the performance of larger ones, highlighting its potential for efficient and scalable AI deployment, and democratizing access to high-performing AI.


Summary

  • The paper introduces ProRefine, an inference-time prompt optimization method that iteratively refines prompts using LLM textual feedback.
  • Experimental results show performance improvements of 3 to 37 percentage points across five benchmarks, enabling smaller models to rival larger ones.
  • The approach enhances AI reasoning and efficiency by leveraging a tri-partite LLM framework, suggesting promising directions for feedback-driven adaptation.

ProRefine: Inference-time Prompt Refinement with Textual Feedback

The advancement of agentic workflows in AI systems has drawn attention to the challenges inherent in multi-agent collaboration, particularly for tasks such as reasoning and planning. These workflows often struggle with error propagation and inefficiency caused by suboptimal prompting. ProRefine addresses this issue with an inference-time prompt optimization technique that uses LLM-generated textual feedback.

Key Concepts and Methodology

ProRefine operates at inference time to refine prompts, significantly enhancing performance without additional training data or ground truth labels. The methodology assigns three distinct roles within the LLM framework: LLM_task, responsible for task execution; LLM_feedback, which critiques the task model's outputs; and LLM_optimizer, which uses that feedback to modify the prompt iteratively. This tri-partite system enhances the reasoning capabilities of LLMs, showing improvements on tasks traditionally challenging for zero-shot approaches.
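
The paper does not include reference code here, but the three roles can be pictured with a minimal sketch. Everything below is illustrative: the OpenAI client and model name are stand-ins for whatever backend is used, and the role instructions are paraphrases, not the paper's actual prompts.

```python
# Minimal sketch of ProRefine's three LLM roles, under the assumptions above.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o-mini"  # placeholder; the paper pairs various task/feedback models

def complete(prompt: str, model: str = MODEL) -> str:
    """Single-turn chat completion."""
    resp = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

def run_task(task_prompt: str, question: str) -> str:
    """LLM_task: attempt the task under the current prompt."""
    return complete(f"{task_prompt}\n\nQuestion: {question}")

def critique(question: str, answer: str) -> str:
    """LLM_feedback: produce a textual critique of the candidate answer."""
    return complete(
        "Critique the answer below. Point out reasoning errors and unclear "
        "steps, or reply 'no issues' if the answer is sound.\n\n"
        f"Question: {question}\nAnswer: {answer}"
    )

def refine_prompt(task_prompt: str, feedback: str) -> str:
    """LLM_optimizer: rewrite the task prompt to address the critique."""
    return complete(
        "Rewrite the prompt below so the task model avoids the issues in "
        "the feedback. Return only the new prompt.\n\n"
        f"Prompt: {task_prompt}\nFeedback: {feedback}"
    )
```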

The process involves initializing a prompt, generating output bounded by a token budget, receiving a textual critique from the feedback model, and incrementally refining the prompt through the dedicated optimizer. This adaptive workflow lets smaller models match, and occasionally surpass, the performance of larger ones while conserving computational resources.
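
Wiring the roles together, the loop itself is short. The sketch below uses the helpers above; the iteration budget and the stopping heuristic are assumptions, not the paper's reported settings.

```python
def prorefine(question: str, init_prompt: str, max_iters: int = 3) -> str:
    """Inference-time prompt refinement loop (sketch; budget and stop rule assumed)."""
    prompt = init_prompt
    answer = run_task(prompt, question)
    for _ in range(max_iters):
        feedback = critique(question, answer)
        if "no issues" in feedback.lower():  # naive stop: the critic is satisfied
            break
        prompt = refine_prompt(prompt, feedback)  # optimizer rewrites the prompt
        answer = run_task(prompt, question)       # retry under the refined prompt
    return answer

# Example call (hypothetical prompt wording):
# prorefine("A farm has 3 hens and 2 cows. How many legs in total?",
#           "Reason step by step, then state the final answer.")
```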

Experimental Results and Analysis

ProRefine was tested across five benchmark datasets: object counting, word sorting, GSM8K, SVAMP, and AQuA-RAT. The results showed substantial performance improvements ranging from 3 to 37 percentage points over zero-shot Chain-of-Thought (CoT) baselines. The method was especially strong on object counting and GSM8K, showcasing the strength of iterative refinement in reasoning tasks. Furthermore, the approach allowed smaller, typically less capable models to rival their larger counterparts, demonstrating potential for democratizing access to high-performing AI models.

The results also highlighted the importance of verifier accuracy at inference time: further experiments with an optimal verifier yielded the best performance in most cases. The upper bound achieved with such an oracle verifier underscores the need for precise feedback mechanisms within the agentic framework.
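
A verifier slots into the same loop as an acceptance test. The sketch below is an assumption about structure, not the paper's implementation: `verify` can be another LLM call or, to estimate the upper bound discussed above, an oracle that compares against gold labels.

```python
from typing import Callable

def prorefine_verified(question: str, init_prompt: str,
                       verify: Callable[[str, str], bool],
                       max_iters: int = 3) -> str:
    """Refine until the verifier accepts the answer or the budget runs out."""
    prompt = init_prompt
    answer = run_task(prompt, question)
    for _ in range(max_iters):
        if verify(question, answer):  # verifier accepts: stop refining
            break
        feedback = critique(question, answer)
        prompt = refine_prompt(prompt, feedback)
        answer = run_task(prompt, question)
    return answer

# Oracle verifier for upper-bound analysis (hypothetical helpers; requires
# gold labels): verify = lambda q, a: extract_answer(a) == gold[q]
```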

Implications and Future Directions

Practically, ProRefine offers a promising route to optimizing prompts at inference time and improving the efficiency of AI systems, particularly in resource-constrained environments. Theoretically, it points toward more granular, feedback-driven adaptation within AI modeling, potentially enriching the interpretability of LLMs through transparent textual feedback.

Future research could explore further optimization of feedback granularity, real-time adaptation, and integration of smaller models as feedback agents. Additionally, refining the implementation of verifiers tailored for specific tasks could enhance performance reliability and transparency.

Conclusion

ProRefine's approach to inference-time prompt refinement introduces a novel adaptive mechanism to enhance multi-step reasoning in AI systems. By leveraging the dynamic feedback of LLMs, it fosters both theoretical and practical advancements in AI deployment, offering a valuable alternative to data-intensive fine-tuning and contributing to scalable, efficient AI solutions. Its success in bridging the performance gap between model sizes suggests significant potential for future applications and developments in AI systems.
