Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
92 tokens/sec
Gemini 2.5 Pro Premium
46 tokens/sec
GPT-5 Medium
19 tokens/sec
GPT-5 High Premium
32 tokens/sec
GPT-4o
87 tokens/sec
DeepSeek R1 via Azure Premium
98 tokens/sec
GPT OSS 120B via Groq Premium
435 tokens/sec
Kimi K2 via Groq Premium
207 tokens/sec
2000 character limit reached

PMPO: Probabilistic Metric Prompt Optimization for Small and Large Language Models (2505.16307v1)

Published 22 May 2025 in cs.CL, cs.AI, and cs.LG

Abstract: Prompt optimization offers a practical and broadly applicable alternative to fine-tuning for improving LLM performance. However, existing methods often rely on costly output generation, self-critiquing abilities, or human-annotated preferences, which limit their scalability, especially for smaller or non-instruction-tuned models. We introduce PMPO (Probabilistic Metric Prompt Optimization), a unified framework that refines prompts using token-level cross-entropy loss as a direct, lightweight evaluation signal. PMPO identifies low-quality prompt segments by masking and measuring their impact on loss, then rewrites and selects improved variants by minimizing loss over positive and negative examples. Unlike prior methods, it requires no output sampling or human evaluation during optimization, relying only on forward passes and log-likelihoods. PMPO supports both supervised and preference-based tasks through a closely aligned loss-based evaluation strategy. Experiments show that PMPO consistently outperforms prior methods across model sizes and tasks: it achieves the highest average accuracy on BBH, performs strongly on GSM8K and AQUA-RAT, and improves AlpacaEval 2.0 win rates by over 19 points. These results highlight PMPO's effectiveness, efficiency, and broad applicability.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.

Summary

We haven't generated a summary for this paper yet.

Dice Question Streamline Icon: https://streamlinehq.com

Follow-up Questions

We haven't generated follow-up questions for this paper yet.