LongDPO: Unlock Better Long-form Generation Abilities for LLMs via Critique-augmented Stepwise Information (2502.02095v2)

Published 4 Feb 2025 in cs.CL

Abstract: Long-form generation is crucial for tasks such as writing academic papers and repo-level code generation. Despite this, current models, including GPT-4o, still exhibit unsatisfactory performance. Existing methods that utilize preference learning with outcome supervision often fail to provide detailed feedback for extended contexts. This shortcoming can lead to content that does not fully satisfy query requirements, resulting in issues such as length deviations and diminished quality. In this paper, we propose enhancing long-form generation by incorporating process supervision. We employ Monte Carlo Tree Search to gather stepwise preference pairs, utilizing a global memory pool to maintain consistency. To address the issue of suboptimal candidate selection, we integrate external critiques to refine and improve the quality of the preference pairs. Finally, we apply step-level DPO using the collected stepwise preference pairs. Experimental results show that our method improves length and quality on long-form generation benchmarks, with almost lossless performance on general benchmarks across various model backbones.
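To make the step-level DPO component of the abstract concrete, the sketch below shows the standard DPO objective applied to a batch of stepwise preference pairs rather than whole responses. It is a minimal illustration, not the paper's implementation: the function name `step_dpo_loss`, the `beta` value, and the assumption that each step's token log-probabilities have already been summed into a single scalar per candidate are all assumptions made here for clarity.

```python
import torch
import torch.nn.functional as F

def step_dpo_loss(policy_chosen_logps: torch.Tensor,
                  policy_rejected_logps: torch.Tensor,
                  ref_chosen_logps: torch.Tensor,
                  ref_rejected_logps: torch.Tensor,
                  beta: float = 0.1) -> torch.Tensor:
    """Step-level DPO loss over a batch of stepwise preference pairs.

    Each tensor holds the summed token log-probability of one generation step
    (the preferred or the rejected candidate for that step) under either the
    policy being trained or the frozen reference model.
    """
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # Standard DPO objective, applied per step instead of per full response.
    return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio)).mean()

# Toy usage with dummy per-step log-probabilities for a batch of 4 steps.
pol_c = torch.tensor([-5.0, -6.1, -4.8, -7.0])
pol_r = torch.tensor([-6.5, -6.0, -5.9, -8.2])
ref_c = torch.tensor([-5.5, -6.3, -5.0, -7.4])
ref_r = torch.tensor([-6.2, -6.1, -5.7, -8.0])
print(step_dpo_loss(pol_c, pol_r, ref_c, ref_r))
```

In the paper's pipeline, the preference pairs fed to this loss come from MCTS expansion of candidate steps, with external critiques used to refine low-quality candidates before the pairs are kept; the sketch only covers the final optimization step.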

Authors (6)
  1. Bowen Ping (5 papers)
  2. Jiali Zeng (24 papers)
  3. Fandong Meng (174 papers)
  4. Shuo Wang (382 papers)
  5. Jie Zhou (687 papers)
  6. Shanghang Zhang (173 papers)
