- The paper presents the Suri dataset and I-ORPO method to enhance LLMs' ability to follow complex, multi-constraint instructions in long-form text generation.
- It leverages backtranslation and synthetically corrupted instructions to fine-tune models, resulting in texts averaging roughly 5,000 tokens with improved coherence.
- Ranking experiments show that Suri-I-ORPO outperforms baseline models by at least 10% in distinguishing correct from corrupted instructions, and human evaluators prefer its generations over Suri-SFT's.
Suri: Multi-constraint Instruction Following for Long-form Text Generation
The paper "Suri: Multi-constraint Instruction Following for Long-form Text Generation" presents an in-depth paper on enhancing the instruction-following capabilities of LLMs in the context of generating long-form text. Authored by Chau Minh Pham, Simeng Sun, and Mohit Iyyer from the University of Massachusetts Amherst, the paper explores complex, multi-constraint instruction following—a topic that has been underexplored in the domain of LLM-based text generation.
Overview
The paper introduces Suri, a dataset comprising 20,000 human-written long-form texts accompanied by LLM-generated backtranslated instructions containing multiple complex constraints. The Suri dataset is notable for its combination of long-form outputs (up to 5,024 tokens) and intricate, multi-faceted instructions. This union offers a unique opportunity to fine-tune LLMs to follow complex directives over extended textual spans, a feat that traditional datasets like Alpaca have not achieved.
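To make the dataset's shape concrete, the snippet below sketches what a single Suri example might look like. This is a minimal illustration only: the field names, the instruction text, and the constraint wording are assumptions for exposition, not the released dataset's actual schema.

```python
# Hypothetical structure of one Suri example (field names and contents are
# illustrative; they do not reproduce the released dataset's schema).
example = {
    # Backtranslated main instruction generated by an LLM from the gold text.
    "instruction": "Write a chapter of a mystery novel set in a coastal town.",
    # Multiple fine-grained constraints the gold text already satisfies.
    "constraints": [
        "Narrate from the first-person perspective of the detective.",
        "Introduce exactly three suspects before the midpoint of the chapter.",
        "End the chapter on an unresolved discovery.",
    ],
    # Human-written long-form text (up to ~5,000 tokens) drawn from corpora
    # such as ChapterBreak, Books3, or RedPajama-Data-v2.
    "response": "The fog rolled off the harbor before dawn ...",
}
```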
Methodology
The authors outline the creation and utilization of the Suri dataset through two primary contributions:
- Dataset Construction:
- The dataset includes long human-written texts sourced from existing corpora such as ChapterBreak, Books3, and RedPajama-Data-v2.
- Backtranslation techniques are used to generate comprehensive instructions for these texts, followed by the creation of synthetically corrupted instructions to facilitate preference tuning (see the prompt sketch after this list).
- Alignment Using I-ORPO:
- The paper introduces Instructional Odds Ratio Preference Optimization (I-ORPO), a variant of the ORPO algorithm.
- I-ORPO uses synthetically corrupted instructions instead of dispreferred responses, which proves to be a robust alignment strategy for LLMs when human preference data on long-form text are impractical to obtain (see the loss sketch after this list).
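As a rough illustration of the dataset-construction step above, the templates below show how the backtranslation and corruption prompts might be phrased. These are hypothetical templates written for exposition; the paper's actual prompt wording is not reproduced here.

```python
# Hypothetical prompt templates for the two LLM calls used to build Suri
# (backtranslation, then corruption); wording here is illustrative only.
BACKTRANSLATE_PROMPT = """\
Read the document below and write the instruction it best satisfies:
one main goal plus a numbered list of specific stylistic and content
constraints that the document already follows.

Document:
{document}
"""

CORRUPT_PROMPT = """\
Minimally edit each constraint below so that the document would no longer
satisfy it, while keeping the wording and length as close to the original
as possible.

Constraints:
{constraints}
"""
```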
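The following is a minimal sketch of an I-ORPO-style objective, assuming ORPO's length-normalized odds-ratio formulation with the gold response held fixed and only the instruction swapped between its intact and corrupted versions. The helper names, the Hugging Face-style model interface, and the default λ are assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def token_logprobs(model, input_ids, labels):
    """Mean per-token log-probability of the response, given the instruction.

    `labels` should be -100 on instruction positions so that only response
    tokens contribute (standard causal-LM masking).
    """
    logits = model(input_ids).logits[:, :-1, :]  # predict token t+1 from t
    labels = labels[:, 1:]
    mask = labels != -100
    logps = torch.gather(
        F.log_softmax(logits, dim=-1), 2, labels.clamp(min=0).unsqueeze(-1)
    ).squeeze(-1)
    return (logps * mask).sum(-1) / mask.sum(-1)

def i_orpo_loss(model, batch, lam=0.1):
    """Same gold response y, contrasted under the intact instruction (w)
    and the corrupted instruction (l)."""
    logp_w = token_logprobs(model, batch["input_ids_w"], batch["labels_w"])
    logp_l = token_logprobs(model, batch["input_ids_l"], batch["labels_l"])

    # Log-odds ratio of the length-normalized likelihoods, as in ORPO.
    log_odds = (logp_w - logp_l) - (
        torch.log1p(-torch.exp(logp_w)) - torch.log1p(-torch.exp(logp_l))
    )
    ratio_loss = -F.logsigmoid(log_odds).mean()

    nll_loss = -logp_w.mean()  # standard SFT term on the intact pair
    return nll_loss + lam * ratio_loss
```

Because the corrupted instruction differs from the intact one only by small constraint edits, the odds-ratio term pushes the model to assign the gold response a higher likelihood under the instruction it actually satisfies.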
Key Findings
Evaluations and Results:
- The Suri-I-ORPO and Suri-SFT (supervised fine-tuning) models produced texts averaging 5,100 and 4,800 tokens, respectively, markedly longer than those generated by baseline models such as Mistral-7B-Instruct-v0.2 and Llama-3-8B-Instruct.
- Human evaluations show a strong preference for Suri-I-ORPO over Suri-SFT in terms of coherence, informativeness, and readability.
- The models maintained low levels of n-gram repetition, indicating sustained textual quality even for long-form outputs (a simple repetition check is sketched after this list).
- Ranking accuracy experiments revealed that Suri-I-ORPO achieved a significant improvement (at least 10%) over baseline models in distinguishing between correct and corrupted instructions (see the ranking sketch after this list).
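A simple way to measure the repetition behavior mentioned above is the fraction of n-grams in a generation that repeat an earlier n-gram. The sketch below uses this common definition; the paper's exact repetition statistic may differ.

```python
from collections import Counter

def ngram_repetition(tokens, n=4):
    """Share of n-grams that repeat an n-gram seen earlier in the same text;
    lower is better (illustrative definition)."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    repeated = sum(c - 1 for c in counts.values())
    return repeated / len(ngrams)
```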
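One plausible way to compute such a ranking accuracy, reusing the token_logprobs helper from the I-ORPO sketch above, is to count how often the model assigns the gold response a higher likelihood under the intact instruction than under its corrupted counterpart. The data-field names below are illustrative, and the paper's exact evaluation setup may differ.

```python
def ranking_accuracy(model, examples):
    """Fraction of examples where the gold response is more likely under the
    intact instruction (w) than under the corrupted one (l).

    Each example is assumed to be a single-item batch of tensors.
    """
    correct = 0
    for ex in examples:
        lp_intact = token_logprobs(model, ex["input_ids_w"], ex["labels_w"])
        lp_corrupt = token_logprobs(model, ex["input_ids_l"], ex["labels_l"])
        correct += int(lp_intact.item() > lp_corrupt.item())
    return correct / len(examples)
```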
Implications
Theoretical Implications:
- The introduction of multi-constraint instructions combined with extended text generation challenges existing paradigms in LLM fine-tuning and necessitates more complex alignment techniques.
- The success of I-ORPO suggests that leveraging corrupted instructions as negative feedback is a viable approach in the absence of human preference data, a scenario often encountered in real-world applications.
Practical Implications:
- Suri-I-ORPO's ability to generate coherent long-form text with intricate constraints could significantly benefit industries requiring detailed report generation, creative writing, and comprehensive content creation.
- The methods proposed could be adapted for LLMs in other languages and genres, broadening the utility of advanced instruction-following models.
Future Directions
Examining how different LLM architectures respond to fine-tuning using the Suri dataset could yield further insights into model-specific intricacies. Additionally, exploring the influence of surface features, such as instruction length, and varying the degree of constraint violations could refine the I-ORPO method. Lastly, testing these models on shorter-context tasks would help understand any trade-offs associated with optimizing for long-form generation.
Conclusion
"Suri: Multi-constraint Instruction Following for Long-form Text Generation" offers a comprehensive methodology and dataset for enhancing LLM capabilities in following complex instructions over long textual spans. By introducing the Suri dataset and the I-ORPO alignment method, the authors provide valuable contributions to the field of AI-driven text generation, paving the way for more advanced and nuanced LLM applications.