In the paper discussed, the focus is on optimizing LLMs to better follow instructions. Although LLMs can handle a variety of unseen tasks when guided by textual instructions, their ability to follow those instructions accurately still leaves room for improvement. Two existing approaches to enhancing instruction following have been identified: Scaling-Inputs, which increases the number of input-output pairs per task, and Scaling Input-Free Tasks, which collects instructions that require no additional inputs. However, Scaling-Inputs can yield LLMs that are overly sensitive and misinterpret instructions, while Scaling Input-Free Tasks tends to produce models that handle tasks with supplementary inputs less effectively.
To address these issues, the researchers introduce an alternative methodology termed 'Scaling Tasks per Input.' This approach diversifies the tasks associated with each input, training models to produce different outputs from the same input depending on the instruction. The newly curated dataset, MUFFIN (Multi-Faceted Instructions), is the first to employ this Scaling Tasks per Input paradigm.
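To make the contrast with prior paradigms concrete, the layout below sketches what a Scaling-Tasks-per-Input example might look like: one shared input paired with several distinct instructions, each demanding a different output. The field names and sample tasks are illustrative assumptions, not MUFFIN's actual schema.

```python
# Hypothetical data layout for the Scaling-Tasks-per-Input paradigm.
# Field names ("input", "tasks", ...) are assumptions for illustration,
# not the real MUFFIN schema.
muffin_style_example = {
    "input": "The quick brown fox jumps over the lazy dog.",
    "tasks": [
        {"instruction": "Translate the input into French.",
         "output": "Le renard brun rapide saute par-dessus le chien paresseux."},
        {"instruction": "Count the number of words in the input.",
         "output": "9"},
        {"instruction": "Classify the sentiment of the input as "
                        "positive, negative, or neutral.",
         "output": "neutral"},
    ],
}

def tasks_per_input(example: dict) -> int:
    """Return how many distinct instructions share this one input."""
    return len(example["tasks"])
```

Under Scaling-Inputs this same structure would be inverted: many inputs attached to one instruction, which is exactly the skew the new paradigm is meant to avoid.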
The implementation of MUFFIN involves two key challenges: designing a variety of tasks for the same input and balancing classification and generation task types within the dataset. The paper outlines two strategies to meet these challenges. The first strategy, Instruction Brainstorm, involves using LLMs to generate multiple task instructions relevant to different facets of an input (e.g., language, length, intent). The second strategy, Instruction Rematching, repurposes high-quality instructions from existing datasets by determining their relevance to specific inputs.
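The two strategies above can be sketched as a small pipeline. The `llm` function below is a stub standing in for a real model call, and the facet list, prompt wording, and helper names are all assumptions for illustration, not the paper's actual prompts.

```python
# Minimal sketch of the two MUFFIN construction strategies.
# `llm` is a stand-in for a real LLM call; facets, prompts, and
# function names are illustrative assumptions.

FACETS = ["language", "length", "intent"]

def llm(prompt: str) -> str:
    # Stub: a real implementation would query an LLM here.
    if prompt.startswith("Brainstorm"):
        return "Rewrite the input in formal English."
    return "yes"  # rematching relevance verdict

def instruction_brainstorm(text: str) -> list[str]:
    """Strategy 1: prompt the LLM once per input facet to
    propose a new task instruction grounded in that facet."""
    return [
        llm(f"Brainstorm a task about the {facet} of this input: {text}")
        for facet in FACETS
    ]

def instruction_rematch(text: str, candidates: list[str]) -> list[str]:
    """Strategy 2: keep only those existing-dataset instructions
    that the LLM judges applicable to this particular input."""
    return [
        ins for ins in candidates
        if llm(f"Does this instruction apply to the input? {ins} | {text}") == "yes"
    ]
```

The design point is that both strategies condition on the input: brainstorming generates new instructions per facet, while rematching filters an existing instruction pool for relevance, together covering both generation- and classification-style tasks.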
The researchers trained LLMs on MUFFIN and compared their performance against models trained on prior datasets, evaluating on four zero-shot benchmarks. The results showed that MUFFIN-trained LLMs achieved stronger instruction-following performance on three of the four benchmarks, outperforming models trained with earlier methods. MUFFIN's effectiveness in improving LLMs' adherence to complex instructions was further affirmed through comprehensive human evaluation and additional analyses.
In summary, this work contributes to the field by introducing a novel paradigm for constructing instruction-following datasets, which fosters the generation of instruction-diverse responses from LLMs. The proposed MUFFIN dataset represents a significant advancement in training LLMs for real-world tasks that require precise adherence to given instructions.