In the paper discussed, the focus is on optimizing LLMs to better follow instructions. Although LLMs can handle a variety of unseen tasks when guided by textual instructions, their ability to follow those instructions accurately still leaves room for improvement. Two existing approaches to enhancing instruction following have been identified: Scaling-Inputs, which increases the number of input-output pairs per task, and Scaling Input-Free Tasks, which collects instructions that require no additional inputs. However, Scaling-Inputs can yield LLMs that are overly sensitive and misinterpret instructions, while Scaling Input-Free Tasks tends to produce models that handle tasks with supplementary inputs less effectively.
To address these issues, the researchers introduce an alternative methodology termed 'Scaling Tasks per Input.' This approach diversifies the tasks associated with each input, training models to produce different outputs from the same input depending on the instruction. The newly curated dataset, MUFFIN (Multi-Faceted Instructions), is the first to employ this Scaling Tasks per Input paradigm.
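To make the contrast with prior paradigms concrete, the layout below sketches what a Scaling-Tasks-per-Input example might look like: one shared input paired with several distinct instructions, each demanding a different output. The field names and sample tasks are illustrative assumptions, not MUFFIN's actual schema.

```python
# Hypothetical data layout for the Scaling-Tasks-per-Input paradigm.
# Field names ("input", "tasks", ...) are assumptions for illustration,
# not the real MUFFIN schema.
muffin_style_example = {
    "input": "The quick brown fox jumps over the lazy dog.",
    "tasks": [
        {"instruction": "Translate the input into French.",
         "output": "Le renard brun rapide saute par-dessus le chien paresseux."},
        {"instruction": "Count the number of words in the input.",
         "output": "9"},
        {"instruction": "Classify the sentiment of the input as "
                        "positive, negative, or neutral.",
         "output": "neutral"},
    ],
}

def tasks_per_input(example: dict) -> int:
    """Return how many distinct instructions share this one input."""
    return len(example["tasks"])
```

Under Scaling-Inputs this same structure would be inverted: many inputs attached to one instruction, which is exactly the skew the new paradigm is meant to avoid.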
The implementation of MUFFIN involves two key challenges: designing a variety of tasks for the same input and balancing classification and generation task types within the dataset. The paper outlines two strategies to meet these challenges. The first strategy, Instruction Brainstorm, involves using LLMs to generate multiple task instructions relevant to different facets of an input (e.g., language, length, intent). The second strategy, Instruction Rematching, repurposes high-quality instructions from existing datasets by determining their relevance to specific inputs.
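The two strategies above can be sketched as a small pipeline. The `llm` function below is a stub standing in for a real model call, and the facet list, prompt wording, and helper names are all assumptions for illustration, not the paper's actual prompts.

```python
# Minimal sketch of the two MUFFIN construction strategies.
# `llm` is a stand-in for a real LLM call; facets, prompts, and
# function names are illustrative assumptions.

FACETS = ["language", "length", "intent"]

def llm(prompt: str) -> str:
    # Stub: a real implementation would query an LLM here.
    if prompt.startswith("Brainstorm"):
        return "Rewrite the input in formal English."
    return "yes"  # rematching relevance verdict

def instruction_brainstorm(text: str) -> list[str]:
    """Strategy 1: prompt the LLM once per input facet to
    propose a new task instruction grounded in that facet."""
    return [
        llm(f"Brainstorm a task about the {facet} of this input: {text}")
        for facet in FACETS
    ]

def instruction_rematch(text: str, candidates: list[str]) -> list[str]:
    """Strategy 2: keep only those existing-dataset instructions
    that the LLM judges applicable to this particular input."""
    return [
        ins for ins in candidates
        if llm(f"Does this instruction apply to the input? {ins} | {text}") == "yes"
    ]
```

The design point is that both strategies condition on the input: brainstorming generates new instructions per facet, while rematching filters an existing instruction pool for relevance, together covering both generation- and classification-style tasks.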
The researchers trained LLMs on MUFFIN and compared their performance against models trained on prior datasets, evaluating on four zero-shot benchmarks. The results showed that MUFFIN-trained LLMs achieved stronger instruction-following performance on three of the four benchmarks, outperforming models trained with earlier methods. MUFFIN's effectiveness in improving LLMs' adherence to complex instructions was further affirmed through comprehensive human evaluation and additional analyses.
In summary, this work contributes to the field by introducing a novel paradigm for constructing instruction-following datasets, which fosters the generation of instruction-diverse responses from LLMs. The proposed MUFFIN dataset represents a significant advancement in training LLMs for real-world tasks that require precise adherence to given instructions.