Fine-tuning Large Language Models with Sequential Instructions (2403.07794v3)

Published 12 Mar 2024 in cs.CL

Abstract: Despite the success of existing instruction-tuned models, we find that they usually struggle to respond to queries with multiple instructions. This impairs their performance in complex problems whose solution consists of multiple intermediate tasks. Thus, we contend that part of the fine-tuning data mixture should be sequential--containing a chain of interrelated tasks. We first approach sequential instruction tuning from a task-driven perspective, manually creating interpretable intermediate tasks for multilingual and visual question answering: namely "translate then predict" and "caption then answer". Next, we automate this process by turning instructions in existing datasets (e.g., Alpaca and FlanCoT) into diverse and complex sequential instructions, making our method general-purpose. Models that underwent our sequential instruction tuning show improved results in coding, maths, and open-ended generation. Moreover, we put forward a new benchmark named SeqEval to evaluate a model's ability to follow all the instructions in a sequence, which further corroborates the benefits of our fine-tuning method. We hope that our endeavours will open new research avenues on instruction tuning for complex tasks.

Fine-Tuning LLMs with Sequential Instructions: A Comprehensive Overview

The paper "Fine-Tuning LLMs with Sequential Instructions" addresses a significant challenge in the capabilities of LLMs—the ability to follow and process a sequence of instructions in a single query. Traditional instruction datasets often contain straightforward, singular tasks, which limit models from navigating multi-step interactions effectively. The authors introduce a novel methodology termed Sequential Instruction Tuning (SIT), designed to enhance the models' competence in executing multiple tasks sequentially, a critical need for complex downstream tasks involving reasoning, multilingual, and multimodal scenarios.

Sequential Instruction Tuning and Its Implications

The central contribution of this research is the SIT paradigm, which broadens the scope of instruction tuning to cover the execution of sequential sub-tasks. The method augments existing instruction datasets by inserting intermediate steps into each instruction without requiring additional human annotation, a significant advantage for scaling up data creation. For instance, intermediate tasks such as translation or image captioning give the model an explicit step-by-step scaffold, improving LLM performance on cross-lingual and cross-modal tasks.
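
As a rough illustration of this augmentation, the sketch below turns an Alpaca-style record into a sequential-instruction example by prepending an intermediate task and concatenating its output with the original answer. The helper, field names, and templates are hypothetical; in the paper's automated setting the intermediate outputs would be generated rather than written by hand.

```python
# Hypothetical sketch of sequential augmentation for an Alpaca-style record.
def to_sequential(record: dict, intermediate_task: str, intermediate_output: str) -> dict:
    """Compose a sequential-instruction training example.

    record: dict with "instruction", "input", "output" (Alpaca-style).
    intermediate_task: the step to perform first, e.g. a translation request.
    intermediate_output: the expected result of that step (assumed to be
        produced automatically, e.g. by an existing model).
    """
    instruction = (
        f"{intermediate_task} "
        f"Then, {record['instruction'].rstrip('.').lower()}."
    )
    # The training target chains the intermediate output with the original answer.
    output = f"{intermediate_output}\n{record['output']}"
    return {"instruction": instruction, "input": record["input"], "output": output}


example = {
    "instruction": "Answer the question.",
    "input": "Quelle est la capitale de la France ?",
    "output": "Paris.",
}
print(to_sequential(
    example,
    "First, translate the input into English.",
    "What is the capital of France?",
))
```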

The paper details numerical results demonstrating SIT's superior performance over conventional instruction tuning. Noteworthy improvements are observed across various benchmarks: a +6% improvement on CommonsenseQA, a +17% boost for the XQuAD multilingual task, and a +2.1% enhancement in visual question answering tasks such as VQA and GQA. These outcomes underscore SIT's efficacy in enhancing both the instruction-following capabilities of LLMs and their downstream task performance, further confirmed by the paper's qualitative analyses.

Methodological Extensions and Evaluation

The SIT approach is experimentally validated using prominent LLMs, including LLaMA-2 70B and Mixtral-8×7B, fine-tuned on diversified datasets containing both genuine and synthetic intermediate tasks. The authors extend existing datasets (e.g., Alpaca) by concatenating additional tasks to each instruction and amending the targets with the corresponding intermediate outputs. This procedure supports a broader array of tasks, such as reasoning and cross-lingual processing, even for task types unseen during training.
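
A minimal fine-tuning sketch on such augmented records is shown below. It assumes the Hugging Face transformers and datasets libraries and a generic causal LM checkpoint; the checkpoint name, data format, and hyperparameters are placeholders and do not reproduce the paper's actual training recipe.

```python
# Minimal supervised fine-tuning sketch on sequential-instruction records.
# Checkpoint, data, and hyperparameters are placeholders, not the paper's setup.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

records = [
    {
        "prompt": ("First, translate the input into English. Then, answer the question.\n"
                   "Input: Quelle est la capitale de la France ?\nResponse: "),
        "target": "What is the capital of France?\nParis.",
    },
]

model_name = "meta-llama/Llama-2-7b-hf"  # example checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # LLaMA tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

def tokenize(example):
    # Train on prompt + target; the paper may instead mask prompt tokens from the loss.
    text = example["prompt"] + example["target"] + tokenizer.eos_token
    return tokenizer(text, truncation=True, max_length=1024)

train_ds = Dataset.from_list(records).map(tokenize, remove_columns=["prompt", "target"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="sit-sft", per_device_train_batch_size=1,
                           num_train_epochs=3, learning_rate=2e-5),
    train_dataset=train_ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False),
)
trainer.train()
```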

As part of their comprehensive evaluation, the authors demonstrate the robustness of SIT models when prompted with unseen templates and varying input lengths. The SIT models maintain high sequential task accuracy even when intermediate task steps are varied or when additional tasks are introduced during testing. These adaptability features indicate that SIT models generalize beyond their trained settings, confirming their utility in real-world applications requiring flexible and complex task executions.
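
In that spirit, a deliberately simplified check for sequential instruction following is sketched below; it only verifies that each expected step's output appears in order, and does not reproduce the paper's actual SeqEval protocol.

```python
# Simplified, hypothetical sequence-following check (not the paper's SeqEval metric).
def follows_sequence(model_output: str, expected_steps: list[str]) -> bool:
    """Return True if every expected step's output occurs in order in the response."""
    cursor = 0
    for step in expected_steps:
        idx = model_output.find(step, cursor)
        if idx == -1:
            return False
        cursor = idx + len(step)
    return True


# Example: a "translate then predict" response should show the translation first,
# then the final answer.
response = "Translation: What is the capital of France?\nAnswer: Paris"
print(follows_sequence(response, ["What is the capital of France?", "Paris"]))  # True
```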

Theoretical and Practical Implications

Theoretically, the introduction of SIT extends the conceptual framework and interpretation of instruction tuning by highlighting the importance of task order and intermediate processing steps in multi-step reasoning. It suggests a new dimension in LLM training that involves task chaining, which could be pivotal for future explorations into cognitive aspects of LLM behavior.

Practically, SIT enhances the applicability of LLMs in settings that demand complex, sequential decision-making, such as virtual assistants, autonomous systems, and multilingual conversational agents. The improved instruction-handling capability can reduce the need for human intervention, enabling more autonomous execution of multi-step requests, an ability fostered by the structured, curriculum-like nature of the training data.

Future Directions

Given its promising results, future work could explore further diversification of intermediate tasks beyond those demonstrated in the paper, such as more intricate dummy tasks or context-specific intermediate sub-tasks. Additionally, integrating SIT with multilingual and multimodal datasets could provide transformative insights into scalable LLM applications.

In summary, this research advances the LLM field by proposing a targeted mechanism for sequential instruction execution, heralding a step forward in model efficiency, generalization, and applicability in increasingly complex computational tasks.

References (45)
  1. Unifying cross-lingual transfer across scenarios of resource scarcity. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023.
  2. On the cross-lingual transferability of monolingual representations. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020.
  3. Revisiting machine translation for cross-lingual classification. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023.
  4. Language models are few-shot learners. Advances in neural information processing systems, 2020.
  5. Monolingual or multilingual instruction tuning: Which makes a better Alpaca. arXiv preprint arXiv:2309.08958, 2023.
  6. Vicuna: An open-source chatbot impressing GPT-4 with 90%* ChatGPT quality. Online blog, 2023.
  7. XNLI: Evaluating cross-lingual sentence representations. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 2018.
  8. Free Dolly: Introducing the world’s first truly open instruction-tuned LLM. Online blog, 2023.
  9. InstructBLIP: Towards general-purpose vision-language models with instruction tuning. arXiv preprint arXiv:2305.06500, 2023.
  10. Do multilingual language models think better in English? arXiv preprint arXiv:2308.01223, 2023.
  11. Think before you speak: Training language models with pause tokens. arXiv preprint arXiv:2310.02226, 2023.
  12. Making the V in VQA matter: Elevating the role of image understanding in visual question answering. In Conference on Computer Vision and Pattern Recognition, 2017.
  13. LoRA: Low-rank adaptation of large language models. In International Conference on Learning Representations, 2022.
  14. Not all languages are created equal in LLMs: Improving multilingual capability by cross-lingual-thought prompting. In Findings of the Association for Computational Linguistics: EMNLP 2023, 2023.
  15. GQA: A new dataset for real-world visual reasoning and compositional question answering. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019.
  16. Mistral 7B. arXiv preprint arXiv:2310.06825, 2023.
  17. Large language models are zero-shot reasoners. In Advances in Neural Information Processing Systems, 2022.
  18. Measuring faithfulness in chain-of-thought reasoning. arXiv preprint arXiv:2307.13702, 2023.
  19. LAVIS: A one-stop library for language-vision intelligence. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, 2023a.
  20. Bactrian-X: A multilingual replicable instruction-following model with low-rank adaptation. arXiv preprint arXiv:2305.15011, 2023b.
  21. Lin, C.-Y. ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out, 2004.
  22. Microsoft COCO: Common objects in context. In European Conference on Computer Vision, 2014.
  23. The Flan collection: designing data and methods for effective instruction tuning. In Proceedings of the 40th International Conference on Machine Learning, 2023.
  24. Cross-task generalization via natural language crowdsourcing instructions. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, 2022.
  25. Crosslingual generalization through multitask finetuning. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, 2023.
  26. Training language models to follow instructions with human feedback. In Advances in Neural Information Processing Systems, 2022.
  27. Modular deep learning. Transactions on Machine Learning Research, 2023.
  28. Modelling latent translations for cross-lingual transfer. arXiv preprint arXiv:2107.11353, 2021.
  29. Modeling Language Variation and Universals: A Survey on Typological Linguistics for Natural Language Processing. Computational Linguistics, 2019.
  30. Cross-lingual prompting: Improving zero-shot chain-of-thought reasoning across languages. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023.
  31. Language models are unsupervised multitask learners. Online blog, 2019.
  32. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, 2020.
  33. Multitask prompted training enables zero-shot task generalization. In International Conference on Learning Representations, 2022.
  34. Language models are multilingual chain-of-thought reasoners. In The Eleventh International Conference on Learning Representations, 2023.
  35. CommonsenseQA: A question answering challenge targeting commonsense knowledge. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019.
  36. Stanford Alpaca: An instruction-following LLaMA model. Github repository, 2023.
  37. LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971, 2023a.
  38. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288, 2023b.
  39. Self-instruct: Aligning language models with self-generated instructions. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, 2023.
  40. Finetuned language models are zero-shot learners. In International Conference on Learning Representations, 2022a.
  41. Chain of thought prompting elicits reasoning in large language models. In Advances in Neural Information Processing Systems, 2022b.
  42. AI Chains: Transparent and controllable human-AI interaction by chaining large language model prompts. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems, 2022.
  43. BERTScore: Evaluating text generation with BERT. In International Conference on Learning Representations, 2020.
  44. PLUG: Leveraging pivot language in cross-lingual instruction tuning. arXiv preprint arXiv:2311.08711, 2023.
  45. Least-to-most prompting enables complex reasoning in large language models. In The Eleventh International Conference on Learning Representations, 2023.
Authors (4)
  1. Hanxu Hu (9 papers)
  2. Pinzhen Chen (27 papers)
  3. Edoardo M. Ponti (24 papers)
  4. Simon Yu (14 papers)
Citations (10)