- The paper proposes Hansel, a framework that inserts hidden special tokens during generation so the model can track and control output length.
- The framework significantly reduces length deviation and maintains high quality, as measured by ROUGE and G-Eval metrics.
- Hansel’s versatile design enables integration with any pre-trained LLM, enhancing applications like summarization and dialogue generation.
Overview of "Hansel: Output Length Controlling Framework for LLMs"
LLMs have been highly successful at generating coherent and fluent text, but precisely controlling the length of the output sequence remains a challenge. Seoha Song, Junhyun Lee, and Hyeonmok Ko address this issue by proposing Hansel, a framework designed to enable efficient length control in LLMs without compromising their text generation capabilities.
Hansel Framework
The Hansel framework lets an LLM track the remaining target length of the output sequence through hidden special tokens inserted periodically during generation. The framework finetunes a pre-trained LLM with these tokens, enabling the model to control length effectively while maintaining coherence and fluency. Its simplicity lies in the fact that it can be applied to any pre-trained LLM at the finetuning stage, regardless of the model's original positional encoding method.
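To make the mechanism concrete, below is a minimal sketch of the kind of target-side augmentation this implies. The token format `<rem:K>`, the word-level length unit, and the insertion period are illustrative assumptions, not the paper's exact scheme.

```python
# A minimal sketch of Hansel-style target augmentation, assuming a
# hypothetical special token "<rem:K>" that marks the remaining length,
# inserted every `period` words. The paper's actual token format and
# length granularity may differ.

def augment_with_length_tokens(target_text: str, period: int = 10) -> str:
    """Insert remaining-length markers every `period` words of the target."""
    words = target_text.split()
    total = len(words)
    augmented = []
    for i, word in enumerate(words):
        if i % period == 0:
            augmented.append(f"<rem:{total - i}>")  # words left from this point
        augmented.append(word)
    augmented.append("<rem:0>")  # terminal marker: target length reached
    return " ".join(augmented)

print(augment_with_length_tokens("one two three four five six seven eight", period=4))
# -> <rem:8> one two three four <rem:4> five six seven eight <rem:0>
```

Finetuning on targets augmented this way gives the model an explicit countdown signal during generation; the markers can be hidden from the user at inference time.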
Empirical Validation
The authors validate Hansel's efficacy by applying it to four different LLMs and datasets, demonstrating a significant reduction in the mean absolute error (MAE) of the output sequence length compared with prompt-based length-control finetuning. A substantial improvement is also observed in extrapolation to target lengths not encountered during training, such as long dialogue responses or very short summaries. This indicates that the model learns a general mechanism for length control rather than simply mimicking the output lengths seen during training.
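For reference, the reported metric is simply the mean absolute error between achieved and requested lengths; the short illustration below assumes a particular length unit (whether lengths are counted in tokens, words, or sentences is not fixed here).

```python
# Mean absolute error (MAE) between achieved and requested output
# lengths, the length-control metric reported in the paper. The
# choice of length unit (tokens, words, sentences) is an assumption.

def length_mae(generated_lengths, target_lengths):
    """Average absolute deviation of generated length from target length."""
    pairs = list(zip(generated_lengths, target_lengths))
    return sum(abs(g - t) for g, t in pairs) / len(pairs)

print(length_mae([48, 101, 23], [50, 100, 25]))  # (2 + 1 + 2) / 3 ≈ 1.67
```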
Results and Comparison
The Hansel framework outperforms traditional prompt-based finetuning by delivering robust results across a range of target lengths. Importantly, it maintains high output quality as measured by ROUGE scores and G-Eval, so the generated text remains coherent, consistent, fluent, and relevant. Augmenting a dataset for Hansel is straightforward: special tokens signaling the remaining length of the desired output are inserted into the targets, and the model learns to use them as guidance during generation.
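The quality side of such an evaluation can be reproduced with standard tooling; the sketch below uses Google's `rouge-score` package (`pip install rouge-score`) and invented example texts, and is not the paper's exact evaluation pipeline.

```python
# Computing ROUGE scores for a candidate summary against a reference,
# using Google's rouge-score package. The texts are invented for
# illustration only.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
reference = "The committee approved the budget after a short debate."
candidate = "The budget was approved by the committee after a brief debate."
scores = scorer.score(reference, candidate)  # score(target, prediction)
for name, s in scores.items():
    print(f"{name}: F1 = {s.fmeasure:.3f}")
```

G-Eval, by contrast, is an LLM-based judging protocol scoring dimensions such as coherence, consistency, fluency, and relevance, which matches the four qualities the paper reports.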
Implications and Future Developments
Practically, the ability to control output length can enhance applications such as news summarization, where precise control over the level of detail is crucial, or voice assistants, which require different amounts of information based on user interaction. Theoretically, Hansel opens new avenues for further research into more sophisticated length controlling mechanisms, which could incorporate additional constraints and more complex dependencies.
In future work, developments could focus on refining Hansel for tighter integration with LLMs, exploring its applications in other domains, and enhancing its extrapolation capabilities. The framework's versatility highlights its potential for widespread adoption in AI applications where length control is a significant requirement.
In conclusion, the Hansel framework presents a viable solution to the challenge of output length control in LLMs, offering a novel approach that maintains the quality and coherence of text generation while providing improved length control across different tasks and domains.