Guiding Data Generation to Target Non-Differentiable Objectives: An Expert Overview
The paper "LLM See, LLM Do: Guiding Data Generation to Target Non-Differentiable Objectives" explores synthetizing data for LLMs with the intent of modifying model behaviors and characteristics toward specific, non-differentiable attributes such as textual length, diversity, and toxicity. In their paper, the authors offer a comprehensive examination of the implications when employing synthetic data to train LLMs. They propose methodologies for both passive and active inheritance of desired model attributes, providing detailed insights into the ramifications of these approaches.
Key Findings and Methodologies
- Passive Inheritance:
- The authors examine the latent impact of synthetic data on model characteristics without explicitly manipulating the data's attributes. They systematically profile the behaviors of LLMs fine-tuned on synthetic data generated by various teacher models, using metrics that cover social bias, textual characteristics, and calibration error.
- Significant shifts in model behavior were observed after data distillation, with passive inheritance producing non-trivial changes in attributes even under "neutral" prompts. For instance, distillation on synthetic data led to changes of up to 36% in social bias scores across different teacher-student pairs, and toxicity saw a relative increase of up to 40%.
- Active Inheritance:
- The authors propose active inheritance as a mechanism for intentionally steering data generation toward desired non-differentiable attributes. By exploiting the latent characteristics of candidate generations at synthesis time, they selected training samples so as to amplify desirable attributes (e.g., increasing lexical diversity) or suppress undesirable ones (e.g., toxicity).
- This approach produced compelling results, such as improvements of up to 116% in generation length and 43% in lexical diversity, and toxicity reductions of up to 40%. Both single-source (one teacher model) and multi-source (several teacher models) sampling strategies were used to maximize impact; a minimal sketch of this selection step appears after this list.
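The core selection step of active inheritance lends itself to a compact illustration. Below is a minimal Python sketch, assuming the sample-then-filter setup described above: generate k candidate completions per prompt (from one or several teachers), score each with the non-differentiable metric, and keep the best. All names here (`best_of_k`, `build_targeted_dataset`, `toy_teacher`) are illustrative, not taken from the paper's code.

```python
import random
from typing import Callable, Sequence

def best_of_k(prompt: str,
              teachers: Sequence[Callable[[str], str]],
              metric: Callable[[str], float],
              k: int = 10) -> str:
    """Sample k completions from each teacher and keep the highest-scoring one.

    With one teacher this corresponds to single-source sampling; passing
    several teachers pools their candidates (multi-source sampling).
    """
    candidates = []
    for generate in teachers:
        candidates.extend(generate(prompt) for _ in range(k))
    return max(candidates, key=metric)

def build_targeted_dataset(prompts: Sequence[str],
                           teachers: Sequence[Callable[[str], str]],
                           metric: Callable[[str], float],
                           k: int = 10) -> list[tuple[str, str]]:
    """Build (prompt, completion) pairs steered toward the target attribute.

    The resulting dataset would then be used to fine-tune the student model.
    """
    return [(p, best_of_k(p, teachers, metric, k)) for p in prompts]

if __name__ == "__main__":
    # Toy stand-in for a real teacher LLM call.
    def toy_teacher(prompt: str) -> str:
        n = random.randint(1, 30)
        return " ".join(random.choices(["alpha", "beta", "gamma", "delta"], k=n))

    # Steering toward longer generations: the metric is simply token count.
    length_metric = lambda text: float(len(text.split()))
    data = build_targeted_dataset(["Describe a forest."],
                                  [toy_teacher], length_metric, k=5)
    print(data[0])
```

To target a different attribute, only the metric changes: a toxicity classifier's negated score would steer generations toward lower toxicity, and an MTLD score toward higher lexical diversity.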
Experimental Design and Metrics
The paper includes a multi-faceted benchmarking system designed to probe the nuanced impacts of synthetic data on LLMs:
- Textual Characteristics: Metrics such as token count, readability indices (Gunning Fog, RIX), and lexical diversity (MTLD).
- Social Bias: Measured with widely recognized benchmarks (StereoSet, CrowS-Pairs, BBQ) to detect shifts in stereotypical associations.
- Calibration and Toxicity: Calibration was assessed on datasets such as HellaSwag and OpenBookQA, while toxicity was measured via Expected Maximum Toxicity and Toxicity Probability computed over resampled prompts. Simplified sketches of two of these metrics follow this list.
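To ground the profiling suite, here is a minimal, self-contained Python sketch of two of the metrics above: a simplified MTLD for lexical diversity and a standard expected calibration error. The 0.72 type-token-ratio threshold is the conventional MTLD default; the paper's exact implementations and tooling may differ.

```python
def _mtld_pass(tokens, ttr_threshold=0.72):
    """One directional pass of MTLD: count 'factors', i.e. stretches of text
    over which the type-token ratio (TTR) stays above the threshold."""
    factors, types, count = 0.0, set(), 0
    for tok in tokens:
        count += 1
        types.add(tok.lower())
        if len(types) / count <= ttr_threshold:
            factors += 1
            types, count = set(), 0
    if count > 0:  # partial credit for the unfinished final segment
        ttr = len(types) / count
        factors += (1.0 - ttr) / (1.0 - ttr_threshold)
    return len(tokens) / factors if factors > 0 else float(len(tokens))

def mtld(text, ttr_threshold=0.72):
    """Simplified MTLD: mean of a forward and a backward pass."""
    tokens = text.split()
    return (_mtld_pass(tokens, ttr_threshold)
            + _mtld_pass(tokens[::-1], ttr_threshold)) / 2

def expected_calibration_error(confidences, correct, n_bins=10):
    """Standard ECE: bin predictions by confidence and average the
    |accuracy - mean confidence| gap, weighted by bin size."""
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        in_bin = [i for i, c in enumerate(confidences)
                  if lo <= c < hi or (b == n_bins - 1 and c == 1.0)]
        if not in_bin:
            continue
        avg_conf = sum(confidences[i] for i in in_bin) / len(in_bin)
        accuracy = sum(correct[i] for i in in_bin) / len(in_bin)
        ece += (len(in_bin) / n) * abs(accuracy - avg_conf)
    return ece

if __name__ == "__main__":
    print(mtld("the quick brown fox jumps over the lazy dog the fox"))
    print(expected_calibration_error([0.9, 0.8, 0.6, 0.55], [1, 1, 0, 1]))
```

Metrics like these are what make the passive-inheritance profiling quantitative: the same functions can be run on generations from the teacher, the student before distillation, and the student after, and the relative shifts compared.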
Implications and Future Directions
This comprehensive paper elucidates both the risks and potential benefits of using synthetic data for LLM training. From a theoretical perspective:
- The passive inheritance experiments underscore the unintentional perpetuation of biases and shifts in model behaviors, highlighting governance considerations in synthetic data usage.
- The success of active inheritance showcases a pragmatic approach to model steering, guiding future developments in customizable LLM training methods.
Practically, this work suggests that researchers and developers can utilize synthetic data with greater precision, targeting specific model improvements while being cognizant of unintended attribute propagation. This could be particularly beneficial for domains requiring ethical considerations, such as reducing bias and ensuring safe model outputs.
Speculation on Future Developments
Looking forward, advancements in this area may focus on:
- Further refining active inheritance techniques to concurrently optimize for multiple attributes, enhancing the robustness of LLMs across diverse use-cases.
- Integrating more sophisticated control mechanisms within LLM training pipelines to dynamically tailor data characteristics in real-time.
- Extending the toolkit developed in this paper to broader community usage, fostering collaborative improvements in understanding and mitigating LLM biases.
In conclusion, the paper presents a finely detailed inquiry into leveraging synthetic data to intentionally mold LLM behaviors, contributing both theoretical understanding and practical strategies for model training. These findings offer the research community a methodical pathway to instilling desired attributes while addressing the broader ethical and functional dynamics of AI development.