Information-Preserving Reformulation of Reasoning Traces for Antidistillation

Published 13 Oct 2025 in cs.CL | (2510.11545v1)

Abstract: Recent advances in LLMs show that extending the length of reasoning chains significantly improves performance on complex tasks. While revealing these reasoning traces helps users better follow, verify, and learn from the model's problem-solving process, it also makes them highly vulnerable to unauthorized distillation. To mitigate this risk, proprietary model providers often adopt aggressive protection strategies, such as replacing detailed reasoning with brief summaries, which deprive users of valuable intermediate information. To address this trade-off, we propose PART, an information-preserving antidistillation reformulation of reasoning traces. Motivated by the difference between how humans understand reasoning traces and how LLMs exploit them for supervised fine-tuning, we design a simple but effective two-step reformulation: removing self-talk behaviors and reordering sub-conclusions. A small auxiliary model is trained to perform this reformulation, incurring minimal computational overhead. Extensive experiments demonstrate that PART consistently disrupts distillation across student models of different sizes and types on various reasoning benchmarks. For instance, when training on reformulated traces, even the performance of a large 32B student model decreases from 54.17 to 46.88 on AIME 2024, corresponding to a 13.5% degradation.

Summary

  • The paper introduces PART, a method that reformulates reasoning traces to impair unauthorized distillation while preserving critical information.
  • It works at two levels: at the token level it removes self-talk tokens, and at the structural level it reorders sub-conclusions, both intended to disrupt what student models learn during supervised fine-tuning.
  • Extensive experiments show consistent degradation in student models; for example, a 32B student drops from 54.17 to 46.88 on AIME 2024, a 13.5% relative decrease.

The paper "Information-Preserving Reformulation of Reasoning Traces for Antidistillation" (2510.11545) proposes PART, an information-preserving antidistillation reformulation of the reasoning traces generated by LLMs. The method rewrites these traces so that they are less useful for unauthorized distillation while remaining informative for human readers.

Introduction to PART

PART targets a problem faced by proprietary model providers: detailed reasoning traces are vulnerable to unauthorized distillation into student models. Common protections, such as replacing detailed reasoning with brief summaries, strip away valuable intermediate information. PART instead reformulates traces at both the token level and the structural level, so that they keep their informative value for users but become less effective as distillation data.

Figure 1: Overview of PART. Directly exposing original reasoning traces leaves them vulnerable to unauthorized distillation, whereas providing only summaries deprives users of the information contained in the reasoning process.

Reformulation Techniques

Token-Level Reformulation

PART first removes self-talk behaviors from the reasoning trace. Self-talk tokens (e.g., "Hmm," "Wait") tend to receive low predicted probability from the student model and therefore have a large impact on gradient updates during Supervised Fine-Tuning (SFT). Removing them disrupts gradient information that the distillation process relies on, without making the trace harder for human users to understand.
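
As a rough illustration of what removing self-talk can look like, the sketch below strips sentence-initial markers with a regular expression. This is not the paper's procedure (the paper trains a small auxiliary model to do the rewriting); the marker list, the function name `strip_self_talk`, and the sentence-splitting rule are all assumptions made for this example.

```python
import re

# Hypothetical marker list; "Hmm" and "Wait" are the examples named in the text.
SELF_TALK_MARKERS = ("hmm", "wait")

# One or more markers at the start of a sentence, plus trailing punctuation/space.
_LEADING_MARKER = re.compile(
    r"^(?:(?:%s)\b[\s,.!?]*)+" % "|".join(SELF_TALK_MARKERS),
    re.IGNORECASE,
)


def strip_self_talk(trace: str) -> str:
    """Drop sentence-initial self-talk markers from a reasoning trace."""
    # Naive sentence split: break on whitespace that follows ., ! or ?
    sentences = re.split(r"(?<=[.!?])\s+", trace.strip())
    cleaned = []
    for sentence in sentences:
        rest = _LEADING_MARKER.sub("", sentence)
        if not rest:
            # The sentence consisted only of self-talk (e.g. "Hmm.").
            continue
        # Re-capitalize the first character after removing the marker.
        cleaned.append(rest[0].upper() + rest[1:])
    return " ".join(cleaned)


example = "Hmm, let me try x = 2. Wait, that gives 5, not 4. So x = 1."
print(strip_self_talk(example))
```

A rule-based filter like this only catches markers at the start of a sentence and would miss paraphrased or mid-sentence self-talk, which is one plausible reason to use a learned rewriter instead.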

Figure 2: Predicted probabilities of the student model on teacher-generated reasoning traces, highlighting low-probability tokens.
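
To see why low-probability tokens matter for SFT, recall that the per-token cross-entropy loss is -log p, and that for a softmax output the gradient with respect to the target token's logit is p - 1. The sketch below applies both to a handful of made-up probabilities; the token strings and probability values are illustrative assumptions, not numbers from the paper.

```python
import math


def token_nll(prob: float) -> float:
    """Cross-entropy loss contributed by one target token with probability prob."""
    return -math.log(prob)


def target_logit_grad_magnitude(prob: float) -> float:
    """|dL/dz_target| for softmax cross-entropy, which equals 1 - p."""
    return 1.0 - prob


def loss_shares(token_probs):
    """Fraction of the total sequence loss contributed by each token."""
    losses = [token_nll(p) for _, p in token_probs]
    total = sum(losses)
    return [(tok, loss / total) for (tok, _), loss in zip(token_probs, losses)]


# Hypothetical student probabilities on a short teacher trace.
trace = [("Hmm", 0.02), (",", 0.60), ("so", 0.55), ("x", 0.80), ("=", 0.95), ("1", 0.90)]

for (tok, p), (_, share) in zip(trace, loss_shares(trace)):
    print(f"{tok!r:6} p={p:.2f} grad={target_logit_grad_magnitude(p):.2f} share={share:.2f}")
```

In this toy trace the single low-probability token accounts for most of the sequence loss, which is the intuition behind targeting such tokens.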

Structural-Level Reformulation

At the sequence level, PART restructures the reasoning trace by placing each sub-conclusion before the reasoning steps that support it. This reordering exploits a difference between how humans read reasoning traces and how LLMs learn from them: humans adapt readily to a conclusion-first structure, whereas models trained on the traces are disrupted by the departure from a linear logical flow.
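
The reordering itself is simple once a trace has been split into segments. The sketch below assumes the segmentation into supporting steps and sub-conclusions is already available (in the paper this rewriting is done by a model, not by rules); the `Segment` type and `conclusion_first` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One reasoning segment: supporting steps followed by a sub-conclusion."""
    steps: List[str]
    conclusion: str


def conclusion_first(segments: List[Segment]) -> str:
    """Emit each sub-conclusion before the steps that support it."""
    lines: List[str] = []
    for seg in segments:
        lines.append(seg.conclusion)  # conclusion moves to the front
        lines.extend(seg.steps)       # supporting steps keep their order
    return "\n".join(lines)


segments = [
    Segment(
        steps=["Subtract 3 from both sides: 2x = 4.", "Divide by 2."],
        conclusion="Therefore x = 2.",
    ),
    Segment(
        steps=["Substitute x = 2 into y = x + 1."],
        conclusion="So y = 3.",
    ),
]
print(conclusion_first(segments))
```

The order of segments and of the steps inside each segment is unchanged, so no content is dropped; only the position of each sub-conclusion moves.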

Figure 3: (a) Match ratios under different lexical similarity score thresholds. PART achieves significantly higher match ratios than summary methods. (b) Human judges rate PART traces as more informative than summaries.

Quality and Performance Metrics

The quality of reformulated traces is evaluated through lexical and semantic similarity measures, as well as human judgment assessments. PART produces high lexical and semantic match ratios, indicating substantial information preservation. Human evaluations further endorse the informativeness of PART-generated traces compared to summary-based approaches.
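
One plausible way to compute a lexical match ratio is sketched below. The exact similarity score and matching rule used in the paper are not given in this summary, so everything here is an assumption: sentences are compared with `difflib.SequenceMatcher`, and an original sentence counts as matched when its best similarity against any reformulated sentence reaches the threshold.

```python
from difflib import SequenceMatcher
from typing import List


def best_similarity(sentence: str, candidates: List[str]) -> float:
    """Highest character-level similarity between sentence and any candidate."""
    return max(
        (SequenceMatcher(None, sentence.lower(), c.lower()).ratio() for c in candidates),
        default=0.0,
    )


def match_ratio(original: List[str], reformulated: List[str], threshold: float) -> float:
    """Fraction of original sentences with a sufficiently similar counterpart."""
    if not original:
        return 0.0
    matched = sum(1 for s in original if best_similarity(s, reformulated) >= threshold)
    return matched / len(original)


original = ["Subtract 3 from both sides: 2x = 4.", "Divide by 2.", "Therefore x = 2."]
reordered = ["Therefore x = 2.", "Subtract 3 from both sides: 2x = 4.", "Divide by 2."]
summary = ["The answer is x = 2."]

for t in (0.5, 0.7, 0.9):
    print(t, match_ratio(original, reordered, t), match_ratio(original, summary, t))
```

Sweeping the threshold, as in the loop above, gives a curve of the kind described for Figure 3(a): a pure reordering keeps every sentence, while a short summary can match only a few.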

Extensive distillation experiments show that student models trained on PART-reformulated traces perform significantly worse than those trained on the original traces. The effect holds across benchmarks covering mathematical problem solving and coding. Notably, even a large 32B student model drops from 54.17 to 46.88 on AIME 2024, a 13.5% relative decrease.

Figure 4: Models trained on reformulated traces show consistent performance degradation across data scales.
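
The 13.5% figure for the 32B student is a relative drop, not a difference in score points. A short check of the arithmetic, using only the two scores quoted above (the helper name `relative_drop` is introduced here for illustration):

```python
def relative_drop(before: float, after: float) -> float:
    """Relative decrease from before to after, in percent."""
    return (before - after) / before * 100.0


before, after = 54.17, 46.88  # AIME 2024 scores for the 32B student
print(f"absolute drop: {before - after:.2f} points")
print(f"relative drop: {relative_drop(before, after):.1f}%")
```

The absolute drop is 7.29 points; dividing by the original score of 54.17 gives the relative decrease of about 13.5%.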

Implementation of PART

Compact Reformulation Model

To make PART practical to deploy, the authors train a compact reformulation model. It is fine-tuned on a paired dataset of original traces and GPT-4o-generated reformulations, so reformulation can run with minimal computational overhead.
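
A paired dataset of this kind is typically stored as prompt/completion records for supervised fine-tuning. The sketch below shows one possible layout; the instruction text, the `prompt` and `completion` field names, and the JSONL format are assumptions for illustration, not details taken from the paper.

```python
import json
from typing import Dict, Iterable, List, Tuple

# Hypothetical instruction; the paper's actual prompt is not given in this summary.
INSTRUCTION = (
    "Rewrite the reasoning trace: remove self-talk and place each "
    "sub-conclusion before its supporting steps."
)


def build_sft_records(pairs: Iterable[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Turn (original, reformulated) trace pairs into SFT training records."""
    return [
        {"prompt": f"{INSTRUCTION}\n\n{original}", "completion": reformulated}
        for original, reformulated in pairs
    ]


def to_jsonl(records: List[Dict[str, str]]) -> str:
    """Serialize records as JSON Lines, one training example per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


pairs = [
    (
        "Hmm, subtract 3 from both sides: 2x = 4. Divide by 2. Therefore x = 2.",
        "Therefore x = 2. Subtract 3 from both sides: 2x = 4. Divide by 2.",
    ),
]
print(to_jsonl(build_sft_records(pairs)))
```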

Robustness and Scale Considerations

PART is evaluated across varying data scales and student model sizes, and it degrades distillation effectiveness consistently in these settings. Reformulated data is also detectable in a way akin to watermarking: PART significantly alters the keyword frequency distribution of a trace, which makes PART-reformulated data distinguishable from original traces.
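
A keyword-frequency check of this kind might look like the sketch below. The keyword list, the per-1,000-words normalization, and the decision threshold are all assumptions chosen for illustration; the paper's actual detection procedure is not described in this summary.

```python
import re
from collections import Counter
from typing import Sequence

# Hypothetical self-talk keywords whose frequency should fall after reformulation.
KEYWORDS = ("hmm", "wait")


def keyword_rate(text: str, keywords: Sequence[str] = KEYWORDS) -> float:
    """Occurrences of the keywords per 1,000 words of text."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    counts = Counter(words)
    return 1000.0 * sum(counts[k] for k in keywords) / len(words)


def looks_reformulated(text: str, threshold_per_1k: float = 1.0) -> bool:
    """Flag text whose self-talk keyword rate falls below the threshold."""
    return keyword_rate(text) < threshold_per_1k


original = "Hmm, let me try x = 2. Wait, that gives 5. So x = 1."
reformulated = "So x = 1. Let me try x = 2. That gives 5."
print(keyword_rate(original), looks_reformulated(original))
print(keyword_rate(reformulated), looks_reformulated(reformulated))
```

A single short text gives a noisy estimate, so in practice such a check would be applied to a large sample of a suspected training corpus rather than to one trace.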

Conclusion

PART introduces a practical and efficient methodology for safeguarding proprietary reasoning traces against unauthorized distillation. By focusing on preserving the informational content while undermining the distillation potential, PART bridges the gap between protecting model IP and maintaining user accessibility to reasoning processes. Future work in this domain could explore enhancing detectability and optimizing reformulation strategies to further thwart sophisticated distillation attempts.
