Towards Lossless Dataset Distillation via Difficulty-Aligned Trajectory Matching (2310.05773v2)

Published 9 Oct 2023 in cs.CV

Abstract: The ultimate goal of Dataset Distillation is to synthesize a small synthetic dataset such that a model trained on this synthetic set will perform equally well as a model trained on the full, real dataset. Until now, no method of Dataset Distillation has reached this completely lossless goal, in part due to the fact that previous methods only remain effective when the total number of synthetic samples is extremely small. Since only so much information can be contained in such a small number of samples, it seems that to achieve truly lossless dataset distillation, we must develop a distillation method that remains effective as the size of the synthetic dataset grows. In this work, we present such an algorithm and elucidate why existing methods fail to generate larger, high-quality synthetic sets. Current state-of-the-art methods rely on trajectory-matching, or optimizing the synthetic data to induce similar long-term training dynamics as the real data. We empirically find that the training stage of the trajectories we choose to match (i.e., early or late) greatly affects the effectiveness of the distilled dataset. Specifically, early trajectories (where the teacher network learns easy patterns) work well for a low-cardinality synthetic set since there are fewer examples wherein to distribute the necessary information. Conversely, late trajectories (where the teacher network learns hard patterns) provide better signals for larger synthetic sets since there are now enough samples to represent the necessary complex patterns. Based on our findings, we propose to align the difficulty of the generated patterns with the size of the synthetic dataset. In doing so, we successfully scale trajectory matching-based methods to larger synthetic datasets, achieving lossless dataset distillation for the very first time. Code and distilled datasets are available at https://gzyaftermath.github.io/DATM.

Dataset Distillation via Difficulty-Aligned Trajectory Matching

The paper "Towards Lossless Dataset Distillation via Difficulty-Aligned Trajectory Matching" introduces a novel approach to dataset distillation aimed at ensuring synthetic datasets perform on par with their larger real counterparts. The authors propose an algorithm based on trajectory matching with a strategic focus on the stage of training trajectories to align the difficulty of generated patterns with the size of the synthetic dataset.

Overview

Dataset Distillation (DD) seeks to condense large datasets into much smaller synthetic versions such that models trained on either dataset reach similar performance. Existing trajectory-matching approaches are effective only when the synthetic dataset is very small; performance drops sharply as the synthetic set grows, an issue this paper addresses directly.

The authors introduce a difficulty-aligned trajectory matching strategy based on an empirical observation: early trajectories, which contain "easy patterns," benefit low-cardinality synthetic sets, whereas larger synthetic sets profit from late trajectories containing "hard patterns," since they have enough capacity to represent complex and outlier patterns.
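
This observation suggests a simple scheduling rule. The sketch below is a rough illustration only: the `select_match_range` helper and its thresholds are hypothetical, not taken from the paper, but show how the range of expert epochs eligible for matching could widen toward later, harder segments as the number of images per class (IPC) grows.

```python
# Hypothetical illustration of difficulty alignment: small synthetic sets match
# early (easy) expert-trajectory segments; larger sets may match later (hard) ones.
def select_match_range(ipc: int, max_expert_epoch: int = 60) -> tuple[int, int]:
    """Return (t_min, t_max), the expert epochs whose checkpoints may serve as
    starting points for trajectory matching. Thresholds are illustrative only."""
    if ipc <= 10:                        # tiny set: easy, early-training patterns
        return 0, max_expert_epoch // 3
    if ipc <= 50:                        # medium set: early-to-mid patterns
        return 0, 2 * max_expert_epoch // 3
    return max_expert_epoch // 4, max_expert_epoch   # large set: include late, hard patterns
```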

Methodological Approach

The algorithm matches the synthetic dataset against segments of trajectories from expert models trained on the full dataset, dynamically adjusting which segment is sampled. Key components of the method include the following (a code sketch follows the list):

  1. Range Setting: The matched segment of the expert trajectory is restricted to earlier or later epochs depending on the size of the synthetic set.
  2. Sequential Generation: Synthesis initially focuses on easy patterns to stabilize optimization, then gradually incorporates harder patterns as the data capacity allows.
  3. Soft Label Optimization: Soft labels are initialized from the expert's logits and optimized jointly with the synthetic images throughout distillation.
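
The matching step itself can be summarized compactly. The following is a minimal PyTorch-style sketch under stated assumptions: it is not the authors' implementation, the names (`matching_loss`, `expert_start`, `expert_target`, `syn_images`, `soft_labels`, `syn_lr`) are placeholders, and it uses `torch.func.functional_call` for a functional forward pass; see the official DATM repository for the actual code.

```python
import torch
import torch.nn.functional as F
from torch.func import functional_call

def matching_loss(model, expert_start, expert_target, syn_images,
                  soft_labels, syn_lr, inner_steps=10):
    """Unroll a few SGD steps on the synthetic data, starting from the expert
    checkpoint at epoch t, and measure how close the resulting student lands
    to the expert checkpoint at epoch t + M (normalized squared L2 distance)."""
    # Start the student from the expert parameters at the sampled epoch.
    params = {k: v.detach().clone().requires_grad_(True)
              for k, v in expert_start.items()}

    for _ in range(inner_steps):
        logits = functional_call(model, params, (syn_images,))
        # Train against the learnable soft labels rather than hard one-hot targets.
        loss = F.kl_div(F.log_softmax(logits, dim=1),
                        F.softmax(soft_labels, dim=1), reduction="batchmean")
        grads = torch.autograd.grad(loss, tuple(params.values()), create_graph=True)
        # Differentiable SGD step, so gradients flow back to the synthetic data.
        params = {k: p - syn_lr * g for (k, p), g in zip(params.items(), grads)}

    dist = sum(((params[k] - expert_target[k]) ** 2).sum() for k in params)
    norm = sum(((expert_start[k] - expert_target[k]) ** 2).sum() for k in params)
    return dist / norm  # minimized w.r.t. syn_images, soft_labels, and syn_lr
```

In an outer loop, this loss would be backpropagated to update `syn_images`, `soft_labels` (initialized from the expert's logits on the synthetic images), and a learnable step size `syn_lr`, with the matched epoch drawn from the difficulty-aligned range sketched earlier.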

Results

Experiments on CIFAR-10, CIFAR-100, and Tiny ImageNet demonstrate that aligning trajectory difficulty with synthetic set size yields models that match or exceed the performance of models trained on the full dataset. Notably, the authors report lossless distillation: synthetic datasets one-fifth or one-tenth the size of the originals showed no performance degradation.

Impact and Future Directions

This work has significant implications for training efficiency, data storage savings, and practical model deployment under data constraints. The approach also transfers well to unseen architectures, a notable advance in cross-architecture generalization.

Future research could extend this method to further reduce computational cost, strengthen cross-architecture generalization, and scale the approach to larger datasets without losing efficacy. Additional examination of soft label initialization strategies and of different backbone networks for distillation might offer deeper insights and further improvements.

The paper makes strides toward the core goal of dataset distillation by aligning pattern difficulty with synthetic set size, providing a foundation for future innovations in the domain.

Authors (6)
  1. Ziyao Guo (9 papers)
  2. Kai Wang (624 papers)
  3. George Cazenavette (11 papers)
  4. Hui Li (1004 papers)
  5. Kaipeng Zhang (73 papers)
  6. Yang You (173 papers)
Citations (42)