Dataset Distillation via Difficulty-Aligned Trajectory Matching
The paper "Towards Lossless Dataset Distillation via Difficulty-Aligned Trajectory Matching" introduces a novel approach to dataset distillation aimed at ensuring synthetic datasets perform on par with their larger real counterparts. The authors propose an algorithm based on trajectory matching with a strategic focus on the stage of training trajectories to align the difficulty of generated patterns with the size of the synthetic dataset.
Overview
Dataset Distillation (DD) seeks to condense large datasets into much smaller synthetic versions such that models trained on either set reach similar performance. Existing trajectory-matching approaches work well when the synthetic budget is very small, but their performance degrades sharply as the synthetic set grows, an issue this paper addresses directly.
The authors introduce a difficulty-aligned trajectory-matching strategy built on an empirical observation: early segments of the expert trajectory capture "easy patterns" and benefit small synthetic sets, whereas late segments capture "hard patterns" and benefit larger synthetic sets, which have enough capacity to absorb complex and outlier examples.
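As a concrete illustration of this alignment, the sketch below maps the synthetic budget (images per class, IPC) to a window of expert epochs that are eligible as matching targets. The function name and the cutoff values are hypothetical placeholders chosen for readability, not the paper's reported settings.

```python
# A minimal sketch of difficulty-aligned range selection, assuming expert
# checkpoints are saved once per epoch. Cutoffs are illustrative only.
def select_matching_range(ipc: int, num_expert_epochs: int) -> tuple[int, int]:
    """Return the (earliest, latest) expert epochs eligible as matching targets.

    Small synthetic budgets (low IPC) match early epochs, where the expert has
    only learned easy patterns; larger budgets shift the window toward later
    epochs, which encode harder patterns.
    """
    if ipc <= 10:    # tiny budget: stick to easy, early-trajectory patterns
        return 0, num_expert_epochs // 4
    if ipc <= 100:   # medium budget: early-to-mid trajectory
        return num_expert_epochs // 8, num_expert_epochs // 2
    return num_expert_epochs // 4, num_expert_epochs  # large budget: include hard patterns
```

In the full method this range is not fixed once and for all; it is adjusted as distillation proceeds, as described under the sequential-generation component below.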
Methodological Approach
The algorithm matches the synthetic dataset against segments of training trajectories from expert models trained on the full dataset, adjusting the range of matched epochs to the size of the synthetic set. Key components of the method include:
- Range Setting: The expert trajectory is divided into earlier and later segments, and the matching range is tuned to the synthetic set size (roughly as sketched above).
- Sequential Generation: Synthesis first matches easy patterns to stabilize the optimization, then gradually shifts toward harder patterns as the data capacity allows.
- Soft Label Optimization: Soft labels are initialized from the expert model's logits and optimized jointly with the synthetic images throughout distillation, as in the sketch after this list.
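To make the matching objective concrete, the PyTorch sketch below shows one simplified trajectory-matching step with learnable soft labels. The data layout (a list of per-epoch parameter dictionaries named `expert_params`), the single inner loop, the KL-based inner loss, and the hyperparameter values are assumptions made for illustration, not the authors' exact implementation.

```python
# Simplified sketch of one trajectory-matching step with learnable soft labels.
# Assumes expert_params[e] is a dict {name: tensor} of the expert's epoch-e weights.
import torch
import torch.nn.functional as F
from torch.func import functional_call  # requires PyTorch >= 2.0


def matching_step(student_net, expert_params, start_epoch, target_epoch,
                  syn_images, syn_soft_labels, inner_steps=10, inner_lr=0.01):
    """Train a student on the synthetic set from the expert's start checkpoint,
    then penalize its distance to a later expert checkpoint."""
    theta_start = expert_params[start_epoch]
    theta_target = expert_params[target_epoch]

    # Start the student at the expert's start-epoch parameters.
    theta = {k: v.clone().requires_grad_(True) for k, v in theta_start.items()}

    # Inner loop: a few differentiable gradient steps on the synthetic data.
    for _ in range(inner_steps):
        logits = functional_call(student_net, theta, (syn_images,))
        inner_loss = F.kl_div(F.log_softmax(logits, dim=1),
                              F.softmax(syn_soft_labels, dim=1),
                              reduction="batchmean")
        grads = torch.autograd.grad(inner_loss, list(theta.values()),
                                    create_graph=True)
        theta = {k: w - inner_lr * g
                 for (k, w), g in zip(theta.items(), grads)}

    # Normalized matching loss: how far the student ended up from the later
    # expert checkpoint, scaled by how far the expert itself moved.
    num = sum((theta[k] - theta_target[k]).pow(2).sum() for k in theta)
    den = sum((theta_start[k] - theta_target[k]).pow(2).sum() for k in theta)
    return num / den  # backpropagate to update syn_images and syn_soft_labels
```

In this sketch, `syn_soft_labels` would be initialized from a trained expert's logits on the synthetic images (for example, `expert_net(syn_images).detach().requires_grad_(True)`, with `expert_net` a hypothetical pretrained model) and then updated by the same outer optimizer as the images, mirroring the soft-label component described above.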
Results
Experiments on CIFAR-10, CIFAR-100, and Tiny ImageNet show that aligning trajectory difficulty with synthetic set size yields performance that matches, and in some cases exceeds, training on the full dataset. Notably, the authors report lossless distillation: synthetic datasets distilled to one-fifth or one-tenth of the original size show no performance degradation.
Impact and Future Directions
This work has significant implications for training efficiency, reduced storage costs, and practical model deployment under data constraints. The distilled datasets also transfer well to unseen architectures, a notable step forward in cross-architecture generalization.
Future research could explore reducing the computational cost of the method, further improving generalization across a broader range of architectures, and scaling the approach to larger datasets without losing efficacy. Additional study of soft-label initialization strategies and of different backbone networks for distillation could offer deeper insights and further improvements.
The paper addresses a core challenge of dataset distillation by aligning pattern difficulty with synthetic set size, providing a foundation for future work in the domain.