- The paper introduces FRePo, a novel method that leverages kernel ridge regression to compute meta-gradients efficiently.
- It employs a model-pool strategy that mitigates overfitting by maintaining a diverse set of models differing in initialization and training progress.
- The method achieves state-of-the-art results with significant reductions in training time and memory usage for dataset distillation.
Analysis of "Dataset Distillation using Neural Feature Regression"
The paper by Zhou, Nezhadarya, and Ba presents Neural Feature Regression with Pooling (FRePo), a method aimed at the central challenges of dataset distillation. Dataset distillation seeks to create a small synthetic dataset that retains the essential information of a larger original dataset, so that models trained on the distilled set perform comparably to models trained on the full data. FRePo delivers large reductions in memory and training time, along with improvements in downstream applications such as continual learning and membership inference defense.
Key Contributions and Methodology
The fundamental challenge in dataset distillation is computing the meta-gradient efficiently, since it requires differentiating through the inner loop of the learning algorithm. Traditional unrolled-optimization approaches carry high computational cost and can be unstable due to truncation and vanishing gradients. FRePo instead approximates the inner optimization with kernel ridge regression, which admits a closed-form solution and sharply reduces computational overhead, as sketched below.
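A minimal PyTorch sketch of this idea follows; the function and variable names are illustrative, not the authors' code, and details such as the regularization scaling are simplified. Given a feature extractor, the ridge-regression weights for the network's final layer have a closed form, and autograd can differentiate through that solve instead of through an unrolled training trajectory.

```python
import torch
import torch.nn.functional as F

def krr_meta_loss(feature_net, x_syn, y_syn, x_real, y_real, lam=1e-3):
    """Meta-objective via kernel ridge regression (illustrative sketch).

    Instead of unrolling inner-loop SGD, solve for the final linear layer
    in closed form on the synthetic set, then score that predictor on a
    batch of real data. Gradients w.r.t. x_syn and y_syn flow through the
    solve. Labels are assumed to be one-hot or soft (float) vectors.
    """
    phi_s = feature_net(x_syn)    # (n_syn, d) features of synthetic data
    phi_r = feature_net(x_real)   # (n_real, d) features of the real batch
    k_ss = phi_s @ phi_s.T        # (n_syn, n_syn) feature kernel
    reg = lam * torch.eye(k_ss.shape[0], device=k_ss.device, dtype=k_ss.dtype)
    alpha = torch.linalg.solve(k_ss + reg, y_syn)   # closed-form ridge weights
    preds = phi_r @ phi_s.T @ alpha                 # predictions on real batch
    return F.mse_loss(preds, y_real)
```

The distilled inputs and labels are then updated by ordinary gradient descent on this loss, e.g. with `torch.optim.Adam([x_syn, y_syn])` where both tensors are created with `requires_grad=True`.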
Several innovative aspects of FRePo distinguish it from previous methods:
- Kernel Approximation: By using kernel ridge regression, the approach avoids unrolled backpropagation entirely, substantially reducing memory use and computation time. The task is reformulated so that only the network's final-layer weights are trained to convergence; the meta-gradient is then computed through a fixed feature extractor and its kernel rather than through an unrolled optimization path.
- Model Pooling Strategy: FRePo maintains a "model pool," a diverse set of models differing in initialization and training progress, which keeps the distilled data from overfitting to any single learning setting and helps it generalize across architectures and optimization conditions (see the sketch after this list).
- Performance and Computational Efficiency: The method achieves state-of-the-art results on CIFAR100, Tiny ImageNet, and ImageNet subsets, with a reported two-order-of-magnitude reduction in training time and an order-of-magnitude reduction in memory usage relative to prior work, making FRePo practical for real-world applications.
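The model pool can be pictured with the following sketch (class and parameter names are hypothetical, not the paper's code): several feature extractors at different training ages are kept around, each meta-step samples one for the kernel computation above, and models past a maximum age are re-initialized.

```python
import random

class ModelPool:
    """Illustrative model pool (hypothetical names and hyperparameters).

    Holds several feature extractors at different training ages. Each
    meta-step samples one at random; models past a maximum age are
    re-initialized, so the distilled data is optimized against a moving
    mix of fresh and partially trained networks.
    """
    def __init__(self, make_model, size=10, max_age=100):
        self.make_model = make_model          # factory returning a fresh network
        self.max_age = max_age
        self.models = [make_model() for _ in range(size)]
        self.ages = [0] * size

    def sample(self):
        idx = random.randrange(len(self.models))
        if self.ages[idx] >= self.max_age:    # refresh stale models
            self.models[idx] = self.make_model()
            self.ages[idx] = 0
        self.ages[idx] += 1
        return self.models[idx]
```

After sampling, the chosen model computes the kernel loss and then takes a few ordinary training steps on the current distilled set before returning to the pool, which is what keeps the pool spread across training stages.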
Impact on Applications and Future Directions
The implications of FRePo extend beyond just dataset distillation:
- Continual Learning: A distilled dataset is compact enough to serve as a replay buffer, letting continual learning systems revisit earlier tasks cheaply and generalize better in dynamic, sequential settings; this is particularly relevant for incremental class learning (see the sketch after this list).
- Privacy and Security: The method also shows promise as a defense against membership inference attacks: because downstream models are trained on synthetic examples rather than retained real data, an attacker gains far weaker signal about whether any particular real example was in the original training set.
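As a concrete picture of the replay-buffer use, a sketch under the simplifying assumption that both new and distilled data carry hard class-index labels (FRePo itself learns soft labels, which would require one-hot encoding the new labels before concatenation):

```python
import torch
import torch.nn.functional as F

def train_task_with_replay(model, opt, task_loader, x_distilled, y_distilled):
    """Mix each new-task batch with the distilled set from earlier tasks."""
    for x_new, y_new in task_loader:
        x = torch.cat([x_new, x_distilled])   # new data + compact replay buffer
        y = torch.cat([y_new, y_distilled])
        loss = F.cross_entropy(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
```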
The results of FRePo hold significant promise for future developments in AI, particularly in settings demanding efficient learning and deployment, such as mobile and edge computing environments, where resource constraints are prevalent. Moreover, as AI models grow more computationally demanding due to increasing data and complexity, techniques like FRePo could play a critical role in sustainable AI development.
Conclusion
The paper offers a substantial advance in dataset distillation, balancing efficiency and effectiveness. By addressing the computational bottlenecks and overfitting issues of previous methods, FRePo provides a viable pathway toward scalable and robust data representation. Future research may further refine these techniques, explore additional applications in privacy-preserving AI, and combine FRePo with other machine-learning paradigms to apply synthetic data to more complex architectures and tasks.