
Dataset Distillation with Infinitely Wide Convolutional Networks

Published 27 Jul 2021 in cs.LG | arXiv:2107.13034v3

Abstract: The effectiveness of machine learning algorithms arises from being able to extract useful features from large amounts of data. As model and dataset sizes increase, dataset distillation methods that compress large datasets into significantly smaller yet highly performant ones will become valuable in terms of training efficiency and useful feature extraction. To that end, we apply a novel distributed kernel based meta-learning framework to achieve state-of-the-art results for dataset distillation using infinitely wide convolutional neural networks. For instance, using only 10 datapoints (0.02% of original dataset), we obtain over 65% test accuracy on CIFAR-10 image classification task, a dramatic improvement over the previous best test accuracy of 40%. Our state-of-the-art results extend across many other settings for MNIST, Fashion-MNIST, CIFAR-10, CIFAR-100, and SVHN. Furthermore, we perform some preliminary analyses of our distilled datasets to shed light on how they differ from naturally occurring data.

Citations (202)

Summary

  • The paper presents a novel distillation technique using infinitely wide convolutional networks with NTK and kernel inducing points.
  • It achieves over 65% accuracy on CIFAR-10 with just 10 synthetic data points, outperforming traditional approaches.
  • The method enhances training efficiency via a distributed client-server framework that leverages massive parallel GPU computations.

Dataset Distillation with Infinitely Wide Convolutional Networks

The paper "Dataset Distillation with Infinitely Wide Convolutional Networks" explores a novel approach in dataset distillation using infinite-width convolutional networks, leveraging powerful kernel-based meta-learning frameworks. It presents a significant advancement by achieving state-of-the-art results in dataset distillation tasks across various image benchmarks including MNIST, Fashion-MNIST, CIFAR-10, CIFAR-100, and SVHN.

Methodology and Results

The authors propose the Kernel Inducing Points (KIP) and Label Solve (LS) algorithms, built on infinitely wide convolutional networks whose behavior is described by Gaussian processes and the Neural Tangent Kernel (NTK). The NTK makes it possible to synthesize datasets that perform well both under kernel ridge-regression and when used to train standard finite-width neural networks. Numerically, with just 10 synthetic data points the method achieves over 65% test accuracy on CIFAR-10, vastly outperforming the previous best of roughly 40%.
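To make the KIP idea concrete, here is a minimal sketch of the inner kernel ridge-regression loss and the outer gradient step on the synthetic support set. It assumes a simple RBF kernel as a stand-in for the convolutional NTK used in the paper (which is typically computed with infinite-width tools such as the Neural Tangents library), and all names, shapes, and hyperparameters are illustrative rather than the authors' code.

```python
# Minimal KIP sketch, NOT the authors' implementation: an RBF kernel stands in
# for the convolutional NTK, and all shapes/hyperparameters are illustrative.
import jax
import jax.numpy as jnp

def rbf_kernel(x1, x2, gamma=1e-3):
    # x1: (n1, d), x2: (n2, d) flattened images.
    sq_dists = (jnp.sum(x1**2, 1)[:, None]
                + jnp.sum(x2**2, 1)[None, :]
                - 2.0 * x1 @ x2.T)
    return jnp.exp(-gamma * sq_dists)

def kip_loss(params, x_target, y_target, reg=1e-6):
    # Kernel ridge regression from the learned support set to a real target batch.
    x_s, y_s = params
    k_ss = rbf_kernel(x_s, x_s)
    k_ts = rbf_kernel(x_target, x_s)
    preds = k_ts @ jnp.linalg.solve(k_ss + reg * jnp.eye(k_ss.shape[0]), y_s)
    return 0.5 * jnp.mean((preds - y_target) ** 2)

@jax.jit
def kip_step(params, x_target, y_target, lr=1e-2):
    # Gradient descent on both the support images and their (learned) labels.
    loss, grads = jax.value_and_grad(kip_loss)(params, x_target, y_target)
    new_params = jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)
    return new_params, loss

# Usage: 10 synthetic points for CIFAR-10, one per class, updated against
# batches of real (x, y) pairs drawn from the training set.
key = jax.random.PRNGKey(0)
x_support = jax.random.normal(key, (10, 32 * 32 * 3))
y_support = jnp.eye(10)  # one-hot labels; KIP can also keep these fixed
params = (x_support, y_support)
```

The distilled data produced this way can then be evaluated either with the same kernel ridge-regression predictor or by training a finite-width network on the synthetic points.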

The method relies on a novel distributed client-server framework that parallelizes the expensive kernel computations across hundreds of concurrent GPUs. This makes it feasible to handle the cost introduced by convolutional and pooling layers in infinitely wide networks, whose kernel matrices must be computed repeatedly during the distillation process.
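The sketch below illustrates the basic idea behind that distribution: the kernel matrix decomposes into independent blocks, each of which could be computed by a separate worker and reassembled by the trainer. This is a simplified, hypothetical illustration, not the paper's client-server protocol; `kernel_fn` and the block size are placeholders.

```python
# Hypothetical block-wise kernel evaluation. Each (i, j) block is independent
# and could be dispatched to a different GPU worker; here we loop sequentially.
import jax.numpy as jnp

def blockwise_kernel(kernel_fn, x1, x2, block=128):
    rows = []
    for i in range(0, x1.shape[0], block):
        cols = []
        for j in range(0, x2.shape[0], block):
            # In a distributed setting, this call would be a remote job whose
            # result is sent back to the central trainer.
            cols.append(kernel_fn(x1[i:i + block], x2[j:j + block]))
        rows.append(jnp.concatenate(cols, axis=1))
    return jnp.concatenate(rows, axis=0)
```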

Analysis

The distilled datasets differ markedly from conventional data. A systematic analysis shows that KIP-optimized images blend features from many training samples and have higher intrinsic dimensionality, contradicting the intuition that distilled data would be structurally simpler and suggesting a richer feature space that may contribute to improved neural network performance. Spectral analysis likewise indicates that distilled images spread their energy over a broader range of eigendirections, whereas natural images concentrate most of theirs in a few top eigendirections.
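A rough sketch of the kind of spectral comparison described above is shown below; the exact analysis in the paper may differ in detail, and `pca_spectrum` and `effective_rank` are hypothetical helper names.

```python
# Compare how quickly the PCA spectrum of an image set decays: distilled images
# are reported to need more eigendirections to explain the same variance.
import jax.numpy as jnp

def pca_spectrum(images):
    # images: (n, d) flattened image set; center, then take singular values.
    x = images - images.mean(axis=0, keepdims=True)
    s = jnp.linalg.svd(x, compute_uv=False)
    var = s**2 / (images.shape[0] - 1)
    return var / var.sum()  # normalized explained variance per eigendirection

def effective_rank(spectrum, threshold=0.95):
    # Number of leading eigendirections needed to explain `threshold` of the variance.
    return int(jnp.searchsorted(jnp.cumsum(spectrum), threshold)) + 1
```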

Implications and Future Work

The research has significant implications for training efficiency and feature extraction as models and datasets grow. By compressing large datasets into small, highly informative ones, the approach can alleviate the ever-increasing resource requirements of deep learning and ease the bottlenecks in training large neural networks.

From a theoretical standpoint, infinitely wide networks offer insight into neural network convergence and scaling behavior, while the distributed implementation shows that their non-parametric computational burden can be managed in practice.

The authors suggest further study of the learned labels, which are themselves optimized during the distillation process. Such insights might reveal more about neural network function and generalization, paralleling theoretical advances in neural dynamics and scaling laws.

Moreover, the open-sourcing of these distilled datasets stands to propel further research in meta-learning capabilities and dataset efficiency, providing a valuable resource for the machine learning community.

Conclusion

In summary, the paper positions dataset distillation with infinitely wide convolutional networks as a powerful technique with broad applications in efficient training and resource optimization. It establishes a robust computational methodology, demonstrates substantial improvements over prior work, and lays a strong foundation for further exploration of distilled datasets and infinite-width network theory.
