Boosting the Cross-Architecture Generalization of Dataset Distillation through an Empirical Study (2312.05598v2)
Abstract: The poor cross-architecture generalization of dataset distillation greatly weakens its practical significance. This paper attempts to mitigate this issue through an empirical study, which suggests that synthetic datasets inherit an inductive bias towards the distillation model; as a result, the evaluation model is effectively confined to architectures similar to that of the distillation model. We propose EvaLuation with distillation Feature (ELF), a novel method that exploits features from intermediate layers of the distillation model during cross-architecture evaluation. Because the evaluation model learns from this bias-free knowledge, its architecture is no longer constrained, and performance is retained. Extensive experiments show that ELF consistently improves the cross-architecture generalization of current dataset distillation (DD) methods. Code of this project is at \url{https://github.com/Lirui-Zhao/ELF}.
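To make the idea concrete, below is a minimal PyTorch sketch of feature-guided evaluation in the spirit of the abstract: the evaluation model is trained on the synthetic dataset with a standard classification loss plus a term aligning one of its intermediate feature maps with the corresponding features of the frozen distillation model. Everything here (the hooked layer names, the linear projector, `lambda_feat`, and the training loop) is an illustrative assumption rather than the authors' implementation; see the linked repository for the actual code.

```python
# Hypothetical sketch of ELF-style evaluation: train an evaluation model on the
# synthetic dataset while matching its intermediate features to those of the
# frozen distillation model. Layer names, projector, and hyperparameters are
# assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F


def hook_feature(model: nn.Module, layer_name: str) -> dict:
    """Store the output of `layer_name` on every forward pass."""
    store = {}
    layer = dict(model.named_modules())[layer_name]
    layer.register_forward_hook(lambda mod, inp, out: store.update(feat=out))
    return store


def train_eval_model(eval_model, distill_model, synth_loader,
                     eval_layer, distill_layer,
                     lambda_feat=1.0, epochs=100, lr=0.01, device="cuda"):
    eval_model.to(device).train()
    # Frozen distillation model: its intermediate features serve as the
    # bias-free supervision signal described in the abstract.
    distill_model.to(device).eval().requires_grad_(False)

    student_feat = hook_feature(eval_model, eval_layer)
    teacher_feat = hook_feature(distill_model, distill_layer)
    proj, opt = None, None  # built lazily once feature sizes are known

    for _ in range(epochs):
        for x, y in synth_loader:
            x, y = x.to(device), y.to(device)
            logits = eval_model(x)
            with torch.no_grad():
                distill_model(x)  # populate teacher_feat via the hook

            # Pool conv feature maps to vectors so architectures with
            # different spatial resolutions can still be compared.
            f_s = F.adaptive_avg_pool2d(student_feat["feat"], 1).flatten(1)
            f_t = F.adaptive_avg_pool2d(teacher_feat["feat"], 1).flatten(1)

            if proj is None:  # projector reconciles different feature widths
                proj = nn.Linear(f_s.shape[1], f_t.shape[1]).to(device)
                opt = torch.optim.SGD(
                    list(eval_model.parameters()) + list(proj.parameters()),
                    lr=lr, momentum=0.9)

            loss = (F.cross_entropy(logits, y)
                    + lambda_feat * F.mse_loss(proj(f_s), f_t))
            opt.zero_grad()
            loss.backward()
            opt.step()
    return eval_model
```

The key design choice, per the abstract, is that the supervision comes from the distillation model's intermediate layers rather than from the synthetic images alone, so the evaluation architecture need not resemble the distillation architecture.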
- Flexible Dataset Distillation: Learn Labels Instead of Images. In Neural Information Processing Systems Workshop (NeurIPSW).
- JAX: composable transformations of Python+NumPy programs.
- Dataset distillation by matching training trajectories. In Computer Vision and Pattern Recognition (CVPR).
- Bidirectional Learning for Offline Infinite-width Model-based Optimization. In Neural Information Processing Systems (NeurIPS).
- Private Set Generation with Discriminative Information. In Neural Information Processing Systems (NeurIPS).
- DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI).
- DC-BENCH: Dataset Condensation Benchmark. In Neural Information Processing Systems (NeurIPS).
- Scaling Up Dataset Distillation to ImageNet-1K with Constant Memory. In International Conference on Machine Learning (ICML).
- ImageNet: A large-scale hierarchical image database. In Computer Vision and Pattern Recognition (CVPR).
- Remember the Past: Distilling Datasets into Addressable Memories for Neural Networks. In Neural Information Processing Systems (NeurIPS).
- Privacy for Free: How does Dataset Condensation Help Privacy? In International Conference on Machine Learning (ICML).
- Minimizing the Accumulated Trajectory Error to Improve Dataset Distillation. In Computer Vision and Pattern Recognition (CVPR).
- Model-agnostic meta-learning for fast adaptation of deep networks. In International Conference on Machine Learning (ICML).
- Dynamic Few-Shot Visual Learning without Forgetting. In Computer Vision and Pattern Recognition (CVPR).
- Rich feature hierarchies for accurate object detection and semantic segmentation. In Computer Vision and Pattern Recognition (CVPR).
- Deep residual learning for image recognition. In Computer Vision and Pattern Recognition (CVPR).
- Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning (ICML).
- Dataset Condensation via Efficient Synthetic-Data Parameterization. In International Conference on Machine Learning (ICML).
- Learning multiple layers of features from tiny images. Technical report, Citeseer.
- Tiny ImageNet visual recognition challenge. CS 231N.
- Dataset condensation with latent space knowledge factorization and sharing. arXiv preprint arXiv:2208.10494.
- Wide neural networks of any depth evolve as linear models under gradient descent. In Neural Information Processing Systems (NeurIPS).
- Dataset condensation with contrastive signals. In International Conference on Machine Learning (ICML).
- A Comprehensive Survey to Dataset Distillation. arXiv preprint arXiv:2301.05603.
- Dataset Distillation for Medical Dataset Sharing. In AAAI Conference on Artificial Intelligence Workshop (AAAIW).
- Microsoft COCO: Common objects in context. In European Conference on Computer Vision (ECCV).
- Dataset Distillation via Factorization. In Neural Information Processing Systems (NeurIPS).
- Fully convolutional networks for semantic segmentation. In Computer Vision and Pattern Recognition (CVPR).
- Efficient Dataset Distillation using Random Feature Approximation. In Neural Information Processing Systems (NeurIPS).
- Gradient-Based Hyperparameter Optimization Through Reversible Learning. In International Conference on Machine Learning (ICML).
- Dataset Meta-Learning from Kernel Ridge-Regression. In International Conference on Learning Representations (ICLR).
- Dataset Distillation with Infinitely Wide Convolutional Networks. In Neural Information Processing Systems (NeurIPS).
- PyTorch: An imperative style, high-performance deep learning library. In Neural Information Processing Systems (NeurIPS).
- Faster R-CNN: Towards real-time object detection with region proposal networks. In Neural Information Processing Systems (NeurIPS).
- Kornia: an open source differentiable computer vision library for PyTorch. In Winter Conference on Applications of Computer Vision (WACV).
- Distilled Replay: Overcoming Forgetting through Synthetic Samples. In International Joint Conference on Artificial Intelligence Workshop (IJCAIW).
- ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV).
- Data Distillation: A Survey. arXiv preprint arXiv:2301.04272.
- Infinite Recommendation Networks: A Data-Centric Approach. In Neural Information Processing Systems (NeurIPS).
- Sample Condensation in Online Continual Learning. In International Joint Conference on Neural Networks (IJCNN).
- Very deep convolutional networks for large-scale image recognition. In International Conference on Learning Representations (ICLR).
- Soft-Label Dataset Distillation and Text Dataset Distillation. In International Joint Conference on Neural Networks (IJCNN).
- Going deeper with convolutions. In Computer Vision and Pattern Recognition (CVPR).
- Instance normalization: The missing ingredient for fast stylization. arXiv preprint arXiv:1607.08022.
- Attention Is All You Need. arXiv preprint arXiv:1706.03762.
- CAFE: Learning to Condense Dataset by Aligning Features. In Computer Vision and Pattern Recognition (CVPR).
- Dataset distillation. arXiv preprint arXiv:1811.10959.
- Werbos, P. 1990. Backpropagation through time: what it does and how to do it. Proceedings of the IEEE.
- Condensed Composite Memory Continual Learning. In International Joint Conference on Neural Networks (IJCNN).
- Dataset Distillation: A Comprehensive Review. arXiv preprint arXiv:2301.07014.
- Dataset Condensation with Differentiable Siamese Augmentation. In International Conference on Machine Learning (ICML).
- Dataset Condensation with Gradient Matching. In International Conference on Learning Representations (ICLR).
- Synthesizing Informative Training Samples with GAN. In Neural Information Processing Systems Workshop (NeurIPSW).
- Dataset Condensation with Distribution Matching. In Winter Conference on Applications of Computer Vision (WACV).
- Dataset Distillation using Neural Feature Regression. In Neural Information Processing Systems (NeurIPS).