ATOM: Attention Mixer for Efficient Dataset Distillation (2405.01373v1)

Published 2 May 2024 in cs.CV

Abstract: Recent works in dataset distillation seek to minimize training expenses by generating a condensed synthetic dataset that encapsulates the information present in a larger real dataset. These approaches ultimately aim to attain test accuracy levels akin to those achieved by models trained on the entirety of the original dataset. Previous studies in feature and distribution matching have achieved significant results without incurring the costs of bi-level optimization in the distillation process. Despite their convincing efficiency, many of these methods suffer from marginal downstream performance improvements, limited distillation of contextual information, and subpar cross-architecture generalization. To address these challenges in dataset distillation, we propose the ATtentiOn Mixer (ATOM) module to efficiently distill large datasets using a mixture of channel and spatial-wise attention in the feature matching process. Spatial-wise attention helps guide the learning process based on consistent localization of classes in their respective images, allowing for distillation from a broader receptive field. Meanwhile, channel-wise attention captures the contextual information associated with the class itself, thus making the synthetic image more informative for training. By integrating both types of attention, our ATOM module demonstrates superior performance across various computer vision datasets, including CIFAR10/100 and TinyImagenet. Notably, our method significantly improves performance in scenarios with a low number of images per class, thereby enhancing its potential. Furthermore, the improvement is maintained across architectures and in applications such as neural architecture search.
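The abstract describes matching a mixture of spatial-wise and channel-wise attention between real and synthetic feature maps. The sketch below illustrates that idea under assumed details: the function names, the pooling exponent, the normalization, and the plain sum of the two matching terms are illustrative choices, not the authors' exact ATOM formulation.

```python
import torch
import torch.nn.functional as F

# Illustrative sketch of combined spatial- and channel-wise attention matching.
# Feature maps are assumed to come from a shared feature extractor applied to a
# real batch and a synthetic batch of the same class.

def spatial_attention(feat: torch.Tensor, p: int = 2) -> torch.Tensor:
    # feat: (B, C, H, W) -> (B, H*W); aggregate over channels to keep
    # per-location (localization) information.
    attn = feat.abs().pow(p).mean(dim=1).flatten(1)
    return F.normalize(attn, dim=1)

def channel_attention(feat: torch.Tensor, p: int = 2) -> torch.Tensor:
    # feat: (B, C, H, W) -> (B, C); aggregate over spatial positions to keep
    # per-channel (contextual) information.
    attn = feat.abs().pow(p).mean(dim=(2, 3))
    return F.normalize(attn, dim=1)

def attention_matching_loss(real_feats, syn_feats):
    # real_feats / syn_feats: lists of per-layer feature maps (one tensor per
    # layer) for a real batch and a synthetic batch of the same class.
    loss = 0.0
    for fr, fs in zip(real_feats, syn_feats):
        # Match class-averaged spatial attention (where the class appears).
        loss = loss + F.mse_loss(spatial_attention(fr).mean(0),
                                 spatial_attention(fs).mean(0))
        # Match class-averaged channel attention (which channels respond).
        loss = loss + F.mse_loss(channel_attention(fr).mean(0),
                                 channel_attention(fs).mean(0))
    return loss
```

In this sketch, pooling over channels preserves where a class tends to appear, while pooling over spatial positions preserves which channels respond to it, mirroring the localization and contextual roles the abstract attributes to the two attention types.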

Authors (6)
  1. Samir Khaki (15 papers)
  2. Ahmad Sajedi (12 papers)
  3. Kai Wang (624 papers)
  4. Lucy Z. Liu (3 papers)
  5. Yuri A. Lawryshyn (7 papers)
  6. Konstantinos N. Plataniotis (109 papers)
Citations (2)