Calibrated Dataset Condensation for Faster Hyperparameter Search (2405.17535v1)

Published 27 May 2024 in cs.LG, cs.AI, and stat.ML

Abstract: Dataset condensation can be used to reduce the computational cost of training multiple models on a large dataset by condensing the training dataset into a small synthetic set. State-of-the-art approaches rely on matching the model gradients between the real and synthetic data. However, there is no theoretical guarantee of the generalizability of the condensed data: data condensation often generalizes poorly across hyperparameters/architectures in practice. This paper considers a different condensation objective specifically geared toward hyperparameter search. We aim to generate a synthetic validation dataset so that the validation-performance rankings of the models, with different hyperparameters, on the condensed and original datasets are comparable. We propose a novel hyperparameter-calibrated dataset condensation (HCDC) algorithm, which obtains the synthetic validation dataset by matching the hyperparameter gradients computed via implicit differentiation and efficient inverse Hessian approximation. Experiments demonstrate that the proposed framework effectively maintains the validation-performance rankings of models and speeds up hyperparameter/architecture search for tasks on both images and graphs.
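To make the key quantity in the abstract concrete, below is a minimal, illustrative sketch (not the authors' implementation) of a hypergradient computed via implicit differentiation, with the inverse Hessian approximated by a truncated Neumann series, one standard choice of "efficient inverse Hessian approximation." It uses PyTorch; the function names, the step size alpha, and the number of series terms are hypothetical choices for illustration only.

import torch

def neumann_inverse_hvp(train_grads, params, vector, num_terms=5, alpha=0.01):
    """Approximate H^{-1} v with a truncated Neumann series:
    H^{-1} v ~= alpha * sum_{k=0}^{K} (I - alpha * H)^k v,
    where H is the Hessian of the training loss w.r.t. params."""
    p = [v.clone() for v in vector]    # current series term (I - alpha*H)^k v
    acc = [v.clone() for v in vector]  # running sum of the series
    for _ in range(num_terms):
        hvp = torch.autograd.grad(train_grads, params, grad_outputs=p,
                                  retain_graph=True)
        p = [pi - alpha * hi for pi, hi in zip(p, hvp)]
        acc = [ai + pi for ai, pi in zip(acc, p)]
    return [alpha * ai for ai in acc]

def hypergradient(val_loss, train_loss, params, hyperparams):
    """Implicit-function-theorem hypergradient of the validation loss:
    dL_val/dlambda = -(dL_val/dtheta) H^{-1} (d^2 L_train / dtheta dlambda),
    assuming theta is (approximately) a minimizer of L_train."""
    v = torch.autograd.grad(val_loss, params, retain_graph=True)
    train_grads = torch.autograd.grad(train_loss, params, create_graph=True)
    ihvp = [x.detach() for x in neumann_inverse_hvp(train_grads, params, v)]
    # Contract the mixed partial d^2 L_train / dtheta dlambda with H^{-1} v
    # by differentiating (dL_train/dtheta . ihvp) w.r.t. the hyperparameters.
    inner = sum((g * w).sum() for g, w in zip(train_grads, ihvp))
    indirect = torch.autograd.grad(inner, hyperparams, allow_unused=True)
    return [-t if t is not None else None for t in indirect]

Per the abstract, HCDC's condensation objective matches hypergradients of this form computed on the original and synthetic validation sets; the outer matching loop over the synthetic data is omitted from this sketch.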

