Distilled Datamodel with Reverse Gradient Matching

Published 22 Apr 2024 in cs.LG and cs.CV | (2404.14006v1)

Abstract: The proliferation of large-scale AI models trained on extensive datasets has revolutionized machine learning. With these models taking on increasingly central roles in various applications, the need to understand their behavior and enhance interpretability has become paramount. To investigate the impact of changes in training data on a pre-trained model, a common approach is leave-one-out retraining. This entails systematically altering the training dataset by removing specific samples to observe resulting changes within the model. However, retraining the model for each altered dataset presents a significant computational challenge, given the need to perform this operation for every dataset variation. In this paper, we introduce an efficient framework for assessing data impact, comprising offline training and online evaluation stages. During the offline training phase, we approximate the influence of training data on the target model through a distilled synset, formulated as a reversed gradient matching problem. For online evaluation, we expedite the leave-one-out process using the synset, which is then utilized to compute the attribution matrix based on the evaluation objective. Experimental evaluations, including training data attribution and assessments of data quality, demonstrate that our proposed method achieves comparable model behavior evaluation while significantly speeding up the process compared to the direct retraining method.
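For orientation, the leave-one-out retraining baseline that the abstract describes as expensive can be sketched on a toy problem. This is only a minimal illustration of the naive baseline, not the paper's synset-based method: it assumes a one-dimensional least-squares model with a closed-form fit, a squared-error evaluation objective, and hypothetical helper names (`fit_line`, `squared_error`, `loo_attribution`) that do not come from the paper. Entry `[i][j]` of the resulting attribution matrix is the change in loss on test point `j` when training sample `i` is removed and the model is refit.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def fit_line(data: List[Point]) -> Tuple[float, float]:
    """Closed-form ordinary least squares for y = a * x + b."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in data)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a, b


def squared_error(params: Tuple[float, float], point: Point) -> float:
    """Evaluation objective (an assumption here): squared error on one point."""
    a, b = params
    x, y = point
    return (a * x + b - y) ** 2


def loo_attribution(train: List[Point], test: List[Point]) -> List[List[float]]:
    """Naive leave-one-out: refit once per removed training sample.

    Returns a len(train) x len(test) matrix whose entry [i][j] is
    loss_without_sample_i(test_j) - loss_with_all_samples(test_j).
    """
    full_params = fit_line(train)
    base_losses = [squared_error(full_params, p) for p in test]
    matrix = []
    for i in range(len(train)):
        kept = train[:i] + train[i + 1:]  # drop training sample i
        params = fit_line(kept)           # full retraining on the rest
        matrix.append(
            [squared_error(params, p) - base for p, base in zip(test, base_losses)]
        )
    return matrix


# Toy data: four points on y = x plus one outlier at x = 4.
train_set = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (4.0, 10.0)]
test_set = [(1.5, 1.5), (2.5, 2.5)]
attribution = loo_attribution(train_set, test_set)

# The training sample whose removal lowers the loss on test point 0 the most.
most_harmful = min(range(len(train_set)), key=lambda i: attribution[i][0])
print("most harmful training index:", most_harmful)
```

The loop refits the model once per training sample, so the cost grows linearly with the dataset size times the cost of one training run. That is affordable for a closed-form fit but not for a large network, which is the bottleneck the abstract says the offline and online stages are meant to avoid.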
