
Leveraging Gradients for Unsupervised Accuracy Estimation under Distribution Shift (2401.08909v2)

Published 17 Jan 2024 in cs.LG

Abstract: Estimating test accuracy without access to the ground-truth test labels under varying test environments is a challenging, yet extremely important problem in the safe deployment of machine learning algorithms. Existing works rely on the information from either the outputs or the extracted features of neural networks to formulate an estimation score correlating with the ground-truth test accuracy. In this paper, we investigate--both empirically and theoretically--how the information provided by the gradients can be predictive of the ground-truth test accuracy even under a distribution shift. Specifically, we use the norm of classification-layer gradients, backpropagated from the cross-entropy loss after only one gradient step over test data. Our key idea is that the model should be adjusted with a higher magnitude of gradients when it does not generalize to the test dataset with a distribution shift. We provide theoretical insights highlighting the main ingredients of such an approach ensuring its empirical success. Extensive experiments conducted on diverse distribution shifts and model structures demonstrate that our method significantly outperforms state-of-the-art algorithms.
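The core quantity the abstract describes — the norm of classification-layer gradients, backpropagated from the cross-entropy loss over unlabeled test data — can be sketched in a few lines. The sketch below is a minimal illustration, not the paper's implementation: it assumes a linear classification head over fixed features and uses the model's own hard predictions as pseudo-labels for the cross-entropy loss, since no ground-truth test labels are available. The gradient is written in closed form rather than via autodiff.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the class axis.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def gradient_norm_score(features, W, b):
    """Norm of the classification-layer gradient of the mean
    cross-entropy loss over a batch of unlabeled test features.

    Pseudo-labeling with the model's own argmax predictions is an
    assumption of this sketch; the paper's exact target choice may differ.
    """
    logits = features @ W + b
    probs = softmax(logits)
    pseudo = probs.argmax(axis=1)                 # hard pseudo-labels
    onehot = np.eye(W.shape[1])[pseudo]
    # Closed-form gradient of mean cross-entropy w.r.t. W and b:
    # dL/dlogits = (probs - onehot) / n, then chain through the linear layer.
    delta = (probs - onehot) / features.shape[0]
    grad_W = features.T @ delta
    grad_b = delta.sum(axis=0)
    return np.sqrt((grad_W ** 2).sum() + (grad_b ** 2).sum())
```

The intended reading, per the abstract's key idea: a larger score means the model would need a larger one-step adjustment to fit the shifted test distribution, which correlates with lower ground-truth test accuracy.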
