Calibration Attacks: A Comprehensive Study of Adversarial Attacks on Model Confidence (2401.02718v3)

Published 5 Jan 2024 in cs.LG and cs.CR

Abstract: In this work, we highlight and perform a comprehensive study of calibration attacks, a form of adversarial attack that aims to leave victim models heavily miscalibrated without altering their predicted labels, thereby endangering the trustworthiness of the models and the downstream decision making based on their confidence. We propose four typical forms of calibration attacks: underconfidence, overconfidence, maximum-miscalibration, and random-confidence attacks, conducted in both black-box and white-box setups. We demonstrate that the attacks are highly effective on both convolutional and attention-based models: with a small number of queries, they severely skew confidence without changing predictive performance. Given the potential danger, we further investigate the effectiveness of a wide range of adversarial defence and recalibration methods, including defences we propose specifically for calibration attacks, to mitigate the harm. From the ECE and KS scores, we observe that significant limitations remain in handling calibration attacks. To the best of our knowledge, this is the first dedicated study that provides a comprehensive investigation of calibration-focused attacks. We hope this study helps attract more attention to these types of attacks and thereby hampers the serious damage they could cause. To this end, this work also provides detailed analyses to understand the characteristics of the attacks. Our code is available at https://github.com/PhenetOs/CalibrationAttack
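To make the abstract's central point concrete, below is a minimal sketch (not taken from the paper or its repository) of the standard Expected Calibration Error (ECE) metric it evaluates against, plus a toy illustration of the calibration-attack scenario: confidences on correctly classified samples are pushed down while predicted labels, and hence accuracy, stay fixed. The bin count, class count, and confidence ranges are illustrative assumptions, not values reported by the authors.

```python
import numpy as np

def expected_calibration_error(confidences, predictions, labels, n_bins=15):
    """Standard ECE: bin samples by confidence and average the
    |accuracy - mean confidence| gap per bin, weighted by bin size."""
    confidences = np.asarray(confidences, dtype=float)
    correct = (np.asarray(predictions) == np.asarray(labels)).astype(float)
    bin_edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap
    return ece

# Toy illustration (hypothetical numbers): an underconfidence-style attack
# drives confidences of correctly classified samples toward chance level
# without flipping any predicted label, so accuracy is unchanged while
# miscalibration (ECE) explodes.
rng = np.random.default_rng(0)
labels = rng.integers(0, 10, size=1000)
preds = labels.copy()                            # accuracy stays at 100%
clean_conf = rng.uniform(0.85, 0.99, 1000)       # confident, roughly calibrated
attacked_conf = rng.uniform(0.10, 0.20, 1000)    # skewed toward underconfidence
print("clean ECE:   ", round(expected_calibration_error(clean_conf, preds, labels), 3))
print("attacked ECE:", round(expected_calibration_error(attacked_conf, preds, labels), 3))
```

Running this sketch shows the attacked ECE roughly an order of magnitude above the clean ECE even though every prediction remains correct, which is the failure mode the paper's attacks induce on real models.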

