
Better Representations via Adversarial Training in Pre-Training: A Theoretical Perspective

Published 26 Jan 2024 in cs.LG and stat.ML | (2401.15248v1)

Abstract: Pre-training is known to generate universal representations for downstream tasks in large-scale deep learning such as LLMs. Existing literature, e.g., Kim et al. (2020), empirically observes that downstream tasks can inherit the adversarial robustness of the pre-trained model. We provide theoretical justifications for this robustness inheritance phenomenon. Our theoretical results reveal that feature purification plays an important role in connecting the adversarial robustness of the pre-trained model and the downstream tasks in two-layer neural networks. Specifically, we show that (i) with adversarial training, each hidden node tends to pick only one (or a few) features; (ii) without adversarial training, the hidden nodes can be vulnerable to attacks. This observation is valid for both supervised pre-training and contrastive learning. With purified nodes, it turns out that clean training is enough to achieve adversarial robustness in downstream tasks.
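
To make the terms in the abstract concrete, the following is a minimal sketch of an adversarial attack on a two-layer ReLU network. It assumes an ℓ∞ fast-gradient-sign attack in the style of Goodfellow et al. (2015) and a logistic loss; the abstract does not state the paper's attack model, loss, or network parametrization, so the function names, weights, and the value of `eps` below are illustrative assumptions and not the authors' setup. Adversarial training, in this simplified picture, means fitting the network on inputs perturbed in this way instead of on the clean inputs.

```python
import numpy as np

def two_layer_forward(W, a, x):
    """Two-layer ReLU network f(x) = sum_j a_j * relu(w_j . x) (assumed form)."""
    return a @ np.maximum(W @ x, 0.0)

def logistic_loss(W, a, x, y):
    """Logistic loss log(1 + exp(-y f(x))) for a label y in {-1, +1}."""
    return np.log1p(np.exp(-y * two_layer_forward(W, a, x)))

def input_gradient(W, a, x, y):
    """Gradient of the logistic loss with respect to the input x."""
    pre = W @ x                                # hidden pre-activations
    f = a @ np.maximum(pre, 0.0)               # network output
    dloss_df = -y / (1.0 + np.exp(y * f))      # derivative of the loss in f
    df_dx = W.T @ (a * (pre > 0))              # ReLU passes gradient where pre > 0
    return dloss_df * df_dx

def fgsm(W, a, x, y, eps):
    """One fast-gradient-sign step: move each coordinate by eps to raise the loss."""
    return x + eps * np.sign(input_gradient(W, a, x, y))

# Hypothetical toy example: two hidden nodes, each aligned with one coordinate.
W = np.array([[1.0, 0.0], [0.0, 1.0]])
a = np.array([1.0, -1.0])
x = np.array([1.0, 0.5])
y = 1.0
eps = 0.1

x_adv = fgsm(W, a, x, y, eps)                  # approximately [0.9, 0.6]
clean_loss = logistic_loss(W, a, x, y)
adv_loss = logistic_loss(W, a, x_adv, y)
print("clean loss:", clean_loss)
print("adversarial loss:", adv_loss)
```

In this toy case the perturbation lowers the margin `y * f(x)` from 0.5 to about 0.3, so the adversarial loss exceeds the clean loss. A single gradient-sign step is only a first-order approximation of the worst-case perturbation inside the ℓ∞ ball.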

References (66)
  1. Alayrac, J.-B., Uesato, J., Huang, P.-S., Fawzi, A., Stanforth, R., and Kohli, P. (2019), “Are labels required for improving adversarial robustness?” Advances in Neural Information Processing Systems, 32.
  2. Allen-Zhu, Z. and Li, Y. (2020), “Feature Purification: How Adversarial Training Performs Robust Deep Learning,” arXiv preprint arXiv:2005.10190.
  3. Arora, S., Khandeparkar, H., Khodak, M., Plevrakis, O., and Saunshi, N. (2019), “A theoretical analysis of contrastive unsupervised representation learning,” arXiv preprint arXiv:1902.09229.
  4. Bommasani, R., Hudson, D. A., Adeli, E., Altman, R., Arora, S., von Arx, S., Bernstein, M. S., Bohg, J., Bosselut, A., Brunskill, E., et al. (2021), “On the opportunities and risks of foundation models,” arXiv preprint arXiv:2108.07258.
  5. Cai, Q.-Z., Du, M., Liu, C., and Song, D. (2018), “Curriculum adversarial training,” arXiv preprint arXiv:1805.04807.
  6. Carmon, Y., Raghunathan, A., Schmidt, L., Duchi, J. C., and Liang, P. S. (2019), “Unlabeled data improves adversarial robustness,” in Advances in Neural Information Processing Systems, pp. 11192–11203.
  7. Cemgil, T., Ghaisas, S., Dvijotham, K. D., and Kohli, P. (2019), “Adversarially robust representations with smooth encoders,” in International Conference on Learning Representations.
  8. Chen, D., Hu, H., Wang, Q., Li, Y., Wang, C., Shen, C., and Li, Q. (2021), “CARTL: Cooperative Adversarially-Robust Transfer Learning,” in International Conference on Machine Learning, PMLR, pp. 1640–1650.
  9. Chen, L., Min, Y., Zhang, M., and Karbasi, A. (2020a), “More data can expand the generalization gap between adversarially robust and standard models,” in International Conference on Machine Learning, PMLR, pp. 1670–1680.
  10. Chen, T., Kornblith, S., Norouzi, M., and Hinton, G. (2020b), “A simple framework for contrastive learning of visual representations,” in International conference on machine learning, PMLR, pp. 1597–1607.
  11. Chuang, C.-Y., Robinson, J., Yen-Chen, L., Torralba, A., and Jegelka, S. (2020), “Debiased contrastive learning,” arXiv preprint arXiv:2007.00224.
  12. Dai, B. and Lin, D. (2017), “Contrastive learning for image captioning,” arXiv preprint arXiv:1710.02534.
  13. Dan, C., Wei, Y., and Ravikumar, P. (2020), “Sharp Statistical Guarantees for Adversarially Robust Gaussian Classification,” in International Conference on Machine Learning, PMLR, pp. 2345–2355.
  14. Deng, Z., Zhang, L., Ghorbani, A., and Zou, J. (2021a), “Improving adversarial robustness via unlabeled out-of-domain data,” in International Conference on Artificial Intelligence and Statistics, PMLR, pp. 2845–2853.
  15. Deng, Z., Zhang, L., Vodrahalli, K., Kawaguchi, K., and Zou, J. (2021b), “Adversarial Training Helps Transfer Learning via Better Representations,” arXiv preprint arXiv:2106.10189.
  16. Fan, L., Liu, S., Chen, P.-Y., Zhang, G., and Gan, C. (2021), “When Does Contrastive Learning Preserve Adversarial Robustness from Pretraining to Finetuning?” Advances in Neural Information Processing Systems, 34, 21480–21492.
  17. Goodfellow, I. J., Shlens, J., and Szegedy, C. (2015), “Explaining and Harnessing Adversarial Examples,” in 3rd International Conference on Learning Representations.
  18. Gowal, S., Rebuffi, S.-A., Wiles, O., Stimberg, F., Calian, D. A., and Mann, T. A. (2021), “Improving Robustness using Generated Data,” Advances in Neural Information Processing Systems, 34.
  19. Grigor’eva, M. and Popov, S. (2012), “An upper bound for the absolute constant in the nonuniform version of the Berry-Esseen inequalities for nonidentically distributed summands,” in Doklady Mathematics, Springer, vol. 86, pp. 524–526.
  20. HaoChen, J. Z. and Ma, T. (2022), “A Theoretical Study of Inductive Biases in Contrastive Learning,” arXiv preprint arXiv:2211.14699.
  21. HaoChen, J. Z., Wei, C., Kumar, A., and Ma, T. (2022), “Beyond separability: Analyzing the linear transferability of contrastive representations to related subpopulations,” arXiv preprint arXiv:2204.02683.
  22. He, K., Zhang, X., Ren, S., and Sun, J. (2016), “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778.
  23. Hendrycks, D., Lee, K., and Mazeika, M. (2019), “Using pre-training can improve model robustness and uncertainty,” arXiv preprint arXiv:1901.09960.
  24. Ho, C.-H. and Vasconcelos, N. (2020), “Contrastive learning with adversarial examples,” Advances in Neural Information Processing Systems, 33, 17081–17093.
  25. Hyvarinen, A., Oja, E., Hoyer, P., and Hurri, J. (1998), “Image feature extraction by sparse coding and independent component analysis,” in Proceedings. Fourteenth International Conference on Pattern Recognition (Cat. No. 98EX170), IEEE, vol. 2, pp. 1268–1273.
  26. Javanmard, A. and Mehrabi, M. (2021), “Adversarial robustness for latent models: Revisiting the robust-standard accuracies tradeoff,” arXiv preprint arXiv:2110.11950.
  27. Javanmard, A. and Soltanolkotabi, M. (2022), “Precise statistical analysis of classification accuracies for adversarial training,” The Annals of Statistics, 50, 2127–2156.
  28. Javanmard, A., Soltanolkotabi, M., and Hassani, H. (2020), “Precise tradeoffs in adversarial training for linear regression,” in Conference on Learning Theory, PMLR, pp. 2034–2078.
  29. Khosla, P., Teterwak, P., Wang, C., Sarna, A., Tian, Y., Isola, P., Maschinot, A., Liu, C., and Krishnan, D. (2020), “Supervised contrastive learning,” arXiv preprint arXiv:2004.11362.
  30. Kim, M., Tack, J., and Hwang, S. J. (2020), “Adversarial Self-Supervised Contrastive Learning,” in Advances in Neural Information Processing Systems.
  31. Li, J., Zhou, P., Xiong, C., and Hoi, S. C. (2020), “Prototypical contrastive learning of unsupervised representations,” arXiv preprint arXiv:2005.04966.
  32. Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., and Dean, J. (2013), “Distributed representations of words and phrases and their compositionality,” in Advances in neural information processing systems, pp. 3111–3119.
  33. Min, Y., Chen, L., and Karbasi, A. (2020), “The curious case of adversarially robust models: More data can help, double descend, or hurt generalization,” arXiv preprint arXiv:2002.11080.
  34. Mo, Y., Wu, D., Wang, Y., Guo, Y., and Wang, Y. (2022), “When Adversarial Training Meets Vision Transformers: Recipes from Training to Architecture,” arXiv preprint arXiv:2210.07540.
  35. Najafi, A., Maeda, S.-i., Koyama, M., and Miyato, T. (2019), “Robustness to adversarial perturbations in learning from incomplete data,” in Advances in Neural Information Processing Systems, pp. 5542–5552.
  36. Nguyen, A. T., Lim, S. N., and Torr, P. (2022), “Task-Agnostic Robust Representation Learning,” arXiv preprint arXiv:2203.07596.
  37. Oord, A. v. d., Li, Y., and Vinyals, O. (2018), “Representation learning with contrastive predictive coding,” arXiv preprint arXiv:1807.03748.
  38. Petrov, A. and Kwiatkowska, M. (2022), “Robustness of Unsupervised Representation Learning without Labels,” arXiv preprint arXiv:2210.04076.
  39. Raghunathan, A., Xie, S. M., Yang, F., Duchi, J. C., and Liang, P. (2019), “Adversarial training can hurt generalization,” arXiv preprint arXiv:1906.06032.
  40. Rice, L., Wong, E., and Kolter, J. Z. (2020), “Overfitting in adversarially robust deep learning,” arXiv preprint arXiv:2002.11569.
  41. Salman, H., Ilyas, A., Engstrom, L., Kapoor, A., and Madry, A. (2020), “Do adversarially robust imagenet models transfer better?” arXiv preprint arXiv:2007.08489.
  42. Saunshi, N., Ash, J., Goel, S., Misra, D., Zhang, C., Arora, S., Kakade, S., and Krishnamurthy, A. (2022), “Understanding contrastive learning requires incorporating inductive biases,” arXiv preprint arXiv:2202.14037.
  43. Saunshi, N., Plevrakis, O., Arora, S., Khodak, M., and Khandeparkar, H. (2019), “A theoretical analysis of contrastive unsupervised representation learning,” in International Conference on Machine Learning, PMLR, pp. 5628–5637.
  44. Shafahi, A., Saadatpanah, P., Zhu, C., Ghiasi, A., Studer, C., Jacobs, D., and Goldstein, T. (2019), “Adversarially robust transfer learning,” arXiv preprint arXiv:1905.08232.
  45. Shen, K., Jones, R. M., Kumar, A., Xie, S. M., HaoChen, J. Z., Ma, T., and Liang, P. (2022), “Connect, not collapse: Explaining contrastive learning for unsupervised domain adaptation,” in International Conference on Machine Learning, PMLR, pp. 19847–19878.
  46. Sinha, A., Namkoong, H., and Duchi, J. (2018), “Certifying some distributional robustness with principled adversarial training.”
  47. Taheri, M., Xie, F., and Lederer, J. (2021), “Statistical guarantees for regularized neural networks,” Neural Networks, 142, 148–161.
  48. Tian, Y., Sun, C., Poole, B., Krishnan, D., Schmid, C., and Isola, P. (2020), “What makes for good views for contrastive learning?” arXiv preprint arXiv:2005.10243.
  49. Tosh, C., Krishnamurthy, A., and Hsu, D. (2021), “Contrastive learning, multi-view redundancy, and linear models,” in Algorithmic Learning Theory, PMLR, pp. 1179–1206.
  50. Wang, Q., Wang, Y., Zhu, H., and Wang, Y. (2022), “Improving Out-of-Distribution Generalization by Adversarial Training with Structured Priors,” arXiv preprint arXiv:2210.06807.
  51. Wang, Y., Ma, X., Bailey, J., Yi, J., Zhou, B., and Gu, Q. (2019a), “On the convergence and robustness of adversarial training,” in International Conference on Machine Learning, pp. 6586–6595.
  52. Wang, Y., Zou, D., Yi, J., Bailey, J., Ma, X., and Gu, Q. (2019b), “Improving adversarial robustness requires revisiting misclassified examples,” in International Conference on Learning Representations.
  53. Wen, Z. and Li, Y. (2021), “Toward understanding the feature learning process of self-supervised contrastive learning,” in International Conference on Machine Learning, PMLR, pp. 11112–11122.
  54. Wu, B., Chen, J., Cai, D., He, X., and Gu, Q. (2020a), “Does Network Width Really Help Adversarial Robustness?” arXiv preprint arXiv:2010.01279.
  55. Xiao, J., Fan, Y., Sun, R., and Luo, Z.-Q. (2021), “Adversarial Rademacher Complexity of Deep Neural Networks.”
  56. Xiao, J., Fan, Y., Sun, R., Wang, J., and Luo, Z.-Q. (2022a), “Stability analysis and generalization bounds of adversarial training,” arXiv preprint arXiv:2210.00960.
  57. Xiao, J., Qin, Z., Fan, Y., Wu, B., Wang, J., and Luo, Z.-Q. (2022b), “Adaptive Smoothness-weighted Adversarial Training for Multiple Perturbations with Its Stability Analysis,” arXiv preprint arXiv:2210.00557.
  58. Xiao, T., Wang, X., Efros, A. A., and Darrell, T. (2020), “What should not be contrastive in contrastive learning,” arXiv preprint arXiv:2008.05659.
  59. Xing, Y., Song, Q., and Cheng, G. (2021a), “On the Algorithmic Stability of Adversarial Training,” Advances in Neural Information Processing Systems, 34.
  60. Xing, Y., Song, Q., and Cheng, G. (2021b), “On the generalization properties of adversarial training,” in International Conference on Artificial Intelligence and Statistics, PMLR, pp. 505–513.
  61. Yin, D., Ramchandran, K., and Bartlett, P. (2018), “Rademacher complexity for adversarially robust generalization,” arXiv preprint arXiv:1810.11914.
  62. Zhang, H., Yu, Y., Jiao, J., Xing, E. P., Ghaoui, L. E., and Jordan, M. I. (2019), “Theoretically Principled Trade-off between Robustness and Accuracy,” in Proceedings of the 36th International Conference on Machine Learning, PMLR, vol. 97 of Proceedings of Machine Learning Research, pp. 7472–7482.
  63. Zhang, J., Sang, J., Yi, Q., Yang, Y., Dong, H., and Yu, J. (2021), “Pre-training also Transfers Non-Robustness,” arXiv preprint arXiv:2106.10989.
  64. Zhang, J., Xu, X., Han, B., Niu, G., Cui, L., Sugiyama, M., and Kankanhalli, M. (2020a), “Attacks which do not kill training make adversarial learning stronger,” in International Conference on Machine Learning, PMLR, pp. 11278–11287.
  65. Zhang, Y., Plevrakis, O., Du, S. S., Li, X., Song, Z., and Arora, S. (2020b), “Over-parameterized Adversarial Training: An Analysis Overcoming the Curse of Dimensionality,” arXiv preprint arXiv:2002.06668.
  66. Zhao, Y., Chen, J., and Du, S. S. (2022), “Blessing of Class Diversity in Pre-training,” arXiv preprint arXiv:2209.03447.
